Performance on Devops Monk

https://blog.devops-monk.com/tags/performance/
Recent content in Performance on Devops Monk
Sat, 23 May 2026 00:00:00 +0000

Container Reuse for Fast Feedback Loops
https://blog.devops-monk.com/tutorials/testcontainers/container-reuse/
Sat, 23 May 2026 00:00:00 +0000

The singleton pattern shares a container within a single test run. Container reuse goes further: it keeps the container alive after the JVM exits and reuses it in the next test run. The first run pays the startup cost (8 seconds for Kafka). Every subsequent run skips it entirely. For a developer who runs the test suite dozens of times per day, this saves minutes of waiting.

What You’ll Learn
- Enabling container reuse with .

Parallel Test Execution with Testcontainers
https://blog.devops-monk.com/tutorials/testcontainers/parallel-test-execution/
Sat, 23 May 2026 00:00:00 +0000

A test suite with 50 integration test classes runs sequentially in 10 minutes. Configured for parallel execution across 4 threads, it completes in 3 minutes. Parallel test execution with Testcontainers requires careful data isolation: tests running simultaneously against the same database will step on each other’s data without it. This article covers JUnit 5 parallel configuration, container sharing strategies, and data isolation techniques.

What You’ll Learn
- JUnit 5 parallel execution configuration with junit-platform.

Singleton Containers and Shared Base Classes
https://blog.devops-monk.com/tutorials/testcontainers/singleton-containers/
Sat, 23 May 2026 00:00:00 +0000

With 20 integration test classes, each starting its own PostgreSQL container, you pay 20 container starts at 2 seconds each: 40 seconds of pure overhead before a single assertion runs.
The singleton pattern starts one container per JVM, shares it across all test classes, and cuts that 40 seconds to 2. This is the most impactful performance optimization for large Testcontainers test suites.

What You’ll Learn
- The singleton pattern using a static initializer block
- Abstract base class design for sharing containers and @DynamicPropertySource
- Why you must not combine @Testcontainers/@Container with the singleton pattern
- Spring TestContext caching: how it works and how to maximize reuse
- The @ImportTestcontainers approach for Spring Boot 3.

AOT Compilation in Java 25 (JEP 514 & 515): Faster Startup, Zero Warm-Up
https://blog.devops-monk.com/tutorials/java25/aot-compilation/
Sun, 10 May 2026 00:00:00 +0000

The Java Startup Problem
Java’s performance story has always had one weak spot: startup time. When a JVM starts, it:
1. Loads and verifies class bytecode
2. Interprets bytecode (slow)
3. Profiles which methods are called most (warm-up)
4. Compiles hot methods to native code via JIT (takes time and CPU)
5. Eventually reaches peak throughput

This process takes seconds for large applications. For a Spring Boot application, typical warm-up to peak throughput can take 10–30 seconds.

Compact Object Headers (JEP 519): 33% Less Heap Overhead
https://blog.devops-monk.com/tutorials/java25/compact-object-headers/
Sun, 10 May 2026 00:00:00 +0000

What Is an Object Header?
Every single Java object (every String, every Integer, every record, every array) carries a header that the JVM uses for bookkeeping. Your code never sees this header; it lives alongside the object’s fields in memory.
Before Java 25, on a 64-bit JVM, the header occupied 96 to 128 bits (12–16 bytes):

┌─────────────────────────────────────────────────────────┐
│ Mark Word (64 bits)                                     │
│ ─ identity hash code                                    │
│ ─ lock state (biased lock / thin lock / fat lock)       │
│ ─ GC age bits                                           │
├─────────────────────────────────────────────────────────┤
│ Class Pointer (32 bits compressed / 64 bits full)       │
│ ─ pointer to the object's class (Klass* in HotSpot)     │
└─────────────────────────────────────────────────────────┘

With UseCompressedClassPointers (default), the class pointer is compressed to 32 bits, giving a 96-bit (12-byte) header.

Generational Shenandoah (JEP 521): Best GC for Low Latency
https://blog.devops-monk.com/tutorials/java25/generational-shenandoah/
Sun, 10 May 2026 00:00:00 +0000

GC Background: Why Generations Matter
The Generational Hypothesis is the foundation of most modern GC design: most objects die young. In a typical Java application, the vast majority of objects are short-lived: request/response objects, DTOs, builder instances, stream pipeline intermediates. They are created, used briefly, and then immediately eligible for collection.
A GC that knows about this pattern can be far more efficient than one that treats all objects equally:

Garbage Collection: G1GC, ZGC, Epsilon, and AppCDS
https://blog.devops-monk.com/tutorials/java11/garbage-collection/
Mon, 04 May 2026 00:00:00 +0000

GC Changes Across Java 9–11

Release   Change                                        JEP
Java 9    G1GC becomes the default GC                   JEP 248
Java 9    Unified GC logging (-Xlog:gc*)                JEP 271
Java 10   Parallel Full GC for G1                       JEP 307
Java 10   Application Class-Data Sharing (AppCDS)       JEP 310
Java 11   Epsilon: No-Op GC                             JEP 318
Java 11   ZGC: Scalable Low-Latency GC (experimental)   JEP 333

G1GC as Default (JEP 248, Java 9)
G1 (Garbage-First) replaced Parallel GC as the default on systems with ≥2 CPUs and ≥2 GB heap.

Java 11 Production Checklist and Performance Best Practices
https://blog.devops-monk.com/tutorials/java11/production-best-practices/
Mon, 04 May 2026 00:00:00 +0000

Production Readiness Checklist
[ ] JDK distribution chosen and version pinned
[ ] Heap and Metaspace sized correctly
[ ] GC selected and tuned for your workload
[ ] Container-aware JVM flags set
[ ] AppCDS archive built for faster startup
[ ] JFR always-on recording configured
[ ] GC logging enabled with rotation
[ ] Security-related algorithms locked down
[ ] Thread and connection pool sizes verified
[ ] JVM exit flags prevent silent crashes

Baseline JVM Flags for Java 11
Start with these flags and tune from here:

Java 17 Production Checklist and Performance Best Practices
https://blog.devops-monk.com/tutorials/java17/production-best-practices/
Mon, 04 May 2026 00:00:00 +0000

Production Baseline JVM Flags
Start every Java 17 production deployment with this baseline:

    java \
      # GC: choose one (see GC section)
      -XX:+UseG1GC \
      -XX:MaxGCPauseMillis=200 \
      \
      # Heap sizing
      -Xms4g -Xmx4g \
      \
      # GC logging: essential for diagnosis
      -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m \
      \
      # OOM diagnostics
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=/var/log/app/heap-dump.hprof \
      -XX:+ExitOnOutOfMemoryError \
      \
      # Metaspace
      -XX:MaxMetaspaceSize=512m \
      \
      # Code cache
      -XX:ReservedCodeCacheSize=512m \
      \
      # JFR: always-on profiling
      -XX:StartFlightRecording=duration=0,filename=/var/log/app/profile.

Java 21 Production Checklist and Performance Best Practices
https://blog.devops-monk.com/tutorials/java21/production-best-practices/
Mon, 04 May 2026 00:00:00 +0000

The Production Mindset
Migrating to Java 21 unlocks new capabilities, but production readiness requires deliberate configuration. The JVM defaults are conservative: designed to work reasonably across a wide range of workloads, not to be optimal for any specific one. This article covers:
- Which JVM flags to set for every production Java 21 deployment
- GC selection and tuning for different workload profiles
- Virtual thread configuration and monitoring
- Container-aware JVM settings
- Observability and profiling
- Startup and memory optimization

JVM Flags: The Production Baseline
Start every Java 21 production deployment with this baseline flag set:

JVM Improvements: Metaspace, PermGen Removal, and Performance
https://blog.devops-monk.com/tutorials/java8/jvm-improvements/
Mon, 04 May 2026 00:00:00 +0000

PermGen Removal: The End of a Classic Error
OutOfMemoryError: PermGen space was the Java error that launched a thousand Stack Overflow questions. Application servers would run fine for hours and then fall over during a hot redeploy. The fix, adding more -XX:MaxPermSize, was a band-aid. Java 8 removed the underlying problem entirely. Before Java 8, the JVM heap was divided into several regions.
One of them, Permanent Generation (PermGen), held class metadata, interned strings, and bytecode.

Multi-Threaded Steps and Async Processing for Performance
https://blog.devops-monk.com/tutorials/spring-batch/multi-threaded-steps/
Mon, 04 May 2026 00:00:00 +0000

Introduction
A single-threaded Spring Batch step processes one chunk at a time: read N items, process N items, write N items, repeat. For large data sets this is a bottleneck. Spring Batch offers two in-JVM scaling options:

Approach              How it works                                             Use when
Multi-threaded step   Multiple threads each process independent chunks         Reader is thread-safe (JdbcPagingItemReader)
AsyncItemProcessor    Processing runs concurrently; writes remain sequential   I/O-bound processors (REST calls, slow enrichment)

This article covers both, plus the thread-safety requirements you must meet.

Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize
https://blog.devops-monk.com/tutorials/java8/parallel-streams/
Mon, 04 May 2026 00:00:00 +0000

How Parallel Streams Work
Parallel streams are one of Java 8’s most misused features. It is tempting to add .parallel() to any slow stream pipeline, but the performance characteristics are counterintuitive: parallel can make things slower for small data, and adding blocking I/O inside a parallel stream can stall the entire JVM. This article explains the mechanics, the cases where parallel genuinely helps, and the patterns to avoid.
A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results.

Partitioning: Splitting Work Across Parallel Workers
https://blog.devops-monk.com/tutorials/spring-batch/partitioning/
Mon, 04 May 2026 00:00:00 +0000

Introduction
Multi-threaded steps (Article 20) run multiple chunks concurrently from a single reader. Partitioning is different: it splits the data into independent slices before processing starts, then runs each slice as its own StepExecution with its own reader, processor, writer, and metadata. This gives you:
- True independence between partitions: one partition’s failure doesn’t affect others
- A separate StepExecution row per partition in BATCH_STEP_EXECUTION, giving full visibility into per-partition progress
- The ability to distribute partitions across multiple JVMs (remote partitioning, covered in Article 22)

Partitioning Architecture

    ManagerStep (PartitionStep)
    │
    ├── Partitioner → creates N ExecutionContexts (one per partition)
    ├── PartitionHandler → distributes partitions to workers
    │
    └── Worker Steps (run per partition)
        ├── ItemReader → reads only its slice of data
        ├── ItemProcessor
        └── ItemWriter

The manager step runs once: it calls Partitioner.

Performance Tuning: Chunk Size, Connection Pools, and Memory Management
https://blog.devops-monk.com/tutorials/spring-batch/performance-tuning/
Mon, 04 May 2026 00:00:00 +0000

Introduction
A poorly tuned batch job can be 10–100x slower than a well-tuned one. The biggest gains come from a handful of settings: chunk size, MySQL JDBC rewrite, connection pool alignment, and avoiding unnecessary object creation. This article covers each systematically.

Chunk Size: The Most Impactful Setting
Chunk size determines how many items are processed per transaction. Too small = too many round trips to the database.
Too large = long transactions, high memory pressure, slower rollback on failure.

Remote Partitioning and Remote Chunking with Kafka
https://blog.devops-monk.com/tutorials/spring-batch/remote-partitioning-kafka/
Mon, 04 May 2026 00:00:00 +0000

Introduction
Local partitioning (Article 21) runs all workers on one JVM. When a single machine is the bottleneck (CPU, memory, or network bandwidth), you need workers on separate machines. Spring Batch Integration provides two patterns for this:

Pattern               What distributes                         Coordinator controls          Workers do
Remote Partitioning   Partition descriptors (small messages)   Data splitting, aggregation   Full read-process-write per partition
Remote Chunking       Actual items (larger messages)           Reading                       Processing + writing only

Remote partitioning is the more common choice: workers read directly from the database/file, so only small partition metadata crosses the network.

Vector API (JEP 448): SIMD Computation in Java
https://blog.devops-monk.com/tutorials/java21/vector-api/
Mon, 04 May 2026 00:00:00 +0000

Incubator Feature in Java 21: The Vector API has been incubating since Java 16 (JEP 338). JEP 448 is the sixth incubator iteration in Java 21. The API is stable and production-usable with --add-modules jdk.incubator.vector; finalization is pending Project Valhalla value types.

What Is SIMD and Why Does It Matter?
Modern CPUs can perform the same arithmetic operation on multiple data values in a single instruction. This is called SIMD: Single Instruction, Multiple Data.

Async Processing with @Async and Virtual Threads
https://blog.devops-monk.com/tutorials/spring-boot/spring-boot-async-virtual-threads/
Sun, 03 May 2026 00:00:00 +0000

Not every operation needs to complete before the response returns.
Sending an email, generating a report, publishing an event: these can run in the background. Async processing keeps request latency low while the work continues.

@Async: Fire and Forget

    @SpringBootApplication
    @EnableAsync
    public class OrderServiceApplication { }

    @Service
    @Slf4j
    public class NotificationService {

        @Async // runs in a separate thread
        public void sendOrderConfirmation(Order order) {
            log.info("Sending confirmation for order {}", order.

Caching with Caffeine and Redis
https://blog.devops-monk.com/tutorials/spring-boot/spring-boot-caching/
Sun, 03 May 2026 00:00:00 +0000

Caching sits between your application and the database. A cache hit returns data in microseconds; a database query takes milliseconds. For frequently-read, infrequently-changed data, caching is the highest-leverage performance improvement.

Spring Cache Abstraction
Spring’s cache abstraction lets you add caching with annotations; the backing store (Caffeine, Redis, Hazelcast) is swappable:

    @Service
    @RequiredArgsConstructor
    public class ProductService {

        private final ProductRepository repository;

        @Cacheable("products") // cache the result
        public Product findById(UUID id) {
            return repository.

GraalVM Native Images with Spring Boot 4: From 8 Seconds to 37ms Startup
https://blog.devops-monk.com/2026/05/spring-boot-graalvm-native-images/
Sun, 03 May 2026 00:00:00 +0000

Spring Boot applications running as GraalVM native images start in milliseconds, use a fraction of the memory, and fit in tiny containers. The tradeoff is a longer build time. In 2026, with Spring Boot 4 and GraalVM 24, native images are production-ready for most Spring applications.
This guide covers everything: what Spring AOT does, how to build your first native image, how to fix the common issues, and how to add native builds to CI.

GraalVM Native Images: Millisecond Startup
https://blog.devops-monk.com/tutorials/spring-boot/spring-boot-graalvm-native/
Sun, 03 May 2026 00:00:00 +0000

A regular Spring Boot application takes 2–10 seconds to start. A GraalVM native image of the same application starts in under 100 milliseconds. For serverless functions, batch jobs, and CLI tools, this is the difference between viable and unusable.

What Is a Native Image?
GraalVM’s native image compiler performs ahead-of-time (AOT) compilation. Instead of shipping a JAR that the JVM interprets at runtime, you ship a standalone executable that:
- Contains only the code your application actually uses
- Has no JVM startup overhead
- Uses much less memory (no JIT compiler, no class metadata)
- Starts in milliseconds

The tradeoff: compile time increases from seconds to minutes.

JPA Performance: Solving N+1, Lazy Loading, and Query Optimization
https://blog.devops-monk.com/tutorials/spring-boot/spring-boot-jpa-performance-tuning/
Sun, 03 May 2026 00:00:00 +0000

JPA makes data access easy, until it silently runs hundreds of queries to load what you think is a single query. This article covers how to find and fix the most common JPA performance problems.

Enable Query Logging First
You can’t fix what you can’t see.
Enable SQL logging before optimizing:

    logging:
      level:
        org.hibernate.SQL: DEBUG
        org.hibernate.orm.jdbc.bind: TRACE   # log bind parameters (Spring Boot 3+)
    spring:
      jpa:
        properties:
          hibernate:
            format_sql: true
            generate_statistics: true   # log query count, cache hits, etc.

Spring Boot Caching: Multi-Level Cache with Caffeine + Redis
https://blog.devops-monk.com/2026/05/spring-boot-caching-caffeine-redis/
Sun, 03 May 2026 00:00:00 +0000

Caching reduces database load and response latency. Spring Boot’s cache abstraction lets you add caching with annotations, then swap the implementation (Caffeine, Redis, multi-level) without changing your business code. This guide covers Caffeine for in-JVM caching, Redis for distributed caching, and a multi-level cache that combines both.

Spring Cache Abstraction
Spring’s cache abstraction uses three annotations:

Annotation   Behaviour
@Cacheable   Cache the return value. On subsequent calls, return from cache without executing the method.

Spring Boot JPA Performance: Solving N+1, Lazy Loading, and Query Optimization
https://blog.devops-monk.com/2026/05/spring-boot-jpa-performance/
Sun, 03 May 2026 00:00:00 +0000

JPA makes database access simple. It also makes it dangerously easy to write code that fires 100 SQL queries to load 10 records. The N+1 problem alone has caused more production performance incidents than almost any other JPA issue. This guide covers how to find and fix the five most common JPA performance problems: N+1 queries, LazyInitializationException, over-fetching, poor connection pool sizing, and Hibernate 6 breaking changes.
Enable SQL Logging First
Before optimizing anything, see exactly what queries are firing:

Spring Boot Virtual Threads: Benchmarks, Pitfalls, and When NOT to Use Them
https://blog.devops-monk.com/2026/05/spring-boot-virtual-threads/
Sun, 03 May 2026 00:00:00 +0000

Virtual Threads landed in Java 21 as a stable feature, and Spring Boot 3.2 added first-class support with a single property. The promise: write simple blocking code and get WebFlux-level throughput. The reality is mostly true, with some important exceptions. This article covers what Virtual Threads actually are, how to enable them in Spring Boot, real benchmark numbers, the three pitfalls that will silently destroy your performance, and a decision framework for when to use them (and when not to).

Spring Boot vs Quarkus in 2026: An Honest Benchmarked Comparison
https://blog.devops-monk.com/2026/05/spring-boot-vs-quarkus/
Sun, 03 May 2026 00:00:00 +0000

Every year, someone asks: “Should we use Spring Boot or Quarkus?” In 2026, both frameworks are mature, both run natively, and both work well on Kubernetes. The differences come down to developer experience, ecosystem size, and specific performance characteristics. This is an honest comparison with real numbers, not marketing claims.

The Frameworks at a Glance
Spring Boot 4 (November 2025): Built on Spring Framework 7. The de-facto standard for enterprise Java.