Performance

26 posts in this section

May 04, 2026 Abhay 7 min read

JVM Improvements: Metaspace, PermGen Removal, and Performance

PermGen Removal: The End of a Classic Error OutOfMemoryError: PermGen space was the Java error that launched a thousand Stack Overflow questions. Application servers would run fine for hours and then fall over during a hot redeploy. The fix — adding more -XX:MaxPermSize — was a band-aid. Java 8 removed the underlying problem entirely. Before Java 8, the JVM heap was divided into several regions. One of them — Permanent Generation (PermGen) — held class metadata, interned strings, and bytecode.

May 04, 2026 Abhay 6 min read

Multi-Threaded Steps and Async Processing for Performance

Introduction A single-threaded Spring Batch step processes one chunk at a time — read N items, process N items, write N items, repeat. For large data sets this is a bottleneck. Spring Batch offers two in-JVM scaling options: Approach How it works Use when Multi-threaded step Multiple threads each process independent chunks Reader is thread-safe (JdbcPagingItemReader) AsyncItemProcessor Processing runs concurrently; writes remain sequential I/O-bound processors (REST calls, slow enrichment) This article covers both, plus the thread-safety requirements you must meet.

May 04, 2026 Abhay 7 min read

Parallel Streams: ForkJoinPool, Spliterators, and When NOT to Parallelize

How Parallel Streams Work Parallel streams are one of Java 8’s most misused features. It is tempting to add .parallel() to any slow stream pipeline, but the performance characteristics are counterintuitive: parallel can make things slower for small data, and adding blocking I/O inside a parallel stream can stall the entire JVM. This article explains the mechanics, the cases where parallel genuinely helps, and the patterns to avoid. A parallel stream splits its source into sub-sequences, processes each sub-sequence on a separate thread, and merges the results.

May 04, 2026 Abhay 5 min read

Partitioning: Splitting Work Across Parallel Workers

Introduction Multi-threaded steps (Article 20) run multiple chunks concurrently from a single reader. Partitioning is different — it splits the data into independent slices before processing starts, then runs each slice as its own StepExecution with its own reader, processor, writer, and metadata. This gives you: True independence between partitions — one partition’s failure doesn’t affect others A separate StepExecution row per partition in BATCH_STEP_EXECUTION — full visibility into per-partition progress The ability to distribute partitions across multiple JVMs (remote partitioning, covered in Article 22) Partitioning Architecture ManagerStep (PartitionStep) │ ├── Partitioner → creates N ExecutionContexts (one per partition) ├── PartitionHandler → distributes partitions to workers │ └── Worker Steps (run per partition) ├── ItemReader → reads only its slice of data ├── ItemProcessor └── ItemWriter The manager step runs once: it calls Partitioner.

May 04, 2026 Abhay 5 min read

Performance Tuning: Chunk Size, Connection Pools, and Memory Management

Introduction A poorly tuned batch job can be 10–100x slower than a well-tuned one. The biggest gains come from a handful of settings — chunk size, MySQL JDBC rewrite, connection pool alignment, and avoiding unnecessary object creation. This article covers each systematically. Chunk Size — The Most Impactful Setting Chunk size determines how many items are processed per transaction. Too small = too many round trips to the database. Too large = long transactions, high memory pressure, slower rollback on failure.

May 04, 2026 Abhay 6 min read

Remote Partitioning and Remote Chunking with Kafka

Introduction Local partitioning (Article 21) runs all workers on one JVM. When a single machine is the bottleneck — CPU, memory, or network bandwidth — you need workers on separate machines. Spring Batch Integration provides two patterns for this: Pattern What distributes Coordinator controls Workers do Remote Partitioning Partition descriptors (small messages) Data splitting, aggregation Full read-process-write per partition Remote Chunking Actual items (larger messages) Reading Processing + writing only Remote partitioning is the more common choice — workers read directly from the database/file, so only small partition metadata crosses the network.

May 04, 2026 Abhay 13 min read

Vector API (JEP 448): SIMD Computation in Java

Preview Feature in Java 21 — The Vector API has been in preview since Java 16 (JEP 338). JEP 448 is the sixth preview iteration in Java 21. The API is stable and production-usable with --enable-preview; finalization is pending Project Valhalla value types. What Is SIMD and Why Does It Matter? Modern CPUs can perform the same arithmetic operation on multiple data values in a single instruction. This is called SIMD — Single Instruction, Multiple Data.

May 03, 2026 Abhay 6 min read

Async Processing with @Async and Virtual Threads

Not every operation needs to complete before the response returns. Sending an email, generating a report, publishing an event — these can run in the background. Async processing keeps request latency low while the work continues. @Async — Fire and Forget @SpringBootApplication @EnableAsync public class OrderServiceApplication { } @Service @Slf4j public class NotificationService { @Async // runs in a separate thread public void sendOrderConfirmation(Order order) { log.info("Sending confirmation for order {}", order.

May 03, 2026 Abhay 6 min read

Caching with Caffeine and Redis

Caching sits between your application and the database. A cache hit returns data in microseconds; a database query takes milliseconds. For frequently-read, infrequently-changed data, caching is the highest-leverage performance improvement. Spring Cache Abstraction Spring’s cache abstraction lets you add caching with annotations — the backing store (Caffeine, Redis, Hazelcast) is swappable: @Service @RequiredArgsConstructor public class ProductService { private final ProductRepository repository; @Cacheable("products") // cache the result public Product findById(UUID id) { return repository.

May 03, 2026 Abhay 7 min read

GraalVM Native Images with Spring Boot 4: From 8 Seconds to 37ms Startup

Spring Boot applications running as GraalVM native images start in milliseconds, use a fraction of the memory, and fit in tiny containers. The tradeoff is a longer build time. In 2026, with Spring Boot 4 and GraalVM 24, native images are production-ready for most Spring applications. This guide covers everything: what Spring AOT does, how to build your first native image, how to fix the common issues, and how to add native builds to CI.