Spring-Batch

25 posts in this section

Best Practices Reference: The Complete Spring Batch Checklist

Introduction This is the final article in the Spring Batch Tutorial series. It consolidates every hard-won lesson into actionable checklists — use it as a review before deploying a new batch job to production, or as a reference when debugging an existing one. Part 1: Job Design Always design for restartability Every step that writes to a database uses idempotent SQL (ON DUPLICATE KEY UPDATE or WHERE NOT EXISTS) Every ItemReader has a unique .

Continue reading »

Multi-Threaded Steps and Async Processing for Performance

Introduction A single-threaded Spring Batch step processes one chunk at a time — read N items, process N items, write N items, repeat. For large data sets this is a bottleneck. Spring Batch offers two in-JVM scaling options: Approach How it works Use when Multi-threaded step Multiple threads each process independent chunks Reader is thread-safe (JdbcPagingItemReader) AsyncItemProcessor Processing runs concurrently; writes remain sequential I/O-bound processors (REST calls, slow enrichment) This article covers both, plus the thread-safety requirements you must meet.

Continue reading »

Partitioning: Splitting Work Across Parallel Workers

Introduction Multi-threaded steps (Article 20) run multiple chunks concurrently from a single reader. Partitioning is different — it splits the data into independent slices before processing starts, then runs each slice as its own StepExecution with its own reader, processor, writer, and metadata. This gives you: True independence between partitions — one partition’s failure doesn’t affect others A separate StepExecution row per partition in BATCH_STEP_EXECUTION — full visibility into per-partition progress The ability to distribute partitions across multiple JVMs (remote partitioning, covered in Article 22) Partitioning Architecture ManagerStep (PartitionStep) │ ├── Partitioner → creates N ExecutionContexts (one per partition) ├── PartitionHandler → distributes partitions to workers │ └── Worker Steps (run per partition) ├── ItemReader → reads only its slice of data ├── ItemProcessor └── ItemWriter The manager step runs once: it calls Partitioner.

Continue reading »

Performance Tuning: Chunk Size, Connection Pools, and Memory Management

Introduction A poorly tuned batch job can be 10–100x slower than a well-tuned one. The biggest gains come from a handful of settings — chunk size, MySQL JDBC rewrite, connection pool alignment, and avoiding unnecessary object creation. This article covers each systematically. Chunk Size — The Most Impactful Setting Chunk size determines how many items are processed per transaction. Too small = too many round trips to the database. Too large = long transactions, high memory pressure, slower rollback on failure.

Continue reading »

Remote Partitioning and Remote Chunking with Kafka

Introduction Local partitioning (Article 21) runs all workers on one JVM. When a single machine is the bottleneck — CPU, memory, or network bandwidth — you need workers on separate machines. Spring Batch Integration provides two patterns for this: Pattern What distributes Coordinator controls Workers do Remote Partitioning Partition descriptors (small messages) Data splitting, aggregation Full read-process-write per partition Remote Chunking Actual items (larger messages) Reading Processing + writing only Remote partitioning is the more common choice — workers read directly from the database/file, so only small partition metadata crosses the network.

Continue reading »

Scheduling Batch Jobs: @Scheduled, Quartz, and Clustered Scheduling

Introduction A batch job that only runs manually is barely useful. Production jobs run on a schedule — nightly, hourly, after a file arrives. Spring Boot provides three scheduling options: Option Persistence Clustered Use when @Scheduled No No Simple cron on a single node Quartz Scheduler Yes (DB) Yes HA scheduling, persistent triggers External scheduler (Kubernetes CronJob, Airflow) Varies Yes Complex pipelines, dependency management @Scheduled — Simple Cron Trigger The simplest approach: annotate a method with @Scheduled and run the job from it.

Continue reading »

Advanced Processing: CompositeItemProcessor, External APIs, and Async Processing

Introduction Real batch jobs often need more than one transformation step per item. You might validate first, then normalize, then enrich, then convert to the output type. CompositeItemProcessor chains multiple single-responsibility processors together. For I/O-bound enrichment steps, AsyncItemProcessor runs processing concurrently on a thread pool — giving you parallelism without rewriting your step as multi-threaded. CompositeItemProcessor CompositeItemProcessor chains a list of processors. The output of each processor becomes the input to the next.

Continue reading »

Advanced Writers: JpaItemWriter, CompositeItemWriter, and Custom Writers

Introduction The previous article covered the two most common writers: FlatFileItemWriter for files and JdbcBatchItemWriter for database inserts. This article covers advanced scenarios: Persisting JPA entities with JpaItemWriter Writing to multiple destinations simultaneously with CompositeItemWriter Routing items to different writers based on content with ClassifierCompositeItemWriter Building custom writers for REST APIs, message queues, and cloud storage Combining a writer with a post-step cleanup tasklet JpaItemWriter JpaItemWriter calls EntityManager.merge() on each item.

Continue reading »

Chunk-Oriented Processing: The Core Spring Batch Pattern

The read-process-write loop is at the heart of almost every Spring Batch job. Understanding exactly how it works — where transactions begin and end, what gets rolled back on failure, and how Spring Batch knows where to restart — makes everything else in the framework click into place. This article goes deep on chunk-oriented processing: the execution model, the three interfaces, the counters that track progress, and how to size chunks correctly.

Continue reading »

Configuring Jobs and Steps: Flows, Decisions, and Conditional Execution

Introduction So far every example has had a single step. Real batch jobs usually have multiple steps — validate input, import data, generate a report, send a notification, clean up temp files. Spring Batch’s JobBuilder DSL lets you compose these steps into flows with conditional branching, parallel execution, and decision logic. This article covers: Linear multi-step jobs Conditional step transitions using ExitStatus JobExecutionDecider for runtime branching Parallel flows with split Nested jobs with FlowStep The Flow abstraction for reusable step sequences Linear Multi-Step Job The simplest multi-step job runs steps in sequence.

Continue reading »