Spring-Batch

25 posts in this section

Introduction to Spring Batch: What, Why, and Architecture

Every application has a class of work that doesn’t fit the request-response model: process 2 million orders overnight, generate 500,000 monthly statements, migrate 10 years of legacy data before Monday morning. This work needs to be reliable, restartable after failures, and fast enough to finish in the available window. That’s what Spring Batch is built for. This article covers what Spring Batch is, when to use it, and how its architecture works.


JobParameters, ExecutionContext, and Job Restartability

Introduction

Two mechanisms let you pass information into and through a Spring Batch job:

- JobParameters: input values provided at launch time (a date, a file path, a run ID). They are immutable and persisted to BATCH_JOB_EXECUTION_PARAMS.
- ExecutionContext: a key-value map that steps can read and write during execution. It is persisted after each chunk commit, enabling restartability.

Understanding both is essential for building jobs that can be safely re-run, restarted after failure, and parameterised for different data sets.
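To make the restart idea concrete, here is a minimal plain-Java sketch of it, not Spring Batch code. The class `RestartableStepSketch`, the key `items.processed`, and the `failAtIndex` switch are all hypothetical names invented for illustration; a `Map` stands in for the persisted ExecutionContext and a `List` stands in for the target table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a restartable chunk-oriented step.
// This is NOT the Spring Batch API; it only illustrates the idea.
class RestartableStepSketch {

    // Assumed key name under which progress is stored in the context.
    static final String POSITION_KEY = "items.processed";

    /**
     * Processes items in chunks. Progress is written to the context only
     * after a chunk has been "committed" to the sink, so a later run can
     * resume from the last committed position.
     * A failAtIndex >= 0 simulates a crash while handling that item.
     */
    static void run(List<String> items, Map<String, Integer> context,
                    List<String> sink, int chunkSize, int failAtIndex) {
        int position = context.getOrDefault(POSITION_KEY, 0); // resume point
        while (position < items.size()) {
            int end = Math.min(position + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAtIndex) {
                    // The half-built chunk is discarded, like a rolled-back transaction.
                    throw new IllegalStateException("simulated failure at item " + i);
                }
                chunk.add(items.get(i).toUpperCase());
            }
            sink.addAll(chunk);                  // chunk commit
            position = end;
            context.put(POSITION_KEY, position); // progress saved after the commit
        }
    }
}
```

Calling `run` a second time with the same context and sink picks up at the stored position, so items committed before the failure are not processed again.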


JobRepository and Batch Metadata: How Spring Batch Tracks Everything

Introduction

Every time Spring Batch runs a job it records the run’s history in a set of relational tables. This metadata is not optional: it is what makes Spring Batch reliable. Without it there would be no restart capability, no duplicate-run prevention, and no audit trail. Understanding the metadata layer is essential for debugging failures, building monitoring dashboards, and designing restartable jobs.

In this article you will learn:

- The role of JobRepository and JobExplorer in the Spring Batch architecture
- The six metadata tables: their schema, purpose, and relationships
- How Spring Batch uses these tables to enable restartability
- How to query batch history directly in MySQL
- How to access metadata programmatically with JobExplorer
- Key changes in Spring Batch 5 (removed MapJobRepositoryFactoryBean, new testing approach)

All examples use Spring Boot 3.


Listeners: Hooking into Job, Step, and Chunk Lifecycle Events

Introduction

Spring Batch emits lifecycle events at every stage of execution: before and after a job runs, before and after each step, before and after each chunk, and before and after each individual read/process/write call. Listeners let you hook into these events without modifying your core batch logic.

Common uses:

- Log start/end times and item counts
- Send success/failure notifications (Slack, email, PagerDuty)
- Publish metrics to Prometheus or CloudWatch
- Log every skipped item to a dead-letter table
- Reset resources before a step begins

Listener Hierarchy

Job
├── JobExecutionListener        beforeJob / afterJob
│
└── Step
    ├── StepExecutionListener   beforeStep / afterStep
    ├── ChunkListener           beforeChunk / afterChunk / afterChunkError
    ├── ItemReadListener        beforeRead / afterRead / onReadError
    ├── ItemProcessListener     beforeProcess / afterProcess / onProcessError
    ├── ItemWriteListener       beforeWrite / afterWrite / onWriteError
    └── SkipListener            onSkipInRead / onSkipInWrite / onSkipInProcess

JobExecutionListener

@Component
public class ImportJobListener implements JobExecutionListener {
    private static final Logger log = LoggerFactory.


Reading Flat Files: CSV, Fixed-Width, and Delimited with FlatFileItemReader

Introduction

Flat files (CSV exports, fixed-width mainframe feeds, pipe-delimited data dumps) are the most common input source for batch jobs. Spring Batch’s FlatFileItemReader handles all of them. It is restartable out of the box: it persists its line-number position in the ExecutionContext, so a restarted job resumes from the last saved position instead of rereading the file from the top.

In this article you will build a complete order-import job that reads a CSV file and inserts rows into a MySQL orders table.


Reading from External Sources: REST APIs, S3, and Custom ItemReaders

Introduction

Not all batch input comes from files or databases. You may need to pull orders from an e-commerce API, sync products from a supplier feed, or process CSV exports stored in Amazon S3. Spring Batch provides MultiResourceItemReader for S3 files and a clean ItemReader interface for anything else.

In this article you will build:

- A custom ItemReader that pages through a REST API
- An S3 reader that downloads files on demand
- A composite reader that merges multiple sources into one step

The ItemReader Contract

The entire ItemReader interface is one method:
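For orientation, the contract amounts to a single `read()` call that returns the next item, or `null` once the input is exhausted. The following is a minimal plain-Java sketch of a paging reader built on that idea. `SketchItemReader` is a local stand-in that mirrors the shape of the contract, not the Spring Batch type, and `PagingReaderSketch` and its `fetchPage` function are hypothetical names; `fetchPage` stands in for whatever client call returns one page of results.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

// Local stand-in mirroring the one-method reader contract (assumption:
// it is declared here only so that the sketch compiles on its own).
interface SketchItemReader<T> {
    T read() throws Exception; // next item, or null when there is no more input
}

// Hypothetical reader that pulls items page by page from a remote source.
class PagingReaderSketch<T> implements SketchItemReader<T> {

    private final IntFunction<List<T>> fetchPage; // page index -> items on that page
    private final Deque<T> buffer = new ArrayDeque<>();
    private int nextPage = 0;
    private boolean exhausted = false;

    PagingReaderSketch(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public T read() {
        // Refill the buffer only when it is empty and the source may have more data.
        if (buffer.isEmpty() && !exhausted) {
            List<T> page = fetchPage.apply(nextPage++);
            if (page == null || page.isEmpty()) {
                exhausted = true; // an empty page marks the end of the input
            } else {
                buffer.addAll(page);
            }
        }
        return buffer.poll(); // poll() returns null on an empty buffer
    }
}
```

Treating an empty page as the end-of-input signal is a design choice of this sketch; a real API may expose a total count or a "next" link instead.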


Reading from MySQL: JdbcCursorItemReader and JdbcPagingItemReader

Introduction

Most real batch jobs read from a database, not a file. Spring Batch provides two JDBC readers for this:

| Reader | Strategy | Thread-safe | Use when |
| JdbcCursorItemReader | Open a server-side cursor, stream rows | No | Single-threaded step, huge result sets |
| JdbcPagingItemReader | Execute LIMIT / OFFSET queries in a loop | Yes | Multi-threaded steps, sorted data |

Both handle restartability automatically and both work with any DataSource, including MySQL.

JdbcCursorItemReader

How it works

The reader opens a single JDBC ResultSet on Step.


Reading with JPA: JpaPagingItemReader and Entity-Based Reading

Introduction

When your application already uses JPA/Hibernate, JpaPagingItemReader lets you read data using JPQL queries and mapped entities instead of raw JDBC. You get the full object graph, type-safe queries, and familiar entity lifecycle, but you also inherit JPA’s pitfalls: the N+1 problem, session-per-read overhead, and first-level cache growth.

This article covers:

- When to choose JpaPagingItemReader over JdbcPagingItemReader
- Setting up the reader with JPQL and named queries
- Fetching associations to avoid N+1
- Clearing the persistence context to prevent memory leaks
- A complete order-processing example with MySQL

When to Use JpaPagingItemReader

Use it when:


Retry Logic: Handling Transient Failures Gracefully

Introduction

Batch jobs interact with databases, REST APIs, and file systems, all of which can fail transiently. A MySQL deadlock resolves itself in milliseconds. A network timeout to an external service clears up in seconds. Retrying these transient failures automatically is far better than failing the entire job and requiring a manual restart.

Spring Batch has built-in retry support at the step level, integrated with its transaction management. This article covers everything you need to configure robust retry behaviour.
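The core idea of bounded retry can be shown without any framework. The sketch below is plain Java, not Spring Batch’s retry configuration: `TransientFailure`, `RetrySketch`, and `withRetry` are hypothetical names, and the sketch deliberately leaves out back-off delays and transaction rollback.

```java
import java.util.function.Supplier;

// Hypothetical marker for failures that are worth retrying
// (for example a deadlock or a network timeout).
class TransientFailure extends RuntimeException {
    TransientFailure(String message) {
        super(message);
    }
}

class RetrySketch {

    /**
     * Runs the action up to maxAttempts times. Only TransientFailure is
     * retried; any other exception propagates immediately. If every
     * attempt fails, the last TransientFailure is rethrown.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (TransientFailure e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Restricting the retry to one exception type matters: retrying a permanent failure, such as a malformed record, only delays the inevitable error.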


Skip Logic, Dead Letter Patterns, and Job Restart Strategies

Introduction

Retry handles transient failures. Skip handles permanent ones: bad data rows, constraint violations, malformed records that will never succeed no matter how many times you retry. Skip logic lets your job continue processing good records while recording bad ones for human review.

This article covers:

- Configuring skip for specific exception types
- Custom SkipPolicy for fine-grained control
- Dead-letter table pattern for tracking skipped items
- Stopping a job intentionally vs failing it
- Handling abandoned executions
- Designing jobs that restart safely after any failure

Basic Skip Configuration

return new StepBuilder("importOrdersStep", jobRepository) .
