Introduction
Spring Batch is a lightweight, robust framework for building batch processing applications. Batch processing typically involves bulk-oriented, transactional processing of data, such as reading records from a source, transforming them, and writing the results to a destination. In enterprise applications, nightly data jobs, report generation, and data migrations are typical use cases for Spring Batch.
This blog post will explore Spring Batch fundamentals and guide you through creating a simple batch processing job with a coding example.
What is Spring Batch?
Spring Batch is a part of the Spring ecosystem that focuses on providing reusable and customizable components for processing large volumes of data in a batch style. Batch jobs usually involve repetitive tasks like:
- Reading data from a data source
- Processing it
- Writing it to a destination
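Spring Batch supports chunk-oriented processing, which divides large data sets into manageable chunks. Items are read and processed one at a time, collected into a chunk of a configured size, and then written together within a single transaction, so only one chunk needs to be held in memory at any moment.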
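To make the idea of chunking concrete, here is a minimal plain-Java sketch that is independent of Spring Batch. The ChunkSketch class and its chunks method are hypothetical names used only for illustration; Spring Batch does this work internally and pulls items from a reader rather than slicing a list that is already in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not part of the Spring Batch API.
class ChunkSketch {

    // Splits the items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7);
        // With a chunk size of 3, seven items yield chunks of 3, 3, and 1.
        for (List<Integer> chunk : chunks(ids, 3)) {
            System.out.println("Processing chunk: " + chunk);
        }
    }
}
```

The last chunk is simply smaller when the item count is not a multiple of the chunk size, which is also how the final chunk of a batch step behaves.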
Core Concepts of Spring Batch
Before diving into code, let’s understand the key components that make up a Spring Batch job.
Job
A job is the actual batch process. It can have multiple steps and represents the entire batch workflow.
Step
Each job consists of one or more steps. A step defines a specific part of the job’s workflow, such as reading data, processing it, and writing it.
ItemReader
The ItemReader is responsible for reading data. It can read from a file, database, or any other source. After reading the data, it passes it to the next component for processing.
ItemProcessor
This component processes or transforms the data. For instance, you might take raw data, validate it, or convert it to a different format.
ItemWriter
The ItemWriter writes the processed data to a destination, such as a database or a file.
JobRepository
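The JobRepository persists metadata about job and step executions, such as their status and start and end times. This stored state is what allows a failed job to be restarted.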
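The way a reader, processor, and writer cooperate inside a chunk-oriented step can be sketched in plain Java. The interfaces below are simplified, hypothetical stand-ins and not the real Spring Batch types, but they follow two conventions of the framework: a reader signals the end of input by returning null, and a processor filters an item out by returning null. Transactions, restart, and skip or retry handling are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the Spring Batch interfaces; all names are hypothetical.
class ChunkLoopSketch {

    interface Reader<T> { T read(); }                 // returns null when input is exhausted
    interface Processor<I, O> { O process(I item); }  // returning null filters the item out
    interface Writer<T> { void write(List<T> chunk); }

    // Reads up to chunkSize items, processes them, and writes the survivors together.
    // Returns the total number of items handed to the writer.
    static <I, O> int run(Reader<I> reader, Processor<I, O> processor,
                          Writer<O> writer, int chunkSize) {
        int written = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.process(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            if (!outputs.isEmpty()) {
                writer.write(outputs);
                written += outputs.size();
            }
        }
        return written;
    }

    public static void main(String[] args) {
        Iterator<String> source = List.of("A@X.COM", "", "B@Y.COM").iterator();
        Reader<String> reader = () -> source.hasNext() ? source.next() : null;
        Processor<String, String> processor =
                email -> email.isEmpty() ? null : email.toLowerCase();
        Writer<String> writer = chunk -> System.out.println("Writing " + chunk);

        int count = run(reader, processor, writer, 2);
        System.out.println(count + " items written");
    }
}
```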
Setting Up Spring Batch in Your Project
To set up Spring Batch, include the necessary dependencies in your pom.xml or build.gradle.
For Maven
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
For Gradle
dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    runtimeOnly 'com.h2database:h2'
}
The spring-boot-starter-batch dependency brings in all the necessary components for Spring Batch, and H2 provides the in-memory database that stores job status.
Building a Batch Job: Coding Example
Let’s create a simple Spring Batch job that reads a list of users from a CSV file, processes each record, and stores the results in an in-memory database.
Step 1: Define the Model
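public class User {
    private String id;
    private String name;
    private String email;

    // Constructors, getters, and setters.
    // The field set mapper used in Step 2 needs a no-argument
    // constructor and a setter for each field.
}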
Step 2: Configure the Batch Job
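We’ll start by configuring the job in a BatchConfiguration class. The example uses JobBuilderFactory and StepBuilderFactory from Spring Batch 4.x; both are deprecated as of Spring Batch 5, which uses JobBuilder and StepBuilder directly.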
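import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    // Reads users.csv from the classpath and maps each line to a User.
    @Bean
    public FlatFileItemReader<User> reader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userItemReader")
                .resource(new ClassPathResource("users.csv"))
                .linesToSkip(1) // skip the "id,name,email" header line
                .delimited()
                .names(new String[]{"id", "name", "email"})
                .targetType(User.class)
                .build();
    }

    @Bean
    public ItemProcessor<User, User> processor() {
        return user -> {
            // You can add data processing logic here
            user.setEmail(user.getEmail().toLowerCase()); // Example: convert email to lowercase
            return user;
        };
    }

    // Writes each chunk of users to the users table in a single JDBC batch.
    @Bean
    public JdbcBatchItemWriter<User> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<User>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO users (id, name, email) VALUES (:id, :name, :email)")
                .dataSource(dataSource)
                .build();
    }

    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1)
                .end()
                .build();
    }

    // A chunk size of 10 means ten users are read and processed, then written together.
    @Bean
    public Step step1(JdbcBatchItemWriter<User> writer) {
        return stepBuilderFactory.get("step1")
                .<User, User>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer)
                .build();
    }
}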
Step 3: Listener for Job Completion
A listener is useful to trigger events like notifications or logging once the job completes.
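import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    // Called once after the job finishes, whatever its final status.
    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("Job completed successfully!");
        }
    }
}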
Step 4: Create Database Schema
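Spring Batch stores job execution data in its own metadata tables, which Spring Boot creates automatically for an embedded database such as H2. The job in this example also needs a users table to write to. For testing with the H2 in-memory database, one option is to put the following statement in a schema.sql file under src/main/resources, which Spring Boot runs at startup for embedded databases: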
CREATE TABLE users (
    id VARCHAR(255) NOT NULL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);
Step 5: CSV Input File (users.csv)
Create a CSV file (users.csv) under the src/main/resources folder with the following content. The first line is a header row.
id,name,email
1,John Doe,john.doe@example.com
2,Jane Roe,jane.roe@example.com
3,Richard Roe,richard.roe@example.com
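As a rough illustration of what happens to these lines, the following plain-Java sketch parses them by hand: it skips the header, splits each line on commas, and lowercases the email in the same way as the example processor. The CsvSketch and UserRow names are hypothetical and exist only for this illustration. In the real job the FlatFileItemReader does the parsing, and unlike this naive split it does not break on quoted fields that contain commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a hand-rolled stand-in for the reader and processor.
class CsvSketch {

    // Hypothetical holder mirroring the id, name, and email fields of User.
    static final class UserRow {
        final String id;
        final String name;
        final String email;

        UserRow(String id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }
    }

    // Skips the header line, splits each data line on commas,
    // and lowercases the email as the example processor does.
    static List<UserRow> parse(List<String> lines) {
        List<UserRow> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {   // index 0 is the header
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;                          // ignore blank lines
            }
            String[] fields = line.split(",", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields but got: " + line);
            }
            rows.add(new UserRow(fields[0].trim(), fields[1].trim(),
                    fields[2].trim().toLowerCase(Locale.ROOT)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "id,name,email",
                "1,John Doe,john.doe@example.com",
                "2,Jane Roe,jane.roe@example.com",
                "3,Richard Roe,richard.roe@example.com");
        for (UserRow row : parse(lines)) {
            System.out.println(row.id + " | " + row.name + " | " + row.email);
        }
    }
}
```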
Step 6: Running the Job
Once everything is configured, you can run the Spring Boot application. By default, Spring Boot launches the configured job at startup. The job reads the data from the CSV file, processes it, and writes it to the database.
Conclusion
Spring Batch is a powerful framework for batch processing, offering great flexibility and performance. In this tutorial, we learned how to create a simple Spring Batch job with the core components: ItemReader, ItemProcessor, and ItemWriter. Whether you need to process a large dataset or execute complex batch workflows, Spring Batch is a reliable choice for enterprise-level applications.