Understanding Spring Batch in 2024: A Detailed Guide with a Code Example

Introduction

Spring Batch is a lightweight framework for building robust batch processing applications. Batch processing means handling large volumes of data in bulk, usually transactionally and without user interaction: reading records from a source, transforming them, and writing them to a destination. In enterprise applications, nightly data jobs, report generation, and data migrations are typical use cases for Spring Batch.

This blog post will explore Spring Batch fundamentals and guide you through creating a simple batch processing job with a coding example.

What is Spring Batch?

Spring Batch is a part of the Spring ecosystem that focuses on providing reusable and customizable components for processing large volumes of data in a batch style. Batch jobs usually involve repetitive tasks like:

  • Reading data from a data source
  • Processing it
  • Writing it to a destination

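Spring Batch supports chunk-oriented processing: items are read one at a time, collected into a chunk of a configurable size, and each chunk is then written as a unit within a transaction. Because only one chunk is held in memory at a time, memory use stays bounded even for very large data sets.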
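
The read, process, and write cycle can be pictured with a small plain-Java sketch. This is only an illustration of the idea, not Spring Batch's implementation: the ChunkLoopSketch class, its run method, and the use of Iterator, Function, and Consumer as stand-ins for a reader, processor, and writer are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified illustration of chunk-oriented processing.
// Hypothetical names throughout; this is not the Spring Batch API.
class ChunkLoopSketch {

    // Reads up to chunkSize items, processes them, and hands the results to
    // the writer as one group. Repeats until the input is exhausted.
    // Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunksWritten = 0;
        while (reader.hasNext()) {
            // Read: pull items one at a time until the chunk is full or input ends.
            List<I> items = new ArrayList<>(chunkSize);
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process: transform each item in the chunk.
            List<O> outputs = new ArrayList<>(items.size());
            for (I item : items) {
                outputs.add(processor.apply(item));
            }
            // Write: the whole chunk goes out together. A real step would
            // commit its transaction at this point.
            writer.accept(outputs);
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Function<Integer, Integer> doubler = n -> n * 2;
        Consumer<List<Integer>> printer = chunk -> System.out.println("writing " + chunk);
        int chunks = run(input.iterator(), doubler, printer, 3);
        System.out.println(chunks + " chunks written");
    }
}
```

With seven items and a chunk size of three, the writer is called three times, and the last call receives the single leftover item. Only one chunk's worth of items is held in the lists at any moment.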

Core Concepts of Spring Batch

Before diving into code, let’s understand the key components that make up a Spring Batch job.

Job

A job is the actual batch process. It can have multiple steps and represents the entire batch workflow.

Step

Each job consists of one or more steps. A step defines a specific part of the job’s workflow, such as reading data, processing it, and writing it.
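
As a rough mental model, a job can be thought of as an ordered list of steps that stops as soon as one step fails. The plain-Java sketch below illustrates that relationship only. JobSketch, Step, Work, and Status are hypothetical names invented for this example and are not Spring Batch types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of a job as an ordered sequence of steps.
// Hypothetical names throughout; this is not the Spring Batch API.
class JobSketch {

    enum Status { COMPLETED, FAILED }

    // A step is a named unit of work that may fail with an exception.
    interface Step {
        String name();
        void execute() throws Exception;
    }

    // Lets the body of a step be written as a lambda.
    interface Work {
        void run() throws Exception;
    }

    static Step step(String name, Work work) {
        return new Step() {
            public String name() { return name; }
            public void execute() throws Exception { work.run(); }
        };
    }

    // Runs the steps in order and stops at the first failure.
    // Each step's outcome is appended to the log.
    static Status run(List<Step> steps, List<String> log) {
        for (Step s : steps) {
            try {
                s.execute();
                log.add(s.name() + ": COMPLETED");
            } catch (Exception e) {
                log.add(s.name() + ": FAILED");
                return Status.FAILED;
            }
        }
        return Status.COMPLETED;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Status status = run(Arrays.asList(
                step("loadUsers", () -> { }),
                step("buildReport", () -> { throw new IllegalStateException("disk full"); }),
                step("sendEmail", () -> { })), log);
        System.out.println(status + " " + log);
    }
}
```

In this run the third step never executes, because the second one failed. Spring Batch additionally records each outcome in the JobRepository, which this sketch replaces with a simple list.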

ItemReader

The ItemReader is responsible for reading data. It can read from a file, database, or any other source. After reading the data, it passes it to the next component for processing.

ItemProcessor

This component processes or transforms the data. For instance, you might take raw data, validate it, or convert it to a different format.
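
A processor can also drop items: in Spring Batch, returning null from an ItemProcessor means the item is filtered out and never reaches the writer. The sketch below shows validation plus conversion in plain Java. The Processor interface here is a hypothetical stand-in for the real ItemProcessor, and ProcessorSketch, EMAIL_NORMALIZER, and processAll are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified illustration of validating and transforming items.
// Hypothetical names throughout; this is not the Spring Batch API.
class ProcessorSketch {

    // Stand-in for the processor contract: one item in, one item out,
    // or null to signal that the item should be skipped.
    interface Processor<I, O> {
        O process(I item);
    }

    // Rejects values that do not look like an email address and
    // normalizes the rest to trimmed lowercase.
    static final Processor<String, String> EMAIL_NORMALIZER = raw -> {
        if (raw == null || !raw.contains("@")) {
            return null; // invalid input: filter it out
        }
        return raw.trim().toLowerCase(Locale.ROOT);
    };

    // Applies the processor to every item and keeps only non-null results,
    // which mirrors how filtered items never reach the writer.
    static <I, O> List<O> processAll(List<I> items, Processor<I, O> processor) {
        List<O> out = new ArrayList<>();
        for (I item : items) {
            O result = processor.process(item);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" John.Doe@Example.com ", "not-an-email", "JANE.ROE@EXAMPLE.COM");
        System.out.println(processAll(raw, EMAIL_NORMALIZER));
    }
}
```

Of the three inputs, only the two that contain an "@" survive, both converted to lowercase.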

ItemWriter

The ItemWriter writes the processed data to a destination, such as a database or a file.

JobRepository

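The JobRepository persists metadata about job and step executions, such as their status, start and end times, and parameters. Spring Batch uses this information to track progress and to restart failed jobs.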

Setting Up Spring Batch in Your Project

To set up Spring Batch, you need to include the necessary dependencies in your pom.xml or build.gradle.

For Maven

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>

For Gradle

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    runtimeOnly 'com.h2database:h2'
}

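The spring-boot-starter-batch dependency brings in all the necessary components for Spring Batch, and H2 provides the in-memory database that stores the job status and, in this example, the imported users. The example below writes through plain JDBC, so the JPA starter is only needed if you also want to use JPA entities or repositories.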

Building a Batch Job: Coding Example

Let’s create a simple Spring Batch job that reads a list of users from a CSV file, processes each record, and stores the results in an in-memory database.

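Step 1: Define the Model

public class User {
    private String id;
    private String name;
    private String email;

    // A no-argument constructor plus a getter and setter for each field go here.
    // The reader populates the bean through its setters, and the writer reads
    // the :id, :name, and :email parameters through its getters.
}

Step 2: Configure the Batch Job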

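We’ll start by configuring the job in a BatchConfiguration class. The code below uses JobBuilderFactory and StepBuilderFactory, which belong to Spring Batch 4.x (Spring Boot 2.x). Spring Batch 5 deprecates both factories in favor of JobBuilder and StepBuilder, so adjust accordingly if you are on Spring Boot 3.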

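import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // Reads users.csv from the classpath and maps each line to a User
    @Bean
    public FlatFileItemReader<User> reader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userItemReader")
                .resource(new ClassPathResource("users.csv"))
                .linesToSkip(1) // skip the "id,name,email" header line
                .delimited()
                .names(new String[]{"id", "name", "email"})
                .targetType(User.class) // map the named fields onto User's setters
                .build();
    }

    // Transforms each User before it is written
    @Bean
    public ItemProcessor<User, User> processor() {
        return user -> {
            // You can add data processing logic here
            user.setEmail(user.getEmail().toLowerCase()); // Example: convert email to lowercase
            return user;
        };
    }

    // Inserts each chunk of users into the users table as a JDBC batch
    @Bean
    public JdbcBatchItemWriter<User> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<User>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO users (id, name, email) VALUES (:id, :name, :email)")
                .dataSource(dataSource)
                .build();
    }

    // The job consists of a single step and notifies the listener when it ends
    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1)
                .end()
                .build();
    }

    // Reads, processes, and writes users in chunks of 10
    @Bean
    public Step step1(JdbcBatchItemWriter<User> writer) {
        return stepBuilderFactory.get("step1")
                .<User, User>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer)
                .build();
    }
}

Step 3: Listener for Job Completion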

A listener is useful to trigger events like notifications or logging once the job completes.

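import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    // Called once after the job finishes, whatever its outcome
    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("Job completed successfully!");
        }
    }
}

Step 4: Create Database Schema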

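Spring Batch stores job execution data in its own metadata tables, which Spring Boot creates automatically when an embedded database such as H2 is used. The job also needs a destination table for the imported users. One way to create it is to put the following statement in a schema.sql file under src/main/resources, which Spring Boot runs at startup for embedded databases: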

CREATE TABLE users (
    id VARCHAR(255) NOT NULL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

Step 5: Create the CSV Input File (users.csv)

Create a CSV file (users.csv) under the src/main/resources folder with the following content:

id,name,email
1,John Doe,john.doe@example.com
2,Jane Roe,jane.roe@example.com
3,Richard Roe,richard.roe@example.com

Step 6: Running the Job

Once everything is configured, run the Spring Boot application. By default, Spring Boot launches the configured job at startup: it reads the records from the CSV file, processes them, and writes them to the users table.
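
To make the expected result concrete, here is a plain-Java sketch that mimics what the job does with the sample file. It does not use Spring Batch at all. The ImportUserJobSketch class and its importUsers method are hypothetical names, and the sketch assumes that the header line is skipped and that no field contains a quoted comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java imitation of the import job, for illustration only.
// Hypothetical names throughout; this is not the Spring Batch API.
class ImportUserJobSketch {

    // Minimal immutable stand-in for a row in the users table.
    static final class User {
        final String id;
        final String name;
        final String email;

        User(String id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }

        @Override
        public String toString() {
            return id + "|" + name + "|" + email;
        }
    }

    // Skips the header, splits each remaining line on commas (the "read"),
    // lowercases the email (the "process"), and collects the rows (the "write").
    static List<User> importUsers(List<String> csvLines) {
        List<User> rows = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) { // index 0 is assumed to be the header
            String[] fields = csvLines.get(i).split(",");
            rows.add(new User(fields[0], fields[1], fields[2].toLowerCase(Locale.ROOT)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
                "id,name,email",
                "1,John Doe,john.doe@example.com",
                "2,Jane Roe,jane.roe@example.com",
                "3,Richard Roe,richard.roe@example.com");
        for (User user : importUsers(csv)) {
            System.out.println(user);
        }
    }
}
```

The real job differs in the parts that matter in production: it writes in transactional chunks, records its progress in the JobRepository, and can be restarted after a failure.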

Conclusion

Spring Batch is a powerful framework for batch processing, offering great flexibility and performance. In this tutorial, we learned how to create a simple Spring Batch job with the core components: ItemReader, ItemProcessor, and ItemWriter. Whether you need to process a large dataset or execute complex batch workflows, Spring Batch is a reliable choice for enterprise-level applications.

Copyright © 2024 Learn Spring Boot Online