As requested in the Spring Batch issue reporting guidelines, this project provides an MCVE for a potential issue with the multi-threaded JSON item writer (Spring Batch issue #4708).
Sources are initialized from spring-batch-mcve.zip.
Spring Batch offers the possibility to configure a chunk-oriented step as multi-threaded by setting a TaskExecutor in the step configuration.
When the step is multi-threaded, the produced JSON is sometimes not correctly formatted.
Detected with Spring Batch 5.1.2 and Spring Framework 6.1.13.
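For reference, a minimal sketch of such a configuration with the Spring Batch 5 builders (class, bean and MyRecord type names are illustrative, not taken from this project):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class MultiThreadedStepSketch {

    @Bean
    public Step myStep(JobRepository jobRepository,
                       PlatformTransactionManager transactionManager,
                       ItemReader<MyRecord> reader,
                       ItemWriter<MyRecord> writer) {
        return new StepBuilder("myStep", jobRepository)
                .<MyRecord, MyRecord>chunk(2, transactionManager) // small chunk size, as in this MCVE
                .reader(reader)
                .writer(writer)
                // setting a TaskExecutor turns the step into a multi-threaded step
                .taskExecutor(new SimpleAsyncTaskExecutor("worker-"))
                .build();
    }
}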
Sometimes the produced JSON is malformed:
[
,
{"code":10002,"ref":"B2C3D4E5F6G7","type":11,"nature":6,"etat":2,"ref2":"B2C3D4E5F6G7"} {"code":10001,"ref":"A1B2C3D4E5F6","type":10,"nature":5,"etat":1,"ref2":"A1B2C3D4E5F6"},
{"code":10003,"ref":"C3D4E5F6G7H8","type":12,"nature":7,"etat":3,"ref2":"C3D4E5F6G7H8"},
{"code":10004,"ref":"D4E5F6G7H8I9","type":13,"nature":8,"etat":4,"ref2":"D4E5F6G7H8I9"}
]
When well formatted, the expected JSON is:
[
{"code":10001,"ref":"A1B2C3D4E5F6","type":10,"nature":5,"etat":1,"ref2":"A1B2C3D4E5F6"},
{"code":10002,"ref":"B2C3D4E5F6G7","type":11,"nature":6,"etat":2,"ref2":"B2C3D4E5F6G7"},
{"code":10003,"ref":"C3D4E5F6G7H8","type":12,"nature":7,"etat":3,"ref2":"C3D4E5F6G7H8"},
{"code":10004,"ref":"D4E5F6G7H8I9","type":13,"nature":8,"etat":4,"ref2":"D4E5F6G7H8I9"}
]
When the file is not well formatted, the error is always the same:
- a single comma appears on the first line
- the second line contains two JSON records without a separator
As commented in the Spring Batch issue, setting the transactional flag to false forces the buffer to be flushed to the file independently of the transaction, and it works! 👍 (source)
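A sketch of that workaround (illustrative names; the transactional flag is defined on AbstractFileItemWriter and exposed by the builder):

import org.springframework.batch.item.json.JacksonJsonObjectMarshaller;
import org.springframework.batch.item.json.JsonFileItemWriter;
import org.springframework.batch.item.json.builder.JsonFileItemWriterBuilder;
import org.springframework.core.io.FileSystemResource;

JsonFileItemWriter<MyRecord> writer = new JsonFileItemWriterBuilder<MyRecord>()
        .name("myJsonWriter")
        .resource(new FileSystemResource("mydata.json"))
        .jsonObjectMarshaller(new JacksonJsonObjectMarshaller<>())
        // write through to the file instead of buffering until the commit
        .transactional(false)
        .build();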
With a fairly large chunk size (>100), the issue was not met, or perhaps simply not noticed because the file format was not checked systematically.
For the other tests, the chunk size has been reduced to 2, and the issue started to appear (or to be detected).
The class SynchronizedItemStreamWriter manages multi-threaded access with a Lock.
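Such a writer is typically built as follows (a sketch; the delegate variable is illustrative):

import org.springframework.batch.item.support.SynchronizedItemStreamWriter;
import org.springframework.batch.item.support.builder.SynchronizedItemStreamWriterBuilder;

// each call to write() on the wrapper is guarded by the internal Lock
SynchronizedItemStreamWriter<MyRecord> synchronizedWriter =
        new SynchronizedItemStreamWriterBuilder<MyRecord>()
                .delegate(jsonFileItemWriter)
                .build();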
Adding logs shows that data seems to be flushed (meaning written to the real file) by the TransactionAwareBufferedWriter, but it is actually written only right before the commit of the transaction.
Digging into the code:
- the JsonFileItemWriter inherits from AbstractFileItemWriter
- AbstractFileItemWriter.write() calls an internal OutputState.write() to write data (check the code)
- the OutputState.write() method writes and flushes data to the writer (check the code)
- the writer is a TransactionAwareBufferedWriter (check the code)
- this TransactionAwareBufferedWriter is the one actually writing data into the file, but its flush() method does not write any data (check the code)
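As a simplified sketch of that buffering behavior (not the actual Spring Batch source, just its shape):

// Simplified sketch (NOT the real Spring Batch code) of how
// TransactionAwareBufferedWriter behaves inside a transaction: write()
// appends to a transaction-bound buffer, flush() is effectively a no-op,
// and the buffer only reaches the file in a synchronization callback that
// runs just before the transaction commits.
class TransactionAwareBufferedWriterSketch {

    private final StringBuilder transactionBuffer = new StringBuilder();

    void write(String str) {
        transactionBuffer.append(str); // buffered: nothing reaches the file yet
    }

    void flush() {
        // no-op while the transaction is active
    }

    // registered as a TransactionSynchronization, invoked right before commit
    void beforeCommit() {
        writeToFileChannel(transactionBuffer.toString());
        transactionBuffer.setLength(0);
    }

    void writeToFileChannel(String data) {
        // the actual file I/O happens here
    }
}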
Going even further, the issue seems to appear between the release of the lock by SynchronizedItemStreamWriter and the acquisition of the semaphore by TaskletStep to update the stream (and actually write the data to the real file before committing the transaction).
In a multi-threaded step, the following seems to happen:
- thread T1 begins to read data and produces formatted JSON without a leading JSON delimiter: {"code":10001}, {"code":10002}
- thread T2 reads data and produces formatted JSON with a leading JSON delimiter: , {"code":10003}, {"code":10004}
- the issue happens here:
  - both threads release their locks
  - the TaskletStep semaphore can be acquired by T1 or T2
  - if T1 gets the semaphore first, all is right: {"code":10001}, {"code":10002}, {"code":10003}, {"code":10004}
  - if T2 gets the semaphore first, the second chunk is written first and the JSON ends up wrongly formatted: , {"code":10003}, {"code":10004} {"code":10001}, {"code":10002}
Digging into the code, the multiple threads seem to be created by TaskExecutorRepeatTemplate.getNextResult().
The idea of the fix is to create a first thread to manage the first chunk and to wait for the completion of that thread; once it has terminated, the other threads are created as usual. The specific code is in (#9471d7c).
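As a sketch of the latch idea (hypothetical and heavily simplified; the real change lives in the patched TaskExecutorRepeatTemplate enabled by -Pfix_latch):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FirstChunkLatchSketch {

    private final CountDownLatch firstChunkDone = new CountDownLatch(1);

    void run(Runnable chunkWork, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // 1. submit only the first chunk and release the latch when it completes
        pool.execute(() -> {
            try {
                chunkWork.run();
            } finally {
                firstChunkDone.countDown();
            }
        });
        firstChunkDone.await(); // 2. wait for the first chunk (it writes without a leading delimiter)
        // 3. from here on, the remaining chunks are processed concurrently as usual
        for (int i = 1; i < workers; i++) {
            pool.execute(chunkWork);
        }
        pool.shutdown();
    }
}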
# Build the jar with the profile that embeds the fix
mvn clean package -Pfix_latch
# Run the tests
./run.sh
The file MyBatchJobConfiguration.java defines all the beans to build a Spring Batch job composed of a single chunk-oriented step.
The step:
- reads data from the H2 DB
- processes data (simply writes it to the console)
- writes data to the file mydata_YYYYMMDD_hhmmssSSS.json
Multi-threading is configured using the TaskExecutor and the deprecated throttleLimit(2) method (see the sketch below).
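The relevant builder calls would look like this, reusing the names from the earlier sketch (illustrative; throttleLimit(int) is deprecated since Spring Batch 5 in favour of a bounded TaskExecutor):

Step step = new StepBuilder("myStep", jobRepository)
        .<MyRecord, MyRecord>chunk(2, transactionManager)
        .reader(reader)
        .processor(processor) // simply logs each item to the console
        .writer(writer)
        .taskExecutor(taskExecutor)
        // deprecated: caps the number of concurrent worker threads at 2
        .throttleLimit(2)
        .build();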
The pom.xml provides profiles to embed optional sources:
- profile logs adds some updated Spring Batch sources with additional logging (sources)
- profile fix_latch adds some updated Spring Batch sources with the logging and the fix on TaskExecutorRepeatTemplate with a CountDownLatch (sources)
The folder (./src/main/java_spring_batch_5.1.2/) contains the original Spring Batch sources (version 5.1.2).
The following command works:
mvn package exec:java -Dexec.mainClass=org.springframework.batch.MyBatchJobConfiguration
but as the error occurs randomly, it is better to run a test loop until the error is encountered.
The script run.sh runs the jar in a loop of 100 iterations.
# Build the jar
# One optional profile can be added:
# -Plogs
# -Pfix_latch
mvn clean package # [-Plogs|-Pfix_latch]
# Run the tests
./run.sh
Example of logs:
================================ loop 1
COMPLETED
Renaming output.log to mydata_20241115_173300344.json.log
================================ loop 2
COMPLETED
2:,
Error: Found empty line in the json file mydata_20241115_173302605.json
Renaming output.log to mydata_20241115_173302605.json____ko.log
When the export is:
- OK: output.log is simply renamed with the JSON filename
- KO: output.log is renamed with the JSON filename suffixed by ____ko
When run with the fix, 100 well-formatted JSON files are produced.