Jackson data format module for reading and writing CSV encoded data, either as "raw" data (sequence of String arrays), or via data binding to/from Java Objects (POJOs).
The project is licensed under the Apache License 2.0.
Since version 2.3 this module is considered complete and production ready. All Jackson layers (streaming, databind, tree model) are supported.
To use this extension on Maven-based projects, use the following dependency:
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>2.13.0</version>
</dependency>
(replace the version with whatever the latest version is)
- Reading CSV with Jackson (April 2021)
- Writing CSV with Jackson (October 2021)
CSV documents are essentially rows of data, instead of JSON Objects (sequences of key/value pairs).
So one potential way to expose this data is as a sequence of JSON arrays, and similarly to allow writing of arrays. Jackson supports this use case (it is what you get if you do not pass a "CSV schema"), but it is not a very convenient way.
The alternative (and most commonly used) approach is to use a "CSV schema": an object that defines a set of names (and optionally types) for columns. This allows CsvParser to expose CSV data as if it was a sequence of JSON objects (name/value pairs).
So how do you get a CSV Schema instance to use? There are 3 ways:
- Create schema based on a Java class
- Build schema manually
- Use the first line of CSV document to get the names (no types) for Schema
Here is code for the above cases:
CsvMapper mapper = new CsvMapper();
// Schema from POJO (usually has @JsonPropertyOrder annotation)
CsvSchema pojoSchema = mapper.schemaFor(Pojo.class);
// Manually-built schema: one column with an explicit type, others default to "STRING"
CsvSchema manualSchema = CsvSchema.builder()
    .addColumn("firstName")
    .addColumn("lastName")
    .addColumn("age", CsvSchema.ColumnType.NUMBER)
    .build();
// Read schema from the first line; start with a bootstrap instance
// that enables reading of the schema from the first line
// NOTE: reads schema and uses it for binding
CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
Pojo value = mapper.readerFor(Pojo.class).with(bootstrapSchema).readValue(csv);
It is important to note that the schema object is needed to ensure correct ordering of columns; schema instances are immutable and fully reusable (as are ObjectWriter instances).
Note also that while an explicit type can help efficiency, it is usually not required, as Jackson data binding can do common conversions/coercions such as parsing numbers from Strings.
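As a minimal sketch of both points (schema reuse and String-to-number coercion), the following builds one schema with no column types, wraps it in a single ObjectReader, and reuses that reader for two inputs while binding the "age" column into an int field. It assumes jackson-databind and jackson-dataformat-csv 2.13 on the classpath; the class names SchemaReuseExample and Person are hypothetical.

```java
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class SchemaReuseExample {
    // Hypothetical POJO: 'age' is an int, although the schema below
    // leaves every column at the default (STRING) type
    public static class Person {
        public String name;
        public int age;
    }

    public static Person[] readTwoRows() throws Exception {
        CsvMapper mapper = new CsvMapper();
        // No explicit column types: data binding coerces "28" into an int
        CsvSchema schema = CsvSchema.builder()
                .addColumn("name")
                .addColumn("age")
                .build();
        // The schema and the ObjectReader are immutable, so one reader
        // can be built once and reused for any number of inputs
        ObjectReader reader = mapper.readerFor(Person.class).with(schema);
        Person first = reader.readValue("Billy,28\n");
        Person second = reader.readValue("Barbara,36\n");
        return new Person[] {first, second};
    }

    public static void main(String[] args) throws Exception {
        for (Person p : readTwoRows()) {
            System.out.println(p.name + " is " + p.age);
        }
    }
}
```

Declaring the column as CsvSchema.ColumnType.NUMBER instead would be the "explicit type" variant; the binding result should be the same.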
CSV content can be read either by using CsvFactory (and the parsers and generators it creates) directly, or through CsvMapper (an extension of the standard ObjectMapper).
When using CsvMapper, you will be creating ObjectReader or ObjectWriter instances that pass CsvSchema along to CsvParser / CsvGenerator.
When creating a parser/generator directly, you will need to explicitly call setSchema(schema) before starting to read/write content.
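A minimal streaming sketch of the direct approach, assuming jackson-core, jackson-databind and jackson-dataformat-csv 2.13 on the classpath: it creates a parser from a CsvFactory, calls setSchema() before the first nextToken(), and walks the token stream, in which each row appears as an object of column-name/value pairs. The class name StreamingParserExample is hypothetical.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.dataformat.csv.CsvFactory;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.ArrayList;
import java.util.List;

public class StreamingParserExample {
    // Collects "column=value" strings by walking the token stream
    public static List<String> columnValues(String csv) throws Exception {
        CsvSchema schema = CsvSchema.builder()
                .addColumn("name")
                .addColumn("age")
                .build();
        List<String> result = new ArrayList<>();
        CsvFactory factory = new CsvFactory();
        try (JsonParser parser = factory.createParser(csv)) {
            // The schema must be set before reading the first token
            parser.setSchema(schema);
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                // Each row: START_OBJECT, FIELD_NAME/value pairs, END_OBJECT
                if (token == JsonToken.FIELD_NAME) {
                    String column = parser.getCurrentName();
                    parser.nextToken(); // advance to the column value
                    result.add(column + "=" + parser.getText());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(columnValues("Billy,28\nBarbara,36\n"));
    }
}
```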
The most common way to write and read CSV data, then, is:
CsvMapper mapper = new CsvMapper();
Pojo value = ...;
CsvSchema schema = mapper.schemaFor(Pojo.class); // schema from 'Pojo' definition
String csv = mapper.writer(schema).writeValueAsString(value);
MappingIterator<Pojo> it = mapper.readerFor(Pojo.class).with(schema)
    .readValues(csv);
// Either read them all one by one (streaming)
while (it.hasNextValue()) {
    Pojo row = it.nextValue();
    // ... do something with the row
}
// or, alternatively (instead of the loop), read all in one go
List<Pojo> all = it.readAll();
But even if you do not know (or care) about column names, you can still read/write CSV documents. The main difference is that in this case data is exposed as a sequence of ("JSON") Arrays, not Objects, as "raw" tabular data.
So let's consider the following CSV input:
a,b
c,d
e,f
By default, Jackson CsvParser would see it as equivalent to the following JSON:
["a","b"]
["c","d"]
["e","f"]
This is easy to use; in fact, if you ignore everything to do with Schema from the above examples, you get working code. For example:
CsvMapper mapper = new CsvMapper();
// important: we need "array wrapping" (see next section) here:
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
File csvFile = new File("input.csv"); // or from String, URL etc
MappingIterator<String[]> it = mapper.readerFor(String[].class).readValues(csvFile);
while (it.hasNext()) {
String[] row = it.next();
// and voila, column values in an array. Works with Lists as well
}
But if you want a "data as Map" approach, where the first row contains the expected column names and the following rows contain data, you can iterate over entries quite conveniently as well. Assuming we had CSV content like:
name,age
Billy,28
Barbara,36
we could use the following code:
File csvFile = new File(fileName);
CsvMapper mapper = new CsvMapper();
CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
MappingIterator<Map<String,String>> it = mapper.readerFor(Map.class)
.with(schema)
.readValues(csvFile);
while (it.hasNext()) {
Map<String,String> rowAsMap = it.next();
// access by column name, as defined in the header row...
}
and get two rows as java.util.Maps, similar to what the following JSON would produce:
{"name":"Billy","age":"28"}
{"name":"Barbara","age":"36"}
Additionally, to generate a schema for writing a Map<String,String>, we could do the following:
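// 'map' is the Map<String,String> to write
CsvSchema.Builder builder = CsvSchema.builder();
for (String column : map.keySet()) {
    builder.addColumn(column, CsvSchema.ColumnType.STRING);
}
ObjectWriter writer = new CsvMapper().writerFor(Map.class).with(builder.build());
String csv = writer.writeValueAsString(map);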
In addition to reading things as root-level Objects or arrays, you can also force use of virtual "array wrapping".
This means that, using the earlier CSV data example, the parser would instead expose it similar to the following JSON:
[
["a","b"],
["c","d"],
["e","f"]
]
This is useful if the consuming functionality expects a single ("JSON") Array; this was the case, for example, when using ObjectReader.readValues() functionality.
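A minimal sketch of array wrapping, assuming jackson-databind and jackson-dataformat-csv 2.13 on the classpath: with CsvParser.Feature.WRAP_AS_ARRAY enabled and no schema, the whole document should bind to a single String[][], one inner array per row. The class name ArrayWrappingExample is hypothetical.

```java
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import java.util.Arrays;

public class ArrayWrappingExample {
    public static String[][] readAsMatrix(String csv) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Expose the whole document as one outer array of row arrays
        mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
        return mapper.readerFor(String[][].class).readValue(csv);
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : readAsMatrix("a,b\nc,d\ne,f\n")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```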
Besides defining how CSV columns are mapped to and from Java Object properties, CsvSchema also defines low-level encoding details. These details can be changed by using various withXxx() and withoutXxx() methods (or through the associated CsvSchema.Builder object); for example:
CsvSchema schema = mapper.schemaFor(Pojo.class);
// let's do pipe-delimited, not comma-delimited
schema = schema.withColumnSeparator('|')
// and write Java nulls as "NULL" (instead of empty string)
.withNullValue("NULL")
// and let's NOT allow escaping with backslash ('\')
.withoutEscapeChar()
;
ObjectReader r = mapper.readerFor(Pojo.class).with(schema);
Pojo value = r.readValue(csvInput);
For a full description of all configuration options, please see CsvSchema.
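The same withXxx() methods also apply when writing. A minimal sketch, assuming jackson-databind and jackson-dataformat-csv 2.13 on the classpath: it derives a schema from a POJO, switches to a pipe separator, writes Java nulls as "NULL", and adds a header row. The class names CustomEncodingExample and Entry are hypothetical.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CustomEncodingExample {
    // Hypothetical POJO; @JsonPropertyOrder fixes the column order
    @JsonPropertyOrder({"name", "city"})
    public static class Entry {
        public String name;
        public String city;

        public Entry(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    public static String write(Entry entry) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(Entry.class)
                .withColumnSeparator('|') // pipe-delimited instead of comma
                .withNullValue("NULL")    // Java null written as NULL
                .withHeader();            // emit column names as first row
        return mapper.writer(schema).writeValueAsString(entry);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(write(new Entry("Billy", null)));
    }
}
```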
- Wiki (includes javadocs)
- How-to
- Framework-specific
- Read and Write CSV Data with jackson-dataformat-csv (JBeret)
- Performance
Since CSV is a very loose "standard", there are many extensions to basic functionality. Jackson supports the following extensions or variations:
- Customizable delimiters (through CsvSchema)
  - Default separator is comma (,), but any other character can be specified as well
  - Default text quoting is done using double-quote ("), and may be changed
  - It is possible to enable use of an "escape character" (by default, not enabled): some variations use \ for escaping. If enabled, the character immediately following it will be used as-is, except for a small set of "well-known" escapes (\n, \r, \t, \0)
  - Linefeed character: when generating content, the default linefeed String used is "\n", but this may be changed
- Null value: by default, null values are serialized as empty Strings (""), but any other String value can be configured to be used instead (for example, "null", "N/A" etc)
- Use of first row as a set of column names: as explained earlier, it is possible to configure CsvSchema to indicate that the contents of the first (non-comment) document row are taken to mean the set of column names to use
- Comments: when enabled (via CsvSchema, or by enabling CsvParser.Feature.ALLOW_COMMENTS), if a row starts with a # character, it will be considered a comment and skipped
- Blank lines: when enabled (using CsvParser.Feature.SKIP_BLANK_LINES), rows that are empty or composed only of whitespace are skipped
- Due to the tabular nature of the CSV format, deeply nested data structures are not well supported.
  - You can use @JsonUnwrapped to get around this
- Use of Tree Model (JsonNode) is supported, but only within the limitations of the CSV format.
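Comment handling, as described in the list above, can be sketched as follows, assuming jackson-databind and jackson-dataformat-csv 2.13 on the classpath. The sketch enables CsvParser.Feature.ALLOW_COMMENTS (the feature named above), takes the column names from the first non-comment row, and reads each remaining row as a Map. The class name CommentsExample is hypothetical, and blank-line skipping is deliberately left out.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.util.List;
import java.util.Map;

public class CommentsExample {
    public static List<Map<String, String>> read(String csv) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Rows starting with '#' are treated as comments and skipped
        mapper.enable(CsvParser.Feature.ALLOW_COMMENTS);
        // First non-comment row supplies the column names
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class)
                .with(schema)
                .readValues(csv);
        return it.readAll();
    }

    public static void main(String[] args) throws Exception {
        String csv = "# exported data\nname,age\nBilly,28\n# note\nBarbara,36\n";
        for (Map<String, String> row : read(csv)) {
            System.out.println(row);
        }
    }
}
```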
Areas that are planned to be improved include things like:
- Optimizations to make number handling as efficient as for JSON (but note: even with existing code, performance is typically limited by I/O and NOT parsing or generation)
  - Although, as per the Java CSV parser comparison, this module already performs quite well
- Mapping of nested POJOs using dotted notation (similar to @JsonUnwrapped, but without requiring its use; note that @JsonUnwrapped is already supported)