Add explanation for parquet compressor to README.md
okkez committed Apr 7, 2021
1 parent e6cdf15 commit 7920ef4
Showing 1 changed file with 46 additions and 0 deletions.
46 changes: 46 additions & 0 deletions README.md
@@ -411,9 +411,55 @@ archive format on S3. You can use several format:
utilizing CPU cores well compared with `gzip`
* parquet (requires the `columnify` command)
  * This compressor uses an external [columnify](https://github.com/reproio/columnify) command.
  * Use the `<compress>` section to configure the behavior of the columnify command.

See the `Use your compression algorithm` section to add another format.

**`<compress>`** (for parquet compressor only)

**parquet_compression_codec**

Parquet compression codec. One of:

* uncompressed
* snappy (default)
* gzip
* lzo
* brotli
* lz4
* zstd

**parquet_page_size**

Parquet file page size. Default: 8192 bytes.

**parquet_row_group_size**

Parquet file row group size. Default: 128 MB.

**record_type**

Data format of the input records. One of:

* avro
* csv
* jsonl
* msgpack (default)
* tsv
* json

**schema_type**

Type of the schema given in `schema_file`. One of:

* avro (default)
* bigquery

**schema_file (required)**

Path to the schema file.
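
As a rough sketch, a `<compress>` section for the parquet compressor might look like the following. The match pattern `pattern`, the elided bucket and credential settings, and the schema path `/path/to/schema.avsc` are placeholders, and `store_as parquet` is assumed to be the setting that selects this compressor. Sizes are written as plain byte counts (128 MB = 134217728 bytes), and every value except `schema_file` and `record_type` is the documented default.

```
<match pattern>
  @type s3
  # bucket, region, credentials and buffer settings omitted (placeholders)

  # assumption: the parquet compressor is selected through store_as
  store_as parquet

  <compress>
    parquet_compression_codec snappy    # default
    parquet_page_size 8192              # default, in bytes
    parquet_row_group_size 134217728    # default, 128 MB in bytes
    record_type jsonl                   # default is msgpack
    schema_type avro                    # default
    schema_file /path/to/schema.avsc    # required; hypothetical path
  </compress>
</match>
```

Only `schema_file` has to be set explicitly; the other parameters can be left out to use their defaults.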

**`<format>` or format**

Change one line format in the S3 object. Supported formats are "out_file",