Add parquet compressor using columnify #338
Signed-off-by: Kenji Okimoto <[email protected]>
Looks good to me.
I'd really love to test it, but we use Bundler in our build process to install all plugins, and fluentd unfortunately stopped importing Bundler-installed git sources, so I end up with "Unknown output plugin 's3'" if I add this to our Gemfile.
Can you help with getting this to work without completely working around the bundle install?
Thank you for this, I really appreciate it.
This parquet compressor has worked fine on our system for the past 9 months. Can I merge this PR?
@repeatedly @ganmacs @kenhys @ashie Can you merge this PR and release the new version?
I'll merge & release it after waiting for comments from other maintainers for a few days.
The sample configuration for parquet compressor.
<match>
  @id s3-parquet
  @type s3
  s3_region ap-northeast-1
  s3_bucket xxx
  <compress>
    parquet_compression_codec snappy
    record_type msgpack
    schema_type avro
    schema_file /path/to/log.avsc
  </compress>
  <format>
    @type msgpack
  </format>
</match>
@okkez store_as parquet is also required, isn't it?
Yes. It is required.
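For reference, here is a sketch of the sample configuration with store_as parquet added, as confirmed in the exchange above. The `**` tag pattern is a placeholder that is not part of the original sample, and the bucket name and schema path are the same placeholders used in the thread, so adjust all three for a real setup.

```
# Sketch only: the sample from this thread plus store_as parquet.
# The "**" tag pattern is an assumed placeholder, not from the original sample.
<match **>
  @id s3-parquet
  @type s3
  s3_region ap-northeast-1
  s3_bucket xxx
  # Confirmed above as required for the parquet compressor.
  store_as parquet
  <compress>
    parquet_compression_codec snappy
    record_type msgpack
    schema_type avro
    schema_file /path/to/log.avsc
  </compress>
  <format>
    @type msgpack
  </format>
</match>
```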
We'll refine the document at #373
Thank you for your work!
The compressor doesn't support the list type in the schema file. Please add support for the list input type.
How to use:
Install columnify.
The sample configuration for the parquet compressor.
log.avsc looks like the following:
See https://avro.apache.org/docs/current/spec.html for more details about avro schema.
Notice:
columnify's memory usage is proportional to file size. For example, columnify consumes about 750MB of memory (RSS) while processing a 128MB msgpack file.
See also #221