Optimize Avro Streaming with zstd Compression for Snowflake #527

Merged
merged 3 commits into from
Oct 18, 2023

iskakaushik
Contributor

  • Introduced Zstandard (zstd) compression while streaming source rows, reducing post-processing time.
  • Generated unique partitionID with a random 16-character string.
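
A minimal sketch of how a random 16-character partition ID could be generated. The helper name `randomPartitionID` and the lowercase alphanumeric alphabet are assumptions made for illustration; the PR description does not say which character set or random source the actual change uses.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// idAlphabet is an assumed character set; the PR does not specify one.
const idAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// randomPartitionID is a hypothetical helper that returns n random
// characters drawn from idAlphabet using the OS entropy source.
func randomPartitionID(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		// The modulo introduces a slight bias toward the first few
		// characters, which is acceptable for an identifier that only
		// needs to be unlikely to collide.
		buf[i] = idAlphabet[int(b)%len(idAlphabet)]
	}
	return string(buf), nil
}

func main() {
	id, err := randomPartitionID(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```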

Compressing the data on the fly lets the upload bypass Snowflake's automatic gzip compression and makes the data transfer more efficient. zstd is also faster than Snowflake's default gzip.

See `AUTO_COMPRESS` and `SOURCE_COMPRESSION` in the Snowflake `PUT` documentation for more details: https://docs.snowflake.com/en/sql-reference/sql/put
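
For illustration, a `PUT` statement that uploads an already zstd-compressed file might be built like this. `AUTO_COMPRESS` and `SOURCE_COMPRESSION` are the documented `PUT` options; the helper name `buildPutCommand`, the file path, and the stage name are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// buildPutCommand is a hypothetical helper that formats a Snowflake PUT
// statement for a file that was compressed with zstd before upload.
func buildPutCommand(localPath, stage string) string {
	// AUTO_COMPRESS=FALSE tells Snowflake not to gzip the file again;
	// SOURCE_COMPRESSION=ZSTD declares how the file is already compressed.
	return fmt.Sprintf(
		"PUT file://%s @%s AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=ZSTD",
		localPath, stage,
	)
}

func main() {
	// Prints: PUT file:///tmp/rows.avro.zst @my_stage AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=ZSTD
	fmt.Println(buildPutCommand("/tmp/rows.avro.zst", "my_stage"))
}
```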

iskakaushik and others added 2 commits October 18, 2023 07:50
@iskakaushik iskakaushik force-pushed the write-zst-compressed-avro branch from 1470913 to 8bd54ff Compare October 18, 2023 11:50
@iskakaushik iskakaushik merged commit e6003de into main Oct 18, 2023
12 checks passed
@serprex serprex deleted the write-zst-compressed-avro branch July 19, 2024 15:19