Add options for different compression levels for parquet files #153

Draft: wants to merge 3 commits into base: main

tomjholland
Collaborator

This pull request adds a compression_priority option to the process_cycler_file and process_generic_file methods and updates the corresponding documentation.

New Functionality:

  • Compression Priority in process_cycler_file and process_generic_file:
    • Added a compression_priority parameter that lets users choose how parquet files are compressed. Options are "performance" (default), "file size", and "uncompressed". (pyprobe/cell.py)
    • Updated the _write_parquet method to accept a compression argument and use it when writing parquet files. (pyprobe/cell.py)
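
To illustrate how a priority string could be translated into a codec before writing, here is a minimal sketch. It is not the code from this PR: the mapping of "performance" to lz4 and "file size" to zstd is an assumption, as are the helper names `resolve_compression` and `write_parquet`. The only external behaviour relied on is a polars-style `DataFrame.write_parquet(path, compression=...)` call.

```python
from pathlib import Path
from typing import Any

# Hypothetical mapping from priority to parquet codec. The PR description
# does not say which codecs are actually used, so these are assumptions.
_COMPRESSION_BY_PRIORITY = {
    "performance": "lz4",
    "file size": "zstd",
    "uncompressed": "uncompressed",
}


def resolve_compression(compression_priority: str = "performance") -> str:
    """Translate a compression priority into a parquet codec name."""
    try:
        return _COMPRESSION_BY_PRIORITY[compression_priority]
    except KeyError:
        valid = ", ".join(repr(key) for key in _COMPRESSION_BY_PRIORITY)
        raise ValueError(
            f"Unknown compression_priority {compression_priority!r}; "
            f"expected one of {valid}."
        ) from None


def write_parquet(
    dataframe: Any,
    output_path: Path,
    compression_priority: str = "performance",
) -> None:
    """Write a dataframe to parquet with the codec implied by the priority.

    Assumes `dataframe` exposes a polars-style
    `write_parquet(path, compression=...)` method.
    """
    codec = resolve_compression(compression_priority)
    dataframe.write_parquet(output_path, compression=codec)
```

Validating the string up front means a typo such as "filesize" fails with a clear message rather than being passed through to the parquet writer.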

Documentation Updates:

  • Example Notebook:
    • Added code cells that demonstrate the compression_priority option and benchmark the different compression options. (docs/source/examples/comparing-pyprobe-performance.ipynb)
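
A benchmark of this kind typically records both the write time and the resulting file size for each option. The sketch below shows one way to measure them; it is not taken from the notebook, and the helper name `benchmark_write` is hypothetical. The writer is passed in as a callable so the harness does not depend on any particular parquet library.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable


def benchmark_write(
    write_fn: Callable[[Path], None],
    filename: str = "out.parquet",
) -> dict:
    """Time one call to write_fn and report the size of the file it wrote."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / filename
        start = time.perf_counter()
        write_fn(target)  # the callable is expected to create `target`
        elapsed = time.perf_counter() - start
        return {"seconds": elapsed, "bytes": target.stat().st_size}
```

With a polars dataframe `df`, a call might look like `benchmark_write(lambda path: df.write_parquet(path, compression="zstd"))`, repeated once per codec. A single timed write is noisy, so averaging several runs gives a fairer comparison.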

@tomjholland added the labels feature (adding a new functionality, small or large) and optimisation (improvements in the performance of the code) on Oct 16, 2024