DNM: Dumb Read Parquet Implementation #373

Open · wants to merge 2 commits into main
Commits on Oct 30, 2023

  1. DNM: Dumb Read Parquet Implementation

    This is a dumb, mostly-from-scratch implementation of read_parquet.
    
    It only supports
    -  local and s3
    -  column selection
    -  grouping partitions when we have fewer columns (+ threads!)
    -  arrow engine/filesystem
    
    It is very broken in many ways, but ...
    
    -  It's only around 100 lines of code
    -  I get 250 MB/s bandwidth on full-column reads on an m6i.xlarge
       (only 50 MB/s when reading a subset of columns, though)
    
    See dask/dask#10602
    mrocklin committed Oct 30, 2023
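The "grouping partitions when we have fewer columns (+ threads!)" idea can be illustrated with a small sketch. This is not the code in this PR. `plan_partitions`, `read_partition`, and the ratio-based grouping rule are hypothetical. The sketch assumes that selecting k of n columns shrinks each file's in-memory size by roughly k/n, so about n/k files can share one output partition, and the files in a partition are read concurrently with a thread pool. The per-file reader is passed in so the sketch runs without pyarrow or S3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def plan_partitions(paths: Sequence[str], n_selected: int, n_total: int) -> List[List[str]]:
    """Group files so each partition holds roughly one full file's worth of data.

    Hypothetical heuristic: reading n_selected of n_total columns makes each
    file ~n_selected/n_total as large, so pack ~n_total // n_selected files together.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("need 0 < n_selected <= n_total")
    per_group = max(1, n_total // n_selected)
    # Consecutive chunks of `per_group` paths; the last chunk may be shorter.
    return [list(paths[i : i + per_group]) for i in range(0, len(paths), per_group)]


def read_partition(
    group: Sequence[str],
    read_file: Callable[[str], T],
    max_workers: int = 8,
) -> List[T]:
    """Read every file in one partition concurrently using threads.

    Threads help because the reads are I/O bound (disk or S3), and much of
    pyarrow's Parquet decoding runs in C++ outside the GIL. In practice
    `read_file` might wrap pyarrow.parquet.read_table(path, columns=..., filesystem=...).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with `group`.
        return list(pool.map(read_file, group))
```

For example, with 10 files and 2 of 8 columns selected, `plan_partitions` packs 4 files per partition, giving partitions of sizes 4, 4, and 2. Concatenating each partition's tables (e.g. with `pyarrow.concat_tables`) would then yield one output partition per group.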
    c110f6e
  2. 5cc0982