@sambenfredj's pull request introduces streaming at several points in the workflow, but the intermediary file formats it relies on are not yet specified or documented. In addition, switching to a binary format such as partitioned pyarrow datasets would speed up IO.
Schemas will be defined here after @wfondrie's switch to polars.
Tasks
document where we need intermediary files
document how the files relate to input files, to each other, and to output files (e.g. how should they be joined?)
specify the columns, their data types, and any indices on columns