Improve delta table memory footprint #2030

Open
sh-rp opened this issue Nov 6, 2024 · 0 comments
sh-rp commented Nov 6, 2024

Currently we write all jobs for one delta table in a single write, using one reference job that references all of them.
There seems to be a problem in the delta Rust implementation (delta-rs): it materializes all tables in memory before writing them to the destination:

delta-io/delta-rs#2968 (comment)

Possible ways to fix this:

  • Create multiple followup jobs per table and control the number of jobs assigned to each via a setting. If this is set, ensure we only process one job per table at a time (loader_parallelism_strategy=table-sequential).
  • Use an engine other than Rust, because the Rust engine seems to have this problem. Merge does not work there at the moment, though.
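
The first option could be sketched roughly as below. This is only an illustration of the batching idea, not existing loader code: `plan_followup_batches` and `max_jobs_per_batch` are hypothetical names, and jobs are assumed to be simple `(table_name, file_path)` pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_followup_batches(
    jobs: Iterable[Tuple[str, str]],
    max_jobs_per_batch: int,
) -> Dict[str, List[List[str]]]:
    """Group (table_name, file_path) pairs into per-table batches.

    Hypothetical helper: each inner list stands for one followup job, so a
    single delta write only has to hold that many files in memory at once.
    """
    if max_jobs_per_batch < 1:
        raise ValueError("max_jobs_per_batch must be at least 1")

    # Collect files per table, preserving the order in which they arrived.
    by_table: Dict[str, List[str]] = defaultdict(list)
    for table_name, file_path in jobs:
        by_table[table_name].append(file_path)

    # Slice each table's files into chunks of at most max_jobs_per_batch.
    return {
        table: [
            files[i : i + max_jobs_per_batch]
            for i in range(0, len(files), max_jobs_per_batch)
        ]
        for table, files in by_table.items()
    }


if __name__ == "__main__":
    example_jobs = [
        ("events", "events.1.parquet"),
        ("events", "events.2.parquet"),
        ("events", "events.3.parquet"),
        ("users", "users.1.parquet"),
    ]
    for table, batches in plan_followup_batches(example_jobs, 2).items():
        # Batches of one table would have to run one after another
        # (table-sequential), since they write to the same delta table.
        for batch in batches:
            print(table, batch)
```

With a table-sequential strategy, batches for the same table would run one after another while different tables could still load in parallel.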
Projects
Status: Planned