feat: Add a spark.comet.exec.memoryPool configuration for experimenting with various datafusion memory pool setups. #1021
Conversation
Codecov Report
Attention: Patch coverage is …

@@             Coverage Diff              @@
##              main    #1021      +/-   ##
============================================
- Coverage     34.30%   34.27%    -0.03%
- Complexity      887      889        +2
============================================
  Files           112      112
  Lines         43429    43502       +73
  Branches       9623     9615        -8
============================================
+ Hits          14897    14912       +15
- Misses        25473    25542       +69
+ Partials       3059     3048       -11
Thanks for filing this @Kontinuation. I will close my PRs. Also, there is a suggestion in #1017 for always using unified memory.
I agree that using the unified memory manager is a better approach. Vanilla Spark operators and Comet operators are governed by the same memory manager, and they all use off-heap memory, so vanilla Spark operators can free some memory when Comet operators are under pressure. I'll also put more work into improving unified memory management. I think the native memory management approach may still be relevant when users don't want vanilla Spark to use off-heap memory. We can set the default value of …
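As a rough illustration of the unified-memory setup discussed above, Spark's off-heap memory would be enabled so that vanilla Spark operators and Comet operators draw from the same pool. This is only a sketch using standard Spark settings; it does not show any Comet-specific configuration from this PR, and the size value is arbitrary.

```scala
import org.apache.spark.SparkConf

// Sketch of the unified off-heap setup discussed above: enable Spark's off-heap
// memory so that vanilla Spark operators and Comet native operators are governed
// by the same unified memory manager. These are standard Spark settings; the
// size is an arbitrary example value.
val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "8g")
```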
Which issue does this PR close?
This PR relates to #996 and #1004.
Rationale for this change
This is for investigating various approaches to simplify memory-related configuration and reduce the memory required to run large queries. @andygrove
What changes are included in this PR?
This PR adds a spark.comet.exec.memoryPool configuration for easily running queries using various memory pool setups (see the usage sketch after the list):

- greedy: each operator has its own GreedyMemoryPool, which is the same as the current situation.
- fair_spill: each operator has its own FairSpillPool.
- greedy_task_shared (default): all operators for the same task attempt share the same GreedyMemoryPool.
- fair_spill_task_shared: all operators for the same task attempt share the same FairSpillPool.
- greedy_global: all operators in the same executor instance share the same GreedyMemoryPool.
- fair_spill_global: all operators in the same executor instance share the same FairSpillPool.
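For illustration, a query can be run under one of these setups by setting the new configuration when the Spark session is created. This is a minimal sketch: only spark.comet.exec.memoryPool and its values come from this PR; everything else is generic Spark setup.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: pick the fair_spill_task_shared setup introduced by this PR.
// Everything except spark.comet.exec.memoryPool is generic Spark boilerplate.
val spark = SparkSession.builder()
  .appName("comet-memory-pool-experiment")
  .config("spark.comet.exec.memoryPool", "fair_spill_task_shared")
  .getOrCreate()

// Run an arbitrary aggregation query under the selected memory pool setup.
spark.range(0, 1000000)
  .selectExpr("id % 10 AS k", "id AS v")
  .groupBy("k")
  .count()
  .show()
```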
How are these changes tested?
TODO: add tests running in native memory management mode.
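A test along the lines of this TODO might look roughly like the following. This is a hypothetical sketch using a plain ScalaTest suite; the real Comet test harness, and whether any extra native-memory settings are required, are assumptions not taken from this PR.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical sketch: run a query with one of the new memory pool setups and
// check that it produces the expected result. The real Comet test infrastructure
// is not shown here.
class MemoryPoolSuite extends AnyFunSuite {
  test("aggregation succeeds with a task-shared fair spill pool") {
    val spark = SparkSession.builder()
      .master("local[2]")
      .config("spark.comet.exec.memoryPool", "fair_spill_task_shared")
      .getOrCreate()
    try {
      val counts = spark.range(0, 100000)
        .selectExpr("id % 10 AS k")
        .groupBy("k")
        .count()
        .collect()
      assert(counts.length == 10)
    } finally {
      spark.stop()
    }
  }
}
```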