forked from apache/datafusion
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove default from some executors #6
Closed
mustafasrepo
wants to merge
54
commits into
feature/properties_caching
from
feature/remove_default_cache
Closed
Remove default from some executors #6
mustafasrepo
wants to merge
54
commits into
feature/properties_caching
from
feature/remove_default_cache
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Initial commit * Minor changes
* WIP lag/lead ignore nulls * Support IGNORE NULLS for LAG function * fmt * comments * remove comments * Add new tests, minor changes, trigger evalaute_all * Make algorithm pruning friendly --------- Co-authored-by: Mustafa Akur <[email protected]>
…ery plan (apache#9295) * fix: issue apache#9213 substitute ArrayAgg to NthValue to optimize query plan * fix format * adding type check * adding test
* chore: statically link xz2 * toml fmt
* fix: issue apache#9327 throw error when incursion happen in dataframe api * fix
* add support for CopyTo::partition_by in proto Signed-off-by: Hoang Pham <[email protected]> * simplify partition_by logic Signed-off-by: Hoang Pham <[email protected]> --------- Signed-off-by: Hoang Pham <[email protected]>
* support udf in substrait Signed-off-by: jayzhan211 <[email protected]> * clippy Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
mustafasrepo
force-pushed
the
feature/remove_default_cache
branch
from
February 26, 2024 12:11
4707939
to
ace9815
Compare
* support FixedSizeList Type Coercion * add allow null type coercion parameter * support null column in FixedSizeList * Add test * Add tests for cardinality with fixed size lists * chore * fix ci * add comment * Fix array_element function signature * Remove unused imports and simplify code * Fix array function signatures and behavior * fix conflict * fix conflict * add tests for FixedSizeList * remove unreacheable null check * simplify the code * remove null checking * reformat output * simplify code * add tests for array_dims * Refactor type coercion functions in datafusion/expr module
…ache#9342) * feat: expand `unnest` to accept any single array expression * unnest null * review feedback
* fix: downgrade tonic for arrow compatibility Tonic 0.10 and 0.11 are not API compatible. Arrow 50 depends on tonic 0.10, and datafusion must match that dependency for compatibility reasons. * feat: make nested examples runnable cargo run --example doesn't support nested examples. Nested examples need an explicit block to be runnable. * fix: fix custom catalog typo and formatting * docs: add note about upgrading tonic with arrow * ci: add cargo check for all examples
…pache#9310) * docs: update parquet_sql_multiple_files.rs with a relative path ex * style: run cargo fmt * docs: update comment * docs: better
* tests: adds tests associated with apache#9237 * style: clippy
* feature: support nvl(ifnull) function * add sqllogictest * add docs entry * Update docs/source/user-guide/sql/scalar_functions.md Co-authored-by: Jonah Gao <[email protected]> * fix some code * fix docs --------- Co-authored-by: Jonah Gao <[email protected]>
* feat: move abs to datafusion_functions * fix proto * fix proto * fix CI vendored code * Fix proto * add support type * fix signature * fix typo * fix test cases * disable a test case * remove old code from math_expressions * feat: add test * fix clippy * use unknown for proto * fix unknown proto
# Conflicts: # datafusion/physical-plan/src/repartition/mod.rs
github-actions
bot
added
physical-expr
sqllogictest
sql
logical-expr
optimizer
substrait
labels
Feb 27, 2024
Co-authored-by: Andrew Lamb <[email protected]>
* Initial commit * Updated mod.rs - Docstrings, Initial test * Updated mod.rs - Fixed udf test * Added udaf test, Updated udf test * Added test for udwf * Linting with rustfmt * Update datafusion/core/src/execution/context/mod.rs Co-authored-by: Andrew Lamb <[email protected]> * Moved tests to core/tests/user_defined * fix fmt --------- Co-authored-by: Andrew Lamb <[email protected]>
…nada-ai/datafusion-upstream into feature/remove_default_cache
…* (1/3 regexpmatch) (apache#9329) * feat: issue_9285: port builtin reg function into datafusion-function-* crate (1/3: RegexpMatch part) * fix fmt * refact * modify test * fix msrv verify problem * port test and delete useless lines
* Add test to verify issue apache#9161 * fmt
# Conflicts: # datafusion/core/src/physical_optimizer/join_selection.rs # datafusion/physical-plan/src/aggregates/mod.rs # datafusion/physical-plan/src/coalesce_partitions.rs # datafusion/physical-plan/src/joins/cross_join.rs # datafusion/physical-plan/src/lib.rs # datafusion/physical-plan/src/sorts/sort_preserving_merge.rs # datafusion/physical-plan/src/unnest.rs # datafusion/physical-plan/src/windows/window_agg_exec.rs # datafusion/physical-plan/src/work_table.rs
Merged upstream. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
core
final_review
Ozan can review this pr
logical-expr
optimizer
physical-expr
sql
sqllogictest
substrait
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #.
Rationale for this change
In great analysis by [MENTION NAME] at the issue 9084. [MENTION NAME] recognized that stack usage (depth) increases a lot during logical and physical planning. The root cause of aggressive stack usage in the logical planning is excessive use of
.clone
ofLogicalPlan
enum.In physical planning the problem stems from recursive function calls in the getter
API
s of theArc<dyn ExecutionPlan>
, such asEquivalenceProperties
,output_partitioning
,output_ordering
, etc.In the PR9084, [MENTION NAME] could reduce physical plan stack usage by caching
equivalence_properties
forProjectionExec
.This PR introduces a new struct to cache PlanProperties (
PlanPropertiesCache
). With this struct,schema
,output_partitioning
,equivalence_properties
,output_ordering
is cached. This caching mechanism removes recursive calls during getter methods. Also, given.cache
method is implemented, default implementations of the.output_partitioning
,.equivalence_properties
,output_ordering
works out of the box.With these changes flame graph for the query 54 is converted from following graph
to following graph
where stack usage decreases.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
api change