* feat: update execution modes and add bitflags dependency

- Introduced `Incremental` execution mode alongside existing modes in the DataFusion execution plan.
- Updated various execution plans to utilize the new `Incremental` mode where applicable, enhancing streaming capabilities.
- Added `bitflags` dependency to `Cargo.toml` for better management of execution modes.
- Adjusted execution mode handling in multiple files to ensure compatibility with the new structure.

* add exec API

Signed-off-by: Jay Zhan <[email protected]>

* replace done, but has a stack overflow

Signed-off-by: Jay Zhan <[email protected]>

* exec API done

Signed-off-by: Jay Zhan <[email protected]>

* Refactor execution plan properties to remove execution mode

- Removed the `ExecutionMode` parameter from `PlanProperties` across multiple physical plan implementations.
- Updated related functions to utilize the new structure, ensuring compatibility with the changes.
- Adjusted comments and cleaned up imports to reflect the removal of execution mode handling.

This refactor simplifies the execution plan properties and enhances maintainability.

* Refactor execution plan to remove `ExecutionMode` and introduce `EmissionType`

- Removed the `ExecutionMode` parameter from `PlanProperties` and related implementations across multiple files.
- Introduced `EmissionType` to better represent the output characteristics of execution plans.
- Updated functions and tests to reflect the new structure, ensuring compatibility and enhancing maintainability.
- Cleaned up imports and adjusted comments accordingly.

This refactor simplifies the execution plan properties and improves the clarity of memory handling in execution plans.

* fix test

Signed-off-by: Jay Zhan <[email protected]>

* Refactor join handling and emission type logic

- Updated test cases in `sanity_checker.rs` to reflect changes in expected outcomes for bounded and unbounded joins, ensuring accurate test coverage.
- Simplified the `is_pipeline_breaking` method in `execution_plan.rs` to clarify the conditions under which a plan is considered pipeline-breaking.
- Enhanced the emission type determination logic in `execution_plan.rs` to prioritize `Final` over `Both` and `Incremental`, improving clarity in execution plan behavior.
- Adjusted join type handling in `hash_join.rs` to classify `Right` joins as `Incremental`, allowing for immediate row emission.

These changes improve the accuracy of tests and the clarity of execution plan properties.

* Implement emission type for execution plans

- Updated multiple execution plan implementations to replace `unimplemented!()` with `EmissionType::Incremental`, ensuring that the emission type is correctly defined for various plans.
- This change enhances the clarity and functionality of the execution plans by explicitly specifying their emission behavior.

These updates contribute to a more robust execution plan framework within the DataFusion project.

* Enhance join type documentation and refine emission type logic

- Updated the `JoinType` enum in `join_type.rs` to include detailed descriptions for each join type, improving clarity on their behavior and expected results.
- Modified the emission type logic in `hash_join.rs` to ensure that `Right` and `RightAnti` joins are classified as `Incremental`, allowing for immediate row emission when applicable.

These changes improve the documentation and functionality of join operations within the DataFusion project.

* Refactor emission type logic in join and sort execution plans

- Updated the emission type determination in `SortMergeJoinExec` and `SymmetricHashJoinExec` to utilize the `emission_type_from_children` function, enhancing the accuracy of emission behavior based on input characteristics.
- Clarified comments in `sort.rs` regarding the conditions under which results are emitted, emphasizing the relationship between input sorting and emission type.
- These changes improve the clarity and functionality of the execution plans within the DataFusion project, ensuring more robust handling of emission types.

* Refactor emission type handling in execution plans

- Updated the `emission_type_from_children` function to accept an iterator instead of a slice, enhancing flexibility in how child execution plans are passed.
- Modified the `SymmetricHashJoinExec` implementation to utilize the new function signature, improving code clarity and maintainability.

These changes streamline the emission type determination process within the DataFusion project, contributing to a more robust execution plan framework.

* Enhance execution plan properties with boundedness and emission type

- Introduced `boundedness` and `pipeline_behavior` methods to the `ExecutionPlanProperties` trait, improving the handling of execution plan characteristics.
- Updated the `CsvExec`, `SortExec`, and related implementations to utilize the new methods for determining boundedness and emission behavior.
- Refactored the `ensure_distribution` function to use the new boundedness logic, enhancing clarity in distribution decisions.
- These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project.

* Refactor execution plans to enhance boundedness and emission type handling

- Updated multiple execution plan implementations to incorporate `Boundedness` and `EmissionType`, improving the clarity and functionality of execution plans.
- Replaced instances of `unimplemented!()` with appropriate emission types, ensuring that plans correctly define their output behavior.
- Refactored the `PlanProperties` structure to utilize the new boundedness logic, enhancing decision-making in execution plans.
- These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project.
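The child-to-parent emission rule described above (any `Final` child makes the result `Final`; otherwise any `Both` child makes it `Both`; otherwise it is `Incremental`) can be sketched as a small, self-contained model. This is not DataFusion's code: the local `EmissionType` enum only mirrors the variant names used in this commit, and the helper, named after `emission_type_from_children`, takes the children's emission types directly rather than child execution plans, which is a simplifying assumption.

```rust
/// Local stand-in for the `EmissionType` introduced in this commit
/// (assumption: same three variants; not imported from DataFusion).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmissionType {
    /// Rows are emitted as soon as they are produced.
    Incremental,
    /// Rows are emitted only after all input has been consumed.
    Final,
    /// Some rows are emitted incrementally, the rest at the end.
    Both,
}

/// Hypothetical helper named after `emission_type_from_children`; it takes
/// the children's emission types directly instead of child plans.
pub fn emission_type_from_children<I>(children: I) -> EmissionType
where
    I: IntoIterator<Item = EmissionType>,
{
    let mut saw_both = false;
    for child in children {
        match child {
            // `Final` dominates, so stop as soon as one is seen.
            EmissionType::Final => return EmissionType::Final,
            // `Both` wins over `Incremental`, but keep scanning for `Final`.
            EmissionType::Both => saw_both = true,
            EmissionType::Incremental => {}
        }
    }
    if saw_both {
        EmissionType::Both
    } else {
        // No children, or every child emits incrementally.
        EmissionType::Incremental
    }
}
```

Accepting any `IntoIterator` rather than a slice, as the commit describes, lets a caller pass a `Vec`, an array, or a mapped iterator over child plans without first collecting into a temporary collection.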
* Refactor memory handling in execution plans

- Updated the condition for checking memory requirements in execution plans from `has_finite_memory()` to `boundedness().requires_finite_memory()`, improving clarity in memory management.
- This change enhances the robustness of execution plans within the DataFusion project by ensuring more accurate assessments of memory constraints.

* Refactor boundedness checks in execution plans

- Updated conditions for checking boundedness in various execution plans to use `is_unbounded()` instead of `requires_finite_memory()`, enhancing clarity in memory management.
- Adjusted the `PlanProperties` structure to reflect these changes, ensuring more accurate assessments of memory constraints across the DataFusion project.
- These modifications contribute to a more robust and maintainable execution plan framework, improving the handling of boundedness in execution strategies.

* Remove TODO comment regarding unbounded execution plans in `UnboundedExec` implementation

- Eliminated the outdated comment suggesting a switch to unbounded execution with finite memory, streamlining the code and improving clarity.
- This change contributes to a cleaner and more maintainable codebase within the DataFusion project.

* Refactor execution plan boundedness and emission type handling

- Updated the `is_pipeline_breaking` method to use `requires_finite_memory()` for improved clarity in determining pipeline behavior.
- Enhanced the `Boundedness` enum to include detailed documentation on memory requirements for unbounded streams.
- Refactored `compute_properties` methods in `GlobalLimitExec` and `LocalLimitExec` to directly use the input's boundedness, simplifying the logic.
- Adjusted emission type determination in `NestedLoopJoinExec` to utilize the `emission_type_from_children` function, ensuring accurate output behavior based on input characteristics.
These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project, improving clarity and functionality in handling boundedness and emission types.

* Refactor emission type and boundedness handling in execution plans

- Removed the `OptionalEmissionType` struct from `plan_properties.rs`, simplifying the codebase.
- Updated the `is_pipeline_breaking` function in `execution_plan.rs` for improved readability by formatting the condition across multiple lines.
- Adjusted the `GlobalLimitExec` implementation in `limit.rs` to directly use the input's boundedness, enhancing clarity in memory management.

These changes contribute to a more streamlined and maintainable execution plan framework within the DataFusion project, improving the handling of emission types and boundedness.

* Refactor GlobalLimitExec and LocalLimitExec to enhance boundedness handling

- Updated the `compute_properties` methods in both `GlobalLimitExec` and `LocalLimitExec` to replace `EmissionType::Final` with `Boundedness::Bounded`, reflecting that limit operations always produce a finite number of rows.
- Changed the input's boundedness reference to `pipeline_behavior()` for improved clarity in execution plan properties.

These changes contribute to a more streamlined and maintainable execution plan framework within the DataFusion project, enhancing the handling of boundedness in limit operations.

* Review Part1

* Update sanity_checker.rs

* addressing reviews

* Review Part 1

* Update datafusion/physical-plan/src/execution_plan.rs

* Update datafusion/physical-plan/src/execution_plan.rs

* Shorten imports

* Enhance documentation for JoinType and Boundedness enums

- Improved descriptions for the Inner and Full join types in join_type.rs to clarify their behavior and examples.
- Added explanations regarding the boundedness of output streams and memory requirements in execution_plan.rs, including specific examples for operators like Median and Min/Max.
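The boundedness concepts documented in this commit can be illustrated with a small, self-contained model. Everything below is a hypothetical sketch rather than the DataFusion API: the `requires_infinite_memory` field name, the `is_pipeline_breaking` rule (on an unbounded stream, an operator is treated as pipeline-breaking when it only emits at the end or needs unbounded memory), and the `limit_boundedness` helper are assumptions chosen to match the behavior the commit describes.

```rust
/// Local stand-in for the `Boundedness` enum (assumption: field name
/// `requires_infinite_memory`; not imported from DataFusion).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundedness {
    /// The stream is finite and eventually completes.
    Bounded,
    /// The stream may never end.
    Unbounded {
        /// `true` when the operator's state can grow without limit, e.g. an
        /// exact `Median`, which must retain every input value. A single
        /// running `Min`/`Max` needs only constant-size state.
        requires_infinite_memory: bool,
    },
}

impl Boundedness {
    pub fn is_unbounded(&self) -> bool {
        matches!(self, Boundedness::Unbounded { .. })
    }
}

/// Same three emission variants as used in this commit (local copy).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmissionType {
    Incremental,
    Final,
    Both,
}

/// Hypothetical rule: on an unbounded stream, an operator breaks the
/// pipeline if it only emits at the end (so it would never emit) or if
/// its memory use is unbounded.
pub fn is_pipeline_breaking(boundedness: Boundedness, emission: EmissionType) -> bool {
    match boundedness {
        Boundedness::Bounded => false,
        Boundedness::Unbounded { requires_infinite_memory } => {
            requires_infinite_memory || emission == EmissionType::Final
        }
    }
}

/// Hypothetical: a limit stops after a finite number of rows, so its output
/// is bounded regardless of the input's boundedness.
pub fn limit_boundedness(_input: Boundedness) -> Boundedness {
    Boundedness::Bounded
}
```

Under this model, an incremental operator with constant-size state over an unbounded input, `is_pipeline_breaking(Boundedness::Unbounded { requires_infinite_memory: false }, EmissionType::Incremental)`, returns `false`, so it can keep streaming indefinitely.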
---------

Signed-off-by: Jay Zhan <[email protected]>
Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>