You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For example, WHERE x IN (1,2,3) AND x IN (4,5) can never be true because the IN lists are disjoint. However, DataFusion can't prove this today and will evaluate that expression.
❯ explain select x from t where x IN (1,2,3) AND x IN (4,5);
+---------------+-----------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------+
| logical_plan | Filter: (t.x = Int32(1) OR t.x = Int32(2) OR t.x = Int32(3)) AND (t.x = Int32(4) OR t.x = Int32(5)) |
| | TableScan: t projection=[x] |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: (x@0 = 1 OR x@0 = 2 OR x@0 = 3) AND (x@0 = 4 OR x@0 = 5) | < ---- this filter still runs even though it will never be true
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+-----------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.003 seconds.
For other predicates that datafusion knows can not be true, the query is greatly simplified:
❯ explain select x from t where false;
+---------------+---------------+
| plan_type | plan |
+---------------+---------------+
| logical_plan | EmptyRelation |
| physical_plan | EmptyExec |
| | |
+---------------+---------------+
Describe the solution you'd like
It might be nice to extend the ExprSimplifier for logical expr simplification to handle these cases
Impact: I don't know how prevalent such "obviously false" expressions are in actual work loads, but my sense is if it is straightforward to prove the expression can never be true we should use that information while executing queries.
Describe alternatives you've considered
No response
Additional context
The code for LiteralGuarantees is related but works on PhysicalExpressions
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem or challenge?
@my-vegetable-has-exploded and I had an observation on #8654 (comment) where there are certain patterns that can never be true.
For example,
WHERE x IN (1,2,3) AND x IN (4,5)
can never be true because theIN
lists are disjoint. However, DataFusion can't prove this today and will evaluate that expression.So for example, given
This query:
For other predicates that datafusion knows can not be true, the query is greatly simplified:
Describe the solution you'd like
It might be nice to extend the ExprSimplifier for logical expr simplification to handle these cases
Impact: I don't know how prevalent such "obviously false" expressions are in actual work loads, but my sense is if it is straightforward to prove the expression can never be true we should use that information while executing queries.
Describe alternatives you've considered
No response
Additional context
The code for LiteralGuarantees is related but works on PhysicalExpressions
The text was updated successfully, but these errors were encountered: