Simplify NULL [NOT] IN (..) expressions #8691

Merged

asimsedhain
Contributor

@asimsedhain asimsedhain commented Dec 31, 2023

Which issue does this PR close?

Closes #8688

Rationale for this change

What changes are included in this PR?

Adds an optimizer simplification rule that rewrites:

  • SELECT .. WHERE NULL IN (…) --> SELECT .. WHERE NULL
  • SELECT .. WHERE NULL NOT IN (…) --> SELECT .. WHERE NULL

DataFusion CLI v34.0.0
❯ create table t (x int) as values (1), (2);
0 rows in set. Query took 0.031 seconds.

❯ explain select x from t where null IN (x, 2, 3);
+---------------+---------------------------------------------------+
| plan_type     | plan                                              |
+---------------+---------------------------------------------------+
| logical_plan  | Filter: Boolean(NULL)                             |
|               |   TableScan: t projection=[x]                     |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192       |
|               |   FilterExec: NULL                                |
|               |     MemoryExec: partitions=1, partition_sizes=[1] |
|               |                                                   |
+---------------+---------------------------------------------------+
2 rows in set. Query took 0.008 seconds.
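
For readers who want the shape of the rewrite without opening the diff, here is a minimal sketch on a toy expression type. The `Expr` enum and `simplify` function below are hypothetical stand-ins written for illustration; they are not DataFusion's `Expr` or the `Simplifier` touched by this PR, and the sketch does not treat empty lists specially.

```rust
/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// The SQL NULL literal.
    Null,
    /// An integer literal.
    Int(i64),
    /// A column reference by name.
    Column(String),
    /// `expr [NOT] IN (list)`.
    InList {
        expr: Box<Expr>,
        list: Vec<Expr>,
        negated: bool,
    },
}

/// Rewrites `NULL [NOT] IN (...)` to `NULL` and leaves every other
/// expression unchanged. The list contents and the `negated` flag are
/// ignored once the left-hand side is known to be NULL.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, .. } if *expr == Expr::Null => Expr::Null,
        other => other,
    }
}
```

Under these assumptions, `simplify` applied to the toy equivalent of `NULL IN (x, 2, 3)` yields `Expr::Null`, which corresponds to the `Filter: Boolean(NULL)` line in the plan above, while an `InList` whose left-hand side is a column is returned untouched.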

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Dec 31, 2023
@asimsedhain asimsedhain force-pushed the df-optimizer/8688/improve-list-simplification branch from 665ca39 to f36aa5f Compare December 31, 2023 20:46
@asimsedhain asimsedhain force-pushed the df-optimizer/8688/improve-list-simplification branch from f36aa5f to a894f70 Compare December 31, 2023 20:48
@asimsedhain asimsedhain marked this pull request as ready for review December 31, 2023 21:14
@alamb alamb changed the title Df optimizer/8688/improve list simplification Simplify NULL [NOT] IN (..) expressions Jan 1, 2024
Contributor

@alamb alamb left a comment

Thank you @asimsedhain 🙏

@@ -481,6 +481,14 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for Simplifier<'a, S> {
lit(negated)
}

// null in (x, y, z) --> null

I double-checked that this is consistent with Postgres:

postgres=# select null IN (1,2);
 ?column?
----------

(1 row)

postgres=# select null IN (NULL);
 ?column?
----------

(1 row)

postgres=# select null NOT IN (1,2);
 ?column?
----------

(1 row)

postgres=# select null NOT IN (NULL,1,2);
 ?column?
----------

(1 row)
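
The blank results above follow from SQL's three-valued logic: comparing NULL with anything yields NULL, so an `IN` list with a NULL left-hand side can never produce a definite match or a definite miss. A small sketch of that evaluation, using `Option<bool>` with `None` standing for NULL, is below. The helper names `sql_eq` and `sql_in` are made up for illustration and do not correspond to any DataFusion or Postgres API.

```rust
/// SQL equality under three-valued logic: NULL on either side gives NULL.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Evaluates `needle [NOT] IN (list)` as an OR over equality comparisons.
/// `None` represents SQL NULL for both the inputs and the result.
fn sql_in(needle: Option<i64>, list: &[Option<i64>], negated: bool) -> Option<bool> {
    let mut saw_null = false;
    for item in list {
        match sql_eq(needle, *item) {
            // A definite match decides the result immediately.
            Some(true) => return Some(!negated),
            Some(false) => {}
            // An unknown comparison leaves the result undecided.
            None => saw_null = true,
        }
    }
    // No definite match: the result is NULL if any comparison was unknown,
    // otherwise a definite miss.
    if saw_null { None } else { Some(negated) }
}
```

With a NULL needle and a non-empty list, every comparison is unknown, so `sql_in` returns `None` regardless of the list contents or the `negated` flag, which matches the four queries above.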

@alamb alamb merged commit e82707e into apache:main Jan 1, 2024
24 checks passed
Labels
optimizer Optimizer rules
Development

Successfully merging this pull request may close these issues.

Add NULL in list simplifications
2 participants