Add support for UNION DISTINCT BY NAME syntax #997

Merged (4 commits) on Oct 23, 2023

@alexander-beedie alexander-beedie commented Oct 8, 2023

I was just enhancing Polars' SQL support for UNION ops and noticed that the parser doesn't recognise UNION DISTINCT BY NAME; this optional additional keyword is supported by DuckDB (and some other dialects, such as U-SQL [1]).

This PR adds a new DistinctByName entry to SetQuantifier to account for this, and streamlines the related test to avoid some copy/paste (while adding suitable coverage for the new quantifier) 👍
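As a rough illustration of the keyword combinations involved (a Python sketch, not the actual Rust code in sqlparser-rs): after a set operator such as UNION, the parser may see an optional ALL or DISTINCT, followed by an optional BY NAME. Only `SetQuantifier` and `DistinctByName` are named in this PR; the other member names and the `parse_set_quantifier` helper below are assumptions made for illustration.

```python
from enum import Enum, auto


class SetQuantifier(Enum):
    # Only DistinctByName is named in this PR; the remaining members
    # are assumed here so that the sketch covers every combination.
    NONE = auto()
    ALL = auto()
    DISTINCT = auto()
    BY_NAME = auto()
    ALL_BY_NAME = auto()
    DISTINCT_BY_NAME = auto()


def parse_set_quantifier(tokens):
    """Hypothetical helper: read the keywords that follow UNION.

    Returns (quantifier, number_of_tokens_consumed).
    """
    words = [t.upper() for t in tokens]
    pos = 0

    # Optional ALL / DISTINCT keyword.
    base = None
    if pos < len(words) and words[pos] in ("ALL", "DISTINCT"):
        base = words[pos]
        pos += 1

    # Optional BY NAME suffix (two keywords).
    by_name = words[pos:pos + 2] == ["BY", "NAME"]
    if by_name:
        pos += 2

    table = {
        (None, False): SetQuantifier.NONE,
        ("ALL", False): SetQuantifier.ALL,
        ("DISTINCT", False): SetQuantifier.DISTINCT,
        (None, True): SetQuantifier.BY_NAME,
        ("ALL", True): SetQuantifier.ALL_BY_NAME,
        ("DISTINCT", True): SetQuantifier.DISTINCT_BY_NAME,
    }
    return table[(base, by_name)], pos


# The tokens that follow UNION in the statement from the example.
tail = "DISTINCT BY NAME SELECT * FROM df2".split()
quantifier, consumed = parse_set_quantifier(tail)
print(quantifier.name, consumed)
```

The point of the sketch is that DISTINCT followed by BY NAME has to be treated as one quantifier; a parser that stops after DISTINCT reaches BY where it expects the next query body.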

Example

  • Set up some test data tables/frames:

    import polars as pl
    
    df1 = pl.DataFrame({
        "A": [1, 2, 3, 4, 5],
        "B": [5, 4, 3, 2, 1],
    })
    df2 = df1.select("B","A")
  • Trying to execute this syntax using Polars (which uses sqlparser-rs) fails before this PR:

    pl.SQLContext(register_globals=True).execute(
        "SELECT * FROM df1 UNION DISTINCT BY NAME SELECT * FROM df2"
    )
    # sql parser error: 
    #  Expected SELECT, VALUES, or a subquery in the query body, found: BY
  • Execute the same statement using DuckDB (to demonstrate that the syntax is valid):

    import duckdb
    duckdb.sql(
        "SELECT * FROM df1 UNION DISTINCT BY NAME SELECT * FROM df2"
    ).show()
    # ┌───────┬───────┐
    # │   A   │   B   │
    # │ int64 │ int64 │
    # ├───────┼───────┤
    # │     1 │     5 │
    # │     2 │     4 │
    # │     3 │     3 │
    # │     4 │     2 │
    # │     5 │     1 │
    # └───────┴───────┘
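For readers unfamiliar with the BY NAME variant, here is a minimal pure-Python sketch of the semantics shown above: columns are matched by name rather than by position, and duplicate rows are then dropped. The `union_distinct_by_name` helper is hypothetical (it is not part of Polars, DuckDB, or sqlparser-rs), and filling absent columns with `None` mirrors DuckDB's NULL-filling behaviour for mismatched column sets.

```python
def union_distinct_by_name(left, right):
    """Hypothetical helper: each row is a dict of column name -> value."""
    rows = left + right

    # Output columns in order of first appearance across both inputs.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    seen = set()
    result = []
    for row in rows:
        # Align values by column name; a missing column becomes None.
        key = tuple(row.get(name) for name in columns)
        if key not in seen:  # DISTINCT: keep each aligned row once
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Same data as df1 / df2 above; df2 has its columns in the order B, A.
df1_rows = [{"A": a, "B": b} for a, b in zip([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])]
df2_rows = [{"B": r["B"], "A": r["A"]} for r in df1_rows]

for row in union_distinct_by_name(df1_rows, df2_rows):
    print(row)
```

With this data every row of the second input matches a row of the first once the columns are aligned by name, so five rows remain, which agrees with the DuckDB output. Row order is an artefact of this sketch; SQL does not guarantee it without an ORDER BY.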

Thanks for all the fantastic work on this library :)

Footnotes

  1. U-SQL "unions" reference: https://learn.microsoft.com/en-us/u-sql/statements-and-expressions/set-rowset/union-and-outer-union-expression

@alamb alamb left a comment

Thank you very much for the contribution, @alexander-beedie. The test cleanup was also very nice 👌

alamb commented Oct 23, 2023

I took the liberty of merging up from main and fixing a clippy lint.

@alamb alamb merged commit 5c10668 into apache:main Oct 23, 2023
10 checks passed
@alexander-beedie alexander-beedie deleted the union-distinct-by-name branch October 24, 2023 03:11
@alexander-beedie alexander-beedie changed the title Add support for UNION DISTINCT BY NAME syntax Add support for UNION DISTINCT BY NAME syntax Oct 24, 2023
serprex pushed a commit to serprex/sqlparser-rs that referenced this pull request Nov 6, 2023