Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate well-indented SQL from LogicalPlan #11308

Open
edmondop opened this issue Jul 6, 2024 · 1 comment
Open

Generate well-indented SQL from LogicalPlan #11308

edmondop opened this issue Jul 6, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@edmondop
Copy link
Contributor

edmondop commented Jul 6, 2024

Is your feature request related to a problem or challenge?

DataFusion provides the capability of "unparsing" a logical plan into SQL via the unparser module in the sql crate (see https://github.com/apache/datafusion/blob/main/datafusion/sql/src/unparser/plan.rs). Examples also showcase the Dialect features to provide customizable escaping.

As a part of the work on SpiceAI and datafusion-federation, we have some rewrites on the tpch_q13 and we expect the final rewritten SQL to be the following:

SELECT c_orders.c_count,
       Count(1) AS custdist
FROM   (SELECT c_custkey                       AS c_custkey,
               "count(tpch.orders.o_orderkey)" AS c_count
        FROM   (SELECT TPCH.customer.c_custkey,
                       Count(TPCH.orders.o_orderkey) AS
                       "COUNT(tpch.orders.o_orderkey)"
                FROM   TPCH.customer
                       LEFT JOIN TPCH.orders
                              ON ( ( TPCH.customer.c_custkey =
                                     TPCH.orders.o_custkey )
                                   AND TPCH.orders.o_comment NOT LIKE
                                       '%special%requests%' )
                GROUP  BY TPCH.customer.c_custkey)) AS c_orders
GROUP  BY c_orders.c_count
ORDER  BY custdist DESC NULLS FIRST,
          c_orders.c_count DESC NULLS FIRST 

however the plan_to_sql generates a one-line sql which is much harder to read

SELECT c_orders.c_count, COUNT(1) AS custdist FROM (SELECT c_custkey AS c_custkey, "COUNT(tpch.orders.o_orderkey)" AS c_count FROM (SELECT tpch.customer.c_custkey, COUNT(tpch.orders.o_orderkey) AS "COUNT(tpch.orders.o_orderkey)" FROM tpch.customer LEFT JOIN tpch.orders ON ((tpch.customer.c_custkey = tpch.orders.o_custkey) AND tpch.orders.o_comment NOT LIKE '%special%requests%') GROUP BY tpch.customer.c_custkey)) AS c_orders GROUP BY c_orders.c_count ORDER BY custdist DESC NULLS FIRST, c_orders.c_count DESC NULLS FIRST"

Describe the solution you'd like

I would like to be able to provide an extra parameter to the Unparser, such as pretty_print or indent, which needs to be respected in the plan_to_sql

Describe alternatives you've considered

Use https://github.com/dprint/dprint on the SQL, but unfortunately SQL is not supported

Additional context

I used this https://www.dpriver.com/pp/sqlformat.htm to generate the formatted SQL from the SQL generated from dataufusion

@edmondop edmondop added the enhancement New feature or request label Jul 6, 2024
@edmondop
Copy link
Contributor Author

Waiting apache/datafusion-sqlparser-rs#1294 to happen as it is likely that sql-parser will be migrated to datafusion and the change will be easier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant