Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ibis): use the default input dialect #962

Merged
merged 4 commits into from
Dec 3, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Dec 2, 2024

Description

Close #930
The root cause of #930 is the trino dialect has a specific limit for the date_add function. We can't use it as a normal scalar function. We should follow the spec of trino doc.

import sqlglot
sql = "SELECT date_add('second', 1, CAST('2024-01-01' AS DATE))"
result = sqlglot.transpile(sql, read="trino", write="bigquery")[0]
print(result)
# SELECT DATE_ADD(CAST('2024-01-01' AS DATE), INTERVAL 1 SECOND)

However, we want a common SQL base for some general purpose. I think we can change the input dialect to the default dialect which has fewer limitations for the function usage.

import sqlglot
sql = "SELECT date_add(CAST('2024-01-01' AS DATE), INTERVAL 1 DAY)"
result = sqlglot.transpile(sql, write="bigquery")[0]
print(result)
# SELECT DATE_ADD(CAST('2024-01-01' AS DATE), INTERVAL (INTERVAL '1' DAY) DAY)

However, the output for the BigQuery is still wrong. INTERVAL (INTERVAL '1' DAY) DAY) isn't a valid SQL for BigQuery. It seems SQLGlot has some specific definition for the date_add function. The SQL should be

import sqlglot
sql = "SELECT date_add(CAST('2024-01-01' AS DATE), 1)"
result = sqlglot.transpile(sql, write="bigquery")[0]
print(result)
# SELECT DATE_ADD(CAST('2024-01-01' AS DATE), INTERVAL 1 DAY)

It's different from the original definition of BigQuery

Additional Changed

  • Modified the date_add definition for bigquery
  • Disable instr for MySQL
    • Because instr is an alias of strpos in DataFusion, the input instr function will be transformed to strpos, then the sqlglot can't transpile strposto instr when using the default dialect to mysql dialect`. We need to file another issue for it.

@github-actions github-actions bot added ibis python Pull requests that update Python code labels Dec 2, 2024
@goldmedal goldmedal marked this pull request as ready for review December 2, 2024 08:48
Comment on lines 45 to 47
def _transpile(self, planned_sql: str) -> str:
write = self._get_write_dialect(self.data_source)
return sqlglot.transpile(planned_sql, read="trino", write=write)[0]
return sqlglot.transpile(planned_sql, write=write)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used for v2 and v3. For v2, the read dialect is trino. How about splitting this method for each class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, do we not receive the statements with the trino dialect? I'm not sure. If the input statement of v2 is already not trino dialect, your way is right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. No, maybe you're right. We're better to split v2 and v3 because the trino dialect also handles the functions converting between trino and others data source ( Currently, the v2 function white list used by AI service is based on the trion function) 🤔

@goldmedal goldmedal marked this pull request as draft December 3, 2024 02:17
@goldmedal goldmedal marked this pull request as ready for review December 3, 2024 03:10
@goldmedal
Copy link
Contributor Author

The BigQuery CI has issues. I tested BigQuery in my local environment and all pass.

$ just test bigquery
poetry run pytest -m 'bigquery'
============================================================================================================================== test session starts ===============================================================================================================================
platform darwin -- Python 3.11.6, pytest-8.3.3, pluggy-1.5.0
rootdir: /ibis-server
configfile: pyproject.toml
plugins: anyio-4.6.2.post1
collected 205 items / 175 deselected / 30 selected                                                                                                                                                                                                                               

tests/routers/v2/connector/test_bigquery.py .................                                                                                                                                                                                                              [ 56%]
tests/routers/v3/connector/bigquery/test_functions.py ....                                                                                                                                                                                                                 [ 70%]
tests/routers/v3/connector/bigquery/test_query.py .........                                                                                                                                                                                                                [100%]

================================================================================================================= 30 passed, 175 deselected in 283.35s (0:04:43) =================================================================================================================

Copy link
Contributor

@grieve54706 grieve54706 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@grieve54706 grieve54706 merged commit c989b8c into Canner:main Dec 3, 2024
3 of 4 checks passed
@goldmedal goldmedal deleted the feature/default-dialect-sqlglot branch December 3, 2024 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery ibis python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wren core over rewrite INTERVAL
2 participants