-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to specify catalog as a SQLALchemy table argument #186
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you submit CLA if you haven't yet sent it?
@@ -44,3 +44,16 @@ def test_offset(dialect): | |||
statement = select(table).offset(0) | |||
query = statement.compile(dialect=dialect) | |||
assert str(query) == 'SELECT "table".id, "table".name \nFROM "table"\nOFFSET :param_1' | |||
|
|||
|
|||
def test_multiple_catalogs(dialect): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "multiple catalogs" mean. Could you rename the test method or leave a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the test name to test_catalog_argument
.
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla. |
I sent my signed CLA form yesterday. Please let me know if it hasn't been received! |
@cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
I have added an additional bit to this PR since it hadn't been approved/merged yet. The visitors worked when compiling SQL statements, but another method was needed to do the same functionality for DDL statements (like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase on upstream.
'\n'\ | ||
'CREATE TABLE "system".information_schema."table" (\n'\ | ||
'\tid INTEGER NOT NULL, \n'\ | ||
'\tPRIMARY KEY (id)\n'\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This DDL looks invalid as Trino query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the table to use a fake other
catalog so that this test doesn't potentially interfere with the system
catalog.
trino/sqlalchemy/compiler.py
Outdated
column, add_to_result_map, include_table, **kwargs | ||
) | ||
table = column.table | ||
return self.add_catalog(sql, table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood the benefit for table, but not sure about column. Could you add a test case to show the usecase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that it's not needed. Removed the column visitor.
Hi all, is there anything else blocking this PR from completion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @mdesmet Can you please take a look as well?
@laserkaplan Please squash the commits now (since it's one logical change).
table_with_catalog = Table( | ||
'table', | ||
metadata, | ||
Column('id', Integer, primary_key=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the primary_key=True
required for the test? I don't think so. If so please remove it since it distracts attention from what the test and this change is actually about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the DDL which is used below - the PRIMARY KEY
seems unrelated to the feature being added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the primary_key=True
just above this one is pre-existing but that should also be removed. Unfortunate that it didn't get caught when it was introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and removed this from both temp tables, as well as in this test.
CTE = type(None) | ||
Subquery = type(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave a comment here why this is ok to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is because the CTE
and Subquery
classes weren't introduced until SQLAlchemy 1.4. Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % comments
Is "squash and merge" enabled on the repo for when this is ready to merge? |
AFAIK we don't do squash and merge, only rebase and merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % commit squash
7cb46a4
to
f8286b4
Compare
def test_catalogs_argument(dialect): | ||
statement = select(table_with_catalog) | ||
query = statement.compile(dialect=dialect) | ||
assert str(query) == 'SELECT default."table".id \nFROM "other".default."table"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pre-existing but any idea why the schema name doesn't get quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked into this a little. visit_table
quotes table names by default, but it checks whether schema names "need" to be quoted (e.g. if they are a reserved word). The PyHive package got around this by considering everything to be a reserved word, which would trigger this quoting. I don't think it's necessary here, but it definitely is a little weird to see schema names not quoted while table names are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this and explaining. I think we can leave as is for now since from the few queries I tested with special schema names it was able to handle them correctly. I didn't try too hard to break it though to be honest. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reworded the commit message. LGTM.
This change allows for specifying a
trino_catalog
table argument to SQLALchemy Table objects, which is then checked when compiling statements and prepended to the proper objects. This allows for writing queries that talk to multiple catalogs at the same time.This change was largely implemented by @AlexandreOuellet in the PyHive repository, with some minor edits by @VinceDPM and myself. However, since that repository is no longer well-maintained, and since Trino seems to have better support currently than Presto on the whole, I have switched my own focus to using Trino, and thus wanted to get this functionality working here as well.
As this is the first time I've attempted to contribute here, please let me know if I am missing anything in this PR! It would be great if this could be merged in an efficient manner.