Add ability to specify catalog as a SQLAlchemy table argument #186

laserkaplan · 2022-06-23T19:44:44Z

This change allows specifying a trino_catalog table argument on SQLAlchemy Table objects. The argument is checked when compiling statements, and the catalog is prepended to the relevant objects. This makes it possible to write queries that talk to multiple catalogs at the same time.
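
To illustrate the idea, here is a minimal sketch of the prepending step in plain Python. It is not the code from this PR: add_catalog below is a simplified, hypothetical stand-in that receives the already-compiled table reference and an optional catalog name directly, and it assumes double quotes are the right identifier quoting.

```python
from typing import Optional


def add_catalog(sql: str, catalog: Optional[str]) -> str:
    """Prefix a compiled table reference with a quoted catalog name.

    Hypothetical helper for illustration only. In the PR the catalog comes
    from the table's trino_catalog argument; here it is passed in directly.
    """
    if not catalog:
        # No catalog configured: leave the reference untouched.
        return sql
    # Escape embedded double quotes by doubling them.
    escaped = catalog.replace('"', '""')
    return f'"{escaped}".{sql}'


print(add_catalog('default."table"', "other"))  # "other".default."table"
print(add_catalog('default."table"', None))     # default."table"
```

Tables without a catalog compile exactly as before, so existing single-catalog code should be unaffected under this model.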

This change was largely implemented by @AlexandreOuellet in the PyHive repository, with some minor edits by @VinceDPM and myself. However, since that repository is no longer well-maintained, and since Trino seems to have better support currently than Presto on the whole, I have switched my own focus to using Trino, and thus wanted to get this functionality working here as well.

As this is the first time I've attempted to contribute here, please let me know if I am missing anything in this PR! It would be great if this could be merged in an efficient manner.

cla-bot · 2022-06-23T19:44:47Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

ebyhr

Could you submit the CLA if you haven't sent it yet?

ebyhr · 2022-06-23T23:09:35Z

tests/unit/sqlalchemy/test_compiler.py

@@ -44,3 +44,16 @@ def test_offset(dialect):
    statement = select(table).offset(0)
    query = statement.compile(dialect=dialect)
    assert str(query) == 'SELECT "table".id, "table".name \nFROM "table"\nOFFSET :param_1'
+
+
+def test_multiple_catalogs(dialect):


I'm not sure what "multiple catalogs" means. Could you rename the test method or leave a comment?

I have updated the test name to test_catalog_argument.

trino/sqlalchemy/compiler.py

cla-bot · 2022-06-24T16:07:46Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

laserkaplan · 2022-06-24T16:10:38Z

Could you submit the CLA if you haven't sent it yet?

I sent my signed CLA form yesterday. Please let me know if it hasn't been received!

ebyhr · 2022-07-04T03:10:14Z

@cla-bot check

cla-bot · 2022-07-04T03:10:16Z

The cla-bot has been summoned, and re-checked this pull request!

laserkaplan · 2022-07-06T17:11:02Z

I have added an additional piece to this PR since it hadn't been approved or merged yet. The visitors worked when compiling SQL statements, but another method was needed to provide the same functionality for DDL statements (like CREATE TABLE).

ebyhr

Please rebase on upstream.

ebyhr · 2022-07-08T05:48:51Z

tests/unit/sqlalchemy/test_compiler.py

+        '\n'\
+        'CREATE TABLE "system".information_schema."table" (\n'\
+        '\tid INTEGER NOT NULL, \n'\
+        '\tPRIMARY KEY (id)\n'\


This DDL looks invalid as a Trino query.

I changed the table to use a fake other catalog so that this test doesn't potentially interfere with the system catalog.

ebyhr · 2022-07-08T05:50:08Z

trino/sqlalchemy/compiler.py

+            column, add_to_result_map, include_table, **kwargs
+        )
+        table = column.table
+        return self.add_catalog(sql, table)


I understand the benefit for the table, but I'm not sure about the column. Could you add a test case to show the use case?

I think you're right that it's not needed. Removed the column visitor.

tests/unit/sqlalchemy/test_compiler.py

laserkaplan · 2022-08-05T18:24:10Z

Hi all, is there anything else blocking this PR from completion?

hashhar

Looks good to me. @mdesmet Can you please take a look as well?

@laserkaplan Please squash the commits now (since it's one logical change).

hashhar · 2022-08-09T17:38:06Z

tests/unit/sqlalchemy/test_compiler.py

+table_with_catalog = Table(
+    'table',
+    metadata,
+    Column('id', Integer, primary_key=True),


Is the primary_key=True required for the test? I don't think so. If it isn't, please remove it, since it distracts attention from what the test and this change are actually about.

Same for the DDL which is used below - the PRIMARY KEY seems unrelated to the feature being added.

I see that the primary_key=True just above this one is pre-existing but that should also be removed. Unfortunate that it didn't get caught when it was introduced.

I went ahead and removed this from both temp tables, as well as in this test.

hashhar · 2022-08-09T17:40:38Z

trino/sqlalchemy/compiler.py

+    CTE = type(None)
+    Subquery = type(None)


Can you leave a comment here why this is ok to do?

Yep, this is because the CTE and Subquery classes weren't introduced until SQLAlchemy 1.4. Added a comment.
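
For readers wondering why the type(None) fallback is safe, here is a minimal sketch of the pattern. It assumes the classes are imported from sqlalchemy.sql.expression and that the fallback is triggered by an ImportError; is_cte_or_subquery is a hypothetical helper used only for this illustration.

```python
try:
    # Available on newer SQLAlchemy releases.
    from sqlalchemy.sql.expression import CTE, Subquery
except ImportError:
    # Fallback when the classes (or SQLAlchemy itself) cannot be imported.
    # type(None) is NoneType, so isinstance() is False for every object
    # except None itself.
    CTE = type(None)
    Subquery = type(None)


def is_cte_or_subquery(obj: object) -> bool:
    """Return True only for CTE/Subquery instances; None is rejected explicitly."""
    return obj is not None and isinstance(obj, (CTE, Subquery))


print(is_cte_or_subquery("some_table"))  # False
```

The one caveat with this placeholder is that None would match NoneType, which is why the sketch guards against it before calling isinstance.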

mdesmet

LGTM % comments

trino/sqlalchemy/compiler.py

laserkaplan · 2022-08-11T22:00:21Z

Please squash the commits now (since it's one logical change).

Is "squash and merge" enabled on the repo for when this is ready to merge?

mdesmet · 2022-08-11T22:05:38Z

Is "squash and merge" enabled on the repo for when this is ready to merge?

AFAIK we don't do squash and merge, only rebase and merge.

mdesmet

LGTM % commit squash

hashhar · 2022-08-16T06:44:10Z

tests/unit/sqlalchemy/test_compiler.py

+def test_catalogs_argument(dialect):
+    statement = select(table_with_catalog)
+    query = statement.compile(dialect=dialect)
+    assert str(query) == 'SELECT default."table".id \nFROM "other".default."table"'


Seems pre-existing but any idea why the schema name doesn't get quoted?

Looked into this a little. visit_table quotes table names by default, but it checks whether schema names "need" to be quoted (e.g. if they are a reserved word). The PyHive package got around this by considering everything to be a reserved word, which would trigger this quoting. I don't think it's necessary here, but it definitely is a little weird to see schema names not quoted while table names are.

Thanks for looking into this and explaining. I think we can leave as is for now since from the few queries I tested with special schema names it was able to handle them correctly. I didn't try too hard to break it though to be honest. 😄
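
The conditional quoting described above can be modeled roughly as follows. This is a simplified sketch, not SQLAlchemy's or PyHive's actual implementation: RESERVED_WORDS, LEGAL_IDENTIFIER, EverythingIsReserved, and quote_if_needed are all hypothetical names, and the reserved-word list is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny reserved-word list; real dialects ship a
# much longer one.
RESERVED_WORDS = frozenset({"select", "from", "where", "order"})
LEGAL_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


class EverythingIsReserved:
    """Container that claims to hold every word, forcing quoting for all names."""

    def __contains__(self, item: object) -> bool:
        return True


def quote_if_needed(name: str, reserved=RESERVED_WORDS) -> str:
    """Quote `name` only when it is reserved or not a plain lowercase identifier."""
    needs_quotes = name.lower() in reserved or not LEGAL_IDENTIFIER.match(name)
    if not needs_quotes:
        return name
    return '"' + name.replace('"', '""') + '"'


print(quote_if_needed("default"))                          # default
print(quote_if_needed("select"))                           # "select"
print(quote_if_needed("my-schema"))                        # "my-schema"
print(quote_if_needed("default", EverythingIsReserved()))  # "default"
```

Passing a container that reports every word as reserved is one way to get the always-quote behavior mentioned for PyHive, at the cost of noisier SQL.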

hashhar

I just reworded the commit message. LGTM.

laserkaplan changed the title ~~Add ability to specify catalog as a table argument~~ Add ability to specify catalog as a SQLAlchemy table argument Jun 23, 2022

ebyhr reviewed Jun 23, 2022

cla-bot bot added the cla-signed label Jul 4, 2022

laserkaplan requested a review from ebyhr July 6, 2022 17:28

ebyhr reviewed Jul 8, 2022

ebyhr assigned laserkaplan Jul 15, 2022

laserkaplan force-pushed the add_catalog branch from 318bee9 to 002d317 Compare July 18, 2022 17:02

laserkaplan closed this Jul 18, 2022

laserkaplan force-pushed the add_catalog branch from 002d317 to 951ad82 Compare July 18, 2022 17:16

laserkaplan reopened this Jul 18, 2022

laserkaplan requested a review from ebyhr July 18, 2022 17:57

ebyhr requested review from mdesmet and hashhar July 19, 2022 00:50

hashhar reviewed Jul 19, 2022

hashhar approved these changes Aug 9, 2022

mdesmet reviewed Aug 9, 2022

mdesmet approved these changes Aug 11, 2022

laserkaplan force-pushed the add_catalog branch from 7cb46a4 to f8286b4 Compare August 11, 2022 22:11

Add ability to specify catalog in SQLAlchemy Table objects

b7cfbb9

hashhar reviewed Aug 16, 2022

hashhar force-pushed the add_catalog branch from f8286b4 to b7cfbb9 Compare August 16, 2022 06:44

hashhar approved these changes Aug 16, 2022

hashhar merged commit aee6064 into trinodb:master Aug 16, 2022
