Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add defer_connect config to allow eagerly verifying connection #394

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Shaheer-rossoneri14
Copy link
Member

Description

Add defer_connect config to allow eagerly verifying connection.

Depends on #393.

Non-technical explanation

This commit adds a new connection parameter defer_connect which can be
set to False to force creating a connection when trino.dbapi.connect
is called. Any connection errors as a result of that get rewrapped into
trino.exceptions.TrinoConnectionError.

This doesn't end up actually executing a query on the server because
after the initial POST request the nextUri in the response is not
followed which leaves the query in QUEUED state. This is not documented
in the Trino REST API but the server does behave like this today. The
benefit is that we can very cheaply verify if the connection is valid
without polluting the server's query history or adding queries to queue.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* Fix some things. ({issue}`issuenumber`)

@cla-bot
Copy link

cla-bot bot commented Jul 18, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@cla-bot
Copy link

cla-bot bot commented Jul 18, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@hovaesco
Copy link
Member

Please drop 9d417d6 from here since it's being added here #393

@Shaheer-rossoneri14
Copy link
Member Author

@hovaesco Dropped the commit 9d417d6.

@hashhar
Copy link
Member

hashhar commented Aug 17, 2023

This is useful since I've heard often that people complain that they don't know the credentials are wrong or that the server isn't running AFTER issuing the first query. However only PyMySql client seems to use the defer_connect arg.

@hashhar
Copy link
Member

hashhar commented Aug 17, 2023

@ebyhr @mdesmet @hovaesco what are your opinions? I like this because it's a relatively small change and addresses a complaint I've heard quite often. The downside is there are already enough parameters for the connect method.

@hovaesco
Copy link
Member

It looks like a useful change from my perspective, note that PyMySql sets defer_connect to False by default.

trino/dbapi.py Outdated
@@ -157,6 +159,7 @@ def __init__(
legacy_prepared_statements=None,
roles=None,
timezone=None,
defer_connect=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the default to False.

trino/dbapi.py Outdated
Comment on lines 227 to 231
"error {}{}".format(
test_response.status_code,
": {}".format(test_response.content)
if test_response.content
else "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please simplify the format string, maybe using simple concatenation instead

right now it's using nested format like error {}{: {}}

@hovaesco
Copy link
Member

Please rebase on master.

This commit adds a new connection parameter `defer_connect` which can be
set to False to force creating a connection when `trino.dbapi.connect`
is called. Any connection errors as a result of that get rewrapped into
`trino.exceptions.TrinoConnectionError`.

By default `defer_connect` is set to `True` so users can explicitly call
`trino.dbapi.Connection.connect` to do the connection check.

This doesn't end up actually executing a query on the server because
after the initial POST request the nextUri in the response is not
followed which leaves the query in QUEUED state. This is not documented
in the Trino REST API but the server does behave like this today. The
benefit is that we can very cheaply verify if the connection is valid
without polluting the server's query history or adding queries to queue.

Some unit tests today relied on the lazy connection behaviour so they
have been adjusted accrodingly.
verify=self._http_session.verify,
)
try:
test_response = connection_test_request.post("<not-going-to-be-executed>")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum The server doesn't seem to "execute" the query until the first nextUri is followed. i.e. if you send just a POST /v1/statement and then do nothing then query execution doesn't happen.

Is this expected from the protocol? Asking because this change relies on this behaviour to allow verifying that connection is indeed correct - e.g. proper auth instead of failing when the first query is submitted.

This is a UX issue which people complain about when using the Trino CLI or JDBC as well for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with David offline and he said that's it's expected behaviour.

However if we don't send a DELETE the query might occupy resources on the server. So in this change we should send a subsequent DELETE as well for the query.

Additionally we also discussed about making the protocol more "synchronous" in future so that connection verification happens eagerly always.

@Shaheer-rossoneri14 Can you adjust this code to send a DELETE for the fake query we executed as well.

@@ -201,6 +204,31 @@ def __init__(
self.legacy_primitive_types = legacy_primitive_types
self.legacy_prepared_statements = legacy_prepared_statements

if not defer_connect:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not defer_connect:
if not defer_connect and auth != constants.DEFAULT_AUTH:

@hovaesco This would mean we'll only do eager connection if some authentication is actually provided. WDYT? Or do you think existing code is better (simpler and we don't assume when someone might want eager connections)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any production use cases when auth is constants.DEFAULT_AUTH but might be helpful in some local testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants