
SNOW-1663445: How to reduce number of login/session requests? #1025

Closed
myshon opened this issue Sep 13, 2024 · 6 comments
Labels
question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team

Comments


myshon commented Sep 13, 2024

This is a question about general usage, not a feature request.

What is the current behavior?

I use the Snowflake connector in an ASP.NET Web API service in a low-traffic environment (12k GET queries per week, peaking at 200 requests per hour).

See metrics:
[screenshot: request metrics]

I see that, apart from query-request, the provider executes additional heavy requests (login-request and session) under the hood.

Is it possible to reduce the number of these requests somehow?
Is there any connection string setting I can use to control the number of sessions/logins? (I currently use the default connection string settings and log in via password.)

The connection class is used more or less this way - is this correct?

        await using var conn = await _connectionProvider.GetConnection();
        await conn.OpenAsync();
        await using var cmd = CreateCommand(conn, databaseName);
        await using var reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            yield return ReadData(reader);
        }

        await conn.CloseAsync();

What is the desired behavior?

I would like to:

  1. reduce the number of additional heavy requests (login-request and session)
  2. optimise and adjust the warehouse configuration to lower the latency of query-request

Configuration

  1. Default connection string settings (login password, min connection pool = 2, max connection pool = 10).
  2. Warehouse "x-small" with multi-cluster disabled
  3. Standard Tables are used
  4. Traffic: 12k GET queries per week, peaking at 200 requests per hour.
@myshon myshon added the feature label Sep 13, 2024
@github-actions github-actions bot changed the title How to reduce number of login/session requests? SNOW-1663445: How to reduce number of login/session requests? Sep 13, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Sep 18, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team and removed feature labels Sep 18, 2024
sfc-gh-dszmolka (Contributor) commented:

Hi, and thanks for raising this question here. On the second point (warehouse optimisation) we might not be able to help, as it is entirely independent of this particular driver library, but there is Snowflake material on that topic you can consult.

Regarding the first question, which is more related to this library: the login-request calls to Snowflake are necessary to, well, log you in :) and the session requests you are seeing are probably calls to session?delete, which happen when the remote Snowflake session needs to be deleted (e.g. on connection closure and user logout).

I believe what you are seeing is closely correlated with the usage pattern mentioned above, which, if I'm correct, is:

  • open a connection to snowflake (login-request)
  • issue a query (query-request)
  • close connection to Snowflake (session?delete)

Maybe you really do need to discard the session and close the connection after each query, but if not, you could consider calling Close() / CloseAsync() only when the session is actually expected to be closed.
You could also experiment with increasing MinPoolSize to see whether keeping more connections in the pool helps your use case.
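
As a concrete illustration of the pooling approach, here is a minimal sketch (not your exact setup): it assumes the driver's built-in connection pool is enabled and that minPoolSize/maxPoolSize are valid connection string parameters for your driver version (please verify the exact spelling in the driver documentation); the account, credentials and query are placeholders.

    using System;
    using Snowflake.Data.Client;

    // Placeholder connection string; minPoolSize/maxPoolSize are the assumed
    // pooling parameters mentioned above.
    var connectionString =
        "account=myaccount;user=myuser;password=***;" +
        "warehouse=mywh;db=mydb;schema=public;" +
        "minPoolSize=2;maxPoolSize=10";

    await using var conn = new SnowflakeDbConnection { ConnectionString = connectionString };

    // With pooling in effect, OpenAsync() should be served from an already
    // authenticated pooled session when one is available, so no new
    // login-request has to be made.
    await conn.OpenAsync();

    await using var cmd = conn.CreateCommand();
    cmd.CommandText = "SELECT CURRENT_TIMESTAMP()";   // placeholder query
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
    {
        Console.WriteLine(reader.GetValue(0));
    }

    // Closing should return the session to the pool rather than issuing
    // session?delete, as long as pooling is enabled.
    await conn.CloseAsync();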

Speaking of which, we have a parameter CLIENT_SESSION_KEEP_ALIVE which, when set to true, will automatically issue 'heartbeats' (calls to /session/heartbeat) to keep the current connection alive (if you do not explicitly need to dispose of it), instead of logging out of it.
A related parameter is CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY, which ranges between 900 and 3600 seconds (the latter is the default, i.e. once every hour) and governs how often those heartbeats are sent in a session. Unfortunately I do not see CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY exposed as a driver setting, but you can probably set it on the user with ALTER USER <username> SET CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY = 900 if you wish to change the default 1h heartbeats to 15m heartbeats.

Again, this only makes sense if you do not need to close the sessions and can keep them around and reuse them.
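
If you do want to try it, here is a minimal sketch of how it might be wired up, assuming the .NET driver accepts CLIENT_SESSION_KEEP_ALIVE as a connection string parameter (please verify this against the driver documentation; otherwise it can be set at the user or account level in Snowflake):

    // Sketch only: client_session_keep_alive is assumed to be accepted in the
    // connection string; everything else is a placeholder.
    var keepAliveConnectionString =
        "account=myaccount;user=myuser;password=***;warehouse=mywh;" +
        "client_session_keep_alive=true";   // heartbeats via /session/heartbeat instead of logging out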

This raises another question for me: why does a single login take 5.6 s on average? That might be expected (e.g. if your client sits 'far' away from the Snowflake account - say, the source is in Europe and the SF account is in AWS Tokyo), but it might also be unexpected.
A quick test you could do to instantly eliminate one of the factors is to set insecuremode=true without changing anything else. If the login time improves drastically with just this change, it would indicate the client has some kind of issue getting responses from the CRL endpoints used for certificate verification. If that is the case, you could take a look at Validating certificates and run some tests to see whether those endpoints are reachable from the source/network you're running the .NET driver from.

Please note insecuremode=true is not meant for everyday usage, especially not in production, but as a quick test to rule out a possible factor it can be used.
If insecuremode=true doesn't change anything and login-request still takes multiple seconds on average, then the bottleneck is likely elsewhere and we would need to investigate further (e.g. with DEBUG logs).
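
For the quick test, the only change would be adding the flag to your otherwise unchanged connection string; everything except insecuremode=true below is a placeholder:

    // Temporary diagnostic only - do not leave this enabled in production.
    var testConnectionString =
        "account=myaccount;user=myuser;password=***;warehouse=mywh;" +
        "insecuremode=true";   // skips certificate revocation checking for this test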

Hope this helps somewhat moving forward.


myshon commented Sep 18, 2024

@sfc-gh-dszmolka Thanks for the very comprehensive answer!
I know that login-request is required :) and I accept that it takes a few seconds. The point was to reduce the number of login requests (which make up almost 10% of all API calls) by reusing sessions, in order to reduce total latency.

I will be investigating CLIENT_SESSION_KEEP_ALIVE and insecuremode=true


myshon commented Sep 19, 2024

First, I checked parameter CLIENT_SESSION_KEEP_ALIVE=true.

[screenshot: metrics with CLIENT_SESSION_KEEP_ALIVE=true]

Still I had:

  • login-request average latency still high (~5 seconds)
  • query latency P90 still ~3 seconds
  • the number of login requests still about 10-20% of the number of query requests
  • additional heartbeat requests

Second, I added INSECUREMODE=true (for testing purposes).

[screenshot: metrics with INSECUREMODE=true]

Then suddenly latency dropped!

  • login requests P90 to ~200ms
  • query latency P90 to ~100ms

It looks like we have an issue with cert validation, correct?

sfc-gh-dszmolka (Contributor) commented:

Indeed, if the average latency of login-request drops so drastically without changing anything except turning certificate validation off, then the focus of the investigation should be this area.
Did you perhaps have a chance to look at Validating certificates and check whether all of the CRL endpoints are reachable from the same host/network where you run the .NET driver?

It's a very frequent issue that these endpoints, which operate on port 80, are blocked on a particular network (although absolutely no customer or otherwise sensitive data is transmitted unencrypted on the wire; everything, such as the CRL itself, is already public information). Could you please make sure they are reachable? Perhaps they are not, and the driver spends most of its time waiting for a response. Optimally, you would write a small application running on the same host as the .NET driver that downloads the CRLs from within the C# app.
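
A minimal sketch of such a check, using plain HttpClient rather than the driver itself; the CRL URLs are only examples (they match the ones tested later in this thread) and the actual list depends on the certificate chain in use:

    using System;
    using System.Net.Http;

    // Example CRL endpoints; adjust to the certificate chain your account uses.
    var crlUrls = new[]
    {
        "http://crl.pki.goog/gsr1/gsr1.crl",
        "http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl",
        "http://crl3.digicert.com/DigiCertGlobalRootG2.crl",
    };

    using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
    foreach (var url in crlUrls)
    {
        try
        {
            var bytes = await http.GetByteArrayAsync(url);
            Console.WriteLine($"{url} -> OK ({bytes.Length} bytes)");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"{url} -> FAILED: {ex.Message}");
        }
    }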

Another (although very rare) possibility is that the CRL endpoints are reachable and the CRL can be downloaded, but the driver has issues parsing the content, e.g. due to a bad openssl installation on the host.

I also saw you created case 00848448 with Snowflake Support, so I would like to ask whether you prefer to continue working with us in a more private and regulated environment (the Support case), or here? My colleague has already requested the DEBUG-level logs from you, and those are usually best shared privately rather than with the whole world :) without having to sanitize them, of course. Having the verbose logs usually provides very useful information, but if you could also run the aforementioned connectivity tests, that would be helpful as well.


myshon commented Sep 23, 2024

Before I received your answer, I had already created support ticket 00848448 to get an answer faster 😉

Honestly, I prefer GitHub due to its interface and usability. However, I would like to avoid posting all the logs publicly here, so I will send the DEBUG logs in the support ticket today.

I added an endpoint to check cert reachability from the host machine. Here is the result:


[
  {
    "url": "http://crl.pki.goog/gsr1/gsr1.crl",
    "status": "OK"
  },
  {
    "url": "http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-1.crl",
    "status": "OK"
  },
  {
    "url": "http://crl3.digicert.com/DigiCertGlobalRootG2.crl",
    "status": "OK"
  }
]


sfc-gh-dszmolka commented Sep 23, 2024

Thank you for checking. All three endpoints you listed here seem to be reachable over HTTP.
It's fully understandable that you'd prefer not to share logs here - please share them with my colleague so we can look further. In my opinion the focus should still remain on the CRL checking procedure, given that a significant performance improvement was achieved just by turning CRL checking off and changing nothing else.

I'm now closing this issue to keep the efforts in one thread, which will be the support case (due to the sensitivity of the data needed, i.e. the logs).
Edit: if there's anything else needed from this GitHub issue that is not covered in the Support case, please comment and we can address it accordingly.
