Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connection leak of session with dropped keyspace #130

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ikehara
Copy link
Contributor

@ikehara ikehara commented Jun 29, 2024

This PR fixes the connection leak problem of reconnection failure.

When a keyspace is dropped and the origin server is restarted, cql-proxy attempts to reconnect. However, the new connection fails to set the keyspace, leading to a memory leak each time cql-proxy tries to connect to the server.

2024-06-29T17:23:48.232+0900	INFO	proxycore/connpool.go:193	pool connection attempting to reconnect after delay	{"host": "127.0.0.1:9042", "delay": "10s"}

The root cause of this issue is that the connection cleanup process is not functioning as expected.
Specifically, the conn variable is always nil because the statement return nil, err overwrites the conn variable.

@absurdfarce
Copy link
Collaborator

absurdfarce commented Jul 1, 2024

Thanks for the report @ikehara!

After looking at this a bit I'm not sure there's a scenario which winds up leaking memory here. It's absolutely true that reconnect attempts in the scenario you describe (a dropped keyspace + origin server restart) will continue to try to reconnect via this operation. This will always fail (as the set keyspace op here can't possibly succeed against a dropped keyspace). But this shouldn't result in a memory leak; note that the output of the call to connect() is stored in a temp variable "c" and that this value is only copied to "conn" if "err" is nil. "err" is very definitely not nil in this case so nothing should be overwritten. "c" should get garbage collected normally when the block in question goes out of scope.

All of that said, there is a potentially interesting problem here. In the case under consideration the connection pool will fail to recreate new connections (for reasons discussed above) and will essentially fail to re-connect forever. This is due to the fact that the keyspace to set appears to be a client-level config and clients are currently created when a client connects to the proxy. We very likely need to tweak that logic a bit to handle the case you describe @ikehara.

@ikehara
Copy link
Contributor Author

ikehara commented Jul 1, 2024

Thank you for the quick response.
Sorry, my previous message was a bit confusing. I meant a resource leak.
I'm afraid a connection (file descriptor of net.conn struct) is not closed by the GC, so a resource leak might occur if the Close method is not called. Even if it were closed by its finalizer, the number of TCP connections is limited and might run out before garbage collection happens, depending on the amount of memory.

@ikehara ikehara force-pushed the fix-connection-leak branch from 181da40 to abdbcfd Compare July 2, 2024 02:35
@mpenick
Copy link
Contributor

mpenick commented Jul 2, 2024

@absurdfarce It's because the existing logic returns the error instead of setting err which causes the defer not to close the connection.

@mpenick
Copy link
Contributor

mpenick commented Jul 2, 2024

Nice find @ikehara

@absurdfarce
Copy link
Collaborator

Apologies, just coming back to this one. Aha, okay, I get it now... man, I didn't know defer statements could re-write named return values like that; that's pretty cool.

+1 from me.

@absurdfarce absurdfarce self-requested a review July 31, 2024 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants