Retrying on connection closed #104
Comments
Hey @jbransen, thanks for raising this issue. You might be the first person using this crate in a production environment. I created this repo/crate while I worked at Spotify - at that time I thought it might be handy in case we wanted to try Rust in some of our backend systems, since many of them needed to talk to Bigtable (however, that never really happened before I left Spotify). I do remember that back then Spotify engineers had to add Bigtable channel refreshing (client swapping) manually in their code base to deal with the hourly spikes, and these days that "auto-refresh" feature may have been merged into the standard Bigtable Java client (as described in the doc you linked). On the Golang side, the doc you linked points to some simple example code that does the same thing, but users still have to implement it themselves, I suppose. So basically, without looking into all the details, if one wants a similar feature (client refreshing), one would need to implement it in a similar fashion in Rust. On your side, you might make it work by manually creating a new client periodically.

Unfortunately, I no longer use this crate in my work, nor do I work with Bigtable anymore. So even if I try to implement a feature like this, it would be hard for me to test it (I'm not sure whether the Bigtable emulator can simulate a connection close, but perhaps it's possible). But if you want, you are very welcome to contribute such a feature to this crate and test it in your environment before merging. Let me know if this helps :)
Thank you for the fast reply, this helps. I am happy to craft a solution, try it out, and make a PR.
Oh cool, thanks a lot. I haven't been following Rust gRPC development closely, so I'm not sure whether there is some off-the-shelf implementation on the tonic or tokio side that could help in this case. You might want to check the API there a bit, or perhaps ask in the tonic Discord channel; people are helpful and I got a lot of good help there while I was implementing this crate.
Thanks. That said, I am not 100% sure that "retry if we encounter a channel close" is really what we need, or that it would solve your issue gracefully. From the Google doc you linked, and from public info such as this stackoverflow page, reconnecting gRPC and opening warm channels is not cheap, so relying on any "retry" mechanism to fix your issue will still cause hourly latency spikes to Bigtable. What I described above, and what the Golang example in the Google doc suggests, is not a "retry" but rather "keep creating a new client (opening new channels) and swap the old client for the new one, periodically" - so it is about "auto client/channel refreshing" rather than any sort of retry.
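For illustration, here is a minimal sketch of that "refresh, don't retry" pattern in Rust. The `BigtableClient` type and `connect()` function are placeholders, not this crate's real API; the point is only the shape: build and warm a fresh client in the background, then swap it in behind a shared handle so requests never pay the reconnection cost.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;
use tokio::time::interval;

// Placeholder for whatever client type the crate actually exposes; only the
// swap pattern matters here, not the concrete Bigtable API.
#[derive(Clone)]
struct BigtableClient;

// Hypothetical constructor: builds a brand-new client with fresh gRPC channels.
async fn connect() -> BigtableClient {
    BigtableClient
}

/// Shared handle: request code reads the current client, while a background
/// task periodically replaces it with a freshly connected one.
#[derive(Clone)]
struct RefreshingClient {
    inner: Arc<RwLock<BigtableClient>>,
}

impl RefreshingClient {
    async fn new(refresh_every: Duration) -> Self {
        let inner = Arc::new(RwLock::new(connect().await));
        let for_task = inner.clone();
        tokio::spawn(async move {
            let mut tick = interval(refresh_every);
            tick.tick().await; // the first tick completes immediately; skip it
            loop {
                tick.tick().await;
                // Build (and warm up) the new client *before* taking the write
                // lock, so readers are never blocked on connection setup.
                let fresh = connect().await;
                *for_task.write().await = fresh;
            }
        });
        Self { inner }
    }

    /// Snapshot of the current client for issuing one request.
    async fn current(&self) -> BigtableClient {
        self.inner.read().await.clone()
    }
}
```

Adding a small random offset to the refresh period would also keep a fleet of instances from all reconnecting in the same minute.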
Also, you might want to update the issue title from "Retrying on connection closed" to something like "Auto-refreshing the client to deal with Bigtable's hourly gRPC channel close".
Thank you. I understand your point, and I agree that some process to keep connections warm is a good idea. However, for now I am still looking at a retry because:
I've implemented something here; I will first evaluate it in our environment before creating a PR. I am very aware that this is not the best solution for the above problem, and that ideally you'd have a configurable backoff retry strategy, but I think this is enough for us for the foreseeable future. Once I am satisfied with it I'll create a PR and leave it to you whether or not you'd like to merge it.
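A rough sketch of what such a retry-once wrapper could look like (this is not the code from the linked branch; the `is_connection_closed` predicate and the closure-based call shape are assumptions):

```rust
use tonic::{Code, Status};

// Hypothetical predicate: decide whether a failure looks like the hourly
// server-side channel close rather than an application-level error.
fn is_connection_closed(status: &Status) -> bool {
    status.code() == Code::Unavailable || status.message().contains("connection closed")
}

/// Run `op` once; if it fails with a "connection closed"-style error,
/// run it exactly one more time and return that result.
async fn retry_once<T, F, Fut>(mut op: F) -> Result<T, Status>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, Status>>,
{
    match op().await {
        Err(status) if is_connection_closed(&status) => op().await,
        other => other,
    }
}
```

A call site would wrap the request in a closure, e.g. `retry_once(|| issue_bigtable_request()).await`, where `issue_bigtable_request` stands in for whatever function actually makes the call.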
Hi @jbransen, thanks for the good effort to move this forward. At the same time, I have a few concerns about adding a retry-once feature to this repo; let me lay out a few points and see if you agree:
Even with a relatively good retry mechanism implemented, there are other important things to consider, especially in high-load situations (10K+ requests per second per Bigtable instance, multiple instances running with auto-scaling). In those situations a glitch in the system can produce a huge number of unnecessary or impossible-to-recover retries, which could end up in a chain reaction: servers queue up with retry requests, overloading themselves or overloading the Bigtable instances with retries while they are trying to recover.
Given the points above, plus the fact that I would very much like this repo to follow the KISS principle, I think the best solution for now is to leave any retry-related implementation to the user side or the tonic side. Let me know if you agree. Thanks.
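If retry does stay on the user side, the storm risk described above is usually kept in check by capping attempts and adding jittered exponential backoff. A hedged sketch follows; the attempt cap, delays, and error predicate are all assumptions, and it pulls in the `rand` crate for the jitter:

```rust
use std::time::Duration;
use tokio::time::sleep;
use tonic::Status;

/// Retry `op` up to `max_attempts` times, but only for errors that
/// `should_retry` classifies as transient, sleeping with exponential
/// backoff plus full jitter between attempts.
async fn retry_with_backoff<T, F, Fut>(
    mut op: F,
    max_attempts: u32,
    base_delay: Duration,
    should_retry: impl Fn(&Status) -> bool,
) -> Result<T, Status>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, Status>>,
{
    let mut attempt = 0;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            Err(status) if attempt + 1 < max_attempts && should_retry(&status) => {
                // Exponential backoff with full jitter, capped so that a long
                // outage cannot queue up ever-growing waits.
                let cap = base_delay * 2u32.pow(attempt.min(6));
                let wait = cap.mul_f64(rand::random::<f64>()); // random in [0, cap)
                sleep(wait).await;
                attempt += 1;
            }
            Err(status) => return Err(status),
        }
    }
}
```

The predicate should only match errors that are plausibly transient (e.g. `Unavailable`), so application-level failures are returned immediately instead of amplifying load.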
I am running this library in a production setting, and on a regular basis I observe the following error:
When reading the Bigtable connection pool documentation this seems to be expected, as connections are refreshed every hour. However, as a client user I am surprised: why does this error make its way to the end user? Is that a conscious choice, or is it an oversight? I would expect this library to handle this gracefully by retrying (which should always work fine), so that clients never end up with such errors.