Exponential backoff #146
Do you mean for the circuit breakers? I would expect a larger window (timeout) to reduce this, since there should already be some randomness in when each client's window opens. If it's a heavily queried resource, perhaps adding some random jitter to the window could work? By exponential backoff, are you referring to the size of the circuit breaker window, or something else? Is this a problem for your datastore because of the sudden throughput (would it ever exceed steady state?) or because of the connections established per second? Which datastore are you running into trouble with? BTW Jacob, did you roll out the new Semian with cc @jpittis
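To illustrate the jitter suggestion, here is a minimal sketch that randomizes how long a breaker stays open, so clients that tripped together don't all probe the resource at the same instant. The function name `jittered_timeout` and the `jitter_ratio` parameter are hypothetical and not part of Semian's API. The sketch assumes a symmetric ±25% spread around the configured window.

```python
import random
from typing import Optional


def jittered_timeout(base_timeout: float,
                     jitter_ratio: float = 0.25,
                     rng: Optional[random.Random] = None) -> float:
    """Hypothetical helper: scale the breaker's open window by a random factor.

    The result lies in [base_timeout * (1 - jitter_ratio),
    base_timeout * (1 + jitter_ratio)], so breakers that opened at the
    same moment will half-open at slightly different times.
    """
    if not 0 <= jitter_ratio < 1:
        raise ValueError("jitter_ratio must be in [0, 1)")
    rng = rng or random.Random()
    return base_timeout * rng.uniform(1 - jitter_ratio, 1 + jitter_ratio)


# Example: ten clients that all tripped at once get spread-out reopen times.
if __name__ == "__main__":
    rng = random.Random(42)
    print(sorted(round(jittered_timeout(10.0, 0.25, rng), 2) for _ in range(10)))
```

Jitter only spreads the reopen times out. It does not lengthen the window, so on its own it won't reduce the total load a cold datastore sees once every client has reconnected.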
Yep, for the circuit breakers. I haven't tried adding any jitter to the window, but I could definitely trial some ideas on that instead. The issue we have is that when we bring MySQL back online, a bunch of services will be waiting on it, and with cold caches it is a bit sluggish to respond. We've got a few things in the pipeline to mitigate it, but I'm sure we'll hit it eventually with another datastore. I don't think it's getting overloaded with connections; it's just that everything will be hitting a cold cache that needs rebuilding. No quotas yet, but it's on my list to look at in the coming weeks 😄
Something I've been looking into lately is how we can combat the stampeding herd effect we occasionally incur once a system has recovered and is able to receive traffic again. One approach I've explored is exponential backoff, and I wanted to find out whether this is something you'd consider adding to Semian. I think Semian is a sensible place for it because it already knows about tickets/quotas and error rates, and could use that data to decide how far to push out the backoff without needing to query another resource.
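A rough sketch of what exponential backoff on the breaker's open window might look like: each consecutive trip (for example, a failed half-open probe) multiplies the window, up to a cap, and a successful recovery resets it. `BackoffCircuitBreaker`, `record_trip`, and `record_recovery` are hypothetical names, not Semian's API. The sketch keys the backoff on consecutive trips only. Driving it from Semian's error rates or quotas, as proposed above, would be a further assumption. Optional "equal jitter" keeps at least half of each window and randomizes the rest.

```python
import random
from typing import Optional


class BackoffCircuitBreaker:
    """Hypothetical breaker whose open window grows exponentially on repeated trips."""

    def __init__(self, base_timeout: float = 10.0, multiplier: float = 2.0,
                 max_timeout: float = 300.0, jitter: bool = False,
                 rng: Optional[random.Random] = None) -> None:
        self.base_timeout = base_timeout
        self.multiplier = multiplier
        self.max_timeout = max_timeout
        self.jitter = jitter
        self.rng = rng or random.Random()
        # Window to use on the next trip; grown iteratively to avoid float overflow.
        self._next_timeout = base_timeout

    def record_trip(self) -> float:
        """Call when the breaker opens; returns how long (seconds) to stay open."""
        ceiling = min(self.max_timeout, self._next_timeout)
        # Grow the window for the next consecutive trip, capped at max_timeout.
        self._next_timeout = min(self.max_timeout, self._next_timeout * self.multiplier)
        if self.jitter:
            # Equal jitter: keep half the window, randomize the other half.
            return ceiling / 2 + self.rng.uniform(0, ceiling / 2)
        return ceiling

    def record_recovery(self) -> None:
        """Call once the resource is healthy again; resets the backoff."""
        self._next_timeout = self.base_timeout


if __name__ == "__main__":
    breaker = BackoffCircuitBreaker(base_timeout=10.0, multiplier=2.0, max_timeout=60.0)
    print([breaker.record_trip() for _ in range(5)])  # [10.0, 20.0, 40.0, 60.0, 60.0]
```

The cap matters. Without it, a long outage would leave clients backed off far longer than needed once the datastore recovers.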
Also open to hearing about how you've addressed this at Shopify if you've got a good handle on it in other ways 😄