
Add better support for throttling exceptions #11

Closed
wants to merge 1 commit

Conversation

@jochemb (Contributor) commented Oct 2, 2020

In cases where requests fail for non-technical reasons, we often do not want exponential backoff. Notably, when requests are throttled it is more appropriate to wait a specified amount of time before retrying. Often we will learn this from a `Retry-After` header.

This change introduces `BackoffAndRetryException`, which can be used to sleep a fixed amount of time before retrying.

fixes: #10

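For context, the behavior this change is after can be sketched without opnieuw as a plain retry loop: on a throttled response, wait exactly as long as the server asks instead of backing off exponentially. Everything below (the function name, the defaults, and the assumption that `Retry-After` carries a number of seconds) is illustrative only and not part of this PR.

```python
import time

import requests


def fetch_with_retry_after(url: str, max_attempts: int = 4) -> bytes:
    """Fetch `url`, sleeping exactly as long as a 429 response asks before retrying."""
    for attempt in range(1, max_attempts + 1):
        response = requests.get(url)
        if response.status_code == 429 and attempt < max_attempts:
            # Fixed, server-specified wait instead of exponential backoff.
            # Assumes Retry-After holds a number of seconds, not an HTTP date.
            time.sleep(float(response.headers.get("Retry-After", "1")))
            continue
        response.raise_for_status()
        return response.content
    raise AssertionError("unreachable: the final attempt returns or raises")
```

The PR wires the same idea into opnieuw's retry decorator by letting the decorated function raise `BackoffAndRetryException` with the desired wait.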
@jochemb requested review from ruuda and a team October 2, 2020 08:17
@ruuda (Contributor) left a comment

I started reviewing this, and resetting the retry window and count looked like a neat solution initially, but then I realized it breaks the naming of our arguments, which is exactly the kind of footgun we try to avoid in Opnieuw: `max_calls_total` is no longer a maximum if the count can reset.

There is also one potentially dangerous outcome: if you happen to hit an edge case in an external service so that it always fails, and there is also an aggressive rate limit on the endpoint, then we can end up in an infinite loop: before we hit the maximum number of failures we get rate limited, and after the wait we run into the same errors again.

I don’t think this is a hypothetical problem. We integrate with one API in particular that has rate limits of a few calls per hour, and I’ve seen us consistently getting internal server errors for one particular request.

Making retries on throttle a separate decorator would fix both problems (we can put a maximum on the number of attempts there as well, and the meaning of the inner retry remains unchanged), but it introduces some new challenges:

  • You can forget to add it.
  • You can put the throttle-retry and error-retry in the wrong order.

I think the latter one could be prevented with some properties and an assertion. The former one is not so bad, because rate limiting is really a separate thing from transient errors, and we already need to detect the case specially anyway.

What do you think?
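A rough sketch of the separate-decorator idea described in the review above. `retry_on_throttle` and `ThrottledException` are invented names for illustration and are not part of opnieuw, and the stacking order shown (throttle handling outermost) is only one plausible choice, not a decision from this thread.

```python
import functools
import time

from opnieuw import retry


class ThrottledException(Exception):
    """Raised by the decorated function when the server asks us to back off."""

    def __init__(self, retry_after_seconds: float) -> None:
        super().__init__(retry_after_seconds)
        self.retry_after_seconds = retry_after_seconds


def retry_on_throttle(max_throttled_attempts: int):
    """Sleep the server-specified time and call again, with its own hard cap."""

    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            for _ in range(max_throttled_attempts - 1):
                try:
                    return f(*args, **kwargs)
                except ThrottledException as e:
                    time.sleep(e.retry_after_seconds)
            # Final attempt: let any exception propagate to the caller.
            return f(*args, **kwargs)

        return wrapper

    return decorator


@retry_on_throttle(max_throttled_attempts=3)
@retry(
    retry_on_exceptions=(ConnectionError,),
    max_calls_total=4,
    retry_window_after_first_call_in_seconds=60,
)
def call_external_api() -> None:
    ...  # would raise ThrottledException(retry_after) on an HTTP 429
```

With this split the inner `max_calls_total` keeps its meaning, and the total number of calls stays bounded by `max_throttled_attempts * max_calls_total`.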

```python
self.counter = 0
start = time.monotonic()

with retry_immediately():
```
Contributor

Hmm, so this does mean that backoff throttle waits are not covered by `retry_immediately`; I would expect them to be skipped as well. This test would still be valuable because it verifies that we can make twice the number of calls, due to the counter reset.

Contributor Author

Yeah, that is a little unexpected 👍

@jochemb (Contributor, Author) commented Oct 2, 2020

Thanks for the review. Those are excellent points.

> I realized it breaks the naming of our arguments, which is exactly the kind of footgun we try to avoid in Opnieuw: `max_calls_total` is no longer a maximum if the count can reset.

I agree this is inconvenient. Perhaps it would be clearer to rename the exception to `BackoffAndResetRetryException`.

> Making retries on throttle a separate decorator would fix both problems (we can put a maximum on the number of attempts there as well, and the meaning of the inner retry remains unchanged)

The value here is that the decorators will explicitly state how many retries can be expected. I like that. We can also achieve that by adding a `maximum_number_of_resets` parameter to the decorator. We can default it to 0, in which case `BackoffAndResetRetryException` would behave just like `RetryException`. This way we get explicitness and a way to prevent infinite loops, with no confusion about decorator ordering. Someone might still forget to set the parameter to a non-zero value, but I think that is acceptable.
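Under one plausible reading of this proposal (not something decided in this thread), each reset opens another full window of `max_calls_total` calls, which makes the worst-case call count easy to bound. The numbers below are purely illustrative.

```python
# Hypothetical values for the existing parameter and the proposed one.
max_calls_total = 4
maximum_number_of_resets = 2

# Each reset allows another window of up to max_calls_total calls,
# so the endpoint sees at most (resets + 1) * max_calls_total calls.
worst_case_calls = (maximum_number_of_resets + 1) * max_calls_total
assert worst_case_calls == 12
```

This also shows the tension discussed next: with resets in play, `max_calls_total` no longer describes the total number of calls by itself.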

@ruuda (Contributor) commented Oct 5, 2020

> Perhaps it would be clearer to rename the exception to `BackoffAndResetRetryException`.

That would help, but it can still be surprising for a reader when the place where that exception is raised is not close to the decorator that says “max”. I am thinking from the perspective of somebody who is new to a codebase and is debugging why we make too many calls to some external API.

> We can also achieve that by adding a `maximum_number_of_resets` parameter to the decorator.

That might work. Our current `max_calls_total` argument is pretty strongly a maximum on the total number of calls though, even if there is something called “reset” beside it. 🙈 How about we keep `max_calls_total` as the total upper bound, and add a new optional one for the maximum number of calls between resets?

@p-nilsson

Following the discussion, I think I agree with @ruuda that we should keep the two behaviors (retrying and throttling) clearly separated. I kind of liked the very first idea of adding a separate `@throttle` decorator to opnieuw, for two reasons:

  1. Nothing changes for the people who are currently used to using `@retry` and `@retry_async`.
  2. We clearly separate throttling from retrying, and it is up to the developer to add the decorators they want to use.

This does leave the caveat that @ruuda pointed out:

> You can put the throttle-retry and error-retry in the wrong order.

But I think it is acceptable. We could make it easier for developers by clearly stating, in the documentation and/or the decorators' docstrings, in which order they should be applied.
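One way the ordering caveat could be guarded against, in the spirit of the "properties and an assertion" idea mentioned in the review above: the inner retry wrapper exposes a marker attribute that the outer throttle decorator asserts on. All names here are invented for illustration (opnieuw exposes no such attributes), and the order being enforced, throttle handling outermost, is only one possible convention.

```python
import functools


def error_retry(f):
    """Stand-in for the error-retry decorator (retry logic elided)."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)

    wrapper.__is_error_retry__ = True  # marker the outer decorator can check
    return wrapper


def throttle_retry(f):
    """Stand-in for the throttle decorator (wait logic elided)."""
    # Fail fast at decoration time if the two are stacked the wrong way around.
    assert getattr(f, "__is_error_retry__", False), (
        "@throttle_retry must be applied outside @error_retry"
    )

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)

    return wrapper


@throttle_retry
@error_retry
def call_api() -> None:
    ...
```

If the two decorators are stacked the other way around, the assertion fires at decoration time instead of letting the surprising retry behavior surface in production.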

@svisser removed the request for review from a team August 30, 2021 19:08
@jochemb (Contributor, Author) commented Sep 6, 2021

I'm closing this PR, as it's hard to get the semantics of the different parameters right with this approach.

@jochemb closed this Sep 6, 2021
@p-nilsson deleted the `backoff-exception` branch September 6, 2021 13:20
@rvanlaak

While looking for documentation on whether Channable internally supports the `Retry-After` response header, I came across this PR.

@ruuda does Channable respect the `Retry-After` header when loading XML feeds?

@ruuda (Contributor) commented Apr 21, 2022

> Does Channable respect the `Retry-After` header when loading XML feeds?

You can contact [email protected] for this.

Development

Successfully merging this pull request may close these issues:

  • Add special support for rate limiting