Some changes to make local IT runs more stable #802
Conversation
Force-pushed from d90b98d to 919ebca (Compare)
Just a heads up - I looked at the backoff project and it very much seems like abandonware: litl/backoff#211. We may or may not face an issue once we bump to 3.10/3.11, unless they figure out whether they want to fork the project or not.
tenacity seemed like a viable alternative not so long ago, but looking at their issues/PRs, it seems to be in the same situation.
@@ -24,6 +25,8 @@
log = logging.getLogger(__name__)

fast_constant_delay = lambda: (0.01 for _ in range(10**4))  # noqa: E731
This looks suspicious for several reasons:
- a named lambda for a reusable generator instead of a regular helper -> negligible
- an arbitrary magic 10**4 sentinel condition in a constant generator - 0.01s is very brave in general; in practice, registry and repository servers without proper load-balancing proxies (yeah, ain't going to happen much in private company networks) may not be able to respond that quickly, especially since we generally use only 5 attempts for connections, so this would add up to a total wait of 0.05s
- this results in weird log output from backoff itself: INFO Backing off _run(...) for 0.0s, because it formats the message only up to the first decimal: https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_common.py#L96 - the library itself provides a constant generator which can be initialized via https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_decorator.py#L137, which feels more transparent than this lambda
- you don't mention anywhere why a constant fast delay is preferred over an exponential one in real-world scenarios; sure, exponential may result in more wait time, but could it be more resilient as a trade-off?
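For reference, a minimal sketch of the library-provided alternative mentioned above, assuming the decorator forwards extra keyword arguments to the wait generator; the function name, URL handling, and exception type are placeholders, not this PR's code:

```python
import backoff
import requests

# Sketch only: `_run` is a stand-in for whatever callable gets retried.
@backoff.on_exception(
    backoff.constant,                     # library-provided constant wait generator
    requests.exceptions.ConnectionError,  # retry only on transient connection errors
    max_tries=5,
    interval=0.01,                        # forwarded to backoff.constant
)
def _run(url: str) -> requests.Response:
    return requests.get(url, timeout=30)
```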
Lambda is a way to pass around a generator: the code expects to find a callable that returns a generator, so I hammered out a callable. At that point I was already pretty upset by the interactions between UTs and elastic requests and missed the part where arguments to the first argument are supposed to be passed through the last argument. That does look marginally cleaner. The magic constant is arbitrary -- I just needed a long enough generator; I should have used repeat there right away. As for the delay, it looks like it is enough to simply re-issue the request: we are dealing with GH here, and my educated guess is that we would hit a different server even with no delay at all. 0.01 is a way of expressing that "right away" is good enough, which is why I believe exponential backoff is overkill here. I'll add a note to the commit message if I keep this commit.
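A hedged sketch of that "repeat" idea - a named helper instead of the range(10**4)-bounded lambda; the name is illustrative, and the explicit yield keeps it a plain generator function:

```python
import itertools

# Illustrative helper, not the PR's code: an unbounded constant delay
# expressed with itertools.repeat instead of a bounded generator expression.
def fast_constant_delay():
    for delay in itertools.repeat(0.01):
        yield delay
```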
Sometimes an active DNF server would respond with an error. This was treated as a RuntimeError and resulted in a test failure, when in practice it is enough to wait a little and then try again. This change adds a clause to retry after a ConnectionError instead of just failing on any RuntimeError; any other RuntimeErrors are still considered failures. Signed-off-by: Alexey Ovchinnikov <[email protected]>
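In the spirit of that commit message, a rough sketch of such a retry clause; the function name, attempt count, and delay are assumptions, not the actual change:

```python
import time

import requests

# Assumption-laden sketch: treat ConnectionError as transient and retry,
# while any other problem still surfaces as a RuntimeError (i.e. a failure).
def query_dnf_server(url: str, attempts: int = 5, delay: float = 0.1) -> bytes:
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=30)
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # wait a little and try again
            continue
        if not resp.ok:
            # any other error is still treated as a real failure
            raise RuntimeError(f"DNF server responded with {resp.status_code}")
        return resp.content
    raise RuntimeError("unreachable")  # defensive; the loop always returns or raises
```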
The current implementation of backoffs does not play well with UTs and solves a local problem. I will drop this commit in the next revision and will try to address it separately at some other point.
For the record, I tried using pytest-retry and marking all tests as flaky (which, admittedly, is a wrong way of doing it), but that, in turn, did not play well with our fixtures.
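For illustration, blanket-marking every test as flaky with pytest-retry could look roughly like this; the marker keyword names follow that plugin's docs and may differ between versions, and the retry counts are arbitrary:

```python
# conftest.py -- sketch only; this is the approach described above,
# not code from this PR.
import pytest


def pytest_collection_modifyitems(config, items):
    for item in items:
        # pytest-retry's flaky marker applied to every collected test
        item.add_marker(pytest.mark.flaky(retries=2, delay=0.5))
```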
In Python 3.14 the 'filter' argument to TarFile.extractall (introduced in 3.12) becomes mandatory. Until then it defaults to `None` and triggers a warning. This change states that we are OK with continuing to extract as before 3.12. Signed-off-by: Alexey Ovchinnikov <[email protected]>
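A minimal sketch of what "extract as before 3.12" translates to, assuming Python >= 3.12 and an illustrative archive path; "fully_trusted" is the stdlib's name for the pre-3.12 behaviour:

```python
import tarfile

# Sketch only: the paths are placeholders; passing filter explicitly
# silences the DeprecationWarning about the upcoming 3.14 default.
with tarfile.open("sources.tar.gz") as tar:
    tar.extractall(path="unpacked", filter="fully_trusted")
```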
This prevents pydantic from complaining about a potentially unserializable object. Signed-off-by: Alexey Ovchinnikov <[email protected]>
Force-pushed from 919ebca to 6f523fe (Compare)
Approved; the changes in the PR have shrunk quite a lot since I last checked on it.
45611bb
The PR adds several retries to functions which tend to sporadically fail locally for no good reason.
It also addresses several warnings which pollute UTs output. It does not handle

coroutine 'AsyncMockMixin._execute_mock_call' was never awaited

which is very annoying, but has to be chased with specially trained dogs.

Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)