Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some changes to make local IT runs more stable #802

Conversation

a-ovchinnikov
Copy link
Contributor

The PR adds several retries to functions which tend to sporadically fail locally for no good reason.

It also addresses several warnings which pollute UTs output. It does not handle
coroutine 'AsyncMockMixin._execute_mock_call' was never awaited which is very annoying, but has to be chased with specially trained dogs.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@a-ovchinnikov a-ovchinnikov force-pushed the integration_tests_experiments_27012025 branch from d90b98d to 919ebca Compare January 29, 2025 22:57
tests/unit/conftest.py Dismissed Show resolved Hide resolved
tests/unit/conftest.py Dismissed Show dismissed Hide dismissed
tests/unit/test_scm.py Dismissed Show dismissed Hide dismissed
tests/unit/test_scm.py Dismissed Show dismissed Hide dismissed
Copy link
Member

@eskultety eskultety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up - I looked at the backoff project and it very much seems like an abandonware: litl/backoff#211 . We may or may not face an issue once we bump to 3.10/3.11 unless they figure out whether they want to fork the project or not.
tenacity seemed like a viable alternative not so long ago, but looking at their issues/PRs, seems to be in the same situation.

tests/integration/conftest.py Outdated Show resolved Hide resolved
tests/integration/conftest.py Outdated Show resolved Hide resolved
tests/integration/conftest.py Show resolved Hide resolved
@@ -24,6 +25,8 @@

log = logging.getLogger(__name__)

fast_constant_delay = lambda: (0.01 for _ in range(10**4)) # noqa: E731
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious for various reasons:

  • named lambda for a reusable generator instead of a regular helper -> neglibible
  • arbitrary magic 10**4 sentinel condition in a constant generator
  • 0.01s is very brave in general, in practice registry and repository servers without proper load balancing proxies (yeah, ain't going to happen much in private company networks) may not be able to respond that quickly, especially since we only use 5 attempts in general for connections so this would add up to total wait of 0.05s
  • this results in a weird log output from backoff itself INFO Backing off _run(...) for 0.0s because it only formats the msg only up to the first decimal: https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_common.py#L96
  • the library itself provides a constant generator which can be initialized via: https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_decorator.py#L137 which feels more transparent than this lambda
  • you don't mention anywhere why a constant fast delay is preferred over an exponential one in real world scenarios; sure, exponential may result in more wait time, but could be more resilient as a trade off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambda is a way to pass around a generator: the code expects to find a callable which would return a generator, so I hammered out a callable. At that point I was already pretty upset by interactions between UTs and elastic requests and missed the part where arguments to the first argument are supposed to be passed through the last argument. It does look marginally cleaner. The nature of magic constants is arbitrary -- I needed some long enough generator. I should have used repeat there right away. As for the delay it looks like it is enough to just re-issue a request, we are dealing with GH here and my educated guess is that we'll be hitting a different server even if there were no delay at all. 0.01 is a way of expressing that "right away" is good enough. This is the reason why I believe that exponential is an overkill. I'll add a note to commit message if I keep this commit.

tests/unit/conftest.py Dismissed Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
noxfile.py Outdated Show resolved Hide resolved
tests/unit/package_managers/test_gomod.py Outdated Show resolved Hide resolved
tests/unit/package_managers/test_gomod.py Outdated Show resolved Hide resolved
tests/unit/models/test_sbom.py Show resolved Hide resolved
Sometimes an active DNF server would respond with error.
This will be treated as a RuntimeError and result in a test
failure when in practice it is enough to wait a little and
then try again. This change adds a clause to retry after
a ConnectionError instead of just failing on any
RuntimeError. Any other RuntimeErrors are still considered
failures.

Signed-off-by: Alexey Ovchinnikov <[email protected]>
Copy link
Contributor Author

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of backoffs does not play well with UTs and solves a local problem. I will drop this commit in the next revision and will try addressing it separately at some other point.

For the record, I tried using pytest-retry and mark all tests as flaky (which, admittedly, is a wrong way of doing it), but that, in turn, did not play well with our fixtures.

tests/integration/conftest.py Show resolved Hide resolved
tests/integration/conftest.py Outdated Show resolved Hide resolved
@@ -24,6 +25,8 @@

log = logging.getLogger(__name__)

fast_constant_delay = lambda: (0.01 for _ in range(10**4)) # noqa: E731
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambda is a way to pass around a generator: the code expects to find a callable which would return a generator, so I hammered out a callable. At that point I was already pretty upset by interactions between UTs and elastic requests and missed the part where arguments to the first argument are supposed to be passed through the last argument. It does look marginally cleaner. The nature of magic constants is arbitrary -- I needed some long enough generator. I should have used repeat there right away. As for the delay it looks like it is enough to just re-issue a request, we are dealing with GH here and my educated guess is that we'll be hitting a different server even if there were no delay at all. 0.01 is a way of expressing that "right away" is good enough. This is the reason why I believe that exponential is an overkill. I'll add a note to commit message if I keep this commit.

cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/gomod.py Outdated Show resolved Hide resolved
noxfile.py Outdated Show resolved Hide resolved
tests/unit/package_managers/test_gomod.py Outdated Show resolved Hide resolved
tests/unit/package_managers/test_gomod.py Outdated Show resolved Hide resolved
tests/unit/conftest.py Dismissed Show resolved Hide resolved
tests/unit/models/test_sbom.py Show resolved Hide resolved
In Python 3.14 'filter' argument to TarFile.extractall (introduced in
3.12) becomes mandatory. Until then it defaults to `None` and
triggers a warning. This change states that we are OK with
continuing extracting as before 3.12.

Signed-off-by: Alexey Ovchinnikov <[email protected]>
This prevents pydantic from complaining about potentially
unserializable object.

Signed-off-by: Alexey Ovchinnikov <[email protected]>
@a-ovchinnikov a-ovchinnikov force-pushed the integration_tests_experiments_27012025 branch from 919ebca to 6f523fe Compare January 30, 2025 20:38
Copy link
Member

@slimreaper35 slimreaper35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, the changes in the PR shrank quite a lot since the last time I checked on it.

tests/integration/conftest.py Show resolved Hide resolved
@a-ovchinnikov a-ovchinnikov added this pull request to the merge queue Jan 31, 2025
Merged via the queue into hermetoproject:main with commit 45611bb Jan 31, 2025
15 of 16 checks passed
@a-ovchinnikov a-ovchinnikov deleted the integration_tests_experiments_27012025 branch January 31, 2025 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants