Some changes to make local IT runs more stable #802
Conversation
Force-pushed from d90b98d to 919ebca (Compare)
Just a heads up - I looked at the backoff project and it very much seems like abandonware: litl/backoff#211. We may or may not face an issue once we bump to 3.10/3.11, unless they figure out whether they want to fork the project or not.
tenacity seemed like a viable alternative not so long ago, but looking at their issues/PRs, it seems to be in the same situation.
@@ -24,6 +25,8 @@
log = logging.getLogger(__name__)

fast_constant_delay = lambda: (0.01 for _ in range(10**4))  # noqa: E731
This looks suspicious for several reasons:
- a named lambda for a reusable generator instead of a regular helper -> negligible
- an arbitrary magic 10**4 sentinel condition in a constant generator - 0.01s is very brave in general; in practice, registry and repository servers without proper load-balancing proxies (yeah, ain't going to happen much in private company networks) may not be able to respond that quickly, especially since we generally use only 5 attempts for connections, so this would add up to a total wait of 0.05s
- this results in weird log output from backoff itself: INFO Backing off _run(...) for 0.0s, because it formats the message only up to the first decimal: https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_common.py#L96 - the library itself provides a constant generator which can be initialized via https://github.com/litl/backoff/blob/d82b23c42d7a7e2402903e71e7a7f03014a00076/backoff/_decorator.py#L137, which feels more transparent than this lambda
- you don't mention anywhere why a constant fast delay is preferred over an exponential one in real-world scenarios; sure, exponential may result in more wait time, but could it be more resilient as a trade-off?
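For reference, a minimal sketch of the library-provided alternative mentioned above, assuming the decorator forwards extra keyword arguments to the wait generator; the function name, URL handling, and exception type are placeholders, not this PR's code:

```python
import backoff
import requests

# Sketch only: `_run` is a stand-in for whatever callable gets retried.
@backoff.on_exception(
    backoff.constant,                     # library-provided constant wait generator
    requests.exceptions.ConnectionError,  # retry only on transient connection errors
    max_tries=5,
    interval=0.01,                        # forwarded to backoff.constant
)
def _run(url: str) -> requests.Response:
    return requests.get(url, timeout=30)
```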
Lambda is a way to pass around a generator: the code expects to find a callable that returns a generator, so I hammered out a callable. At that point I was already pretty upset by the interactions between UTs and elastic requests and missed the part where arguments to the first argument are supposed to be passed through the last argument. That does look marginally cleaner. The magic constant is arbitrary -- I just needed a long enough generator; I should have used repeat there right away. As for the delay, it looks like it is enough to simply re-issue the request: we are dealing with GH here, and my educated guess is that we would hit a different server even with no delay at all. 0.01 is a way of expressing that "right away" is good enough, which is why I believe exponential backoff is overkill here. I'll add a note to the commit message if I keep this commit.
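A hedged sketch of that "repeat" idea - a named helper instead of the range(10**4)-bounded lambda; the name is illustrative, and the explicit yield keeps it a plain generator function:

```python
import itertools

# Illustrative helper, not the PR's code: an unbounded constant delay
# expressed with itertools.repeat instead of a bounded generator expression.
def fast_constant_delay():
    for delay in itertools.repeat(0.01):
        yield delay
```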
Sometimes an active DNF server would respond with an error. This was treated as a RuntimeError and resulted in a test failure, when in practice it is enough to wait a little and then try again. This change adds a clause to retry after a ConnectionError instead of just failing on any RuntimeError; any other RuntimeErrors are still considered failures. Signed-off-by: Alexey Ovchinnikov <[email protected]>
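In the spirit of that commit message, a rough sketch of such a retry clause; the function name, attempt count, and delay are assumptions, not the actual change:

```python
import time

import requests

# Assumption-laden sketch: treat ConnectionError as transient and retry,
# while any other problem still surfaces as a RuntimeError (i.e. a failure).
def query_dnf_server(url: str, attempts: int = 5, delay: float = 0.1) -> bytes:
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=30)
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # wait a little and try again
            continue
        if not resp.ok:
            # any other error is still treated as a real failure
            raise RuntimeError(f"DNF server responded with {resp.status_code}")
        return resp.content
    raise RuntimeError("unreachable")  # defensive; the loop always returns or raises
```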
The current implementation of backoffs does not play well with UTs and solves a local problem. I will drop this commit in the next revision and will try to address it separately at some other point.
For the record, I tried using pytest-retry and marking all tests as flaky (which, admittedly, is a wrong way of doing it), but that, in turn, did not play well with our fixtures.
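For illustration, blanket-marking every test as flaky with pytest-retry could look roughly like this; the marker keyword names follow that plugin's docs and may differ between versions, and the retry counts are arbitrary:

```python
# conftest.py -- sketch only; this is the approach described above,
# not code from this PR.
import pytest


def pytest_collection_modifyitems(config, items):
    for item in items:
        # pytest-retry's flaky marker applied to every collected test
        item.add_marker(pytest.mark.flaky(retries=2, delay=0.5))
```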
In Python 3.14 the 'filter' argument to TarFile.extractall (introduced in 3.12) becomes mandatory. Until then it defaults to `None` and triggers a warning. This change states that we are OK with continuing to extract as before 3.12. Signed-off-by: Alexey Ovchinnikov <[email protected]>
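A minimal sketch of what "extract as before 3.12" translates to, assuming Python >= 3.12 and an illustrative archive path; "fully_trusted" is the stdlib's name for the pre-3.12 behaviour:

```python
import tarfile

# Sketch only: the paths are placeholders; passing filter explicitly
# silences the DeprecationWarning about the upcoming 3.14 default.
with tarfile.open("sources.tar.gz") as tar:
    tar.extractall(path="unpacked", filter="fully_trusted")
```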
This prevents pydantic from complaining about a potentially unserializable object. Signed-off-by: Alexey Ovchinnikov <[email protected]>
Force-pushed from 919ebca to 6f523fe (Compare)
Approved; the changes in the PR have shrunk quite a lot since I last checked on it.
45611bb
The PR adds several retries to functions which tend to sporadically fail locally for no good reason.
It also addresses several warnings which pollute UTs output. It does not handle

coroutine 'AsyncMockMixin._execute_mock_call' was never awaited

which is very annoying, but has to be chased with specially trained dogs.

Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)