-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move QPY tests to GitHub Actions and increase inter-symengine tests #13273
Conversation
One or more of the following people are relevant to this code:
|
This comment was marked as outdated.
This comment was marked as outdated.
Pull Request Test Coverage Report for Build 11574466833Details
💛 - Coveralls |
This comment was marked as outdated.
This comment was marked as outdated.
The cache size appears to be 200kB, which is slightly questionable - that's near exactly the size of a single set of QPY files. That said, the cache action is compressing them, and there are a few places in the QPY files that will include some random bytes between versions because of randomly generated |
I tested the caching for new PRs on my fork, and verified that it correctly restores the cache even for the initial "open a new PR" event, if the PR doesn't modify the QPY-compatibility test directory. Building a single dev-version wheel at the top of the file means that the QPY backwards-compatibility job now takes only five minutes on a cache hit (e.g. https://github.com/jakelishman/qiskit-terra/actions/runs/11178693927/job/31076786343) despite now building an extra venv and running three more compatibility test, and it runs in parallel to all other jobs, whereas previously it was often on the critical path. It takes about 15 minutes on a complete cache miss (the same as Azure, give or take), but the cache misses should now be much rarer, which should help a lot for the throughput of backports and releases. This is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM, I like the changes to the run_tests.sh file so that we only build the wheel once and reuse it for all the venvs. In the original version I reused the dev venv to avoid building a second venv for symengine testing with the same version but that minimizes the overhead and makes everything more explicit which is nice.
I just had one inline question about the hashing key and when the hashing is evaluated.
# The hashing is this key can be too eager to invalidate the cache, | ||
# but since we risk the QPY tests failing to update if they're not in | ||
# sync, it's better safe than sorry. | ||
key: qpy-${{ hashFiles('test/qpy_compat/**') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this hash only on the files in the checkout from git, or will it include all the qpy files we generate during the run? I can't remember if the hashing is only performed at the start of the job or the end of the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I got this right (I think I did), the hash is calculated only once on the initial git checkout (so doesn't include the QPY files), but when pushing back to the cache, it will include the generated files, so subsequent lookups will retrieve them. It's important that the QPY files aren't part of the hash key because they're not fully deterministic - they include randomly generated UUID payloads in some of the parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah that's why I was asking, especially because the docs aren't clear: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#hashfiles but we can keep an eye on it post merge and adjust if it's a problem. The only option I can think of is manually listing out the files in the tree we care about tying to the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back before I opened this PR I checked it in a bunch of configurations on my fork, and the caching all seemed to be working the way I expected/hoped.
Yeah, the new form will almost certainly duplicate one of the venvs we use (depends on the array of symengine specifiers we choose), but once we've got the Qiskit wheel pre-built, it ends up taking very little time to build, and it was easier to write and edit the loops that way. |
This commit has two major goals: - fix the caching of the QPY files for both the `main` and `stable/*` branches - increase the number of compatibility tests between the different symengine versions that might be involved in the generation and loading of the QPY files. Achieving both of these goals also means that it is sensible to move the job to GitHub Actions at the same time, since it will put more pressure on the Azure machine concurrency we use. Caching ------- The previous QPY tests attempted to cache the generated files for each historical version of Qiskit, but this was unreliable. The cache never seemed to hit on backport branches, which was a huge slowdown in the critical path to getting releases out. The cache restore keys were also a bit lax, meaning that we might accidentally have invalidated files in the cache by changing what we wanted to test, but the restore keys wouldn't have changed. The cache files would fail to restore as a side-effect of ed79d42 (Qiskitgh-11526); QPY was moved to be on the tail end of the lint run, rather than in a test run. This meant that it was no longer run as part of the push event when updating `main` or one of the `stable/*` branches. In Azure (and GitHub Actions), the "cache" action accesses a _scoped_ cache, not a universal one for the repository [^1][^2]. Approximately, base branches each have their own scope, and PR events open a new scope that is a child of the target branch, the default branch, and the source branch, if appropriate. A cache task can read from any of its parent scopes, but write events go to the most local scope. This means that we haven't been writing to long-standing caches for some time now. PRs would typically miss the cache on the first attempt, hit their cache for updates, then miss again once entering the merge queue. The fix for this is to run the QPY job on branch-update events as well. The post-job cache action will then write out to a reachable cache for all following events. Cross-symengine tests --------------------- We previously were just running a single test with differing versions of symengine between the loading and generation of the QPY files. This refactors the QPY `run_tests.sh` script to run a full pairwise matrix of compatibility tests, to increase the coverage. [^1]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache [^2]: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security
0d8f905
to
6ccb14f
Compare
Rebased over |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTm, thanks for doing this it should improve ci throughput quite a bit
…13273) This commit has two major goals: - fix the caching of the QPY files for both the `main` and `stable/*` branches - increase the number of compatibility tests between the different symengine versions that might be involved in the generation and loading of the QPY files. Achieving both of these goals also means that it is sensible to move the job to GitHub Actions at the same time, since it will put more pressure on the Azure machine concurrency we use. Caching ------- The previous QPY tests attempted to cache the generated files for each historical version of Qiskit, but this was unreliable. The cache never seemed to hit on backport branches, which was a huge slowdown in the critical path to getting releases out. The cache restore keys were also a bit lax, meaning that we might accidentally have invalidated files in the cache by changing what we wanted to test, but the restore keys wouldn't have changed. The cache files would fail to restore as a side-effect of ed79d42 (gh-11526); QPY was moved to be on the tail end of the lint run, rather than in a test run. This meant that it was no longer run as part of the push event when updating `main` or one of the `stable/*` branches. In Azure (and GitHub Actions), the "cache" action accesses a _scoped_ cache, not a universal one for the repository [^1][^2]. Approximately, base branches each have their own scope, and PR events open a new scope that is a child of the target branch, the default branch, and the source branch, if appropriate. A cache task can read from any of its parent scopes, but write events go to the most local scope. This means that we haven't been writing to long-standing caches for some time now. PRs would typically miss the cache on the first attempt, hit their cache for updates, then miss again once entering the merge queue. The fix for this is to run the QPY job on branch-update events as well. The post-job cache action will then write out to a reachable cache for all following events. Cross-symengine tests --------------------- We previously were just running a single test with differing versions of symengine between the loading and generation of the QPY files. This refactors the QPY `run_tests.sh` script to run a full pairwise matrix of compatibility tests, to increase the coverage. [^1]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache [^2]: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security (cherry picked from commit af8be25)
…13273) (#13380) This commit has two major goals: - fix the caching of the QPY files for both the `main` and `stable/*` branches - increase the number of compatibility tests between the different symengine versions that might be involved in the generation and loading of the QPY files. Achieving both of these goals also means that it is sensible to move the job to GitHub Actions at the same time, since it will put more pressure on the Azure machine concurrency we use. Caching ------- The previous QPY tests attempted to cache the generated files for each historical version of Qiskit, but this was unreliable. The cache never seemed to hit on backport branches, which was a huge slowdown in the critical path to getting releases out. The cache restore keys were also a bit lax, meaning that we might accidentally have invalidated files in the cache by changing what we wanted to test, but the restore keys wouldn't have changed. The cache files would fail to restore as a side-effect of ed79d42 (gh-11526); QPY was moved to be on the tail end of the lint run, rather than in a test run. This meant that it was no longer run as part of the push event when updating `main` or one of the `stable/*` branches. In Azure (and GitHub Actions), the "cache" action accesses a _scoped_ cache, not a universal one for the repository [^1][^2]. Approximately, base branches each have their own scope, and PR events open a new scope that is a child of the target branch, the default branch, and the source branch, if appropriate. A cache task can read from any of its parent scopes, but write events go to the most local scope. This means that we haven't been writing to long-standing caches for some time now. PRs would typically miss the cache on the first attempt, hit their cache for updates, then miss again once entering the merge queue. The fix for this is to run the QPY job on branch-update events as well. The post-job cache action will then write out to a reachable cache for all following events. Cross-symengine tests --------------------- We previously were just running a single test with differing versions of symengine between the loading and generation of the QPY files. This refactors the QPY `run_tests.sh` script to run a full pairwise matrix of compatibility tests, to increase the coverage. [^1]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache [^2]: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security (cherry picked from commit af8be25) Co-authored-by: Jake Lishman <[email protected]>
Summary
This commit has two major goals:
fix the caching of the QPY files for both the
main
andstable/*
branchesincrease the number of compatibility tests between the different symengine versions that might be involved in the generation and loading of the QPY files.
Achieving both of these goals also means that it is sensible to move the job to GitHub Actions at the same time, since it will put more pressure on the Azure machine concurrency we use.
Caching
The previous QPY tests attempted to cache the generated files for each historical version of Qiskit, but this was unreliable. The cache never seemed to hit on backport branches, which was a huge slowdown in the critical path to getting releases out. The cache restore keys were also a bit lax, meaning that we might accidentally have invalidated files in the cache by changing what we wanted to test, but the restore keys wouldn't have changed.
The cache files would fail to restore as a side-effect of ed79d42 (gh-11526); QPY was moved to be on the tail end of the lint run, rather than in a test run. This meant that it was no longer run as part of the push event when updating
main
or one of thestable/*
branches. In Azure (and GitHub Actions), the "cache" action accesses a scoped cache, not a universal one for the repository 12. Approximately, base branches each have their own scope, and PR events open a new scope that is a child of the target branch, the default branch, and the source branch, if appropriate. A cache task can read from any of its parent scopes, but write events go to the most local scope. This means that we haven't been writing to long-standing caches for some time now. PRs would typically miss the cache on the first attempt, hit their cache for updates, then miss again once entering the merge queue.The fix for this is to run the QPY job on branch-update events as well. The post-job cache action will then write out to a reachable cache for all following events.
Cross-symengine tests
We previously were just running a single test with differing versions of symengine between the loading and generation of the QPY files. This refactors the QPY
run_tests.sh
script to run a full pairwise matrix of compatibility tests, to increase the coverage.Details and comments
This is a CI change, so the chance I got it right first time is approximately zero. Just need a PR to start testing it. We'll need to update the branch-protection rules if we decide to merge this.
Footnotes
https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache ↩
https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops#cache-isolation-and-security ↩