Addition of entry point for debugging failed CI runs #3137

Open
pshriwise opened this issue Sep 20, 2024 · 0 comments · May be fixed by #3138

Description

Many contributors, new and seasoned alike, encounter CI failures that cannot be reproduced on their local machine. The possible causes are numerous: software stack versions, MPI implementations, number of threads, data settings, compilation settings, etc.

While this problem is not unique to OpenMC, it can make tracking these issues down painful all the same. This PR adds the tmate action to our CI. The action provides an SSH command for connecting to the CI machine, where installation and testing issues can be reproduced and debugged in the same environment in which they occurred.

One downside is that failure notifications won't arrive as quickly, because tmate keeps the runner active to give the user a chance to log into the machine. This also means higher GHA usage per job on failures, but IMO we'd see less usage overall by avoiding commits to PRs that only guess at how to fix an issue. We can also set the tmate session timeout to something reasonable like 15 minutes (the default is 45, I believe).
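
As a rough sketch of the failure-only setup with a 15-minute cap, a job could look something like the fragment below. It assumes the tmate action referred to here is `mxschmitt/action-tmate`; the job name, runner, and test command are placeholders and are not taken from OpenMC's actual workflow.

```yaml
# Hypothetical job fragment: job name, runner, and test command are placeholders.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tests
        run: pytest tests/   # stand-in for the project's real build and test steps

      # Runs only if an earlier step in this job failed.
      - name: Open tmate session on failure
        if: ${{ failure() }}
        uses: mxschmitt/action-tmate@v3   # assumed to be the tmate action meant above
        timeout-minutes: 15               # step-level cap on how long the runner is held
        with:
          limit-access-to-actor: true     # restrict SSH access to the user who triggered the run
```

With `if: ${{ failure() }}`, passing runs are unaffected; only failing jobs hold the runner, and for at most the step timeout.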

There are other strategies for enabling tmate on a job as well, such as only generating a connection when the workflow is triggered via workflow_dispatch, or running in detached mode, where a connection is created by default and remains open at the end of the action.
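
A sketch of how these two strategies might be combined is shown below, again assuming the action is `mxschmitt/action-tmate`. The `debug_enabled` input name, the job layout, and the test command are illustrative only.

```yaml
# Hypothetical workflow fragment: the input name and job contents are placeholders.
on:
  workflow_dispatch:
    inputs:
      debug_enabled:
        type: boolean
        description: Open a tmate session for this run
        required: false
        default: false

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Only manual runs with the checkbox ticked get a session.
      - name: Open tmate session (manual runs only)
        if: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled }}
        uses: mxschmitt/action-tmate@v3   # assumed to be the tmate action meant above
        with:
          detached: true                  # start the session and let later steps continue

      - name: Run tests
        run: pytest tests/   # stand-in for the project's real build and test steps
```

Gating on `workflow_dispatch` keeps ordinary PR runs free of any extra runner time, at the cost of requiring a manual re-run to get a session.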

Alternatives

  • Do nothing
  • Use another system for providing access to the CI runners
  • Provide other methods for debugging CI-specific failures (artifacts or images perhaps).

Compatibility

N/A

@pshriwise pshriwise linked a pull request Sep 20, 2024 that will close this issue