Skip to content

CI: Run some tests with compute-sanitizer #566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Apr 29, 2025

Conversation

carterbox
Copy link
Contributor

@carterbox carterbox commented Apr 22, 2025

Description

Runs python 3.12 pytests in the context of compute-sanitizer to check for memory issues and errors from the CUDA API.

closes #565
closes #562

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
Contributor

copy-pr-bot bot commented Apr 22, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@leofang
Copy link
Member

leofang commented Apr 22, 2025

FYI right now we use the mini-CTK approach in the CI:

So compute-sanitizer is currently not available in the CI. But I assume it can be grabbed easily.

@cryos is refactoring our CI (#555). I suggest we perhaps add another standalone pipeline for running compute sanitizer?

@leofang leofang requested review from leofang and cryos April 22, 2025 19:25
@leofang leofang added P1 Medium priority - Should do CI/CD CI/CD infrastructure labels Apr 22, 2025
@cryos
Copy link
Collaborator

cryos commented Apr 22, 2025

I was going to look at tools like this next, that is a great point and something I can factor in. Looking at the proposal here picking out a test run would be reasonable, I know there are other tools we would like to run too.

@carterbox
Copy link
Contributor Author

/ok to test f27e1f4

@leofang
Copy link
Member

leofang commented Apr 22, 2025

I doubt commit f27e1f4 would work -- we'll see: #571.

This comment has been minimized.

@carterbox
Copy link
Contributor Author

/ok to test 05a7068

@cryos cryos linked an issue Apr 23, 2025 that may be closed by this pull request
@cryos cryos removed a link to an issue Apr 23, 2025
@carterbox
Copy link
Contributor Author

/ok to test dde857b

@carterbox
Copy link
Contributor Author

carterbox commented Apr 23, 2025

Yay! The linux-64 tests are failing for the correct reason! (the compute sanitizer returns non-zero because it has detected issues).

https://github.com/NVIDIA/cuda-python/actions/runs/14628267586/job/41045781037?pr=566

Windows tests are failing because I have disabled them partially.

@carterbox
Copy link
Contributor Author

/ok to test a1ea51e

@carterbox carterbox force-pushed the dching/add-compute-sanitizer-to-ci branch from a1ea51e to 0430930 Compare April 24, 2025 20:26
@carterbox
Copy link
Contributor Author

/ok to test 0430930

@leofang leofang added the cuda.bindings Everything related to the cuda.bindings module label Apr 25, 2025
@carterbox
Copy link
Contributor Author

/ok to test 9c25910

@carterbox
Copy link
Contributor Author

/ok to test 7fb013e

@carterbox carterbox marked this pull request as ready for review April 25, 2025 22:48
Copy link
Contributor

copy-pr-bot bot commented Apr 25, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@carterbox carterbox requested a review from leofang April 25, 2025 22:48
@carterbox
Copy link
Contributor Author

/ok to test 675c41f

@carterbox carterbox requested review from leofang and rwgk April 28, 2025 16:30
@carterbox
Copy link
Contributor Author

/ok to test 5abd6e3

rwgk
rwgk previously approved these changes Apr 28, 2025
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I'm really happy that we'll be routinely testing with the compute sanitizer.

@carterbox
Copy link
Contributor Author

/ok to test 53a01cb

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carterbox!

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very neat, thanks!

@cryos are you OK if we merge this now, or if you'd rather wait until we merge the CI refactoring (#555)?

@kkraus14
Copy link
Collaborator

@cryos are you OK if we merge this now, or if you'd rather wait until we merge the CI refactoring (#555)?

Marcus is on PTO until May 5 so lets merge this and we can sync with him once he's back 😄

@leofang leofang merged commit 034ffbf into NVIDIA:main Apr 29, 2025
75 checks passed
@leofang
Copy link
Member

leofang commented Apr 29, 2025

Right he told me about it but I forgot... let's merge now.

Copy link

Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD CI/CD infrastructure cuda.bindings Everything related to the cuda.bindings module cuda.core Everything related to the cuda.core module P1 Medium priority - Should do
Projects
None yet
5 participants