Seemingly unnecessary CGO dependency in nvmlreceiver #180
Comments
Looping in @suffiank for implementation details here.
Any thoughts? This has come up in discussion again in a Slack group.
Hi @Dylan-M, Suffian is quite busy so I took a look. I'm not the primary point of contact for GPU-related stuff so this is best effort, but this is what I found. I think getting rid of the C code would be good. The library does look like it will do what that little C++ load generator does. This would get rid of C++ code in this codebase, which would be cleaner. However, this is unfortunately not the only … As such, I don't think we're going to be able to prioritize this any time soon, but we welcome a PR that gets rid of C++ in this codebase. Thank you for raising the issue!
Thank you for having a look. A coworker also pointed out what you said about linking to that external library; I had forgotten to look at external deps. It is still a change worth making, but I fully agree with your assessment and prioritization.
In Ops Agent we run a demo app from the CUDA SDK: https://github.com/GoogleCloudPlatform/ops-agent/blob/master/integration_test/third_party_apps_data/applications/nvml/exercise
But as @braydonk mentions, go-nvml requires cgo for everything, so we can't get GPU metrics without it.
Currently, the nvmlreceiver is using a single dependency on a C library here:
https://github.com/GoogleCloudPlatform/opentelemetry-operations-collector/blob/master/receiver/nvmlreceiver/testcudakernel/test_cuda_kernel.cc
Based on my review, though with limited understanding of the Nvidia libraries in general, this could be replaced with a native golang library.
From: https://docs.nvidia.com/cuda/cublas/index.html#id92
From: https://pkg.go.dev/gorgonia.org/cu/blas#Standard.Dgemm
These function definitions seem to be identical, with the exception of the cublasXtHandle. That handle wouldn't be needed in the pure golang implementation, so it makes sense for it to be missing.
If I am correct, this would be a nice change. Anything that can strip a collector down to a pure golang implementation is good in my book.
If I am incorrect in my belief that these libraries are functionally the same, I'm happy to be corrected on it.
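For reference, the operation both Dgemm signatures describe is the standard BLAS update C = alpha*A*B + beta*C. Below is a minimal plain-Go sketch of that operation, only to make the comparison concrete. The function name `dgemm`, the row-major layout, and the omission of transpose flags and leading dimensions are my own simplifying assumptions (cuBLAS itself is column-major), and this is not the API of either library. It also runs on the CPU, so by itself it would not generate GPU load the way the test kernel does.

```go
package main

import "fmt"

// dgemm is a hypothetical, simplified reference: it computes
// C = alpha*A*B + beta*C for row-major matrices with no transposition.
// A is m x k, B is k x n, C is m x n, each stored as a flat slice.
func dgemm(m, n, k int, alpha float64, a, b []float64, beta float64, c []float64) {
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			sum := 0.0
			for p := 0; p < k; p++ {
				sum += a[i*k+p] * b[p*n+j]
			}
			c[i*n+j] = alpha*sum + beta*c[i*n+j]
		}
	}
}

func main() {
	a := []float64{1, 2, 3, 4} // 2x2 matrix [[1 2] [3 4]]
	b := []float64{5, 6, 7, 8} // 2x2 matrix [[5 6] [7 8]]
	c := []float64{1, 1, 1, 1} // 2x2 matrix of ones, updated in place

	dgemm(2, 2, 2, 2.0, a, b, 1.0, c)
	fmt.Println(c) // prints [39 45 87 101]
}
```

Note that there is no handle argument here, which matches the observation above: a handle only carries library and device state, not anything about the math.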