add CUDA leads to a segmentation fault #2163
Comments
How did you install Julia? Can you post |
It usually takes me a day to install a new version because I usually run into issues, so I won't try other versions for now. I installed from the main website. |
This looks a bit like JuliaLang/julia#48360, which also happens during artifact selection.
You don't need to install anything: just download the tarball and run a single binary. I'm not sure how that takes a day? There have been many bug fixes between Julia 1.9.0 and the latest version, so we generally don't support those older versions. |
I get the same error on 1.10
|
I deleted the
|
Does it happen if you run the offending command in isolation?
If so, can you run with |
Sorry, what is the command to run? My general computer maturity is not enough to understand what to run from your previous message. |
|
That's what I get.
|
Did you do this? Also, please try again until it segfaults and upload that trace. Here, it looks like everything just worked. |
That's an internal server error, so authentication did not work. Are you sure you're logged in with GitHub and you allowed the authentication to happen? If so, we'll have to look at the server logs. I'm also adding a feature to BugReporting.jl to allow a manual upload if the automatic one fails, so please hold until that's been merged before retrying.
The command you're executing is recording a trace of what happens. Since you're not reproducing the segfault you filed this bug for, the trace will not be interesting. Does the segfault still occur when running that command without |
I ran the command from inside the project instead of the home directory. This reproduced the segfault.
|
Thanks for the trace. Which CPU did you record this on? The server I normally use to replay traces from a different CPU sadly isn't available right now, so I'd need to match that up. |
Wait, this trace just exits without an abort? That's not useful... Again, if the trace doesn't contain the segfault, it doesn't help. |
Also, if the error never reproduces under
|
My CPU is a 40-core (2x 20-core) Intel Xeon E5-2698 v4 @ 2.2GHz. It looks like the command did not run. The error is
In case it helps, I uploaded another trace by running the same command again. |
Looks like GDB is confused by how arguments are specified (or you accidentally pasted an additional newline). Try doing just
|
Here is what I get now. Thank you for not giving up on this! I appreciate your time.
|
Just to be clear, I still get a segfault when I |
Again, no segfault, so sadly not useful. I realize this may be out of your area of experience, but both rr traces and gdb logs of executions without a segfault cannot help to debug this. Maybe try re-running multiple times to catch an instance where it does segfault, and post the backtrace from there? Also make sure you're running the exact invocation that segfaults under |
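For an intermittent crash like this one, the "re-run multiple times" advice can be scripted. The following is only a sketch: `run_until_segfault` is a hypothetical helper name, and it relies on the common shell convention that a process killed by SIGSEGV (signal 11) is reported with exit status 139 (128 + 11).

```shell
# Hypothetical helper: rerun a command until it dies from SIGSEGV.
# Usage: run_until_segfault MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 as soon as one run exits with status 139, 1 if none did.
run_until_segfault() {
    rus_max=$1
    shift
    rus_attempt=1
    while [ "$rus_attempt" -le "$rus_max" ]; do
        rus_status=0
        # Run the command exactly as given; capture its exit status.
        "$@" || rus_status=$?
        if [ "$rus_status" -eq 139 ]; then
            echo "segfault on attempt $rus_attempt"
            return 0
        fi
        rus_attempt=$((rus_attempt + 1))
    done
    echo "no segfault in $rus_max attempts"
    return 1
}
```

A possible invocation, assuming the crash is triggered by adding the package from inside the affected project directory, would be `run_until_segfault 20 julia --project=. -e 'using Pkg; Pkg.add("CUDA")'`. The same wrapper can be placed around whatever invocation is being traced, so that only a run that actually crashes gets reported.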
Also try running (outside of gdb/rr) with |
I am also having this issue on one of my devices and running with |
Once the package has been successfully precompiled and installed, can you then import CUDA.jl without that environment variable set? |
It would also be useful if you could provide the output of |
Yes
I only get the error when running a fresh precompile. Then I can call Output:
Also notice how Output of
Does this help? |
That log doesn't contain a segfault, it only throws a Julia error that presumably comes from the fact that precompiling the CUDA runtime failed, so no that doesn't really help. I'd be interested in a LD_DEBUG log of a process segfaulting when it's using the forwards-compatibility driver (i.e. without that env var set), to confirm that we aren't accidentally loading parts of the driver from the system still, which would be a potential cause for segfaults. |
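One way to check such a log for stray system driver libraries is to list every distinct `libcuda` or `libnvidia` path the dynamic loader mentions. This is a rough sketch, not a definitive diagnostic: `driver_libs_loaded` is a hypothetical helper, and it assumes the log was produced by glibc's `LD_DEBUG` facility (for example with the `libs` or `files` category), where loaded libraries appear as absolute paths.

```shell
# Hypothetical helper: print the distinct libcuda/libnvidia paths that
# appear in one or more LD_DEBUG log files.
# Usage: driver_libs_loaded LOGFILE [LOGFILE...]
driver_libs_loaded() {
    # -o prints only the matching path, -h suppresses file name prefixes.
    grep -ohE '/[^ ]*lib(cuda|nvidia)[^ ]*\.so[^ ]*' "$@" | sort -u
}
```

A log could be captured with something like `LD_DEBUG=libs LD_DEBUG_OUTPUT=/tmp/ld_debug julia -e 'using CUDA'`, which makes glibc write one `/tmp/ld_debug.<pid>` file per process, and then inspected with `driver_libs_loaded /tmp/ld_debug.*`. If both an artifact path and a system path (such as one under `/usr/lib`) show up for the same process, that would be consistent with the mixed-loading scenario described above.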
Describe the bug
I try to add CUDA and get a segfault.
To reproduce
The Minimal Working Example (MWE) for this bug:
Manifest.toml
Expected behavior
No segfault.
Version info
Details on Julia:
Details on CUDA:
Additional context
I think something went wrong when I tried to add CUDAdrv. Before that, CUDA was working fine. When I tried to add CUDAdrv, I got a segfault. Then I removed both CUDA and CUDAdrv with ] rm CUDA CUDAdrv and tried to just add CUDA, but got a segfault again.
At the same time, CUDA is working fine in another REPL window on the same machine.