
Unable to find GPU on Windows #37

Open
sarthakpati opened this issue Jul 28, 2020 · 11 comments

Comments

@sarthakpati

Hi,

I'd like to thank and commend you on putting this together!

I am running Windows and this is my output of nvidia-smi:

```
(base) PS C:\Users\sarth> nvidia-smi.exe
Tue Jul 28 16:16:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.77       Driver Version: 451.77       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8     7W /  N/A |   4402MiB /  8192MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
```

However, GPUtil does not detect the GPU:

```python
>>> import os
>>> import GPUtil
>>> os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
>>> GPUtil.getAvailable()
[]
>>> GPUtil.__version__
'1.4.0'
```

Is there something extra I need to add in the Python code to get this working?

Thanks!

@jamieslyman

On my machine, nvidia-smi.exe existed in System32, but there was no NVSMI folder in Program Files\NVIDIA Corporation, and spawn could not find the executable even though it was in System32.
I fixed it by copying the NVSMI folder from an old install that had it.
If you can find them in your System32, the files are MCU.exe, nvdebugdump.exe, nvidia-smi.1.pdf, nvidia-smi.exe, and nvml.dll.
I don't know which files are required, but I'd guess the absolute minimum is nvidia-smi.exe and nvml.dll.
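For anyone checking their own setup, here is a minimal sketch of the lookup order discussed above: search PATH first, then fall back to the default NVSMI folder, and accept the fallback only if the file really exists. `find_nvidia_smi`, `default_nvsmi_dir` and `missing_nvsmi_files` are hypothetical helpers, not part of GPUtil. The sketch uses `shutil.which`, the standard-library counterpart of the `spawn` lookup mentioned above (assumed here to be `distutils.spawn.find_executable`), and the "minimum files" list is only the guess from the comment above.

```python
import ntpath
import os
import shutil

# Guessed minimum contents of the NVSMI folder (from the comment above, not documented).
REQUIRED_NVSMI_FILES = ("nvidia-smi.exe", "nvml.dll")


def default_nvsmi_dir():
    """Default NVSMI folder under Program Files on the system drive."""
    drive = os.environ.get("SYSTEMDRIVE", "C:")
    # ntpath builds a Windows-style path even if this runs on another OS.
    return ntpath.join(drive + "\\", "Program Files", "NVIDIA Corporation", "NVSMI")


def find_nvidia_smi():
    """Return a usable path to nvidia-smi, or None (hypothetical helper)."""
    # 1. Anything on PATH wins, e.g. a copy in C:\Windows\System32.
    on_path = shutil.which("nvidia-smi")
    if on_path:
        return on_path
    # 2. Fall back to the default install folder, but only if the file is there.
    candidate = ntpath.join(default_nvsmi_dir(), "nvidia-smi.exe")
    return candidate if os.path.isfile(candidate) else None


def missing_nvsmi_files(folder):
    """List which of the guessed minimum files are absent from `folder`."""
    return [name for name in REQUIRED_NVSMI_FILES
            if not os.path.isfile(os.path.join(folder, name))]
```

Running `missing_nvsmi_files(default_nvsmi_dir())` on the affected machine would show at a glance whether the folder is empty, partially filled, or absent.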

@gchennell

Copied just the nvidia-smi.exe and nvml.dll and it seems to have resolved the lack of stats - thanks @wdcook4

@sarthakpati
Author

> Copied just the nvidia-smi.exe and nvml.dll and it seems to have resolved the lack of stats - thanks @wdcook4

Copied them to where?

@gchennell

I had an empty NVSMI folder, which I copied these into. If you don't have the NVSMI folder, I think you can just create it and see...

@sarthakpati
Author

Is it at a particular location relative to the module or should it be in the PATH?

@gchennell

gchennell commented Feb 8, 2021 via email

@sarthakpati
Author

Right, thank you. I already have that present, so unsure what's going on...

@Apfelkuchenbemme

A little late to the party, but this probably has nothing to do with nvidia-smi not being found. (That said, falling back to `nvidia_smi = "%s\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe" % os.environ['systemdrive']` without checking whether that file exists is pretty wild, not to mention handling any exception from the `Popen` call by silently returning an empty list.)
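The complaint about the `Popen` call suggests a different shape for this code: let a missing executable or a failing `nvidia-smi` raise, so "not found" is never confused with "no GPUs". Below is a minimal sketch assuming Python 3.7+ and `nvidia-smi`'s CSV query mode (`--query-gpu=...` with `--format=csv,noheader,nounits`). `query_gpus`, `parse_gpu_csv` and `GpuStats` are hypothetical names, and the field list is a small subset chosen for illustration, not GPUtil's exact query.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi; the order must match parse_gpu_csv below.
QUERY_FIELDS = "index,name,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuStats:
    index: int
    name: str
    load: float          # fraction 0..1 (nvidia-smi reports a percentage)
    memory_used: float   # MiB
    memory_total: float  # MiB

    @property
    def memory_util(self):
        return self.memory_used / self.memory_total


def parse_gpu_csv(text):
    """Parse csv,noheader,nounits output: one GPU per line.

    Assumes numeric fields and GPU names without commas.
    """
    gpus = []
    for line in text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append(GpuStats(int(index), name, float(util) / 100,
                             float(used), float(total)))
    return gpus


def query_gpus(nvidia_smi="nvidia-smi"):
    """Run nvidia-smi and parse its output; errors propagate instead of becoming []."""
    # Raises FileNotFoundError if the executable is missing and
    # CalledProcessError if nvidia-smi exits with a non-zero status.
    result = subprocess.run(
        [nvidia_smi, "--query-gpu=" + QUERY_FIELDS, "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

With this shape, a wrong path shows up as a `FileNotFoundError` naming the executable, while an empty list can only mean `nvidia-smi` ran and reported no GPUs.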

Instead, take a very, very close look at the output of `nvidia-smi.exe` in your first post: 4402MiB of 8192MiB were in use, so at the time of testing you were using 4402/8192 ~ 53.7% of the available memory.

`getAvailable()` itself first calls `getGPUs()`, which contains the aforementioned "interesting" workaround for spawn not finding nvidia-smi. Then, in line 143, it calls `getAvailability(..)`, passing along the default arguments you called `getAvailable()` with.

One of these arguments is `maxMemory=0.5`. You can probably see where this is going: with 4402MiB of 8192MiB in use, you're just above the 50% memory threshold, so line 177 sets the "availability" of your RTX 2080 to 0, `getAvailable()` drops it from the list, and you end up with an empty list.
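To make the threshold logic concrete, here is a simplified re-implementation of the availability check described above, using the figures from the first post (18% load, 4402MiB of 8192MiB). It is a sketch, not GPUtil's code: `is_available` and `get_available` are hypothetical names, the load limit is assumed to default to 0.5 like `maxMemory`, and GPUtil's other options (ordering, `limit`, `memoryFree`, NaN handling, ID exclusions) are left out.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    id: int
    load: float          # utilization as a fraction, e.g. 0.18 for 18%
    memory_used: float   # MiB
    memory_total: float  # MiB

    @property
    def memory_util(self):
        return self.memory_used / self.memory_total


def is_available(gpu, max_load=0.5, max_memory=0.5):
    # Strict "<" comparisons: a GPU at or above either limit counts as busy.
    return gpu.load < max_load and gpu.memory_util < max_memory


def get_available(gpus, max_load=0.5, max_memory=0.5):
    """IDs of GPUs passing both thresholds (simplified: no ordering or limit)."""
    return [gpu.id for gpu in gpus if is_available(gpu, max_load, max_memory)]


# Figures from the nvidia-smi output in the original post.
rtx = Gpu(id=0, load=0.18, memory_used=4402, memory_total=8192)

print(f"memory in use: {rtx.memory_util:.1%}")  # memory in use: 53.7%
print(get_available([rtx]))                      # [] with the 0.5 defaults
print(get_available([rtx], max_memory=0.6))      # [0] once the limit is raised
```

With GPUtil itself, the analogous change would be passing a higher limit, e.g. `GPUtil.getAvailable(maxMemory=0.6)`, or calling `GPUtil.getGPUs()` first to confirm the card is detected at all before worrying about availability.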

@rayanelahmadi

> Right, thank you. I already have that present, so unsure what's going on...

Did you ever figure out the issue? I have the same problem.

@sarthakpati
Copy link
Author

Unfortunately, no. Since I am using PyTorch for most of my work, I no longer rely on this specific library (docs for PT CUDA are here).

If you do manage to figure out the issue, please do LMK so that I can use it for some of my non-PT work. 👍🏽

@rayanelahmadi
Copy link

> Unfortunately, not. Since I am using PyTorch for most of my work, I am no longer reliant on this specific library (docs for PT Cuda are here).
>
> If you do manage to figure out the issue, please do LMK so that I can use it for some of my non-PT work. 👍🏽

So, I actually figured out my issue. I had an old 32-bit version of Python installed, so I uninstalled it and installed a newer 64-bit version (Python 3.12, 64-bit). I then re-installed GPUtil and tried everything again (I think it also made me pip install setuptools). Everything works now.
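One plausible explanation for why the 32-bit interpreter mattered (an interpretation, not confirmed in this thread): on 64-bit Windows, a 32-bit process that looks in `C:\Windows\System32` is transparently redirected to `C:\Windows\SysWOW64` by WOW64 file system redirection, so an `nvidia-smi.exe` that exists only in System32 is invisible to it. Separately, `distutils` was removed from the standard library in Python 3.12, which may be why setuptools had to be installed (setuptools provides a `distutils` shim). A small sketch for checking which interpreter you are actually running; `python_bitness` and `describe_interpreter` are hypothetical helper names:

```python
import platform
import struct
import sys


def python_bitness():
    """Return 32 or 64: the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def describe_interpreter():
    """One-line summary of the interpreter's version, bitness and location."""
    bits = python_bitness()
    summary = f"Python {platform.python_version()} ({bits}-bit) at {sys.executable}"
    if platform.system() == "Windows" and bits == 32:
        # On 64-bit Windows, a 32-bit process sees SysWOW64 in place of System32.
        summary += " (32-bit on Windows: System32 lookups may be redirected)"
    return summary


print(describe_interpreter())
```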


5 participants