Error samgraph/commonn/cpu/cpu_device.cc:39 Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: os call failed or operation not supported on this OS when running /gnn_lab/example/samgraph/train_gcn.py on the papers100M dataset
#15
Open
weihai-98 opened this issue
Sep 18, 2023
· 6 comments
Hey, I got the error samgraph/commonn/cpu/cpu_device.cc:39 Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: os call failed or operation not supported on this OS when running /gnn_lab/example/samgraph/train_gcn.py on the papers100M dataset. However, the same Python script runs successfully on other datasets such as ogbn_products and reddit. How can I fix it? Looking forward to your help. Thanks!
I worked around this bug by decreasing cache_percentage from 0.21 to 0.001. However, I am curious how to achieve the cache ratio of 0.21 reported in your paper. Is it by increasing the pinned-memory limit, or something else?
Can you provide more information about your setup? E.g., the number of GPUs, the GPU memory, the batch size, the script and parameters you use, and how you generated the dataset.
Yeah, I use two 3090 GPUs with 24GB of device memory each, one for sampling and one for training. The training batch size is 8000 and the hidden dim is 64. I generated the dataset with the code you provided (gnn_lab/utility/data-process/dataset/papers100M.ipynb). The script I run is /gnn_lab/example/samgraph/train_gcn.py.
Then 24GB of memory should be enough for a 21% cache rate: the entire feature set is around 55GB, so the cache needs roughly 0.21 × 55 ≈ 11.6GB. We need more information to identify the root cause, e.g., the call stack at the point where the error is raised. You could sleep for several seconds right after the trainer process launches, print its pid, and attach GDB to it.
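A minimal sketch of that debugging suggestion, assuming the trainer is an ordinary Python process. The helper name wait_for_debugger and its parameters are hypothetical and not part of samgraph; where exactly to call it in train_gcn.py is left to the reader.

```python
import os
import time


def wait_for_debugger(seconds=20.0, out=print):
    """Print this process's PID, then pause so GDB can be attached.

    Hypothetical helper (not a samgraph API): call it as the first
    statement of the trainer process's entry function.
    """
    pid = os.getpid()
    out(f"[trainer] pid={pid}; attach with: gdb -p {pid}")
    # Give the user time to attach before training starts.
    time.sleep(seconds)
    return pid
```

While the process is sleeping, run gdb -p with the printed pid in another terminal, type continue, and once the check fails and GDB stops, bt prints the call stack.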