First, I will give some context of my test.
I got the repo built, along with its dependencies. I configured Slurm, and have both Intel MPI and OpenMPI installed. I used the sample lines from the README to create train.txt and vocab.txt. The CUDA 8.0 libraries are built and installed on the system, and I have configured GPUs in the Slurm GRES. I also see messages showing the GPU libraries being loaded by TensorFlow.
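For context, this is roughly the shape of the batch script I submit (a minimal sketch; the job name, partition, and training script name are placeholders, not my exact values):

```shell
#!/bin/bash
#SBATCH --job-name=train-test   # placeholder job name
#SBATCH --ntasks=2              # I tried 2 and also 50 tasks
#SBATCH --gres=gpu:1            # request a GPU through the Slurm GRES I configured
#SBATCH --partition=gpu         # placeholder partition name

# launch the training under MPI (mpirun from either Intel MPI or OpenMPI);
# train.py and its flags are placeholders for the repo's actual entry point
mpirun python train.py --max-iterations 10
```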
I also set --max-iterations to 10 to reduce the runtime. For such a small dataset, the run should finish very quickly, but it has been running for days. I tried with 2 tasks and also with 50 tasks. I see many CPU cores running at almost 100%, but nothing running on the GPU.
First question: why is it running forever, for such a small test?
Second question: why are the GPUs not being used?
Thanks in advance,
Nitin