KeyError: 'PMI_RANK' #6

qianglan · 2017-03-22T14:38:11Z

Hi, after installing tensorflow-allreduce I tried to run allreduce-test.py. Below is the command and its output:
$ srun --ntasks=1 python allreduce-test.py --train-data train.txt --validation-data valid.txt --vocab vocab.txt --vocab-size 5 --batch-size 64 --max-iterations 10
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "allreduce-test.py", line 93, in <module>
    MPI_RANK = int(os.environ["PMI_RANK"])
  File "/home/deng/anaconda/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'PMI_RANK'

It seems the PMI_RANK environment variable is not set. How should I solve this? Thanks.

t-brito · 2017-04-13T13:45:29Z

I had the same issue, although I was running mpirun directly rather than Slurm's srun:

CUDA_VISIBLE_DEVICES=0 mpirun \
    python allreduce-test.py --train-data train.txt \
        --validation-data train.txt --vocab vocab.txt \
        --vocab-size 10000 --batch-size 32 \
        --max-iterations 10000

I solved it by replacing the PMI/Slurm environment variables with OpenMPI ones in the file allreduce-test.py:

Before:

MPI_RANK = int(os.environ["PMI_RANK"])
MPI_LOCAL_RANK = int(os.environ["SLURM_LOCALID"])
MPI_SIZE = int(os.environ["PMI_SIZE"])

After:

MPI_RANK = int(os.environ["OMPI_COMM_WORLD_RANK"])
MPI_LOCAL_RANK = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
MPI_SIZE = int(os.environ["OMPI_COMM_WORLD_SIZE"])

Found them here:
https://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
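
If the script needs to run under either launcher, one option is to look up both sets of names instead of hard-coding one. The following is only a minimal sketch, not part of allreduce-test.py: the helper name `read_mpi_env` and the lookup order (Open MPI first, then PMI/Slurm) are assumptions made for illustration, and the variable names are the ones quoted above.

```python
import os

# Candidate environment variables per value. The names are the ones quoted in
# this thread; trying Open MPI first and PMI/Slurm second is an assumption.
_ENV_CANDIDATES = {
    "rank": ("OMPI_COMM_WORLD_RANK", "PMI_RANK"),
    "local_rank": ("OMPI_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID"),
    "size": ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE"),
}


def read_mpi_env(environ=None):
    """Hypothetical helper: return rank, local_rank and size as ints.

    Uses the first candidate variable that is set for each value and raises
    a KeyError naming all candidates if none of them is present.
    """
    environ = os.environ if environ is None else environ
    result = {}
    for key, names in _ENV_CANDIDATES.items():
        for name in names:
            if name in environ:
                result[key] = int(environ[name])
                break
        else:
            # No candidate was found for this value.
            raise KeyError(
                "none of %s is set; was the script started by mpirun or srun?"
                % ", ".join(names)
            )
    return result
```

In the script, the three assignments could then become `env = read_mpi_env()` followed by `MPI_RANK = env["rank"]`, `MPI_LOCAL_RANK = env["local_rank"]` and `MPI_SIZE = env["size"]`. When no launcher has set any of the variables, the error message lists every name that was tried rather than only `PMI_RANK`.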

birdapple · 2017-04-15T21:03:57Z

@t-brito Can you describe your test environment: OS version, OpenMPI version (and whether any special build options are required for OpenMPI), Python version, TensorFlow version, CUDA version, and cuDNN version?

I tried the code, but it fails with several different errors.

Thanks
