This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

powerai - which package messes up mpicxx and how to fix it? #131

Open
den-run-ai opened this issue Oct 8, 2019 · 3 comments

Comments

den-run-ai commented Oct 8, 2019

I frequently get this issue, caused by "leftovers" from either the spectrum-mpi or the gcc conda package in the PowerAI environment. After uninstalling these packages, some MPI commands still point to the conda environment. How do I fix this, or at least troubleshoot it? Why are the MPI variables modified?

For example:

```
$ module load gcc ompi
$ conda activate powerai.1.6.1
$ mpicxx -show
Cannot open configuration file /users/da/powerai/powerai.1.6.1/share/openmpi/mpicxx-wrapper-data.txt
Error parsing data file mpicxx: Not found
```
hartb (Member) commented Oct 8, 2019

The MPI packages included in PowerAI / WML CE provide activation/deactivation scripts that are intended to set MPI_ROOT and OPAL_PREFIX when the conda environment is activated, and to restore those variables to their original values when the environment is deactivated.
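A minimal sketch of what such a save/restore pair could look like, assuming bash. This is not the actual WML CE script: the function names and the `_SKETCH_SAVED_` prefix are hypothetical, and a real package would put the two halves in separate files under `$CONDA_PREFIX/etc/conda/activate.d/` and `deactivate.d/`. It keeps only one saved copy per variable, so it does not cope with the nested or repeated activations described next.

```shell
# Hypothetical sketch (bash): remember a variable's old value, then override it.
# Also records whether the variable was unset, so deactivation can unset it again.
save_and_set() {
    local name="$1" value="$2" backup="_SKETCH_SAVED_$1"
    if [ -n "${!name+x}" ]; then
        export "$backup=${!name}"    # variable was set: save its value
    else
        unset "$backup"              # variable was unset: no saved copy
    fi
    export "$name=$value"
}

# Hypothetical sketch (bash): put back whatever save_and_set recorded.
restore_saved() {
    local name="$1" backup="_SKETCH_SAVED_$1"
    if [ -n "${!backup+x}" ]; then
        export "$name=${!backup}"    # restore the saved value
    else
        unset "$name"                # it was unset before activation
    fi
    unset "$backup"
}

# What an activate.d hook might run (hypothetical).
sketch_activate() {
    save_and_set MPI_ROOT "$CONDA_PREFIX"
    save_and_set OPAL_PREFIX "$CONDA_PREFIX"
}

# What the matching deactivate.d hook might run (hypothetical).
sketch_deactivate() {
    restore_saved MPI_ROOT
    restore_saved OPAL_PREFIX
}
```

Distinguishing "unset" from "set to an empty string" matters here: restoring an unset variable as an empty one can still change how tools that check for its presence behave.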

Unfortunately, that's less straightforward than it sounds due to various edge cases (nesting of activations, unbalanced activate/deactivate calls when packages are installed, and others). We're looking at that code now to see if we can improve the behavior.
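To illustrate why unbalanced calls are a problem, here is a hypothetical naive hook pair (bash) that keeps a single saved copy of OPAL_PREFIX. This is an assumed failure mode, not a diagnosis of the WML CE scripts: if the activate half runs twice before the deactivate half runs once, the saved copy is overwritten with the conda value, and deactivation leaves OPAL_PREFIX pointing into the conda environment.

```shell
# Hypothetical naive hook pair (bash): one saved copy, no awareness of nesting.
naive_activate() {
    export _NAIVE_SAVED_OPAL_PREFIX="${OPAL_PREFIX-}"
    export OPAL_PREFIX="$1"
}

naive_deactivate() {
    export OPAL_PREFIX="$_NAIVE_SAVED_OPAL_PREFIX"
}

# Simulate an unbalanced sequence and print the final OPAL_PREFIX.
demo_unbalanced() {
    export OPAL_PREFIX=/opt/ompi     # hypothetical value from the site MPI module
    naive_activate /envs/powerai     # first run saves /opt/ompi
    naive_activate /envs/powerai     # second run overwrites the saved copy
    naive_deactivate                 # a single deactivation restores the wrong value
    echo "$OPAL_PREFIX"
}
```

Running `demo_unbalanced` prints `/envs/powerai` rather than `/opt/ompi`.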

In the short term, though, MPI_ROOT and OPAL_PREFIX are the two variables you'll want to watch across activate, deactivate, install, and uninstall.
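A small troubleshooting helper along those lines (bash; the function name is hypothetical). It prints both variables and, when OPAL_PREFIX is set, checks for the wrapper data file that the failing `mpicxx -show` above was looking for. Open MPI's wrapper compilers use OPAL_PREFIX to relocate their install prefix, so a value left over from a removed conda package sends them to a directory that may not contain that file.

```shell
# Hypothetical helper (bash): report MPI_ROOT / OPAL_PREFIX and check that
# OPAL_PREFIX points at a tree containing the wrapper-compiler data file.
# Usage: check_mpi_env [wrapper]   (wrapper defaults to mpicxx)
check_mpi_env() {
    local wrapper="${1:-mpicxx}"
    echo "MPI_ROOT=${MPI_ROOT-<unset>}"
    echo "OPAL_PREFIX=${OPAL_PREFIX-<unset>}"
    if [ -n "${OPAL_PREFIX-}" ]; then
        local data="$OPAL_PREFIX/share/openmpi/${wrapper}-wrapper-data.txt"
        if [ -f "$data" ]; then
            echo "ok: $data exists"
        else
            echo "missing: $data (OPAL_PREFIX may be stale)"
            return 1
        fi
    fi
    return 0
}
```

For example, run `check_mpi_env` after `module load gcc ompi`, again after `conda activate powerai.1.6.1`, and again after `conda deactivate`, and compare the values at each step.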

den-run-ai (Author)

@hartb how about resetting $PATH and $LD_LIBRARY_PATH?

IMO, conda remove spectrum-mpi should unset all environment variables for MPI.

hartb (Member) commented Oct 9, 2019

I don't see anything in WML CE that adjusts $PATH, though conda itself adjusts it during activate / deactivate.

A couple of WML CE packages do adjust $LD_LIBRARY_PATH: cudatoolkit and tensorflow. They aren't setting anything up for MPI, but they do modify the library path. (Incidentally, all the scripts that run during activation and deactivation live in $CONDA_PREFIX/etc/conda/activate.d/ and $CONDA_PREFIX/etc/conda/deactivate.d/.)
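To see which hooks in an environment touch these variables, one option is to search those two directories. A sketch, assuming bash and a grep that supports `-w` (GNU and BSD grep both do); the function name is hypothetical:

```shell
# Hypothetical helper (bash): list lines in an environment's activate.d and
# deactivate.d hooks that mention MPI_ROOT, OPAL_PREFIX, LD_LIBRARY_PATH or PATH.
# Usage: find_env_hooks [env_prefix]   (defaults to $CONDA_PREFIX)
find_env_hooks() {
    local prefix="${1:-$CONDA_PREFIX}" dir
    for dir in "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"; do
        [ -d "$dir" ] || continue
        # -H: print file name, -n: line number, -w: match whole words only
        grep -H -n -w -E 'MPI_ROOT|OPAL_PREFIX|LD_LIBRARY_PATH|PATH' "$dir"/* 2>/dev/null
    done
    return 0
}
```

For example, `find_env_hooks /users/da/powerai/powerai.1.6.1` prints each hit as `file:line:text`. Hook file names often, though not always, indicate which package installed them.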

We agree that removing a package (or just deactivating the environment it was installed in) should undo any environment changes the package made.
