Skip dependencies in SOK installation #1138
@@ -11,6 +11,6 @@ rm -rf hugectr/
 git clone https://github.com/NVIDIA-Merlin/HugeCTR.git hugectr

 cd hugectr/sparse_operation_kit/
-python setup.py install
+python setup.py develop --no-deps
If horovod is already installed in the container, I wonder why this is trying to install horovod again. Does it require a higher version than what we already have?
I wasn't accurate when I said it's trying to install horovod again. setup.py detects that horovod is already installed from site_packages and skips installing the core horovod library, so it doesn't re-install horovod per se. But it does install some entrypoint scripts:
Installing horovodrun script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
Installing mpiexec script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
Installing mpirun script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
Installing ompi_info script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
Installing orted script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
Installing orterun script to /home/gha-user/gha3/models/models/.tox/py38-multi-gpu/bin
These are all installed in the tox environment (.tox/py38-multi-gpu/bin), while horovod itself comes from site_packages (i.e., /usr/local/bin), and that mismatch causes the issue. So this issue is due to tox and mixing external commands inside the tox environment, and it shouldn't happen in normal environments.
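For anyone who wants to check for this kind of mismatch locally, here is a rough diagnostic sketch. It is not part of this PR. The helper names (locate, is_under, same_environment) are made up for illustration, and it assumes that comparing against sys.prefix is a good enough test for "inside the current environment".

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def locate(command, module):
    """Return (script_path, module_path) for a console script and a top-level package.

    Either element is None when it cannot be found.
    """
    script = shutil.which(command)            # first match on PATH, or None
    spec = importlib.util.find_spec(module)   # None if the package is not importable
    module_path = spec.origin if spec is not None else None
    return script, module_path


def is_under(path, prefix):
    """True if `path` is located inside the directory `prefix`."""
    if path is None:
        return False
    try:
        Path(path).resolve().relative_to(Path(prefix).resolve())
        return True
    except ValueError:
        return False


def same_environment(command, module, prefix=sys.prefix):
    """True if the script and the package are both inside, or both outside, `prefix`."""
    script, module_path = locate(command, module)
    return is_under(script, prefix) == is_under(module_path, prefix)


if __name__ == "__main__":
    script, module_path = locate("horovodrun", "horovod")
    print("horovodrun script:", script)
    print("horovod package:  ", module_path)
    print("same environment: ", same_environment("horovodrun", "horovod"))
```

Run inside the tox environment, a result of False for "same environment" would be consistent with the situation described above: the entrypoint script lives in the tox bin directory while the package is picked up from outside it.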
thanks for the explanation @edknv
The gpu-multi / tensorflow (pull_request) check in the CI fails, in #1110 for example, because SOK requires horovod and reinstalls it, overwriting the horovod installation in the ci-runner. This PR fixes that by adding --no-deps so that SOK does not install dependencies. We run setup.py in development mode because python setup.py install does not support --no-deps.