Merge pull request argonne-lcf#317 from saforem2/main
Update `docs/polaris/data-science-workflows/python.md`
felker authored Jan 30, 2024
2 parents ad3d2b4 + b365c51 commit 6a52fe1
Showing 1 changed file with 79 additions and 56 deletions.
135 changes: 79 additions & 56 deletions docs/polaris/data-science-workflows/python.md
@@ -1,88 +1,111 @@
# Python

## Conda
We provide prebuilt `conda` environments containing GPU-supported builds of `torch`, `tensorflow` (both with `horovod` support for multi-node calculations), `jax`, and many other commonly-used Python modules.
We provide prebuilt `conda` environments containing GPU-supported builds of
`torch`, `tensorflow` (both with `horovod` support for multi-node
calculations), `jax`, and many other commonly-used Python modules.

Users can activate this environment by first loading the `conda` module, and then activating the base environment.
Users can activate this environment by first loading the `conda` module, and
then activating the base environment.

Explicitly (either from an interactive job, or inside a job script):

```bash
$ module load conda
$ conda activate base
(base) $ which python3
/soft/datascience/conda/2022-09-08/mconda3/bin/python3
module load conda ; conda activate base
```
In one line, `module load conda; conda activate`. This can be performed on a compute node, as well as a login node.

As of writing, the latest `conda` module on Polaris is built on Miniconda3 version 4.14.0 and contains Python 3.8.13. Future modules may contain entirely different major versions of Python, PyTorch, TensorFlow, etc.; however, the existing modules will be maintained as-is as long as feasible.
This will load and activate the base environment.

While the shared Anaconda environment encapsulated in the module contains many of the most commonly used Python libraries for our users, you may still encounter a scenario in which you need to extend the functionality of the environment (i.e. install additional packages).
## Virtual environments via `venv`

There are two different approaches that are currently recommended.
To install additional packages that are missing from the `base` environment,
we can build a `venv` on top of it.

### Virtual environments via `venv`
!!! success "Conda `base` environment + `venv`"

Creating your own (empty) virtual Python environment in a directory that is writable to you is simple:
```bash
python3 -m venv /path/to/new/virtual/environment
```
This creates a new, fairly lightweight folder (<20 MB) with its own Python interpreter where you can install whatever packages you'd like. First, you must activate the virtual environment to make this Python interpreter the default interpreter in your shell session.
If you need a package that is **not** already
installed in the `base` environment,
this is generally the recommended approach.

You activate the new environment whenever you want to start using it by running the activate script in that folder:
```bash
source /path/to/new/virtual/environment/bin/activate
```
We can create a `venv` on top of the base
Anaconda environment (with
`#!bash --system-site-packages` to inherit
the `base` packages):

```bash
module load conda; conda activate
VENV_DIR="venvs/polaris"
mkdir -p "${VENV_DIR}"
python -m venv "${VENV_DIR}" --system-site-packages
source "${VENV_DIR}/bin/activate"
```

In many cases, you do not want an empty virtual environment, but instead want to start from the `conda` base environment's installed packages, only adding and/or changing a few modules.
You can always retroactively change the `#!bash --system-site-packages` flag
state for this virtual environment by editing `#!bash ${VENV_DIR}/pyvenv.cfg` and
changing the value of the line `#!bash include-system-site-packages=false`.
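
As a minimal sketch of that edit, the flag can be flipped with `sed` instead of a text editor. The helper name `set_system_site_packages` is hypothetical (it is not provided on Polaris), and the sketch assumes `pyvenv.cfg` contains a line beginning with `include-system-site-packages`:

```bash
# Hypothetical helper: rewrite the include-system-site-packages line
# in a venv's pyvenv.cfg.
# Usage: set_system_site_packages <venv_dir> <true|false>
set_system_site_packages() {
    local venv_dir="$1" value="$2"
    local cfg="${venv_dir}/pyvenv.cfg"
    [ -f "${cfg}" ] || { echo "no pyvenv.cfg in ${venv_dir}" >&2; return 1; }
    case "${value}" in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 1 ;;
    esac
    # Write to a temporary file first so a failed sed never truncates the config.
    sed -E "s/^(include-system-site-packages[[:space:]]*=[[:space:]]*).*/\1${value}/" \
        "${cfg}" > "${cfg}.tmp" && mv "${cfg}.tmp" "${cfg}"
}
```

For example, `set_system_site_packages "${VENV_DIR}" false` would stop the `venv` from seeing the `base` packages. The file is read when the interpreter starts, so Python processes that are already running are not affected.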

To extend the base Anaconda environment with `venv` (e.g. `my_env` in the current directory) and inherit the base environment packages, one can use the `--system-site-packages` flag:
To install a different version of a package that is already installed in the
base environment, you can use:

```bash
module load conda; conda activate
python -m venv --system-site-packages my_env
source my_env/bin/activate
# Install additional packages here...
python3 -m pip install --ignore-installed <package> # or -I
```
You can always retroactively change the `--system-site-packages` flag state for this virtual environment by editing `my_env/pyvenv.cfg` and changing the value of the line `include-system-site-packages = false`.

To install a different version of a package that is already installed in the base
environment, you can use:
```
pip install --ignore-installed ... # or -I
```
The shared base environment is not writable, so it is impossible to remove or uninstall
packages from it. The packages installed with the above `pip` command should shadow those
installed in the base environment.
The shared base environment is not writable, so it is impossible to remove or
uninstall packages from it. The packages installed with the above `pip` command
should shadow those installed in the base environment.
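
One way to confirm which copy of a package takes precedence is to ask the interpreter where it would import the module from. The sketch below assumes `python3` resolves to the active interpreter (the one in the `venv`, once activated); `pkg_location` is a hypothetical helper name.

```bash
# Hypothetical helper: print the file a module would be imported from,
# without running the module itself.
# Usage: pkg_location <module_name>
pkg_location() {
    python3 - "$1" <<'EOF'
import importlib.util
import sys

spec = importlib.util.find_spec(sys.argv[1])
if spec is None:
    sys.exit(f"{sys.argv[1]}: not importable")
print(spec.origin)
EOF
}
```

If `pkg_location <package>` prints a path inside the `venv` directory, the reinstalled version is the one being picked up; a path under the shared `conda` installation means the base copy is still in use.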

## Cloning the base Anaconda environment

!!! warning

### Cloning the base Anaconda environment
This approach is generally not recommended as it can be quite slow and can
use significant storage space.

If you need more flexibility, you can clone the conda environment into a custom path, which would then allow for root-like installations via `conda install <module>` or `pip install <module>`. Unlike the `venv` approach, using a cloned Anaconda environment requires you to copy the entirety of the base environment, which can use significant storage space.
If you need more flexibility, you can clone the conda environment into a custom
path, which would then allow for root-like installations via `#!bash conda install
<module>` or `#!bash pip install <module>`.

This can be performed by:
Unlike the `venv` approach, using a cloned Anaconda environment requires you to
copy the entirety of the base environment, which can use significant storage
space.

To clone the `base` environment:

```bash
$ module load conda
$ conda activate base
(base) $ conda create --clone base --prefix /path/to/envs/base-clone
(base) $ conda activate /path/to/envs/base-clone
(base-clone) $ which python3
/path/to/envs/base-clone/bin/python3
module load conda ; conda activate base
conda create --clone base --prefix /path/to/envs/base-clone
conda activate /path/to/envs/base-clone
```
The cloning process can be quite slow.

!!! warning
where `#!bash /path/to/envs/base-clone` should be replaced by a suitably chosen
path.

In the above commands, `/path/to/envs/base-clone` should be replaced by a
suitably chosen path.
**Note**: The cloning process can be _quite_ slow.

### Using `pip install --user` (not recommended)
With the conda environment set up, one can install common Python modules using `pip install --user <module-name>`, which will install packages in `$PYTHONUSERBASE/lib/pythonX.Y/site-packages`. The `$PYTHONUSERBASE` environment variable is automatically set when you load the base conda module, and is equal to `/home/$USER/.local/polaris/conda/YYYY-MM-DD`.
## Using `pip install --user` (not recommended)

Note, Python modules installed this way that contain command line binaries will not have those binaries automatically added to the shell's `$PATH`. To manually add the path:
```
export PATH=$PYTHONUSERBASE/bin:$PATH
!!! danger

This is typically _not_ recommended.

With the conda environment set up, one can install common Python modules using
`#!bash python3 -m pip install --user '<module-name>'`, which will install
packages in `#!bash $PYTHONUSERBASE/lib/pythonX.Y/site-packages`.

The `#!bash $PYTHONUSERBASE` environment variable is automatically set when you
load the base conda module, and is equal to `#!bash
/home/$USER/.local/polaris/conda/YYYY-MM-DD`.

Note, Python modules installed this way that contain command line binaries will
not have those binaries automatically added to the shell's `#!bash $PATH`. To
manually add the path:

```bash
export PATH="$PYTHONUSERBASE/bin:$PATH"
```
Be sure to remove this location from `$PATH` if you deactivate the base Anaconda environment or unload the module.

Cloning the Anaconda environment and using `venv` are both more flexible and transparent than `--user` installs.
Be sure to remove this location from `#!bash $PATH` if you deactivate the base
Anaconda environment or unload the module.
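
A minimal sketch of that cleanup step follows. The helper name `path_without` is hypothetical, and the sketch assumes a POSIX-style shell such as `bash` and that `$PYTHONUSERBASE` is still set when it is called (i.e. before the module is unloaded).

```bash
# Hypothetical helper: print $PATH with every occurrence of one directory removed.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
# Usage: export PATH="$(path_without "$PYTHONUSERBASE/bin")"
path_without() (
    target="$1"
    result=""
    IFS=':'
    set -f  # disable globbing while $PATH is split on ':'
    for entry in $PATH; do
        [ "$entry" = "$target" ] && continue
        result="${result:+${result}:}${entry}"
    done
    printf '%s\n' "$result"
)
```

Running `export PATH="$(path_without "$PYTHONUSERBASE/bin")"` before `conda deactivate` or `module unload conda` drops the user-base `bin` directory and leaves the remaining entries in their original order.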

Cloning the Anaconda environment and using `venv` are both more flexible and
transparent than `#!bash --user` installs.
