Benchmarking full N-body simulation on GPU clusters #80
We will make a new bash profiling script based on the message sent by @EiffL on #cosmostat slack on Monday:
|
We created a new script to benchmark
|
@EiffL @santiagocasas I have obtained some
Also, ~2 h later, I made the same
However, I got a Then, I made the The
and
I hope to understand:
|
ok, so first comment: to understand what's going on, you probably should look at the log files, they should be called something like Don't worry about the exit codes, they are not very important for us, and they don't tell us why the job was canceled; the log file should have more info towards the end. A |
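If it helps, here is a minimal sketch for dumping the tail of those Slurm logs, assuming the default slurm-<jobid>.out naming (adjust the glob if your sbatch script redirects output elsewhere):

```python
# Print the last lines of every Slurm log in the working directory.
# Assumes the default slurm-<jobid>.out naming convention.
import glob

for path in sorted(glob.glob("slurm-*.out")):
    with open(path) as f:
        tail = f.readlines()[-20:]
    print(f"=== {path} ===")
    print("".join(tail), end="")
```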
I think I found it (
|
ah that's interesting! this means that the simulation is running out of memory and dying, because there is not enough space on the GPU. You could try with a smaller simulation. |
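As a side note, a minimal sketch (assuming the TF1-style session API used elsewhere in this thread) for letting TensorFlow allocate GPU memory on demand, so the logs show how far the run actually gets before the OOM:

```python
import tensorflow as tf

# Grow GPU memory on demand instead of pre-allocating the whole card.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True

with tf.compat.v1.Session(config=config) as sess:
    # run the (smaller) simulation graph here, e.g. with a reduced --nc
    pass
```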
ok, I heard that story once! |
Daily Report: we ran the following Slurm configurations (a small rank/GPU sanity check is sketched after the list):

Configuration 1:
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=10
with

Configuration 2:
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=10
with

Configuration 3:
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=10
with

Configuration 4:
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=10
with

Configuration 5:
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=10
with
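The sanity check mentioned above is only a sketch (it assumes horovod.tensorflow is importable inside the container); it just confirms that --ntasks and --gres=gpu:N map to the expected Horovod ranks and visible GPUs:

```python
# Per-task sanity check: report Horovod rank and visible GPUs.
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()
gpus = tf.config.experimental.list_physical_devices("GPU")
print(f"rank {hvd.rank()}/{hvd.size()} "
      f"(local rank {hvd.local_rank()}) sees {len(gpus)} GPU(s)")
```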
Run time: 1:25. We noticed:
Please @santiagocasas, add some more details if I forgot something or if I got some of the configuration descriptions wrong! |
@b-remy in case you have not seen it, this is super useful I think, it shows you what configurations give weird results |
@EiffL I almost forgot, @santiagocasas and I also noticed that if we use (for instance) the following configuration:
We find 80 CPUs (shouldn't we find 40?):
|
@dlanzieri I've noticed that in the nsys logs as well. It may be that each core has two hardware threads, so it would show up as 80 cores (just as it often does when you look at htop). I'm not sure that's the explanation, but it may be the case. |
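A quick way to check the hyperthreading hypothesis (psutil is an assumption here; it may need to be installed in the container):

```python
# Compare logical CPUs (hardware threads) with physical cores.
import os
import psutil

print("logical CPUs  :", os.cpu_count())
print("physical cores:", psutil.cpu_count(logical=False))
```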
Daily Report:
--nc=128 --batch_size=1 --nx=2 --ny=2 --hsize=32
This is what the Timeline Views look like for --nsteps=1, --nsteps=2, and --nsteps=3 (screenshots attached). These profiles lead us to believe that the gaps we can see in the CUDA HW records are related to the number of steps in the N-body simulation. |
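For reference, a hedged sketch of how such a sweep over --nsteps could be scripted with nsys; the benchmark script name fpm_benchmark.py and the nsys output naming are assumptions, not the exact commands used above:

```python
# Sweep --nsteps with the other benchmark flags fixed, one nsys report per run.
import subprocess

base_flags = ["--nc=128", "--batch_size=1", "--nx=2", "--ny=2", "--hsize=32"]

for nsteps in (1, 2, 3):
    cmd = [
        "nsys", "profile", "-o", f"timeline_nsteps{nsteps}",
        "python", "fpm_benchmark.py",   # hypothetical benchmark script
        *base_flags, f"--nsteps={nsteps}",
    ]
    subprocess.run(cmd, check=True)
```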
@dlanzieri @santiagocasas I've added support for NVTX annotations in Mesh TensorFlow, which you can use to probe the different sections of the code. To add annotations you can do the following:
import nvtx.plugins.tf as nvtx_tf
from nvtx.plugins.tf.estimator import NVTXHook
from mesh_tensorflow.nvtx_ops import add_nvtx
[....]
# inside your mesh tensorflow model, where XXXX is a mesh tensor
XXXX = add_nvtx(XXXX, message='a message', domain_name='nbody')
# In the session run:
nvtx_callback = NVTXHook(skip_n_steps=0, name='Train')
with tf.compat.v1.train.MonitoredSession(hooks=[nvtx_callback]) as sess:
    ...

And you need to have installed This will add small markers at the different places of the code where the marked tensors are used. I'm attaching a full example here: https://gist.github.com/EiffL/ae6f9d58e958e87f29c5e1bc0b11193a |
@EiffL do we have to install it in the
you just need to run |
yes, I did it there.
|
I think I narrowed down where the annoying part of the code happens: the highlighted region is between these markers (which you can find in the gist I linked above):

final_state0 = add_nvtx(final_state[0], message='before_paint', domain_name='nbody')
final_field = mesh_utils.cic_paint(final_field, final_state0, halo_size)
final_field = add_nvtx(final_field, message='after_paint', domain_name='nbody')

So it looks like something goes wrong in cic_paint. @dlanzieri @santiagocasas can you check that this makes sense, based on the tests you ran yesterday? And if it does, I guess that to figure out what's happening, you can add other NVTX tags inside the |
|
Hi @dlanzieri @santiagocasas, Based on the new horovod version and on some modifications of the If you want to be able to get the same results you will need to compile the new version of
with
with
with
with
|
that's really cool @b-remy ! |
Posting here the steps to run the code inside the Singularity container (based on Meriem's slack post):

- step 0
- request a node
- run the container
- Run the python "mesh" script
- Run the python "pyramid" script
|
Attaching here the terminal outputs for these two cases above. Profiling output can be found here:
|
Here are some dlprof profiles.
def _cic_paint(mesh, neighboor_coords, kernel, shift, name=None):
    """
    Paints particles on a 3D mesh.

    Parameters:
    -----------
    mesh: tensor (batch_size, nc, nc, nc)
        Input 3D mesh tensor
    neighboor_coords: tensor
        Indices of the 8 neighbouring cells of each particle (last two dims 8 and 4)
    kernel: tensor
        CIC interpolation weights for the 8 neighbouring cells of each particle
    shift: [x, y, z] array of coordinate shifting
    """
    with tf.name_scope(name, "cic_update", [mesh, neighboor_coords, kernel]):
        shape = tf.shape(mesh)
        batch_size = shape[0]
        nx, ny, nz = shape[-3], shape[-2], shape[-1]
        # TODO: Assert shift shape
        neighboor_coords = tf.reshape(neighboor_coords, (-1, 8, 4))
        neighboor_coords = neighboor_coords + tf.reshape(
            tf.constant(shift, dtype=tf.float32), [1, 1, 4])
        neighboor_coords = tf.cast(neighboor_coords, tf.int32)

        # Scatter the kernel weights onto the mesh; duplicated indices are summed.
        update = tf.scatter_nd(neighboor_coords, tf.reshape(kernel, (-1, 8)),
                               [batch_size, nx, ny, nz])
        mesh = mesh + tf.reshape(update, mesh.shape)
        return mesh
def _cic_readout(mesh, neighboor_coords, kernel, shift, name=None):
    """
    Reads out particle values from a 3D mesh.

    Parameters:
    -----------
    mesh: tensor (batch_size, nc, nc, nc)
        Input 3D mesh tensor
    neighboor_coords: tensor
        Indices of the 8 neighbouring cells of each particle (last two dims 8 and 4)
    kernel: tensor
        CIC interpolation weights for the 8 neighbouring cells of each particle
    shift: [x, y, z] array of coordinate shifting
    """
    with tf.name_scope(name, "cic_readout", [mesh, neighboor_coords, kernel]):
        shape = tf.shape(mesh)
        batch_size = shape[0]
        nx, ny, nz = shape[-3], shape[-2], shape[-1]
        # Index away the extra leading dimensions of the mesh slice.
        mesh = mesh[:, 0, 0, 0]
        shape_part = tf.shape(neighboor_coords)
        neighboor_coords = tf.reshape(neighboor_coords, (-1, 8, 4))
        neighboor_coords = neighboor_coords + tf.reshape(
            tf.constant(shift, dtype=tf.float32), [1, 1, 4])
        neighboor_coords = tf.cast(neighboor_coords, tf.int32)

        # Gather the 8 neighbouring cell values and combine them with the CIC weights.
        meshvals = tf.gather_nd(mesh, neighboor_coords)
        weightedvals = tf.multiply(meshvals, tf.reshape(kernel, (-1, 8)))
        value = tf.reduce_sum(weightedvals, axis=-1)
        value = tf.reshape(value, shape_part[:-2])
        return value
|
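To make the data layout above concrete, here is a toy, self-contained illustration of the scatter/gather pattern these two functions rely on. None of it is FlowPM code; it just paints one particle onto a 4^3 mesh with equal CIC weights and reads it back:

```python
import tensorflow as tf

batch_size, nc = 1, 4
mesh = tf.zeros([batch_size, nc, nc, nc])

# (n_particles, 8, 4) indices: [batch, ix, iy, iz] of the 8 neighbouring cells
neighboor_coords = tf.constant([
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]],
], dtype=tf.int32)
kernel = tf.fill([1, 8], 0.125)  # equal CIC weights summing to 1

# Paint: scatter the weights onto the mesh (duplicated indices are summed).
update = tf.scatter_nd(neighboor_coords, kernel, [batch_size, nc, nc, nc])
mesh = mesh + update
print(tf.reduce_sum(mesh).numpy())  # 1.0: total painted mass of one particle

# Readout: gather the same cells back and re-weight them with the kernel.
meshvals = tf.gather_nd(mesh, neighboor_coords)
value = tf.reduce_sum(meshvals * kernel, axis=-1)
print(value.numpy())  # CIC readout at the particle position
```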
This issue is to track the work on benchmarking the new Horovod backend for GPU clusters and getting profiling information for FlowPM.
We want to do the following things:
To learn how to do this profiling, keep an eye on DifferentiableUniverseInitiative/IDRIS-hackathon#2