From 539e874021ece21eb33d210fa8202786e2995d95 Mon Sep 17 00:00:00 2001
From: sndnyang
Date: Tue, 22 Mar 2022 14:46:58 -0400
Subject: [PATCH 1/4] Update README.md

---
 README.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 757eecb6..0c4e2ebc 100755
--- a/README.md
+++ b/README.md
@@ -12,14 +12,26 @@ The random direction(s) and loss surface values are stored in HDF5 (`.h5`) files
 
 ## Setup
 
+
 **Environment**: One or more multi-GPU node(s) with the following software/libraries installed:
-- [PyTorch 0.4](https://pytorch.org/)
-- [openmpi 3.1.2](https://www.open-mpi.org/)
+- [PyTorch 1.3.1](https://pytorch.org/)
+- [openmpi 3.1.2](https://www.open-mpi.org/) and `sudo yum install openmpi-devel`
 - [mpi4py 2.0.0](https://mpi4py.scipy.org/docs/usrman/install.html)
 - [numpy 1.15.1](https://docs.scipy.org/doc/numpy/user/quickstart.html)
 - [h5py 2.7.0](http://docs.h5py.org/en/stable/build.html#install)
-- [matplotlib 2.0.2](https://matplotlib.org/users/installing.html)
-- [scipy 0.19](https://www.scipy.org/install.html)
+- [matplotlib ](https://matplotlib.org/users/installing.html)
+- [scipy ](https://www.scipy.org/install.html)
+- scikit-learn
+- seaborn
+For python library, you can use `pip install -r requirements.txt`
+
+
+You *may* need to run the command to enable the `mpi`.
+```
+module load mpi
+```
+
+
 
 **Pre-trained models**:
 The code accepts pre-trained PyTorch models for the CIFAR-10 dataset.
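
Since the Setup list above pins specific versions, a quick check after installation can show whether the active environment matches them. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, the pins are copied from the list in this patch, and it assumes each package exposes a `__version__` attribute on its top-level module.

```python
import importlib

# Pins copied from the Setup list in the patch above; adjust them if your
# requirements.txt differs.
PINS = {
    "torch": "1.3.1",
    "mpi4py": "2.0.0",
    "numpy": "1.15.1",
    "h5py": "2.7.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Ignore a local build suffix such as "+cu92" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINS).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed.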
From 7116895a0a53a3b28ae99708f3aed728217c9d49 Mon Sep 17 00:00:00 2001
From: Yang Xiulong
Date: Tue, 22 Mar 2022 15:27:39 -0400
Subject: [PATCH 2/4] more details for setting environment

---
 .gitignore       |  5 +++++
 README.md        |  6 +++---
 requirements.txt | 11 +++++++++++
 3 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 requirements.txt

diff --git a/.gitignore b/.gitignore
index 6ca861c9..dee81b92 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,8 @@ cifar10/plots/*.png
 cifar10/plots/*.h5
 cifar10/data/
 cifar10/trained_nets/
+resnet*
+temp/
+openmpi*
+.vscode/
+
diff --git a/README.md b/README.md
index 0c4e2ebc..dee8c070 100755
--- a/README.md
+++ b/README.md
@@ -19,18 +19,18 @@ The random direction(s) and loss surface values are stored in HDF5 (`.h5`) files
 - [mpi4py 2.0.0](https://mpi4py.scipy.org/docs/usrman/install.html)
 - [numpy 1.15.1](https://docs.scipy.org/doc/numpy/user/quickstart.html)
 - [h5py 2.7.0](http://docs.h5py.org/en/stable/build.html#install)
-- [matplotlib ](https://matplotlib.org/users/installing.html)
+- [matplotlib](https://matplotlib.org/users/installing.html)
 - [scipy ](https://www.scipy.org/install.html)
 - scikit-learn
 - seaborn
-For python library, you can use `pip install -r requirements.txt`
-
+You need to first install openmpi or openmpilib, then
 
 You *may* need to run the command to enable the `mpi`.
 ```
 module load mpi
 ```
+For python library, you can use `pip install -r requirements.txt`
 
 
 
 **Pre-trained models**:
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 00000000..900d6068
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,11 @@
+h5py==2.7.0
+matplotlib==3.3.4
+mpi4py==2.0.0
+numpy==1.15.1
+pandas==0.23.4
+Pillow==8.4.0
+scikit-learn==0.19.2
+scipy==0.19.0
+seaborn==0.9.0
+torch==1.3.1
+torchvision==0.4.2

From c92699063db6c564374d8dd21f2d59e57e7e9e47 Mon Sep 17 00:00:00 2001
From: sndnyang
Date: Tue, 22 Mar 2022 16:27:23 -0400
Subject: [PATCH 3/4] Update README.md

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index dee8c070..5dda650f 100755
--- a/README.md
+++ b/README.md
@@ -72,8 +72,8 @@ Then we can sample loss values along this direction.
 
 ```
 mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-1:1:51 \
---model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
---dir_type weights --xnorm filter --xignore biasbn --plot
+--dir_type weights --xnorm filter --xignore biasbn --plot \
+--model_file vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7
 ```
 - `--dir_type weights` indicates the direction has the same dimensions as the learned parameters, including bias and parameters in the BN layers.
 - `--xnorm filter` normalizes the random direction at the filter level. Here, a "filter" refers to the parameters that produce a single feature map. For fully connected layers, a "filter" contains the weights that contribute to a single neuron.
@@ -93,8 +93,8 @@ To plot the loss contours, we choose two random directions and normalize them in
 
 ```
 mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
---model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
---dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn --plot
+--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn --plot \
+--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7
 ```
 
 ![ResNet-56](doc/images/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,51]x[-1.0,1.0,51].h5_train_loss_2dcontour.jpg)
@@ -102,7 +102,7 @@ mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y
 Once a surface is generated and stored in a `.h5` file, we can produce and customize a contour plot using the script `plot_2D.py`.
 
 ```
-python plot_2D.py --surf_file path_to_surf_file --surf_name train_loss
+python plot_2D.py --surf_name train_loss --surf_file path_to_surf_file
 ```
 - `--surf_name` specifies the type of surface. The default choice is `train_loss`,
 - `--vmin` and `--vmax` sets the range of values to be plotted.
@@ -118,7 +118,7 @@ If you want a more detailed rendering that uses lighting to display details, you
 To do this, you must
 1. Convert the surface `.h5` file to a `.vtp` file.
 ```
-python h52vtp.py --surf_file path_to_surf_file --surf_name train_loss --zmax 10 --log
+python h52vtp.py --surf_name train_loss --zmax 10 --log --surf_file path_to_surf_file
 ```
 This will generate a [VTK](https://www.kitware.com/products/books/VTKUsersGuide.pdf) file containing the loss surface with max value 10 in the log scale.
 
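
The commands touched by this patch pass grid ranges such as `--x=-1:1:51`. The sketch below shows one way such a value could be expanded into plot coordinates, assuming the format is `min:max:num` with both endpoints included. That reading is an assumption, since the patch itself does not define the format, and `parse_range` and `coordinates` are hypothetical helpers rather than functions from `plot_surface.py`.

```python
def parse_range(spec):
    """Split a 'min:max:num' string into (min, max, num)."""
    lo, hi, num = spec.split(":")
    return float(lo), float(hi), int(num)


def coordinates(spec):
    """Return `num` evenly spaced values from min to max, endpoints included."""
    lo, hi, num = parse_range(spec)
    if num < 2:
        # A single sample has no spacing; return just the lower bound.
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


if __name__ == "__main__":
    xs = coordinates("-1:1:51")
    print(len(xs), xs[0], xs[-1])
```

Under this reading, a 2D run with `--x=-1:1:51 --y=-1:1:51` evaluates the loss on a 51 by 51 grid, which is why the command is distributed over several MPI ranks.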
From f2dbdf5345a460b5dc49faf59701700a0d065580 Mon Sep 17 00:00:00 2001
From: sndnyang
Date: Tue, 22 Mar 2022 16:31:56 -0400
Subject: [PATCH 4/4] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 5dda650f..42acd911 100755
--- a/README.md
+++ b/README.md
@@ -113,6 +113,10 @@ python plot_2D.py --surf_name train_loss --surf_file path_to_surf_file
 `plot_2D.py` can make a basic 3D loss surface plot with `matplotlib`.
 If you want a more detailed rendering that uses lighting to display details, you can render the loss surface with [ParaView](http://paraview.org).
 
+```
+MESA_GL_VERSION_OVERRIDE=3.2 ./paraview
+```
+
 ![ResNet-56-noshort](doc/images/resnet56_noshort_small.jpg) ![ResNet-56](doc/images/resnet56_small.jpg)
 
 To do this, you must
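
The shell line added in this patch sets `MESA_GL_VERSION_OVERRIDE` for a single ParaView launch. When ParaView is started from a script instead, the same effect can be had by passing a modified environment to the child process. This is a minimal sketch under assumptions: `paraview_env` and `launch_paraview` are hypothetical helper names, and `./paraview` is taken from the patch and presumes the working directory contains the ParaView binary.

```python
import os
import subprocess


def paraview_env(gl_version="3.2", base_env=None):
    """Copy the environment and add the Mesa OpenGL version override."""
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_GL_VERSION_OVERRIDE"] = gl_version
    return env


def launch_paraview(executable="./paraview"):
    """Start ParaView as a child process with the override set."""
    return subprocess.Popen([executable], env=paraview_env())
```

Copying the environment rather than editing `os.environ` keeps the override scoped to the ParaView process, which mirrors the one-off prefix form used in the README. Whether the override is needed at all depends on the OpenGL version the local Mesa driver reports.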