Merge pull request #12 from DragonHPC/version-update-0.8
merged release 0.8 code
Showing 168 changed files with 9,704 additions and 1,068 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
alabaster==0.7.12 | ||
-attrs==22.1.0 | ||
+attrs==23.1.0 | ||
Babel==2.11.0 | ||
black==22.10.0 | ||
breathe==4.34.0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,3 +17,4 @@ sphinx-copybutton | |
vacuum | ||
wheel | ||
jupyter | ||
parsl |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
PyTorch Dataset Usage with Dragon Distributed Dictionary | ||
++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||
|
||
This example shows how a PyTorch dataset can store its data in a Dragon distributed dictionary. | ||
In principle, the distributed dictionary could also be shared with other processes that interact with the training data between training iterations. | ||
The program must be run on nodes with GPUs. | ||
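
The idea of a dataset that reads its samples from a dictionary can be sketched as follows. This is a minimal illustration, not the code in ``dict_torch_dataset.py``: the names ``DictBackedDataset`` and ``fill_store`` are hypothetical, a plain Python ``dict`` stands in for the Dragon distributed dictionary, and the sketch assumes the distributed dictionary supports the same ``store[key]`` item access.

```python
from collections.abc import MutableMapping

try:
    # Use the real PyTorch base class when PyTorch is installed.
    from torch.utils.data import Dataset
except ImportError:
    # Fall back to a plain base class so the sketch still runs without PyTorch.
    Dataset = object


class DictBackedDataset(Dataset):
    """Map-style dataset whose samples live in a dict-like store.

    Assumption: `store` maps integer indices 0..size-1 to (sample, label)
    pairs. A plain dict is used here as a stand-in for a distributed one.
    """

    def __init__(self, store: MutableMapping, size: int):
        self.store = store
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int):
        # Every lookup goes through the store, so any process that can
        # reach the same store sees the same samples.
        return self.store[idx]


def fill_store(store: MutableMapping, samples) -> int:
    """Copy (sample, label) pairs into the store under consecutive integer keys."""
    count = 0
    for item in samples:
        store[count] = item
        count += 1
    return count


if __name__ == "__main__":
    store = {}
    n = fill_store(store, [([0.0, 1.0], 0), ([1.0, 0.0], 1)])
    dataset = DictBackedDataset(store, n)
    print(len(dataset), dataset[1])
```

Because the dataset only needs ``__len__`` and ``__getitem__``, a PyTorch ``DataLoader`` can iterate over it without knowing where the samples are stored.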
|
||
The code demonstrates how the following key concepts work with Dragon: | ||
|
||
* How to use Dragon with the PyTorch dataloader and a neural network model to train on GPUs | ||
* How to use the distributed Dragon dictionary with multiprocessing queues | ||
|
||
.. literalinclude:: ../../examples/dragon_ai/dict_torch_dataset.py | ||
|
||
Installation | ||
============ | ||
|
||
After installing Dragon, the only other dependency is PyTorch. Select the PyTorch version and the matching pip command at https://pytorch.org/get-started/locally/. | ||
|
||
``` | ||
> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 | ||
``` | ||
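
Since the example needs GPUs, it can help to confirm that the installed PyTorch build sees a CUDA device before launching a job. The helper below is a hypothetical convenience function, not part of Dragon or of the example script; it only uses ``torch.__version__``, ``torch.cuda.is_available()``, and ``torch.cuda.device_count()``.

```python
import importlib.util


def describe_torch_install() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "device_count": 0,
    }
    # Check for the package first so a missing install does not raise.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    print(describe_torch_install())
```

If ``cuda_available`` is ``False`` on a GPU node, the installed wheel may not match the CUDA version selected on the PyTorch download page.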
|
||
Description of the system used | ||
============================== | ||
|
||
For this example, an HPE Cray EX was used. Each node has AMD EPYC 7763 64-core CPUs and 4x NVIDIA A100 GPUs. | ||
|
||
How to run | ||
========== | ||
|
||
Example Output when run on 2 nodes with 2 MNIST workers, 1 device per node, 2 epochs, CUDA training, 4 dragon dict managers, and dragon dict memory. | ||
------------------------------------------------------------------------------------- | ||
|
||
.. code-block:: console | ||
:linenos: | ||
> salloc --nodes=2 -p allgriz --exclusive -t 1:00:00 | ||
> dragon dict_torch_dataset.py --mnist-workers 4 --devices-per-node 1 --epochs 2 | ||
Number of nodes: 2 | ||
Number of MNIST workers: 2 | ||
Number of dragon dict managers: 4 | ||
100.0% | ||
100.0% | ||
100.0% | ||
100.0% | ||
Rank 0 Train Epoch: 1 [0/60000 (0%)] Loss: 2.316082 | ||
Rank 1 Train Epoch: 1 [0/60000 (0%)] Loss: 2.313832 | ||
Rank 0 Train Epoch: 1 [6400/60000 (11%)] Loss: 0.268168 | ||
Rank 1 Train Epoch: 1 [6400/60000 (11%)] Loss: 0.436355 | ||
Rank 0 Train Epoch: 1 [12800/60000 (21%)] Loss: 0.190972 | ||
Rank 1 Train Epoch: 1 [12800/60000 (21%)] Loss: 0.205474 | ||
Rank 0 Train Epoch: 1 [19200/60000 (32%)] Loss: 0.187326 | ||
Rank 1 Train Epoch: 1 [19200/60000 (32%)] Loss: 0.568415 | ||
Rank 0 Train Epoch: 1 [25600/60000 (43%)] Loss: 0.093499 | ||
Rank 1 Train Epoch: 1 [25600/60000 (43%)] Loss: 0.058430 | ||
Rank 0 Train Epoch: 1 [32000/60000 (53%)] Loss: 0.060121 | ||
Rank 1 Train Epoch: 1 [32000/60000 (53%)] Loss: 0.149605 | ||
Rank 0 Train Epoch: 1 [38400/60000 (64%)] Loss: 0.156384 | ||
Rank 1 Train Epoch: 1 [38400/60000 (64%)] Loss: 0.119814 | ||
Rank 0 Train Epoch: 1 [44800/60000 (75%)] Loss: 0.082197 | ||
Rank 1 Train Epoch: 1 [44800/60000 (75%)] Loss: 0.096987 | ||
Rank 0 Train Epoch: 1 [51200/60000 (85%)] Loss: 0.053689 | ||
Rank 1 Train Epoch: 1 [51200/60000 (85%)] Loss: 0.101078 | ||
Rank 0 Train Epoch: 1 [57600/60000 (96%)] Loss: 0.031515 | ||
Rank 1 Train Epoch: 1 [57600/60000 (96%)] Loss: 0.090198 | ||
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz | ||
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./torch-data-dict/data/MNIST/raw/train-images-idx3-ubyte.gz | ||
Extracting ./torch-data-dict/data/MNIST/raw/train-images-idx3-ubyte.gz to ./torch-data-dict/data/MNIST/raw | ||
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz | ||
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./torch-data-dict/data/MNIST/raw/train-labels-idx1-ubyte.gz | ||
Extracting ./torch-data-dict/data/MNIST/raw/train-labels-idx1-ubyte.gz to ./torch-data-dict/data/MNIST/raw | ||
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz | ||
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./torch-data-dict/data/MNIST/raw/t10k-images-idx3-ubyte.gz | ||
Extracting ./torch-data-dict/data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./torch-data-dict/data/MNIST/raw | ||
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz | ||
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./torch-data-dict/data/MNIST/raw/t10k-labels-idx1-ubyte.gz | ||
Extracting ./torch-data-dict/data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./torch-data-dict/data/MNIST/raw | ||
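
Output from the ranks is interleaved, so comparing how the loss evolves on each rank is easier after grouping the lines. The sketch below is a hypothetical post-processing helper, not part of the example script; it assumes the ``Rank N Train Epoch: ...`` line format shown in the output above and skips every other line.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Rank 0 Train Epoch: 1 [6400/60000 (11%)] Loss: 0.268168
LINE_RE = re.compile(
    r"Rank (?P<rank>\d+) Train Epoch: (?P<epoch>\d+) "
    r"\[(?P<seen>\d+)/(?P<total>\d+) \(\d+%\)\]\s+Loss: (?P<loss>\d+\.\d+)"
)


def parse_losses(lines):
    """Group (epoch, samples_seen, loss) tuples by rank.

    Lines that do not match the training-progress format (download
    messages, progress percentages, and so on) are ignored.
    """
    by_rank = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            by_rank[int(match["rank"])].append(
                (int(match["epoch"]), int(match["seen"]), float(match["loss"]))
            )
    return dict(by_rank)


if __name__ == "__main__":
    sample = [
        "100.0%",
        "Rank 0 Train Epoch: 1 [0/60000 (0%)] Loss: 2.316082",
        "Rank 1 Train Epoch: 1 [0/60000 (0%)] Loss: 2.313832",
        "Rank 0 Train Epoch: 1 [6400/60000 (11%)] Loss: 0.268168",
    ]
    for rank, points in sorted(parse_losses(sample).items()):
        print(rank, points)
```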