ScheMoE
This code is developed with reference to Tutel.
Requirements: torch>=1.9.1
# Install zfp
git clone https://github.com/Fragile-azalea/zfp.git
cd zfp
mkdir build
cd build
cmake ..
cmake --build . --config Release
ctest
cd ../..
# Install ScheMoE
git clone https://github.com/Fragile-azalea/ScheMoE.git
cd ScheMoE
# You may need to adjust include_dirs and library_dirs in setup.py to match your local paths (e.g., the zfp build)
pip install -e .
# Single machine (4 processes, one per GPU):
python3 -m torch.distributed.run --nproc_per_node=4 -m schemoe.examples.pre_test --batch_size=16
# Distributed (multiple machines):
# Please refer to schemoe/examples/run_mpi.sh
To add a custom compressor:
- Navigate to the schemoe/custom/compressor/ directory.
- Create a new compressor class that inherits from AbstractCompressor.
- In your new class, implement the virtual functions declared in abstract.h in that directory.
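The steps above can be illustrated with a minimal C++ sketch. The real virtual functions in schemoe/custom/compressor/abstract.h are not reproduced here, so the AbstractCompressor below is a hypothetical stand-in with assumed compress/decompress signatures, and Bf16TruncCompressor is a hypothetical example class. It uses plain bfloat16-style truncation (keeping the upper 16 bits of each float32), a swapped-in technique chosen only because it is short, not the zfp-based compression ScheMoE builds against. Match the method names and signatures to the actual header before adapting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// HYPOTHETICAL stand-in for AbstractCompressor from
// schemoe/custom/compressor/abstract.h; the real virtual functions
// and their signatures may differ.
class AbstractCompressor {
public:
    virtual ~AbstractCompressor() = default;
    // Compress n floats into a byte buffer.
    virtual std::vector<std::uint8_t> compress(const float* data, std::size_t n) = 0;
    // Restore n floats from a buffer produced by compress().
    virtual void decompress(const std::vector<std::uint8_t>& buf, float* out, std::size_t n) = 0;
};

// HYPOTHETICAL example compressor: keeps only the upper 16 bits of each
// float32 (bfloat16-style truncation, no rounding), halving the payload.
class Bf16TruncCompressor : public AbstractCompressor {
public:
    std::vector<std::uint8_t> compress(const float* data, std::size_t n) override {
        std::vector<std::uint8_t> buf(n * 2);
        for (std::size_t i = 0; i < n; ++i) {
            std::uint32_t bits;
            std::memcpy(&bits, &data[i], sizeof(bits));  // reinterpret float as bits
            const std::uint16_t hi = static_cast<std::uint16_t>(bits >> 16);
            std::memcpy(&buf[2 * i], &hi, sizeof(hi));
        }
        return buf;
    }

    void decompress(const std::vector<std::uint8_t>& buf, float* out, std::size_t n) override {
        assert(buf.size() == n * 2);  // buffer must come from compress() with the same n
        for (std::size_t i = 0; i < n; ++i) {
            std::uint16_t hi;
            std::memcpy(&hi, &buf[2 * i], sizeof(hi));
            const std::uint32_t bits = static_cast<std::uint32_t>(hi) << 16;  // zero-fill low bits
            std::memcpy(&out[i], &bits, sizeof(bits));
        }
    }
};
```

Truncation trades mantissa precision for a 2x smaller payload; values whose low 16 bits are already zero (such as 1.0f or -2.5f) round-trip exactly.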
To add a custom communication (comm) implementation:
- Navigate to the schemoe/custom/comm/ directory.
- Create a new comm class that inherits from AbstractComm.
- In your new class, implement the virtual functions declared in abstract.h in that directory.
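Similarly, here is a minimal C++ sketch of a comm class. AbstractComm and its all_to_all method are hypothetical stand-ins for whatever schemoe/custom/comm/abstract.h actually declares, and LocalAllToAll is a hypothetical class that simulates every rank in one process to show the all-to-all data movement: each rank splits its buffer into world-size equal chunks, and rank d receives chunk d from every rank, in rank order. A real implementation would exchange the chunks across GPUs (for example with NCCL or MPI) instead of copying vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in for AbstractComm from schemoe/custom/comm/abstract.h;
// the real virtual functions and their signatures may differ.
class AbstractComm {
public:
    virtual ~AbstractComm() = default;
    // send[r] is rank r's input, split into send.size() equal chunks.
    // Returns recv, where recv[d] holds chunk d from every rank, in rank order.
    virtual std::vector<std::vector<float>> all_to_all(
        const std::vector<std::vector<float>>& send) = 0;
};

// HYPOTHETICAL reference all-to-all that runs all "ranks" in one process.
class LocalAllToAll : public AbstractComm {
public:
    std::vector<std::vector<float>> all_to_all(
        const std::vector<std::vector<float>>& send) override {
        const std::size_t world = send.size();
        std::vector<std::vector<float>> recv(world);
        if (world == 0) return recv;
        assert(send[0].size() % world == 0);  // buffers must split evenly
        const std::size_t chunk = send[0].size() / world;
        for (std::size_t dst = 0; dst < world; ++dst) {
            recv[dst].reserve(chunk * world);
            for (std::size_t src = 0; src < world; ++src) {
                assert(send[src].size() == chunk * world);  // all ranks send equal sizes
                const auto first =
                    send[src].begin() + static_cast<std::ptrdiff_t>(dst * chunk);
                // Append chunk `dst` of rank `src` to rank `dst`'s receive buffer.
                recv[dst].insert(recv[dst].end(), first,
                                 first + static_cast<std::ptrdiff_t>(chunk));
            }
        }
        return recv;
    }
};
```

For two ranks with inputs {0, 1, 2, 3} and {10, 11, 12, 13}, rank 0 receives {0, 1, 10, 11} and rank 1 receives {2, 3, 12, 13}.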
Environment:
- g++ 7.5.0
- CUDA 10.2
- GPU: NVIDIA GeForce RTX 2080 Ti