A prototype system for distributed Mixture-of-Experts (MoE) training, built on FastMoE. [Work in progress]
- PyTorch >= 1.10.0
- CUDA >= 10
- FastMoE == 1.1.0
If the distributed expert feature is enabled, NCCL with P2P communication support (typically version >= 2.7.5) is needed.
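
A quick way to sanity-check the installed versions (an optional check, not part of the setup scripts; it assumes FastMoE is importable as the `fmoe` package):

```bash
# Print the PyTorch, CUDA, and NCCL versions as seen by torch.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.nccl.version())"
# Fails with ImportError if FastMoE is not installed.
python -c "import fmoe"
```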
```bash
git clone https://github.com/ChenfhCS/MoE.git
cd MoE/ && pip install -r requirements.txt
cd examples/
```
- Change `path/to/fmoe` to your local FastMoE path in `fmoe_update.sh`
- Change `path/to/transformers` to your local transformers path in `fmoe_update.sh` (see the sketch below this list)
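
For example, the placeholders can be filled in with `sed`; the paths below are hypothetical and should point at your own checkouts:

```bash
# Hypothetical local paths; substitute your own checkout locations.
sed -i "s|path/to/fmoe|$HOME/fastmoe|" fmoe_update.sh
sed -i "s|path/to/transformers|$HOME/transformers|" fmoe_update.sh
# Prints nothing if all placeholders were replaced.
grep -n "path/to" fmoe_update.sh
```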
```bash
bash fmoe_update.sh && bash update_model.sh
```
Launch training with one of the provided scripts (the comments are inferred from the script names, not verified against the repo):

```bash
bash run.sh xl          # Transformer-XL
bash run.sh bert        # BERT
bash run.sh gpt2        # GPT-2
bash run_dp.sh bert     # data-parallel run
bash run_dist.sh bert   # distributed run (requires NCCL with P2P support)
```
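
When debugging the distributed run, NCCL's standard environment variables apply; these are generic NCCL settings, not options defined by this repo:

```bash
# Verbose NCCL logging; useful for diagnosing P2P/communication issues.
NCCL_DEBUG=INFO bash run_dist.sh bert
```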