ChenfhCS/MoE

Introduction

A prototype distributed Mixture-of-Experts (MoE) system built on FastMoE. [Work in progress]

Contents

  • Installation
  • Usage

Installation

Prerequisites

  • PyTorch >= 1.10.0
  • CUDA >= 10
  • FastMoE == 1.1.0

If the distributed expert feature is enabled, NCCL with P2P communication support (typically version >= 2.7.5) is required.
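
A quick way to verify the installed versions, assuming a CUDA build of PyTorch (torch.cuda.nccl.version() reports the NCCL version bundled with PyTorch, and the fmoe import simply confirms that FastMoE is importable):

python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import torch; print(torch.cuda.nccl.version())"
python -c "import fmoe; print('FastMoE found at', fmoe.__file__)"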

Installing

git clone https://github.com/ChenfhCS/MoE.git

Install the prerequisites

cd MoE/ && pip install -r requirements.txt

Apply the provided modifications to fastmoe and Transformers

cd examples/
  1. In fmoe_update.sh, replace path/to/fmoe with the path to your fastmoe installation (see the sketch below for locating it).
  2. In fmoe_update.sh, replace path/to/transformers with the path to your transformers installation.
bash fmoe_update.sh && bash update_model.sh
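
If you are not sure where fastmoe and transformers live on your system, the snippet below is a minimal sketch of the substitution; it assumes the placeholders in fmoe_update.sh are literally path/to/fmoe and path/to/transformers, as described in the steps above, and that the script expects the installed package directories.

# Locate the installed packages via their Python import paths
FMOE_PATH=$(python -c "import fmoe, os; print(os.path.dirname(fmoe.__file__))")
TRANSFORMERS_PATH=$(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
# Rewrite the placeholder paths in fmoe_update.sh in place
sed -i "s|path/to/fmoe|${FMOE_PATH}|g" fmoe_update.sh
sed -i "s|path/to/transformers|${TRANSFORMERS_PATH}|g" fmoe_update.sh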

Usage

Run MoE on a Single GPU

Transformer-XL

bash run.sh xl

BERT

bash run.sh bert

GPT-2

bash run.sh gpt2
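
To pin a single-GPU run to a specific device, the standard CUDA_VISIBLE_DEVICES environment variable can be set; this assumes run.sh does not override device selection itself.

CUDA_VISIBLE_DEVICES=0 bash run.sh gpt2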

Run MoE on Multiple GPUs with Data Parallelism

bash run_dp.sh bert

Run MoE on Multiple GPUs with Expert Parallelism

bash run_dist.sh bert
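
Expert parallelism relies on the NCCL P2P support listed in the prerequisites. If a distributed run hangs or fails at startup, NCCL's own debug output is often the quickest diagnostic; NCCL_DEBUG is a standard NCCL environment variable, not something specific to this repository, and restricting visible devices works the same way as in the single-GPU case.

NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=0,1 bash run_dist.sh bert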
