Releases · FlagOpen/FlagScale
v0.6.0
- Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.
- Added comprehensive support for data processing and faster distributed training of LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
- Open-sourced the optimized CFG implementation and accelerated generation and understanding tasks for Emu3 (a minimal sketch of the CFG idea follows this list).
- Implemented the auto-tuning feature to simplify large-scale distributed training, making it more accessible to users with limited distributed-training expertise (a conceptual sketch of the strategy search also follows this list).
- Enhanced the CI/CD system to enable more efficient unit testing across different backends and to run loss checks for the various parallel strategies.
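
For context, classifier-free guidance (CFG) blends conditional and unconditional predictions at inference time. The PyTorch sketch below shows only the core blending step; `model`, the tensor shapes, and the batched forward pass are illustrative assumptions, not FlagScale's optimized implementation.

```python
import torch

def cfg_logits(model, cond_ids, uncond_ids, guidance_scale=3.0):
    # Classifier-free guidance: run the conditional and unconditional
    # branches in a single batched forward pass, then blend the logits as
    # uncond + scale * (cond - uncond). `model` is a placeholder callable
    # returning per-token logits, not FlagScale's optimized kernel.
    batch = torch.cat([cond_ids, uncond_ids], dim=0)
    logits = model(batch)
    cond, uncond = logits.chunk(2, dim=0)
    return uncond + guidance_scale * (cond - uncond)
```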
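The auto-tuner's job can be pictured as a search over parallel strategies. The sketch below illustrates only the enumeration step, with hypothetical names (`candidate_configs`, `max_tp`, `max_pp`); an actual tuner would additionally benchmark or model each candidate and pick the fastest feasible one.

```python
import itertools

def candidate_configs(world_size, max_tp=8, max_pp=8):
    # Enumerate (tensor, pipeline, data) parallel sizes whose product
    # covers all devices; a real tuner would then measure or model the
    # throughput of each candidate configuration.
    for tp, pp in itertools.product(range(1, max_tp + 1), range(1, max_pp + 1)):
        if world_size % (tp * pp) == 0:
            yield {"tensor": tp, "pipeline": pp, "data": world_size // (tp * pp)}
```

For example, `list(candidate_configs(8, max_tp=2, max_pp=2))` includes configurations such as `{"tensor": 2, "pipeline": 2, "data": 2}`.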
v0.3
v0.2
- Provide the actual training scheme used for Aquila2-70B-Expr, including the parallel strategies, optimizations, and hyperparameter settings.
- Support heterogeneous training on chips of different generations that share the same or compatible architectures, including NVIDIA GPUs and Iluvatar CoreX chips (see the sketch after this list).
- Support training on Chinese domestic hardware, including Iluvatar CoreX and Baidu KUNLUN chips.
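
One way to picture heterogeneous training across chips of different generations is an uneven pipeline split, where faster chips take more layers. The helper below (`split_layers`, `device_flops`) is a hypothetical illustration of that idea, not FlagScale's API.

```python
def split_layers(num_layers, device_flops):
    # Assign each pipeline stage a layer count proportional to its chip's
    # throughput, so slower chips receive fewer layers. Purely illustrative.
    total = sum(device_flops)
    counts = [max(1, round(num_layers * f / total)) for f in device_flops]
    counts[-1] += num_layers - sum(counts)  # absorb rounding drift
    assert counts[-1] > 0, "too few layers for this device mix"
    return counts

# e.g. 48 layers over two newer GPUs and two older chips at half throughput:
print(split_layers(48, [2.0, 2.0, 1.0, 1.0]))  # -> [16, 16, 8, 8]
```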