v0.0.2
This release mainly includes the following improvements:
- More unit tests.
- Add `.fuse()` and related primitives (see the sketch after this list).
- Improve overall training efficiency of GPT models by adding sequence parallelism, tied-weight support, etc.
- Documentation and tutorials.
- Bug fixes.
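
As a quick reference, below is a minimal sketch of how the new `.fuse()` primitive can be used to fuse a bias-add + GeLU subgraph. The toy model, the pattern function, and the `compiler`/`name` arguments are illustrative assumptions rather than the definitive API; see the documentation added in #63 for authoritative usage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import slapo

# Toy stand-in for a GPT MLP with an explicit bias-add + GeLU pattern.
class MLP(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.fc = nn.Linear(d, d, bias=False)
        self.bias = nn.Parameter(torch.zeros(d))

    def forward(self, x):
        return F.gelu(self.fc(x) + self.bias)

class Block(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.mlp = MLP(d)

    def forward(self, x):
        return self.mlp(x)

sch = slapo.create_schedule(Block())

# Trace the submodule so its dataflow graph is available for matching.
sch["mlp"].trace()

# Pattern describing the subgraph to fuse: bias-add followed by GeLU.
def pattern(x, bias):
    return F.gelu(x + bias)

# Match the pattern and fuse it into a single compiled module.
# The compiler backend and fused-module name are assumed examples.
subgraph = sch["mlp"].find(pattern)
sch["mlp"].fuse(subgraph, compiler="TorchScript", name="FusedBiasGeLU")

# Materialize the scheduled model.
built = slapo.build(sch)
```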
What's Changed
- [Release] Setup wheel and release scripts by @comaniac in #18
- [Pipeline] Drop last batch in DeepSpeed scripts by @comaniac in #19
- [Examples] Add disable_flash_attn by @chhzh123 in #22
- [Bugfix] Fix sequence parallelism by @szhengac in #20
- [Schedule][replace] Transfer hooks when replacing modules by @comaniac in #27
- [Bugfix] Fix GPT script by @szhengac in #26
- [Bugfix] Transfer hooks in pipeline modules by @comaniac in #28
- [Tracer] Add flatten argument to .trace() by @chhzh123 in #29
- [Benchmark] Fix ZeRO-3 step log by @comaniac in #31
- [Bugfix] Fix for sharding TP only by @zarzen in #32
- [Primitive][shard] Use autograd function for all sync ops by @comaniac in #33
- [Bugfix] Using None for mpu when PP > 1 by @zarzen in #34
- [Bugfix] Fix GPT script by @szhengac in #36
- [Schedule] Refactor subgraph matching by @chhzh123 in #35
- [Schedule] Add .fuse() primitive by @chhzh123 in #25
- [Setup] Fix dependency by @chhzh123 in #39
- [Random] Random state management by @comaniac in #38
- [GPT] Use flash-attention and enable dropout by @comaniac in #40
- [Op] Add attention and bias_gelu ops by @comaniac in #41
- [Tracer] Remove SelfAttention renaming by @chhzh123 in #44
- [Model] Add HuggingFace GPT-2 by @comaniac in #45
- [Op] Refactor qkv processing by @comaniac in #46
- Add num_workers to GPT dataloader by @szhengac in #48
- [Op] Add flash-attention CUDA kernel by @comaniac in #49
- [Bugfix] Fix tensor device by @szhengac in #50
- [Example] Use .fuse() primitive when possible by @chhzh123 in #42
- [Refactor] model_dialect -> framework_dialect by @comaniac in #51
- [Test] Add default initialization test by @chhzh123 in #54
- [Schedule] Create subschedule for subgraph replacement by @chhzh123 in #52
- [Schedule] Support partial checkpointing by @chhzh123 in #55
- [DeepSpeed] Support TP=nGPU and PP=DP=1 by @comaniac in #56
- [Examples] Move examples to slapo.model_schedule by @chhzh123 in #53
- [Bugfix] Support tree-like subgraph matching by @chhzh123 in #58
- [Bugfix] Consolidate params with orig size by @comaniac in #59
- [Bugfix] Fix a small device bug by @szhengac in #57
- [README] Temporary remove paper info by @comaniac in #60
- Add param_name to shard infer type and fix consolidate by @comaniac in #62
- [Feature] Layernorm Tag by @szhengac in #61
- [Docs] Add initial documentations by @chhzh123 in #63
- Enable launch training with torchrun by @zarzen in #64
- [Examples] Enable launch with torchrun by @comaniac in #65
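
To make two of the schedule-related entries above more concrete (#29's flatten argument to .trace() and #55's partial checkpointing), here is a hedged sketch under assumed APIs. The toy model, the module names, and the every-other-layer choice are illustrative only, not prescribed by the release.

```python
import torch.nn as nn
import slapo

# Toy two-level layer standing in for a GPT decoder block.
class Inner(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)

class Layer(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.inner = Inner(d)

    def forward(self, x):
        return self.inner(x)

model = nn.Sequential(*[Layer() for _ in range(4)])
sch = slapo.create_schedule(model)

# flatten=True (the argument added in #29) is assumed to inline nested
# submodules (here Layer.inner.proj) into a single traced graph rather
# than preserving the module hierarchy.
sch["0"].trace(flatten=True)

# Partial checkpointing (#55): only selected layers are activation-
# checkpointed, trading recomputation for memory. Checkpointing every
# other layer is an arbitrary example choice.
for i in range(0, len(model), 2):
    sch[str(i)].checkpoint()
```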
Full Changelog: v0.0.1...v0.0.2