Commit

Pull request for follow-up work (#23)
Pull request for follow-up work, Optimus-CC (ASPLOS'23), which uses PowerSGD for gradient compression in 3D parallelism-based LLM training.
jaeyong-song authored Oct 29, 2024
1 parent aa9452b commit f07be92
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -68,6 +68,7 @@ Research code for the experiments in the [PowerSGD paper](https://arxiv.org/abs/
- [(Agarwal et al., 2020)](https://arxiv.org/pdf/2010.16248.pdf) share insights into adaptive compression with PowerSGD.
- [(Vogels et al., 2020)](https://arxiv.org/abs/2008.01425) adapt PowerSGD to work in a decentralized setting (with sparse connectivity between workers).
- [(Wang, 2021)](https://medium.com/pytorch/accelerating-pytorch-ddp-by-10x-with-powersgd-585aef12881d) introduces a variation to PowerSGD and describes his experience with PowerSGD on large language models.
- [(Song et al., 2023)](https://arxiv.org/abs/2301.09830) use PowerSGD (and a slight variant of it) to compress pipeline- and data-parallelism gradients in 3D parallelism-based LLM training.
- (Please submit a PR if you want your work to be included here.)
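The works above all build on the same core operation: approximating a gradient matrix by low-rank factors obtained via one step of power iteration. A minimal NumPy sketch of that compression step, assuming a rank-4 approximation and illustrative matrix shapes (not taken from this repository's code):

```python
import numpy as np

def powersgd_compress(grad, q):
    """One PowerSGD power-iteration step on gradient matrix `grad`.

    Returns low-rank factors (p, q) such that p @ q.T approximates grad.
    """
    p = grad @ q                 # project gradient onto current subspace
    p, _ = np.linalg.qr(p)       # orthogonalize columns of P
    q = grad.T @ p               # update Q; workers would all-reduce P and Q
    return p, q

rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 32))   # illustrative gradient matrix
q = rng.standard_normal((32, 4))       # rank-4 warm-started right factor
p, q = powersgd_compress(grad, q)
approx = p @ q.T
print(approx.shape)  # (64, 32)
```

Instead of exchanging the full 64x32 gradient, workers would only communicate the 64x4 and 32x4 factors; in the actual algorithm an error-feedback buffer accumulates the residual `grad - approx` so the compression error is corrected over subsequent steps.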


