From f07be924eda568511c0ca00efdfed06b1d7e6332 Mon Sep 17 00:00:00 2001
From: Jaeyong Song
Date: Wed, 30 Oct 2024 05:27:53 +0900
Subject: [PATCH] Pull request for follow-up work (#23)

Pull request for follow-up work, Optimus-CC (ASPLOS'23), which utilizes
PowerSGD for gradient compression in 3D parallelism-based LLM training.
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index b7bcd30..9ce91dc 100644
--- a/README.md
+++ b/README.md
@@ -68,6 +68,7 @@ Research code for the experiments in the [PowerSGD paper](https://arxiv.org/abs/
 - [(Agarwal et al., 2020)](https://arxiv.org/pdf/2010.16248.pdf) share insights into adaptive compression with PowerSGD.
 - [(Vogels et al., 2020)](https://arxiv.org/abs/2008.01425) adapt PowerSGD to work in a decentralized setting (with sparse connectivity between workers.)
 - [(Wang, 2021)](https://medium.com/pytorch/accelerating-pytorch-ddp-by-10x-with-powersgd-585aef12881d) introduces a variation to PowerSGD and describes his experience with PowerSGD on large language models.
+- [(Song et al., 2023)](https://arxiv.org/abs/2301.09830) utilize PowerSGD (and a slight variant of it) to compress pipeline- and data-parallelism gradients in 3D parallelism-based LLM training.
 - (Please submit a PR if you want your work to be included here.)