πŸ”₯πŸ”₯[SP: BPT] Blockwise Parallel Transformer for Large Context Models (#93)
DefTruth authored Nov 18, 2024
1 parent f6dffd9 commit 7b2671e
Showing 1 changed file with 2 additions and 1 deletion.
README.md: 3 changes (2 additions, 1 deletion)
@@ -42,9 +42,9 @@ Awesome-LLM-Inference: A curated list of [πŸ“™Awesome LLM Inference Papers with

## πŸ“–Contents
* πŸ“–[Trending LLM/VLM Topics](#Trending-LLM-VLM-Topics)πŸ”₯πŸ”₯πŸ”₯
* πŸ“–[DP/MP/PP/TP/SP/CP Parallelism](#DP-MP-PP-TP-SP-CP)πŸ”₯πŸ”₯πŸ”₯
* πŸ“–[LLM Algorithmic/Eval Survey](#LLM-Algorithmic-Eval-Survey)
* πŸ“–[LLM Train/Inference Framework/Design](#LLM-Train-Inference-Framework)
* πŸ“–[DP/MP/PP/TP/SP/CP Parallelism](#DP-MP-PP-TP-SP-CP)πŸ”₯πŸ”₯πŸ”₯
* πŸ“–[Weight/Activation Quantize/Compress](#Weight-Activation-Quantize-Compress)πŸ”₯
* πŸ“–[Continuous/In-flight Batching](#Continuous-In-flight-Batching)
* πŸ“–[IO/FLOPs-Aware/Sparse Attention](#IO-FLOPs-Aware-Attention-Sparse)πŸ”₯
@@ -81,6 +81,7 @@ Awesome-LLM-Inference: A curated list of [πŸ“™Awesome LLM Inference Papers with
|2019.10|πŸ”₯πŸ”₯[**MP: ZeRO**] DeepSpeed-ZeRO: Memory Optimizations Toward Training Trillion Parameter Models(@microsoft.com)|[[pdf]](https://arxiv.org/pdf/1910.02054)| [[deepspeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social) |⭐️⭐️ |
|2020.05|πŸ”₯πŸ”₯[**TP: Megatron-LM**] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism(@NVIDIA)|[[pdf]](https://arxiv.org/pdf/1909.08053.pdf)|[[Megatron-LM]](https://github.com/NVIDIA/Megatron-LM) ![](https://img.shields.io/github/stars/NVIDIA/Megatron-LM.svg?style=social)|⭐️⭐️ |
|2022.05|πŸ”₯πŸ”₯[**SP: Megatron-LM**] Megatron-LM: Reducing Activation Recomputation in Large Transformer Models(@NVIDIA)|[[pdf]](https://arxiv.org/pdf/2205.05198)|[[Megatron-LM]](https://github.com/NVIDIA/Megatron-LM) ![](https://img.shields.io/github/stars/NVIDIA/Megatron-LM.svg?style=social)|⭐️⭐️ |
|2023.05|πŸ”₯πŸ”₯[**SP: BPT**] Blockwise Parallel Transformer for Large Context Models(@UC Berkeley)|[[pdf]](https://arxiv.org/pdf/2305.19370)| ⚠️|⭐️⭐️ |
|2023.10|πŸ”₯πŸ”₯[**SP: Ring Attention**] Ring Attention with Blockwise Transformers for Near-Infinite Context(@UC Berkeley)|[[pdf]](https://arxiv.org/pdf/2310.01889.pdf)| [[RingAttention]](https://github.com/lhao499/RingAttention) ![](https://img.shields.io/github/stars/lhao499/RingAttention.svg?style=social)|⭐️⭐️ |
|2023.11|πŸ”₯πŸ”₯[**SP: STRIPED ATTENTION**] STRIPED ATTENTION: FASTER RING ATTENTION FOR CAUSAL TRANSFORMERS(@MIT etc)|[[pdf]](https://arxiv.org/pdf/2311.09431.pdf) |[[striped_attention]](https://github.com/exists-forall/striped_attention/) ![](https://img.shields.io/github/stars/exists-forall/striped_attention.svg?style=social) |⭐️⭐️ |
|2023.10|πŸ”₯πŸ”₯[**SP: DEEPSPEED ULYSSES**] DEEPSPEED ULYSSES: SYSTEM OPTIMIZATIONS FOR ENABLING TRAINING OF EXTREME LONG SEQUENCE TRANSFORMER MODELS(@microsoft.com)|[[pdf]](https://arxiv.org/pdf/2309.14509)| [[deepspeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social) |⭐️⭐️ |
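
The newly added BPT entry (and the Ring Attention work that builds on it) computes attention one key/value block at a time while keeping running softmax statistics, so the full length-squared attention matrix is never materialized. Below is a minimal NumPy sketch of that blockwise online-softmax idea; it is an illustration only, not code from any of the linked repositories, and the names `blockwise_attention` and `block_size` are hypothetical.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size=128):
    """Illustrative sketch: softmax(q @ k.T / sqrt(d)) @ v computed over
    key/value blocks with running max/sum statistics (online softmax),
    so no (n, n) attention matrix is ever built."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    row_max = np.full(n, -np.inf)   # running max of logits per query row
    row_sum = np.zeros(n)           # running softmax denominator per query row
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        s = (q @ kb.T) * scale                     # logits for this block
        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)     # rescale earlier partial sums
        p = np.exp(s - new_max[:, None])           # stabilized block weights
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((512, 64)) for _ in range(3))
    # dense reference for comparison
    s = (q @ k.T) / np.sqrt(64)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    ref = (w / w.sum(axis=1, keepdims=True)) @ v
    assert np.allclose(blockwise_attention(q, k, v), ref)
```

BPT itself goes further by also evaluating the feedforward layer blockwise inside the same loop; the sketch above covers only the attention part.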
