---
title: "(WIP) Resource Compilation"
date: 2023-11-18 11:10:04 +0800
---

**PyTorch and Distributed Training**

- PyTorch tutorials: [https://pytorch.org/tutorials](https://pytorch.org/tutorials) (a minimal DDP sketch follows this list)
- Cnblogs blog 罗西的思考: a series of posts on distributed machine learning (its biggest strength is that original sources are cited); the author has also published a book: [https://www.cnblogs.com/rossiXYZ/](https://www.cnblogs.com/rossiXYZ/)
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- [ColossalAI](https://github.com/hpcaitech/ColossalAI)
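
The PyTorch tutorials above cover `torch.distributed` in detail. As a minimal sketch of a DistributedDataParallel (DDP) training step (single node, launched with `torchrun`; the model and data are illustrative placeholders):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch with: torchrun --nproc_per_node=2 ddp_demo.py
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # A toy model; each rank holds a full replica.
    model = torch.nn.Linear(10, 1).to(f"cuda:{rank}")
    ddp_model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    x = torch.randn(8, 10, device=f"cuda:{rank}")
    y = torch.randn(8, 1, device=f"cuda:{rank}")
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()  # gradients are all-reduced across ranks during backward
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```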

**LLM Deployment**

- [text-generation-inference](https://huggingface.co/docs/text-generation-inference/quicktour): from Hugging Face
- [vLLM](https://github.com/vllm-project/vllm) (a minimal usage sketch follows this list)
- [deepspeed](https://www.deepspeed.ai/tutorials/inference-tutorial/): from Microsoft
- [DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII): from Microsoft; possibly the fastest option at the moment?
- [lmdeploy](https://github.com/InternLM/lmdeploy): from MMLab
- [FasterTransformer](https://github.com/NVIDIA/FasterTransformer): from NVIDIA
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM): from NVIDIA; appears to be the successor to FasterTransformer, and possibly the fastest option at the moment?
- [triton-inference-server](https://developer.nvidia.com/triton-inference-server): from NVIDIA
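
As a point of reference for how simple the offline APIs of these engines can be, here is a minimal sketch of batched generation with vLLM, following its quickstart-style `LLM`/`SamplingParams` interface (the model name and sampling parameters here are illustrative):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM manages paged KV-cache memory and continuous batching internally.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Prompts are batched together for high-throughput generation.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```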

**Quantization**

`W8A8` means both the model weights and the activations are quantized to 8-bit integers; `W4A16` means the weights are quantized to 4-bit integers while the activations stay in FP16.
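
To make the notation concrete, here is a minimal sketch of the "W8" part of `W8A8`: symmetric per-tensor int8 weight quantization. The function names are illustrative, and real methods such as GPTQ and AWQ are considerably more sophisticated (per-group scales, error compensation, activation-aware scaling):

```python
import torch

def quantize_w8(w: torch.Tensor):
    # Symmetric per-tensor quantization: map the largest magnitude to the int8 range.
    scale = w.abs().max() / 127
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover a floating-point approximation of the original weights.
    return q.to(torch.float16) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_w8(w)
w_hat = dequantize(q, scale)
print((w - w_hat.float()).abs().max())  # worst-case quantization error
```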

- [GPTQ](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)
- [exllama](https://github.com/turboderp/exllama): optimized kernels for inference with GPTQ-quantized models
- [AWQ](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
