From ed780c56077c3946011a8471517d3a28891ddd43 Mon Sep 17 00:00:00 2001 From: BuxianChen <541205605@qq.com> Date: Fri, 24 Nov 2023 18:33:48 +0800 Subject: [PATCH] update resource --- _drafts/2023-11-18-resource.md | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/_drafts/2023-11-18-resource.md b/_drafts/2023-11-18-resource.md index e49bba7..95bd2b6 100644 --- a/_drafts/2023-11-18-resource.md +++ b/_drafts/2023-11-18-resource.md @@ -4,5 +4,28 @@ title: "(WIP) 资源整合" date: 2023-11-18 11:10:04 +0800 --- +**Pytorch 与分布式训练相关** + - Pytorch tutorial: [https://pytorch.org/tutorials](https://pytorch.org/tutorials) -- 博客园(罗西的思考), 包含了一些关于分布式机器学习的博客(最大的优点是注明了原文的出处), 博主还出了本书: [https://www.cnblogs.com/rossiXYZ/](https://www.cnblogs.com/rossiXYZ/) \ No newline at end of file +- 博客园(罗西的思考), 包含了一些关于分布式机器学习的博客(最大的优点是注明了原文的出处), 博主还出了本书: [https://www.cnblogs.com/rossiXYZ/](https://www.cnblogs.com/rossiXYZ/) +- [deepspeed](https://github.com/microsoft/DeepSpeed) +- [ColossalAI](https://github.com/hpcaitech/ColossalAI) + +**大模型部署相关** + +- [text-generation-inference](https://huggingface.co/docs/text-generation-inference/quicktour): huggingface 出品 +- [vLLM](https://github.com/vllm-project/vllm) +- [deepspeed](https://www.deepspeed.ai/tutorials/inference-tutorial/): Microsoft 出品 +- [DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII): Microsoft 出品, 也许是目前最快的? +- [lmdeploy](https://github.com/InternLM/lmdeploy): mmlab 出品 +- [FastTransformer](https://github.com/NVIDIA/FasterTransformer): Nvidia 出品 +- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM): Nvidia 出品, 似乎是 fastertransformer 的替代, 也许是目前最快的? +- [triton-inference-server](https://developer.nvidia.com/triton-inference-server): Nvidia 出品 + +**量化** + +`W8A8` 指模型权重和激活值都量化到 8 bit int; `W4A16` 指模型权重量化到 4 bit int, 激活值保持为 FP 16 + +- [GPTQ](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) +- [exllama](https://github.com/turboderp/exllama): GPTQ 量化后模型推理的算子优化 +- [AWQ](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)