---
title: "(WIP) Resource Compilation"
date: 2023-11-18 11:10:04 +0800
---

**PyTorch and Distributed Training**

- PyTorch tutorials: [https://pytorch.org/tutorials](https://pytorch.org/tutorials) (a minimal DDP sketch follows this list)
- Cnblogs blog 罗西的思考: a series of posts on distributed machine learning (its biggest strength is that original sources are cited); the author has also published a book: [https://www.cnblogs.com/rossiXYZ/](https://www.cnblogs.com/rossiXYZ/)
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- [ColossalAI](https://github.com/hpcaitech/ColossalAI)
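
The PyTorch tutorials above cover `torch.distributed` in detail. As a minimal sketch of a DistributedDataParallel (DDP) training step (single node, launched with `torchrun`; the model and data are illustrative placeholders):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch with: torchrun --nproc_per_node=2 ddp_demo.py
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # A toy model; each rank holds a full replica.
    model = torch.nn.Linear(10, 1).to(f"cuda:{rank}")
    ddp_model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    x = torch.randn(8, 10, device=f"cuda:{rank}")
    y = torch.randn(8, 1, device=f"cuda:{rank}")
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()  # gradients are all-reduced across ranks during backward
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```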

**LLM Deployment**

- [text-generation-inference](https://huggingface.co/docs/text-generation-inference/quicktour): from Hugging Face
- [vLLM](https://github.com/vllm-project/vllm) (a minimal usage sketch follows this list)
- [deepspeed](https://www.deepspeed.ai/tutorials/inference-tutorial/): from Microsoft
- [DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII): from Microsoft; possibly the fastest option at the moment?
- [lmdeploy](https://github.com/InternLM/lmdeploy): from MMLab
- [FasterTransformer](https://github.com/NVIDIA/FasterTransformer): from NVIDIA
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM): from NVIDIA; appears to be the successor to FasterTransformer, and possibly the fastest option at the moment?
- [triton-inference-server](https://developer.nvidia.com/triton-inference-server): from NVIDIA
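
As a point of reference for how simple the offline APIs of these engines can be, here is a minimal sketch of batched generation with vLLM, following its quickstart-style `LLM`/`SamplingParams` interface (the model name and sampling parameters here are illustrative):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM manages paged KV-cache memory and continuous batching internally.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Prompts are batched together for high-throughput generation.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```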

**Quantization**

`W8A8` means both the model weights and the activations are quantized to 8-bit integers; `W4A16` means the weights are quantized to 4-bit integers while the activations stay in FP16.
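
To make the notation concrete, here is a minimal sketch of the "W8" part of `W8A8`: symmetric per-tensor int8 weight quantization. The function names are illustrative, and real methods such as GPTQ and AWQ are considerably more sophisticated (per-group scales, error compensation, activation-aware scaling):

```python
import torch

def quantize_w8(w: torch.Tensor):
    # Symmetric per-tensor quantization: map the largest magnitude to the int8 range.
    scale = w.abs().max() / 127
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover a floating-point approximation of the original weights.
    return q.to(torch.float16) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_w8(w)
w_hat = dequantize(q, scale)
print((w - w_hat.float()).abs().max())  # worst-case quantization error
```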

- [GPTQ](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)
- [exllama](https://github.com/turboderp/exllama): optimized kernels for inference with GPTQ-quantized models
- [AWQ](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
