Skip to content

Commit

Permalink
🎉auto update by Gmeek action
Browse files Browse the repository at this point in the history
  • Loading branch information
algo-scope committed Oct 16, 2024
1 parent 01a5880 commit 90365e1
Show file tree
Hide file tree
Showing 4 changed files with 27 additions and 27 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# AlgoScope :link: https://algo-scope.github.io
### :page_facing_up: [3](https://algo-scope.github.io/tag.html)
### :speech_balloon: 0
### :hibiscus: 32695
### :alarm_clock: 2024-10-16 14:14:39
### :hibiscus: 31275
### :alarm_clock: 2024-10-16 14:18:32
### Powered by :heart: [Gmeek](https://github.com/Meekdai/Gmeek)
24 changes: 12 additions & 12 deletions backup/DeepSpeed训练优化原理.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
以微软的bing_bert 训练代码为例,

1、 编写sparse_attention配置文件,后面要传入get_sparse_attention_config函数
[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_bsz64k_lamb_config_seq128.json#L24-L33](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_bsz64k_lamb_config_seq128.json" \l "L24-L33)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_bsz64k_lamb_config_seq128.json#L24-L33
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161345995.png)

通过修改配置中的mode可以使用任何支持的稀疏结构更新 DeepSpeed 配置文件,并相应地设置参数,mode有多个实现,对应多种不同的SA结构:
Expand Down Expand Up @@ -85,19 +85,19 @@

上述mode的配置具体加载到代码中后的示例:

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L79-L109](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py" \l "L79-L109)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L79-L109
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161346968.png)

2、 将上一步的稀疏注意力配置通过get_sparse_attention_config函数读取,传入模型初始化

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L1024-L1032](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py" \l "L1024-L1032)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L1024-L1032

class BertModel(BertPreTrainedModel):
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161346489.png)

3、encoder模型初始化时的注意力层更新为稀疏注意力

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L610-L620](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py" \l "L610-L620)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L610-L620

class BertEncoder(nn.Module):
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161347315.png)
Expand All @@ -106,10 +106,10 @@ class BertEncoder(nn.Module):

您可能需要对input_ids和attention_mask的序列维度进行填充,使其成为稀疏块大小的倍数,用在模型forward里

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L1067-L1093](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py" \l "L1067-L1093)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L1067-L1093
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161347295.png)

要使用DeepSpeed Sparse Attention,需要在启动脚本中通过--deepspeed_sparse_attention参数启用它,[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/ds_sa_train_bert_bsz64k_seq128.sh#L18](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/ds_sa_train_bert_bsz64k_seq128.sh" \l "L18)
要使用DeepSpeed Sparse Attention,需要在启动脚本中通过--deepspeed_sparse_attention参数启用它,https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/ds_sa_train_bert_bsz64k_seq128.sh#L18
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161347661.png)

# 超快稠密transformer核(Ultra-fast dense transformer kernels)
Expand Down Expand Up @@ -162,7 +162,7 @@ Deepspeed发布时重点宣传了transformer核,跟ZeRO-2并列,主要作用

以BingBertGlue训练代码为例,

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/BingBertGlue/nvidia/modelingpreln_layerdrop.py#L582-L604](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/BingBertGlue/nvidia/modelingpreln_layerdrop.py" \l "L582-L604)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/BingBertGlue/nvidia/modelingpreln_layerdrop.py#L582-L604
![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161350335.png)

参数主要分为4个大类:
Expand Down Expand Up @@ -253,7 +253,7 @@ comm_backend_name用于指示要使用的后端实现。您可以通过将comm_b

由于1位压缩不能代表精确的零,因此,如果参数在训练过程中具有恒定的零梯度,则压缩误差将继续在动量中积累。例如,对于BERT预训练seq长度128,Bert.embeddings.position_embeddings.Weight在其梯度和动量129至512中具有恒定的零,因为它只能学习到seq长度128,而模型则支持到seq长度512.因此,在1位Adam V2中,我们增加了动量mask的支持,以指定那些在其梯度中具有恒定零的参数。有关如何配置此动量mask,请参见以下示例脚本。

[https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_train.py#L426-L453](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_train.py" \l "L426-L453)
https://github.com/microsoft/DeepSpeedExamples/blob/master/training/bing_bert/deepspeed_train.py#L426-L453

![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161359774.png)

Expand Down Expand Up @@ -337,7 +337,7 @@ deepspeed启动命令行:

如果在DeepSpeed初始化时传递了LoRA模型,那么DeepSpeed引擎将识别LoRA冻结参数。然而,流行的实现是初始化一个基本模型,然后再转换为LoRA模型。在这种情况下,用户需要在LoRA模型转换后显式调用DeepSpeed引擎。这只需要一行代码。下面显示了一个训练脚本的示例片段

[https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/mixed_precision_zeropp.md#training-script-changes](https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/mixed_precision_zeropp.md" \l "training-script-changes)
https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/mixed_precision_zeropp.md#training-script-changes

![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161402054.png)

Expand All @@ -347,7 +347,7 @@ deepspeed启动命令行:

PEFT模型的使用非常方便,只需要按照原本的方式实例化模型,然后设置一下LoRA的config,调用一下get_peft_model方法,就获得了在原模型基础上的PEFT模型,对于LoRA策略来讲,就是在预训练参数矩阵W的基础上增加了矩阵分解的旁支。在下面的例子中,选择了attention中的q和v的部分做LoRA。

[https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L174-L184](https://github.com/tloen/alpaca-lora/blob/main/finetune.py" \l "L174-L184)
https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L174-L184

![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161402496.png)

Expand All @@ -363,13 +363,13 @@ PEFT模型的使用非常方便,只需要按照原本的方式实例化模型

模型训练完成后,可以调用PEFT重写的save_pretrained函数保存权重,该方法只会保存LoRA训练的部分,因此权重文件特别小

[https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L273-L275](https://github.com/tloen/alpaca-lora/blob/main/finetune.py" \l "L273-L275)
https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L273-L275

![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161402117.png)

推理:

[https://github.com/tloen/alpaca-lora/blob/main/generate.py#L26-L52](https://github.com/tloen/alpaca-lora/blob/8bb8579e403dc78e37fe81ffbb253c413007323f/generate.py" \l "L26-L52)
https://github.com/tloen/alpaca-lora/blob/main/generate.py#L26-L52

![image.png](https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/202410161403515.png)

Expand Down
2 changes: 1 addition & 1 deletion blogBase.json
Original file line number Diff line number Diff line change
@@ -1 +1 @@
{"singlePage": [], "startSite": "10/10/2024", "filingNum": "", "onePageListNum": 15, "commentLabelColor": "#006b75", "yearColorList": ["#bc4c00", "#0969da", "#1f883d", "#A333D0"], "i18n": "CN", "themeMode": "manual", "dayTheme": "light", "nightTheme": "dark", "urlMode": "pinyin", "script": "", "style": "", "head": "", "indexScript": "", "indexStyle": "", "bottomText": "\u8f6c\u8f7d\u8bf7\u6ce8\u660e\u51fa\u5904", "showPostSource": 0, "iconList": {}, "UTC": 8, "rssSplit": "sentence", "exlink": {}, "needComment": 1, "allHead": "", "title": "AlgoScope", "subTitle": "\u5b66\u672f\u516b\u6212\u7231\u8336\u6b47", "avatarUrl": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "GMEEK_VERSION": "last", "email": "[email protected]", "postListJson": {"P2": {"htmlDir": "docs/post/tong-ji-xia-zhuan-ma-zhe-8-nian-wo-mai-guo-de-shu.html", "labels": ["\u6280\u672f"], "postTitle": "\u7edf\u8ba1\u4e0b\u8f6c\u7801\u8fd98\u5e74\u6211\u4e70\u8fc7\u7684\u4e66", "postUrl": "post/tong-ji-xia-zhuan-ma-zhe-8-nian-wo-mai-guo-de-shu.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/2", "commentNum": 0, "wordCount": 6523, "description": "\u7ffb\u4e86\u6dd8\u5b9d\u4eac\u4e1c\u5f53\u5f53\u7684\u8d2d\u7269\u8bb0\u5f55\uff0c\u5356\u4e66\u57fa\u672c\u90fd\u662f\u5728\u591a\u6293\u9c7c\u4e0a\u6240\u4ee5\u4e5f\u80fd\u67e5\u5230\uff0c\u6309\u7167\u8d2d\u4e70\u65f6\u95f4\u6392\u5e8f\u3002", "top": 0, "createdAt": 1728666201, "style": "", "script": "", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-12", "dateLabelColor": "#bc4c00"}, "P3": {"htmlDir": "docs/post/shen-ti-shi-ge-ming-de-ben-qian.html", "labels": ["\u5065\u5eb7"], "postTitle": "\u8eab\u4f53\u662f\u9769\u547d\u7684\u672c\u94b1", "postUrl": "post/shen-ti-shi-ge-ming-de-ben-qian.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/3", "commentNum": 0, "wordCount": 3514, "description": "## \u8170\u95f4\u76d8\u7a81\u51fa\r\n\r\n2017\u5e74\u6211\u5728\u8bfb\u7814\u65f6\u6253\u7fbd\u6bdb\u7403\u7528\u529b\u8fc7\u731b\uff0c\u52a0\u4e0a\u4e4b\u524d\u5e73\u65f6\u559c\u6b22\u9a7c\u7740\u80cc\u6709\u8170\u808c\u52b3\u635f\uff0c\u6709\u5929\u65e9\u4e0a\u8d77\u5e8a\u65f6\u7a81\u7136\u611f\u89c9\u817f\u4e0a\u62bd\u62bd\u7740\u75bc\u3002", "top": 0, "createdAt": 1728832742, "style": "", "script": "", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-13", "dateLabelColor": "#bc4c00"}, "P4": {"htmlDir": "docs/post/DeepSpeed-xun-lian-you-hua-yuan-li.html", "labels": ["\u6280\u672f"], "postTitle": "DeepSpeed\u8bad\u7ec3\u4f18\u5316\u539f\u7406", "postUrl": "post/DeepSpeed-xun-lian-you-hua-yuan-li.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/4", "commentNum": 0, "wordCount": 22658, "description": "# \u7a00\u758f\u6ce8\u610f\u529b\uff08Sparse Attention\uff09\r\n\r\n\u57fa\u4e8e\u6ce8\u610f\u529b\u7684\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u4f8b\u5982Transformer\uff0c\u5728\u6355\u6349\u8f93\u5165\u5e8f\u5217\u4e2dtoken\u4e4b\u95f4\u7684\u5173\u7cfb\u65b9\u9762\u975e\u5e38\u6709\u6548\uff0c\u5373\u4f7f\u8ddd\u79bb\u8f83\u8fdc\u3002", "top": 0, "createdAt": 1729059253, "style": "", "script": "<script>MathJax = {tex: {inlineMath: [[\"$\", \"$\"]]}};</script><script async src=\"https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js\"></script>", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-16", "dateLabelColor": "#bc4c00"}}, "singeListJson": {}, "labelColorDict": {"bug": "#d73a4a", "documentation": "#0075ca", "duplicate": "#cfd3d7", "enhancement": "#a2eeef", "good first issue": "#7057ff", "help wanted": "#008672", "invalid": "#e4e669", "question": "#d876e3", "wontfix": "#ffffff", "\u5065\u5eb7": "#5319e7", "\u6280\u672f": "#15ECA6"}, "displayTitle": "AlgoScope", "faviconUrl": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "primerCSS": "<link href='https://mirrors.sustech.edu.cn/cdnjs/ajax/libs/Primer/21.0.7/primer.css' rel='stylesheet' />", "homeUrl": "https://algo-scope.github.io", "prevUrl": "disabled", "nextUrl": "disabled"}
{"singlePage": [], "startSite": "10/10/2024", "filingNum": "", "onePageListNum": 15, "commentLabelColor": "#006b75", "yearColorList": ["#bc4c00", "#0969da", "#1f883d", "#A333D0"], "i18n": "CN", "themeMode": "manual", "dayTheme": "light", "nightTheme": "dark", "urlMode": "pinyin", "script": "", "style": "", "head": "", "indexScript": "", "indexStyle": "", "bottomText": "\u8f6c\u8f7d\u8bf7\u6ce8\u660e\u51fa\u5904", "showPostSource": 0, "iconList": {}, "UTC": 8, "rssSplit": "sentence", "exlink": {}, "needComment": 1, "allHead": "", "title": "AlgoScope", "subTitle": "\u5b66\u672f\u516b\u6212\u7231\u8336\u6b47", "avatarUrl": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "GMEEK_VERSION": "last", "email": "[email protected]", "postListJson": {"P2": {"htmlDir": "docs/post/tong-ji-xia-zhuan-ma-zhe-8-nian-wo-mai-guo-de-shu.html", "labels": ["\u6280\u672f"], "postTitle": "\u7edf\u8ba1\u4e0b\u8f6c\u7801\u8fd98\u5e74\u6211\u4e70\u8fc7\u7684\u4e66", "postUrl": "post/tong-ji-xia-zhuan-ma-zhe-8-nian-wo-mai-guo-de-shu.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/2", "commentNum": 0, "wordCount": 6523, "description": "\u7ffb\u4e86\u6dd8\u5b9d\u4eac\u4e1c\u5f53\u5f53\u7684\u8d2d\u7269\u8bb0\u5f55\uff0c\u5356\u4e66\u57fa\u672c\u90fd\u662f\u5728\u591a\u6293\u9c7c\u4e0a\u6240\u4ee5\u4e5f\u80fd\u67e5\u5230\uff0c\u6309\u7167\u8d2d\u4e70\u65f6\u95f4\u6392\u5e8f\u3002", "top": 0, "createdAt": 1728666201, "style": "", "script": "", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-12", "dateLabelColor": "#bc4c00"}, "P3": {"htmlDir": "docs/post/shen-ti-shi-ge-ming-de-ben-qian.html", "labels": ["\u5065\u5eb7"], "postTitle": "\u8eab\u4f53\u662f\u9769\u547d\u7684\u672c\u94b1", "postUrl": "post/shen-ti-shi-ge-ming-de-ben-qian.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/3", "commentNum": 0, "wordCount": 3514, "description": "## \u8170\u95f4\u76d8\u7a81\u51fa\r\n\r\n2017\u5e74\u6211\u5728\u8bfb\u7814\u65f6\u6253\u7fbd\u6bdb\u7403\u7528\u529b\u8fc7\u731b\uff0c\u52a0\u4e0a\u4e4b\u524d\u5e73\u65f6\u559c\u6b22\u9a7c\u7740\u80cc\u6709\u8170\u808c\u52b3\u635f\uff0c\u6709\u5929\u65e9\u4e0a\u8d77\u5e8a\u65f6\u7a81\u7136\u611f\u89c9\u817f\u4e0a\u62bd\u62bd\u7740\u75bc\u3002", "top": 0, "createdAt": 1728832742, "style": "", "script": "", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-13", "dateLabelColor": "#bc4c00"}, "P4": {"htmlDir": "docs/post/DeepSpeed-xun-lian-you-hua-yuan-li.html", "labels": ["\u6280\u672f"], "postTitle": "DeepSpeed\u8bad\u7ec3\u4f18\u5316\u539f\u7406", "postUrl": "post/DeepSpeed-xun-lian-you-hua-yuan-li.html", "postSourceUrl": "https://github.com/algo-scope/algo-scope.github.io/issues/4", "commentNum": 0, "wordCount": 21238, "description": "# \u7a00\u758f\u6ce8\u610f\u529b\uff08Sparse Attention\uff09\r\n\r\n\u57fa\u4e8e\u6ce8\u610f\u529b\u7684\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u4f8b\u5982Transformer\uff0c\u5728\u6355\u6349\u8f93\u5165\u5e8f\u5217\u4e2dtoken\u4e4b\u95f4\u7684\u5173\u7cfb\u65b9\u9762\u975e\u5e38\u6709\u6548\uff0c\u5373\u4f7f\u8ddd\u79bb\u8f83\u8fdc\u3002", "top": 0, "createdAt": 1729059253, "style": "", "script": "<script>MathJax = {tex: {inlineMath: [[\"$\", \"$\"]]}};</script><script async src=\"https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js\"></script>", "head": "", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "createdDate": "2024-10-16", "dateLabelColor": "#bc4c00"}}, "singeListJson": {}, "labelColorDict": {"bug": "#d73a4a", "documentation": "#0075ca", "duplicate": "#cfd3d7", "enhancement": "#a2eeef", "good first issue": "#7057ff", "help wanted": "#008672", "invalid": "#e4e669", "question": "#d876e3", "wontfix": "#ffffff", "\u5065\u5eb7": "#5319e7", "\u6280\u672f": "#15ECA6"}, "displayTitle": "AlgoScope", "faviconUrl": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "ogImage": "https://raw.githubusercontent.com/algo-scope/imgBed/main/imgs/logo.png", "primerCSS": "<link href='https://mirrors.sustech.edu.cn/cdnjs/ajax/libs/Primer/21.0.7/primer.css' rel='stylesheet' />", "homeUrl": "https://algo-scope.github.io", "prevUrl": "disabled", "nextUrl": "disabled"}
Loading

0 comments on commit 90365e1

Please sign in to comment.