
Update release data (#491)
Signed-off-by: Wenxin Zhang <[email protected]>
Signed-off-by: Lv, Kaokao <[email protected]>
Co-authored-by: Lv, Kaokao <[email protected]>
VincyZhang and lkk12014402 authored Oct 19, 2023
1 parent ce3d24e commit 38dfeae
Showing 2 changed files with 10 additions and 77 deletions.
70 changes: 8 additions & 62 deletions docs/release_data.md
@@ -12,23 +12,19 @@ Validated Model Performance

3. [LLM Runtime (GGML-Compatible)](#llm-runtime-ggml-compatible)

-3.1 [LLama-7B-hf](#llama-7b-hf)
+3.1 [MPT-7B](#mpt-7b)

-3.2 [LLama2-7B-chat](#llama2-7b-chat)
+3.2 [GPT-j-6B](#gpt-j-6b)

-3.3 [MPT-7B](#mpt-7b)
+3.3 [Falcon-7B](#falcon-7b)

-3.4 [GPT-j-6B](#gpt-j-6b)
+3.4 [GPT-NEOX-20B](#gpt-neox-20b)

-3.5 [Falcon-7B](#falcon-7b)
+3.5 [Dolly-V2-3B](#dolly-v2-3b)

-3.6 [GPT-NEOX-20B](#gpt-neox-20b)
+3.6 [OPT-1.3B](#opt-13b)

-3.7 [Dolly-V2-3B](#dolly-v2-3b)
-
-3.8 [OPT-1.3B](#opt-13b)
-
-3.9 [StarCoder-3B](#starcoder-3b)
+3.7 [StarCoder-3B](#starcoder-3b)

4. [LLM Finetuning](#llm-finetuning)

@@ -58,8 +54,6 @@ Intel Neural Compressor: 2.3
| pytorch | bloom_7b1 | NeelNanda/pile-10k | 12.36 | 60.14% | 3.22 | 57.64% | 3.83 | 4.34% |
| pytorch | opt_2.7b | NeelNanda/pile-10k | 23.19 | 63.67% | 12.24 | 63.65% | 1.89 | 0.03% |
| pytorch | opt_6.7b | NeelNanda/pile-10k | 13.5 | 67.01% | 4.1 | 67.69% | 3.29 | \-1.00% |
-| pytorch | llama_13b | NeelNanda/pile-10k | 8.9 | 56.88% | 2.24 | 76.27% | 3.97 | \-25.42% |
-| pytorch | llama_7b | NeelNanda/pile-10k | 13.56 | 58.55% | 4.4 | 73.61% | 3.09 | \-20.46% |
| pytorch | gpt_j_6b | NeelNanda/pile-10k | 10.76 | 67.59% | 4.38 | 68.31% | 2.46 | \-1.05% |
| pytorch | flan_t5_large | samsum | 69.75 | 46.25 (rougeLsum) | 33.16 | 47.67 (rougeLsum) | 2.1 | \-2.99% |
| pytorch | gpt_neox_clm | wikitext | 1.47 | 4.04 (eval_loss) | 0.65 | 3.52 (eval_loss) | 2.27 | \-14.78% |
@@ -80,7 +74,6 @@ Pytorch: 2.0.1+cpu
| --------- | --------------------- | -------- | ------ | --------- | --------- | --------- | --------- | --------- | --------- | -------- |
| pytorch | gpt-neox-20b | 32 | 32 | 9283 (ms) | | | | | | |
| pytorch | dolly-v2-3b | 32 | 32 | 3191 (ms) | 3798 (ms) | 2689 (ms) | 1.19x | 1.41x | |
-| pytorch | llama-7b-hf | 32 | 32 | 1872 (ms) | 5402 (ms) | 2689 (ms) | 1935 (ms) | 2.89x | 2.01x | 2.79x |
| pytorch | gpt-j-6b-pruned | 32 | 32 | 4523 (ms) | 2421 (ms) | 1758 (ms) | 1.87x | 2.57x |
| pytorch | gpt-j-6b | 32 | 32 | 1658 (ms) | 4561 (ms) | 2429 (ms) | 1793 (ms) | 2.75x | 1.88x | 2.54x |

@@ -125,53 +118,6 @@ Pytorch: 2.0.1+cpu
Environment:
GCC / G++: 12.1.0
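
The Precision, Compute Type, and Group Size columns in the tables below are weight-only quantization settings: INT4 weights, INT8 compute, quantized in groups of 32 or 128 weights. As a minimal sketch of how such a model is loaded through LLM Runtime, assuming the project's Transformers-like Python API as of this release (the `WeightOnlyQuantConfig` name and its arguments may differ in other versions):

```
# Hedged sketch: INT4 weight-only quantization with INT8 compute and a
# group size of 128, matching one row of the tables below. Class and
# argument names are assumptions based on the
# intel_extension_for_transformers API around this release.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

model_name = "mosaicml/mpt-7b"  # any of the models benchmarked below
woq_config = WeightOnlyQuantConfig(
    compute_dtype="int8", weight_dtype="int4", group_size=128
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=woq_config, trust_remote_code=True
)

# Generate 32 new tokens, as in the Output=32 benchmark configuration.
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```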

-### LLama-7B-hf
-
-| Backend | Input | Output | Cores/Instance | Precision | Compute Type | Group Size | Next Token(ms) | Memory mean used (Top 50%) MB | First Token(ms) | Total Latency(ms) | P90 Latency(ms) | P99 Latency(ms) |
-| ---------- | ----- | ------ | -------------- | --------- | ------------ | ---------- | -------------- | ----------------------------- | --------------- | ----------------- | --------------- | --------------- |
-| LLM Runtime | 32 | 32 | 32 | INT4 | INT8 | 128 | 27.2 | 4212 | 72.69 | 915 | 27.37 | 58.73 |
-| LLM Runtime | 1024 | 32 | 32 | INT4 | INT8 | 128 | 30.39 | 4495 | 3091 | 4033 | 30.75 | 2142 |
-| LLM Runtime | 32 | 32 | 48 | INT4 | INT8 | 128 | 24.41 | 4786 | 71.1 | 827 | 24.63 | 56.85 |
-| LLM Runtime | 1024 | 32 | 48 | INT4 | INT8 | 128 | 27.46 | 4751 | 2904 | 3755 | 27.56 | 2012 |
-| LLM Runtime | 32 | 32 | 56 | INT4 | INT8 | 128 | 24.84 | 4810 | 72.1 | 842.05 | 25.01 | 57.72 |
-| LLM Runtime | 1024 | 32 | 56 | INT4 | INT8 | 128 | 27.97 | 4790 | 2749 | 3616 | 28.09 | 1906 |
-| LLM Runtime | 32 | 32 | 32 | INT4 | INT8 | 32 | 28.95 | 4154 | 125.24 | 1022 | 29.04 | 95.51 |
-| LLM Runtime | 1024 | 32 | 32 | INT4 | INT8 | 32 | 32.49 | 4966 | 4645 | 5652 | 32.55 | 3215 |
-| LLM Runtime | 32 | 32 | 48 | INT4 | INT8 | 32 | 27.1 | 4780 | 113.01 | 953 | 27.28 | 86.56 |
-| LLM Runtime | 1024 | 32 | 48 | INT4 | INT8 | 32 | 30.51 | 4853 | 5077 | 6022 | 30.64 | 3513 |
-| LLM Runtime | 32 | 32 | 56 | INT4 | INT8 | 32 | 27.99 | 4808 | 121.65 | 989 | 28.15 | 92.8 |
-| LLM Runtime | 1024 | 32 | 56 | INT4 | INT8 | 32 | 31.22 | 4855 | 4805 | 5773 | 31.36 | 3326 |
-| GGML | 32 | 32 | 32 | INT4 | INT8 | 32 | 29.78 | 4035 | 426.64 | 1349 | 30.07 | 303 |
-| GGML | 1024 | 32 | 32 | INT4 | INT8 | 32 | 34.07 | 4789 | 14561 | 15617 | 34.57 | 10058 |
-| GGML | 32 | 32 | 48 | INT4 | INT8 | 32 | 27.37 | 4776 | 309.31 | 1157 | 27.52 | 222 |
-| GGML | 1024 | 32 | 48 | INT4 | INT8 | 32 | 30.6 | 4811 | 10653 | 11601 | 30.85 | 7360 |
-| GGML | 32 | 32 | 56 | INT4 | INT8 | 32 | 27.2 | 4803 | 282.86 | 1126 | 27.42 | 203 |
-| GGML | 1024 | 32 | 56 | INT4 | INT8 | 32 | 30.06 | 4827 | 9677 | 10609 | 30.24 | 6688 |
-
-
-### LLama2-7B-chat
-
-| Backend | Input | Output | Cores/Instance | Precision | Compute Type | Group Size | Next Token(ms) | Memory mean used (Top 50%) MB | First Token(ms) | Total Latency(ms) | P90 Latency(ms) | P99 Latency(ms) |
-| ---------- | ----- | ------ | -------------- | --------- | ------------ | ---------- | -------------- | ----------------------------- | --------------- | ----------------- | --------------- | --------------- |
-| LLM Runtime | 32 | 32 | 32 | INT4 | INT8 | 128 | 26.2 | 4320 | 103.62 | 915 | 26.38 | 79.73 |
-| LLM Runtime | 1024 | 32 | 32 | INT4 | INT8 | 128 | 30.74 | 4516 | 4102 | 5055 | 31.19 | 2842 |
-| LLM Runtime | 32 | 32 | 48 | INT4 | INT8 | 128 | 24.34 | 4772 | 91.57 | 846 | 24.64 | 70.99 |
-| LLM Runtime | 1024 | 32 | 48 | INT4 | INT8 | 128 | 27.52 | 4743 | 4575 | 5428 | 27.58 | 3166 |
-| LLM Runtime | 32 | 32 | 56 | INT4 | INT8 | 128 | 24.55 | 4784 | 95.24 | 856 | 24.75 | 73.58 |
-| LLM Runtime | 1024 | 32 | 56 | INT4 | INT8 | 128 | 27.7 | 4762 | 4185 | 5043 | 27.82 | 2896 |
-| LLM Runtime | 32 | 32 | 32 | INT4 | INT8 | 32 | 29.37 | 4163 | 130 | 1040 | 29.55 | 98.94 |
-| LLM Runtime | 1024 | 32 | 32 | INT4 | INT8 | 32 | 32.89 | 4952 | 4812 | 5831 | 33.06 | 3330 |
-| LLM Runtime | 32 | 32 | 48 | INT4 | INT8 | 32 | 27.66 | 4771 | 113 | 970 | 28.54 | 86.89 |
-| LLM Runtime | 1024 | 32 | 48 | INT4 | INT8 | 32 | 31.24 | 4857 | 9884 | 10852 | 31.33 | 6829 |
-| LLM Runtime | 32 | 32 | 56 | INT4 | INT8 | 32 | 27.87 | 4782 | 120.6 | 984 | 28.1 | 92.17 |
-| LLM Runtime | 1024 | 32 | 56 | INT4 | INT8 | 32 | 31.13 | 4819 | 4507 | 5472 | 31.24 | 3119 |
-| GGML | 32 | 32 | 32 | INT4 | INT8 | 32 | 32.2 | 3970 | 426.65 | 1424 | 32.49 | 304.4 |
-| GGML | 1024 | 32 | 32 | INT4 | INT8 | 32 | 33.99 | 4774 | 14457 | 15511 | 34.2 | 9986 |
-| GGML | 32 | 32 | 48 | INT4 | INT8 | 32 | 28.14 | 4768 | 309.57 | 1181 | 28.38 | 222.4 |
-| GGML | 1024 | 32 | 48 | INT4 | INT8 | 32 | 31.49 | 4786 | 16512 | 17488 | 31.65 | 11404 |
-| GGML | 32 | 32 | 56 | INT4 | INT8 | 32 | 27.71 | 4779 | 283.91 | 1142 | 27.89 | 204.6 |
-| GGML | 1024 | 32 | 56 | INT4 | INT8 | 32 | 31.53 | 4780 | 15819 | 16797 | 31.59 | 10926 |


### MPT-7B

@@ -361,4 +307,4 @@ PyTorch: 2.0.1+cpu
| --------- | ----------- | ----------------- | ------------- | ----- | --- | --------- | ---- | --------------- | ------- | ---------- | ---------- | -------------------- | ----------------- | ------------- |
| PyTorch | 4096 | 13K | Yes | 1 | 1 | BF16 | Yes | 8/16 | 3 | 3.2 Hour | 9.6 Hours | 0.30/0.45 | 128 | 1.00E-04 |
| PyTorch | 4096 | 13K | Yes | 2 | 2 | BF16 | Yes | 8/16 | 3 | 1.2 Hour | 3.6 Hours | 0.30/0.45 | 128 | 1.00E-04 |
-| PyTorch | 4096 | 13K | Yes | 4 | 2 | BF16 | Yes | 8/16 | 3 | 0.67 Hour | 2 Hours | 0.30/0.45 | 128 | 1.00E-04 |
\ No newline at end of file
+| PyTorch | 4096 | 13K | Yes | 4 | 2 | BF16 | Yes | 8/16 | 3 | 0.67 Hour | 2 Hours | 0.30/0.45 | 128 | 1.00E-04 |
17 changes: 2 additions & 15 deletions workflows/chatbot/fine_tuning/dpo_pipeline/README.md
@@ -17,13 +17,13 @@ We select 12k examples from [Orca](https://arxiv.org/abs/2306.02707) style datas
## 3. Training

```
python dpo_clm.py --model_name_or_path "meta-llama/Llama-2-7b-hf" --output_dir "llama2_7b-dpo" --per_device_train_batch_size 1 --gradient_accumulation_steps 8 --learning_rate 5e-4 --max_steps 1000 --save_steps 10 --lora_alpha 16 --lora_rank 16 --lora_dropout 0.05 --dataset_name Intel/orca_dpo_pairs --bf16 --use_auth_token True
python dpo_clm.py --model_name_or_path "mosaicml/mpt-7b" --output_dir "mpt_7b-dpo" --per_device_train_batch_size 1 --gradient_accumulation_steps 8 --learning_rate 5e-4 --max_steps 1000 --save_steps 10 --lora_alpha 16 --lora_rank 16 --lora_dropout 0.05 --dataset_name Intel/orca_dpo_pairs --bf16 --use_auth_token True
```
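
Note: with `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 8`, the effective batch size is 8 per device, and the `--lora_*` flags apply DPO through a LoRA adapter (rank 16, alpha 16, dropout 0.05) rather than updating all model weights, which keeps memory use modest.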


## 4. Evaluation

-We verify DPO training on our finetuned `mpt-7b` model [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1), our finetuned `llama-2-7b` model, and a finetuned `llama-2-7b` model [pankajmathur/orca_mini_v3_7b](https://huggingface.co/pankajmathur/orca_mini_v3_7b) that scores relatively high on the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which shows that DPO can significantly improve model performance. The evaluation metrics are the same as those of the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which uses the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master), a unified framework for testing generative language models on a large number of different evaluation tasks.
+We verify DPO training on our finetuned `mpt-7b` model [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1). The evaluation metrics are the same as those of the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which uses the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master), a unified framework for testing generative language models on a large number of different evaluation tasks.
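
As a hedged illustration of such a run, a single leaderboard task can be scored with the harness CLI; the flags below follow the master-branch interface current around this commit, and the few-shot count matches the 25-shot ARC column (task names and flags may differ in newer harness versions):

```
python main.py \
  --model hf-causal \
  --model_args pretrained=Intel/neural-chat-7b-v1-1,trust_remote_code=True \
  --tasks arc_challenge \
  --num_fewshot 25 \
  --batch_size 8
```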

#### mpt architecture
| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Evaluation by |
@@ -34,16 +34,3 @@ We verify DPO training on our finetuned `mpt-7b` model [Intel/neural-chat-7b-v1-
| **[Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1) with DPO** | **52.39** | 51.54 | 76.45 | 39.47 | 42.10 | ours |


-#### llama-2 architecture
-
-| Model | Average ⬆️ | ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️ | TruthfulQA (MC) (0-s) ⬆️ | Evaluation by |
-| --- | --- | --- | --- | --- | --- | --- |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 54.275 | 52.90 | 78.63 | 46.61 | 38.96 | ours |
-| [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 55.81 | 53.50 | 78.60 | 46.53 | 44.60 | ours |
-| Our Finetuned | **57.4** | 54.78 | 78.77 | 51.2 | 44.85 | ours |
-| **Our Finetuned with DPO** | **59.58** | 57.34 | 78.61 | 50.8 | 51.6 | ours |
-| [pankajmathur/orca_mini_v3_7b](https://huggingface.co/pankajmathur/orca_mini_v3_7b) | **59.86** | 56.91 | 79.64 | 52.37 | 50.51 | [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
-| **[pankajmathur/orca_mini_v3_7b](https://huggingface.co/pankajmathur/orca_mini_v3_7b) with DPO** | **60.92** | 59.22 | 79.92 | 51.84 | 52.71 | ours |


