Added benchmarking results to main README.md #176
Conversation
Force-pushed from bebf62e to 48bc9f4 (Compare)
| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
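The CUDA OOM note in the last row is the kind of limit that is easy to probe empirically. Below is a minimal sketch (not part of this PR; `step_fn` and the search bounds are hypothetical) of finding the largest batch size that fits on the GPU before an out-of-memory error:

```python
import torch

def max_batch_size(step_fn, start=1, limit=16):
    """Return the largest batch size for which step_fn(bs) completes.

    step_fn should run one forward/backward pass at the given batch size.
    """
    best = 0
    for bs in range(start, limit + 1):
        try:
            step_fn(bs)
            best = bs
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release the failed allocation
            break
    return best
```

On the runs above, a probe like this would land on 6, matching the note that a batch size of 7 fails with CUDA OOM.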
I think RTE might have text long enough for a 4096 context length, but definitely good to note, and we can iterate later with a different dataset.
README.md (outdated)

@@ -41,6 +41,10 @@ Prompt tuning - learning soft prompts. This is different from prompt engineering

The important difference between fine tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter doesn't change the base model's weights at all. So when you run inference for prompt tuned models, you can have _n_ prompts to 1 base model, and just inject the prompt tensors you need when they're requested instead of having _n_ separate fine-tuned models.

### Benchmarking

[Benchmarks](./benchmarks/README.md) for tuning various models.
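To make the _n_ prompts to 1 base model point concrete, here is a minimal sketch (not from this PR; the adapter paths and names are hypothetical) using HuggingFace PEFT to serve several prompt-tuned adapters against a single frozen base model:

```python
# Hypothetical sketch: n prompt adapters sharing one frozen base model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One copy of the base weights, loaded once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the first prompt-tuned adapter (paths are hypothetical).
model = PeftModel.from_pretrained(base, "adapters/rte", adapter_name="rte")

# Further adapters are just small prompt tensors; the base weights are shared.
model.load_adapter("adapters/boolq", adapter_name="boolq")

model.set_adapter("rte")    # inject the RTE prompt tensors for this request
# ... run inference ...
model.set_adapter("boolq")  # swap prompts without reloading the 7B weights
```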
nit: Can we add a small description here too (similar to the benchmarks README) so that people are not looking for quality metrics when they look at this one?
Rephrase as "Performance Benchmarking"?
Yep. Rephrasing as "Runtime Performance Benchmarking".
sounds good!
Signed-off-by: Joe Olson <[email protected]>
Force-pushed from 904b5e7 to b3ba86a (Compare)
LGTM. Thanks @olson-ibm
Added a section to the main README.md for benchmarking. It links to a second README.md in ./benchmarking that contains a (poorly formatted) table summarizing the benchmarking results for Llama2-7b on 1 x A100 80GB on the CCC.
Also included under ./logs is the output from the individual benchmark runs, "showing our work": each log records all the parameters used for training, in case someone wants to reproduce the results.
The poor formatting of the table appears to be a consequence of how GitHub renders the markdown. I spent two hours adjusting the formatting and reading docs trying to get the column widths to change, and every attempt failed. Several artifacts of those attempts are still left in the table to demonstrate that nothing works to set the column width. That this is a systematic problem with GitHub's markdown rendering is also demonstrated by the table in the main README.md, which was not changed in this PR.