Added benchmarking results to main README.md #176
Conversation
Force-pushed from bebf62e to 48bc9f4 (Compare)
| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
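The CUDA OOM note in the last row is the kind of limit that is easy to probe empirically. Below is a minimal sketch (not part of this PR; `step_fn` and the search bounds are hypothetical) of finding the largest batch size that fits on the GPU before an out-of-memory error:

```python
import torch

def max_batch_size(step_fn, start=1, limit=16):
    """Return the largest batch size for which step_fn(bs) completes.

    step_fn should run one forward/backward pass at the given batch size.
    """
    best = 0
    for bs in range(start, limit + 1):
        try:
            step_fn(bs)
            best = bs
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release the failed allocation
            break
    return best
```

On the runs above, a probe like this would land on 6, matching the note that a batch size of 7 fails with CUDA OOM.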
I think RTE might have text long enough for a 4096 context length, but definitely good to note, and we can iterate later with a different dataset.
README.md (outdated)

@@ -41,6 +41,10 @@ Prompt tuning - learning soft prompts. This is different from prompt engineering

The important difference between fine tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter doesn't change the base model's weights at all. So when you run inference for prompt tuned models, you can have _n_ prompts to 1 base model, and just inject the prompt tensors you need when they're requested instead of having _n_ separate fine-tuned models.

### Benchmarking

[Benchmarks](./benchmarks/README.md) for tuning various models.
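To make the _n_ prompts to 1 base model point concrete, here is a minimal sketch (not from this PR; the adapter paths and names are hypothetical) using HuggingFace PEFT to serve several prompt-tuned adapters against a single frozen base model:

```python
# Hypothetical sketch: n prompt adapters sharing one frozen base model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One copy of the base weights, loaded once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the first prompt-tuned adapter (paths are hypothetical).
model = PeftModel.from_pretrained(base, "adapters/rte", adapter_name="rte")

# Further adapters are just small prompt tensors; the base weights are shared.
model.load_adapter("adapters/boolq", adapter_name="boolq")

model.set_adapter("rte")    # inject the RTE prompt tensors for this request
# ... run inference ...
model.set_adapter("boolq")  # swap prompts without reloading the 7B weights
```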
nit: Can we add a small description here too (similar to the benchmarks README) so that people are not looking for quality metrics when they look at this one?
Rephrase as "Performance Benchmarking"?
Yep. Rephrasing as "Runtime Performance Benchmarking".
sounds good!
Signed-off-by: Joe Olson <[email protected]>
Force-pushed from 904b5e7 to b3ba86a (Compare)
LGTM. Thanks @olson-ibm
Added a section to the main README.md for benchmarking. It links to a second README.md in ./benchmarking that contains a (poorly formatted) table summarizing the benchmarking results for Llama2-7b on 1 x A100 80GB on the CCC.
Also included under ./logs is the output from the individual benchmark runs, "showing our work": each log records all the parameters used for training, in case someone wants to reproduce the results.
The poor formatting of the table appears to be a consequence of how GitHub renders the markdown. I spent two hours adjusting the formatting and reading docs trying to get the column widths to change, and every attempt failed. Several artifacts of those attempts are still left in the table to demonstrate that nothing works to set the column width. That this is a systematic problem with GitHub's markdown rendering is also demonstrated by the table in the main README.md, which was not changed in this PR.