Added benchmarking results to main README.md (#176)
* WIP 139/fine-tuning-a100

Signed-off-by: Joe Olson <[email protected]>
olson-ibm authored Sep 12, 2023
1 parent 2f36d0f commit 091e271
Showing 7 changed files with 298 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -41,6 +41,10 @@ Prompt tuning - learning soft prompts. This is different from prompt engineering

The important difference between fine-tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter don't change the base model's weights at all. So when you run inference for prompt-tuned models, you can have _n_ prompts to 1 base model and simply inject the prompt tensors a request needs, instead of keeping _n_ separate fine-tuned models.
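Purely as an illustration of that pattern (not the caikit-nlp runtime code path), a minimal sketch with the Hugging Face `peft` library might look like the following; the model path and adapter names are placeholders:

```python
# Illustration only: serving n prompt-tuned "models" on top of a single shared
# base model with Hugging Face `peft`. Paths and adapter names are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("path/to/llama-2-7b")  # base weights loaded once
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-2-7b")

# Register two prompt-tuned adapters against the same frozen base weights.
model = PeftModel.from_pretrained(base_model, "prompts/task-a", adapter_name="task-a")
model.load_adapter("prompts/task-b", adapter_name="task-b")

# At inference time, switch in only the small prompt tensors that were requested;
# the 7B base weights stay shared across every prompt.
model.set_adapter("task-b")
inputs = tokenizer("premise: ... hypothesis: ...", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```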

### Runtime Performance Benchmarking

See [Runtime Performance Benchmarking](./benchmarks/README.md) for results from tuning various models.

#### Notes

- Currently causal language models and sequence-to-sequence models are supported.
13 changes: 13 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,13 @@
# Caikit NLP Runtime Performance Benchmarks

Runtime performance benchmarking results for various models on various hardware configurations.

## Llama2-7b

| Date Executed | Hardware | Training Set | Epoch | Precision | Batch Size | Max Source Length | Training Runtime (s) | Samples Per Second | Train Steps Per Second | Loss | Notes |
|---|---|---|---|---|:---:|---|---|---|---|---|---|
| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
| [2023-09-06](./logs/llama2-7b/20230906_135211.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 512 | 348 | 21.44 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
| [2023-09-05](./logs/llama2-7b/20230905_194133.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 8 | 256 | 356 | 20.939 | 0.16 | 1.70 | batch size of 9 fails CUDA OOM |
| [2023-09-05](./logs/llama2-7b/20230905_191650.output) | 1 x A100 80GB | [Glue / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 19 | 128 | 254 | 29.332 | 0.09 | 1.94 | batch size of 20 fails CUDA OOM |
54 changes: 54 additions & 0 deletions benchmarks/logs/llama2-7b/20230905_183655.output
@@ -0,0 +1,54 @@
(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
_warnings.warn(
<function register_backend_type at 0x1549c03e3790> is still in the BETA phase and subject to change!
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
_warnings.warn(
Existing model directory found; purging it now.
Experiment Configuration
- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
|- Inferred Model Resource Type: [<class 'caikit_nlp.resources.pretrained_model.hf_auto_causal_lm.HFAutoCausalLM'>]
- Dataset: [glue/rte]
- Number of Epochs: [1]
- Learning Rate: [2e-05]
- Batch Size: [6]
- Output Directory: [/tmp/tu/output/tuning/llama27b]
- Maximum source sequence length: [4096]
- Maximum target sequence length: [1024]
- Gradient accumulation steps: [16]
- Enable evaluation: [False]
- Evaluation metrics: [['rouge']]
- Torch dtype to use for training: [bfloat16]
[Loading the dataset...]
2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
[Loading the base model resource...]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it]
[Starting the training...]
2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True
2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470
TRAINING ARGS: {
"output_dir": "/tmp",
"per_device_train_batch_size": 6,
"per_device_eval_batch_size": 6,
"num_train_epochs": 1,
"seed": 73,
"do_eval": false,
"learning_rate": 2e-05,
"weight_decay": 0.01,
"save_total_limit": 3,
"push_to_hub": false,
"no_cuda": false,
"remove_unused_columns": false,
"dataloader_pin_memory": false,
"gradient_accumulation_steps": 16,
"eval_accumulation_steps": 16,
"bf16": true
}
0%| | 0/77 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'train_runtime': 350.2997, 'train_samples_per_second': 21.325, 'train_steps_per_second': 0.22, 'train_loss': 1.6495626870687905, 'epoch': 0.99}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [05:50<00:00, 4.55s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.79s/it]
Using sep_token, but it is not set yet.
[Training Complete]
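
The TRAINING ARGS above combine a per-device batch size of 6 with 16 gradient-accumulation steps. A quick, illustrative sanity check in plain Python, using only values that appear in this log, reproduces the 77 optimizer steps and the reported epoch of 0.99:

```python
# Back-of-the-envelope check of the run above, using values taken from the log.
per_device_train_batch_size = 6
gradient_accumulation_steps = 16
samples_in_stream = 7470  # "Shuffling buffer size: 7470"

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 96 samples per optimizer step
optimizer_steps = samples_in_stream // effective_batch_size                       # 77 -> matches the "0/77" progress bar
epoch_covered = optimizer_steps * effective_batch_size / samples_in_stream        # ~0.99 -> matches "'epoch': 0.99"

# Throughput cross-check: 21.325 samples/s * 350.3 s ≈ 7,470 samples, i.e. one pass over the stream.
print(effective_batch_size, optimizer_steps, round(epoch_covered, 2))  # 96 77 0.99
```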

54 changes: 54 additions & 0 deletions benchmarks/logs/llama2-7b/20230905_184809.output
@@ -0,0 +1,54 @@
(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
_warnings.warn(
<function register_backend_type at 0x14c45f315790> is still in the BETA phase and subject to change!
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
_warnings.warn(
Existing model directory found; purging it now.
Experiment Configuration
- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
|- Inferred Model Resource Type: [<class 'caikit_nlp.resources.pretrained_model.hf_auto_causal_lm.HFAutoCausalLM'>]
- Dataset: [glue/rte]
- Number of Epochs: [1]
- Learning Rate: [2e-05]
- Batch Size: [6]
- Output Directory: [/tmp/tu/output/tuning/llama27b]
- Maximum source sequence length: [1024]
- Maximum target sequence length: [1024]
- Gradient accumulation steps: [16]
- Enable evaluation: [False]
- Evaluation metrics: [['rouge']]
- Torch dtype to use for training: [bfloat16]
[Loading the dataset...]
2023-09-05T18:47:18.075310 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
2023-09-05T18:47:18.093371 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
[Loading the base model resource...]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.76s/it]
[Starting the training...]
2023-09-05T18:48:09.755222 [PEFT_:DBUG] Shuffling enabled? True
2023-09-05T18:48:09.755357 [PEFT_:DBUG] Shuffling buffer size: 7470
TRAINING ARGS: {
"output_dir": "/tmp",
"per_device_train_batch_size": 6,
"per_device_eval_batch_size": 6,
"num_train_epochs": 1,
"seed": 73,
"do_eval": false,
"learning_rate": 2e-05,
"weight_decay": 0.01,
"save_total_limit": 3,
"push_to_hub": false,
"no_cuda": false,
"remove_unused_columns": false,
"dataloader_pin_memory": false,
"gradient_accumulation_steps": 16,
"eval_accumulation_steps": 16,
"bf16": true
}
0%| | 0/77 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'train_runtime': 350.165, 'train_samples_per_second': 21.333, 'train_steps_per_second': 0.22, 'train_loss': 1.6495626870687905, 'epoch': 0.99}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [05:50<00:00, 4.55s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.83s/it]
Using sep_token, but it is not set yet.
[Training Complete]
