Added benchmarking results to main README.md (#176)
* WIP 139/fine-tuning-a100

Signed-off-by: Joe Olson <[email protected]>
Showing 7 changed files with 298 additions and 0 deletions.
@@ -0,0 +1,13 @@
# Caikit NLP Runtime Performance Benchmarks

Runtime performance benchmarking results for various models on various hardware configurations.

## Llama2-7b

| Date Executed | Hardware | Training Set | Epochs | Precision | Batch Size | Max Source Length | Training Runtime (s) | Samples Per Second | Train Steps Per Second | Loss | Notes |
|---|---|---|---|---|:---:|---|---|---|---|---|---|
| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
| [2023-09-06](./logs/llama2-7b/20230906_135211.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 512 | 348 | 21.44 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
| [2023-09-05](./logs/llama2-7b/20230905_194133.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 8 | 256 | 356 | 20.939 | 0.16 | 1.70 | batch size of 9 fails with CUDA OOM |
| [2023-09-05](./logs/llama2-7b/20230905_191650.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 19 | 128 | 254 | 29.332 | 0.09 | 1.94 | batch size of 20 fails with CUDA OOM |
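The throughput columns can be cross-checked against the attached logs: each optimizer step covers per-device batch size × gradient accumulation steps samples, so runtime × samples-per-second should land close to steps × batch × accumulation. A small sanity-check sketch, with values copied from the 4096-max-source-length run and its log (variable names are illustrative only; small rounding differences are expected):

```python
# Sanity check for the first row (2023-09-05, max source length 4096),
# using values copied from the table and its linked log.
runtime_s = 350.0          # Training Runtime (s)
samples_per_s = 21.325     # Samples Per Second
steps = 77                 # optimizer steps shown by the progress bar in the log
batch_size = 6             # per-device batch size
grad_accum = 16            # gradient accumulation steps (from TRAINING ARGS)
buffer_size = 7470         # "Shuffling buffer size" reported in the log

by_throughput = runtime_s * samples_per_s      # ~7464 samples
by_steps = steps * batch_size * grad_accum     # 7392 samples
epoch_fraction = by_steps / buffer_size        # ~0.99, matching the reported epoch

print(by_throughput, by_steps, round(epoch_fraction, 2))
```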
@@ -0,0 +1,54 @@
(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
  _warnings.warn(
<function register_backend_type at 0x1549c03e3790> is still in the BETA phase and subject to change!
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
  _warnings.warn(
Existing model directory found; purging it now.
Experiment Configuration
- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
  |- Inferred Model Resource Type: [<class 'caikit_nlp.resources.pretrained_model.hf_auto_causal_lm.HFAutoCausalLM'>]
- Dataset: [glue/rte]
- Number of Epochs: [1]
- Learning Rate: [2e-05]
- Batch Size: [6]
- Output Directory: [/tmp/tu/output/tuning/llama27b]
- Maximum source sequence length: [4096]
- Maximum target sequence length: [1024]
- Gradient accumulation steps: [16]
- Enable evaluation: [False]
- Evaluation metrics: [['rouge']]
- Torch dtype to use for training: [bfloat16]
[Loading the dataset...]
2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
[Loading the base model resource...]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it]
[Starting the training...]
2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True
2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470
TRAINING ARGS: {
    "output_dir": "/tmp",
    "per_device_train_batch_size": 6,
    "per_device_eval_batch_size": 6,
    "num_train_epochs": 1,
    "seed": 73,
    "do_eval": false,
    "learning_rate": 2e-05,
    "weight_decay": 0.01,
    "save_total_limit": 3,
    "push_to_hub": false,
    "no_cuda": false,
    "remove_unused_columns": false,
    "dataloader_pin_memory": false,
    "gradient_accumulation_steps": 16,
    "eval_accumulation_steps": 16,
    "bf16": true
}
  0%|          | 0/77 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'train_runtime': 350.2997, 'train_samples_per_second': 21.325, 'train_steps_per_second': 0.22, 'train_loss': 1.6495626870687905, 'epoch': 0.99}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [05:50<00:00, 4.55s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.79s/it]
Using sep_token, but it is not set yet.
[Training Complete]
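For reference, the TRAINING ARGS block in the log above maps one-to-one onto Hugging Face `transformers.TrainingArguments`. A minimal sketch of the equivalent construction, assuming the standard `transformers` API (this is an illustration, not the caikit-nlp code path itself, which is driven by `ft_job.sh`):

```python
# Rebuilding the TRAINING ARGS shown in the log above as a
# transformers.TrainingArguments object. Illustrative only; the actual
# fine-tuning run was launched through caikit-nlp's ft_job.sh.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/tmp",
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    num_train_epochs=1,
    seed=73,
    do_eval=False,
    learning_rate=2e-05,
    weight_decay=0.01,
    save_total_limit=3,
    push_to_hub=False,
    no_cuda=False,
    remove_unused_columns=False,
    dataloader_pin_memory=False,
    gradient_accumulation_steps=16,
    eval_accumulation_steps=16,
    bf16=True,
)
```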
@@ -0,0 +1,54 @@
(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
  _warnings.warn(
<function register_backend_type at 0x14c45f315790> is still in the BETA phase and subject to change!
/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
  _warnings.warn(
Existing model directory found; purging it now.
Experiment Configuration
- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
  |- Inferred Model Resource Type: [<class 'caikit_nlp.resources.pretrained_model.hf_auto_causal_lm.HFAutoCausalLM'>]
- Dataset: [glue/rte]
- Number of Epochs: [1]
- Learning Rate: [2e-05]
- Batch Size: [6]
- Output Directory: [/tmp/tu/output/tuning/llama27b]
- Maximum source sequence length: [1024]
- Maximum target sequence length: [1024]
- Gradient accumulation steps: [16]
- Enable evaluation: [False]
- Evaluation metrics: [['rouge']]
- Torch dtype to use for training: [bfloat16]
[Loading the dataset...]
2023-09-05T18:47:18.075310 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
2023-09-05T18:47:18.093371 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
[Loading the base model resource...]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.76s/it]
[Starting the training...]
2023-09-05T18:48:09.755222 [PEFT_:DBUG] Shuffling enabled? True
2023-09-05T18:48:09.755357 [PEFT_:DBUG] Shuffling buffer size: 7470
TRAINING ARGS: {
    "output_dir": "/tmp",
    "per_device_train_batch_size": 6,
    "per_device_eval_batch_size": 6,
    "num_train_epochs": 1,
    "seed": 73,
    "do_eval": false,
    "learning_rate": 2e-05,
    "weight_decay": 0.01,
    "save_total_limit": 3,
    "push_to_hub": false,
    "no_cuda": false,
    "remove_unused_columns": false,
    "dataloader_pin_memory": false,
    "gradient_accumulation_steps": 16,
    "eval_accumulation_steps": 16,
    "bf16": true
}
  0%|          | 0/77 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'train_runtime': 350.165, 'train_samples_per_second': 21.333, 'train_steps_per_second': 0.22, 'train_loss': 1.6495626870687905, 'epoch': 0.99}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [05:50<00:00, 4.55s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.83s/it]
Using sep_token, but it is not set yet.
[Training Complete]