diff --git a/README.md b/README.md
index a3ed1ab9..a750375e 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,10 @@ Prompt tuning - learning soft prompts. This is different from prompt engineering
 The important difference between fine tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter doesn't change the base model's weights at all. So when you run inference for prompt-tuned models, you can have _n_ prompts to 1 base model, and just inject the prompt tensors you need when they're requested instead of having _n_ separate fine-tuned models.
 
+### Runtime Performance Benchmarking
+
+See [Runtime Performance Benchmarking](./benchmarks/README.md) for runtime performance results from tuning various models.
+
 #### Notes
 
 - Currently causal language models and sequence-to-sequence models are supported.
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 00000000..4657e584
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,13 @@
+# Caikit NLP Runtime Performance Benchmarks
+
+Runtime performance benchmarking results for various models on various hardware configurations.
+
+## Llama2-7b
+
+| Date Executed | Hardware | Training Set | Epochs | Precision | Batch Size | Max Source Length (tokens) | Training Runtime (s) | Samples Per Second | Train Steps Per Second | Loss | Notes |
+|---|---|---|---|---|:---:|---|---|---|---|---|---|
+| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
+| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
+| [2023-09-06](./logs/llama2-7b/20230906_135211.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 512 | 348 | 21.44 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
+| [2023-09-05](./logs/llama2-7b/20230905_194133.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 8 | 256 | 356 | 20.939 | 0.16 | 1.70 | batch size of 9 fails with CUDA OOM |
+| [2023-09-05](./logs/llama2-7b/20230905_191650.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 19 | 128 | 254 | 29.332 | 0.09 | 1.94 | batch size of 20 fails with CUDA OOM |
diff --git a/benchmarks/logs/llama2-7b/20230905_183655.output b/benchmarks/logs/llama2-7b/20230905_183655.output
new file mode 100644
index 00000000..2dbea0be
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_183655.output
@@ -0,0 +1,54 @@
+(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
+  _warnings.warn(
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [6]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [4096]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it]
+[Starting the training...]
+2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_184809.output b/benchmarks/logs/llama2-7b/20230905_184809.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_184809.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration +- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b] + |- Inferred Model Resource Type: [] +- Dataset: [glue/rte] +- Number of Epochs: [1] +- Learning Rate: [2e-05] +- Batch Size: [6] +- Output Directory: [/tmp/tu/output/tuning/llama27b] +- Maximum source sequence length: [4096] +- Maximum target sequence length: [1024] +- Gradient accumulation steps: [16] +- Enable evaluation: [False] +- Evaluation metrics: [['rouge']] +- Torch dtype to use for training: [bfloat16] +[Loading the dataset...] +2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +[Loading the base model resource...] +Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it] +[Starting the training...] +2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True +2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470 +TRAINING ARGS: { + "output_dir": "/tmp", + "per_device_train_batch_size": 6, + "per_device_eval_batch_size": 6, + "num_train_epochs": 1, + "seed": 73, + "do_eval": false, + "learning_rate": 2e-05, + "weight_decay": 0.01, + "save_total_limit": 3, + "push_to_hub": false, + "no_cuda": false, + "remove_unused_columns": false, + "dataloader_pin_memory": false, + "gradient_accumulation_steps": 16, + "eval_accumulation_steps": 16, + "bf16": true +} + 0%| | 0/77 [00:00 is still in the BETA phase and subject to change! +/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions + _warnings.warn( +Existing model directory found; purging it now. +Experiment Configuration +- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b] + |- Inferred Model Resource Type: [] +- Dataset: [glue/rte] +- Number of Epochs: [1] +- Learning Rate: [2e-05] +- Batch Size: [6] +- Output Directory: [/tmp/tu/output/tuning/llama27b] +- Maximum source sequence length: [1024] +- Maximum target sequence length: [1024] +- Gradient accumulation steps: [16] +- Enable evaluation: [False] +- Evaluation metrics: [['rouge']] +- Torch dtype to use for training: [bfloat16] +[Loading the dataset...] +2023-09-05T18:47:18.075310 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +2023-09-05T18:47:18.093371 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +[Loading the base model resource...] +Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.76s/it] +[Starting the training...] +2023-09-05T18:48:09.755222 [PEFT_:DBUG] Shuffling enabled? 
True
+2023-09-05T18:48:09.755357 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_191650.output b/benchmarks/logs/llama2-7b/20230905_191650.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_191650.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 4.16MB/s]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.60k/6.60k [00:00<00:00, 5.36MB/s]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.27k/6.27k [00:00<00:00, 5.52MB/s]
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [19]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [128]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.8k/28.8k [00:00<00:00, 15.9MB/s]
+Downloading metadata: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.7k/28.7k [00:00<00:00, 26.9MB/s]
+Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27.9k/27.9k [00:00<00:00, 22.1MB/s]
+Downloading data: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 697k/697k [00:00<00:00, 12.0MB/s]
+Generating train split:   0%|          | 0/2490 [00:00<?, ? examples/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_194133.output b/benchmarks/logs/llama2-7b/20230905_194133.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_194133.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [8]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [256]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-05T19:40:43.686785 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-05T19:40:43.702480 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.73s/it]
+[Starting the training...]
+2023-09-05T19:41:33.062266 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-05T19:41:33.062427 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 8,
+    "per_device_eval_batch_size": 8,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/58 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230906_135211.output b/benchmarks/logs/llama2-7b/20230906_135211.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230906_135211.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [6]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [512]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-06T13:51:21.128309 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-06T13:51:21.146717 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.79s/it]
+[Starting the training...]
+2023-09-06T13:52:11.307381 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-06T13:52:11.307508 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
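
The README hunk at the top of this diff describes serving _n_ tuned prompts against one shared base model. Below is a minimal sketch of that pattern, assuming the HuggingFace `peft` library; the adapter paths and names (`prompts/rte`, `prompts/other-task`) are hypothetical and not files in this diff, and only the base model path is taken from the logs above.

```python
# Hypothetical sketch of the "n prompts to 1 base model" pattern from README.md;
# adapter paths and names below are illustrative, not part of this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "/tmp/tu/huggingface/hub/models--llama-2-7b"  # base model path from the logs

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)  # loaded once, shared

# Each prompt-tuned checkpoint holds only the soft-prompt tensors, so many
# adapters can be attached to the single copy of the base weights.
model = PeftModel.from_pretrained(base_model, "prompts/rte", adapter_name="rte")
model.load_adapter("prompts/other-task", adapter_name="other-task")

# Swap the injected prompt per request instead of keeping n fine-tuned models.
model.set_adapter("rte")
inputs = tokenizer("rte premise: ... hypothesis: ...", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The point of the sketch is the memory math behind the README claim: the base weights are resident once, and each additional task costs only a soft-prompt tensor (roughly `num_virtual_tokens × hidden_dim` parameters) rather than a full fine-tuned copy of the model.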