diff --git a/README.md b/README.md
index a3ed1ab9..a750375e 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,10 @@ Prompt tuning - learning soft prompts. This is different from prompt engineering
 The important difference between fine tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter doesn't change the base model's weights at all. So when you run inference for prompt-tuned models, you can have _n_ prompts to 1 base model, and just inject the prompt tensors you need when they're requested instead of having _n_ separate fine-tuned models.
 
+### Runtime Performance Benchmarking
+
+See [Runtime Performance Benchmarking](./benchmarks/README.md) for runtime performance results from tuning various models.
+
 #### Notes
 
 - Currently causal language models and sequence-to-sequence models are supported.
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 00000000..4657e584
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,13 @@
+# Caikit NLP Runtime Performance Benchmarks
+
+Runtime performance benchmarking results for various models on various hardware configurations.
+
+## Llama2-7b
+
+| Date Executed | Hardware | Training Set | Epochs | Precision | Batch Size | Max Source Length (tokens) | Training Runtime (s) | Samples Per Second | Train Steps Per Second | Loss | Notes |
+|---|---|---|---|---|:---:|---|---|---|---|---|---|
+| [2023-09-05](./logs/llama2-7b/20230905_183655.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
+| [2023-09-05](./logs/llama2-7b/20230905_184809.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
+| [2023-09-06](./logs/llama2-7b/20230906_135211.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 6 | 512 | 348 | 21.44 | 0.22 | 1.65 | batch size of 7 fails with CUDA OOM |
+| [2023-09-05](./logs/llama2-7b/20230905_194133.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 8 | 256 | 356 | 20.939 | 0.16 | 1.70 | batch size of 9 fails with CUDA OOM |
+| [2023-09-05](./logs/llama2-7b/20230905_191650.output) | 1 x A100 80GB | [GLUE / RTE](https://huggingface.co/datasets/glue) | 1 | bfloat16 | 19 | 128 | 254 | 29.332 | 0.09 | 1.94 | batch size of 20 fails with CUDA OOM |
diff --git a/benchmarks/logs/llama2-7b/20230905_183655.output b/benchmarks/logs/llama2-7b/20230905_183655.output
new file mode 100644
index 00000000..2dbea0be
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_183655.output
@@ -0,0 +1,54 @@
+(tuning) [gpu_user@gpu5480 caikit-nlp]$ ./ft_job.sh
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/errors/__init__.py:29: DeprecationWarning: The caikit.toolkit.errors package has moved to caikit.core.exceptions
+  _warnings.warn(
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [6]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [4096]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it]
+[Starting the training...]
+2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_184809.output b/benchmarks/logs/llama2-7b/20230905_184809.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_184809.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration +- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b] + |- Inferred Model Resource Type: [] +- Dataset: [glue/rte] +- Number of Epochs: [1] +- Learning Rate: [2e-05] +- Batch Size: [6] +- Output Directory: [/tmp/tu/output/tuning/llama27b] +- Maximum source sequence length: [4096] +- Maximum target sequence length: [1024] +- Gradient accumulation steps: [16] +- Enable evaluation: [False] +- Evaluation metrics: [['rouge']] +- Torch dtype to use for training: [bfloat16] +[Loading the dataset...] +2023-09-05T18:36:55.174106 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +2023-09-05T18:36:55.192203 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +[Loading the base model resource...] +Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.89s/it] +[Starting the training...] +2023-09-05T18:37:47.419502 [PEFT_:DBUG] Shuffling enabled? True +2023-09-05T18:37:47.419666 [PEFT_:DBUG] Shuffling buffer size: 7470 +TRAINING ARGS: { + "output_dir": "/tmp", + "per_device_train_batch_size": 6, + "per_device_eval_batch_size": 6, + "num_train_epochs": 1, + "seed": 73, + "do_eval": false, + "learning_rate": 2e-05, + "weight_decay": 0.01, + "save_total_limit": 3, + "push_to_hub": false, + "no_cuda": false, + "remove_unused_columns": false, + "dataloader_pin_memory": false, + "gradient_accumulation_steps": 16, + "eval_accumulation_steps": 16, + "bf16": true +} + 0%| | 0/77 [00:00 is still in the BETA phase and subject to change! +/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions + _warnings.warn( +Existing model directory found; purging it now. +Experiment Configuration +- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b] + |- Inferred Model Resource Type: [] +- Dataset: [glue/rte] +- Number of Epochs: [1] +- Learning Rate: [2e-05] +- Batch Size: [6] +- Output Directory: [/tmp/tu/output/tuning/llama27b] +- Maximum source sequence length: [1024] +- Maximum target sequence length: [1024] +- Gradient accumulation steps: [16] +- Enable evaluation: [False] +- Evaluation metrics: [['rouge']] +- Torch dtype to use for training: [bfloat16] +[Loading the dataset...] +2023-09-05T18:47:18.075310 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +2023-09-05T18:47:18.093371 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json +[Loading the base model resource...] +Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.76s/it] +[Starting the training...] +2023-09-05T18:48:09.755222 [PEFT_:DBUG] Shuffling enabled? 
True
+2023-09-05T18:48:09.755357 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_191650.output b/benchmarks/logs/llama2-7b/20230905_191650.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_191650.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 4.16MB/s]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.60k/6.60k [00:00<00:00, 5.36MB/s]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.27k/6.27k [00:00<00:00, 5.52MB/s]
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [19]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [128]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.8k/28.8k [00:00<00:00, 15.9MB/s]
+Downloading metadata: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.7k/28.7k [00:00<00:00, 26.9MB/s]
+Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27.9k/27.9k [00:00<00:00, 22.1MB/s]
+Downloading data: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 697k/697k [00:00<00:00, 12.0MB/s]
+Generating train split:   0%|          | 0/2490 [00:00<?, ? examples/s]
diff --git a/benchmarks/logs/llama2-7b/20230905_194133.output b/benchmarks/logs/llama2-7b/20230905_194133.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230905_194133.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [8]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [256]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-05T19:40:43.686785 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-05T19:40:43.702480 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.73s/it]
+[Starting the training...]
+2023-09-05T19:41:33.062266 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-05T19:41:33.062427 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 8,
+    "per_device_eval_batch_size": 8,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/58 [00:00<?, ?it/s]
diff --git a/benchmarks/logs/llama2-7b/20230906_135211.output b/benchmarks/logs/llama2-7b/20230906_135211.output
new file mode 100644
--- /dev/null
+++ b/benchmarks/logs/llama2-7b/20230906_135211.output
+ is still in the BETA phase and subject to change!
+/u/gpu_user/.conda/envs/tuning/lib/python3.9/site-packages/caikit/core/toolkit/error_handler.py:29: DeprecationWarning: The caikit.toolkit.error_handler package has moved to caikit.core.exceptions
+  _warnings.warn(
+Existing model directory found; purging it now.
+Experiment Configuration
+- Model Name: [/tmp/tu/huggingface/hub/models--llama-2-7b]
+   |- Inferred Model Resource Type: []
+- Dataset: [glue/rte]
+- Number of Epochs: [1]
+- Learning Rate: [2e-05]
+- Batch Size: [6]
+- Output Directory: [/tmp/tu/output/tuning/llama27b]
+- Maximum source sequence length: [512]
+- Maximum target sequence length: [1024]
+- Gradient accumulation steps: [16]
+- Enable evaluation: [False]
+- Evaluation metrics: [['rouge']]
+- Torch dtype to use for training: [bfloat16]
+[Loading the dataset...]
+2023-09-06T13:51:21.128309 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+2023-09-06T13:51:21.146717 [fsspe:DBUG] open file: /u/gpu_user/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/dataset_info.json
+[Loading the base model resource...]
+Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.79s/it]
+[Starting the training...]
+2023-09-06T13:52:11.307381 [PEFT_:DBUG] Shuffling enabled? True
+2023-09-06T13:52:11.307508 [PEFT_:DBUG] Shuffling buffer size: 7470
+TRAINING ARGS: {
+    "output_dir": "/tmp",
+    "per_device_train_batch_size": 6,
+    "per_device_eval_batch_size": 6,
+    "num_train_epochs": 1,
+    "seed": 73,
+    "do_eval": false,
+    "learning_rate": 2e-05,
+    "weight_decay": 0.01,
+    "save_total_limit": 3,
+    "push_to_hub": false,
+    "no_cuda": false,
+    "remove_unused_columns": false,
+    "dataloader_pin_memory": false,
+    "gradient_accumulation_steps": 16,
+    "eval_accumulation_steps": 16,
+    "bf16": true
+}
+  0%|          | 0/77 [00:00<?, ?it/s]
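
The README hunk at the top of this diff describes serving _n_ tuned prompts against one shared base model. Below is a minimal sketch of that pattern, assuming the HuggingFace `peft` library; the adapter paths and names (`prompts/rte`, `prompts/other-task`) are hypothetical and not files in this diff, and only the base model path is taken from the logs above.

```python
# Hypothetical sketch of the "n prompts to 1 base model" pattern from README.md;
# adapter paths and names below are illustrative, not part of this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "/tmp/tu/huggingface/hub/models--llama-2-7b"  # base model path from the logs

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)  # loaded once, shared

# Each prompt-tuned checkpoint holds only the soft-prompt tensors, so many
# adapters can be attached to the single copy of the base weights.
model = PeftModel.from_pretrained(base_model, "prompts/rte", adapter_name="rte")
model.load_adapter("prompts/other-task", adapter_name="other-task")

# Swap the injected prompt per request instead of keeping n fine-tuned models.
model.set_adapter("rte")
inputs = tokenizer("rte premise: ... hypothesis: ...", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The point of the sketch is the memory math behind the README claim: the base weights are resident once, and each additional task costs only a soft-prompt tensor (roughly `num_virtual_tokens × hidden_dim` parameters) rather than a full fine-tuned copy of the model.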