diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx
index 2aab1381b..0d1913406 100644
--- a/chapters/en/chapter11/1.mdx
+++ b/chapters/en/chapter11/1.mdx
@@ -8,7 +8,7 @@ Chat templates structure interactions between users and AI models, ensuring cons
 
 ## 2️⃣ Supervised Fine-Tuning
 
-Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see [The supervised fine-tuning section of the TRL documentation](https://huggingface.co/docs/trl/en/sft_trainer).
+Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see [the supervised fine-tuning section of the TRL documentation](https://huggingface.co/docs/trl/en/sft_trainer).
 
 ## 3️⃣ Low Rank Adaptation (LoRA)
@@ -25,9 +25,9 @@ Evaluation is a crucial step in the fine-tuning process. It allows us to measure
 
 ## References
 - [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating)
-- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)
+- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py)
 - [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer)
 - [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290)
-- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning)
+- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/sft_trainer)
 - [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://github.com/huggingface/alignment-handbook)
 - [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format)
diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx
index e2c038e72..883287fb9 100644
--- a/chapters/en/chapter11/2.mdx
+++ b/chapters/en/chapter11/2.mdx
@@ -28,7 +28,7 @@ Instruction tuned models are trained to follow a specific conversational structur
 
 To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). Here's a guide on [ChatML](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146).
 
-When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to ensure this is to check the model tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses [this configuration](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/e2c3f7557efbdec707ae3a336371d169783f1da1/tokenizer_config.json#L146).
+When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to verify this is to check the model's tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses the configuration linked above.
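+
+To see what a template actually produces, here is a minimal sketch that renders a conversation with the tokenizer's `apply_chat_template` method:
+
+```python
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
+
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is the capital of France?"},
+]
+
+# Render the conversation as a single prompt string using the model's own
+# chat template; add_generation_prompt appends the assistant header so the
+# model knows it should reply next
+prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+print(prompt)
+```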
 
 ### Common Template Formats
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
index e92885a86..95d7f5e00 100644
--- a/chapters/en/chapter11/3.mdx
+++ b/chapters/en/chapter11/3.mdx
@@ -129,7 +129,7 @@ trainer = SFTTrainer(
     args=training_args,
     train_dataset=dataset["train"],
     eval_dataset=dataset["test"],
-    tokenizer=tokenizer,
+    processing_class=tokenizer,
 )
 
 # Start training
@@ -142,7 +142,33 @@ When using a dataset with a "messages" field (like the example above), the SFTTr
 
 ## Packing the Dataset
 
-The SFTTrainer supports example packing to optimize training efficiency through the `ConstantLengthDataset` utility class. This feature allows multiple short examples to be packed into the same input sequence, maximizing GPU utilization during training. To enable packing, simply set `packing=True` in the SFTConfig constructor. When using packed datasets with `max_steps`, be aware that you may train for more epochs than expected depending on your packing configuration. You can customize how examples are combined using a formatting function - particularly useful when working with datasets that have multiple fields like question-answer pairs. For evaluation datasets, you can disable packing by setting `eval_packing=False` in the SFTConfig. Here's a basic example:
+The SFTTrainer supports example packing to optimize training efficiency. This feature allows multiple short examples to be packed into the same input sequence, maximizing GPU utilization during training. To enable packing, simply set `packing=True` in the SFTConfig constructor. When using packed datasets with `max_steps`, be aware that you may train for more epochs than expected, depending on your packing configuration. You can customize how examples are combined by passing a formatting function, which is particularly useful for datasets with multiple fields, such as question-answer pairs. For evaluation datasets, you can disable packing by setting `eval_packing=False` in the SFTConfig. Here's a basic example of customizing the packing configuration:
+
+```python
+# Configure packing
+training_args = SFTConfig(packing=True)
+
+trainer = SFTTrainer(model=model, train_dataset=dataset, args=training_args)
+
+trainer.train()
+```
+
+When packing a dataset with multiple fields, you can define a custom formatting function that combines those fields into a single input sequence. The function receives a single example and returns the formatted text as a string. Here's an example of a custom formatting function:
+
+```python
+def formatting_func(example):
+    text = f"### Question: {example['question']}\n ### Answer: {example['answer']}"
+    return text
+
+
+training_args = SFTConfig(packing=True)
+trainer = SFTTrainer(
+    "facebook/opt-350m",
+    train_dataset=dataset,
+    args=training_args,
+    formatting_func=formatting_func,
+)
+```
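+
+If you also pass an evaluation dataset, you can pack the training set while leaving evaluation examples unpacked via the `eval_packing` option mentioned above. A minimal sketch:
+
+```python
+from trl import SFTConfig
+
+# Pack training examples, but evaluate on unpacked examples
+training_args = SFTConfig(packing=True, eval_packing=False)
+```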
 
 ## Monitoring Training Progress
@@ -346,5 +372,5 @@ You've learned how to fine-tune models using SFT! To continue your learning:
 
 ## Additional Resources
 
 - [TRL Documentation](https://huggingface.co/docs/trl)
-- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
+- [SFT Example Script in TRL](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py)
 - [Fine-tuning Best Practices](https://huggingface.co/docs/transformers/training)
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
index ec2ebbf11..6d16d6d01 100644
--- a/chapters/en/chapter11/4.mdx
+++ b/chapters/en/chapter11/4.mdx
@@ -43,7 +43,6 @@ model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
 lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")
 ```
-
 ![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)
 
 ## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA
@@ -108,8 +107,7 @@ trainer = SFTTrainer(
     args=args,
     train_dataset=dataset["train"],
     peft_config=lora_config, # LoRA configuration
-    max_seq_length=max_seq_length, # Maximum sequence length
-    tokenizer=tokenizer,
+    processing_class=tokenizer,
 )
 
@@ -168,6 +166,6 @@ tokenizer.save_pretrained("path/to/save/merged_model")
 
 # Resources
 
-- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685)
+- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/pdf/2106.09685)
 - [PEFT Documentation](https://huggingface.co/docs/peft)
 - [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft)
\ No newline at end of file