import { ArticleLayout } from '@/components/ArticleLayout'
import { Button } from '@/components/Button'
import Image from 'next/image'

import fineTuneLlama from '@/images/fine-tune-llama.webp'
import lightningAISuccessfulRun from '@/images/lightning-ai-successful-run.webp'
import lightningAIGPUPicker from '@/images/lightning-ai-gpu-picker.webp'
import lightningAINewStudio from '@/images/lightning-ai-new-studio.webp'
import wandbMetrics from '@/images/wandb-metrics.webp'

import { createMetadata } from '@/utils/createMetadata'

export const metadata = createMetadata({
  author: "Zachary Proser",
  date: "2024-09-22",
  title: "How to Fine-tune Llama 3.1 on Lightning.ai with Torchtune",
  description: "One of the better Jupyter Notebooks to GPU-backed environment experiences I've had...",
  image: fineTuneLlama,
  slug: '/blog/how-to-fine-tune-llama-3-1-on-lightning-ai-with-torchtune'
});

export default (props) => <ArticleLayout metadata={metadata} {...props} />

<Image src={fineTuneLlama} alt="Fine-tune Llama 3.1 on Lightning.ai with Torchtune" />
<figcaption>Fine-tuning Llama 3.1 on Lightning.ai with Torchtune is a breeze...</figcaption>

## Table of contents

## Introduction

In this tutorial, I'll walk you through fine-tuning the Llama 3.1-8B-Instruct model using Torchtune on the Lightning.ai platform.

In a [recent review I wrote of cloud GPU services for Jupyter Notebooks](https://zackproser.com/blog/cloud-gpu-services-jupyter-notebook-reviewed), Lightning.ai stood out as one of the best experiences available for running a Jupyter Notebook in a native IDE and easily switching to GPU environments when ready to run real workloads.

If you'd like to learn more about fine-tuning and how LoRA (Low-Rank Adaptation) works, check out: [What is LoRA and QLoRA?](https://zackproser.com/blog/what-is-lora-and-qlora)

## 1. Log into Lightning.ai and set up environment

1. [Log into Lightning.ai](https://lightning.ai/). If you haven't used Lightning.ai before, you'll need to create an account, and you may need to wait a day or so for it to be approved.
2. Import the [companion Jupyter Notebook for this project](https://github.com/zackproser/llama-3-1-8b-finetune-lightning-ai-torchtune/blob/main/llama-3-1-8b-finetune-lightning-ai-torchtune.ipynb) by clicking the New file button in the top-left corner and selecting "Upload Notebook".
3. Initially, select a free CPU-backed Studio to set up and test the environment.

<Image src={lightningAINewStudio} alt="Lightning.ai new studio" />
<figcaption>Lightning.ai's UX really shines - a lot of care and thought has been put into making it easy to work with</figcaption>

This allows us to set up our environment and ensure everything is working correctly before we move to GPU resources for the actual fine-tuning process.

## 2. Install dependencies

Now that we have our Notebook set up in Lightning.ai Studio, let's install the necessary libraries:

```python
!pip install -U torchtune==0.2.1 torchao wandb peft sentencepiece transformers
```

## 3. Log into Weights & Biases

Next, we'll log in to Weights & Biases for experiment tracking.

```python
import wandb
wandb.login()
```

Note that when you run this cell, you'll get output that links you to your Weights & Biases token page.

Go to that URL, copy your token, and paste it into the dialog box in the Notebook cell output, then press Enter to continue.

If all worked well, you should see output like this:

`True`

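If you'd rather not paste the token interactively (for example, when re-running the Notebook later), one option is to store the token in an environment variable and pass it to `wandb.login`. This is a minimal sketch, assuming you've saved your token in a `WANDB_API_KEY` environment variable:

```python
import os
import wandb

# Assumes WANDB_API_KEY already holds the token from your Weights & Biases settings page
wandb.login(key=os.environ["WANDB_API_KEY"])
```
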
There's another piece you need to configure in step #5, under the `metric_logger` section, to tell Torchtune to send logs and metrics to your Weights & Biases project. Be sure to replace `write_like_me` with the name of your project.

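For reference, the relevant block of the configuration we'll build in step 5 looks like this:

```yaml
# Send Torchtune's logs and metrics to Weights & Biases
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: write_like_me  # replace with your own W&B project name
```
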
Once you've fully configured your Weights & Biases project, you'll be able to log into your W&B dashboard and see your metrics during fine-tuning:

<Image src={wandbMetrics} alt="Weights & Biases metrics" />
<figcaption>Metrics and logs are sent to your Weights & Biases project, so you can monitor even long-running jobs closely</figcaption>

## 4. Download the Base Model

We'll use the `tune download` command to fetch the Meta-Llama-3.1-8B-Instruct model from the Hugging Face Model Hub:

```python
!tune download meta-llama/Meta-Llama-3.1-8B-Instruct --ignore-patterns=null
```

This command downloads the model and its weights, storing them in `/tmp/Meta-Llama-3.1-8B-Instruct` by default.

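If you want to confirm the files landed where you expect, a quick listing from the Notebook is enough (adjust the path if you downloaded the model somewhere else):

```python
# List the downloaded model files and weights
!ls -lh /tmp/Meta-Llama-3.1-8B-Instruct
```
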
## 5. Create and Modify Torchtune Configuration File

[Torchtune](https://github.com/pytorch/torchtune), a native PyTorch library, provides pre-configured recipes for various steps in the model lifecycle.

You can list the available recipes with `tune ls` and copy one with `tune cp`:

```
tune ls
RECIPE                          CONFIG
full_finetune_single_device     llama2/7B_full_low_memory
                                mistral/7B_full_low_memory
full_finetune_distributed       llama2/7B_full
                                llama2/13B_full
                                mistral/7B_full
lora_finetune_single_device     llama2/7B_lora_single_device
                                llama2/7B_qlora_single_device
                                mistral/7B_lora_single_device
...
```

You can then run `tune cp` to copy the recipe to your local directory as YAML, at which point you can edit the parameters to customize the recipe to your needs.

```bash
❯ tune cp llama3_1/8B_qlora_single_device my_conf
Copied file to my_conf.yaml
```

The following configuration file shows some of the key modifications I made:

```yaml
# Config for single device QLoRA with lora_finetune_single_device.py
# using a Llama3.1 8B Instruct model

model:
  _component_: torchtune.models.llama3_1.qlora_llama3_1_8b
  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16

tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: zackproser/writing_corpus
  data_files: 'training_data.jsonl'
  template: torchtune.data.AlpacaInstructTemplate
  train_on_input: False
  split: train

seed: 42  # Set a fixed seed for reproducibility
shuffle: True
batch_size: 1
max_seq_length: 2048  # Reduced to prevent OOM issues

# ... (rest of the configuration)

# Use Weights & Biases for logging
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: write_like_me
```

The `dataset` section is where you can modify the dataset you use to fine-tune your model. You can use your own Hugging Face dataset by passing it in the `source` key, like so:

```yaml
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: zackproser/writing_corpus
  data_files: 'training_data.jsonl'
  template: torchtune.data.AlpacaInstructTemplate
  train_on_input: False
  split: train
seed: 42
```

You can also use a local dataset by passing in a filepath to a JSONL file, like so:

```yaml
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: /path/to/your/local/dataset.jsonl
  template: torchtune.data.AlpacaInstructTemplate
  train_on_input: False
  split: train
seed: 42
```

See my article on [How to create a custom dataset for fine-tuning Llama 3.1](/blog/how-to-create-a-custom-alpaca-dataset) for more details on creating a custom dataset.

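For a quick sense of the format, the Alpaca-style instruct template expects each record to carry `instruction`, `input`, and `output` fields. Here's a minimal, hypothetical sketch of writing a couple of records to `training_data.jsonl`; the example text is made up, and your real corpus would contain your own writing:

```python
import json

# Hypothetical records, purely to illustrate the Alpaca-style fields
records = [
    {
        "instruction": "Write an introduction for a blog post about fine-tuning Llama 3.1.",
        "input": "",
        "output": "Fine-tuning lets you adapt a general-purpose model to your own voice...",
    },
    {
        "instruction": "Explain why QLoRA is useful for fine-tuning on a single GPU.",
        "input": "",
        "output": "QLoRA quantizes the frozen base model and trains small adapter weights...",
    },
]

# JSONL means one JSON object per line
with open("training_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```
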
## 6. Run a Fine-tuning Sanity Check Against the CPU

Before we start the actual fine-tuning process, let's run a small test to ensure everything is set up correctly and logging to Weights & Biases:

```python
!tune run lora_finetune_single_device --config llama_wandb_qlora.yaml --max-steps 10
```

Running this smoke test on the CPU also surfaces configuration issues, such as missing files or wrong paths, that otherwise wouldn't appear until training is underway.

If everything looks good, we're ready to switch to a GPU-backed environment for the real fine-tuning run.

## 7. Switch to a GPU-backed Environment

Now that we've confirmed our setup is working, let's switch to a GPU-backed environment in Lightning.ai Studio for the actual fine-tuning process:

<Image src={lightningAIGPUPicker} alt="Lightning.ai GPU picker" />
<figcaption>Lightning.ai makes it dead-simple to switch to a GPU-backed environment when you're ready to start training</figcaption>

In the top-right corner of the Studio, click on the icon that says "4 CPU", which opens the GPU picker, allowing you to view the different GPU options available.

Once you've selected a GPU-backed environment, reopen your Notebook and proceed with the full fine-tuning run.

## 8. Run Your Fine-tuning Job

With our GPU-backed environment ready, we can start the full fine-tuning process:

```python
!tune run lora_finetune_single_device --config llama_wandb_qlora.yaml
```

This command initiates the fine-tuning process using our defined configuration. The process will log metrics to Weights & Biases, allowing us to monitor the training in real time.

<Image src={lightningAISuccessfulRun} alt="Lightning.ai successful run" />
<figcaption>A recent successful fine-tuning run on Lightning.ai</figcaption>

## 9. Publish Your Fine-tuned Model

Upload your fine-tuned model to the Hugging Face Hub for easy access and sharing.

Prior to doing so, you'll need to export your Hugging Face token to your environment:

```python
import os
os.environ['HF_TOKEN'] = '<your-hf-token>'
```

In the following example command, replace `<your-username>` with your actual username on the Hugging Face platform, and replace `<your-model-name>` with a name of your choosing for your fine-tuned model. If you don't already have a model of that name, it will be created for you automatically when you run the command.

The filepath to your fine-tuned model is in your Torchtune configuration file, under the `output_dir` key.

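If you'd rather not hunt through the YAML by hand, here's a small sketch that reads the config and prints that path (assuming PyYAML is available in the Studio environment):

```python
import yaml

# Read the fine-tuning config and print where the fine-tuned model was written
with open("llama_wandb_qlora.yaml") as f:
    config = yaml.safe_load(f)

print(config["output_dir"])
```
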
```python
!huggingface-cli upload <your-username>/<your-model-name> <filepath-on-system-where-you-output-your-finetuned-model>
```

## 10. Verify your Fine-tuned Model

After fine-tuning, it's crucial to verify that our model is working as expected.

We can do this by loading the model and generating some text.

As an added bonus, our freshly published model will be downloaded from the Hugging Face Model Hub, in the same way that others will use it in the future, so this is also a good smoke test that publishing the model succeeded.

Be sure to replace the `model_name` and `new_model` parameters with the actual names you used when publishing your model.

You will probably want to update your `prompt` as well prior to testing.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch

# The base model we fine-tuned, and the adapter repo we just published
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
new_model = "zackproser/Meta-Llama-3.1-8B-instruct-zp-writing-ft-qlora"

# Load the base model, apply the published adapter weights, then merge them into a single model
base_model = AutoModelForCausalLM.from_pretrained(model_name)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Build a text-generation pipeline on the first GPU and generate from a test prompt
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=2000, device=0)
prompt = "Write me an article about getting faster as a developer"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result)
```

This code loads our fine-tuned model, creates a text generation pipeline, and generates an article based on a prompt.