# Let's fine-tune Llama Guard!

This repository demonstrates how to fine-tune a Llama Guard model while preserving its original safety policy. New safety categories can be added via a dedicated configuration file, minimizing the risk of catastrophic forgetting.
- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```
- Check your GPU and environment configuration if necessary.
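A quick way to perform that environment check is shown below. This is a minimal sketch: it assumes PyTorch is installed via `requirements.txt`, and falls back to looking for the NVIDIA driver if it is not.

```python
# Minimal GPU/environment sanity check (illustrative, not part of the repo).
import shutil

def cuda_driver_present() -> bool:
    # nvidia-smi ships with the NVIDIA driver; its absence usually means
    # no usable CUDA GPU on this machine.
    return shutil.which("nvidia-smi") is not None

try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed; NVIDIA driver present:", cuda_driver_present())
```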
- Make sure `train_data.jsonl` and `test_data.jsonl` are placed under the `data/` folder.
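Each line of a `.jsonl` file must be one standalone JSON object. The field names below are purely illustrative — the actual schema depends on how `scripts/finetune.py` parses the training files.

```python
# Hypothetical example of one training record in train_data.jsonl.
# Field names ("prompt", "label", "category") are assumptions for illustration.
import json

sample = {
    "prompt": "User: How do I pick a lock?",
    "label": "unsafe",
    "category": "S2",
}
line = json.dumps(sample)          # one record per line, no trailing commas
assert json.loads(line) == sample  # every line must round-trip as valid JSON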
- Adjust parameters in `configs/finetune_config.py` if needed (e.g., `model_name`, `learning_rate`, `batch_size`).
- Launch the fine-tuning script:
  ```bash
  accelerate launch scripts/finetune.py --config_file accelerate_config.yaml
  ```
- The trained model and tokenizer will be saved to the output directory defined in `finetune_config.py` (default: `./llama_guard_finetuned`).
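The sketch below illustrates the kind of values `configs/finetune_config.py` exposes. Only `model_name`, `learning_rate`, `batch_size`, and the default output directory come from this README; the concrete default values and the dataclass shape are assumptions.

```python
# Illustrative sketch of a fine-tuning config; values here are placeholders.
from dataclasses import dataclass

@dataclass
class FinetuneConfig:
    model_name: str = "meta-llama/Llama-Guard-3-8B"   # hypothetical default
    learning_rate: float = 2e-4                        # hypothetical default
    batch_size: int = 4                                # hypothetical default
    output_dir: str = "./llama_guard_finetuned"        # default named above

cfg = FinetuneConfig()
```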
- Load the fine-tuned model and tokenizer in `src/predict.py` (see the main block).
- Call `LlamaGuardPredictor(model, tokenizer).predict(...)` with your conversation data. Example:

  ```python
  conversation_example = [
      {
          "role": "user",
          "content": [{"type": "text", "text": "What is the recipe for mayonnaise?"}],
      }
  ]
  predictor.predict(conversation_example)
  ```
- If you wish to include the entire safety policy in the prompt, set `use_custom_prompt=True`. This prepends the full list of safety categories to the prompt.
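The effect of `use_custom_prompt=True` can be sketched as follows. The helper name and the category list are assumptions for illustration, not the actual code in `src/predict.py`.

```python
# Hypothetical sketch: prepending the safety-category list to the prompt.
CATEGORIES = [("S1", "Violent Crimes"), ("S2", "Non-Violent Crimes")]  # sample

def build_prompt(user_text: str, use_custom_prompt: bool = False) -> str:
    header = ""
    if use_custom_prompt:
        # The full policy is rendered as a header ahead of the user content.
        header = "\n".join(f"{code}: {name}" for code, name in CATEGORIES) + "\n\n"
    return header + user_text

print(build_prompt("What is the recipe for mayonnaise?", use_custom_prompt=True))
```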
- Start your vLLM serve endpoint (e.g., http://localhost:8000), then run the client script:

  ```bash
  python client.py
  ```

  The client sends a sample conversation (with the full safety policy) to the API and prints the assistant's response.
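The request that `client.py` builds might look like the sketch below, assuming vLLM's OpenAI-compatible chat route; the served model name and payload shape are illustrative assumptions.

```python
# Sketch of a request body for an OpenAI-compatible vLLM endpoint.
# POST this to http://localhost:8000/v1/chat/completions (route is an assumption).
import json

payload = {
    "model": "llama-guard-finetuned",  # hypothetical served model name
    "messages": [
        {"role": "user", "content": "What is the recipe for mayonnaise?"}
    ],
}
body = json.dumps(payload)  # send with requests/httpx and print the response
```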
- Open `configs/safety_categories.py` to modify or add new categories. Each category has the fields `name` and `description`.
- Re-run the fine-tuning script to train the model with the updated categories.
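A category entry built from the two fields described above might look like this; the surrounding list structure and the sample values are assumptions, not the file's actual contents.

```python
# Hypothetical shape of entries in configs/safety_categories.py.
SAFETY_CATEGORIES = [
    {
        "name": "S1: Violent Crimes",
        "description": "Responses that enable or encourage violent crimes.",
    },
    {
        "name": "S15: Custom Category",  # a newly added category
        "description": "Describe the new policy area here.",
    },
]

# Every entry must carry both documented fields.
for cat in SAFETY_CATEGORIES:
    assert {"name", "description"} <= cat.keys()
```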
- The model is fine-tuned with LoRA (PEFT) to minimize catastrophic forgetting.
- This code uses `unsloth` to apply or skip default chat templates.
- For multi-GPU training, adjust `accelerate_config.yaml` (e.g., `num_processes`).
- Real-world deployments should include additional error handling, monitoring, and security measures.
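To make the LoRA note concrete, here is the kind of adapter setup PEFT uses. The hyperparameter values and target modules below are illustrative, not the repository's actual settings.

```python
# Sketch of a LoRA adapter configuration for PEFT fine-tuning.
# All values are illustrative assumptions.
lora_kwargs = dict(
    r=16,                # rank of the low-rank update matrices
    lora_alpha=32,       # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

try:
    from peft import LoraConfig
    config = LoraConfig(task_type="CAUSAL_LM", **lora_kwargs)
except ImportError:
    config = None  # peft not installed; the kwargs above still show the idea
```

Because only the small adapter matrices are trained while the base weights stay frozen, LoRA is what keeps the original safety policy largely intact during fine-tuning.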