# Let's fine-tune Llama Guard!

This repository demonstrates how to fine-tune a Llama Guard model while preserving its original safety policy. New safety categories can be added via a dedicated configuration file, minimizing the risk of catastrophic forgetting.
- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```
- Check your GPU and environment configuration if necessary.
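A quick way to perform that environment check is shown below. This is a minimal sketch: it assumes PyTorch is installed via `requirements.txt`, and falls back to looking for the NVIDIA driver if it is not.

```python
# Minimal GPU/environment sanity check (illustrative, not part of the repo).
import shutil

def cuda_driver_present() -> bool:
    # nvidia-smi ships with the NVIDIA driver; its absence usually means
    # no usable CUDA GPU on this machine.
    return shutil.which("nvidia-smi") is not None

try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed; NVIDIA driver present:", cuda_driver_present())
```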
- Make sure `train_data.jsonl` and `test_data.jsonl` are placed under the `data/` folder.
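Each line of a `.jsonl` file must be one standalone JSON object. The field names below are purely illustrative — the actual schema depends on how `scripts/finetune.py` parses the training files.

```python
# Hypothetical example of one training record in train_data.jsonl.
# Field names ("prompt", "label", "category") are assumptions for illustration.
import json

sample = {
    "prompt": "User: How do I pick a lock?",
    "label": "unsafe",
    "category": "S2",
}
line = json.dumps(sample)          # one record per line, no trailing commas
assert json.loads(line) == sample  # every line must round-trip as valid JSON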
- Adjust parameters in `configs/finetune_config.py` if needed (e.g., `model_name`, `learning_rate`, `batch_size`).
- Launch the fine-tuning script:
  ```bash
  accelerate launch scripts/finetune.py --config_file accelerate_config.yaml
  ```
- The trained model and tokenizer will be saved to the output directory defined in `finetune_config.py` (default: `./llama_guard_finetuned`).
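The sketch below illustrates the kind of values `configs/finetune_config.py` exposes. Only `model_name`, `learning_rate`, `batch_size`, and the default output directory come from this README; the concrete default values and the dataclass shape are assumptions.

```python
# Illustrative sketch of a fine-tuning config; values here are placeholders.
from dataclasses import dataclass

@dataclass
class FinetuneConfig:
    model_name: str = "meta-llama/Llama-Guard-3-8B"   # hypothetical default
    learning_rate: float = 2e-4                        # hypothetical default
    batch_size: int = 4                                # hypothetical default
    output_dir: str = "./llama_guard_finetuned"        # default named above

cfg = FinetuneConfig()
```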
- Load the fine-tuned model and tokenizer in `src/predict.py` (see the main block).
- Call `LlamaGuardPredictor(model, tokenizer).predict(...)` with your conversation data. Example:

  ```python
  conversation_example = [
      {
          "role": "user",
          "content": [{"type": "text", "text": "What is the recipe for mayonnaise?"}],
      }
  ]
  predictor.predict(conversation_example)
  ```
- If you wish to include the entire safety policy in the prompt, set `use_custom_prompt=True`. This prepends the full list of safety categories to the prompt.
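The effect of `use_custom_prompt=True` can be sketched as follows. The helper name and the category list are assumptions for illustration, not the actual code in `src/predict.py`.

```python
# Hypothetical sketch: prepending the safety-category list to the prompt.
CATEGORIES = [("S1", "Violent Crimes"), ("S2", "Non-Violent Crimes")]  # sample

def build_prompt(user_text: str, use_custom_prompt: bool = False) -> str:
    header = ""
    if use_custom_prompt:
        # The full policy is rendered as a header ahead of the user content.
        header = "\n".join(f"{code}: {name}" for code, name in CATEGORIES) + "\n\n"
    return header + user_text

print(build_prompt("What is the recipe for mayonnaise?", use_custom_prompt=True))
```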
- Start your vLLM serve endpoint (e.g., http://localhost:8000), then run the client script:

  ```bash
  python client.py
  ```

  The client sends a sample conversation (with the full safety policy) to the API and prints the assistant's response.
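The request that `client.py` builds might look like the sketch below, assuming vLLM's OpenAI-compatible chat route; the served model name and payload shape are illustrative assumptions.

```python
# Sketch of a request body for an OpenAI-compatible vLLM endpoint.
# POST this to http://localhost:8000/v1/chat/completions (route is an assumption).
import json

payload = {
    "model": "llama-guard-finetuned",  # hypothetical served model name
    "messages": [
        {"role": "user", "content": "What is the recipe for mayonnaise?"}
    ],
}
body = json.dumps(payload)  # send with requests/httpx and print the response
```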
- Open `configs/safety_categories.py` to modify or add new categories. Each category has the fields `name` and `description`.
- Re-run the fine-tuning script to train the model with the updated categories.
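A category entry built from the two fields described above might look like this; the surrounding list structure and the sample values are assumptions, not the file's actual contents.

```python
# Hypothetical shape of entries in configs/safety_categories.py.
SAFETY_CATEGORIES = [
    {
        "name": "S1: Violent Crimes",
        "description": "Responses that enable or encourage violent crimes.",
    },
    {
        "name": "S15: Custom Category",  # a newly added category
        "description": "Describe the new policy area here.",
    },
]

# Every entry must carry both documented fields.
for cat in SAFETY_CATEGORIES:
    assert {"name", "description"} <= cat.keys()
```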
- The model is fine-tuned with LoRA (PEFT) to minimize catastrophic forgetting.
- This code uses `unsloth` to apply or skip default chat templates.
- For multi-GPU training, adjust `accelerate_config.yaml` (e.g., `num_processes`).
- Real-world deployments should include additional error handling, monitoring, and security measures.
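To make the LoRA note concrete, here is the kind of adapter setup PEFT uses. The hyperparameter values and target modules below are illustrative, not the repository's actual settings.

```python
# Sketch of a LoRA adapter configuration for PEFT fine-tuning.
# All values are illustrative assumptions.
lora_kwargs = dict(
    r=16,                # rank of the low-rank update matrices
    lora_alpha=32,       # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

try:
    from peft import LoraConfig
    config = LoraConfig(task_type="CAUSAL_LM", **lora_kwargs)
except ImportError:
    config = None  # peft not installed; the kwargs above still show the idea
```

Because only the small adapter matrices are trained while the base weights stay frozen, LoRA is what keeps the original safety policy largely intact during fine-tuning.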