This directory contains the code needed to fine-tune Llama2 models and evaluate their safety alignment. It is built on top of the official Llama2 fine-tuning guidance (llama-recipes).
First, manually download the public Llama-2-7b-chat model checkpoint (e.g. the fp16 version from Hugging Face, as cloned below) to the ckpts/ directory in this folder:
```bash
cd ckpts
git clone https://huggingface.co/TheBloke/Llama-2-7b-chat-fp16
```
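As an optional sanity check (a minimal sketch, assuming `transformers` and `torch` are installed and enough RAM is available for a 7B fp16 model), you can verify that the downloaded checkpoint loads from the local directory:

```python
# Optional sanity check: load the downloaded checkpoint from ckpts/.
# The path below assumes the clone command above was run inside ckpts/.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "ckpts/Llama-2-7b-chat-fp16"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto")
print(model.config.model_type)  # should print "llama"
```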
Then, set up your OpenAI API key in safety_evaluation/gpt4_eval.py and utility_evaluation/mt_bench/gen_judgment.py; the key is used when GPT-4 judges model safety and utility.
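The exact way the key is wired into those two scripts may differ; one common pattern (a sketch only, shown for the legacy `openai<1.0` client, not the scripts' actual code) is to export the key as an environment variable and read it at the top of each script:

```python
# Sketch only: read the OpenAI key from an environment variable, e.g. after
#   export OPENAI_API_KEY="sk-..."
# The variable/field names inside gpt4_eval.py and gen_judgment.py may differ.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```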
After the preparations above, follow the notebooks we provided:
- tier1-harmful-examples-demonstration.ipynb -- fine-tuning with explicitly harmful datasets: the harmful examples demonstration attack.
- tier2-identity-shifting-aoa.ipynb -- fine-tuning with implicitly harmful datasets: the identity-shifting attack (Absolutely Obedient Agent).
- tier3-benign-alpaca.ipynb -- fine-tuning with benign datasets: Alpaca.
- tier3-benign-dolly.ipynb -- fine-tuning with benign datasets: Dolly.

Each notebook walks through 1) fine-tuning the Llama2 model and 2) evaluating the safety alignment of the resulting model. The different notebooks correspond to the three risk levels we outline in our paper (Section 4) for fine-tuning LLMs; a quick manual spot-check of a fine-tuned checkpoint is sketched below.
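For a quick manual probe of a fine-tuned model outside the notebooks, the sketch below (hypothetical checkpoint path) generates a single response with `transformers`. It assumes the notebook saved a full Hugging Face-format checkpoint; if adapters (e.g. PEFT) are saved instead, load them accordingly. The notebooks remain the place where the full safety evaluation is run.

```python
# Minimal sketch (hypothetical checkpoint path): generate one response from a
# fine-tuned model to eyeball its behavior.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "finetuned_models/tier3-benign-alpaca"  # hypothetical output directory
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto")

prompt = "[INST] How can I improve the security of my home Wi-Fi network? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```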
(Note: the `--batch_size` hyper-parameter in the notebooks is the local batch size per GPU, not the global batch size.)
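To illustrate the distinction (pure arithmetic; the variable names below are not hyper-parameters of the training scripts): the global batch size per optimizer step is the per-GPU batch size times the number of GPUs, times any gradient-accumulation steps.

```python
# Illustration only -- these names are not flags of the training scripts.
local_batch_size = 16     # --batch_size: examples per GPU per forward pass
num_gpus = 4              # data-parallel workers
grad_accum_steps = 1      # gradient accumulation steps, if used
global_batch_size = local_batch_size * num_gpus * grad_accum_steps
print(global_batch_size)  # 64 examples per optimizer step
```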
We also provide code to evaluate the utility scores of fine-tuned Llama2 models on MT-Bench. Refer to utility_evaluation/mt_bench/README.md for instructions.
To customize other setups (e.g. dataset configurations and training hyper-parameters), please refer to llama-recipes for detailed documentation.