Fine-tuning LLMs on supercomputers

Example scripts showing how to fine-tune LLMs on CSC's supercomputers.

The script finetuning.py fine-tunes a given Hugging Face model on the IMDb movie reviews dataset. By default it uses EleutherAI/gpt-neo-1.3B, which fits comfortably into the GPU memory of a single V100. You can select another model with the --model argument.
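
To give an idea of what happens inside, here is a rough, hypothetical sketch of IMDb fine-tuning with the Hugging Face Trainer API; the training settings are only examples, and the actual finetuning.py may differ in its details:

# Hypothetical sketch of causal-LM fine-tuning on IMDb, not the actual finetuning.py
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the IMDb movie reviews for causal language modelling
dataset = load_dataset("imdb", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoints",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()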

The launch scripts are:

  • run-finetuning-puhti-gpu1.sh - fine-tuning on Puhti with 1 GPU
  • run-finetuning-puhti-gpu4.sh - fine-tuning on Puhti with one full node (4 GPUs)
  • run-finetuning-puhti-gpu8.sh - fine-tuning on Puhti with two full nodes (8 GPUs in total)
  • run-finetuning-puhti-gpu4-accelerate.sh - fine-tuning on Puhti with one full node using Accelerate
  • run-finetuning-puhti-gpu8-accelerate.sh - fine-tuning on Puhti with two full nodes using Accelerate

There are also similarly named versions for Mahti: just replace puhti with mahti in the file names. The scripts for LUMI are named slightly differently due to the larger number of GPUs (or rather GCDs, since each dual-chip card appears as two devices).

  • run-finetuning-lumi-gpu1.sh - fine-tuning on LUMI with 1 GPU
  • run-finetuning-lumi-gpu8.sh - fine-tuning on LUMI with one full node (8 GCDs)
  • run-finetuning-lumi-gpu16.sh - fine-tuning on LUMI with two full nodes (16 GCDs in total)
  • run-finetuning-lumi-gpu8-accelerate.sh - fine-tuning on LUMI with one full node using Accelerate
  • run-finetuning-lumi-gpu16-accelerate.sh - fine-tuning on LUMI with two full nodes using Accelerate

Note: the scripts are for the most part set up to run in the gputest or dev-g partition with a 15-minute time limit. For real runs you naturally need to switch to the proper partition for longer jobs, and change the --account parameter to your own project code.

You can use PEFT (Parameter-Efficient Fine-Tuning), which trains only a small number of adapter parameters instead of the full model, greatly reducing the GPU memory needed for training. PEFT can be enabled with the --peft argument.
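
Under the hood this corresponds roughly to wrapping the model in a LoRA adapter with the peft library; the sketch below is illustrative, and the LoRA settings (r, lora_alpha, lora_dropout) are example values rather than the ones used by finetuning.py:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# LoRA adds small trainable adapter matrices; the base model weights stay frozen
lora_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the parameters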

The Accelerate library supports more advanced modes of distributed training such as FSDP (Fully Sharded Data Parallel), which makes it possible to train models that are too large to fit in a single GPU's memory.
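
With accelerate launch and an FSDP config file the training code itself stays much the same; the Accelerator reads the sharding settings from the config. A minimal, illustrative sketch (not the repository's actual code):

import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

# When started via accelerate launch with accelerate_config_fsdp.yaml,
# Accelerator() picks up the FSDP settings from that config file.
accelerator = Accelerator()

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# prepare() shards the model across the GPUs and moves it to the right devices
model, optimizer = accelerator.prepare(model, optimizer)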

Finally, by installing the bitsandbytes library, you can also try 4-bit quantization with the --4bit argument, decreasing the memory requirements even further.
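
Loading a model in 4-bit precision is done through a quantization config from transformers and bitsandbytes; a sketch along these lines (the exact options used by finetuning.py may differ):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with bfloat16 compute; example settings only
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B", quantization_config=bnb_config)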

Run examples

Run on 1 GPU with specified model and using PEFT:

sbatch run-finetuning-puhti-gpu1.sh --model=EleutherAI/gpt-neo-1.3B --peft

Run on 4 GPUs (note that batch_size has to be a multiple of the number of GPUs):

sbatch run-finetuning-puhti-gpu4.sh --model=EleutherAI/gpt-neo-1.3B --b 4

Run on 8 GPUs (over two nodes) with Accelerate and FSDP (note: with the accelerate launch script we need to specify which config file to use):

sbatch run-finetuning-puhti-gpu8-accelerate.sh accelerate_config_fsdp.yaml \
       --model=microsoft/Phi-3.5-mini-instruct --b 8

Note: for a new model it might work best to make your first run with a single GPU so that the model gets downloaded to the cache. Downloading with multiple processes doesn't yet work well in the current script.
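
Alternatively, you can pre-populate the cache before submitting a multi-GPU job, for example with the huggingface_hub Python API; this is just a sketch, and the model name is only an example:

from huggingface_hub import snapshot_download

# Run with the same HF_HOME as in the batch scripts, e.g.
#   export HF_HOME=/scratch/YOUR_PROJECT/${USER}/hf-cache
# so that the files end up in the shared cache used by the jobs.
path = snapshot_download("microsoft/Phi-3.5-mini-instruct")
print("Model files cached at", path)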

Fine-tune Llama-3.1-8B on Mahti with just 2 GPUs using Accelerate, FSDP and PEFT:

sbatch run-finetuning-mahti-gpu2-accelerate.sh accelerate_config_fsdp.yaml \
       --model=meta-llama/Meta-Llama-3.1-8B --b 4 --peft

Note that the Meta-Llama-3.1-8B model is a "gated model" on Hugging Face: it requires that you log in and request access to the model. Once you have received access, you can generate an access token on Hugging Face. Just click "Create new token" and select Token type: "Read".

On the supercomputer you can then install the access token like this:

export HF_HOME=/scratch/YOUR_PROJECT/${USER}/hf-cache
module load pytorch/2.4
huggingface-cli login

In the above commands, replace YOUR_PROJECT with the project you use for your runs; the important thing is to use the same Hugging Face cache path as in the scripts. The huggingface-cli login command will ask for the access token you created. Just answer n to the question about git credentials (unless you know you use that feature).
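
If you prefer, the token can also be set programmatically with the huggingface_hub library instead of the CLI; a minimal sketch (the token string is a placeholder):

from huggingface_hub import login

# Equivalent to running "huggingface-cli login"; paste your "Read" access token here
login(token="hf_xxx")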

Fine-tune Llama-3.1-8B with 4-bit quantization and PEFT on Puhti:

sbatch run-finetuning-puhti-gpu1.sh --model=meta-llama/Meta-Llama-3.1-8B \
       --b 4 --peft --4bit

Another example: fine-tuning the Poro model on LUMI with just 2 GPUs (thanks to PEFT/LoRA and 4-bit quantization):

sbatch run-finetuning-lumi-gpu2.sh --model=LumiOpen/Poro-34B --b 8 --4bit --peft

Inference

There's also an example of inference (generating text with the model) in inference-demo.py, with a corresponding launch script run-inference-puhti.sh.

For example, to run inference with a checkpoint of a model you have fine-tuned previously, you would run something like:

sbatch run-inference-puhti.sh --model=/PATH/TO/CHECKPOINT \
       --prompt="The movie was great because"

Naturally, you need to replace /PATH/TO/CHECKPOINT with the real path to the checkpoint you wish to use. The path where checkpoints are stored is usually printed at the end of the job, but you need to check yourself which specific checkpoint you wish to use. With the above training scripts it will usually be something like /scratch/YOUR_PROJECT/${USER}/hf-data/MODEL_NAME/checkpoint-NNN/, where NNN is the number of training steps at which the checkpoint was created.
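
Under the hood the inference boils down to loading the checkpoint and calling generate(); a rough sketch of what inference-demo.py does (the actual script may differ, and the checkpoint path is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "/PATH/TO/CHECKPOINT"  # replace with your real checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

inputs = tokenizer("The movie was great because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))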
