update chatllms
jianzhnie committed Apr 25, 2024
1 parent 0b04cb7 commit d1f66b2
Showing 51 changed files with 2,580 additions and 613 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -2,7 +2,7 @@
 ignore = E501,E701,W504,W503,E722,E251,E402
 max-line-length = 120
 show-source = False
-application-import-names = chatgpt
+application-import-names = chatllms
 exclude =
     .git
     docs
18 changes: 9 additions & 9 deletions .pre-commit-config.yaml
@@ -1,18 +1,18 @@
 repos:
-  - repo: https://github.com/PyCQA/flake8
-    rev: 3.8.3
+  - repo: https://gitee.com/openmmlab/mirrors-flake8
+    rev: 5.0.4
     hooks:
       - id: flake8
-  - repo: https://github.com/PyCQA/isort
-    rev: 5.10.1
+  - repo: https://gitee.com/openmmlab/mirrors-isort
+    rev: 5.11.5
     hooks:
       - id: isort
-  - repo: https://github.com/pre-commit/mirrors-yapf
-    rev: v0.30.0
+  - repo: https://gitee.com/openmmlab/mirrors-yapf
+    rev: v0.32.0
     hooks:
       - id: yapf
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.1.0
+  - repo: https://gitee.com/openmmlab/mirrors-pre-commit-hooks
+    rev: v4.3.0
     hooks:
       - id: trailing-whitespace
       - id: check-yaml
@@ -23,4 +23,4 @@ repos:
       - id: fix-encoding-pragma
         args: ["--remove"]
       - id: mixed-line-ending
-        args: ["--fix=lf"]
+        args: ["--fix=lf"]
40 changes: 26 additions & 14 deletions README.md
@@ -10,25 +10,24 @@
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


<div align="center">

👋🤗🤗👋 Join our [WeChat](assets/wechat.jpg).

</div>

# Efficient Finetuning of Quantized LLMs --- Low-Resource Quantized Training and Deployment of Large Language Models


<div align="center">

[中文](README_zh.md) | English

</div>

This is the repo for the `Efficient Finetuning of Quantized LLMs` project, which aims to build and share tuning methods for instruction-following Chinese `baichuan-7b/LLaMA/Pythia/GLM` models. The models can be trained on **a single Nvidia RTX-2080TI**, and the multi-round chatbot can be trained on **a single Nvidia RTX-3090** with a context length of 2048.

We use [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization, integrated with Hugging Face's [PEFT](https://github.com/huggingface/peft) and [transformers](https://github.com/huggingface/transformers/) libraries.


## News

- [23/07/20] We now support training the **LLaMA-2** models in this repo. Pass the `--model_name_or_path Llama-2-7b-hf` argument to use the LLaMA-2 model.
@@ -67,6 +66,7 @@ We uses [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantiza
As of now, we support the following datasets, most of which are available in the [Hugging Face datasets library](https://huggingface.co/datasets/).

- For supervised fine-tuning:

- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [Stanford Alpaca (Chinese)](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
@@ -88,6 +88,7 @@ As of now, we support the following datasets, most of which are all available in
- [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k)

- For reward model training:

- [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [GPT-4 Generated Data](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
@@ -109,7 +110,6 @@ We provide a number of data preprocessing tools in the [data](./chatllms/data) f
- [sft_dataset.py](./chatllms/data/sft_dataset.py) : Supervised fine-tuning dataset class and collator
- [conv_dataset.py](./chatllms/data/conv_dataset.py) : Conversation dataset class and collator


## Model Zoo

We provide a number of models in the [Hugging Face model hub](https://huggingface.co/decapoda-research). These models are trained with QLoRA and can be used for inference and finetuning. We provide the following models:
@@ -129,13 +129,17 @@ We provide a number of models in the [Hugging Face model hub](https://huggingfac
- CUDA >= 11.0

- Python 3.8+ and PyTorch 1.13.1+

- 🤗Transformers, Datasets, Accelerate, PEFT and bitsandbytes

- jieba, rouge_chinese and nltk (used at evaluation)

- gradio (used in gradio_webserver.py)
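
Before installing anything, it can help to see which of the requirements above are already present. The following is a minimal sketch using only the standard library. The import names are assumed to match the packages listed above (with `torch` standing for PyTorch), and `check_environment` is a hypothetical helper, not something shipped in this repo.

```python
import importlib.util
import sys

# Import names assumed to correspond to the requirements listed above.
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "accelerate", "peft",
    "bitsandbytes", "jieba", "rouge_chinese", "nltk", "gradio",
]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a report of which requirements appear to be satisfied."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```

The check only confirms that a module can be found; it does not verify versions such as CUDA >= 11.0 or PyTorch 1.13.1+.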

### Install required packages

To load models in 4bits with transformers and bitsandbytes, you have to install accelerate and transformers from source and make sure you have the latest version of the bitsandbytes library (0.39.0). You can achieve the above with the following commands:

```bash
pip install -q -U bitsandbytes
pip install -q -U git+https://github.com/huggingface/transformers.git
@@ -154,11 +158,11 @@ cd Efficient-Tuning-LLMs

## Getting Started

| Main function                    | Usage                                                                       | Scripts                                    |
| -------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------ |
| [train.py](./train.py)           | Full fine-tuning of LLMs on SFT datasets                                    | [full_finetune](./scripts/full_finetune)   |
| [train_lora.py](./train_lora.py) | Fine-tuning LLMs with LoRA (Low-Rank Adaptation of Large Language Models)   | [lora_finetune](./scripts/lora_finetune)   |
| [train_qlora.py](train_qlora.py) | Fine-tuning LLMs with QLoRA (QLoRA: Efficient Finetuning of Quantized LLMs) | [qlora_finetune](./scripts/qlora_finetune) |

### QLora int4 Finetune

@@ -170,6 +174,7 @@ python train_qlora.py --model_name_or_path <path_or_name>
```

For models larger than 13B, we recommend adjusting the learning rate:

```bash
python train_qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
```
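
When launching runs from a script, the learning-rate recommendation above can be encoded once instead of being remembered per run. This is a rough sketch only: `build_qlora_command` and its `model_size_b` argument are hypothetical names introduced for illustration, and only the two flags shown in the commands above are used.

```python
import shlex


def build_qlora_command(model_name_or_path, model_size_b=None):
    """Assemble the train_qlora.py command line shown above.

    model_size_b is the model size in billions of parameters (a hypothetical
    argument used only to decide whether to lower the learning rate).
    """
    cmd = ["python", "train_qlora.py"]
    if model_size_b is not None and model_size_b > 13:
        # Recommendation from the text for models larger than 13B.
        cmd += ["--learning_rate", "0.0001"]
    cmd += ["--model_name_or_path", model_name_or_path]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_qlora_command("path/to/model", model_size_b=33)))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual model paths.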
@@ -220,7 +225,9 @@ python train_qlora.py \
To find more scripts for finetuning and inference, please refer to the `scripts` folder.

## Quantization

Quantization parameters are controlled through `BitsAndBytesConfig` ([see the HF documentation](https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig)) as follows:

- Loading in 4 bits is activated through `load_in_4bit`
- The datatype used for the linear layer computations is set with `bnb_4bit_compute_dtype`
- Nested quantization is activated through `bnb_4bit_use_double_quant`
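
The options above might be combined as in the following sketch. It uses only the three parameters named in the list; the `bfloat16` compute dtype and the helper names `quantization_kwargs` and `load_4bit_model` are assumptions made for illustration, not settings taken from this repo's training scripts.

```python
def quantization_kwargs(compute_dtype="bfloat16", double_quant=True):
    """Collect the keyword arguments for the options listed above."""
    return {
        "load_in_4bit": True,                       # load weights in 4 bits
        "bnb_4bit_compute_dtype": compute_dtype,    # dtype for linear-layer compute
        "bnb_4bit_use_double_quant": double_quant,  # nested quantization
    }


def load_4bit_model(model_name_or_path, **overrides):
    """Load a causal LM in 4 bits (needs transformers, accelerate, bitsandbytes)."""
    # Imported lazily so the helper above works without the GPU stack installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(**{**quantization_kwargs(), **overrides})
    return AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        quantization_config=config,
        device_map="auto",
    )
```

Keeping the keyword arguments in a plain dictionary makes it easy to log the exact quantization settings alongside each run.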
@@ -245,30 +252,37 @@ Quantization parameters are controlled from the `BitsandbytesConfig` ([see HF do
## Tutorials and Demonstrations

We provide two Google Colab notebooks to demonstrate the use of 4bit models in inference and fine-tuning. These notebooks are intended to be a starting point for further research and development.

- [Basic usage Google Colab notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing) - This notebook shows how to use 4bit models in inference with all their variants, and how to run GPT-neo-X (a 20B parameter model) on a free Google Colab instance 🤯
- [Fine tuning Google Colab notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing) - This notebook shows how to fine-tune a 4bit model on a downstream task using the Hugging Face ecosystem. We show that it is possible to fine tune GPT-neo-X 20B on a Google Colab instance!

Other examples are found under the examples/ folder.

- Finetune LLama-7B (ex1)
- Finetune GPT-neo-X 20B (ex2)

## Using Local Datasets

You can specify the path to your dataset using the --dataset argument. If the --dataset_format argument is not set, it will default to the Alpaca format. Here are a few examples:

- Training with an alpaca format dataset:

```bash
python train_qlora.py --dataset="path/to/your/dataset"
```

- Training with a self-instruct format dataset:

```bash
python train_qlora.py --dataset="path/to/your/dataset" --dataset_format="self-instruct"
```
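
For reference, a local Alpaca-format file could be prepared as in the sketch below. The three field names follow the Stanford Alpaca data layout; whether `train_qlora.py` reads a JSON file with exactly this layout is an assumption, and `write_alpaca_dataset` and the file name are hypothetical.

```python
import json
import os
import tempfile

# Records follow the Stanford Alpaca layout: instruction, optional input, output.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]


def write_alpaca_dataset(rows, path):
    """Validate rows against the Alpaca field names and write them as JSON."""
    required = {"instruction", "input", "output"}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return path


dataset_path = write_alpaca_dataset(
    records, os.path.join(tempfile.mkdtemp(), "my_alpaca.json"))
```

The resulting path is what would then be passed to `--dataset`.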

## Multi GPU

Multi GPU training and inference work out-of-the-box with Hugging Face's Accelerate. Note that the `per_device_train_batch_size` and `per_device_eval_batch_size` arguments are global batch sizes, unlike what their names suggest.
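
As a small worked example of that note, the number of samples consumed per optimizer step does not scale with the GPU count under the behaviour described here. The sketch assumes the standard Trainer argument `gradient_accumulation_steps` is in use; the function name is hypothetical.

```python
def samples_per_optimizer_step(per_device_train_batch_size,
                               gradient_accumulation_steps=1):
    """Samples consumed per optimizer step under the behaviour described above.

    Because the batch size is treated as global here, the number of GPUs does
    not appear in the product.
    """
    return per_device_train_batch_size * gradient_accumulation_steps


print(samples_per_optimizer_step(16))    # 16
print(samples_per_optimizer_step(4, 4))  # 16
```

Both calls describe the same effective batch; the second trades per-step memory for more accumulation steps.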

When loading a model for training or inference on multiple GPUs you should pass something like the following to AutoModelForCausalLM.from_pretrained():

```python
device_map = "auto"
max_memory = {i: '46000MB' for i in range(torch.cuda.device_count())}
@@ -303,29 +317,28 @@ python gradio_webserver.py \
--lora_model_name_or_path path/to/your/model_dir
```


## Sample Outputs

We provide generations for the models described in the paper for both OA and Vicuna queries in the `eval/generations` folder. These are intended to foster further research on model evaluation and analysis.

Can you distinguish ChatGPT from Guanaco? Give it a try!
You can access [the model response Colab here](https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing) comparing ChatGPT and Guanaco 65B on Vicuna prompts.


## Known Issues and Limitations

Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

1. 4-bit inference is slow. Currently, our 4-bit inference implementation is not yet integrated with the 4-bit matrix multiplication.
2. Resuming a LoRA training run with the Trainer currently results in an error.
3. Currently, using `bnb_4bit_compute_dtype='fp16'` can lead to instabilities. For 7B LLaMA, only 80% of finetuning runs complete without error. We have solutions, but they are not yet integrated into bitsandbytes.
4. Make sure that `tokenizer.bos_token_id = 1` to avoid generation issues.
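
The last item can be guarded with a small check before generation. This is a sketch under the assumption that the tokenizer object exposes a writable `bos_token_id` attribute; `ensure_bos_token_id` is a hypothetical helper and the `SimpleNamespace` object is only a stand-in for a loaded tokenizer.

```python
from types import SimpleNamespace


def ensure_bos_token_id(tokenizer, expected=1):
    """Set tokenizer.bos_token_id to the expected value if it differs."""
    current = getattr(tokenizer, "bos_token_id", None)
    if current != expected:
        tokenizer.bos_token_id = expected
    return tokenizer


# Stand-in object for illustration; in practice pass the loaded tokenizer.
fake_tokenizer = SimpleNamespace(bos_token_id=0)
ensure_bos_token_id(fake_tokenizer)
```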


## License

`Efficient Finetuning of Quantized LLMs` is released under the Apache 2.0 license.


## Acknowledgements

We thank the Hugging Face team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.

We appreciate the work by many open-source contributors, especially:
@@ -338,7 +351,6 @@ We appreciate the work by many open-source contributors, especially:
- [Vicuna](https://github.com/lm-sys/FastChat/)
- [xTuring](https://github.com/stochasticai/xTuring)


## Citation

Please cite the repo if you use the data or code in this repo.
12 changes: 6 additions & 6 deletions README_zh.md
@@ -13,22 +13,22 @@
<div align="center">

👋🤗🤗👋 Join our [WeChat](assets/wechat.jpg).

</div>

# Efficient Finetuning of Quantized LLMs --- Low-Resource Quantized Training and Deployment of Large Language Models


<div align="center">

[English](README.md) | 中文

</div>

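This is the repository for the `Efficient Finetuning of Quantized LLMs` project, which aims to build and open-source fine-tuning methods for instruction-following Chinese `baichuan/LLaMA/Pythia/GLM` large models. The method can be trained on **a single Nvidia RTX-2080TI**, and the multi-round chatbot can be trained on **a single Nvidia RTX-3090** with a context length of 2048.

We use [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization, integrated with Hugging Face's [PEFT](https://github.com/huggingface/peft) and [transformers](https://github.com/huggingface/transformers/) libraries.

The main features of this project are:

- 📗 Supports full-parameter instruction fine-tuning, LoRA instruction fine-tuning (support to be provided later), and low-cost, efficient QLoRA instruction fine-tuning.
- 📗 Supports most mainstream open-source large models, such as baichuan, Ziya, Bloom, LLaMA, Pythia, and OPT.
@@ -88,6 +88,7 @@ QLora introduces several innovations that aim, without sacrificing performance, to reduce memory
So far, we support the following datasets, all of which can be found on [Hugging Face Datasets](https://huggingface.co/datasets). We will add more datasets in the future.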

- For supervised fine-tuning:

- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [Stanford Alpaca (Chinese)](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
@@ -103,16 +104,16 @@ QLora introduces several innovations that aim, without sacrificing performance, to reduce memory
- [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k)

- For reward model training:

- [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [GPT-4 Generated Data](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
- [GPT-4 Generated Data (Chinese)](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)


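Please refer to [data/README.md](data/README.md) to learn how to use these datasets to train your own ChatGPT. If you want to explore more datasets, see [awesome-instruction-datasets](https://github.com/jianzhnie/awesome-instruction-datasets). By default, we use the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset for training and fine-tuning.


Some datasets require Hugging Face account authentication before they can be used. We recommend logging in to your Hugging Face account with the following commands.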

```bash
pip install --upgrade huggingface_hub
huggingface-cli login
@@ -126,7 +127,6 @@ huggingface-cli login
- sft_dataset.py: supervised dialogue dataset class
- conv_dataset.py: multi-round conversation dataset class


## Model Zoo

We provide a number of models on [Hugging Face](https://huggingface.co/GaussianTech/). These models were trained on the Self-Instruct dataset and can be used for inference and fine-tuning: