update chatllms

jianzhnie · Apr 25, 2024 · d1f66b2
1 parent 0b04cb7
commit d1f66b2

Showing 51 changed files with 2,580 additions and 613 deletions.
diff --git a/.flake8 b/.flake8
@@ -2,7 +2,7 @@
 ignore = E501,E701,W504,W503,E722,E251,E402
 max-line-length = 120
 show-source = False
-application-import-names = chatgpt
+application-import-names = chatllms
 exclude =
     .git
     docs

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,18 +1,18 @@
 repos:
-  - repo: https://github.com/PyCQA/flake8
-    rev: 3.8.3
+  - repo: https://gitee.com/openmmlab/mirrors-flake8
+    rev: 5.0.4
     hooks:
       - id: flake8
-  - repo: https://github.com/PyCQA/isort
-    rev: 5.10.1
+  - repo: https://gitee.com/openmmlab/mirrors-isort
+    rev: 5.11.5
     hooks:
       - id: isort
-  - repo: https://github.com/pre-commit/mirrors-yapf
-    rev: v0.30.0
+  - repo: https://gitee.com/openmmlab/mirrors-yapf
+    rev: v0.32.0
     hooks:
       - id: yapf
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.1.0
+  - repo: https://gitee.com/openmmlab/mirrors-pre-commit-hooks
+    rev: v4.3.0
     hooks:
       - id: trailing-whitespace
       - id: check-yaml
@@ -23,4 +23,4 @@ repos:
       - id: fix-encoding-pragma
         args: ["--remove"]
       - id: mixed-line-ending
-        args: ["--fix=lf"]
+        args: ["--fix=lf"]
diff --git a/README.md b/README.md
@@ -10,25 +10,24 @@
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-
 <div align="center">
 
 👋🤗🤗👋 Join our [WeChat](assets/wechat.jpg).
+
 </div>
 
 # Efficient Finetuning of Quantized LLMs  --- 低资源的大语言模型量化训练/部署方案
 
-
 <div align="center">
 
 [中文](README_zh.md) | English
+
 </div>
 
 This is the repo for the `Efficient Finetuning of Quantized LLMs` project, which aims to build and share tuning methods for instruction-following Chinese `baichuan-7b/LLaMA/Pythia/GLM` models that can be trained on **a single Nvidia RTX-2080TI**, and a multi-round chatbot that can be trained on **a single Nvidia RTX-3090** with a context length of 2048.
 
 We use [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization, integrated with Huggingface's [PEFT](https://github.com/huggingface/peft) and [transformers](https://github.com/huggingface/transformers/) libraries.
 
-
 ## News
 
 - [23/07/20] Now we support training the **LLaMA-2** models in this repo. Try `--model_name_or_path Llama-2-7b-hf` argument to use the LLaMA-2 model.
@@ -67,6 +66,7 @@ We use [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantiza
 As of now, we support the following datasets, most of which are available in the [Hugging Face datasets library](https://huggingface.co/datasets/).
 
 - For supervised fine-tuning:
+
   - [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
   - [Stanford Alpaca (Chinese)](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
   - [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
@@ -88,6 +88,7 @@ As of now, we support the following datasets, most of which are available in
   - [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k)
 
 - For reward model training:
+
   - [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
   - [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1)
   - [GPT-4 Generated Data](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
@@ -109,7 +110,6 @@ We provide a number of data preprocessing tools in the [data](./chatllms/data) f
 - [sft_dataset.py](./chatllms/data/sft_dataset.py) :  Supervised fine-tuning dataset class and collator
 - [conv_dataset.py](./chatllms/data/conv_dataset.py) :  Conversation dataset class and collator
 
-
 ## Model Zoo
 
 We provide a number of models in the [Hugging Face model hub](https://huggingface.co/decapoda-research). These models are trained with QLoRA and can be used for inference and finetuning. We provide the following models:
@@ -129,13 +129,17 @@ We provide a number of models in the [Hugging Face model hub](https://huggingfac
 - CUDA >= 11.0
 
 - Python 3.8+ and PyTorch 1.13.1+
+
 - 🤗Transformers, Datasets, Accelerate, PEFT and bitsandbytes
+
 - jieba, rouge_chinese and nltk (used at evaluation)
+
 - gradio (used in gradio_webserver.py)
 
 ### Install required packages
 
 To load models in 4bits with transformers and bitsandbytes, you have to install accelerate and transformers from source and make sure you have the latest version of the bitsandbytes library (0.39.0). You can achieve the above with the following commands:
+
 ```bash
 pip install -q -U bitsandbytes
 pip install -q -U git+https://github.com/huggingface/transformers.git
@@ -154,11 +158,11 @@ cd Efficient-Tuning-LLMs
 
 ## Getting Started
 
-| main function                            | Useage                                                                               | Scripts                                    |
-| ---------------------------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------ |
-| [train.py](./train.py)                   | Full finetune LLMs on  SFT datasets                                                  | [full_finetune](./scripts/full_finetune)   |
-| [train_lora.py](./train_lora.py)         | Finetune LLMs by using Lora  (Low-Rank Adaptation of Large Language Models finetune) | [lora_finetune](./scripts/lora_finetune)   |
-| [train_qlora.py](train_qlora.py)         | Finetune LLMs by using QLora (QLoRA: Efficient Finetuning of Quantized LLMs)         | [qlora_finetune](./scripts/qlora_finetune) |
+| main function                    | Useage                                                                               | Scripts                                    |
+| -------------------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------ |
+| [train.py](./train.py)           | Full finetune LLMs on  SFT datasets                                                  | [full_finetune](./scripts/full_finetune)   |
+| [train_lora.py](./train_lora.py) | Finetune LLMs by using Lora  (Low-Rank Adaptation of Large Language Models finetune) | [lora_finetune](./scripts/lora_finetune)   |
+| [train_qlora.py](train_qlora.py) | Finetune LLMs by using QLora (QLoRA: Efficient Finetuning of Quantized LLMs)         | [qlora_finetune](./scripts/qlora_finetune) |
 
 ### QLora int4 Finetune
 
@@ -170,6 +174,7 @@ python train_qlora.py --model_name_or_path <path_or_name>
 ```
 
 For models larger than 13B, we recommend adjusting the learning rate:
+
 ```bash
 python train_qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
 ```
@@ -220,7 +225,9 @@ python train_qlora.py \
 To find more scripts for finetuning and inference, please refer to the `scripts` folder.
 
 ## Quantization
+
 Quantization parameters are controlled from the `BitsAndBytesConfig` ([see HF documentation](https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig)) as follows:
+
 - Loading in 4 bits is activated through `load_in_4bit`
 - The datatype used for the linear layer computations with `bnb_4bit_compute_dtype`
 - Nested quantization is activated through `bnb_4bit_use_double_quant`
@@ -245,30 +252,37 @@ Quantization parameters are controlled from the `BitsAndBytesConfig` ([see HF do
 ## Tutorials and Demonstrations
 
 We provide two Google Colab notebooks to demonstrate the use of 4bit models in inference and fine-tuning. These notebooks are intended to be a starting point for further research and development.
+
 - [Basic usage Google Colab notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing) - This notebook shows how to use 4bit models in inference with all their variants, and how to run GPT-neo-X (a 20B parameter model) on a free Google Colab instance 🤯
 - [Fine tuning Google Colab notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing) - This notebook shows how to fine-tune a 4bit model on a downstream task using the Hugging Face ecosystem. We show that it is possible to fine tune GPT-neo-X 20B on a Google Colab instance!
 
 Other examples are found under the examples/ folder.
+
 - Finetune LLama-7B (ex1)
 - Finetune GPT-neo-X 20B (ex2)
 
 ## Using Local Datasets
+
 You can specify the path to your dataset using the --dataset argument. If the --dataset_format argument is not set, it will default to the Alpaca format. Here are a few examples:
 
 - Training with an alpaca format dataset:
+
 ```bash
 python train_qlora.py --dataset="path/to/your/dataset"
 ```
+
 - Training with a self-instruct format dataset:
 
 ```bash
 python train_qlora.py --dataset="path/to/your/dataset" --dataset_format="self-instruct"
 ```
 
 ## Multi GPU
+
 Multi GPU training and inference work out-of-the-box with Hugging Face's Accelerate. Note that the `per_device_train_batch_size` and `per_device_eval_batch_size` arguments are global batch sizes, unlike what their names suggest.
 
 When loading a model for training or inference on multiple GPUs you should pass something like the following to AutoModelForCausalLM.from_pretrained():
+
 ```python
 device_map = "auto"
 max_memory = {i: '46000MB' for i in range(torch.cuda.device_count())}
@@ -303,29 +317,28 @@ python gradio_webserver.py \
     --lora_model_name_or_path  `path/to/your/model_dir`
 ```
 
-
 ## Sample Outputs
+
 We provide generations for the models described in the paper for both OA and Vicuna queries in the `eval/generations` folder. These are intended to foster further research on model evaluation and analysis.
 
 Can you distinguish ChatGPT from Guanaco? Give it a try!
 You can access [the model response Colab here](https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing) comparing ChatGPT and Guanaco 65B on Vicuna prompts.
 
-
 ## Known Issues and Limitations
+
 Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.
 
 1. 4-bit inference is slow. Currently, our 4-bit inference implementation is not yet integrated with the 4-bit matrix multiplication.
 2. Resuming a LoRA training run with the Trainer currently runs into an error.
 3. Currently, using `bnb_4bit_compute_dtype='fp16'` can lead to instabilities. For 7B LLaMA, only 80% of finetuning runs complete without error. We have solutions, but they are not integrated yet into bitsandbytes.
 4. Make sure that `tokenizer.bos_token_id = 1` to avoid generation issues.
 
-
 ## License
 
 `Efficient Finetuning of Quantized LLMs` is released under the Apache 2.0 license.
 
-
 ## Acknowledgements
+
 We thank the Huggingface team, in particular Younes Belkada, for their support integrating QLoRA with PEFT and transformers libraries.
 
 We appreciate the work by many open-source contributors, especially:
@@ -338,7 +351,6 @@ We appreciate the work by many open-source contributors, especially:
 - [Vicuna](https://github.com/lm-sys/FastChat/)
 - [xTuring](https://github.com/stochasticai/xTuring)
 
-
 ## Citation
 
 Please cite the repo if you use the data or code in this repo.
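
Review note on the `Quantization` and `Multi GPU` hunks of README.md above: the settings the README describes can be gathered into plain dictionaries before they are handed to the Hugging Face loaders. The sketch below is only an illustration under stated assumptions. `build_max_memory` and `build_load_kwargs` are hypothetical helper names, `"bfloat16"` is an assumed compute dtype, and the GPU count is passed in by hand rather than read from `torch.cuda.device_count()` so the snippet runs without torch installed.

```python
# Sketch only: collects the keyword arguments named in the README's
# "Quantization" and "Multi GPU" sections into plain dictionaries.
# build_max_memory and build_load_kwargs are hypothetical helper names.


def build_max_memory(num_gpus, per_gpu="46000MB"):
    """Map each GPU index to a memory cap, mirroring the README snippet."""
    return {i: per_gpu for i in range(num_gpus)}


def build_load_kwargs(num_gpus, compute_dtype="bfloat16"):
    """Return (quantization kwargs, loader kwargs) as two plain dicts."""
    quant_kwargs = {
        "load_in_4bit": True,                     # load weights in 4 bits
        "bnb_4bit_compute_dtype": compute_dtype,  # dtype for linear-layer compute (assumed value)
        "bnb_4bit_use_double_quant": True,        # nested quantization
    }
    load_kwargs = {
        "device_map": "auto",
        "max_memory": build_max_memory(num_gpus),
    }
    return quant_kwargs, load_kwargs


quant_kwargs, load_kwargs = build_load_kwargs(num_gpus=2)
print(quant_kwargs)
print(load_kwargs)

# With torch and transformers installed, the dictionaries would presumably
# be used along these lines (not executed here):
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   model = AutoModelForCausalLM.from_pretrained(
#       "<path_or_name>",
#       quantization_config=BitsAndBytesConfig(**quant_kwargs),
#       **load_kwargs,
#   )
```

Keeping the settings in dictionaries makes it easy to log or diff the exact configuration of a run; the per-GPU cap of `46000MB` is simply the value shown in the README and would need adjusting to the actual hardware.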

diff --git a/README_zh.md b/README_zh.md
@@ -13,22 +13,22 @@
 <div align="center">
 
 👋🤗🤗👋 Join our [WeChat](assets/wechat.jpg).
-</div>
 
+</div>
 
 # Efficient Finetuning of Quantized LLMs --- 低资源的大语言模型量化训练/部署方案
 
-
 <div align="center">
 
 [English](README.md) | 中文
+
 </div>
 
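 This is the repository for the `Efficient Finetuning of Quantized LLMs` project, which aims to build and open-source fine-tuning methods for instruction-following Chinese `baichuan/LLaMA/Pythia/GLM` large models. The methods can be trained on **a single Nvidia RTX-2080TI**, and the multi-round chatbot can be trained on **a single Nvidia RTX-3090** with a context length of 2048.
 
 We use [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization, integrated with Huggingface's [PEFT](https://github.com/huggingface/peft) and [transformers](https://github.com/huggingface/transformers/) libraries.
 
- The main contents of this project are as follows:
+The main contents of this project are as follows:
 
 - 📗 Supports full-parameter instruction fine-tuning, LoRA instruction fine-tuning (support coming later), and QLoRA low-cost, efficient instruction fine-tuning.
 - 📗 Supports most mainstream open-source large models, such as Baichuan, Ziya, Bloom, LLaMA, Pythia, and OPT.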
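@@ -88,6 +88,7 @@ QLora introduces several innovations that aim, without sacrificing performance, to reduce memory
 As of now, we support the following datasets, all of which can be found on [Hugging Face Datasets](https://huggingface.co/datasets). We will add more datasets in the future.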
 
 - For supervised fine-tuning:
+
   - [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
   - [Stanford Alpaca (Chinese)](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
   - [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
@@ -103,16 +104,16 @@ QLora introduces several innovations that aim, without sacrificing performance, to reduce memory
   - [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k)
 
 - For reward model training:
+
   - [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
   - [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1)
   - [GPT-4 Generated Data](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
   - [GPT-4 Generated Data (Chinese)](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
 
-
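 Please refer to [data/README.md](data/README.md) to learn how to use these datasets to train your own ChatGPT. If you want to explore more datasets, please refer to [awesome-instruction-datasets](https://github.com/jianzhnie/awesome-instruction-datasets). By default, we use the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset for training and fine-tuning.
 
-
 Some datasets require Hugging Face account authentication before they can be used; we recommend logging in to your Hugging Face account with the following commands.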
+
 ```bash
 pip install --upgrade huggingface_hub
 huggingface-cli login
@@ -126,7 +127,6 @@ huggingface-cli login
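 - sft_dataset.py: supervised conversation dataset class
 - conv_dataset.py: multi-round conversation dataset class
 
-
 ## Model Zoo
 
 We provide a number of models on [Hugging Face](https://huggingface.co/GaussianTech/). These models were trained on the Self-Instruct dataset and can be used for inference and fine-tuning: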