adaption for moe models #2101

Open
wants to merge 23 commits into base: main

Commits (23)
b75c001
adaption for moe models
Sep 26, 2024
c29810b
FIX: Change check if past_key_values is empty (#2106)
BenjaminBossan Sep 27, 2024
aa3bd8f
DOC Update source install instruction (#2110)
Salehbigdeli Sep 30, 2024
2a80735
FIX Refactor OFT, small changes to BOFT (#1996)
Zeju1997 Oct 1, 2024
ae297f0
ENH: Improved attribute access for modules_to_save (#2117)
BenjaminBossan Oct 2, 2024
ca8462b
FIX low_cpu_mem_usage consolidates devices (#2113)
BenjaminBossan Oct 2, 2024
534d361
TST Mark flaky X-LoRA test as xfail (#2114)
BenjaminBossan Oct 2, 2024
d9d3059
ENH: Warn when from_pretrained misses PEFT keys (#2118)
BenjaminBossan Oct 2, 2024
8d9ecbe
FEAT: Adding exclude modules param(#2044) (#2102)
JINO-ROHIT Oct 3, 2024
e6f927b
FIX BC breaking change to boft conv2d scaling variable (#2127)
Zeju1997 Oct 7, 2024
859fd88
FEAT: VeRA quantization using bitsandbytes (#2070) (#2076)
ZiadHelal Oct 7, 2024
5e91b54
Bump version to 0.13.2.dev0 (#2137)
BenjaminBossan Oct 8, 2024
9918977
FEAT: Support torchao (#2062)
BenjaminBossan Oct 8, 2024
a724834
FIX: PiSSA now works with Conv1D layers (#2103) (#2104)
suyang160 Oct 8, 2024
3b314cc
FIX Type annoations in vera/bnb.py (#2139)
BenjaminBossan Oct 9, 2024
85e3202
ENH Make PEFT configs forward compatible (#2038)
BenjaminBossan Oct 9, 2024
8efa0cb
FIX Raise mixed adapter infer with missing adapter (#2090)
BenjaminBossan Oct 9, 2024
1eab9bd
FIX Prompt learning with latest transformers error (#2140)
BenjaminBossan Oct 9, 2024
5758a7e
ENH LoRA notebook for NER task (#2126)
JINO-ROHIT Oct 10, 2024
0aa7e3a
FIX TST NaN issue with HQQ GPU test (#2143)
BenjaminBossan Oct 10, 2024
c925d0a
FIX Bug in target module optimization if suffix (#2144)
BenjaminBossan Oct 10, 2024
749b924
Bump version to 0.13.2.dev0 (#2145)
BenjaminBossan Oct 11, 2024
669ce90
Merge branch 'dhr_moe'
Oct 12, 2024
1 change: 1 addition & 0 deletions docker/peft-gpu/Dockerfile
@@ -62,6 +62,7 @@ RUN source activate peft && \
librosa \
"soundfile>=0.12.1" \
scipy \
torchao \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft
34 changes: 33 additions & 1 deletion docs/source/developer_guides/quantization.md
@@ -187,9 +187,41 @@ peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

## torchao (PyTorch Architecture Optimization)

PEFT supports models quantized with [torchao](https://github.com/pytorch/ao) ("ao") for int8 quantization.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

model_id = ...
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
peft_config = LoraConfig(...)
model = get_peft_model(base_model, peft_config)
```

### Caveats:

- Use the most recent versions of torchao (>= v0.4.0) and transformers (> 4.42).
- Only linear layers are currently supported.
- `quant_type = "int4_weight_only"` is currently not supported.
- `NF4` is not yet implemented in transformers and is therefore not supported either.
- DoRA only works with `quant_type = "int8_weight_only"` at the moment.
- There is explicit support for torchao when used with LoRA. However, when torchao quantizes a layer, its class does not change, only the type of the underlying tensor. For this reason, PEFT methods other than LoRA will generally also work with torchao, even if not explicitly supported. Be aware, however, that **merging only works correctly with LoRA and with `quant_type = "int8_weight_only"`**. If you use a different PEFT method or dtype, merging will likely result in an error, and even if it doesn't, the results will still be incorrect (see the sketch below).
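
A minimal sketch of that merge flow, assuming the `int8_weight_only` setup from the example above; the model id and LoRA settings are placeholders, not a recommended configuration:

```python
# Sketch only: merging a LoRA adapter into a torchao int8_weight_only model.
# The model id and LoRA hyperparameters below are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TorchAoConfig

model_id = "facebook/opt-125m"  # placeholder base model
quantization_config = TorchAoConfig(quant_type="int8_weight_only")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# ... train the adapter ...

# Merging is only expected to behave correctly with LoRA + int8_weight_only.
merged_model = model.merge_and_unload()
```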

## Other Supported PEFT Methods

Besides LoRA, the following PEFT methods also support quantization:

- **VeRA** (supports bitsandbytes quantization; see the sketch after this list)
- **AdaLoRA** (supports both bitsandbytes and GPTQ quantization)
- **(IA)³** (supports bitsandbytes quantization)
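
As a rough illustration, VeRA can be combined with a bitsandbytes-quantized base model along the same lines as the LoRA examples earlier in this guide; the model id, rank, and target modules below are placeholders, not a recommended configuration:

```python
# Sketch only: VeRA on top of an 8-bit bitsandbytes-quantized model.
from peft import VeraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # placeholder base model
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)

# VeRA shares frozen random A/B matrices across layers and trains only small
# per-layer scaling vectors, so the trainable parameter count stays tiny.
peft_config = VeraConfig(r=64, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
```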

## Next steps

If you're interested in learning more about quantization, the following may be helpful:

-* Learn more about details about QLoRA and check out some benchmarks on its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
+* Learn more details about QLoRA and check out some benchmarks on its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
* Read more about different quantization schemes in the Transformers [Quantization](https://hf.co/docs/transformers/main/quantization) guide.
2 changes: 1 addition & 1 deletion docs/source/install.md
@@ -43,5 +43,5 @@ repository:
```bash
git clone https://github.com/huggingface/peft
cd peft
-pip install -e .
+pip install -e .[test]
```
5 changes: 1 addition & 4 deletions docs/source/package_reference/vera.md
@@ -22,12 +22,9 @@ When saving the adapter parameters, it's possible to eschew storing the low rank

To handle different shapes of adapted layers, VeRA initializes shared A and B matrices with the largest required size for each dimension. During the forward pass, submatrices A and B for a given layer are sliced out from these shared matrices and used as described in the paper. For example, adapting two linear layers of shapes (100, 20) and (80, 50) will create A and B matrices of shapes (rank, 50) and (100, rank) respectively. Then, to adapt a layer of shape (100, 20), submatrices A and B of shapes (rank, 20) and (100, rank) will be extracted.
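
The slicing can be pictured with a small standalone sketch in plain PyTorch; this illustrates the description above and is not PEFT's internal implementation, with the rank and scaling vectors chosen arbitrarily:

```python
import torch

rank = 4
max_in, max_out = 50, 100            # largest required in/out sizes across adapted layers
vera_A = torch.randn(rank, max_in)   # shared, frozen A
vera_B = torch.randn(max_out, rank)  # shared, frozen B

def vera_delta(x, in_features, out_features, lambda_d, lambda_b):
    # Slice per-layer submatrices from the shared matrices, then apply the
    # VeRA update lambda_b * B @ (lambda_d * (A @ x)).
    A = vera_A[:, :in_features]       # (rank, in_features)
    B = vera_B[:out_features, :]      # (out_features, rank)
    return lambda_b * (B @ (lambda_d * (A @ x)))

# Layer of shape (100, 20): uses A[:, :20] and B[:100, :].
x = torch.randn(20)
delta = vera_delta(x, in_features=20, out_features=100,
                   lambda_d=torch.ones(rank), lambda_b=torch.ones(100))
print(delta.shape)  # torch.Size([100])
```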

-VeRA currently has the following constraints:
+VeRA currently has the following constraint:

- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.

If these constraints don't work for your use case, use LoRA instead.

The abstract from the paper is:
