
fix: refactor adapter weight loading and mapping #2193

Merged · 8 commits · Jul 24, 2024

Conversation

drbh (Collaborator) commented on Jul 5, 2024

This PR refactors the loading and mapping of LoRA adapters and avoids the need for model-specific changes for adapters. The goal is to simplify the loading flow and avoid modifying modeling code for LoRAs to work.

drbh force-pushed the simplify-lora-adapter-layer-loading branch from a2759fd to 71f9a4c on July 9, 2024
drbh (Collaborator, Author) commented on Jul 9, 2024

This PR has been updated to include the ability to load LoRA adapters from a local directory.

The path to the LoRA adapter can be specified in the following way:

LORA_ADAPTERS=adapter_id=/dir/path

Note that it is possible to mix adapter_ids with adapter_id=adapter_path, e.g.

LORA_ADAPTERS=predibase/dbpedia,myadapter=/path/to/dir/
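For illustration, here is a minimal sketch of how a launcher might parse such a value into (adapter_id, optional_path) pairs. This is hypothetical code, not the actual TGI launcher (which is written in Rust), and the function name is invented:

```python
def parse_lora_adapters(value: str) -> list[tuple[str, str | None]]:
    """Parse a LORA_ADAPTERS value into (adapter_id, optional_local_path) pairs.

    Bare entries are treated as Hugging Face Hub adapter ids; entries of
    the form `id=/some/path` point at a local directory.
    """
    adapters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "=" in entry:
            adapter_id, path = entry.split("=", 1)
            adapters.append((adapter_id, path))
        else:
            adapters.append((entry, None))
    return adapters

print(parse_lora_adapters("predibase/dbpedia,myadapter=/path/to/dir/"))
# [('predibase/dbpedia', None), ('myadapter', '/path/to/dir/')]
```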

drbh marked this pull request as ready for review on July 9, 2024
Comment on lines +213 to +214

Member: Isn't this super brittle? It only works for llama, no?

drbh (Collaborator, Author): It's a bit brittle, but it should work in most cases. Currently this correctly loads weights for Llama, Mistral, and Gemma type models (just added some updates and tests too). The intention of moving this code here is to avoid mapping the weights inside each model's code. This is the best approach I have at the moment; however, I'm happy to make any changes!
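To make the centralized approach concrete, here is a hedged sketch of the kind of key parsing it implies. The key layout shown is the conventional PEFT format, and the helper is invented for illustration; it is not the code from this PR:

```python
import re

# Conventional PEFT-style adapter key, e.g.
# "base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight"
LORA_KEY = re.compile(
    r"layers\.(?P<layer>\d+)\.(?P<path>.+)\.lora_(?P<ab>[AB])\.weight"
)

def map_adapter_key(key: str):
    """Extract (layer_index, target_module, 'A' or 'B') from an adapter
    weight key, so the adapter-to-model mapping can live in one place
    instead of inside each model's modeling code."""
    m = LORA_KEY.search(key)
    if m is None:
        return None  # not a per-layer LoRA weight (e.g. embed_tokens, lm_head)
    target_module = m["path"].split(".")[-1]  # e.g. "q_proj"
    return int(m["layer"]), target_module, m["ab"]

key = "base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight"
print(map_adapter_key(key))  # (3, 'q_proj', 'A')
```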

Member: I guess we could have a mapping table here per model type? But probably better for another PR.
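A per-model mapping table along the lines suggested here might look like the following sketch. The table contents are illustrative assumptions, not the project's actual layer names:

```python
# Hypothetical per-model-type table mapping LoRA target modules to the
# (possibly fused) layers they load into; contents are illustrative only.
TARGET_TO_LAYER = {
    "llama": {
        "q_proj": "self_attn.query_key_value",
        "k_proj": "self_attn.query_key_value",
        "v_proj": "self_attn.query_key_value",
        "o_proj": "self_attn.o_proj",
        "gate_proj": "mlp.gate_up_proj",
        "up_proj": "mlp.gate_up_proj",
        "down_proj": "mlp.down_proj",
    },
    # "mistral": {...} and "gemma": {...} would follow the same shape.
}

def resolve_layer(model_type: str, target_module: str) -> str:
    """Look up which model layer an adapter target module maps onto."""
    try:
        return TARGET_TO_LAYER[model_type][target_module]
    except KeyError:
        raise ValueError(
            f"No adapter mapping for {target_module!r} on {model_type!r}"
        )
```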

drbh force-pushed the simplify-lora-adapter-layer-loading branch from f8ecabf to d46372f on July 15, 2024
OlivierDehaene previously approved these changes on Jul 18, 2024
danieldk (Member) left a comment:

Looks like a really nice improvement 🎉. Had two questions.

Two review threads on server/text_generation_server/models/__init__.py (outdated, resolved)
tensimixt commented on Jul 19, 2024

> This PR refactors the loading and mapping of LoRA adapters and avoids the need for model-specific changes for adapters. The goal is to simplify the loading flow and avoid modifying modeling code for LoRAs to work.

Does this also apply to LoRA adapters generated using the mistral-finetune repo? The weights in these adapters use keys matching the consolidated.safetensors file typically found in recent Mistral Hugging Face repos, and these differ from the conventional key names. The best example is Mistral 7B Instruct v0.3.

For example, I have tried to use a LoRA adapter generated from mistral-finetune, but it did not work with TGI. It also did not work with vLLM. To get it into a state where it might work with vLLM, I mapped the weight keys to the conventional names, but found lm_head, embed_tokens, and layernorm weight keys that vLLM could not handle. Will this PR address these issues as well?

Thank you!
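For readers hitting this mismatch, here is a hedged sketch of the kind of key renaming involved. The consolidated-style names on the left are assumptions about the consolidated.safetensors layout rather than verified mistral-finetune output, and the helper is illustrative only; it does not handle the lm_head, embed_tokens, or layernorm cases mentioned above:

```python
# Illustrative renames from Mistral "consolidated"-style names to the
# Hugging Face-style names most LoRA loaders expect. The left-hand
# names are assumptions about the consolidated layout, not verified.
RENAMES = {
    "attention.wq": "self_attn.q_proj",
    "attention.wk": "self_attn.k_proj",
    "attention.wv": "self_attn.v_proj",
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w2": "mlp.down_proj",
    "feed_forward.w3": "mlp.up_proj",
}

def to_hf_key(key: str) -> str:
    """Rewrite one adapter weight key into the conventional HF layout."""
    for old, new in RENAMES.items():
        if old in key:
            key = key.replace(old, new)
            break
    return "base_model.model.model." + key

print(to_hf_key("layers.0.attention.wq.lora_A.weight"))
# base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
```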

drbh force-pushed the simplify-lora-adapter-layer-loading branch from eced5b7 to 59022c2 on July 22, 2024
drbh (Collaborator, Author) commented on Jul 22, 2024

Hi @tensimixt, thank you for your question. If mistral-finetune produces LoRA adapters with different weight names than expected, that may not be resolved in this PR. The goal of this PR is to simplify the LoRA logic; improvements, including support for mistral-finetune, will be explored in future PRs.

danieldk previously approved these changes on Jul 24, 2024
danieldk (Member) left a comment:

Added a comment about _get_model/get_model; maybe the naming can be improved?

Looks good to me otherwise.

drbh merged commit 5d85a95 into main on Jul 24, 2024 (9 checks passed).
drbh deleted the simplify-lora-adapter-layer-loading branch on July 24, 2024.
ErikKaum pushed a commit that referenced this pull request Jul 25, 2024
* fix: refactor adapter weight loading and mapping

* feat: enable lora load from directory

* fix: adjust launcher for local lora adapters

* feat: improve weight loading and add tests

* fix: improve logging and rebase syntax issue

* fix: impove adapter merge comments and remove unused conditional

* fix: improve get_model_with_lora_adapters naming

* fix: comment typo
ErikKaum pushed a commit that referenced this pull request on Jul 26, 2024 (same commit list as above).
mhou7712 commented on Aug 9, 2024

Any update on this PR? Will it be included in the next release build? Thanks.

ErikKaum (Member) replied:

Hi @mhou7712 👋

So this PR (#2193) has been merged and should be in the next release 👍

mhou7712: Hi @ErikKaum, cool and thanks a lot!!

yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request on Sep 26, 2024 (same commit list as above).