
fix: refactor adapter weight loading and mapping #2193

Merged · 8 commits · Jul 24, 2024

Conversation

drbh (Collaborator) commented on Jul 5, 2024

This PR refactors the loading and mapping of LoRA adapters and avoids the need for model-specific changes for adapters. The goal is to simplify the loading flow and avoid modifying modeling code for LoRAs to work.

drbh force-pushed the simplify-lora-adapter-layer-loading branch from a2759fd to 71f9a4c on July 9, 2024
drbh (Collaborator, Author) commented on Jul 9, 2024

This PR has been updated to include the ability to load LoRA adapters from a local directory.

The path to the LoRA adapter can be specified in the following way:

LORA_ADAPTERS=adapter_id=/dir/path

Note that it is possible to mix adapter_ids with adapter_id=adapter_path, e.g.

LORA_ADAPTERS=predibase/dbpedia,myadapter=/path/to/dir/
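For illustration, here is a minimal sketch of how a launcher might parse such a value into (adapter_id, optional_path) pairs. This is hypothetical code, not the actual TGI launcher (which is written in Rust), and the function name is invented:

```python
def parse_lora_adapters(value: str) -> list[tuple[str, str | None]]:
    """Parse a LORA_ADAPTERS value into (adapter_id, optional_local_path) pairs.

    Bare entries are treated as Hugging Face Hub adapter ids; entries of
    the form `id=/some/path` point at a local directory.
    """
    adapters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "=" in entry:
            adapter_id, path = entry.split("=", 1)
            adapters.append((adapter_id, path))
        else:
            adapters.append((entry, None))
    return adapters

print(parse_lora_adapters("predibase/dbpedia,myadapter=/path/to/dir/"))
# [('predibase/dbpedia', None), ('myadapter', '/path/to/dir/')]
```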

drbh marked this pull request as ready for review on July 9, 2024
Comment on lines +213 to +214

Member: Isn't this super brittle? It only works for llama, no?

drbh (Collaborator, Author): It's a bit brittle, but it should work in most cases. Currently this correctly loads weights for Llama, Mistral, and Gemma type models (just added some updates and tests too). The intention of moving this code here is to avoid mapping the weights inside each model's code. This is the best approach I have at the moment; however, I'm happy to make any changes!
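To make the centralized approach concrete, here is a hedged sketch of the kind of key parsing it implies. The key layout shown is the conventional PEFT format, and the helper is invented for illustration; it is not the code from this PR:

```python
import re

# Conventional PEFT-style adapter key, e.g.
# "base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight"
LORA_KEY = re.compile(
    r"layers\.(?P<layer>\d+)\.(?P<path>.+)\.lora_(?P<ab>[AB])\.weight"
)

def map_adapter_key(key: str):
    """Extract (layer_index, target_module, 'A' or 'B') from an adapter
    weight key, so the adapter-to-model mapping can live in one place
    instead of inside each model's modeling code."""
    m = LORA_KEY.search(key)
    if m is None:
        return None  # not a per-layer LoRA weight (e.g. embed_tokens, lm_head)
    target_module = m["path"].split(".")[-1]  # e.g. "q_proj"
    return int(m["layer"]), target_module, m["ab"]

key = "base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight"
print(map_adapter_key(key))  # (3, 'q_proj', 'A')
```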

Member: I guess we could have a mapping table here per model type? But probably better for another PR.
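A per-model mapping table along the lines suggested here might look like the following sketch. The table contents are illustrative assumptions, not the project's actual layer names:

```python
# Hypothetical per-model-type table mapping LoRA target modules to the
# (possibly fused) layers they load into; contents are illustrative only.
TARGET_TO_LAYER = {
    "llama": {
        "q_proj": "self_attn.query_key_value",
        "k_proj": "self_attn.query_key_value",
        "v_proj": "self_attn.query_key_value",
        "o_proj": "self_attn.o_proj",
        "gate_proj": "mlp.gate_up_proj",
        "up_proj": "mlp.gate_up_proj",
        "down_proj": "mlp.down_proj",
    },
    # "mistral": {...} and "gemma": {...} would follow the same shape.
}

def resolve_layer(model_type: str, target_module: str) -> str:
    """Look up which model layer an adapter target module maps onto."""
    try:
        return TARGET_TO_LAYER[model_type][target_module]
    except KeyError:
        raise ValueError(
            f"No adapter mapping for {target_module!r} on {model_type!r}"
        )
```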

drbh force-pushed the simplify-lora-adapter-layer-loading branch from f8ecabf to d46372f on July 15, 2024
OlivierDehaene previously approved these changes on Jul 18, 2024
danieldk (Member) left a comment:

Looks like a really nice improvement 🎉. Had two questions.

Two review threads on server/text_generation_server/models/__init__.py (outdated, resolved)
tensimixt commented on Jul 19, 2024

> This PR refactors the loading and mapping of LoRA adapters and avoids the need for model-specific changes for adapters. The goal is to simplify the loading flow and avoid modifying modeling code for LoRAs to work.

Does this also apply to LoRA adapters generated using the mistral-finetune repo? The weights in these adapters use keys matching the consolidated.safetensors file typically found in recent Mistral Hugging Face repos, and these differ from the conventional key names. The best example is Mistral 7B Instruct v0.3.

For example, I have tried to use a LoRA adapter generated from mistral-finetune, but it did not work with TGI. It also did not work with vLLM. To get it into a state where it might work with vLLM, I mapped the weight keys to the conventional names, but found lm_head, embed_tokens, and layernorm weight keys that vLLM could not handle. Will this PR address these issues as well?

Thank you!
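For readers hitting this mismatch, here is a hedged sketch of the kind of key renaming involved. The consolidated-style names on the left are assumptions about the consolidated.safetensors layout rather than verified mistral-finetune output, and the helper is illustrative only; it does not handle the lm_head, embed_tokens, or layernorm cases mentioned above:

```python
# Illustrative renames from Mistral "consolidated"-style names to the
# Hugging Face-style names most LoRA loaders expect. The left-hand
# names are assumptions about the consolidated layout, not verified.
RENAMES = {
    "attention.wq": "self_attn.q_proj",
    "attention.wk": "self_attn.k_proj",
    "attention.wv": "self_attn.v_proj",
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w2": "mlp.down_proj",
    "feed_forward.w3": "mlp.up_proj",
}

def to_hf_key(key: str) -> str:
    """Rewrite one adapter weight key into the conventional HF layout."""
    for old, new in RENAMES.items():
        if old in key:
            key = key.replace(old, new)
            break
    return "base_model.model.model." + key

print(to_hf_key("layers.0.attention.wq.lora_A.weight"))
# base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
```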

drbh force-pushed the simplify-lora-adapter-layer-loading branch from eced5b7 to 59022c2 on July 22, 2024
drbh (Collaborator, Author) commented on Jul 22, 2024

Hi @tensimixt, thank you for your question. If mistral-finetune produces LoRA adapters with different weight names than expected, that may not be resolved in this PR. The goal of this PR is to simplify the LoRA logic; improvements, including support for mistral-finetune, will be explored in future PRs.

danieldk previously approved these changes on Jul 24, 2024
danieldk (Member) left a comment:

Added a comment about _get_model/get_model; maybe the naming can be improved?

Looks good to me otherwise.

drbh merged commit 5d85a95 into main on Jul 24, 2024 (9 checks passed).
drbh deleted the simplify-lora-adapter-layer-loading branch on July 24, 2024.
ErikKaum pushed a commit that referenced this pull request Jul 25, 2024
* fix: refactor adapter weight loading and mapping

* feat: enable lora load from directory

* fix: adjust launcher for local lora adapters

* feat: improve weight loading and add tests

* fix: improve logging and rebase syntax issue

* fix: impove adapter merge comments and remove unused conditional

* fix: improve get_model_with_lora_adapters naming

* fix: comment typo
ErikKaum pushed a commit that referenced this pull request on Jul 26, 2024 (same commit list as above).
mhou7712 commented on Aug 9, 2024

Any update on this PR? Will it be included in the next release build? Thanks.

ErikKaum (Member) replied:

Hi @mhou7712 👋

So this PR (#2193) has been merged and should be in the next release 👍

mhou7712: Hi @ErikKaum, cool and thanks a lot!!

yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request on Sep 26, 2024 (same commit list as above).