🦙 Newer Llamas support #129

Merged
merged 9 commits into main from test_llama3.1 on Dec 16, 2024

Conversation

tengomucho
Collaborator

What does this PR do?

This PR mainly adds support for RoPE scaling to Jetstream Pytorch Llama models, which is required to support newer Llama models on TGI, such as Llama 3.1, Llama 3.2 and Llama 3.3.
Note that Llama 3.3 is a 70B model, so quantization should be enabled to serve it.
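
For context, this is roughly what llama3-style RoPE scaling does to the rotary inverse frequencies. It is a minimal sketch following the scheme published with the Llama 3.1 checkpoints, not the exact Jetstream Pytorch code; the function name and defaults are illustrative:

```python
import math

import torch


def llama3_scale_rope(
    inv_freq: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position: int = 8192,
) -> torch.Tensor:
    # Sketch of llama3-style RoPE scaling; parameter names mirror the
    # `rope_scaling` section of Llama 3.1+ configs, not Jetstream internals.
    low_freq_wavelen = original_max_position / low_freq_factor
    high_freq_wavelen = original_max_position / high_freq_factor
    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths (low frequencies) are divided by the scaling factor.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Mid-range wavelengths blend smoothly between scaled and unscaled values.
    smooth = (original_max_position / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    interpolated = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    is_medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(is_medium, interpolated, scaled)
```

The `factor`, `low_freq_factor`, `high_freq_factor` and original context length all come from the `rope_scaling` entry of the model config.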

Before submitting

  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@tengomucho tengomucho changed the title 🦙 Newer llamas support 🦙 Newer Llamas support Dec 13, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The original Llama model has MLP bias disabled, but it may be enabled in
fine-tuned models; in that case an error appears when serving with
Jetstream Pytorch. If the config enables it, the Jetstream Pt TGI
integration creates an alias, which will eventually result in untied
weights (but at least it will work).
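
A rough sketch of the aliasing idea from that commit message. All names here (helper, key layout) are hypothetical; the point is only that when `mlp_bias` is set, the loader maps the checkpoint's bias names to entries it can load instead of erroring:

```python
def add_mlp_bias_aliases(weight_map: dict, config) -> dict:
    # Hypothetical sketch: the stock Llama port declares no MLP bias,
    # so a fine-tuned checkpoint with mlp_bias=True would otherwise fail
    # to load. Each alias is loaded as its own tensor, so the weights
    # end up untied, but the model serves.
    if getattr(config, "mlp_bias", False):
        for layer in range(config.num_hidden_layers):
            for proj in ("gate_proj", "up_proj", "down_proj"):
                hf_name = f"model.layers.{layer}.mlp.{proj}.bias"
                jetstream_name = f"layers.{layer}.feed_forward.{proj}.bias"
                weight_map[hf_name] = jetstream_name
    return weight_map
```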
@tengomucho tengomucho marked this pull request as ready for review December 13, 2024 14:40

@dacorvo dacorvo left a comment

LGTM, thanks! Ideally we should avoid monkey-patching, but I like the pattern you used (with the context).

@tengomucho
Collaborator Author

> LGTM, thanks! Ideally we should avoid monkey-patching, but I like the pattern you used (with the context).

I don't like monkey patching models either, but hopefully I will be able to remove it once AI-Hypercomputer/jetstream-pytorch#205 lands into Jetstream/Pytorch
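
For reference, the context-manager pattern mentioned above generally looks like the following. This is a minimal generic sketch, not the actual helper from this PR; the usage names in the comment are illustrative:

```python
from contextlib import contextmanager


@contextmanager
def monkey_patched(module, attr_name, replacement):
    """Temporarily replace `module.attr_name` with `replacement`.

    Scoping the patch in a context manager guarantees the original
    attribute is restored on exit, even if the body raises.
    """
    original = getattr(module, attr_name)
    setattr(module, attr_name, replacement)
    try:
        yield
    finally:
        setattr(module, attr_name, original)


# Usage (illustrative names):
#     with monkey_patched(llama_model, "precompute_freqs_cis", scaled_fn):
#         engine = load_engine(...)
```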


@baptistecolle baptistecolle left a comment

LGTM

      ],
- ids=["Mixtral-8x7B", "Meta-Llama-3-8B", "Meta-Llama-3-70B"],
+ ids=["Mixtral-8x7B", "Meta-Llama-3-8B", "Meta-Llama-3-70B", "Llama-3.3-70B-Instruct"],
Collaborator

Only question: should we add tests for 3.1 and 3.2, or is that unnecessary?

Collaborator Author


We could, but 3.2 is tested with the 1B variant and the 3.1 architecture is identical, so I do not think testing it would add any value.

@tengomucho tengomucho merged commit 02c2d9c into main Dec 16, 2024
5 checks passed
@tengomucho tengomucho deleted the test_llama3.1 branch December 16, 2024 09:23