🦙 Newer Llamas support #129
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The original Llama model has MLP bias disabled, but it may be enabled in fine-tuned models: in that case an error will appear when serving with Jetstream Pytorch.
If the config enables it, on the Jetstream Pytorch TGI side we create an alias, which will eventually result in untied weights (but at least it will work).
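For context, a minimal sketch of the kind of check involved; the checkpoint name is just an example and the error message is illustrative, not the exact one raised by the server:

```python
from transformers import AutoConfig

# Stock Llama configs ship with mlp_bias=False, but a fine-tune may have it
# enabled, which the Jetstream Pytorch layers do not implement.
config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # any Llama checkpoint
if getattr(config, "mlp_bias", False):
    raise ValueError("mlp_bias=True is not supported when serving with Jetstream Pytorch")
```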
This does not seem to be necessary anymore.
Otherwise it only uses the last marker.
Force-pushed from 8219b67 to da1b9b8 (compare)
LGTM, thanks! Ideally we should avoid monkey-patching, but I like the pattern you used (with the context).
I don't like monkey-patching models either, but hopefully I will be able to remove it once AI-Hypercomputer/jetstream-pytorch#205 lands in Jetstream/Pytorch.
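For reference, a minimal sketch of the context-manager pattern discussed above; the names in the usage comment (`llama_module`, `precompute_freqs_cis`, `scaled_freqs_fn`, `create_engine`) are hypothetical and do not refer to the actual Jetstream Pytorch internals:

```python
from contextlib import contextmanager


@contextmanager
def monkey_patch(owner, attr_name, replacement):
    """Temporarily replace `owner.attr_name` with `replacement`.

    The original attribute is restored on exit, even if an exception is
    raised, so the patch never leaks outside the `with` block.
    """
    original = getattr(owner, attr_name)
    setattr(owner, attr_name, replacement)
    try:
        yield replacement
    finally:
        setattr(owner, attr_name, original)


# Hypothetical usage: swap in a frequency computation that applies RoPE
# scaling while the model is instantiated, then restore the original.
# with monkey_patch(llama_module, "precompute_freqs_cis", scaled_freqs_fn):
#     engine = create_engine(config)
```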
LGTM
```diff
     ],
-    ids=["Mixtral-8x7B", "Meta-Llama-3-8B" ,"Meta-Llama-3-70B"],
+    ids=["Mixtral-8x7B", "Meta-Llama-3-8B" ,"Meta-Llama-3-70B", "Llama-3.3-70B-Instruct"],
```
Only question: should we add tests for 3.1 and 3.2, or is that unnecessary?
We could, but 3.2 is tested with the 1B variant and the 3.1 architecture is identical, so I do not think testing it would add any value.
What does this PR do?
This PR mainly adds support for RoPE scaling to Jetstream Pytorch Llama models, which is required to support newer Llama models on TGI, such as Llama 3.1, Llama 3.2 and Llama 3.3.
Note that Llama 3.3 is a 70B model, so quantization should be enabled to serve it.
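For readers unfamiliar with the feature, here is a minimal sketch of Llama-3.1-style RoPE scaling applied to the rotary inverse frequencies. The default values are the ones published in the Llama 3.1 configs; the function is illustrative and not the exact code added in this PR:

```python
import math

import torch


def apply_llama3_rope_scaling(
    inv_freq: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Rescale RoPE inverse frequencies the way Llama 3.1/3.2/3.3 do.

    High-frequency components are kept as-is, low-frequency components are
    divided by `factor`, and the band in between is interpolated smoothly.
    """
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    wavelen = 2 * math.pi / inv_freq
    # Fully rescale wavelengths longer than the low-frequency threshold.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Smoothly interpolate the band between the two thresholds.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_medium = (wavelen >= high_freq_wavelen) & (wavelen <= low_freq_wavelen)
    return torch.where(is_medium, smoothed, scaled)
```

This would be applied to the usual `inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2) / head_dim))` before the rotary embeddings are precomputed.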
Before submitting