Jetstream by default #118

Open
wants to merge 5 commits into
base: main

Conversation

tengomucho
Collaborator

What does this PR do?

This PR makes the changes needed for the Jetstream Pytorch engine to be the default backend for TGI on TPUs.
This backend is reliable and performant, and it gives the best throughput on TGI.
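
As a rough illustration of what "default backend" can mean in practice, here is a minimal sketch of a default-on selection with an opt-out and a fallback. It is not taken from this PR: the environment variable `EXAMPLE_DISABLE_JETSTREAM`, the backend labels, and the `select_backend` helper are all hypothetical, and the `jetstream_pt` module name used for the availability check is an assumption.

```python
import importlib.util
import os

# Hypothetical names for illustration only; none of them come from this PR.
DISABLE_ENV_VAR = "EXAMPLE_DISABLE_JETSTREAM"
DEFAULT_BACKEND = "jetstream_pytorch"
FALLBACK_BACKEND = "torch_xla"


def select_backend(env=None, jetstream_installed=None):
    """Return the default backend unless it is opted out of or not installed."""
    env = os.environ if env is None else env
    if jetstream_installed is None:
        # find_spec returns None when a top-level module cannot be found.
        # The module name "jetstream_pt" is an assumption.
        jetstream_installed = importlib.util.find_spec("jetstream_pt") is not None
    if env.get(DISABLE_ENV_VAR) == "1" or not jetstream_installed:
        return FALLBACK_BACKEND
    return DEFAULT_BACKEND
```

With this shape, users who do nothing get the default engine, while an explicit opt-out or a missing dependency falls back to the other backend.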

The implementation is slightly different, so a separate test is added.
Most tests work for both backends, except for the continuous batching one.
This makes it possible to remove the old GPT2-based tests, which are quite slow and
do not use any sharding or KV cache, so they might not be
representative of the most relevant models on TGI.
There are now equivalent tests on the TinyLlama model that run faster and
use the KV cache and sharding.
The only test without an equivalent is the continuous
batching one, but it was not working for most other models, so I
prefer to remove it: having it pass was not representative
of the current state.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Now that the engine is stable and tested, it is set as the
default one for TGI.
@tengomucho tengomucho marked this pull request as ready for review November 22, 2024 15:39
2 participants