This is the first release of Optimum TPU to include support for the JetStream PyTorch engine as a backend for Text Generation Inference (TGI).
JetStream is a throughput- and memory-optimized engine for LLM inference on TPUs, and its PyTorch implementation allows for seamless integration into the TGI code. The models supported so far are Llama 2, Llama 3, Gemma 1, and Mixtral; serving inference on them has yielded close to a 10x improvement in tokens/sec compared to the previously used backend (PyTorch XLA/transformers).
On top of that, quantization can be used to serve with even fewer resources while maintaining similar throughput and quality.
Details follow.
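As a quick illustration, here is a minimal sketch of querying such a server from Python, assuming a TGI instance backed by the JetStream PyTorch engine is already up and serving one of the supported models; the endpoint URL, model choice, and prompt are placeholders to adapt to your deployment.

```python
from huggingface_hub import InferenceClient

# Assumes a TGI server (JetStream PyTorch backend) is already running and
# serving a supported model (e.g. Llama 3); adjust the URL to your deployment.
client = InferenceClient("http://localhost:8080")

# Stream the generated tokens back from the server.
for token in client.text_generation(
    "Why are TPUs a good fit for LLM inference?",
    max_new_tokens=64,
    stream=True,
):
    print(token, end="", flush=True)
print()
```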
What's Changed
- Update colab examples by @wenxindongwork in #86
- ci(docker): update torch-xla to 2.4.0 by @tengomucho in #89
- ✈️ Introduce Jetstream/Pytorch in TGI by @tengomucho in #88
- 🦙 Llama3 on TGI - Jetstream Pytorch by @tengomucho in #90
- ☝️ Update Jetstream Pytorch revision by @tengomucho in #91
- Correct extra token, start preparing docker image for TGI/Jetstream Pt by @tengomucho in #93
- Fix generation using Jetstream Pytorch by @tengomucho in #94
- Fix slow tests by @tengomucho in #95
- 🧹 Cleanup and fixes for TGI by @tengomucho in #96
- Small TGI enhancements by @tengomucho in #97
- fix(TGI Jetstream Pt): prefill should be done with max input size by @tengomucho in #98
- 💎 Gemma on TGI Jetstream Pytorch by @tengomucho in #99
- Fix ci nightly jetstream by @tengomucho in #101
- CI ephemeral TPUs by @tengomucho in #102
- 🍃 Added Mixtral on TGI / Jetstream Pytorch by @tengomucho in #103
- Add CLI to install dependencies by @tengomucho in #104
- ⛰ CI: mount hub cache and fix issues with cli by @tengomucho in #106
- fix(docker): correct jetstream installation in TGI docker image by @tengomucho in #107
- ✏️ docs: Add training guide and improve documentation consistency by @baptistecolle in #110
- Quantization Jetstream Pytorch by @tengomucho in #111
- fix: graceful shutdown was not working with entrypoint, exec launcher by @co42 in #112
- fix(doc): correct link to deploy page by @tengomucho in #115
- More Jetstream Pytorch fixes, prepare for release by @tengomucho in #116
New Contributors
- @wenxindongwork made their first contribution in #86
- @baptistecolle made their first contribution in #110
- @co42 made their first contribution in #112
Full Changelog: v0.1.5...v0.2.0