Hotfix link2 (#2812)
2nd hotfix ?
Narsil authored Dec 9, 2024
1 parent a70dd29 commit b2fac5d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/conceptual/chunking.md
@@ -72,7 +72,7 @@ Long: `MODEL_ID=$MODEL_ID HOST=localhost:8000 k6 run load_tests/long.js`

### Results

-![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/042791fbd5742b1644d42c493db6bec669df6537/assets/v3_benchmarks.png)
+![benchmarks_v3](https://raw.githubusercontent.com/huggingface/text-generation-inference/refs/heads/main/assets/v3_benchmarks.png)

Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.
