Implement kv cache #3

certik · 2023-03-06T02:10:57Z

Here is some background on what a kv cache is: https://kipp.ly/blog/transformer-inference-arithmetic/#kv-cache

Roughly speaking, each time a new token is generated and appended to the end of the input, much of the computation from the previous iteration can be reused. We need to cache those results (the attention keys and values of the earlier tokens) and reuse them.

Here is a reference implementation in picoGPT: jaymody/picoGPT#7 (and the accompanying blog post https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/) that should be straightforward to adapt.
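
To illustrate the idea, here is a minimal sketch of single-head attention with and without a kv cache, in plain Python. It is not the picoGPT code or the implementation in this repository. All names (`KVCache`, `attention_full`, `attention_cached`, the toy weights and token embeddings) are hypothetical and chosen only for the example. Without a cache, the keys and values of every earlier token are recomputed at each step. With a cache, only the new token's key and value are computed and appended.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attention_full(xs, Wq, Wk, Wv):
    # No cache: recompute keys and values for every token seen so far,
    # then return the attention output for the last token only.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    q = matvec(Wq, xs[-1])
    return attend(q, keys, values)


class KVCache:
    # Hypothetical container holding keys and values of earlier tokens.
    def __init__(self):
        self.keys = []
        self.values = []


def attention_cached(x_new, cache, Wq, Wk, Wv):
    # With cache: compute key/value for the new token only and append them.
    cache.keys.append(matvec(Wk, x_new))
    cache.values.append(matvec(Wv, x_new))
    q = matvec(Wq, x_new)
    return attend(q, cache.keys, cache.values)


# Toy 2x2 projection weights and three token embeddings (made-up values).
Wq = [[0.5, -0.1], [0.2, 0.3]]
Wk = [[0.1, 0.4], [-0.3, 0.2]]
Wv = [[1.0, 0.0], [0.5, -0.5]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
outputs_full = []
outputs_cached = []
for t in range(len(tokens)):
    outputs_full.append(attention_full(tokens[: t + 1], Wq, Wk, Wv))
    outputs_cached.append(attention_cached(tokens[t], cache, Wq, Wk, Wv))

for full, cached in zip(outputs_full, outputs_cached):
    print(full, cached)
```

Both paths produce the same output at every step. The cached path does one key projection and one value projection per new token instead of one per token in the whole sequence, which is where the savings come from.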

Fixes #3.

certik added a commit that referenced this issue Mar 14, 2023

Implement kv-cache

664276e

Fixes #3.

certik mentioned this issue Mar 14, 2023

Implement kv-cache #21

Merged

certik closed this as completed in #21 Mar 14, 2023
