Implement kv cache #3

Closed
certik opened this issue Mar 6, 2023 · 0 comments · Fixed by #21

Comments

Owner

certik commented Mar 6, 2023

Here is some information on what a KV cache is: https://kipp.ly/blog/transformer-inference-arithmetic/#kv-cache

Roughly speaking, each time a new token is generated and appended to the end of the input, much of the computation from the previous iteration can be reused. We need to cache those results and reuse them.
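To make the idea concrete, here is a minimal sketch of the caching step for a single attention head, in plain Python. It is not the picoGPT code and not what this project implements; the names `KVCache`, `attention_full`, and `attention_cached` and the tiny weight matrices are made up for illustration. It only shows that the keys and values of earlier tokens can be stored once and reused, so each new token costs one key and one value projection instead of one per position.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [dot(row, x) for row in W]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all stored positions.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def attention_full(xs, Wq, Wk, Wv):
    # No cache: recompute keys and values for every position on each call.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

class KVCache:
    # Hypothetical container holding keys/values of all tokens seen so far.
    def __init__(self):
        self.keys = []
        self.values = []

def attention_cached(x_new, cache, Wq, Wk, Wv):
    # With cache: project only the new token, append, then attend.
    cache.keys.append(matvec(Wk, x_new))
    cache.values.append(matvec(Wv, x_new))
    return attend(matvec(Wq, x_new), cache.keys, cache.values)

# Toy weights and token embeddings (arbitrary values, assumptions only).
Wq = [[0.5, -0.2], [0.1, 0.3]]
Wk = [[0.2, 0.4], [-0.3, 0.1]]
Wv = [[1.0, 0.0], [0.5, 0.5]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

cache = KVCache()
for t, x in enumerate(tokens):
    cached = attention_cached(x, cache, Wq, Wk, Wv)
    full = attention_full(tokens[: t + 1], Wq, Wk, Wv)
    # Both paths should give the same output for the newest token.
    assert all(math.isclose(a, b) for a, b in zip(cached, full))
```

A real model would keep one such cache per layer and per head, and would likely preallocate it as an array up to the maximum context length rather than growing lists.
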

Here is a reference implementation in picoGPT: jaymody/picoGPT#7 (and the accompanying blog post https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/) that should be straightforward to adapt.

certik added a commit that referenced this issue Mar 14, 2023