
Replace AttnPagedCache with BasePagedAttentionCache #565

Merged · 23 commits · Nov 26, 2024
Conversation

renxida (Contributor) commented Nov 18, 2024

Creates space for #593 (prefix-sharing)

Coming next: #607 , which should be the last thing I do before I can check in my blocktrie implementation.

Summary of changes:

  • copied stella's cache.py over and renamed it to page_pool.py
  • each inference request now notifies the cache when its pages are done being written to
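The changes above can be sketched roughly as follows. This is a hypothetical illustration of the indirection the PR describes, not the actual shortfin API: a PagePool owning raw KV-cache pages, wrapped by a BasePagedAttentionCache whose publish hook lets a request signal that its pages are fully written (the hook a prefix-sharing subclass like the planned block trie could override). All class and method names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """One KV-cache page; names are illustrative, not the real API."""
    index: int                    # position of the page in the backing pool
    writes_complete: bool = False


class PagePool:
    """Owns the fixed set of KV-cache pages (the renamed page_pool.py role)."""

    def __init__(self, page_count: int):
        self._free = [PageInfo(i) for i in range(page_count)]

    def acquire(self, count: int) -> list[PageInfo]:
        if count > len(self._free):
            raise RuntimeError("page pool exhausted")
        pages, self._free = self._free[:count], self._free[count:]
        return pages

    def release(self, pages: list[PageInfo]) -> None:
        for p in pages:
            p.writes_complete = False
        self._free.extend(pages)


class BasePagedAttentionCache:
    """Level of indirection over PagePool; a prefix-sharing subclass could
    override publish_pages/release_pages to populate a block trie."""

    def __init__(self, pool: PagePool):
        self.pool = pool

    def acquire_pages(self, count: int) -> list[PageInfo]:
        return self.pool.acquire(count)

    def publish_pages(self, pages: list[PageInfo]) -> None:
        # Called by an inference request once its KV writes to these
        # pages are complete.
        for p in pages:
            p.writes_complete = True

    def release_pages(self, pages: list[PageInfo]) -> None:
        self.pool.release(pages)


cache = BasePagedAttentionCache(PagePool(page_count=8))
pages = cache.acquire_pages(2)
cache.publish_pages(pages)   # request signals: pages done being written to
assert all(p.writes_complete for p in pages)
cache.release_pages(pages)
```

The point of the extra layer is that request code talks only to the cache, so swapping in a prefix-sharing implementation later changes the cache subclass, not every caller.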

@renxida renxida changed the title Early stage draft for radix attention Add a few levels of indirection to prepare for prefix kv sharing Nov 22, 2024
@renxida renxida changed the title Add a few levels of indirection to prepare for prefix kv sharing Replace AttnPagedCache with BasePagedAttentionCache Nov 22, 2024
@renxida renxida merged commit ddc3091 into nod-ai:main Nov 26, 2024
14 of 19 checks passed
2 participants