
Replace AttnPagedCache with BasePagedAttentionCache #565

Merged · 23 commits · Nov 26, 2024
Conversation

renxida (Contributor) commented Nov 18, 2024

Creates space for #593 (prefix-sharing)

Coming next: #607 , which should be the last thing I do before I can check in my blocktrie implementation.

Summary of changes:

  • copied stella's cache.py over and renamed it to page_pool.py
  • each inference request now notifies the cache when its pages are done being written to
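The changes above can be sketched roughly as follows. This is a hypothetical illustration of the indirection the PR describes, not the actual shortfin API: a PagePool owning raw KV-cache pages, wrapped by a BasePagedAttentionCache whose publish hook lets a request signal that its pages are fully written (the hook a prefix-sharing subclass like the planned block trie could override). All class and method names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """One KV-cache page; names are illustrative, not the real API."""
    index: int                    # position of the page in the backing pool
    writes_complete: bool = False


class PagePool:
    """Owns the fixed set of KV-cache pages (the renamed page_pool.py role)."""

    def __init__(self, page_count: int):
        self._free = [PageInfo(i) for i in range(page_count)]

    def acquire(self, count: int) -> list[PageInfo]:
        if count > len(self._free):
            raise RuntimeError("page pool exhausted")
        pages, self._free = self._free[:count], self._free[count:]
        return pages

    def release(self, pages: list[PageInfo]) -> None:
        for p in pages:
            p.writes_complete = False
        self._free.extend(pages)


class BasePagedAttentionCache:
    """Level of indirection over PagePool; a prefix-sharing subclass could
    override publish_pages/release_pages to populate a block trie."""

    def __init__(self, pool: PagePool):
        self.pool = pool

    def acquire_pages(self, count: int) -> list[PageInfo]:
        return self.pool.acquire(count)

    def publish_pages(self, pages: list[PageInfo]) -> None:
        # Called by an inference request once its KV writes to these
        # pages are complete.
        for p in pages:
            p.writes_complete = True

    def release_pages(self, pages: list[PageInfo]) -> None:
        self.pool.release(pages)


cache = BasePagedAttentionCache(PagePool(page_count=8))
pages = cache.acquire_pages(2)
cache.publish_pages(pages)   # request signals: pages done being written to
assert all(p.writes_complete for p in pages)
cache.release_pages(pages)
```

The point of the extra layer is that request code talks only to the cache, so swapping in a prefix-sharing implementation later changes the cache subclass, not every caller.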

@renxida renxida changed the title Early stage draft for radix attention Add a few levels of indirection to prepare for prefix kv sharing Nov 22, 2024
@renxida renxida changed the title Add a few levels of indirection to prepare for prefix kv sharing Replace AttnPagedCache with BasePagedAttentionCache Nov 22, 2024
@renxida renxida merged commit ddc3091 into nod-ai:main Nov 26, 2024
14 of 19 checks passed
2 participants