Phase 2: Janky implementation -> minimum working implementation
Note: At this stage, the class should work for LLM integration tests but may fail sglang integration tests. Stephen's involvement will be crucial for hardening it against concurrency issues and optimizing its performance benefits.
Concurrent access testing
Multi-request batch testing with shared prefixes
Verify error-free execution of requests
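A minimal sketch of what a concurrent-access test could look like. `ToyLockedPool` and `hammer` are hypothetical names used only for illustration; they are not the project's classes. The idea is to have several threads acquire and release pages and check that no page is ever handed to two holders at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyLockedPool:
    """Hypothetical page pool whose free list is guarded by a lock."""

    def __init__(self, num_pages):
        self._free = list(range(num_pages))
        self._lock = threading.Lock()

    def acquire(self):
        # Returns a page index, or None when the pool is empty.
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, page):
        with self._lock:
            self._free.append(page)


def hammer(pool, iterations, workers=8):
    """Acquire and release pages from many threads; return pages seen held twice."""
    double_held = []
    in_use = set()
    guard = threading.Lock()

    def worker():
        for _ in range(iterations):
            page = pool.acquire()
            if page is None:
                continue
            with guard:
                if page in in_use:
                    double_held.append(page)
                in_use.add(page)
            with guard:
                in_use.discard(page)
            pool.release(page)

    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(worker) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return double_held
```

A real test would drive the actual cache through batched requests rather than a toy pool, but the assertion has the same shape: no exceptions, no page held twice, and every page returned at the end.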
Same-batch common-prefix related tests
Demonstrate that same-batch inputs with common prefixes do not crash the system or produce bad tokens.
Demonstrate that same-batch inputs with common prefixes cannot share cache pages and therefore have no performance advantage over same-batch inputs without common prefixes. Do so by implementing a test that expects improved performance and marking it xfail.
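The expected-failure pattern described above could be sketched as follows. This uses the standard library's `unittest.expectedFailure` so the example runs without extra dependencies; with pytest the equivalent marker is `@pytest.mark.xfail(reason=...)`. `ToyNonSharingCache`, `PAGE_SIZE`, and `pages_for_batch` are hypothetical, and page count stands in for performance as an assumption.

```python
import unittest

PAGE_SIZE = 4  # tokens per page; hypothetical value for illustration


class ToyNonSharingCache:
    """Hypothetical cache that never shares pages between requests."""

    def allocate(self, tokens):
        # Every request gets its own pages, even for a prefix seen before.
        return -(-len(tokens) // PAGE_SIZE)  # ceiling division


def pages_for_batch(cache, batch):
    """Total pages allocated for all requests in one batch."""
    return sum(cache.allocate(tokens) for tokens in batch)


class SameBatchPrefixTest(unittest.TestCase):
    @unittest.expectedFailure
    def test_common_prefix_uses_fewer_pages(self):
        prefix = list(range(8))
        shared = [prefix + [100], prefix + [200]]
        distinct = [list(range(9)), list(range(50, 59))]
        # Expected to fail until same-batch prefix sharing exists.
        self.assertLess(
            pages_for_batch(ToyNonSharingCache(), shared),
            pages_for_batch(ToyNonSharingCache(), distinct),
        )
```

Once sharing within a batch works, the test starts passing, the runner reports an unexpected success, and the marker can be removed.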
Performance comparison testing against BaseAttentionCache. The tests should expect a substantial performance improvement over BaseAttentionCache. If the improvement does not appear, mark the test xfail and figure out why.
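To illustrate why a block trie should beat a cache without reuse, here is a toy comparison that counts how many pages each approach has to fill for a sequence of requests sharing a prefix. `ToyBaseCache`, `ToyTrieCache`, `to_blocks`, and `PAGE_SIZE` are hypothetical and are not the project's `BaseAttentionCache` or its trie; counting pages instead of measuring wall-clock time is an assumption that keeps the example deterministic.

```python
PAGE_SIZE = 4  # tokens per page; hypothetical value for illustration


def to_blocks(tokens):
    """Split tokens into full page-sized tuples; a trailing partial page is dropped."""
    full = len(tokens) - len(tokens) % PAGE_SIZE
    return [tuple(tokens[i:i + PAGE_SIZE]) for i in range(0, full, PAGE_SIZE)]


class ToyBaseCache:
    """No reuse: every full page of every request must be filled."""

    def pages_to_fill(self, tokens):
        return len(to_blocks(tokens))


class ToyTrieCache:
    """Trie keyed by page-sized token blocks; known prefixes cost nothing."""

    def __init__(self):
        self.root = {}

    def pages_to_fill(self, tokens):
        node = self.root
        new_pages = 0
        for block in to_blocks(tokens):
            if block not in node:
                node[block] = {}  # first time this block follows this prefix
                new_pages += 1
            node = node[block]
        return new_pages


def total_pages(cache, requests):
    """Pages filled when the requests are served one after another."""
    return sum(cache.pages_to_fill(request) for request in requests)
```

A real benchmark would measure latency or throughput on the actual caches; the page count here only shows the shape of the expected advantage.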
Creates space for #593 (prefix-sharing)
Coming next: #607, which should be the last thing I do before I can check in my blocktrie implementation.
Summary of changes:
- copied over Stella's cache.py and renamed it to page_pool.py
- each inference request now notifies the cache when its pages have been fully written
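The notification flow in the summary could look roughly like this. It is a sketch only: `ToyPagePool`, `ToyCache`, `ToyRequest`, and `on_pages_written` are hypothetical names and do not reflect the actual contents of page_pool.py.

```python
class ToyPagePool:
    """Hypothetical fixed-size pool of page indices."""

    def __init__(self, num_pages):
        self._free = list(range(num_pages))

    def acquire(self, count):
        if count > len(self._free):
            raise RuntimeError("page pool exhausted")
        taken, self._free = self._free[:count], self._free[count:]
        return taken

    def release(self, pages):
        self._free.extend(pages)

    @property
    def free_count(self):
        return len(self._free)


class ToyCache:
    """Hypothetical cache that tracks which pages hold complete data."""

    def __init__(self, pool):
        self.pool = pool
        self.readable = set()

    def on_pages_written(self, pages):
        # Called by a request once its pages are fully written.
        self.readable.update(pages)


class ToyRequest:
    """Hypothetical inference request that owns pages while it runs."""

    def __init__(self, cache, num_pages):
        self.cache = cache
        self.pages = cache.pool.acquire(num_pages)

    def finish_writing(self):
        # Tell the cache the pages are safe for others to read.
        self.cache.on_pages_written(self.pages)
```

Separating "allocated" from "written" matters for prefix sharing, since another request should only reuse a page after its contents are complete.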
Block Trie Attention Implementation Plan
Project Goal
Implement a KV cache that:
Implementation Tasks
Phase 1: Preparation
Phase 2: Janky implementation -> minimum working implementation
Phase 3: Benchmarking / polishing
Reference Implementations
Priority Note
Focus on achieving minimum viable solution to facilitate Stephen's involvement in the project.