Initial KV RingAttention code #684
Open: joshpopelka20 wants to merge 109 commits into EricLBuehler:master from joshpopelka20:master
+1,001
−27
Commits
b186a77
test minimal changes
joshpopelka20 152a41c
add to struct
joshpopelka20 636a48f
Merge branch 'EricLBuehler:master' into master
joshpopelka20 3e1f47b
add chunks logic
joshpopelka20 bba0fdd
Merge branch 'master' of https://github.com/joshpopelka20/mistral.rs
joshpopelka20 04ee7de
clone chunks
joshpopelka20 83ab8f2
clone x for chunk
joshpopelka20 36d5eb2
remove chunk to device
joshpopelka20 c1a973a
push x
joshpopelka20 e5aee16
fix x move
joshpopelka20 7aeb80c
dont clone chunks
joshpopelka20 00f019f
unwrap chunk
joshpopelka20 4b836bc
change to reference
joshpopelka20 3763fee
iter
joshpopelka20 32d3b0d
pop chunks
joshpopelka20 40a502b
clone x
joshpopelka20 c0c87e4
change to vec new
joshpopelka20 728e838
store tensor reference
joshpopelka20 daf1028
extract by index
joshpopelka20 ccc6c50
remove unwrap
joshpopelka20 2ca9acc
clone
joshpopelka20 e822953
mutably borrow
joshpopelka20 086e76f
derefernce
joshpopelka20 ce3d418
create vec of tensors
joshpopelka20 ffbe3f9
make new vec
joshpopelka20 9f91594
type tensor
joshpopelka20 c88cdf5
push to chunks
joshpopelka20 0eb85f5
self chunks
joshpopelka20 b7edfbe
create vec of chunks
joshpopelka20 8732bbc
clone x
joshpopelka20 9c961f3
remove reference
joshpopelka20 1aaca72
clone for move
joshpopelka20 4ab98c8
remove clone
joshpopelka20 2e0b2fd
add back clone
joshpopelka20 7bb3cf7
change to copy
joshpopelka20 f433517
unwrap copy
joshpopelka20 cf5b204
remove copy
joshpopelka20 9e0e6c8
use my candle
joshpopelka20 03be02a
mvoe back to EricLBuehler
joshpopelka20 d75ee88
move back to josh
joshpopelka20 f79ef6f
revert candle
joshpopelka20 4b9ed28
remove copy mapper
joshpopelka20 b54a5af
clone chunks
joshpopelka20 a50883b
copy instead of clone
joshpopelka20 edf82da
move loggers
joshpopelka20 3e9cc26
add sequence parallelism
joshpopelka20 30f6b40
add IndexOp import
joshpopelka20 7e23976
only use chunk on first block index
joshpopelka20 f20005a
split input into multiple chunks
joshpopelka20 9d0b6ce
add missing variable block_chunks
joshpopelka20 0c6a64c
use each chunk first
joshpopelka20 86e1e54
clone x in accumulated attention
joshpopelka20 535e5c7
change mapper with block_chunks
joshpopelka20 8d55784
give block chunks a type
joshpopelka20 f738105
make as type tensor
joshpopelka20 4addbb5
move block chunks
joshpopelka20 d6ffb10
add to accumulated attention
joshpopelka20 61b9b8a
unwrap x
joshpopelka20 c1cc882
&tensor
joshpopelka20 f87ead1
fix block_chunks
joshpopelka20 f665810
make generic type
joshpopelka20 8140413
fix blocks_chunks to device
joshpopelka20 23af80c
another fix for concat block_chunks
joshpopelka20 dd689e3
remove ? operator
joshpopelka20 2106933
replace with try_collect
joshpopelka20 71fdd71
change type of block_chunks
joshpopelka20 0b129fa
clone to move blcok_chunks
joshpopelka20 c09b459
remove add
joshpopelka20 d201134
switch to four devices
joshpopelka20 c5b4fde
fix compile error with &
joshpopelka20 79f7606
uodate metadata device
joshpopelka20 f50a159
add kv cache rotation
joshpopelka20 b913fee
add missing num_caches
joshpopelka20 0a7b422
fix compile error
joshpopelka20 c98dcb7
clone mapper
joshpopelka20 962f744
remove clone
joshpopelka20 7cd3503
clone reference
joshpopelka20 ea04012
return tensor
joshpopelka20 a57e1c9
remove borrow
joshpopelka20 b69edcc
fix value moved
joshpopelka20 7cfb29d
borrow on accumulate
joshpopelka20 da65eb2
add logging
joshpopelka20 9c5cd38
more logging
joshpopelka20 cdd480d
fix chunk to device chunk
joshpopelka20 4eb4775
remove concat block_chunks
joshpopelka20 ee27e98
move cache to chunk device
joshpopelka20 57ae1d8
fix error in masker
joshpopelka20 34bf2d1
move all to block device
joshpopelka20 a20d7a4
change to block device
joshpopelka20 042c0a1
change block device args
joshpopelka20 770806a
add device to block
joshpopelka20 0b3c911
fix llama struct
joshpopelka20 bcf6f84
revert blocks device
joshpopelka20 9945a8d
revert to device chunk
joshpopelka20 8d0bc24
add block device
joshpopelka20 06525fe
add reference
joshpopelka20 a03670d
update tensor device
joshpopelka20 deabd31
borrow device chunk
joshpopelka20 3170304
more logging
joshpopelka20 1e3e55d
log logits
joshpopelka20 4fab476
try to clone out all caches
joshpopelka20 5863802
add logging in cacher
joshpopelka20 ddcd848
revert clone out cache
joshpopelka20 d9ac7ec
skip clone out
joshpopelka20 81cd584
have cache out do nothing
joshpopelka20 1456c72
fix syntax
joshpopelka20 d52cdd8
remove clone in cache
joshpopelka20 bf80940
remove loggers
joshpopelka20 a4dcd1e
test speculative
joshpopelka20
I think this is the part that is causing the issue. It looks like we aren't using the output of the previous block, because the input to every block is always taken from the embeddings.
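To make the concern concrete, here is a minimal Rust sketch of the two patterns. It is not the code from this PR: `Block`, `forward_from_embeddings`, and `forward_chained` are hypothetical names, and a plain `Vec<f32>` with a per-block bias stands in for a real tensor and a real transformer block.

```rust
/// Hypothetical stand-in for a transformer block: it only adds a bias,
/// which is enough to see whether block outputs are being chained.
struct Block {
    bias: f32,
}

impl Block {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|v| v + self.bias).collect()
    }
}

/// The suspected pattern: every block reads the embeddings, so the work
/// done by all blocks except the final one is thrown away.
fn forward_from_embeddings(blocks: &[Block], embeddings: &[f32]) -> Vec<f32> {
    let mut out = embeddings.to_vec();
    for block in blocks {
        out = block.forward(embeddings);
    }
    out
}

/// The chained pattern: each block consumes the previous block's output.
fn forward_chained(blocks: &[Block], embeddings: &[f32]) -> Vec<f32> {
    let mut x = embeddings.to_vec();
    for block in blocks {
        x = block.forward(&x);
    }
    x
}
```

With two blocks whose biases are 1.0 and 2.0, the first function only reflects the second block, while the chained version reflects both.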
So that's going to be a problem. From the algorithm in the paper, I need to use the local block on each host. So I need to iterate through the layers and then through the local blocks.
Any suggestions on how to redesign it?
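For discussion, one possible shape for that loop order is sketched below: layers on the outside, local blocks (one sequence chunk per host) on the inside, with each host visiting the KV blocks in ring order. This is a toy under loud assumptions, not RingAttention itself and not the mistral.rs or candle API: `ring_forward` is a hypothetical name, chunks are plain `Vec<f32>`, and the "attention" is replaced by adding the sum of every KV block so the data flow is easy to follow.

```rust
/// Toy sketch of "iterate through the layers, then through the local blocks".
/// `chunks[h]` is the sequence chunk held by host `h` (an assumption made
/// for illustration; a real implementation would hold tensors per device).
fn ring_forward(mut chunks: Vec<Vec<f32>>, num_layers: usize) -> Vec<Vec<f32>> {
    let n = chunks.len();
    for _layer in 0..num_layers {
        // Snapshot of this layer's inputs, reduced to one number per chunk.
        // This stands in for the K/V blocks that would travel around the ring.
        let kv: Vec<f32> = chunks.iter().map(|c| c.iter().sum::<f32>()).collect();

        let mut next: Vec<Vec<f32>> = Vec::with_capacity(n);
        for (host, chunk) in chunks.iter().enumerate() {
            // Each host starts from its own KV block and then receives the
            // others one ring step at a time, accumulating as it goes.
            let mut acc = 0.0f32;
            for step in 0..n {
                acc += kv[(host + step) % n];
            }
            next.push(chunk.iter().map(|v| v + acc).collect());
        }

        // The outputs of this layer become the inputs of the next one,
        // so no layer ever reads the original embeddings again.
        chunks = next;
    }
    chunks
}
```

The point of the sketch is only the nesting: the per-host chunk list is carried from layer to layer, rather than being rebuilt from the embeddings for each block.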