A serious issue in your code #15
Comments
Thank you so much for pointing this out! This is possibly a bug related to the Transformers version. I recall encountering this issue before, but I don't remember exactly which Transformers version it was. Would you mind changing the Transformers version to 4.41? This is expected to solve the issue. If it works, the `key_states.shape` should be as follows for all the samples:

If you are already using Transformers 4.41, please let me know and I will try to figure it out.
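For reference, a minimal sanity check along these lines could catch the mismatch early. This snippet is not part of the repository; it is only a sketch assuming the 4.41 requirement mentioned above:

```python
# Illustrative sanity check, not part of the PyramidKV repository: fail fast if
# the installed Transformers version is not the 4.41 release suggested above,
# since other versions can skip the compression path without any error message.
import transformers

if not transformers.__version__.startswith("4.41"):
    raise RuntimeError(
        f"Found transformers {transformers.__version__}; this setup expects 4.41 "
        "(e.g. `pip install transformers==4.41.0`)."
    )
```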
Thank you for your timely response. After updating the Transformers version to 4.41, the issue has been resolved. I suggest you consider advising users about this issue, as it doesn't produce any error messages and could easily lead to confusing results.
Thank you so much for pointing this out. We will add a note advising users about this, as you suggest. This is indeed confusing.
I have a question about where `kv_seq_len` is initialized.
It is initialized at https://github.com/Zefan-Cai/PyramidKV/blob/73c08b1dc1104b2d614c0670478d297a7a4df8c1/pyramidkv/llama_model.py#L1382. This function replaces the original input-preparation function via a monkey patch, and it takes care of the `kv_seq_len` initialization. Without this function, you would run into the situation you mentioned, so this is probably not caused by the Transformers version.
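For readers following along, here is a rough sketch of the monkey-patch idea described in that comment. The function name and the per-layer `kv_seq_len` reset are illustrative assumptions, not the exact PyramidKV implementation at the linked line:

```python
# Illustrative sketch of the monkey-patch idea (names are simplified, not the
# exact PyramidKV code): swap the model's original input-preparation method for
# one that resets kv_seq_len whenever a new sample starts, so the KV-compression
# logic runs again for every prompt rather than only the first one.
from transformers.models.llama.modeling_llama import LlamaForCausalLM

_original_prepare = LlamaForCausalLM.prepare_inputs_for_generation

def prepare_inputs_with_kv_reset(self, input_ids, past_key_values=None, **kwargs):
    if past_key_values is None:
        # No cache yet means a new sample: reset the per-layer counter so the
        # prefill step treats this prompt as fresh and compresses its KV cache.
        for layer in self.model.layers:
            layer.self_attn.kv_seq_len = 0
    return _original_prepare(self, input_ids, past_key_values=past_key_values, **kwargs)

# Apply the patch on the class so every generate() call goes through it.
LlamaForCausalLM.prepare_inputs_for_generation = prepare_inputs_with_kv_reset
```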
Oh! Thank you for your timely reply. That was my carelessness. :)
There seems to be a serious issue in `run_longbench.py`. `update_kv` is only called during the first sample in LongBench, therefore the statement `print(f"PyramidKV max_capacity_prompt {max_capacity_prompt}")` only outputs during the first sample. This implies that only the first sample uses PyramidKV. Below are the `key_states.shape` printed for the first sample:

Starting from the second sample, `update_kv` is no longer executed, and the `key_states.shape` is as follows:

This implies that starting from the second sample, all attention is full attention and there is no compression of the KV cache at all. Looking forward to your response.
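One way to confirm this behaviour is to log the cached key length per sample. The loop below is a hypothetical diagnostic, not the actual `run_longbench.py` code; the model name and the `prompts` list are placeholders standing in for the real LongBench setup:

```python
# Hypothetical diagnostic, not part of run_longbench.py: print the cached key
# length after each sample. If compression is active, the cached length should
# stay near max_capacity_prompt for every sample; if it matches the full prompt
# length from the second sample onward, update_kv is only applied to the first
# sample, as described above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
prompts = ["first sample prompt ...", "second sample prompt ..."]  # placeholder inputs

for i, prompt in enumerate(prompts):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, return_dict_in_generate=True)
    # past_key_values[layer][0] is that layer's key cache with shape
    # (batch, num_kv_heads, cached_seq_len, head_dim).
    cached_len = out.past_key_values[0][0].shape[2]
    print(f"sample {i}: prompt_len={inputs.input_ids.shape[1]}, cached_key_len={cached_len}")
```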