
How to eliminate the error introduced by integer division? #22

Open

CN-LWZ opened this issue Sep 26, 2024 · 2 comments

Comments


CN-LWZ commented Sep 26, 2024

Hello, and thank you for sharing the code! Sorry to bother you.
I have two questions I hope the authors can help me clear up:

1. In pyramidkv_utils.py, steps are computed as `steps = (max_num - min_num) // self.num_hidden_layers`. According to the arithmetic-series sum formula, shouldn't this divide by `self.num_hidden_layers - 1`? Or have I misunderstood something?

2. I noticed that the steps computation uses integer division, which introduces an error. With llama3-8b-instruct and `max_capacity_prompt` set to 128, aside from the size-8 local window kept at every layer, PyramidKV uses a total KV cache of 4016 across its 32 layers, while the other three methods use 120*32 = 3840. If that is the case, how can we be sure the performance gain comes from the PyramidKV method itself rather than from the extra 176 KV cache entries? (The arithmetic is sketched below.)
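Below is a minimal sketch of the arithmetic I used to get these numbers; the value of `beta` and the `min_num`/`max_num` formulas are my assumptions from reading pyramidkv_utils.py, not quoted from the repository:

```python
# Sketch of the per-layer KV cache budget arithmetic (assumptions noted above).
num_hidden_layers = 32        # llama3-8b-instruct
max_capacity_prompt = 128
window_size = 8
beta = 20                     # assumed default in pyramidkv_utils.py

budget = max_capacity_prompt - window_size            # 120 per layer for the uniform methods
min_num = budget // beta                               # 6
max_num = budget * 2 - min_num                         # 234
steps = (max_num - min_num) // num_hidden_layers       # 228 // 32 = 7 (integer division)

# Per-layer budgets form a decreasing arithmetic sequence starting from max_num.
per_layer = [max_num - layer_idx * steps for layer_idx in range(num_hidden_layers)]

print(sum(per_layer))                  # 4016 -> PyramidKV total, excluding local windows
print(budget * num_hidden_layers)      # 3840 -> total used by the other three methods
```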

Owner

Zefan-Cai commented Sep 30, 2024

Thanks for pointing this out!

It should indeed divide by `self.num_hidden_layers - 1`. This step does not change the final `max_capacity_prompt` of each layer, because integer division by `self.num_hidden_layers` and by `self.num_hidden_layers - 1` gives the same result; this has been fixed. PyramidKV does inevitably carry some error from the integer division, but this error is small relative to the total amount of KV cache. We have also run experiments with the integer-division budget reduced by 5, and PyramidKV still holds its advantage. We will update this result later.
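For the configuration discussed above (using the same assumed values as in the sketch in the question), a quick check shows both divisors give the same step size:

```python
max_num, min_num, num_hidden_layers = 234, 6, 32       # assumed values from the sketch above
print((max_num - min_num) // num_hidden_layers)        # 228 // 32 = 7
print((max_num - min_num) // (num_hidden_layers - 1))  # 228 // 31 = 7 -> same per-layer budgets
```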

Author

CN-LWZ commented Oct 1, 2024

Got it, thank you for the prompt reply!
