Hello, and thank you for sharing the code! Sorry to bother you. I have two questions that I hope the authors can clear up for me:

1. In pyramidkv_utils.py, the formula for computing steps is steps = (max_num - min_num) // self.num_hidden_layers. By the sum formula for an arithmetic progression, shouldn't the divisor be self.num_hidden_layers - 1? Or am I overlooking something?
2. I noticed that the computation of steps uses integer division, which introduces rounding error. Using llama3-8b-instruct with max_capacity_prompt set to 128, and leaving aside the size-8 local window kept in every layer, PyramidKV uses a KV cache of total size 4016 across its 32 layers, whereas the other three methods use 120 * 32 = 3840. If so, how can we be sure that the performance gain comes from the PyramidKV method itself rather than from the extra 176 KV-cache entries? (The sketch below reproduces these numbers.)
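As a rough check of the arithmetic above, here is a minimal sketch. The beta = 20 default and the min_num/max_num formulas are assumptions based on pyramidkv_utils.py as described in this thread, not confirmed here:

```python
# Minimal sketch of the per-layer KV-cache budget arithmetic discussed above.
# ASSUMPTIONS (not stated in this thread): min_num and max_num are derived
# from max_capacity_prompt with a default beta = 20, mirroring pyramidkv_utils.py.

num_hidden_layers = 32        # llama3-8b-instruct
max_capacity_prompt = 128
window_size = 8               # local window retained in every layer

beta = 20                                                    # assumed default
min_num = (max_capacity_prompt - window_size) // beta        # 6
max_num = (max_capacity_prompt - window_size) * 2 - min_num  # 234

# The formula in question: integer division by num_hidden_layers.
steps = (max_num - min_num) // num_hidden_layers             # 228 // 32 = 7

# Per-layer budget (excluding the local window): max_num - layer_idx * steps.
pyramid_total = sum(max_num - layer * steps for layer in range(num_hidden_layers))
flat_total = (max_capacity_prompt - window_size) * num_hidden_layers

print(pyramid_total)               # 4016 entries across 32 layers
print(flat_total)                  # 120 * 32 = 3840 for the flat baselines
print(pyramid_total - flat_total)  # 176 extra entries from the rounding
```

Under these assumptions the sketch reproduces both totals quoted in the question.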
Thanks for pointing this out!

It should indeed divide by self.num_hidden_layers - 1. This change has no effect on the final max_capacity_prompt of any layer, because for these values integer division by self.num_hidden_layers and by self.num_hidden_layers - 1 yields the same result; the code has been corrected. PyramidKV does inevitably incur some error from the integer division, but it is small relative to the total size of the KV cache. We also ran an experiment with the per-layer KV cache reduced by 5 after the integer division, and PyramidKV still holds its advantage. We will update the reported results accordingly.
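For these particular values the two divisors do give the same step, which a quick self-contained check (reusing the assumed defaults from the sketch above) confirms:

```python
# Quick check that the divisor change does not alter the step for these values.
# max_num = 234 and min_num = 6 follow from the assumed defaults in the sketch
# above (max_capacity_prompt = 128, window_size = 8, beta = 20).
num_hidden_layers = 32
max_num, min_num = 234, 6

steps_old = (max_num - min_num) // num_hidden_layers         # 228 // 32 = 7
steps_new = (max_num - min_num) // (num_hidden_layers - 1)   # 228 // 31 = 7
assert steps_old == steps_new  # same step, so every layer's budget is unchanged
```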
Got it, thanks for the quick reply!