
Only the first 4k and last 4k tokens of the input are kept, but LongBench inputs can reach 32k tokens? #18

Open
JulietLJY opened this issue Jul 18, 2024 · 1 comment

Comments

@JulietLJY

Why is it implemented this way in the code? The direct consequence is that the evaluation is no longer long-context reading, but reading over only model_max_length tokens. If the answer lies in the middle of the document, even full attention cannot get it right, because that span is discarded. And if the answer lies near the beginning or end, the truncation effectively denoises the input by removing the middle. In either case the results deviate from the true long-context setting.

@Zefan-Cai
Owner

Only LLaMA-3 is handled this way, because LLaMA-3 has only an 8k context length; Mistral uses the full 32k. The code truncates the middle because keeping both the head and the tail is more reasonable than truncating only the front or only the back.
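For reference, the middle-truncation strategy discussed above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `truncate_middle` and the concrete lengths are assumptions, but the behavior (keep the first and last `max_length // 2` token ids, drop the middle) matches the LongBench-style truncation described in this thread.

```python
def truncate_middle(token_ids, max_length):
    """Keep the first and last max_length // 2 tokens, dropping the middle.

    Illustrative sketch of LongBench-style truncation: when an input
    exceeds the model's context window, the head and tail are preserved
    because the task instruction and question usually sit there.
    """
    if len(token_ids) <= max_length:
        return token_ids
    half = max_length // 2
    return token_ids[:half] + token_ids[-half:]

# A 32k-token input truncated to an 8k window keeps 4k head + 4k tail.
ids = list(range(32000))
out = truncate_middle(ids, 8000)
print(len(out))           # 8000
print(out[:2], out[-2:])  # [0, 1] [31998, 31999]
```

As the issue points out, any answer located in the dropped middle span (token positions 4000 to 27999 here) becomes unrecoverable regardless of the attention mechanism used.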
