Add DeepSeek V2 config #1293
Conversation
Hi @gagika, I ran into an error running the v2-lite config, since q_lora_rank=0. It shows the embedding dim is different from H (from here): embedding_dim=64, while the input's last dim is 0. Full logs. Did I miss anything here? If this is expected in the current code, we may need a workaround; otherwise this branch is effectively unusable.
Hi Ran, that branch wasn't tested and there is a bug; could you please change the
Please add the fix in your PR, or I can send a PR with that one-line fix, if you prefer that way.
Cool! Let me add this one-line change here.
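For context, the sketch below (not MaxText's actual code) shows one way the DeepSeek-V2 query projection can branch on q_lora_rank so that q_lora_rank=0 falls back to a plain full-rank projection instead of routing through a zero-width low-rank tensor. The module and layer names (MLAQueryProjection, wq, wq_a, q_norm, wq_b) are illustrative assumptions.

```python
# Minimal sketch of a q_lora_rank-aware query projection for DeepSeek-V2 MLA.
# This is NOT MaxText's implementation; names and shapes are hypothetical.
import jax
import jax.numpy as jnp
import flax.linen as nn


class MLAQueryProjection(nn.Module):
    """Query projection supporting both full-rank and low-rank (LoRA-style) paths."""
    num_heads: int
    qk_head_dim: int
    q_lora_rank: int = 0  # 0 disables the low-rank query path (V2-Lite style).

    @nn.compact
    def __call__(self, hidden_states: jnp.ndarray) -> jnp.ndarray:
        out_features = self.num_heads * self.qk_head_dim
        if self.q_lora_rank == 0:
            # V2-Lite: project queries directly from the hidden states, so no
            # zero-width intermediate (and no norm over a 0-dim input) is created.
            q = nn.Dense(out_features, use_bias=False, name="wq")(hidden_states)
        else:
            # V2-236B: down-project to q_lora_rank, normalize, then up-project.
            q = nn.Dense(self.q_lora_rank, use_bias=False, name="wq_a")(hidden_states)
            q = nn.RMSNorm(name="q_norm")(q)
            q = nn.Dense(out_features, use_bias=False, name="wq_b")(q)
        # Split the last dimension into (num_heads, qk_head_dim).
        return q.reshape(q.shape[:-1] + (self.num_heads, self.qk_head_dim))


if __name__ == "__main__":
    x = jnp.ones((2, 8, 64))  # (batch, seq, model_dim); toy sizes only
    for rank in (0, 16):
        mod = MLAQueryProjection(num_heads=4, qk_head_dim=16, q_lora_rank=rank)
        params = mod.init(jax.random.PRNGKey(0), x)
        print(rank, mod.apply(params, x).shape)  # (2, 8, 4, 16) for both paths
```

The point of the branch is simply that a rank of 0 should select the dense path rather than instantiate 0-width projection and norm layers, which matches the embedding_dim=64 vs. last-dim=0 mismatch reported above.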
Force-pushed from c99d1bc to e384138
Force-pushed from e384138 to a1ac837
Both tests are working now: V2-Lite 16B with q_lora_rank=0, and V2-236B with q_lora_rank=1536.
SGTM, Thanks Ran!
Description
Add DeepSeek V2 config for fast development:
Tests
V2-Lite 16B (q_lora_rank=0):
Small config of V2-236B (q_lora_rank=1536); see the hedged config sketch after this list:
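For reference, below is a purely illustrative sketch of the two tested configurations expressed as Python dicts. Only the q_lora_rank values (0 for V2-Lite 16B, 1536 for the small V2-236B config) come from the tests above; every other key here is a hypothetical placeholder, not MaxText's real config schema.

```python
# Hypothetical config overrides for the two tested variants; only the
# q_lora_rank values are taken from this PR's test results.
DEEPSEEK_V2_LITE_16B = {
    "q_lora_rank": 0,       # full-rank query projection (low-rank path disabled)
    "kv_lora_rank": None,   # placeholder: take from the released checkpoint config
}

DEEPSEEK_V2_236B_SMALL = {
    "q_lora_rank": 1536,    # low-rank query path enabled
    "kv_lora_rank": None,   # placeholder: take from the released checkpoint config
}
```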
Checklist
Before submitting this PR, please make sure (put X in square brackets):