Add starcode past-kv shape for TSModelForCausal class #371
The documentation is not available anymore as the PR was closed or merged.
Thanks for the addition !
@@ -273,13 +273,17 @@ def forward(
num_attention_heads = self.normalized_config.num_attention_heads
hidden_size = self.normalized_config.hidden_size
d_k = hidden_size // num_attention_heads

if self.config.model_type != "bloom":
if self.config.model_type == "gpt_bigcode":
Could you add a test to verify that inference behaves as expected? (Using INCModelForCausalLM can be enough for now.)
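For readers following the diff above, here is a minimal sketch of how the shape of an empty (zero-length) past key/value cache could depend on `model_type`. It is not the code from this PR. The function name `empty_past_kv_shapes` is hypothetical, and the layouts are assumptions: `gpt_bigcode` is assumed to use multi-query attention with key and value fused into one tensor per layer, and `bloom` is assumed to use its legacy layout with batch and heads merged.

```python
def empty_past_kv_shapes(model_type, batch_size, num_attention_heads,
                         hidden_size, num_layers):
    """Return, for each layer, the shapes of zero-length past key/value tensors.

    Hypothetical helper for illustration only; the layouts are assumptions.
    """
    d_k = hidden_size // num_attention_heads  # per-head dimension, as in the diff

    if model_type == "gpt_bigcode":
        # Assumption: multi-query attention, key and value fused into a single
        # tensor per layer with last dimension 2 * d_k.
        layer = ((batch_size, 0, 2 * d_k),)
    elif model_type == "bloom":
        # Assumption: legacy bloom layout, batch and heads merged;
        # key is (B * H, d_k, seq), value is (B * H, seq, d_k).
        layer = (
            (batch_size * num_attention_heads, d_k, 0),
            (batch_size * num_attention_heads, 0, d_k),
        )
    else:
        # Common layout: separate key and value, each (B, H, seq, d_k).
        layer = ((batch_size, num_attention_heads, 0, d_k),) * 2

    return tuple(layer for _ in range(num_layers))


if __name__ == "__main__":
    for name in ("gpt2", "bloom", "gpt_bigcode"):
        print(name, empty_past_kv_shapes(name, 2, 4, 64, 1))
```

A test for the PR could compare shapes like these against what the traced model expects as its cache inputs, which is one way to check that the new branch is actually taken for `gpt_bigcode`.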
@echarlaix could you please review and merge if there are no more comments? Thanks.
Sure, can a test be added before we merge?
when I use the
Here is the debugging from my side:
after using when I get the traced model with
6f42527 to 1f57059
Yes you're right, since huggingface/optimum#1358
huggingface/optimum#1381 introduces
Thanks a lot for this integration @changwangss, could you add a test before we merge ?
a4a3142 to 7507dd5
Signed-off-by: changwangss <[email protected]>
What does this PR do?
This PR adds support for generating the past-kv shape for bigcode/starcoder. Please check and review, @echarlaix.
The related PR in optimum is huggingface/optimum#1170.
I couldn't find a tiny model; could you help us create one if necessary?
Before submitting