[Do not merge] Hosted handler for MPT #382
base: main
@@ -100,7 +100,7 @@ def download_convert(s3_path: Optional[str] = None,
 class MPTFTHostedModelHandler:
     # This is what the user request will contain
     INPUT_GENERATE_KWARGS = {
-        'max_new_tokens': 256,
+        'max_tokens': 256,
Did we settle on this name? It is kind of confusing because it doesn't say whether the limit covers only the generated tokens or prompt + generated.
imo it should be max_new_tokens; max_tokens is not very helpful, because your prompt can be longer than 256 tokens and then you just get a 500 from the server.
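To make the two readings concrete, here is a minimal sketch of how a request limit could be turned into a generation budget. It is not code from this PR: `resolve_new_token_budget` is a hypothetical helper, and it assumes `max_tokens` means prompt + generated, which is only one of the two interpretations raised above.

```python
from typing import Optional


def resolve_new_token_budget(prompt_len: int,
                             max_new_tokens: Optional[int] = None,
                             max_tokens: Optional[int] = None) -> int:
    """Return how many tokens may be generated for a prompt of prompt_len tokens.

    Hypothetical helper for illustration only. Assumes max_tokens counts
    prompt + generated tokens, while max_new_tokens counts generated tokens only.
    """
    # Require exactly one of the two limits so the meaning is unambiguous.
    if (max_new_tokens is None) == (max_tokens is None):
        raise ValueError('pass exactly one of max_new_tokens or max_tokens')

    if max_new_tokens is not None:
        if max_new_tokens < 0:
            raise ValueError('max_new_tokens must be non-negative')
        # Independent of prompt length: a long prompt never exhausts the budget.
        return max_new_tokens

    # Total-length limit: the prompt eats into the budget.
    remaining = max_tokens - prompt_len
    if remaining <= 0:
        raise ValueError(
            f'prompt has {prompt_len} tokens, which leaves no room under '
            f'max_tokens={max_tokens}')
    return remaining
```

With a 300-token prompt, `max_new_tokens=256` still yields a budget of 256, while `max_tokens=256` fails under this reading. Raising an explicit `ValueError` is a design choice for the sketch: a handler could map it to a client error with a clear message instead of a bare 500.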
* Enable CodeQL for pull requests (mosaicml#374)
  This reverts commit 1a04923.
* Update
---------
Co-authored-by: bandish-shah
Opening this for review but do not merge