max_new_tokens defaults to 1000. It can be set in ktransformers.local_chat via --max_new_tokens, but not in the server.
Please add a --max_new_tokens option to the ktransformers server so we can specify longer output lengths, and add more generation options (such as input context length).
Apologies for the inconvenience. If you’re building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
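For anyone patching from source in the meantime, a minimal sketch of what exposing the setting as a CLI flag could look like, using Python's standard argparse. The flag name and the 1000-token default come from this issue; everything else (the parser setup, help text, and how args.py actually wires arguments) is an assumption and may differ from the real ktransformers/server/backend/args.py:

```python
import argparse

# Hypothetical sketch: expose max_new_tokens as a server CLI flag.
# The actual argument wiring in ktransformers/server/backend/args.py may differ.
parser = argparse.ArgumentParser(description="ktransformers server (sketch)")
parser.add_argument(
    "--max_new_tokens",
    type=int,
    default=1000,  # default reported in this issue
    help="Maximum number of tokens to generate per request",
)

# Example: user overrides the default on the command line.
args = parser.parse_args(["--max_new_tokens", "4000"])
print(args.max_new_tokens)  # prints 4000
```

With a flag like this, the server-side generation config could read the value from the parsed args instead of a hard-coded constant.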
I just encountered this limitation. It would be even better if the REST API honored the maximum context length and maximum number of generation tokens.