Hello,

I noticed that the following code only produces 16 completion tokens:
```python
from langchain_nvidia_ai_endpoints.chat_models import ChatNVIDIA

llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct")
llm.invoke("Tell me a story about langchain")
```

```
ChatMessage(content=' Once upon a time, in a world not too different from ours, there was', response_metadata={'token_usage': {'prompt_tokens': 16, 'total_tokens': 32, 'completion_tokens': 16}, 'model_name': 'ai-mixtral-8x7b-instruct'}, role='assistant')
```
The same happens with other models (I tried ai-llama3-8b and ai-gemma-7b, for instance).
The default max_tokens value of ChatNVIDIA is None, not 16. If we don't provide max_tokens, it would be much more natural and less confusing if the model used its maximum context length (e.g. 32k for ai-mixtral-8x7b-instruct).
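In the meantime, passing max_tokens explicitly when constructing the client avoids the truncation. A minimal sketch (the value 1024 is just an arbitrary example, not a recommended setting):

```python
from langchain_nvidia_ai_endpoints.chat_models import ChatNVIDIA

# Workaround: set max_tokens explicitly so the silent 16-token
# default is not applied (1024 is an arbitrary example value).
llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct", max_tokens=1024)
llm.invoke("Tell me a story about langchain")
```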
Thanks!
SimJeg changed the title from *ChatNVIDIA silently set max_tokens to 16* to *ChatNVIDIA silently sets max_tokens to 16* on Apr 23, 2024.
I have the same experience, except with the "old" models whose names carry the "playground_" prefix, where max_tokens is definitely larger than 16. So I don't think this is necessarily a problem with ChatNVIDIA itself, but rather with the defaults of the models behind the endpoints.
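For what it's worth, a quick way to compare the two families is to check completion_tokens in the response metadata (the playground_ alias below is an assumption; substitute whichever older model name you have access to):

```python
from langchain_nvidia_ai_endpoints.chat_models import ChatNVIDIA

# Hypothetical comparison: "playground_mixtral_8x7b" is an assumed
# alias for the older endpoint naming scheme.
for name in ("ai-mixtral-8x7b-instruct", "playground_mixtral_8x7b"):
    msg = ChatNVIDIA(model=name).invoke("Tell me a story about langchain")
    # completion_tokens reveals whether the 16-token cap was applied
    print(name, msg.response_metadata["token_usage"]["completion_tokens"])
```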