Description
We are using the DJL container 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 with vLLM as the inference engine to serve Llama 3.1 to Llama 3.3 models. The model files include a generation_config.json file, which can specify default values for the sampling parameters temperature, top_p, and top_k.
The default values specified in generation_config.json are not being applied to inference requests.
Can this be implemented? We would like to populate generation_config.json with the values that perform best for the model.
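For illustration, the kind of defaults we would like DJL to pick up from generation_config.json looks like this (the values below are placeholders, not recommendations):

```json
{
  "temperature": 0.6,
  "top_p": 0.9,
  "top_k": 50
}
```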
It seems that DJL currently ignores this file and uses the defaults from djl-serving/engines/python/setup/djl_python/seq_scheduler/search_config.py (line 19, commit 7315729).
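As a workaround we currently pass the sampling parameters explicitly on every request. A minimal sketch, assuming the standard LMI request schema ("inputs" plus "parameters") and a hypothetical local endpoint; the URL and values are only illustrative:

```python
import requests

# Hypothetical local DJL Serving endpoint; on SageMaker this would be the
# /invocations endpoint of the deployed model.
ENDPOINT = "http://localhost:8080/invocations"

payload = {
    "inputs": "What is Deep Java Library?",
    "parameters": {
        # Illustrative values; ideally these would come from
        # generation_config.json instead of being repeated in every request.
        "temperature": 0.6,
        "top_p": 0.9,
        "top_k": 50,
        "max_new_tokens": 256,
    },
}

response = requests.post(ENDPOINT, json=payload)
print(response.json())
```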
Thank you.

Thanks for reporting this issue. It looks like we will need to pass the generation_config.json file through to the vLLM engine args (https://docs.vllm.ai/en/latest/serving/engine_args.html). I will take a look at this and get back to you with a fix; I expect it to be available in the 0.32.0 release, scheduled for the first week of February.
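For reference, a minimal sketch of what this could look like on the vLLM side, assuming the installed vLLM version exposes the generation_config engine argument described in the linked engine-args docs; the argument name, accepted values, and model path below are assumptions and may differ between versions:

```python
from vllm import LLM, SamplingParams

# Assumption: generation_config="auto" tells vLLM to read generation_config.json
# from the model directory and use it for default sampling parameters.
llm = LLM(
    model="/opt/ml/model",       # hypothetical local model path
    generation_config="auto",
)

# Parameters set explicitly per request would still override the file's defaults.
outputs = llm.generate(["What is Deep Java Library?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```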