If your model is a chat model, try `--hf-type chat`; this will use the model's chat_template. Separately, since OpenCompass uses HuggingFace under the hood to generate, try calling the original HF generate on one example to see whether it also takes that long.
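A minimal sketch of the suggested check, timing a single HF `generate` call. The model path is taken from the issue; the prompt, the helper name, and the loading code in `__main__` are illustrative placeholders, not part of OpenCompass:

```python
import time

def time_one_generate(model, tokenizer, prompt, max_new_tokens=32768):
    """Time a single HuggingFace generate() call to see if generation itself is slow."""
    # Tokenize the prompt and move it to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    # Count only the newly generated tokens, not the prompt.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{elapsed:.1f}s for {new_tokens} new tokens "
          f"({new_tokens / max(elapsed, 1e-9):.1f} tok/s)")
    return elapsed

if __name__ == "__main__":
    # Assumes transformers is installed and the local model path from the issue.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    path = "/home/maoshizhuo/2025/deepseek-Qwen-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
    time_one_generate(model, tokenizer, "What is 2 + 2?")
```

If one call already takes ~2 hours at 32k max output tokens, the bottleneck is plain HF generation rather than anything OpenCompass adds on top.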
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Inference runs correctly in this environment.
Reproduces the problem - code/configuration sample
python run.py --datasets math_500_gen --hf-type base --hf-path /home/maoshizhuo/2025/deepseek-Qwen-1.5B --debug --max-out-len 32768
02/25 23:53:14 - OpenCompass - INFO - Loading math_500_gen: /home/maoshizhuo/2025/opencompass/opencompass/configs/./datasets/math/math_500_gen.py
02/25 23:53:14 - OpenCompass - INFO - Loading example: /home/maoshizhuo/2025/opencompass/opencompass/configs/./summarizers/example.py
02/25 23:53:14 - OpenCompass - INFO - Current exp folder: outputs/default/20250225_235314
02/25 23:53:14 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/25 23:53:14 - OpenCompass - INFO - Partitioned into 1 tasks.
02/25 23:53:16 - OpenCompass - WARNING - Only use 1 GPUs for total 4 available GPUs in debug mode.
02/25 23:53:16 - OpenCompass - INFO - Task [deepseek-Qwen-1.5B_hf/math-500]
02/25 23:53:33 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/math/
02/25 23:53:33 - OpenCompass - INFO - Start inferencing [deepseek-Qwen-1.5B_hf/math-500]
11%|███████████████ | 7/63 [13:49:33<118:18:44, 7605.80s/it]
Reproduces the problem - command or script
python run.py --datasets math_500_gen --hf-type base --hf-path /home/maoshizhuo/2025/deepseek-Qwen-1.5B --debug --max-out-len 32768
Reproduces the problem - error message
Getting the result takes far too long: the estimated time to finish is about 131 hours.
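For reference, the ~131-hour figure is consistent with the tqdm line in the log above (7/63 items done at ~7605.8 s/it):

```python
# Sanity-check the total-runtime estimate from the tqdm progress line:
# 11%| 7/63 [13:49:33<118:18:44, 7605.80s/it]
seconds_per_item = 7605.80
total_items = 63
total_hours = seconds_per_item * total_items / 3600
print(f"{total_hours:.0f} hours")  # roughly 133 hours for the full run
```

That is about two hours per problem, which at 32768 max output tokens points to raw HF generation throughput as the bottleneck.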
Other information
Is there any way to speed up inference? I noticed that vLLM can accelerate inference, but since it integrates quantization techniques, the resulting accuracy is not exact; I want both accurate results and faster inference. My experiment environment has 4 V100-32G GPUs. Thanks!
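One possible direction, offered as an unverified sketch rather than a confirmed recipe: vLLM only quantizes when explicitly configured to, so an unquantized float16 vLLM backend should not lose precision relative to HF float16 inference. A hypothetical OpenCompass model config along these lines, assuming the `VLLM` wrapper in `opencompass.models` and that `model_kwargs` is passed through to the vLLM engine; field names and values here are illustrative and should be checked against your OpenCompass version:

```python
# Hypothetical config sketch: vLLM backend without quantization,
# tensor-parallel over the 4 V100-32G GPUs mentioned in the issue.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='deepseek-qwen-1.5b-vllm',
        path='/home/maoshizhuo/2025/deepseek-Qwen-1.5B',
        # float16 (not quantized) keeps precision comparable to HF fp16;
        # V100 GPUs do not support bfloat16 well, so fp16 is the safe choice.
        model_kwargs=dict(tensor_parallel_size=4, dtype='float16'),
        max_out_len=32768,
        batch_size=32,
        run_cfg=dict(num_gpus=4),
    )
]
```

The speedup from vLLM comes from continuous batching and paged attention, not from quantization, so accuracy should match a float16 HF run up to normal floating-point nondeterminism.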