Checklist
2. Please use English, otherwise it will be closed.
Motivation
While testing #3615, I noticed that the benchmark script (bench_serving.py) shipped with sglang only supports basic datasets such as ShareGPT for evaluation. Benchmarking scripts for the LIMO and AIME datasets are missing. Could you please provide evaluation scripts for these two datasets? If not, I would be happy to contribute support for them to facilitate future evaluations.
Related resources
No response
tanzelin430 changed the title from "[Feature] benchmark program for LIMO&AIME" to "benchmark program for LIMO&AIME" on Feb 19, 2025.
Please see this issue for guidance. You can also contact me on Slack (Simon V). @zhaochenyang20 I think we can close this issue in favor of the one I wrote.