
benchmark program for LIMO&AIME #3703

Closed
2 tasks done
tanzelin430 opened this issue Feb 19, 2025 · 4 comments
Comments

@tanzelin430

Checklist

Motivation

While testing for #3615, I noticed that the benchmark script (bench_serving.py) shipped with sglang only supports basic datasets such as ShareGPT for evaluation. Benchmarking scripts for the LIMO and AIME datasets are missing. Could you please provide evaluation scripts for these datasets? If not, I would be happy to contribute support for them to facilitate future evaluations.
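As a starting point for such a contribution, here is a minimal sketch of a request sampler for a LIMO/AIME-style dataset, shaped like the (prompt, prompt_len, output_len) tuples that bench_serving.py builds for ShareGPT. This is an assumption-laden illustration, not sglang's actual API: the field names "problem" and "question", the JSON layout, and the whitespace-based length estimate are all placeholders to adjust to the real datasets.

```python
import json
import random
from typing import List, Tuple


def sample_math_requests(
    dataset_path: str,
    num_requests: int,
    output_len: int = 2048,
    seed: int = 42,
) -> List[Tuple[str, int, int]]:
    """Sample benchmark requests from a LIMO/AIME-style JSON file.

    Assumes the file holds a list of records whose problem text sits
    under a "problem" or "question" key (hypothetical field names).
    Returns (prompt, prompt_len, output_len) tuples; prompt length is
    approximated by whitespace token count purely for illustration.
    """
    with open(dataset_path) as f:
        records = json.load(f)

    # Pull out whichever text field each record carries, skipping empties.
    prompts = [r.get("problem") or r.get("question") for r in records]
    prompts = [p for p in prompts if p]

    # Deterministic sampling so benchmark runs are reproducible.
    random.seed(seed)
    sampled = random.sample(prompts, min(num_requests, len(prompts)))

    # Math benchmarks typically fix a generous output budget, since
    # chain-of-thought answers are long; output_len models that here.
    return [(p, len(p.split()), output_len) for p in sampled]
```

A sampler like this could then feed the existing request loop in bench_serving.py in place of the ShareGPT path.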

Related resources

No response

@tanzelin430 tanzelin430 changed the title [Feature] benchmark program for LIMO&AIME benchmark program for LIMO&AIME Feb 19, 2025
@Fridge003
Collaborator

Fridge003 commented Feb 19, 2025

cc @zhaochenyang20 @jhinpan

@Fridge003 Fridge003 self-assigned this Feb 19, 2025
@Fridge003
Collaborator

Hi @tanzelin430 , we don't have evaluation scripts for LIMO and AIME. Feel free to raise a PR~

@zhaochenyang20
Collaborator

@tanzelin430 contact @simveit for help.

@simveit
Contributor

simveit commented Feb 20, 2025

Please see this issue for guidance.
You can also contact me on Slack (Simon V).
@zhaochenyang20 I think we can close this issue in favor of the one I wrote.

4 participants