
benchmark program for LIMO&AIME #3703

Closed
2 tasks done
tanzelin430 opened this issue Feb 19, 2025 · 4 comments
Comments

@tanzelin430

Checklist

Motivation

While testing for #3615, I noticed that the benchmark script (bench_serving.py) shipped with sglang only supports basic datasets such as ShareGPT for evaluation. Benchmarking scripts for the LIMO and AIME datasets are missing. Could you please provide evaluation scripts for these datasets? If not, I would be happy to contribute support for them to facilitate future evaluations.
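As a starting point for such a contribution, here is a minimal sketch of a request sampler for a LIMO/AIME-style dataset, shaped like the (prompt, prompt_len, output_len) tuples that bench_serving.py builds for ShareGPT. This is an assumption-laden illustration, not sglang's actual API: the field names "problem" and "question", the JSON layout, and the whitespace-based length estimate are all placeholders to adjust to the real datasets.

```python
import json
import random
from typing import List, Tuple


def sample_math_requests(
    dataset_path: str,
    num_requests: int,
    output_len: int = 2048,
    seed: int = 42,
) -> List[Tuple[str, int, int]]:
    """Sample benchmark requests from a LIMO/AIME-style JSON file.

    Assumes the file holds a list of records whose problem text sits
    under a "problem" or "question" key (hypothetical field names).
    Returns (prompt, prompt_len, output_len) tuples; prompt length is
    approximated by whitespace token count purely for illustration.
    """
    with open(dataset_path) as f:
        records = json.load(f)

    # Pull out whichever text field each record carries, skipping empties.
    prompts = [r.get("problem") or r.get("question") for r in records]
    prompts = [p for p in prompts if p]

    # Deterministic sampling so benchmark runs are reproducible.
    random.seed(seed)
    sampled = random.sample(prompts, min(num_requests, len(prompts)))

    # Math benchmarks typically fix a generous output budget, since
    # chain-of-thought answers are long; output_len models that here.
    return [(p, len(p.split()), output_len) for p in sampled]
```

A sampler like this could then feed the existing request loop in bench_serving.py in place of the ShareGPT path.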

Related resources

No response

@tanzelin430 tanzelin430 changed the title [Feature] benchmark program for LIMO&AIME benchmark program for LIMO&AIME Feb 19, 2025
@Fridge003
Collaborator

Fridge003 commented Feb 19, 2025

cc @zhaochenyang20 @jhinpan

@Fridge003 Fridge003 self-assigned this Feb 19, 2025
@Fridge003
Collaborator

Hi @tanzelin430 , we don't have evaluation scripts for LIMO and AIME. Feel free to raise a PR~

@zhaochenyang20
Collaborator

@tanzelin430 contact @simveit for help.

@simveit
Contributor

simveit commented Feb 20, 2025

Please see this issue for guidance.
You can also contact me on Slack (Simon V).
@zhaochenyang20 I think we can close this issue in favor of the one I wrote.

4 participants