
Add Llama.cpp benchmark experiment? #189

Open · zlwu92 opened this issue Feb 11, 2025 · 2 comments

Comments


zlwu92 commented Feb 11, 2025

Hi,

I am a beginner in LLMs and new to structured generation with XGrammar.
I see that you provide benchmark results for Llama.cpp in the blog post and paper.
However, I cannot find that benchmark experiment in the open-source XGrammar repo: examples/benchmark/bench_grammar_compile_mask_gen.py (I think it would belong there?).
If so, would you please add the test code for benchmarking Llama.cpp and show how to integrate it with XGrammar?
Thanks.

Another question: when I run `python bench_grammar_compile_mask_gen.py --backend lmformatenforcer`, I get the following error,

[Screenshots of the error output, produced with the same dataset file downloaded from Hugging Face]

What might be the problem?

Ubospica (Collaborator) commented Feb 12, 2025

Hi @zlwu92, thanks for your questions about getting started with XGrammar and benchmarking llama.cpp.

For beginners, I would suggest following our tutorial, which describes how to use XGrammar with Hugging Face transformers to guide the generation process. It's easy to learn and a very useful application scenario; a minimal sketch of that flow is below.
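
Roughly, the guided-generation loop looks like this (a minimal sketch following the tutorial; the model ID is only an example, and exact API names may differ slightly between XGrammar versions):

```python
import torch
import xgrammar as xgr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # example; any HF causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Compile a grammar once up front (here, the built-in JSON grammar).
tokenizer_info = xgr.TokenizerInfo.from_huggingface(
    tokenizer, vocab_size=model.config.vocab_size
)
compiler = xgr.GrammarCompiler(tokenizer_info)
compiled_grammar = compiler.compile_builtin_json_grammar()
matcher = xgr.GrammarMatcher(compiled_grammar)

# Greedy decoding, masking grammar-invalid tokens at every step.
bitmask = xgr.allocate_token_bitmask(1, tokenizer_info.vocab_size)
input_ids = tokenizer("Generate a JSON object:", return_tensors="pt").input_ids
for _ in range(256):
    logits = model(input_ids).logits[:, -1, :]
    matcher.fill_next_token_bitmask(bitmask)          # which tokens are legal now?
    xgr.apply_token_bitmask_inplace(logits, bitmask)  # set illegal logits to -inf
    next_token = int(torch.argmax(logits, dim=-1))
    matcher.accept_token(next_token)                  # advance the matcher state
    input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=-1)
    if matcher.is_terminated():                       # grammar reached an end state
        break

print(tokenizer.decode(input_ids[0]))
```

Note that the grammar is compiled only once; per decoding step, only the bitmask fill and apply touch the logits.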

Regarding the benchmark: the benchmark of llama.cpp and its internal grammar engine was done on our own fork, because we needed to measure the speed of grammar initialization and mask generation.
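
For reference, the two phases can be timed roughly like this with the Python API (a hypothetical sketch, not the exact script we used for llama.cpp; the tokenizer ID is only an example):

```python
import time
import xgrammar as xgr
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # example
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer)
compiler = xgr.GrammarCompiler(tokenizer_info)

# Phase 1: grammar initialization (compiling the grammar and its token tables).
t0 = time.perf_counter()
compiled_grammar = compiler.compile_builtin_json_grammar()
print(f"grammar init: {time.perf_counter() - t0:.4f} s")

# Phase 2: per-step mask generation.
matcher = xgr.GrammarMatcher(compiled_grammar)
bitmask = xgr.allocate_token_bitmask(1, tokenizer_info.vocab_size)
t0 = time.perf_counter()
matcher.fill_next_token_bitmask(bitmask)
print(f"mask generation (one step): {time.perf_counter() - t0:.6f} s")
```

Getting the corresponding numbers for llama.cpp required placing the same two timers inside its grammar engine, which is why that part lives on our fork.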

> show how to integrate it with XGrammar

We do plan to integrate XGrammar into llama.cpp, since we have a feature-complete C++ API. That will come later.

Other baselines have changed a bit since we ran our benchmark. We will update the script accordingly to make it work.

zlwu92 (Author) commented Feb 12, 2025

Thank you.

One more question: does the open-source XGrammar repo currently include scripts for the two benchmarking experiments in the paper, i.e. (1) the speed of masking logits and (2) the end-to-end evaluation of LLM inference engine efficiency in serving scenarios?
