Python GIL problems when using GrammarMatcher in multithreading environment #213

Open
tuoyikuan opened this issue Feb 22, 2025 · 1 comment


@tuoyikuan

I am using XGrammar in an LLM inference engine where multiple requests are processed simultaneously. Each request is managed by a separate instance of the GrammarMatcher finite state machine. In our setup, parallel updates to different GrammarMatcher instances are desired to better utilize available CPU cores.
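To make the setup concrete, here is a minimal sketch of the threading pattern described above: one matcher per request, each advanced by its own task in a thread pool. `StubMatcher` and `step_all` are hypothetical names used only for illustration. The stub stands in for a per-request GrammarMatcher and is not the XGrammar class; only the method name `accept_token` is taken from this issue.

```python
from concurrent.futures import ThreadPoolExecutor


class StubMatcher:
    """Hypothetical stand-in for a per-request matcher (not the XGrammar class)."""

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.accepted = []

    def accept_token(self, token_id):
        # Pretend to advance the grammar state; reject tokens outside the allowed set.
        if token_id not in self.allowed:
            return False
        self.accepted.append(token_id)
        return True


def step_all(matchers, next_tokens, pool):
    """Advance each request's matcher by one token, one task per matcher."""
    futures = [pool.submit(m.accept_token, t) for m, t in zip(matchers, next_tokens)]
    return [f.result() for f in futures]


# Four concurrent requests, each owning its own matcher instance.
matchers = [StubMatcher(allowed=range(10)) for _ in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = step_all(matchers, [1, 2, 3, 42], pool)
```

Each matcher is touched by only one task at a time, so no locking is needed between matchers. Whether the tasks actually run in parallel depends on whether the underlying call releases the GIL.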

It appears that while the GrammarCompiler bindings release the GIL during their CPU-intensive operations, the GrammarMatcher bindings do not. This limits our multithreaded use case: even though each GrammarMatcher is used by only one thread at a time, the global interpreter lock (GIL) still serializes the CPU-bound work across threads.
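One rough way to check whether a given call releases the GIL is to compare the time for N sequential calls against N calls spread over N threads. The sketch below is a generic timing harness, not part of XGrammar; `thread_speedup`, `holds_gil`, and `releases_gil` are hypothetical names, and `time.sleep` is used only as a well-known example of a call that releases the GIL while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def thread_speedup(fn, n_threads=4):
    """Return sequential_time / threaded_time for n_threads calls of fn."""
    start = time.perf_counter()
    for _ in range(n_threads):
        fn()
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(fn) for _ in range(n_threads)]
        for f in futures:
            f.result()
    threaded = time.perf_counter() - start
    return sequential / threaded


def holds_gil():
    # Pure-Python CPU work: on a standard CPython build the GIL is held throughout.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def releases_gil():
    # time.sleep releases the GIL while waiting, like a binding that drops it.
    time.sleep(0.05)


sleep_speedup = thread_speedup(releases_gil)
busy_speedup = thread_speedup(holds_gil)
print(f"GIL released: {sleep_speedup:.2f}x, GIL held: {busy_speedup:.2f}x")
```

On a standard (GIL-enabled) CPython build, a ratio near 1 for a CPU-bound call suggests the call holds the GIL, while a ratio approaching the thread count suggests it releases it. Passing a lambda that wraps the matcher call of interest in place of `holds_gil` would apply the same check to a real binding, with timings that vary by machine.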

Would it be possible to modify the GrammarMatcher pybind bindings (e.g., for methods such as accept_token) to release the GIL, similar to the approach taken in GrammarCompiler? This enhancement would improve throughput in multi-core, multi-threaded environments without relying on multiprocessing, which may introduce memory management complexities in our application.

@Ubospica
Collaborator

@tuoyikuan Thanks for bringing that up! It is reasonable to release the GIL for all these operations. We will update the code to release the GIL for most operations.
