nnsight with multithreading #280
Additionally, I get the error:
@lithafnium Hey, I'd love to know more about this if you could get me a small reproducible example. I don't think nnsight is close to thread-safe, although maybe there are some features in nnsight you could disable to get it working. On another note, 0.4 is going to have vLLM support, so potentially you can just use that?
Config flag to turn off global tracing
@lithafnium You can now turn off global tracing (#327). Should work for multithreading!
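For anyone finding this later, here is a rough sketch of how that toggle might be used, assuming the setting lives on `nnsight.CONFIG`. The attribute name `CONFIG.APP.GLOBAL_TRACING` and the model/module names are assumptions on my part, so check #327 for the actual flag rather than treating this as the definitive API:

```python
# Sketch only: the exact setting name below is an assumption; see #327 for
# the flag that was actually added.
from nnsight import CONFIG, LanguageModel

CONFIG.APP.GLOBAL_TRACING = False  # assumed attribute name for the new toggle

model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("Hello world"):
    # The assumption here is that explicit trace contexts keep working and
    # only the implicit global tracing machinery is disabled.
    logits = model.lm_head.output.save()
```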
This is more of a niche feature, but I'm attempting to serve `nnsight` as an API using FastAPI. I'm accessing the model and invoking the trace using `loop.run_in_executor()`. This mostly works. However, when I run multiple requests at the same time I receive the error: `RuntimeError: trying to pop from empty mode stack`. I assume this is because in `nnsight` there's some sort of global tracing involved when calculating the compute graph, which according to the error is not thread-safe? Not sure if that's the case; would love some insight on this.

I'm fairly certain `loop.run_in_executor()` should work fine, as this functions normally with regular Hugging Face models, and this is how vLLM handles async requests in their `AsyncLLMEngine`. I'm wondering if other people have noticed this error and whether there are any ways to circumvent this.
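A minimal sketch of the setup described above, assuming a small GPT-2 model and nnsight's `LanguageModel`/`trace` API; the model name, module path, and endpoint are illustrative rather than taken from the issue. Hitting the endpoint with several concurrent requests runs the trace on separate worker threads, which is where the mode-stack error shows up:

```python
# Illustrative reproduction of the setup above: the model name, module path,
# and endpoint are made up for the example, not taken from the issue.
import asyncio

from fastapi import FastAPI
from nnsight import LanguageModel

app = FastAPI()
model = LanguageModel("openai-community/gpt2", device_map="auto")


def run_trace(prompt: str):
    # Each worker thread enters its own trace context. nnsight's tracing
    # state appears to be global, so two threads doing this concurrently
    # can trigger "RuntimeError: trying to pop from empty mode stack".
    with model.trace(prompt):
        hidden = model.transformer.h[-1].output[0].save()
    return hidden


@app.post("/trace")
async def trace_endpoint(prompt: str):
    # Off-load the blocking trace to the default thread pool, mirroring the
    # loop.run_in_executor() usage described in the issue.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, run_trace, prompt)
    return {"status": "ok"}
```

Running this with `uvicorn` and firing a handful of concurrent requests at the endpoint should be enough to reproduce the error described above.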