-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Improve PythonAsyncEngine error handling and Increase Tokio thread count #129
Conversation
max_worker_threads: 16, | ||
max_blocking_threads: 16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few notes/questions:
- Should these be env vars similar to the
from_settings
support for configurability? ex:
TRITON_RUNTIME_NUM_WORKER_THREADS=16
TRITON_RUNTIME_MAX_BLOCKING_THREADS=16
-
It looks like there is already a native tokio env var for worker threads:
TOKIO_WORKER_THREADS
- but I'm fine with defining our own for consistency with other config settings -
From the docs - it looks like the default behavior, if left unset, is to use the num cpu cores rather than trying to apply a fixed number to various hardwares and scenarios. Do we want to define a fixed number here?
The default value is the number of cores available to the system.
- Docs seem to say that
worker_threads
will spawn/use that many threads - so I think it's more aptly named"num_worker_threads"
than"max_worker_threads"
(which implies it may not spawn/use them all). Formax_blocking_threads
, I think that name fits well based on the docs:
Unlike the worker_threads, they are not always active and will exit if left idle for too long. You can change this timeout duration with thread_keep_alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do make these configurable -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TRITON_RUNTIME_
envs are the correct envs. We can also use TOML files in specific paths to override compiled defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could rename NUM_WORKER_THREADS
to NUM_ASYNC_THREADS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #146 as a good intro task for someone to pick up - but feel free to commit it if you already have it done
Co-authored-by: Ryan McCormick <[email protected]> Signed-off-by: Ryan Olson <[email protected]>
@rmccorm4 - LGTM - good with your review - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we run this via pytest - maybe nightly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so - we can test-ify and add the pytest.mark.nightly
marker here - but I think the CI needs an update to run a nightly scheduled job @nv-anants
- DLIS-7968
- DLIS-7969
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i marked the rust soak.rs test as a feature, so by default it will not run.
cargo test --features integration
we'll want a mark for test that depend on etcd/nats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #150
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM- defer to @rmccorm4 for final review
…; adding soak test
…server/triton_distributed into ryan/250207-push-fixes
let msg = format!("critical error: failed to offload the python async generator to a new thread: {}", e); | ||
log::error!(request_id, "{}", msg); | ||
msg | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to match none of the cases here? if so, should we have a default case here? Or it's guaranteed to be a ResponseProcessingError
and will compile-time fail if new errors are added but no corresponding match case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the latter. its good practice not to have a default, because the compiler will enforce that all possible arms of the match statement be defined.
if you add another arm in the enum, you will see a compiler error on the match, or any where in the code base this is used. in this case the enum is private, so it's static to the file for now.
What does the PR do?
The default single threaded Tokio runtime was being starved, likely due to GIL contention in the python soak.py test.
When we grab the GIL, we are actually blocking our rust tokio async threads. We might consider scheduling GIL calls ot the offload threads meant for blocking calls.
We might also consider using separate single threaded tokio runtime to handle all task that will touch the GIL.
A third, and likely best option is to use static tokio mutex handle which need to be acquired before accessing the GIL, thus put a yielding tokio lock around the GIL.
The multi-threaded RT is sufficient now to unblock the soak test, but long-term strategies on GIL handling need to be considered.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type
box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
If the channel is closed, the send errors. We were not checking the return status and unwrapping them.
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)