-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Update the response queue in the server to reuse response slots #7879
base: main
Are you sure you want to change the base?
Changes from all commits
c9bcbd9
2fba1dd
16d347a
12cc2d2
740d167
9020014
6ccee17
62e719c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
// Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
// Copyright 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
// | ||
// Redistribution and use in source and binary forms, with or without | ||
// modification, are permitted provided that the following conditions | ||
|
@@ -306,6 +306,7 @@ enum TritonOptionId { | |
OPTION_GRPC_ADDRESS, | ||
OPTION_GRPC_HEADER_FORWARD_PATTERN, | ||
OPTION_GRPC_INFER_ALLOCATION_POOL_SIZE, | ||
OPTION_GRPC_MAX_RESPONSE_POOL_SIZE, | ||
OPTION_GRPC_USE_SSL, | ||
OPTION_GRPC_USE_SSL_MUTUAL, | ||
OPTION_GRPC_SERVER_CERT, | ||
|
@@ -536,6 +537,11 @@ TritonParser::SetupOptions() | |
"allocated for reuse. As long as the number of in-flight requests " | ||
"doesn't exceed this value there will be no allocation/deallocation of " | ||
"request/response objects."}); | ||
grpc_options_.push_back( | ||
{OPTION_GRPC_MAX_RESPONSE_POOL_SIZE, "grpc-max-response-pool-size", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need an extra argument for this? Why no reuse There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tanmayv25, We use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Considering the above, I think it may be better to maintain a separate flag for controlling the response queue limit without affecting the state bucket. |
||
Option::ArgInt, | ||
"The maximum number of inference response objects that can remain " | ||
"allocated in the pool at any given time."}); | ||
grpc_options_.push_back( | ||
{OPTION_GRPC_USE_SSL, "grpc-use-ssl", Option::ArgBool, | ||
"Use SSL authentication for GRPC requests. Default is false."}); | ||
|
@@ -1438,6 +1444,14 @@ TritonParser::Parse(int argc, char** argv) | |
case OPTION_GRPC_INFER_ALLOCATION_POOL_SIZE: | ||
lgrpc_options.infer_allocation_pool_size_ = ParseOption<int>(optarg); | ||
break; | ||
case OPTION_GRPC_MAX_RESPONSE_POOL_SIZE: | ||
lgrpc_options.max_response_pool_size_ = ParseOption<int>(optarg); | ||
if (lgrpc_options.max_response_pool_size_ <= 0) { | ||
throw ParseException( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we also need to have an upper limit on the check? |
||
"Error: --grpc-max-response-pool-size must be greater " | ||
"than 0."); | ||
} | ||
break; | ||
case OPTION_GRPC_USE_SSL: | ||
lgrpc_options.ssl_.use_ssl_ = ParseOption<bool>(optarg); | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test plan in the description on what we are trying to test here?
Why have we set
--grpc-max-response-pool-size
only to1
?Also can we add a test to confirm memory footprint decreses with using
--grpc-max-response-pool-size
VS not using--grpc-max-response-pool-size
?