Update h2 windowing algo & Http Client benchmark #388
base: main
This pull request introduces 1 alert when merging 4cd7338 into f81ee94 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging f445de0 into f81ee94 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging f86a9f7 into f81ee94 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging ca0c7c5 into f81ee94 - view on LGTM.com new alerts:
source/h2_connection.c
Outdated
@@ -1762,6 +1767,8 @@ static void s_handler_installed(struct aws_channel_handler *handler, struct aws_
aws_linked_list_push_back(
&connection->thread_data.outgoing_frames_queue, &connection_window_update_frame->node);
connection->thread_data.window_size_self += initial_window_update_size;
/* For automatic window management, we only update connectio windows when it droped blow 50% of MAX. */
connection->thread_data.window_size_self_dropped_threshold = AWS_H2_WINDOW_UPDATE_MAX / 2;
nit: pull this magic number into a constant?
It's derived from a constant... and this way it's clearer where it comes from.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #388 +/- ##
==========================================
+ Coverage 79.48% 79.52% +0.03%
==========================================
Files 27 27
Lines 11686 11701 +15
==========================================
+ Hits 9289 9305 +16
+ Misses 2397 2396 -1
@@ -408,12 +408,6 @@ static int s_localhost_integ_h2_upload_stress(struct aws_allocator *allocator, v
s_tester.alloc = allocator;

size_t length = 2500000000UL;
#ifdef AWS_OS_LINUX
Removed this, since it seems to be the flow-control window issue:
- we sent out too much
- the initial window from the local server is small (defaults to the magic 65535). https://httpwg.org/specs/rfc7540.html#iana-settings
- the local server code also updates the window as data is received.
Don't know why it matters so much more on Linux than on the other platforms. (I think we did find from the canary before that this windowing issue affects Linux more, which matches this.)
bin/canary/CMakeLists.txt
Outdated
list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")

file(GLOB ELASTICURL_SRC
ELASTICURL
bin/canary/CMakeLists.txt
Outdated
target_link_libraries(${PROJECT_NAME} aws-c-http)

if (BUILD_SHARED_LIBS AND NOT WIN32)
message(INFO " elasticurl will be built with shared libs, but you may need to set LD_LIBRARY_PATH=${CMAKE_INSTALL_PREFIX}/lib to run the application")
elasticurl
CMakeLists.txt
Outdated
if (AWS_BUILD_CANARY)
add_subdirectory(bin/canary)
endif()
Why bother having an AWS_BUILD_CANARY option? Why not just always build it if we're building tests, like we're already doing with elasticurl?
bin/canary/CMakeLists.txt
Outdated
EXPORT ${PROJECT_NAME}-targets
COMPONENT Runtime
RUNTIME
DESTINATION bin
- DESTINATION bin
+ DESTINATION ${CMAKE_INSTALL_BINDIR}
bin/canary/CMakeLists.txt
Outdated
@@ -0,0 +1,28 @@
project(canary C)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")
not necessary anymore
- list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")
raise RuntimeError("Return code {code} from: {cmd}".format(
code=process.returncode, cmd=args_str))
else:
print(output.decode("utf-8"))
wait.
we print the output even if everything went right?
In that case, just DON'T capture output at all. It will print to the console. You don't need to pass anything to subprocess.run() except args and timeout.
and remove the comment about "gather all stderr and stdout to a single string that we print only if things go wrong"
and remove like 80% of the code in this script, because the whole reason it's so complicated was to suppress output if the test passed
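A minimal sketch of the simplification suggested above, assuming the script only needs to run a command and fail loudly on a non-zero exit code. The helper name `run_command` is hypothetical and not taken from the script under review. With no capture arguments, `subprocess.run()` lets the child inherit the parent's stdout and stderr, so output goes straight to the console.

```python
import subprocess


def run_command(args, timeout=None):
    """Run a command and let its output go directly to the console.

    `run_command` is a hypothetical helper name used for illustration.
    """
    # No stdout/stderr arguments: the child inherits the parent's streams,
    # so nothing has to be gathered and printed afterwards.
    process = subprocess.run(args, timeout=timeout)
    if process.returncode != 0:
        raise RuntimeError("Return code {code} from: {cmd}".format(
            code=process.returncode, cmd=" ".join(args)))
```

If the command exceeds `timeout`, `subprocess.run()` raises `subprocess.TimeoutExpired`, which this sketch deliberately leaves unhandled.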
bin/canary/main.c
Outdated
I've been looking at this PR for a while and have no idea what "canary" does. Can you add a README.md with a very brief description and very brief instructions?
@@ -51,6 +51,7 @@ def __init__(self):
def connection_made(self, transport: asyncio.Transport):
self.transport = transport
self.conn.initiate_connection()
self.conn.increment_flow_control_window(int(2147483647/2))
add comment explaining why you're doing this, and why this magic number
if isinstance(event, RequestReceived):
self.request_received(event.headers, event.stream_id)
self.conn.increment_flow_control_window(
int(2147483647/2), event.stream_id)
Why this change? I understand you need to increment the window after DataReceived (below). But why this change?
self.conn.increment_flow_control_window(event.flow_controlled_length)
self.conn.increment_flow_control_window(
event.flow_controlled_length, event.stream_id)
self.receive_data(event.data, event.stream_id)
Why did you move these calls from the def receive_data() function, out to here? Seems weird to make a function to handle everything related to this event ... and then move some of the code outside that function
@@ -98,6 +98,17 @@ struct aws_h2_connection {
* Reduce the space after receiving a flow-controlled frame. Increment after sending WINDOW_UPDATE for
* connection */
size_t window_size_self;
trivial / kinda-related: maybe change window_size_self and window_size_peer from size_t to uint32_t.
They're not legally allowed to exceed 2^31-1. Having them be variably sized is just ... confusing
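For context on the 2^31-1 limit: RFC 7540 section 6.9.1 forbids a flow-control window from exceeding 2^31-1 octets. The sketch below is a toy model, not the aws-c-http code. It shows one way a pending increment could be capped so the window never overflows, with the excess held back for a later WINDOW_UPDATE. The function name `split_window_update` is hypothetical, and the sketch assumes the pending size is non-negative.

```python
H2_WINDOW_MAX = 2**31 - 1  # largest legal flow-control window (RFC 7540, section 6.9.1)


def split_window_update(current_window, pending):
    """Return (increment_to_send_now, excess_to_keep_pending).

    Hypothetical helper: caps the increment so that
    current_window + increment never exceeds H2_WINDOW_MAX.
    """
    room = max(H2_WINDOW_MAX - current_window, 0)
    # A single WINDOW_UPDATE increment is itself limited to 2^31-1.
    increment = min(pending, room, H2_WINDOW_MAX)
    return increment, pending - increment
```

For example, with a window already 10 bytes below the maximum, a pending size of 100 would yield an increment of 10 now and 90 left pending.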
* received.
* When manual management for connection window is on, the dropped size equals to the size of all the padding in
* the data frame received */
uint32_t window_size_self_dropped;
trivial / naming: "dropped" doesn't make immediate sense to me. maybe:
- pending_window_update_size and window_update_threshold?
- window_increment_pending_size and window_increment_threshold_size?
source/h2_connection.c
Outdated
@@ -384,6 +384,8 @@ static struct aws_h2_connection *s_connection_new(
connection->thread_data.window_size_peer = AWS_H2_INIT_WINDOW_SIZE;
connection->thread_data.window_size_self = AWS_H2_INIT_WINDOW_SIZE;

connection->thread_data.window_size_self_dropped_threshold = 0;
Shouldn't this have a non-zero value?
even if someone is doing "manual window management" on their stream, the HTTP client should still batch up its window update frames
source/h2_connection.c
Outdated
CONNECTION_LOGF(TRACE, connection, "%" PRIu32 " Bytes of padding received.", total_padding_bytes);
}
connection->thread_data.window_size_self_dropped += auto_window_update;
if (connection->thread_data.window_size_self_dropped > connection->thread_data.window_size_self_dropped_threshold) {
This logic holds back a WINDOW_UPDATE frame until its size would be > the threshold.
Should there be another threshold, where we also send it immediately if window_size_self gets too low?
also trivial, but maybe >= instead of >, since the threshold is probably going to be a nice round number
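To make the two triggers discussed above concrete, here is a toy Python model, not the aws-c-http implementation. All class, field, and parameter names are hypothetical. It holds back the increment until the pending size reaches a threshold (using `>=`, as suggested), and also flushes early if the remaining window falls to a floor.

```python
class WindowUpdateBatcher:
    """Toy model of batching WINDOW_UPDATE increments (all names are hypothetical)."""

    def __init__(self, window_size, update_threshold, low_window_floor=0):
        self.window_size = window_size            # bytes the peer may still send
        self.update_threshold = update_threshold  # flush once this much is pending
        self.low_window_floor = low_window_floor  # flush early if the window gets this low
        self.pending = 0                          # bytes received but not yet re-granted

    def on_data(self, length):
        """Account for a received DATA frame; return the increment to send, or 0."""
        self.window_size -= length
        self.pending += length
        if (self.pending >= self.update_threshold
                or self.window_size <= self.low_window_floor):
            increment = self.pending
            self.window_size += increment
            self.pending = 0
            return increment
        return 0
```

A return value of 0 means no WINDOW_UPDATE frame would be queued for that DATA frame; a non-zero value is the increment the frame would carry.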
source/h2_connection.c
Outdated
@@ -1762,6 +1767,8 @@ static void s_handler_installed(struct aws_channel_handler *handler, struct aws_
aws_linked_list_push_back(
&connection->thread_data.outgoing_frames_queue, &connection_window_update_frame->node);
connection->thread_data.window_size_self += initial_window_update_size;
/* For automatic window management, we only update connection windows when it droped blow 50% of MAX. */
trivial
- /* For automatic window management, we only update connection windows when it droped blow 50% of MAX. */
+ /* For automatic window management, we only update connection window when it drops below 50% of MAX. */
source/h2_connection.c
Outdated
return aws_h2err_from_last_error();
}
connection->thread_data.window_size_self_dropped = 0;
Should we move this logic into s_connection_send_update_window()? I know it's only called from this one place, but it seems like if anywhere else ever wanted to call it, they should be minding all this math as well. I guess if we do that, we should rename it just s_connection_update_window()
source/h2_connection.c
Outdated
if (auto_window_update != 0) {
if (s_connection_send_update_window(connection, auto_window_update)) {
if (total_padding_bytes) {
utterly trivial: put some whitespace between the if/else and the next if. Otherwise it looks like an if/else-if/else-if chain
- if (total_padding_bytes) {
+
+ if (total_padding_bytes) {
source/h2_connection.c
Outdated
if (auto_window_update != 0) {
if (s_connection_send_update_window(connection, auto_window_update)) {
if (total_padding_bytes) {
CONNECTION_LOGF(TRACE, connection, "%" PRIu32 " Bytes of padding received.", total_padding_bytes);
Maybe just remove this LOG. The old statement was about how the connection window was being updated, but now it's just about how much padding was received, but the decoder is already logging about padding.
* drops below the threshold.
* Default to half of the initial connection flow-control window size, which is 32767.
*/
uint32_t conn_window_size_threshold_to_send_update;
extremely debatable: you could just leave these out of the public options, until someone actually asks for it
* The client will send the WINDOW_UPDATE frame to the server only valid.
* If the pending_window_update_size is too large, we will leave the excess to send it out later.
*/
uint64_t pending_window_update_size_thread;
utterly trivial. It's already in a struct named thread_data, no need to repeat
- uint64_t pending_window_update_size_thread;
+ uint64_t pending_window_update_size;
or
- uint64_t pending_window_update_size_thread;
+ uint64_t pending_window_update_size_self;
@@ -150,7 +165,7 @@ struct aws_h2_connection {
bool is_cross_thread_work_task_scheduled;

/* The window_update value for `thread_data.window_size_self` that haven't applied yet */
size_t window_update_size;
uint64_t pending_window_update_size_sync;
trivial: it's already in synced_data.
- uint64_t pending_window_update_size_sync;
+ uint64_t pending_window_update_size;
or
- uint64_t pending_window_update_size_sync;
+ uint64_t pending_window_update_size_self;
@@ -150,7 +165,7 @@ struct aws_h2_connection {
bool is_cross_thread_work_task_scheduled;

/* The window_update value for `thread_data.window_size_self` that haven't applied yet */
- /* The window_update value for `thread_data.window_size_self` that haven't applied yet */
+ /* Value for `thread_data.pending_window_update_size` that we haven't applied yet */
* The client will send the WINDOW_UPDATE frame to the server only valid.
* If the pending_window_update_size is too large, we will leave the excess to send it out later.
**/
uint64_t pending_window_update_size_thread;
thread
}
s_unlock_synced_data(connection);
} /* END CRITICAL SECTION */
if (err) {
🎉 yay less chance for errors!
@@ -209,24 +209,62 @@ static struct aws_h2err s_check_state_allows_frame_type(
return aws_h2err_from_h2_code(h2_error_code);
}

static int s_stream_send_update_window_frame(struct aws_h2_stream *stream, size_t increment_size) {
static int s_stream_send_update_window_if_needed(struct aws_h2_stream *stream, uint64_t window_size) {
trivial: again, window_size sounds like the absolute value, but it's a relative value
- static int s_stream_send_update_window_if_needed(struct aws_h2_stream *stream, uint64_t window_size) {
+ static int s_stream_send_update_window_if_needed(struct aws_h2_stream *stream, uint64_t window_update_size) {
if (connection->stream_window_size_threshold_to_send_update) {
stream->window_size_threshold_to_send_update = connection->stream_window_size_threshold_to_send_update;
} else {
stream->window_size_threshold_to_send_update =
- stream->window_size_threshold_to_send_update =
+ /* Set reasonable default of: 50% initial window size (same as Netty's DEFAULT_WINDOW_UPDATE_RATIO) */
+ stream->window_size_threshold_to_send_update =
if (!stream_window_update_frame) {
/* Cap the window to AWS_H2_WINDOW_UPDATE_MAX */
int32_t previous_window = stream->thread_data.window_size_self;
trivial: same thing again where we don't need this variable
- int32_t previous_window = stream->thread_data.window_size_self;
stream,
"WINDOW_UPDATE frame on stream failed to be sent, error %s",
aws_error_name(aws_last_error()));
stream->thread_data.window_size_self = previous_window;
- stream->thread_data.window_size_self = previous_window;
all my feedback was style stuff. Looks good overall
AWS_BUILD_CANARY=ON cmake flag
cd build/bin/canary && ./canary
ISSUE FOUND & FIXED
We updated the connection window for every DATA frame received, which really slowed us down when receiving frequent small DATA frames.
"Providing tiny increments to flow control in WINDOW_UPDATE frames can cause a sender to generate a large number of DATA frames." (from here)
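A back-of-the-envelope illustration of the cost described above, as a toy Python model rather than a measurement from this PR. The function name and the numbers (1 MiB received in 1 KiB DATA frames, a threshold of 32767 bytes) are assumptions chosen for illustration. A threshold of 1 models the old behavior of one WINDOW_UPDATE per DATA frame.

```python
def count_window_updates(total_bytes, frame_size, threshold):
    """Count WINDOW_UPDATE frames sent while receiving total_bytes of DATA.

    Hypothetical model: an update is sent whenever the pending increment
    reaches `threshold`; threshold=1 means one update per DATA frame.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    updates = 0
    pending = 0
    received = 0
    while received < total_bytes:
        chunk = min(frame_size, total_bytes - received)
        received += chunk
        pending += chunk
        if pending >= threshold:
            updates += 1
            pending = 0
    return updates


per_frame = count_window_updates(1024 * 1024, 1024, threshold=1)
batched = count_window_updates(1024 * 1024, 1024, threshold=32767)
print(per_frame, batched)  # 1024 updates versus 32 under these assumptions
```

Under these assumed numbers, batching cuts the WINDOW_UPDATE count by a factor of 32; the real gain depends on frame sizes and the threshold in use.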
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.