Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve throughput when vectored IO is not enabled #712

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

xiaoyawei
Copy link
Contributor

@xiaoyawei xiaoyawei commented Sep 4, 2023

As discussed in #711, the current implementation of sending data is suboptimal when vectored I/O is not enabled: data frame's head is likely to be sent in a separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O case. In short, it sets a larget chain threhold, and also makes sure a data frame's head is sent along with the beginning part of the real data payload.

All existing unit tests passed. Also I take a look at the e2e https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs but realize that all the benchmarks there are for the case of vectored I/O if the OS supports vectored I/O. There isn't a specific case for non-vectored I/O so I am not sure how to proceed with benchmark for performance evaluations. @seanmonstar Would appreciate if there's some advice on how to do benchmarks for this change.

This PR also includes a trivial bug fix, and addressing a straightforward TODO comment

As discussed in hyperium#711, the
current implementation of sending data is suboptimal when vectored
I/O is not enabled: data frame's head is likely to be sent in a
separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O
case. In short, it sets a larget chain threhold, and also makes
sure a data frame's head is sent along with the beginning part of
the real data payload.

All existing unit tests passed. Also I take a look at the e2e
https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs
but realize that all the benchmarks there are for the case of
vectored I/O if the OS supports vectored I/O. There isn't a specific
case for non-vectored I/O so I am not sure how to proceed with
benchmark for performance evaluations.
@seanmonstar
Copy link
Member

There isn't a specific case for non-vectored I/O so I am not sure how to proceed with benchmark for performance evaluations.

For comparison purposes, you could write a struct NonVecIo<T>(T), and implement AsyncRead and AsyncWrite for it, forwarding each method on, except for is_write_vectored(). You could do that in the end_to_end.rs file, and compare before and after. I don't think we need commit that change, though...

@xiaoyawei
Copy link
Contributor Author

xiaoyawei commented Sep 15, 2023

@seanmonstar I forked hyper and put together a commit xiaoyawei/hyper@6f3c379 to benchmark vectored io. In my test setup, parallelism is 10, both request and response payload is of 500 bytes. I run the benchmark with h2 0.3.9 and this fix 3 times, and the result is as follows

0.3.9

  • 242576 ns/iter (+/- 32535)
  • 257091 ns/iter (+/- 82970)
  • 258892 ns/iter (+/- 216218)

w/ this fix

  • 184308 ns/iter (+/- 57584)
  • 181720 ns/iter (+/- 51599)
  • 179991 ns/iter (+/- 38387)

Looks like the perf improvement is pretty solid

@seanmonstar
Copy link
Member

Nice! Thanks for putting that together.

@nox or @Noah-Kennedy, does this seem good in your cases too?

@Noah-Kennedy
Copy link
Contributor

I'll take a look today. Not sure if @nox has opinions or not.

.gitignore Outdated Show resolved Hide resolved
@xiaoyawei
Copy link
Contributor Author

@seanmonstar @Noah-Kennedy

I removed the gitignore rules; let me know if there are other feedback, thanks ;)

@seanmonstar seanmonstar merged commit a3f01c1 into hyperium:master Sep 28, 2023
@seanmonstar
Copy link
Member

Excellent work, thank you for the benchmark results, they certainly help in feeling confident with the change.

@xiaoyawei xiaoyawei deleted the data_frame_overhead branch November 14, 2023 11:16
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 11, 2024
As discussed in hyperium#711, the
current implementation of sending data is suboptimal when vectored
I/O is not enabled: data frame's head is likely to be sent in a
separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O
case. In short, it sets a larget chain threhold, and also makes
sure a data frame's head is sent along with the beginning part of
the real data payload.

All existing unit tests passed. Also I take a look at the e2e
https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs
but realize that all the benchmarks there are for the case of
vectored I/O if the OS supports vectored I/O. There isn't a specific
case for non-vectored I/O so I am not sure how to proceed with
benchmark for performance evaluations.
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 11, 2024
As discussed in hyperium#711, the
current implementation of sending data is suboptimal when vectored
I/O is not enabled: data frame's head is likely to be sent in a
separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O
case. In short, it sets a larget chain threhold, and also makes
sure a data frame's head is sent along with the beginning part of
the real data payload.

All existing unit tests passed. Also I take a look at the e2e
https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs
but realize that all the benchmarks there are for the case of
vectored I/O if the OS supports vectored I/O. There isn't a specific
case for non-vectored I/O so I am not sure how to proceed with
benchmark for performance evaluations.
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 16, 2024
As discussed in hyperium#711, the
current implementation of sending data is suboptimal when vectored
I/O is not enabled: data frame's head is likely to be sent in a
separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O
case. In short, it sets a larget chain threhold, and also makes
sure a data frame's head is sent along with the beginning part of
the real data payload.

All existing unit tests passed. Also I take a look at the e2e
https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs
but realize that all the benchmarks there are for the case of
vectored I/O if the OS supports vectored I/O. There isn't a specific
case for non-vectored I/O so I am not sure how to proceed with
benchmark for performance evaluations.

Signed-off-by: Sven Pfennig <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants