-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timeouts firing while tarballs are extracted #6130
Conversation
r? @matklad (rust_highfive has picked a reviewer for you, use r? to override) |
r? @ehuss |
Can this synchronous work cause Cargo to pause reading bytes from a TCP socket that is in the middle of fetching another HTTP response? Not timing out unnecessarily in the client is good, but it’d also be good if we don’t give servers a reason to time out. |
It can indeed! That's where to fix it we'll need to do tarball extraction in parallel. |
I've been looking through this. I don't think I fully understand the details, but it seems to work. Would this continue to be necessary if the extraction was done in a separate thread? |
It sort of depend how we'd do extraction, but I think it would be required, yeah. Right now before we actually extract the tarball we reacquire a file lock that was previously dropped after we saw that it didn't exist. This acquisition can take a long time and, if it does, a message is printed to the console saying we're waiting on a lock. We wouldn't want to fork of 5 tarball extractions to all print "Waiting on ..." all at once, so I think the main thread will continue to be the one to acquire file locks. To that end we can still have long pauses on the main thread we'll need to account for in timing out. I am slightly worried though in that this feature is definitely balooning in complexity quickly, although I'm not sure how that's best handled... |
☔ The latest upstream changes (presumably #6166) made this pull request unmergeable. Please resolve the merge conflicts. |
This is what curl recommends we use so let's be sure to not mess up any timers by accident!
Prevoiusly retries were tracked per-session, which meant that if you were downloading 100 packages and they all failed spuriously for the same reason you'd immediately abort. Instead though let's keep track of retries per package so if they're all coupled on one connection a failure of one will end up retrying all of them. We want to make sure that we actually retry again!
This commit fixes rust-lang#6125 by ensuring that while we're extracting tarballs or doing other synchronous work like grabbing file locks we're not letting the timeout timers of each HTTP transfer keep ticking. This is curl's default behavior (which we don't want in this scenario). Instead the timeout logic is inlined directly and we manually account for the synchronous work happening not counting towards timeout limits. Closes rust-lang#6125
a52c731
to
0b0f089
Compare
Rebased! |
ping @ehuss, have you had more time to think about this? With the beta cutoff happening soon I think we'll definitely want to merge this or back out parallel donwloads on the beta branch |
My concern is that this seems to add a lot of complexity. If the core issue is that Also, I'm not sure if I see it being as critical, since parallel downloads is opt-in, and it only seems to have an issue in a pathological case. Unless I'm missing something? I would definitely opt to merge this over backing anything out. |
Is it? When I hit #6125 I’m pretty sure I had upgraded the toolchain without opting into anything new. |
Also you can call this case pathological but it was with a real crate, not a synthetic stress test. |
Oh so I should probably have been more clear, but #6005 actually did enable parallel downloads by default. It just disabled HTTP/2 by default! The parallel downloads part comes about from the restructuring of the code to use curl's This indeed is a pathological case if we have one-at-a-time downloads, but with parallel I think it can actually happen somewhat commonly! @ehuss this does remind me, though, that I could work on an alternative patch to switch to one-by-one downloads using the same infrastructure we have now, I think that'd be a much smaller patch compared to this. Do you think that'd be best to land for beta? |
Ah, for some reason I was thinking it only mattered that the http/2 connection was being held open, but I was mistaken of the other scenario with unfinished transfers in flight. I think the background extraction would be the best approach. It helps everyone, and doesn't look too risky. Do you not agree? (Also, I think being able to configure the concurrency would be good regardless what else we do.) |
Yeah background extraction is definitely the best way to go here long term I think, I was just brainstorming other short-term solutions while that's worked on. If you've got it almost ready to go though then we can do that instead! |
I can probably post it today or tomorrow, and we can see how risky it looks. I just had some questions, which I guess I'll just ask here:
|
While |
I spent some time cleaning up my background branch, and decided it is probably too risky. I'm unhappy with the way it turned out. Some problems I ran into:
On the bright side, it does fix the original issue, and it does make things slightly faster (~20% faster for that version of servo on a slow hard drive). Here's the code if you want to look: ehuss@872bcb6 Let me know if you'd still be interested in a PR. So I think I'm r+ this PR if you are OK with it. Or if you want to do something different, let me know and I'll try to help. I'm not sure if you meant by the one-by-one comment of just changing the |
Ok thanks for the info! I'm ok going with this version for now and we can always figure out how to get async extraction later should be fine. I suspect that we'll want to go "full async" here eventually and basically have an async event loop (like curl is already) where work for tarball extraction and/or acquiring file locks is all farmed out to this async loop. That means that something like a download would simply return a That's a bit of a larger scale refactor but I think it's probably the best looking implementation in terms of cleanliness and whatnot? |
Oh and for one-by-one it's not actually as easy as I thought it would be to retrofit, but I don't think it'd be the end of the world to throw it in last second. |
@alexcrichton Have you decided what you want to do about this? |
Oh jeez I've sort of forgotten the context here... I think I'd probably prefer to land this as-is at this point, I'm finding it hard keeping all other possibilities in my head :( |
@bors r+ Yea, I don't see anything specifically wrong with this other than it is complicated. Maybe someday in the future it will be easier to extract asynchronously which might make this less necessary. |
📌 Commit 0b0f089 has been approved by |
⌛ Testing commit 0b0f089 with merge 022bfa397829fdbcbd283fd11aa2898402c0b56d... |
💔 Test failed - status-appveyor |
Fix timeouts firing while tarballs are extracted This commit fixes #6125 by ensuring that while we're extracting tarballs or doing other synchronous work like grabbing file locks we're not letting the timeout timers of each HTTP transfer keep ticking. This is curl's default behavior (which we don't want in this scenario). Instead the timeout logic is inlined directly and we manually account for the synchronous work happening not counting towards timeout limits. Closes #6125
☀️ Test successful - status-appveyor, status-travis |
[beta]: Fix timeouts firing with tarball extraction This is a backport of #6130 to the 1.31.0 branch
This commit switches the timeout logic implemented in rust-lang#6130 to timeout an entire batch of downloads instead of each download individually. Previously if *any* pending download didn't receive data in 30s we would time out, or if *any* pending download didn't receive 10 bytes in 30s we would time out. On very slow network connections this is highly likely to happen as a trickle of incoming bytes may not be spread equally amongst all connections, and not all connections may actually be active at any one point in time. The fix is to instead apply timeout logic for an entire batch of downloads. Only if zero total data isn't received in the timeout window do we time out. Or in other words, if any data for any download is receive we consider it as not being timed out. Similarly any progress on any download counts as progress towards our speed limit. Closes rust-lang#6284
Timeout batch downloads, not each download This commit switches the timeout logic implemented in #6130 to timeout an entire batch of downloads instead of each download individually. Previously if *any* pending download didn't receive data in 30s we would time out, or if *any* pending download didn't receive 10 bytes in 30s we would time out. On very slow network connections this is highly likely to happen as a trickle of incoming bytes may not be spread equally amongst all connections, and not all connections may actually be active at any one point in time. The fix is to instead apply timeout logic for an entire batch of downloads. Only if zero total data isn't received in the timeout window do we time out. Or in other words, if any data for any download is receive we consider it as not being timed out. Similarly any progress on any download counts as progress towards our speed limit. Closes #6284
This commit switches the timeout logic implemented in rust-lang#6130 to timeout an entire batch of downloads instead of each download individually. Previously if *any* pending download didn't receive data in 30s we would time out, or if *any* pending download didn't receive 10 bytes in 30s we would time out. On very slow network connections this is highly likely to happen as a trickle of incoming bytes may not be spread equally amongst all connections, and not all connections may actually be active at any one point in time. The fix is to instead apply timeout logic for an entire batch of downloads. Only if zero total data isn't received in the timeout window do we time out. Or in other words, if any data for any download is receive we consider it as not being timed out. Similarly any progress on any download counts as progress towards our speed limit. Closes rust-lang#6284
This commit fixes #6125 by ensuring that while we're extracting tarballs
or doing other synchronous work like grabbing file locks we're not
letting the timeout timers of each HTTP transfer keep ticking. This is
curl's default behavior (which we don't want in this scenario). Instead
the timeout logic is inlined directly and we manually account for the
synchronous work happening not counting towards timeout limits.
Closes #6125