-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-deterministic overflow in rustc-rayon-core causing CI failures #90227
Comments
. rust-lang#89776 enabled overflow checks in CI but these lead to two failures already: rust-lang#90042 (comment) rust-lang#90222 (comment) The (first?) problem has been identified: rust-lang#90227 This PR temporarily disables the overflow checks again so we don't have to deal with the "spurious" CI failures until rustc-rayon is fixed. Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/ci.20failed.20with.20.20.22attempt.20to.20subtract.20with.20overflow.22
@rustbot label +T-compiler +A-parallel-queries |
…rflow-checks, r=Mark-Simulacrum Temporarily turn overflow checks off for rustc-rayon-core The rustc fork of Rayon has deadlock detection code which intermittently causes overflows in the CI (see rust-lang#90227). So, as a workaround, we unconditionally turn overflow checks off for this crate only. This workaround should be removed once rust-lang#90227 is fixed. r? `@Mark-Simulacrum` cc `@matthiaskrgr`
Interestingly enough the errors only appeared on Windows, in rustdoc and in builds which don't have the parallel queries enabled. So I guess Rustdocs use of rustc-rayon triggers it? Lines 66 to 77 in d1d8145
|
cc @jyn514 since IIRC you were looking into the DocFS code at some point recently |
I don't have any insight here, I think this just happens to be the only part of the compiler using rayon when cfg(parallel_compiler) is disabled. |
Hm... I would suggest that rustdoc probably wants to use mainline rayon here, rather than rustc's fork of it, since I/O code likely doesn't want to interop with jobserver and such. I don't know whether that will fix this particular issue, but it seems like a good idea. |
@Mark-Simulacrum what jobserver are you referring to here? I don't know what the differences between mainline rayon and rustc's fork are. |
The fork is primarily intended to interface with the jobserver that Cargo creates (though I don't know that rustdoc ever configures it as such) to ensure that rayon's worker threads are not using more than the allocated share of CPU resources. This is the same jobserver that manages e.g. LLVM parallelism within each rustc compilation, for example, to ensure there's at most "ncpu" threads running at the same time. For I/O heavy work like this, I'd expect that you don't want that - your threads are mostly blocked on I/O completion which is not CPU-heavy for the most part. |
The rustc fork of rayon integrates with Cargo's jobserver to limit the amount of parallelism. However, rustdoc's use case is concurrent I/O, which is not CPU-heavy, so it should be able to use mainline rayon. See this discussion [1] for more details. [1]: rust-lang#90227 (comment) Note: I chose rayon 1.3.1 so that the rayon version used elsewhere in the workspace does not change.
rustdoc: Switch to mainline rayon The rustc fork of rayon integrates with Cargo's jobserver to limit the amount of parallelism. However, rustdoc's use case is concurrent I/O, which is not CPU-heavy, so it should be able to use mainline rayon. See [this discussion][1] for more details. [1]: rust-lang#90227 (comment) Note: I chose rayon 1.3.1 so that the rayon version used elsewhere in the workspace does not change. r? `@Mark-Simulacrum` cc `@jyn514`
Is that really the right solution? It seems like the overflow checks have found a bug in Isn't the goal of adding overflow checks to uncover latent bugs and then fix them rather than finding new ways to ignore them? |
We should still try to fix the bug, absolutely, and investigate whether it manifests in upstream rayon as well. However, I don't think that changes the assessment that relying on upstream rayon is likely better than the fork in the long run for rustdoc's use case. |
Yeah, we switched to mainline rayon mostly for other reasons. That's why I left this issue open :) |
@hkratz Please paste the contents of links in issues instead of just providing a URL. Those links to CI logs now dead, so the only hint now about how to reproduce this is
|
It seems that enabling overflow checks (#89776) uncovered non-deterministic overflows in rustc-rayon-core causing the build failures of #90042 and #90222:
Overflow location:
https://github.com/rust-lang/rustc-rayon/blob/c8ec88d8a2236d1fe19d65a4ab38834f76d256b7/rayon-core/src/sleep/mod.rs#L330
This code is not in upstream rayon but just in our fork, introduced here:
rust-lang/rustc-rayon@27911f7#diff-cde9d726ca2f32c319420be6c61d2e57ad9daff7f85a5f8f7be25474a15f9c4dR328
The text was updated successfully, but these errors were encountered: