build error in one module aborts (successful) compilation of parallel module builds #1
In a setup of modules like A, B, C, D, where D depends on A, B, C: if there is an error in C, then after a `cabal clean`, `cabal build -wghc-shake` does roughly the following: … and this repeats when you try to recompile. I.e. A and B never finish compilation, wasting CPU cycles at each build while the programmer works on fixing C.
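A minimal module layout matching that description (hypothetical file contents; only the deliberate type error in C.hs matters):

```haskell
-- A.hs
module A where
a :: Int
a = 1

-- B.hs
module B where
b :: Int
b = 2

-- C.hs
module C where
c :: Int
c = "oops"  -- deliberate type error: C always fails to compile

-- D.hs
module D where
import A
import B
import C
d :: Int
d = a + b + c
```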
In Shake, if one rule fails, that's an immediate failure of all rules currently running. Since failing is typically faster than succeeding, that's a common scenario. The way to avoid it is to enable staunch mode, which causes Shake to continue until there is nothing further it can do, at the cost of delaying the errors. While the above setup does waste CPU cycles, it doesn't waste any programmer time until C successfully compiles, and even then it only "wastes" the time between finishing C and finishing A. I think that makes it a sensible trade-off.
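For reference, in a plain Shake build script staunch mode is the `shakeStaunch` field of `ShakeOptions` (and, if I remember Shake's flag set correctly, the `--keep-going` flag for `shakeArgs`-based builds); how ghc-shake should expose it is a separate question. A minimal sketch:

```haskell
import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions{shakeStaunch = True} $ do
    want ["A.o", "B.o", "C.o"]
    "*.o" %> \out -> do
        let src = out -<.> "hs"
        need [src]
        -- with shakeStaunch = True, a failure here no longer aborts
        -- the sibling rules that are still running
        cmd_ "ghc" ["-c", src]
```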
oh, but it does waste programmer time. Failing on purely C is quicker than failing on (A|B|C) in the very basic testing i just did. My results were roughly:

- 4.0s for a plain cabal build;
- 3.7s for cabal build with ghc-shake;
- 2.7s for directly invoking ghc on the failing module (i.e. without --make and with exactly one module, but otherwise the exact same flags as the cabal invocation);
- ~5.9s for cabal build with ghc-shake in the presence of the above behaviour.
To enable staunch mode with ghc-shake, pass …
to be fair, my testcase may be way too simple. this behaviour might add a small worst-case constant factor, while greatly improving the linear factor in general project size (or the size of the project dependency graph, or whatever).
I'm confused: are you saying it's 3.7s if the error is missing, but 5.9s if the error is present? I would expect the time to report a build failure in C to be either T(C) (the time to compute the error), or X + T(C), where X is useful work done before starting C, and thus work that is not repeated next time.
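To put numbers on that expectation (my reading of the figures above, taking the 2.7s direct-ghc run as an approximation of T(C)): with A, B and C started in parallel, the failure should surface at roughly T(C) plus cabal/ghc-shake startup overhead, i.e. about 2.7s + 1s ≈ 3.7s. The observed ~5.9s is more than 2s beyond that, which is the part that needs explaining: either C starts late, or C's failure does not actually interrupt A and B.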
i can work around the above by fixing C, then compiling A, B, C, then breaking C again. Now the failing build takes 3.7s. After a clean (and corresponding to the first comment's output) the time is 5.9s.
Interesting! I don't think that should be the case... Getting some kind of trace is probably desirable, something like http://shakebuild.com/profiling#chrome-traces. However, I don't know how well ghc-shake meshes with the existing profiling, and the profiling wasn't designed for failed builds, so I don't really know what it does in those circumstances.
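For a plain Shake build the trace comes from the `shakeReport` field of `ShakeOptions` (or the `--report` command-line flag); as I read the linked docs, a report path ending in `.trace` selects the Chrome trace format that chrome://tracing can open. A sketch, leaving aside how ghc-shake plumbs the option through:

```haskell
import Development.Shake

main :: IO ()
main = shake shakeOptions{shakeReport = ["build.trace"]} $ do
    -- placeholder rule; a real build system would compile modules here
    want ["out.txt"]
    "out.txt" %> \out -> writeFile' out "done"
```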
Looking at it visually, does C seem to start later in the slower trace, or does it seem that the time from C failing to the end is longer? |
I would assume they started simultaneously, but I'd have to check again. Will also look into those traces, but i gotta run for now. |
If they do start simultaneously, it suggests that the action being performed to build a GHC file is not being interrupted properly. I just confirmed in a small test that Shake really does interrupt tasks as soon as one fails, so it's one for @ezyang to see what that particular task is doing.
There's now a test in Shake that ensures that a build with a fast-failing computation and a slow-succeeding computation aborts quickly.
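The shape of such a test, sketched against Shake's public API (hypothetical target names, not the actual test from the Shake repository):

```haskell
import Development.Shake
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
    start <- getCurrentTime
    res <- try $ shake shakeOptions{shakeThreads = 2} $ do
        want ["fast", "slow"]
        phony "fast" $ error "deliberate fast failure"
        phony "slow" $ liftIO $ threadDelay 10000000  -- 10 seconds
    end <- getCurrentTime
    print (res :: Either SomeException ())
    -- if the failure interrupts the slow rule, this prints well under 10s
    print (diffUTCTime end start)
```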
Tested; i even observed a case where the failing module was first now (i.e. …). @ndmitchell: but more work being done in parallel could just result in each task taking longer (if the task itself uses multiple cores), right?
Yes, more work could slow down each individual piece of work. It's possible, but I'd say it's pretty unlikely. Can you use some system task manager (e.g. Process Explorer) to check that your CPU load doesn't hit 100%? If you have plenty of headroom it's unlikely to be that.
When I try @lspitzner's example in staunch mode, the error gets reported quite quickly. So it is unlikely that extra CPU use is causing the problem; rather, it is a combination of two things: (1) without staunch mode, Shake waits till it has killed everything before reporting the exception (that's partially my fault, as I pass …), and (2) GHC is not killed promptly by asynchronous exceptions.
Shake waits until everything is killed because that's the most hygienic thing to do - otherwise you can't guarantee someone isn't going to access the database afterwards, when it would no longer be valid. GHC not being killed by async exceptions is the thing to look at.
i do see high (a good 90%) cpu usage during the parallel A, B, C case. for me, whenever i was in any kind of "error loop", i never observed any noticeable delay between the printing of the error message and termination.
So the inability to kill things in a prompt fashion is due to ndmitchell/shake#474, since ghc-shake is still pegged to an old, patched version of Shake which leaks masking information. I wonder if GHC's --make is able to kill things promptly; it's hard to tell because the output is interleaved too coarsely, see https://ghc.haskell.org/trac/ghc/ticket/12349
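As a minimal illustration of the masking problem in question (plain GHC concurrency primitives, nothing Shake- or ghc-shake-specific): a thread running under an uninterruptible mask cannot receive an async exception, so `killThread` from the coordinator blocks until the mask ends, which is exactly the "can't kill things promptly" symptom:

```haskell
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
    -- the worker shields itself from async exceptions for 5 seconds
    t <- forkIO $ uninterruptibleMask_ $ threadDelay 5000000
    threadDelay 100000  -- give the worker time to enter the mask
    killThread t        -- blocks for the remaining ~4.9 seconds
    putStrLn "worker finally killed"
```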