-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve CI performance for Windows #2600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me, just added a nit :) Thanks for looking into this!
(Re-requesting your reviews to make sure I handled your comments satisfactory.) |
.github/workflows/windows.yaml
Outdated
@@ -7,23 +7,25 @@ jobs: | |||
test: | |||
env: | |||
DOWNLOAD_CACHE: 'd:\downloadcache' | |||
GOPATH: 'd:\a\_temp\go\work' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what we're after is putting both the go build/test cache and mod cache in this directory. I think I'd rather we use GOMODCACHE
instead of GOPATH
here then (just in case anything else sneaks in under GOPATH
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, sounds good. Since this basically involves breaking the cache again, I'm going to go ahead and give a shot at making the setup-go@v4 path work; these are the two directories that setup-go@v4 caches, so it ought to be at least similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - just one small suggestion.
OK, now with Performance so far:
OK, still looks good. Nice. The one last thing would be to check to see if we really need the silly _temp dir. @pkwarren has been instrumental in finding underlying issues relating to this problem and found actions/setup-go#393 which suggests that it's not that the temp directory is fast, it's that C: is slow. So in theory, we can move the cache dirs to just D: and it will be fast still. |
And the verdict is in: no weird For more information on what's going on, see this issue. Although it's not said explicitly, I have a hunch at what the problem might be. C: drive obviously needs to contain the operating system and basically all of the runner image defaults, whereas D: is just a fresh, new drive. Internally this is probably implemented as creating a disk from a snapshot. And when creating a disk from a snapshot in a VM, it's pretty typical for the performance to be slow. In fact, even the Azure documentation references this. It's not clear that this is exactly how GitHub Actions works internally, but it would suffice to say that the constraint that C: drive must contain all of the runner image contents probably makes it a lot slower due to needing copy-on-write behavior. So now, and hopefully for the last time...
Phew. This one was really tricky and had a lot of unexpected twists and turns. Thanks @pkwarren for the help figuring things out, and I hope everyone feels I addressed their comments sufficiently. I'll leave this open for a few more minutes to make sure nobody else has anything to say. |
Sometimes the existing workflow runs quickly - sometimes it runs incredibly slowly.
This seems to come down to a disk I/O issue. The cache tarball will get downloaded very quickly, but then it will take minutes to untar. It's unclear why. I suspect there is something in particular wrong with the home directory in the Windows runner. In addition,
go test
itself seems to have unreliable performance characteristics.Merely just moving the Go cache and path into the GitHub Actions
_temp
dir seems to vastly improve consistency. I ran the workflow multiple times to collect some runs:The cache restore times are reliably counted in seconds. There is still some run-to-run variance, but pathological behavior seems to have disappeared, both for
go test
and cache restore. While this doesn't bring performance quite up to par with the other workflows, it does seem to help quite a bit.I'm really hoping I didn't miss anything here. Any reason why I may have tricked myself into thinking I solved the problem? :)