Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CI performance for Windows #2600

Merged
merged 5 commits into from
Nov 17, 2023
Merged

Conversation

jchadwick-buf
Copy link
Member

Sometimes the existing workflow runs quickly - sometimes it runs incredibly slowly.

This seems to come down to a disk I/O issue. The cache tarball will get downloaded very quickly, but then it will take minutes to untar. It's unclear why. I suspect there is something in particular wrong with the home directory in the Windows runner. In addition, go test itself seems to have unreliable performance characteristics.

Merely just moving the Go cache and path into the GitHub Actions _temp dir seems to vastly improve consistency. I ran the workflow multiple times to collect some runs:

The cache restore times are reliably counted in seconds. There is still some run-to-run variance, but pathological behavior seems to have disappeared, both for go test and cache restore. While this doesn't bring performance quite up to par with the other workflows, it does seem to help quite a bit.

I'm really hoping I didn't miss anything here. Any reason why I may have tricked myself into thinking I solved the problem? :)

Copy link
Member

@doriable doriable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me, just added a nit :) Thanks for looking into this!

.github/workflows/windows.yaml Outdated Show resolved Hide resolved
@jchadwick-buf
Copy link
Member Author

(Re-requesting your reviews to make sure I handled your comments satisfactory.)

@@ -7,23 +7,25 @@ jobs:
test:
env:
DOWNLOAD_CACHE: 'd:\downloadcache'
GOPATH: 'd:\a\_temp\go\work'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we're after is putting both the go build/test cache and mod cache in this directory. I think I'd rather we use GOMODCACHE instead of GOPATH here then (just in case anything else sneaks in under GOPATH).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good. Since this basically involves breaking the cache again, I'm going to go ahead and give a shot at making the setup-go@v4 path work; these are the two directories that setup-go@v4 caches, so it ought to be at least similar.

Copy link
Member

@pkwarren pkwarren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - just one small suggestion.

@jchadwick-buf
Copy link
Member Author

OK, now with setup-go@v4 caching. Hopefully this will be a lot more pleasing to people.

Performance so far:

OK, still looks good. Nice.

The one last thing would be to check to see if we really need the silly _temp dir. @pkwarren has been instrumental in finding underlying issues relating to this problem and found actions/setup-go#393 which suggests that it's not that the temp directory is fast, it's that C: is slow. So in theory, we can move the cache dirs to just D: and it will be fast still.

@jchadwick-buf
Copy link
Member Author

jchadwick-buf commented Nov 17, 2023

And the verdict is in: no weird _temp directory needed.

For more information on what's going on, see this issue. Although it's not said explicitly, I have a hunch at what the problem might be. C: drive obviously needs to contain the operating system and basically all of the runner image defaults, whereas D: is just a fresh, new drive. Internally this is probably implemented as creating a disk from a snapshot. And when creating a disk from a snapshot in a VM, it's pretty typical for the performance to be slow. In fact, even the Azure documentation references this. It's not clear that this is exactly how GitHub Actions works internally, but it would suffice to say that the constraint that C: drive must contain all of the runner image contents probably makes it a lot slower due to needing copy-on-write behavior.

So now, and hopefully for the last time...

Phew. This one was really tricky and had a lot of unexpected twists and turns. Thanks @pkwarren for the help figuring things out, and I hope everyone feels I addressed their comments sufficiently. I'll leave this open for a few more minutes to make sure nobody else has anything to say.

@jchadwick-buf jchadwick-buf merged commit 11089b9 into main Nov 17, 2023
7 checks passed
@jchadwick-buf jchadwick-buf deleted the jchadwick/faster-windows-ci branch November 17, 2023 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants