-
Notifications
You must be signed in to change notification settings - Fork 746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more parameters for caching #551
Comments
Hi, @sta-szek 👋 Thanks for the feature request, we will take a look at it! |
I have a similar problem (it never finds the cache, maybe related to subdir, not sure). But while investigating this I wondered why it would need a cache key based on the POM at all. Because even if the Pom has changed the old m2 repo is more worth than none. In fact only drastic changes to the Pom (complete new repo url) would partially justify it. so I had the idea you can (meanwhile?) specify the dependency file which is used for the hash. What I try is to use the .github/workflow file instead. This should stay constant most of the time. Would that solve your problem as well? Minor thing the „cache not found“ message in setup Java should probably print the key (and also the matched dependency files), that would make my debugging easier. (edit, turning on debug modes prints the details. Turns out the not cached had two reasons, one that Pom changed nearly every time and the second that the cache which was hashed two times was meanwhile removed since so many other one shot cache uploads purged it) |
If you use an older cache as a base, dirty entries will never get removed and the cache will become too large (either for GitHub to store, or to be more efficient to restore than simply downloading what you actually need). Related discussion here - #269 |
Description:
Expose
actions/cache
properties inactions/setup-java
:restore-keys
->cache-restore-keys
key
->cache-key
Justification:
Current implementation has some drawback that prevent us from using caching functionallity in
setup-java
.The problem is that
cache-key
(docs) for maven is hardcoded to all pom.xml files in given repository.It means, whenever somebody decide to change any property in pom like: name, description, version, scm details, build configuration the action will compute new hash and will create new cache.
Here I believe this is kind of trade-off, with assumption that changes to pom.xml contains only dependency changes, while in our case 80% of changes are versions/description.
Problem 1: 80% chache miss
That is problem when we have 100 active projects in one repo and every project at least once per day change such details (mostly version).
That causes the cache to be missed and new is created, which obviously causes that all dependencies are re-downloaded.
In given case, having cache is the same as not having cache -> in most cases the key is different and new is created.
Effect: 80% of time cache does not work.
Problem 2: Cache race condition
Another problem is that when there are multiple workflows for the same project, imagine two: build app, build docs.
That is common practice on our side to split workflows in order to be able to rerun individual failed WF, not whole build.
In such case, if both action use cache, then there will be created two cache entries:
And I noticed that somehow this first (5mb) wins all the time, so next time when we try to use cache it again download missing 295mb of data...
Effect: 100% of time, the faster (less dependencies) workflow wins, causing the cache to be "almost empty" and we need to always download new files
Problem 3: Project race condition
On top of that, since the
cache-key
uses all pom.xml, we have single hash for many projects, which obviously have different dependencies.E.g.
projectA
- 100 dependencies from googleprojectB
- 50 dependencies from githubIn given case,
projectB
will use the 5MB cache fromprojectA
which are completely different.So it will download all dependencies again and at the end cache wont be updated.
Where i think the problem lays:
I also found this line , which says:
Somehow i have the feeling the assumption was done having single-project repo in mind and also that in pom.xml we can change dependencies only.
Proposal 1: Reconfigure default
cache-key
andcache-restore-keys
Here the idea would be of building cache gradually, for all apps single cache.
That would allow to:
This idea is not perfect - it can cause huge cache size and very often cache deleteion when cache size hits 10gb the old would be deleted.
But at least it allows to effectivly use caches, or rather to allow to use cache at all in huge monorepos.
Also, maybe day-of-year and timestamp are not the best. However the idea stays the same: try to find biggest cache and extend it, save at the end so next apps can extend more.
Proposal 2: Allow to configure
cache-key
andcache-restore-keys
So everybody can control on his own.
Currently i unfortunately have to add
actions/cache
on my own, but i will also have better control.Still i think we could improve the situation with proposal 1 so more people can benefit.
Proposal 3: Forcibly override cache
setup-java-${{ platform }}-${{ packageManager }}-${{ fileHash }}
would work if we can forcibly override it, so different apps can "add their dependencies" to cache.By overriding forcibly i mean to really override if GH allows or delete and create.
Not sure how it will behave if we try to delete while another WF is downloading.
All in all, i have a bit feeling that the cache is working fine with singlerepo and monorepo which is having very simple workflows that run few times per day.
Once we put a bit more load 100+ project with 500+ builds where 300 of them are chaning pom.xml, then it does not work well. 🤔
What do you think?
The text was updated successfully, but these errors were encountered: