Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache: only store executable permission bit #11541

Merged
merged 2 commits into from
Mar 25, 2025
Merged

Cache: only store executable permission bit #11541

merged 2 commits into from
Mar 25, 2025

Conversation

ElectreAAS
Copy link
Collaborator

@ElectreAAS ElectreAAS commented Mar 17, 2025

This PR is built on top of the not-yet-merged #11534 main!, and includes the fix proposed in #11533.
We just stop storing the read & write permissions bits, only store the executable one. The strategy is the same as in git, we check if any of the u+x/g+x/o+x bits are set.

@ElectreAAS ElectreAAS requested a review from emillon March 17, 2025 14:59
@maiste maiste added the shared-cache Shared artefacts cache label Mar 17, 2025
@Leonidas-from-XIV
Copy link
Collaborator

I've merged #11534 so you can rebase now :-)

@ElectreAAS
Copy link
Collaborator Author

Done!

{ st_kind = stats.st_kind; st_perm = stats.st_perm }
(* Check if any of the +x bits are set, ignore read and write *)
let executable = 0o111 land stats.st_perm <> 0 in
{ st_kind = stats.st_kind; executable }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda wonder whether Path.Permissions.executable would make sense to be used here, but maybe it is not worth the hassle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path.Permissions.(test executable) will only test if the flag is set for the current user (u+x, or 0o100)
I wanted here to explicitly test if that flag is set for anyone.
Granted, files with [u-x g+x o+x] are probably few and far between, but still.

@Alizter
Copy link
Collaborator

Alizter commented Mar 19, 2025

Is there a discussion of this design change anywhere? I'm curious for the motivation.

@ElectreAAS
Copy link
Collaborator Author

There is only the discussion happening in the issue #11533. My motivation was fixing the issue :)
In a conversation with emillon we also linked https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/ContentLimitations.html for the info of "git only stores the +x flag"

@Alizter Alizter requested a review from rgrinberg March 19, 2025 16:16
@maiste
Copy link
Collaborator

maiste commented Mar 20, 2025

I'm curious: do we know the effect it will have on existing cache codebase? Will it retrigger the computation for the already cached elements?

@Leonidas-from-XIV
Copy link
Collaborator

Will it retrigger the computation for the already cached elements?

I would assume so since the change is in Stats_for_digest and that's probably the digest the cache uses to look up files. Which is.. ok I believe, if this is the way we think it should be done.

@emillon
Copy link
Collaborator

emillon commented Mar 21, 2025

Note that we can't just change the behavior and call it a fix. This is observable from build rules.
For example chmodding a file would cause a rebuild of anything that depends on it, and it wouldn't anymore.

Concretely, if you have a rule like this:

(rule
 (with-stdout-to stat.txt
 (run stat file.txt)))

The stat.txt in _build won't be updated when the permissions change on file.txt.

That "simplified" mode like git uses might be beneficial, but it can be surprising, and there might be build rules that depend on the full permissions.

@Leonidas-from-XIV
Copy link
Collaborator

In such case storing and restoring the exact permissions should be 100% backwards compatible, while also solving the issue.

Is it worth adjusting the cache representation to allow to store the permissions? I don't know, it could be also some change that will not make a difference in real life.

@rgrinberg What's your take on this?

@ElectreAAS
Copy link
Collaborator Author

In such case storing and restoring the exact permissions should be 100% backwards compatible, while also solving the issue.

I'm working on a PR to do just that, as I think the naive approach of this PR may be too naive

@rgrinberg
Copy link
Member

Seems fine to me. CHANGES entry please.

@ElectreAAS
Copy link
Collaborator Author

ElectreAAS commented Mar 24, 2025

I've added a CHANGES entry.
I am however still unconvinced by this approach, it feels like a cop out.

I tried doing what is suggested above, you can see my progress [here].(https://github.com/ocaml/dune/compare/main...ElectreAAS:dune:digest-full-perm?expand=1)

but I've ran into a problem: storing the permissions of files means we need to compute (stat) them somewhere, and then carry them to where the store happens. This has meant changing Digest.t Targets.Produced.t into (Digest.t * Unix.file_perm) Targets.Produced.t in a million different places, which feels like I'm taking the wrong approach...

edit: I feel like I'm going somewhere with my other approach, stay tuned

@mefyl
Copy link
Collaborator

mefyl commented Mar 25, 2025

I'm not sure restoring the full permissions is desirable, as I explained in #11533 . Most notably, if dune cache still removes write permissions in hardlink mode the interaction will be weird. Using stat (2), one can also read the number of hardlink references to a file, and that definitely cannot be "restored". This, to my, implies that some file metadata is to be considered out of the scope of dune and that having rules depend on it is unreasonable.

I don't think "group" and "other" permissions belong in a cache or a versioning system since they are local to each machine, I'm not sure it's worth dealing with the weird u-r situation since then dune will not even be able to hash the file and things might not work anyway, and I think the dune cache will force u-w in hardlink mode anyway. I could be misguided, but I think it's worth double checking we actually want to restore these permissions before investing work in it.

Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
@ElectreAAS
Copy link
Collaborator Author

I'm not sure restoring the full permissions is desirable, as I explained in #11533.

Oh sorry I missed your message there. Even if it still feels wrong to change the behaviour, as pointed by emillon, I think you've convinced me, so let's go with this approach.
I've rebased on main, and added the changelog entry, and people seem to generally approve - I'll merge once CI is green

@ElectreAAS ElectreAAS merged commit 0442b11 into main Mar 25, 2025
24 of 25 checks passed
@ElectreAAS ElectreAAS deleted the digest-no-perm branch March 25, 2025 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
shared-cache Shared artefacts cache
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants