Enhance jarmode tools to set timestamps of created directories during extraction for reproducible builds #44910

Open
jkuipers opened this issue Mar 26, 2025 · 11 comments
Labels: status: feedback-provided (Feedback has been provided); status: waiting-for-triage (An issue we've not yet triaged)

Comments

@jkuipers

The Spring Boot jarmode tools are already smart enough to set the various timestamps of extracted entries to those of the jar entries (in ExtractCommand#extractEntry). However, the directories that are created for those entries in ExtractCommand#mkdirs always use the current timestamp.
As a result, when you perform the extraction and add the extracted layers to a Docker build twice, the two images are considered to be different even though they should be considered to be the same.

To help with creating fully reproducible builds, it would be great if those directories could be created with the timestamps of the files that cause them to be created, so that they're always the same. Using fixed timestamps for all created directories would achieve the same effect.
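
The fixed-timestamp variant could look roughly like the following. This is a minimal sketch, not the actual `ExtractCommand` code: the class and method names are hypothetical, and it assumes the stamp is applied once extraction has finished, because on common file systems adding a file to a directory updates that directory's modification time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Spring Boot.
final class FixedDirectoryTimes {

    // Sets the last-modified, last-access and creation times of 'root' and of
    // every directory below it to the same fixed value. Creation time is only
    // changed where the file system supports it.
    static void apply(Path root, FileTime time) throws IOException {
        List<Path> directories;
        try (Stream<Path> paths = Files.walk(root)) {
            directories = paths.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path directory : directories) {
            BasicFileAttributeView view =
                    Files.getFileAttributeView(directory, BasicFileAttributeView.class);
            view.setTimes(time, time, time);
        }
    }
}
```

Calling `FixedDirectoryTimes.apply(destination, FileTime.from(Instant.EPOCH))` after extraction would give every directory the same timestamps on each run, whatever the wall-clock time of the build.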

If you think this is a good idea, I'd be happy to create a pull request for it. Before putting in the effort, I wanted to check what you think.

spring-projects-issues added the "status: waiting-for-triage" label on Mar 26, 2025
@jkuipers
Author

Set_timestamps_of_created_directories.patch
Probably something like this would suffice

@wilkinsona
Member

I think it makes sense to set the creation time. However, I wonder if we should try to use the attributes from the directory entries in the jar. Right now, directory entries are ignored. Instead, we could use them to create the necessary directories on disk with attributes that match those in the jar. WDYT?
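
As a rough illustration of what "using the directory entries" could mean, the sketch below collects the last-modified time of every directory entry in a jar. The class and method names are hypothetical. It relies on the standard `java.util.jar` API, where a directory entry is one whose name ends with `/`, and it assumes that directories without an explicit entry in the jar are simply skipped.

```java
import java.nio.file.attribute.FileTime;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of Spring Boot.
final class JarDirectoryEntries {

    // Returns the last-modified time of each directory entry, keyed by entry
    // name (for example "BOOT-INF/lib/"). Directories that have no entry of
    // their own in the jar do not appear in the result.
    static Map<String, FileTime> lastModifiedTimes(JarFile jar) {
        Map<String, FileTime> times = new LinkedHashMap<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            FileTime lastModified = entry.getLastModifiedTime();
            if (entry.isDirectory() && lastModified != null) {
                times.put(entry.getName(), lastModified);
            }
        }
        return times;
    }
}
```

The resulting map could then be used to set attributes on the matching directories on disk, so that they mirror the jar rather than the time of extraction.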

wilkinsona added the "status: waiting-for-feedback" label on Mar 26, 2025
@jkuipers
Author

jkuipers commented Mar 26, 2025

Yeah, that would work as well. It would require ensuring that all directory entries are processed before any regular file entries are created, but that probably wouldn't be much of an issue. Or are parent directories separate entries? Then they'd need to be ordered according to depth.

spring-projects-issues added the "status: feedback-provided" label and removed the "status: waiting-for-feedback" label on Mar 26, 2025
@jkuipers
Author

Something like this?
Set_timestamps_of_created_directories.patch

@wilkinsona
Member

That approach feels a little brittle to me. If I've read the code correctly, it relies on the entries being processed in a particular order. If that order changes, the attributes on the created directories may change.

The javadoc of java.util.jar.JarFile.entries() makes no mention of ordering while the javadoc of java.util.jar.JarFile.stream() states that it uses the central directory's ordering. So it looks like one option may be to switch to stream().

Another option would be to extract all the files, collecting the directory entries as we go. The directory entries could then be processed and the attributes set on each directory on disk as they should already exist by that point. I think this may be a slightly better approach as it should result in the attributes on disk being the same as those in the source jar.
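
The second option could be sketched as a two-pass extraction. This is only an illustration under stated assumptions, not the actual `ExtractCommand`: the class name is hypothetical, entry names are mapped to paths without any transformation, and times are copied with `BasicFileAttributeView.setTimes`, which leaves a time unchanged when the corresponding argument is `null`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributeView;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch for illustration; not the actual ExtractCommand.
final class TwoPassExtractor {

    static void extract(JarFile jar, Path destination) throws IOException {
        Path root = destination.toAbsolutePath().normalize();
        List<JarEntry> directoryEntries = new ArrayList<>();
        // Pass 1: write the files and remember the directory entries.
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            Path target = root.resolve(entry.getName()).normalize();
            if (!target.startsWith(root)) {
                throw new IOException("Entry outside destination: " + entry.getName());
            }
            if (entry.isDirectory()) {
                Files.createDirectories(target);
                directoryEntries.add(entry);
            }
            else {
                Files.createDirectories(target.getParent());
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                copyTimes(entry, target);
            }
        }
        // Pass 2: every directory now exists and has its final content, so
        // later writes can no longer disturb the times that are set here.
        for (JarEntry entry : directoryEntries) {
            copyTimes(entry, root.resolve(entry.getName()).normalize());
        }
    }

    // Copies the entry's times to the path; null times are left unchanged.
    private static void copyTimes(JarEntry entry, Path path) throws IOException {
        BasicFileAttributeView view = Files.getFileAttributeView(path, BasicFileAttributeView.class);
        view.setTimes(entry.getLastModifiedTime(), entry.getLastAccessTime(), entry.getCreationTime());
    }
}
```

Because the directory attributes are applied only after all files have been written, the result does not depend on the order in which `entries()` or `stream()` returns the entries. Directories that have no entry of their own in the jar would still carry the extraction time in this sketch.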

wilkinsona added the "status: waiting-for-feedback" label and removed the "status: feedback-provided" label on Mar 28, 2025
@jkuipers
Author

jkuipers commented Mar 28, 2025

My idea was to ensure that we create each directory individually, i.e. never with parent directories, so that we can respect the attributes as they exist in the jar (they could differ for parent directories, so you don't want to create multi-level paths).
I did not check if the stream would guarantee any particular ordering for that; instead I created my own, which guarantees that we create the higher-level directories before the lower-level ones and that we only extract the regular files once all their containing directories have been created. I believe that would in the end result in all directories having the same attributes as the jar entries.
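
An ordering of this kind might look like the sketch below. It is not taken from the patch: the class and method names are hypothetical, and it works on entry names only, relying on the zip convention that directory entry names end with `/`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not taken from the patch.
final class EntryOrdering {

    // Number of path segments in an entry name:
    // "a/" -> 1, "a/b/" -> 2, "a/b/c.txt" -> 3.
    static int depth(String entryName) {
        String trimmed = entryName.endsWith("/")
                ? entryName.substring(0, entryName.length() - 1)
                : entryName;
        return trimmed.split("/").length;
    }

    // Returns a copy of the names with directories first (shallowest to
    // deepest), followed by regular files. Ties are broken by name so that
    // the result is deterministic.
    static List<String> extractionOrder(List<String> entryNames) {
        List<String> ordered = new ArrayList<>(entryNames);
        ordered.sort(Comparator
                .comparing((String name) -> !name.endsWith("/"))
                .thenComparingInt(EntryOrdering::depth)
                .thenComparing(Comparator.<String>naturalOrder()));
        return ordered;
    }
}
```

With this order, each directory can be created on its own with `Files.createDirectory`, because its parent is guaranteed to exist already. One caveat, stated as an assumption about the target file system: where writing a file into a directory updates that directory's modification time, a modification time set before the files are extracted may need to be reapplied afterwards.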

Your idea to post-process all directories after extracting all files should have the same end result, so that should work fine as well. Given that the code allows for transformation of directories (I think it's used for the libraries), I believe that would require some additional bookkeeping that's not needed with my approach (because it could be non-trivial to map back to the corresponding directory entry in the jar for certain transformations so you'd need to store the mapping), but honestly I didn't investigate that closely.

spring-projects-issues added the "status: feedback-provided" label and removed the "status: waiting-for-feedback" label on Mar 28, 2025
@wilkinsona
Member

> so that we can respect the attributes as they exist in the jar

As far as I can tell, it doesn't do that. When processing a file entry, it gets its creation time and then uses it to set the attributes on all of the directories needed for that file (the do-while loop in mkdirsWithCreationTime). As a result, isn't it ignoring the attributes of the directory entries in the jar? It's also sensitive to ordering as the attributes on any directories that are created may vary depending on the entry that triggered their creation.

@jkuipers
Author

Ah, I think you're looking at the first patch that I posted. I added another one after you suggested using the directory entry timestamps as they exist in the jar. That one never creates multiple directories with mkdirs; it only creates a single directory based on the jar entry and then updates its timestamps accordingly.

@wilkinsona
Member

wilkinsona commented Mar 28, 2025

🤦 having failed to notice the new and old patches have the same name, that's exactly what I was doing. The new patch looks good to me and a pull request would be most welcome.

@jkuipers
Author

Cool, I'll create something along the lines of that 2nd patch then. Will be a bit more work to do it properly with tests and everything, but I'll make it happen :)

@jkuipers
Author

jkuipers commented Apr 1, 2025

We are still looking into our issue with Docker image diffs, which seem to indicate that the directory timestamps are the only difference. On closer reading of the Docker docs, it seems that these directory timestamps shouldn't actually be significant for the image hash, so it shouldn't be necessary to set the directory timestamps when extracting the fat jar either. That would make this issue moot.

It's still unclear why we are seeing that diff. It could be caused by the old Docker version used for our build, so we're going to update our build server and see if that helps. Will post updates here as we learn more.


3 participants