Enhance jarmode tools to set timestamps of created directories during extraction for reproducible builds #44910
Comments
Set_timestamps_of_created_directories.patch |
I think it makes sense to set the creation time. However, I wonder if we should try to use the attributes from the directory entries in the jar. Right now, directory entries are ignored. Instead, we could use them to create the necessary directories on disk with attributes that match those in the jar. WDYT? |
Yeah, that would work as well. It would require ensuring that all directory entries are processed before any regular file entries are created, but that probably wouldn't be much of an issue. Or are parent directories separate entries? Then they'd need to be ordered according to depth. |
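To make the depth-ordering idea concrete, here is a minimal sketch that sorts directory entry names so that every parent precedes its children. The class and method names (`DirectoryOrdering`, `depth`, `parentsFirst`) are hypothetical and not part of Spring Boot; the sketch only assumes that jar directory entry names are `/`-separated and end with `/`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DirectoryOrdering {

    /** Counts the non-empty path segments of an entry name such as "a/b/". */
    static int depth(String entryName) {
        int depth = 0;
        for (String segment : entryName.split("/")) {
            if (!segment.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    /** Returns a sorted copy in which shallower directories come first. */
    static List<String> parentsFirst(List<String> directoryNames) {
        List<String> sorted = new ArrayList<>(directoryNames);
        // Shallow-to-deep guarantees a parent is seen before any of its children;
        // the name is only a tie-breaker to keep the order deterministic.
        sorted.sort(Comparator.comparingInt(DirectoryOrdering::depth)
                .thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }
}
```

With this ordering, each directory could be created one level at a time, because its parent would already exist on disk.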
Something like this? |
That approach feels a little brittle to me. If I've read the code correctly, it relies on the entries being processed in a particular order. If that order changes, the attributes on the created directories may change. The javadoc of Another option would be to extract all the files, collecting the directory entries as we go. The directory entries could then be processed and the attributes set on each directory on disk as they should already exist by that point. I think this may be a slightly better approach as it should result in the attributes on disk being the same as those in the source jar. |
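The post-processing idea described above could look roughly like the following sketch. It is not the actual `ExtractCommand` code: the class `DeferredDirectoryAttributes` and its methods are hypothetical, and it works directly on `java.util.jar.JarFile` without any of the path transformations the real tool may apply.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class DeferredDirectoryAttributes {

    static void extract(Path jar, Path destination) throws IOException {
        Path root = destination.toAbsolutePath().normalize();
        List<JarEntry> directories = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    directories.add(entry); // attributes are applied later
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                applyTimes(target, entry);
            }
        }
        // All files exist now, so nothing written afterwards can touch the
        // directory timestamps again; the entry order in the jar does not matter.
        for (JarEntry directory : directories) {
            applyTimes(root.resolve(directory.getName()).normalize(), directory);
        }
    }

    private static void applyTimes(Path path, JarEntry entry) throws IOException {
        FileTime modified = entry.getLastModifiedTime();
        if (modified == null) {
            return; // the entry carries no timestamp
        }
        // Null access/creation times leave the corresponding attribute unchanged.
        Files.getFileAttributeView(path, BasicFileAttributeView.class)
                .setTimes(modified, entry.getLastAccessTime(), entry.getCreationTime());
    }
}
```

Directories that have no entry of their own in the jar keep whatever timestamp the file system gave them in this sketch, so a real implementation would still need a policy for those.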
My idea was to ensure that we create each directory individually, i.e. never with parent directories, so that we can respect the attributes as they exist in the jar (they could differ for parent directories, so you don't want to create multi-level paths). Your idea to post-process all directories after extracting all files should have the same end result, so that should work fine as well. Given that the code allows for transformation of directories (I think it's used for the libraries), I believe that would require some additional bookkeeping that's not needed with my approach (because it could be non-trivial to map back to the corresponding directory entry in the jar for certain transformations so you'd need to store the mapping), but honestly I didn't investigate that closely. |
As far as I can tell, it doesn't do that. When processing a file entry, it gets its creation time and then uses it to set the attributes on all of the directories needed for that file (the |
Ah, I think you're looking at the first patch that I posted. I added another one after you suggested using the directory entry timestamp as it exists in the jar. That one never creates multiple directories with mkdirs but only creates a single one based on the jar entry and then updates its timestamps accordingly. |
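For readers without access to the attachment, creating exactly one directory per jar entry might look like the sketch below. This is not the attached patch: `SingleDirectoryCreation` and `createDirectory` are hypothetical names, and the sketch assumes the parent directory already exists on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;
import java.util.zip.ZipEntry;

final class SingleDirectoryCreation {

    static Path createDirectory(Path root, ZipEntry entry) throws IOException {
        if (!entry.isDirectory()) {
            throw new IllegalArgumentException("Not a directory entry: " + entry.getName());
        }
        Path base = root.toAbsolutePath().normalize();
        Path directory = base.resolve(entry.getName()).normalize();
        if (!directory.startsWith(base)) {
            throw new IOException("Entry escapes destination: " + entry.getName());
        }
        if (!Files.isDirectory(directory)) {
            // Single level only: unlike mkdirs, this fails with an IOException
            // when the parent is missing, so no directory is created implicitly.
            Files.createDirectory(directory);
        }
        FileTime modified = entry.getLastModifiedTime();
        if (modified != null) {
            // Null access/creation times leave the corresponding attribute unchanged.
            Files.getFileAttributeView(directory, BasicFileAttributeView.class)
                    .setTimes(modified, entry.getLastAccessTime(), entry.getCreationTime());
        }
        return directory;
    }
}
```

One caveat with any create-then-stamp approach: on most file systems, adding a file to a directory afterwards updates that directory's modification time, so the timestamps may need to be reapplied once extraction has finished.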
🤦 having failed to notice the new and old patches have the same name, that's exactly what I was doing. The new patch looks good to me and a pull request would be most welcome. |
Cool, I'll create something along the lines of that 2nd patch then. Will be a bit more work to do it properly with tests and everything, but I'll make it happen :) |
We are still looking into our issue with Docker image diffs that seem to indicate that the directory timestamps are the only difference. On closer reading of the Docker docs, it seems that these directory timestamps shouldn't actually be significant for the image hash, so it shouldn't be necessary to set the directory timestamps on extraction of the fat jar either. That would make this issue moot. It's still unclear why we are seeing that diff: it could be caused by using an old Docker version for our build, so we're going to update our build server and see if that helps. Will post updates here as we learn more. |
The Spring Boot jarmode tools are already smart enough to set the various timestamps of extracted entries to those of the jar entries (in ExtractCommand#extractEntry). However, the directories that are created for those entries in ExtractCommand#mkdirs always use the current timestamp.
As a result, when you perform the extraction and add the extracted layers to a Docker build twice, the two images are considered to be different even though they should be considered to be the same.
To help with creating fully reproducible builds, it would be great if those directories could be created with the timestamps of the files that cause them to be created, so that they're always the same. Using fixed timestamps for all created directories would achieve the same effect.
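As an illustration of the fixed-timestamp alternative, the following sketch walks an already extracted tree and stamps every directory with the same instant. It is a stand-alone post-processing step, not part of the jarmode tools; `FixedDirectoryTimestamps` and `apply` are hypothetical names, and the choice of instant is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FixedDirectoryTimestamps {

    /** Sets the times of root and of every directory below it to the given instant. */
    static void apply(Path root, FileTime fixed) throws IOException {
        List<Path> directories;
        // Collect first so that the walk itself is finished before any timestamp is set.
        try (Stream<Path> paths = Files.walk(root)) {
            directories = paths
                    .filter(path -> Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS))
                    .collect(Collectors.toList());
        }
        for (Path directory : directories) {
            // Arguments are last-modified, last-access and creation time, in that order.
            Files.getFileAttributeView(directory, BasicFileAttributeView.class)
                    .setTimes(fixed, fixed, fixed);
        }
    }
}
```

For example, `FixedDirectoryTimestamps.apply(extractedDir, FileTime.fromMillis(0))` would give every directory the epoch as its modification time, where `extractedDir` is whatever directory the layers were extracted into.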
If you consider this a good idea, I wouldn't mind creating a pull request for it. Before putting in the effort, I wanted to check what you think of the idea.