Add new source hashing methods: content_md5, content_sha1, content_sha256 #5277

Draft: wants to merge 20 commits into main


jaimergp
Contributor

@jaimergp jaimergp commented Apr 12, 2024

Description

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@jaimergp jaimergp requested a review from a team as a code owner April 12, 2024 15:41
@jaimergp jaimergp marked this pull request as draft April 12, 2024 15:41
@conda-bot conda-bot added the cla-signed label ([bot] added once the contributor has signed the CLA) Apr 12, 2024

codspeed-hq bot commented Apr 12, 2024

CodSpeed Performance Report

Merging #5277 will not alter performance

Comparing jaimergp:content-hash (73a23ae) with main (03f99c1)

Summary

✅ 5 untouched benchmarks

@wolfv
Contributor

wolfv commented Apr 15, 2024

I think this is cool. It would also work nicely with the new proposal for "rendered recipes" (conda/ceps#74).

On that note - should we continue adding features to conda-build without any standardization (e.g. CEP) process?

@jaimergp
Contributor Author

should we continue adding features to conda-build without any standardization (e.g. CEP) process?

I'm planning to submit a CEP. I opened this draft to explore what is needed for hashing logic that is stable and robust across platforms. Things like file permissions don't translate well to Windows.

@wolfv
Contributor

wolfv commented Apr 15, 2024

Awesome. Yeah, I also recently looked at a few content hash implementations in Rust but didn't find anything super convincing yet. There are a bunch though (https://crates.io/search?q=content%20hash)

@jaimergp
Contributor Author

So far the scheme I followed looks a lot like https://github.com/DrSLDR/dasher?tab=readme-ov-file#hashing-scheme. Things to standardize would be how the tree is sorted, the normalization of the path, the separators (to prevent this), and the allowed algorithms.
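For illustration, the items listed above could be sketched roughly as follows. This is not the code in this PR: `content_hash`, `ALLOWED_ALGORITHMS`, and the length-prefix framing are assumptions made for the example. Only the `b"D"` / `b"F"` type markers and the three algorithm names come from this page.

```python
import hashlib
from pathlib import Path

# Hypothetical allow-list; the PR title names md5, sha1 and sha256.
ALLOWED_ALGORITHMS = ("md5", "sha1", "sha256")


def content_hash(root, algorithm="sha256"):
    """Hash a directory tree by content, independent of OS path conventions."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    root = Path(root)
    hasher = hashlib.new(algorithm)
    # Sort on the normalized (forward-slash) relative path so the order does
    # not depend on the platform or on filesystem listing order.
    entries = sorted(root.rglob("*"), key=lambda p: p.relative_to(root).as_posix())
    for path in entries:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        if path.is_dir():
            kind, data = b"D", b""
        elif path.is_file():
            kind, data = b"F", path.read_bytes()
        else:
            continue  # assumption: anything else (e.g. broken symlinks) is skipped
        # Assumption: length-prefix every field so field boundaries are
        # unambiguous, e.g. ("ab", "c") and ("a", "bc") feed different bytes.
        for field in (kind, rel, data):
            hasher.update(len(field).to_bytes(8, "big"))
            hasher.update(field)
    return hasher.hexdigest()
```

Two trees with the same relative paths and the same bytes should give the same digest regardless of the order in which the files were created. Length prefixes are only one way to frame the fields; a fixed separator byte that cannot occur in a path would be another.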

I've seen a few Merkle-tree-based packages, but we don't need all the proof stuff or leaf querying; we only need to compare the root hash.

Maybe it could be implemented in a recursive way that doesn't involve obtaining the whole file tree beforehand if that increases performance or simplifies implementation elsewhere. IMO this feels like one of those CEPs that does require prototyping first to see which things have to be standardized.
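A recursive variant along those lines might look like the sketch below. It is only an illustration under assumptions: `tree_hash` is a hypothetical name, and hashing each child's digest into its parent is one possible design, not something this PR is confirmed to do.

```python
import hashlib
import os


def tree_hash(path, algorithm="sha256"):
    """Return the digest of a file or directory, computed bottom-up."""
    hasher = hashlib.new(algorithm)
    if os.path.isdir(path):
        hasher.update(b"D")
        # os.scandir reads one directory level at a time; sorting by name
        # keeps the result independent of filesystem listing order.
        with os.scandir(path) as it:
            children = sorted(it, key=lambda entry: entry.name)
        for entry in children:
            name = entry.name.encode("utf-8")
            hasher.update(len(name).to_bytes(8, "big"))
            hasher.update(name)
            # A child contributes only its fixed-size digest, so comparing
            # two trees comes down to comparing the two root digests.
            hasher.update(tree_hash(entry.path, algorithm))
    else:
        hasher.update(b"F")
        with open(path, "rb") as fh:
            # Read in 1 MiB chunks to avoid loading large files at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                hasher.update(chunk)
    return hasher.digest()
```

Because each level only ever sees bare entry names, no path separator enters the hashed bytes, which sidesteps part of the path normalization question. The cost is one hasher per directory instead of a single running hash.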

@jaimergp
Contributor Author

pre-commit.ci autofix

@jaimergp jaimergp changed the title from "add content_sha256 hash checks" to "Add new source hashing methods: content_md5, content_sha1, content_sha256" Nov 20, 2024
@@ -2045,10 +2045,20 @@ def compute_content_hash(
    hasher.update(b"D")
elif path.is_file():
    hasher.update(b"F")
    # We need to normalize line endings for Windows-Unix compat
Contributor


!
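As a rough illustration of what line-ending normalization could mean for the hunk above: the lines that follow the comment are not visible on this page, so the sketch below is an assumption and not the PR's code. `update_with_normalized_newlines` and `text_file_digest` are hypothetical names.

```python
import hashlib


def update_with_normalized_newlines(hasher, data):
    """Feed `data` to `hasher` with CRLF and lone CR rewritten as LF."""
    hasher.update(data.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))


def text_file_digest(data, algorithm="sha256"):
    """Digest file bytes behind an F marker, ignoring line-ending style."""
    hasher = hashlib.new(algorithm)
    hasher.update(b"F")
    update_with_normalized_newlines(hasher, data)
    return hasher.hexdigest()
```

With this, `text_file_digest(b"a\r\nb\r\n")` and `text_file_digest(b"a\nb\n")` agree. The sketch rewrites bytes blindly, so a binary file that happens to contain `\r` bytes would also be altered before hashing; whether and how text and binary files should be told apart is left open here.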

Contributor Author


Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: 🆕 New
Development

Successfully merging this pull request may close these issues.

Better hashing of sources
4 participants