Receiving 100 match for two different fuzzy hashes #1

Open
ross-spencer opened this issue Sep 24, 2016 · 1 comment

ross-spencer commented Sep 24, 2016

Attached are two files (twofiles.zip) that hash to different strings but compare as a 100 match.

Hashing file
1536:6ZmdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZagPuNKpwjj:PFLfL/xi0ShXXqPiX3KMlvbPt
Hashing file
1536:S/pXbPRCzY5dSdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZQ:PFLfL/xi0ShXXqPiX3KMlvbPt
MATCH: score = 100

I've been using a Golang wrapper, so I also checked against ssdeep 2.13. To double-check the native result, I hacked the sample.c program to hash and compare these two files; gist here as a demo:

https://gist.github.com/ross-spencer/ac0d5546a2511ad692aa4ff27abd9ba0

The files are publicly available via archway.govt.nz as part of the collections held by Archives NZ. I have a handful of other culprits too, if you are keen on additional files for testing purposes. Let me know whether it's best to share the files, or whether the hash strings are enough.

twofiles.zip

NB. Looks to be the same on the other branches as well.

a4lg commented Dec 19, 2016

Hi, I'm Tsukasa OI, a maintainer of ssdeep. This does not appear to be a bug.

If the block sizes are equal (1536 in this case), both the first and second blocks are compared and the higher score is taken. In this case, the second block (a rougher representation of the file than the first) is identical in both hashes ("PFLfL/xi0ShXXqPiX3KMlvbPt"), so the comparison returns 100.

This is expected behavior (to prevent false negatives, I guess). Two "different" hashes can match perfectly: the two files might be different, but can still be considered very similar.
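
To see the point above without running ssdeep, here is a minimal Python sketch that splits the two hash strings from the report into their block size and two parts, then checks which parts are identical. The names `Signature`, `parse_signature` and `identical_parts` are made up for this illustration and are not part of ssdeep or any wrapper. The sketch does not reproduce ssdeep's actual similarity scoring; it only shows that the second parts are the same string while the first parts differ.

```python
from typing import List, NamedTuple


class Signature(NamedTuple):
    """Hypothetical container for one ssdeep hash string."""
    block_size: int
    part1: str  # first block, computed at block_size
    part2: str  # second block, computed at twice the block size (coarser)


def parse_signature(text: str) -> Signature:
    # An ssdeep hash has the form "blocksize:part1:part2".
    block_size, part1, part2 = text.strip().split(":", 2)
    return Signature(int(block_size), part1, part2)


def identical_parts(a: Signature, b: Signature) -> List[str]:
    """Name the parts that are exactly equal when the block sizes match.

    This is only an equality check, not ssdeep's scoring algorithm.
    """
    if a.block_size != b.block_size:
        return []
    matches = []
    if a.part1 == b.part1:
        matches.append("part1")
    if a.part2 == b.part2:
        matches.append("part2")
    return matches


# The two hash strings from the original report.
SIG_A = parse_signature(
    "1536:6ZmdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZagPuNKpwjj"
    ":PFLfL/xi0ShXXqPiX3KMlvbPt"
)
SIG_B = parse_signature(
    "1536:S/pXbPRCzY5dSdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZQ"
    ":PFLfL/xi0ShXXqPiX3KMlvbPt"
)

SHARED = identical_parts(SIG_A, SIG_B)
print(SHARED)  # ['part2']
```

Because the block sizes are equal and the second parts are character-for-character the same, taking the higher of the two part scores gives 100 even though the first parts differ.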
