CanGenerateHashFromString is broken in JDK 9+ when the string contains non-Latin characters or the -XX:-CompactStrings JVM flag is used #53

Open
seanrohead opened this issue Mar 19, 2021 · 7 comments

Comments

@seanrohead
Contributor

seanrohead commented Mar 19, 2021

CanGenerateHashFromStringByteArray, which is used for JDK 9+, assumes that the string is stored in a single-byte (Latin-1) encoding and that the length of the underlying byte[] is the same as the length of the string. This assumption only holds if the string contains only characters from the ISO-8859-1/Latin-1 character set. If the string contains any other character, it is stored in the underlying byte array as UTF-16 and the length of the byte array is 2x the number of characters in the string. Additionally, this storage optimization can be disabled with the -XX:-CompactStrings JVM flag, in which case all strings are stored as UTF-16. See here and here for more information.
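
To make the length mismatch concrete, here is a minimal Java sketch that models the compact-strings rule described above. It does not read the JDK's private backing array; `CompactStringModel`, `isLatin1`, and `storedByteLength` are hypothetical names used only for illustration, and the `compactStrings` parameter stands in for running with or without -XX:-CompactStrings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: models how many bytes JDK 9+ uses to store a string,
// without touching String internals.
class CompactStringModel {

    // True when every char fits in one byte, i.e. the string is Latin-1 encodable.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Expected length of the backing byte[] on JDK 9+.
    // compactStrings = false models a JVM started with -XX:-CompactStrings.
    static int storedByteLength(String s, boolean compactStrings) {
        return (compactStrings && isLatin1(s)) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        String latin = "bloom";
        String cyrillic = "\u0431\u043b\u0443\u043c"; // 4 chars, none in Latin-1

        System.out.println(storedByteLength(latin, true));    // 5: one byte per char
        System.out.println(storedByteLength(cyrillic, true)); // 8: two bytes per char
        System.out.println(storedByteLength(latin, false));   // 10: UTF-16 forced for all strings

        // Cross-check with a public API: UTF-16 without a BOM uses 2 bytes per char.
        System.out.println(cyrillic.getBytes(StandardCharsets.UTF_16LE).length); // 8
    }
}
```

A hash function that reads only `s.length()` bytes from the backing array would therefore cover just the first half of the data in the second and third cases.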

seanrohead pushed a commit to seanrohead/bloom-filter-scala that referenced this issue Mar 19, 2021
… the length of the string because the length of the byte array can sometimes be 2x the length of the string, depending on which character encoding the string is stored with.
@seanrohead
Contributor Author

I opened a pull request for this: https://github.com/alexandrnikitin/bloom-filter-scala/pull/54/files

@yarosman

I have a similar error, but with CanGenerateHashFromString:

Caused by: java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
    at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:27)
    at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:23)
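
For readers unfamiliar with the notation in this trace: `[B` is the JVM name for `byte[]` and `[C` is the name for `char[]`. The sketch below reproduces the same kind of failure in isolation. It is an assumption-laden illustration, not the library's code; `ArrayCastDemo` and `castsToCharArray` are hypothetical names.

```java
// Hypothetical demo: code written for JDK 8 expects a String's backing array
// to be a char[]; on JDK 9+ the backing array is a byte[].
class ArrayCastDemo {

    // Returns true if the object can be used as a char[], false if the cast fails.
    static boolean castsToCharArray(Object backingArray) {
        try {
            char[] chars = (char[]) backingArray;
            return chars != null;
        } catch (ClassCastException e) {
            // Same exception type as in the stack trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new byte[0].getClass().getName()); // [B
        System.out.println(new char[0].getClass().getName()); // [C

        System.out.println(castsToCharArray(new char[] {'a'})); // true: JDK 8 style layout
        System.out.println(castsToCharArray(new byte[] {97}));  // false: JDK 9+ style layout
    }
}
```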

@seanrohead
Contributor Author

@yarosman Are you using the latest version of the library? That issue was fixed in 0.13.0.

@yarosman

yarosman commented Mar 31, 2021

> @yarosman Are you using the latest version of the library? That issue was fixed in 0.13.0.

@seanrohead We use 0.13.1

@seanrohead
Contributor Author

@yarosman Are you loading the bloom filter using serialization by any chance?

@yarosman

yarosman commented Apr 1, 2021

@seanrohead Yes, we do. I found that we don't use the predefined writeTo/readFrom methods, so we serialize the filter together with its CanGenerateHashFrom instance, which depends on the Java version.
Or do you have another explanation or idea?
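
One way to picture the hypothesis above: if a JDK-specific hashing strategy is written into the serialized stream, it comes back unchanged on a different JDK. The sketch below shows the general remedy of keeping the strategy out of the stream and choosing it again on load. It does not use the library's classes; `HashStrategy`, `FilterSnapshot`, `forCurrentJvm`, and `roundTrip` are hypothetical names, and the version check is an assumption about how such a choice could be made.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a JDK-dependent hashing implementation.
enum HashStrategy {
    CHAR_ARRAY, // JDK 8 and earlier: String backed by char[]
    BYTE_ARRAY; // JDK 9+: String backed by byte[]

    static HashStrategy forCurrentJvm() {
        // "1.8" on Java 8; "9", "11", "17", ... on later versions.
        String version = System.getProperty("java.specification.version");
        return version.startsWith("1.") ? CHAR_ARRAY : BYTE_ARRAY;
    }
}

// Hypothetical persisted form of a filter: only the bits are written out.
class FilterSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    final long[] bits;
    transient HashStrategy strategy; // transient: never written to the stream

    FilterSnapshot(long[] bits) {
        this.bits = bits;
        this.strategy = HashStrategy.forCurrentJvm();
    }

    // Called by Java serialization on load: re-select the strategy for this JVM.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        strategy = HashStrategy.forCurrentJvm();
    }

    // Serializes and deserializes in memory, for demonstration only.
    static FilterSnapshot roundTrip(FilterSnapshot original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (FilterSnapshot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout, a snapshot written on JDK 8 and read on JDK 11 would end up with the byte-array strategy rather than the one that was active when it was written.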

@yufan022

> I have a similar error, but with CanGenerateHashFromString:
>
> Caused by: java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
>     at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:27)
>     at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:23)

Did you try using CanGenerateHashFromStringByteArray?
