CanGenerateHashFromString is broken in JDK 9+ when the string contains non-Latin characters or the -XX:-CompactStrings JVM flag is used #53
Comments
… the length of the string because the length of the byte array can sometimes be 2x the length of the string, depending on which character encoding the string is stored with.
I opened a pull request for this: https://github.com/alexandrnikitin/bloom-filter-scala/pull/54/files
I have a similar error, but with:
Caused by: java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:27)
at bloomfilter.CanGenerateHashFrom$CanGenerateHashFromString$.generateHash(CanGenerateHashFrom.scala:23)
@yarosman Are you using the latest version of the library? That issue was fixed in 0.13.0.
@seanrohead We use 0.13.1
@yarosman Are you loading the bloom filter using serialization by any chance?
@seanrohead Yes, we do. I found that we don't use the predefined writeTo/readTo methods, so we serialize together with CanGenerateHashFrom, which depends on the Java version.
Did you try using …
CanGenerateHashFromStringByteArray, which is used for JDK 9+, assumes that the string is stored with one byte per character, so that the length of the underlying byte[] is the same as the length of the string. This assumption only holds if the string contains only characters from the ISO-8859-1/Latin-1 character set. If the string contains any other character, it is stored in the underlying byte array as UTF-16 and the length of the byte array is 2x the number of characters in the string. Additionally, this storage optimization can be disabled with the -XX:-CompactStrings JVM flag, in which case all strings are stored as UTF-16. See here and here for more information.
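To make the length mismatch concrete, here is a minimal Java sketch of the rule described above. It is not code from the library: the class name CompactStringLength and the helpers isLatin1 and expectedByteLength are hypothetical names chosen for illustration, and the sketch computes the expected size of the internal byte[] rather than reading it through reflection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of bloom-filter-scala.
class CompactStringLength {

    // True when every char fits in one byte (ISO-8859-1/Latin-1), which is
    // the condition under which JDK 9+ can store the string compactly.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Expected length of the string's internal byte[] under the rule above.
    // compactStrings = false models running with -XX:-CompactStrings.
    static int expectedByteLength(String s, boolean compactStrings) {
        return (compactStrings && isLatin1(s)) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        String latin = "hello";
        // Six Cyrillic letters, written as escapes to avoid source-encoding issues.
        String cyrillic = "\u043f\u0440\u0438\u0432\u0435\u0442";

        System.out.println(expectedByteLength(latin, true));     // 5
        System.out.println(expectedByteLength(latin, false));    // 10
        System.out.println(expectedByteLength(cyrillic, true));  // 12

        // Cross-check against explicit encodings of the same text.
        System.out.println(latin.getBytes(StandardCharsets.ISO_8859_1).length);  // 5
        System.out.println(cyrillic.getBytes(StandardCharsets.UTF_16LE).length); // 12
    }
}
```

A hash implementation that reads s.length() bytes from the internal array would therefore cover only half of the data for the Cyrillic string, or for any string when compact strings are disabled.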