UTF8 string serialization #15
Comments
I think that having support for off-heap strings in the API is a great idea. I'm not sure about the details of the implementation yet, but I'll update the issue once I have some more concrete thoughts on the topic.
I agree that the conversion to |
Unsafe has memcopy, too bad it doesn't have memcompare... :( -Evan
JNI might be the answer here. Since we don't need to copy any data over (the data is already effectively allocated on the C heap), we shouldn't have much performance overhead. Of course, we need to benchmark to validate this.
Hi Denys, With the jemalloc JNI binding, we can add utility functions as well to expose low-level operations, or potentially SIMD instructions. For the latter we might have to be careful about the chipset family of the target platforms. I can dig into some of the HotSpot code from OpenJDK and check their implementation. For now I can put this work into a parallel branch while we flesh out the jemalloc binding, and just plan to include it in the JNI library that houses jemalloc.
@arosenberger Please don't use GPL code bases as a reference. We use the Scala license (a 3-clause BSD derivative) for our code and can only borrow implementation ideas from software with a compatible license. Otherwise we might get into legal trouble some day even if we don't borrow any code. (Note to self: this really needs to be documented somewhere.)
@arosenberger I think that we need to concentrate on getting 0.1 out before we proceed with this. I'm afraid there are lots of corner cases in string support and it will take a while to get it right.
Thanks for the heads up on the GPL. I'll focus on finishing up jemalloc and adding the ArrayOps methods from the other issues. We can revisit this one down the road.
Strings form a large portion of many objects. Just storing a pointer to the on-heap String object is not a practical way to reduce GC pressure. Instead, how about having a UTF8-based string wrapper class that can offer support for basic operations:
Other, more complex methods can be delegated to the native Java/Scala string class by serializing to an on-heap string on demand, but the above would offer enough support for simple things like HTTP or JSON parsing.
The goal is to allow basic, fast string operations without the expensive conversion and object allocation needed to turn UTF-8-encoded strings into Java's native UTF-16 representation.
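To make the idea a bit more concrete, here is a minimal sketch in plain Java of what such a wrapper could look like. Everything in it is an assumption for illustration only: the class name `Utf8View`, its method names, and the choice of a direct `ByteBuffer` as the off-heap storage are hypothetical and are not this library's API. It operates on the raw UTF-8 bytes and only builds an on-heap `String` when `toString()` is called.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: UTF-8 bytes live in a direct (off-heap) buffer.
final class Utf8View {
    // Invariant: position is 0 and limit is the byte length of the string.
    private final ByteBuffer buf;

    private Utf8View(ByteBuffer buf) { this.buf = buf; }

    // Copies the UTF-8 encoding of an on-heap String into a direct buffer.
    static Utf8View fromString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
        direct.put(bytes);
        direct.flip();
        return new Utf8View(direct);
    }

    // Length in bytes, not in characters.
    int byteLength() { return buf.limit(); }

    // Byte-wise comparison, treating bytes as unsigned. For valid UTF-8
    // this matches ordering by Unicode code point.
    int compareBytes(Utf8View other) {
        int n = Math.min(byteLength(), other.byteLength());
        for (int i = 0; i < n; i++) {
            int a = buf.get(i) & 0xFF;
            int b = other.buf.get(i) & 0xFF;
            if (a != b) return a - b;
        }
        return byteLength() - other.byteLength();
    }

    boolean contentEquals(Utf8View other) {
        return byteLength() == other.byteLength() && compareBytes(other) == 0;
    }

    boolean startsWith(Utf8View prefix) {
        if (prefix.byteLength() > byteLength()) return false;
        for (int i = 0; i < prefix.byteLength(); i++) {
            if (buf.get(i) != prefix.buf.get(i)) return false;
        }
        return true;
    }

    // On-demand conversion to an on-heap String for everything else.
    @Override
    public String toString() {
        // duplicate() gives an independent position/limit over the same bytes.
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }
}
```

The byte-by-byte loop in `compareBytes` is the simplest possible stand-in for a missing memcompare; whether it is fast enough, or whether a native call would do better, is something only a benchmark can answer.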