Considerations about strings
J. Zebedee edited this page Oct 5, 2015 · 1 revision
This document describes how MsgPack-CLI handles strings, covering both the design and its implementation.
The de facto standard interpretation of the MessagePack specification is that a Unicode string should be encoded as UTF-8 without a BOM and stored as a Raw type value.
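As a byte-level illustration of that convention, here is a minimal sketch in Python using only the standard library. It is not MsgPack-CLI code: `pack_raw` and `pack_string` are hypothetical helper names, and the sketch assumes the original Raw family of formats (fixraw, raw 16, raw 32), leaving out the 8-bit length format that later revisions of the specification added.

```python
import codecs
import struct


def pack_raw(payload: bytes) -> bytes:
    """Frame bytes as a MessagePack Raw value (hypothetical helper)."""
    n = len(payload)
    if n < 32:
        header = bytes([0xA0 | n])                 # fixraw: 101xxxxx
    elif n < 2 ** 16:
        header = b"\xda" + struct.pack(">H", n)    # raw 16, big-endian length
    elif n < 2 ** 32:
        header = b"\xdb" + struct.pack(">I", n)    # raw 32, big-endian length
    else:
        raise ValueError("payload too long for a Raw value")
    return header + payload


def pack_string(text: str) -> bytes:
    """Encode text as UTF-8 (Python's "utf-8" codec writes no BOM) and frame it."""
    return pack_raw(text.encode("utf-8"))


packed = pack_string("héllo")
payload = packed[1:]                               # strip the 1-byte fixraw header
has_bom = payload.startswith(codecs.BOM_UTF8)      # False: no BOM was written
```

The receiver sees only a length-prefixed byte sequence; nothing in the Raw value itself records which character encoding was used, which is why both sides have to agree on UTF-8 by convention.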
MsgPack-CLI is implemented as follows:

- `Packer` packs a `String` (or `Char` sequence) as UTF-8 bytes in a Raw type value. Note that `Packer` provides overloaded methods which accept a `System.Text.Encoding` to specify a custom character encoding.
- `Unpacker` and `MessagePackObject` handle a Raw type value as `byte[]`, and they provide `ReadString` or `AsString` methods which handle character decoding of the unpacked Raw type value.
- `MessagePackSerializer<T>` uses the above primitive APIs according to the following rules:
  - If the target field or property is of `String` type, UTF-8 encoding is used. If the stream being deserialized contains a byte sequence that is not valid UTF-8, an exception is thrown.
  - If the target field or property is of `Byte[]` type, the raw bytes are stored as is.
  - If you want to handle another encoding, such as a Latin-1 or Shift-JIS string, you must build a custom serializer by hand.
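
The three serializer rules can be illustrated at the byte level with a small Python sketch. This is only an analogy for the behavior described above, not the MsgPack-CLI API: the function names are hypothetical, and Python's strict UTF-8 decoding stands in for the exception thrown during deserialization.

```python
def decode_string_field(raw: bytes) -> str:
    """String target: strict UTF-8; invalid bytes raise UnicodeDecodeError."""
    return raw.decode("utf-8", errors="strict")


def decode_bytes_field(raw: bytes) -> bytes:
    """Byte-array target: the payload is kept exactly as received."""
    return raw


def decode_custom_field(raw: bytes, encoding: str) -> str:
    """Hand-written path for other encodings such as Latin-1 or Shift-JIS."""
    return raw.decode(encoding, errors="strict")


latin1_raw = "é".encode("latin-1")          # a single byte, 0xE9
sjis_raw = "日本".encode("shift_jis")

# Rule 1: the Latin-1 payload is not valid UTF-8, so decoding is rejected.
try:
    decode_string_field(latin1_raw)
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Rule 2: a byte-array target never inspects the payload.
kept = decode_bytes_field(latin1_raw)

# Rule 3: only an explicit, hand-chosen encoding recovers the text.
latin1_text = decode_custom_field(latin1_raw, "latin-1")
sjis_text = decode_custom_field(sjis_raw, "shift_jis")
```

Failing loudly on invalid UTF-8, rather than silently substituting replacement characters, means a mismatch in encoding between sender and receiver surfaces at deserialization time instead of corrupting the data.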