NX_CHAR:
The preferred string representation is UTF-8. Both fixed-length strings and variable-length strings are valid. String arrays cannot be used where only a string is expected (title, start_time, end_time, NX_class attribute,…). Fields or attributes requiring the use of string arrays will be clearly marked as such (like the NXdata attribute auxiliary_signals). This is the default field type.
from https://manual.nexusformat.org/nxdl-types.html#data-types-allowed-in-nxdl-specifications
At the NeXus level we should decide whether "NX_CHAR" is "sequence-of-char-as-8-bit-good-luck-with-encoding" à la C or a "sequence of Unicode code points" à la strings in modern programming languages.
If it is the second, then we should drop the sentence. If it is the first, we should at least change the wording to "encoding" (rather than "representation"), possibly change it to "when using HDF5, use the UTF-8 encoding", or still consider dropping it.
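To make the two readings concrete, here is a minimal sketch using only the Python standard library. The sample word and variable names are illustrative and not taken from the NeXus manual. It shows that one sequence of code points has different byte representations depending on the encoding, and that bytes without a declared encoding can be silently misread.

```python
# One abstract string: a sequence of Unicode code points.
text = "\u00c5ngstr\u00f6m"  # "Ångström", written with escapes to pin the code points

# Two different byte-level representations of the same string.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

n_code_points = len(text)          # counts code points
n_utf8_bytes = len(utf8_bytes)     # non-ASCII letters take two bytes each in UTF-8
n_latin1_bytes = len(latin1_bytes) # one byte per code point in Latin-1

# The "good luck with encoding" case: bytes read with the wrong assumption
# decode without an error but yield the wrong characters.
wrong_guess = utf8_bytes.decode("latin-1")
right_guess = utf8_bytes.decode("utf-8")

print(n_code_points, n_utf8_bytes, n_latin1_bytes)
print(right_guess == text, wrong_guess == text)
```

Under the "sequence of code points" reading, only `text` is the NeXus-level value, and which of the byte strings ends up on disk is a storage detail.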
For reference, the h5py docs on strings: https://docs.h5py.org/en/stable/strings.html#strings and notes on encoding: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
I think we should go with the second option and assert that the details of the encoding are the business of the underlying file format, not of NeXus proper (any more than we would pull endianness up into NeXus). In the case of XML, the whole file has an encoding (which should be declared at the top!), and HDF5 (and h5py) can also handle this:
which if we poke at the files gives:
This works because h5py does put the encoding in the hdf5 file:
and you can see it in `h5dump`, which shows h5py is doing this using what I believe are standard HDF5 tools, so I would expect this to be available to any language.
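The round trip described above can be sketched roughly as follows. This is not the original snippet from this issue; it assumes h5py 3.x (for `asstr()` and `check_string_dtype`), and the file name, attribute name, dataset name and sample text are all made up for illustration. It uses an in-memory file so nothing is written to disk.

```python
import h5py

# Sample text with non-ASCII code points (illustrative only).
text = "\u00c5ngstr\u00f6m \u00b5-beam"

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("nx_char_demo.h5", "w", driver="core", backing_store=False) as f:
    # Store a Python str as an attribute and as a scalar dataset.
    f.attrs["title"] = text
    ds = f.create_dataset("title", data=text)

    # Ask h5py which character set it recorded for the dataset's string type.
    info = h5py.check_string_dtype(ds.dtype)
    stored_encoding = info.encoding
    is_variable_length = info.length is None

    # Read both back as Python str objects.
    attr_back = f.attrs["title"]
    ds_back = ds.asstr()[()]

print(stored_encoding, is_variable_length)
print(attr_back == text, ds_back == text)
```

The caller only ever handles `str` values here; the character set is recorded in the file's type metadata, which is the sense in which the encoding belongs to the file format rather than to NeXus.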