-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node name case sensitivity #57
Comments
Then do you want to add on the spec that stores must be able to handle case sensitive names ? In the spec it says |
Hi @Carreau, The subsection on storage keys describes how keys are constructed from node paths. It is then up to a store spec to decide how to use these keys and store data. For example, the file system store spec defines how to translate keys into file paths. I expect this will be a commonly-used store, and so the question about case (in)sensitivity mainly applies to this store. But it is not a fixed choice, alternative store specs could be proposed which takes a different approach. Hope that makes sense. |
Yes, thanks that what I understood, I just want to make sure decisions are not taken on implementations leaking. The fact that the directory store happens to be case (in)sensitive and how to do key retrieval can as far as I can tell be worked around pretty easily).
Now each of those have different tradeoff on readability/length of path, key enumeration... etc. Now the problem I see with requiring avoiding if possible, is that you may end up with an code implementation on one system that happened to rely on some case insensitivity of the underlying store and will be broken on another system. If the choice were mine, I would probably impose case sensitivity for a store spec. This is also not a complete unrelated discussion from unicode as casefolding for non-ascii and case folding it complex (it may even depends on user locale, i'm not sure) (https://unicode.org/faq/casemap_charprop.html have some crazy things.) from a performance perspective you may even be able to just do bytes comparison as well, which I'm guessing compiled languages would be happy(ier) about. |
Thanks @Carreau, very helpful.
FWIW I think this is the most natural thing to do. To put it into the terms of the v3 protocol spec, I think this would mean stating in the section on the store interface that we expect stores to perform a case sensitive comparison of keys for the What then should the file system store spec do? That spec defines a translation from storage keys to file paths that relies on the underlying file system being case sensitive, and so should not be used on a case insensitive file system. Maybe that's OK, and we just add a big red note saying "only use on case sensitive file systems". I guess we could always define another file system storage spec which is designed to work on case insensitive file systems, and so carries some of the extra overhead of longer and/or more obscure file paths. The two storage specs could live alongside each other, and people could choose to implement and use which they think best. Thoughts welcome. |
probably
I think that might be tough. And yes, I would go with a different file storage spec, and is also one of the reasons I opened the issue about detecting the kind of store independently of the zarr version in order to dispatch. Now I'm' pretty new to the project, I'm trying to give my best advice with what I believe is the direction of the project you are envisaging. I might be completely wrong. |
Jim Pivarski suggest we escape uppercase characters, and this would keep readability. |
The v3 core spec currently states that
which should resolve that at least on the spec level, right? Should this issue still be kept open? I guess it will become important once stores which are case insensitive are added, then escaping uppercase characters might be the proper solution. |
This requirement would preclude use of the filesystem store (without escaping) with normal local filesystems on macos and windows. Instead it should be defined by the store/implementation. |
See also #175, where I added notes about case insensitivity for Windows and MacOS. However, escaping might still be an option. Should that be added as an option to the current FileSystemStore? (There's no such thing as to specify configs for a store in the metadata atm, will open an issue about that soon). |
Escaping could be a solution, but it would essentially amount to a separate incompatible store, and would sacrifice readability of the keys. |
Good point, so it could also be added later as an additional store. IMO this is good enough to go forward with v3. |
PS: Just noticed that escaping might be done via a storage transformer. |
See also #201. |
@jstriebel: what was the status here? |
https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#node-names We basically leave it up to the underlying store. |
Currently the v3.0 core protocol draft section on node names states the following:
This constraint, however, is problematic in scenarios where nodes are being created in parallel. This is because requiring that sibling nodes have a unique name under case-insensitive comparison would require that a check is made before node creation to ensure no name collisions, and any such check would require synchronisation of node creation at least within the same group.
Currently a design goal for the spec is to avoid constraints which require operations to be synchronised, thus allowing multiple parallel processes to work simultaneously, including creating nodes within the same hierarchy. This avoids implementation complexity (no need for locks or other synchronisation mechanisms) and also works better on stores with eventual consistency behaviour.
I suggest we remove this constraint from the core protocol. I.e., implementations of the core protocol are not expected to check for case-insensitive node name collisions.
However, instead we add a usage note, which provides some guidance for implementing applications using a zarr protocol implementation, including recommending that applications avoid creating sibling nodes with case-insensitive name collisions wherever that is possible.
I.e., we push this to the application layer.
The text was updated successfully, but these errors were encountered: