-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistency of names of stored children / dataarrays with keys in wrapping object #9447
Comments
In my opinion, I would also note that from a UIX perspective having persistent references to DataTree objects that allow changing their name (or other data) inplace is much better than the alternative of rebuilding hierarchies of references for simple name changes or manually updating Frozen lists. I'm not sure if this perspective has been considered? If due to the design of the library the name property simply cannot be used this way, then I would still argue that some other method that allows renaming inplace is needed. |
We should definitely have a I don't think we should allow setting names on a child node, like child = root['child']
child.name = 'childish' As noted, it would also be inconsistent with the long standing behavior of This would be a relatively straightforward change, adding an error to the Alternatively, we might consider re-implementing DataTree more like Dataset, to re-create the DataTree objects corresponding to nodes whenever they are accessed. I wonder if this would break any critical use-cases? In practice it would look like automatically calling |
Just to clarify, as far as I understand |
What is your use-case for persistent DataTree references? Elsewhere in Xarray, we have removed most "inplace" methods, in favor of methods that return modified objects. |
I have a GUI interface to a DataTree hierarchy that currently stores the references to each node. A user clicks on a node and renames it in the GUI. At the moment, this requires either repopulating all the descendent node references in the GUI tree representation or manually editing Frozen children lists. This does not seem to me like a good way forward. The "inplace" method would solve this. Now that I have thought about this more and better understand the issue, I could potentially rewrite the GUI interface to use paths instead of references which would maybe be more inline with how you are thinking DataTree should be used. So maybe I can achieve what I want with a rename method that still returns a new object, will need to explore. That said, I probably won't be the last person to take the node reference approach above, for which an "inplace" method would solve the issue. |
You also might be able to use the upcoming copied_root = root.move(origin='/path/old_name', destinatiom='/path/new_name') would return a new tree root with everything the same except that your one node had been renamed. This usage would be like using |
The more I think about it, the more I am convinced that it will be possible to alter the GUI design so that all tree operations use a singular root reference and paths. In that case, a It may be worth bringing up in the docs that a tree of node references is really not the most ideal way to work with DataTree (if that is already described, my apologies), especially given that this is probably a common initial design for trees. In any case, I think it is important that the name property setter throws an understandable error message as that is a massive confusion. Cheers! |
This ensures that the tree cannot be in an inconsistent state. Fixes pydata#9447
What is your issue?
In xarray-contrib/datatree#309 @marcel-goldschen-ohm pointed out some counterintuitive behaviour of the
DataTree.name
setter.this looks fine, but
which means that
Okay, so we just fix it so that setting
.name
changes the key in the parent node right? But as @etienneschalk pointed out in xarray-contrib/datatree#309 (comment), that wouldn't be consistent withDataArray
andDataset
names behave: they don't update the key in the wrapping object either. Instead they just completely ignore that you asked to update it at all:This happens because
Dataset
objects do not actually storeDataArray
objects, they only storeVariables
(which are un-named), and reconstruct a newDataArray
upon__getitem__
calls. So if you use.name
to alter the name of that newly-constructedDataArray
you're not actually touching the wrappingDataset
at all, and therefore have no way to update its keys.Indeed the same thing happens (for the same reason) when altering the name of a Variable that is part of a
DataTree
node:Is this ignoring of the name setter intended, or a bug? And should
DataTree
follow the behaviour ofDataset
here or not? If it doesn't follow the behaviour ofDataset
then we're going to end up with a situation whereactually makes a change to
root
, butdoes not, so the behaviour of
.name
depends on whether or not your key corresponds to aDataTree
value or aVariable
value. 🫤In the
DataTree
case I think it is a lot easier to update the key in the wrapping object when.name
is set because there is a bi-directional linkage back to the parent node, which doesn't exist to link theDataArray
instance back toDataset
.cc @shoyer - I would appreciate your thoughts here.
The text was updated successfully, but these errors were encountered: