Skip to content
This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Setting node name keeps tree linkage #310

Conversation

etienneschalk
Copy link
Contributor

Small bugfix. I added a test reproducing the example in #309, and tests are not broken

Note: In _post_attach, I set the private _name instead of setting the name. Indeed, it can lead to infinite recursions when a setter is used inside of a class. I assume the _post_attach method is like a "runtime assertion"

Comment on lines 604 to 607
parent = self._parent
self.orphan()
self._name = name
self._set_parent(parent, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this orphaning then un-orphaning logic, couldn't we just change the key in .children directly via some logic more like in _post_attach

Copy link
Contributor Author

@etienneschalk etienneschalk Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem immediate to me right now (I tried some things but not successfully until now)
I still need to remove then add again the node in the parent's children at some point 🤔

In all cases I get more messy code than the current solution

__delitem__ itself uses orphan under the hood for instance

Copy link
Contributor Author

@etienneschalk etienneschalk Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, I went for a .children = assignment. I ensured the order of children is preserved by re-creating an OrderedDict from a generator on the existing parent's children. Code is more verbose, but I think this is unavoidable to ensure the order is preserved. Indeed, orphaning then re-attaching would have lost the ordering information.

Note: _post_attach solely won't rename the node _name, so two renaming still occur:

  • One on the parent's children (preserving order)
  • One on the node _name itself

Edit: if the node had a parent, there is no need to manually set the ._name. Reassigning the updated list of children to its parents renames the node along the way

@etienneschalk
Copy link
Contributor Author

etienneschalk commented Feb 17, 2024

I remained conservative and forbid the renaming of a node to None if it has a parent

Edit: according to a comment in the datatree design document mentioned in pydata/xarray#8747, section 4) Are nodes named? in practice, only the root node would remain anonmyous. Hence it makes sense to only authorize a node without a parent (root) to be renamed to None?

See my comment #309 (comment)

Edit: build is failing because of the new change in printing sizes of DataArrays: pydata/xarray#8702

Whether build or dev-build pass, they are mutually exclusive. What is the one that should be favoured? Current version of xarray of next version?

@marcel-goldschen-ohm
Copy link

marcel-goldschen-ohm commented Mar 1, 2024

I see I got to this pretty late, so it may be solved already (if so, please ignore this). But I couldn't quite tell from this conversation if everything was working or not. If not, I have a solution which passes all pytests (except for a format error that was not passing prior to my changes) at https://github.com/marcel-goldschen-ohm/datatree/tree/main. Changes are in datatree.py in name.setter and _attach methods, and I added a test_namechange method to test_datatree.py. Again, if this is already solved, please ignore.

@marcel-goldschen-ohm
Copy link

marcel-goldschen-ohm commented Jun 18, 2024

Any news on this? It would be great if this could finally be incorporated into the PyPI distribution of datatree.

@marcel-goldschen-ohm
Copy link

I know everyone is busy, so I hate to harp on this. But I'm surprised that even after 7 months this very fundamental issue with what seems like should be a simple solution is still not resolved (as far as I can tell). I'm not sure what the status is or what the issue is regarding unsuccessful checks for @etienneschalk's solution. I also have a solution that works for me at https://github.com/marcel-goldschen-ohm/datatree/tree/main as previously mentioned. How can I help get this resolved?

@TomNicholas
Copy link
Member

Sorry @marcel-goldschen-ohm for letting this slip! I understand it's frustrating that we didn't just merge this and release.

For context we've been concentrating on trying to get datatree merged upstream into xarray (see pydata/xarray#8572 (comment)), which would allow us to completely deprecate and archive this repository. We were hoping to have that done in July. 😅 This process is necessary, but it's an awkward time because it's tricky to keep track of both repos at once, especially as now some bugs in one are fixed in the other...

We've made various small changes during that migration, including to whether or not copies of trees are created during operations, with the aim of making everything more consistent with the rest of xarray (see for example pydata/xarray#9297). We've also been trying to close issues here in favour of raising them upstream in xarray (because annoyingly we can't just auto-transfer all these issues to xarray upstream as it's in a different github org).


However I've been looking at this specific issue again and I don't think it's trivial. After @etienneschalk 's comment in #309 (comment) it is now genuinely not clear to me what the desired behaviour should be. I have raised an upstream issue on xarray to track that question of what the behaviour should be going forward (pydata/xarray#9447).

I am happy to merge this PR here and release another version if that would really help you, but my main concern has been ensuring that the upstream (non-prototype) version of datatree makes the right decisions going forward, not making sure all bugs are fixed in this version that's about to be deprecated.

@marcel-goldschen-ohm
Copy link

Regarding @etienneschalk 's comment in #309 (comment) , my personal opinion is that xds['a'].name = 'b' should behave the same as xda = xds['a']; xda.name = 'b'. Anything else is highly nonintuitive from an end user experience if Xarray is going to be widely adopted (my opinion). At the very least, it needs to throw an error unless the idea is to give everyone a headache, and provide a method for renaming in place. Of course, this is an Xarray issue. If the goal of DataTree is to match this behavior, then I would strongly suggest against it (I would also suggest it should be changed in Xarray). At the very least, there must be a method that allows renaming a DataTree instance without having to create a new instance. Without such a method, developing UIX apps to work with DataTrees can become much more painful as references to DataTree nodes tend to be shared and stored in the UIX, and thus a pain to have to recreate for a simple renaming operation.

All of that said, I hear you that you have a lot on your plate with the move to Xarray. Also, although the desired behavior is not unclear to me, I hear you that Xarray itself apparently has some pretty silly behavior in this regard, and I get wanting to be Xarray compatible although I disagree with it in this case. Now that I know what the issue is, I can find a way to work around it, so don't worry about this for my benefit. Hopefully in the future both Xarray and DataTree will be more intuitive in this regard. Best wishes on the merge with Xarray! And also a big thank you for DataTree in the first place, I couldn't agree more that a tree structure of Datasets is highly useful :)

@TomNicholas
Copy link
Member

Thanks @marcel-goldschen-ohm for your (very useful) perspective.

Now that I know what the issue is, I can find a way to work around it, so don't worry about this for my benefit.

In that case I'm closing this in favour of the discussion upstream in pydata/xarray#9447.

@TomNicholas TomNicholas closed this Sep 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

setting node name breaks tree linkage
3 participants