Node equality comparison #5187

mbercx · 2021-10-20T07:10:40Z · Maintainer

In #1917, @borellim raised the issue that distinct instances of the same node are not equal when compared with ==. This sparked a long discussion on when nodes should be considered equal, which has not been resolved yet. Below I try to summarise the discussion and the current status.

Summary

The discussion basically boils down to which equivalence condition should hold (see this comment from @greschd):

(a) two nodes are the exact same Python object
(b) two nodes have the same UUID
(c) two nodes have the same content
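To make the three conditions concrete, here is a minimal sketch. It uses a hypothetical ToyNode class (a UUID plus a value) as a stand-in for an AiiDA node; it is not AiiDA's API.

```python
import uuid


class ToyNode:
    """Hypothetical stand-in for an AiiDA node: a UUID plus some content."""

    def __init__(self, value, node_uuid=None):
        self.uuid = node_uuid or str(uuid.uuid4())
        self.value = value


def same_object(a, b):
    # (a) identity: both names refer to the very same Python object
    return a is b


def same_uuid(a, b):
    # (b) same UUID: e.g. two Python objects loaded for one stored node
    return a.uuid == b.uuid


def same_content(a, b):
    # (c) same content: independent nodes that happen to hold equal values
    return a.value == b.value


original = ToyNode(5)
reloaded = ToyNode(5, node_uuid=original.uuid)  # mimics loading the same node twice
duplicate = ToyNode(5)  # a different node with equal content

print(same_object(original, reloaded), same_uuid(original, reloaded))  # False True
print(same_uuid(original, duplicate), same_content(original, duplicate))  # False True
```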

During the discussion, it was already agreed that (b) should be implemented as a fallback, which @ramirezfranciscof did in #4753. What is not yet clear is whether (c) should hold for any node types, i.e. whether we should ever base equality on the content of a node.
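Here is a minimal sketch of an identity-then-UUID fallback in __eq__. It uses a hypothetical ToyNode class and is not taken from #4753. Note that it also defines __hash__: a Python class that overrides __eq__ without doing so becomes unhashable.

```python
import uuid


class ToyNode:
    """Hypothetical base class; not AiiDA's actual Node implementation."""

    def __init__(self, node_uuid=None):
        self.uuid = node_uuid or str(uuid.uuid4())

    def __eq__(self, other):
        if self is other:
            return True  # (a) identity always implies equality
        if isinstance(other, ToyNode):
            return self.uuid == other.uuid  # (b) fall back to the UUID
        return NotImplemented  # let Python try the reflected comparison

    def __hash__(self):
        # Keep the hash consistent with __eq__: equal UUIDs give equal hashes
        return hash(self.uuid)


first = ToyNode()
second = ToyNode(node_uuid=first.uuid)  # e.g. the same stored node loaded twice
print(first == second)  # True
print(first == ToyNode())  # False
```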

Current behavior

Of the data types in aiida-core, AFAIK only Bool, Int, Float, Str, and List compare equal based on their content. These are essentially all the data types that correspond to Python built-in types (the "base types"), except for Dict. After the changes in #4753, all other node types compare equal when their UUID is the same.
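The current split can be sketched with hypothetical toy classes. ToyInt stands in for the value-compared base types, and ToyDict stands in for Dict, which inherits UUID-based equality. These are not the real aiida-core classes.

```python
import uuid


class ToyNode:
    """Hypothetical base node: instances compare equal only if their UUIDs match."""

    def __init__(self, value):
        self.uuid = str(uuid.uuid4())
        self.value = value

    def __eq__(self, other):
        if isinstance(other, ToyNode):
            return self.uuid == other.uuid
        return NotImplemented

    def __hash__(self):
        return hash(self.uuid)


class ToyInt(ToyNode):
    """Mimics the value-compared base types (Bool, Int, Float, Str, List)."""

    def __eq__(self, other):
        if isinstance(other, ToyInt):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        # Equal values must give equal hashes, so hash the value, not the UUID
        return hash(self.value)


class ToyDict(ToyNode):
    """Mimics Dict before the change: no __eq__ of its own, so UUID-based."""


print(ToyInt(3) == ToyInt(3))  # True: same value, different UUIDs
print(ToyDict({"a": 1}) == ToyDict({"a": 1}))  # False: same content, different UUIDs
```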

Possible changes

Several options were considered:

1. Equality comparison should only be based on UUID. Hence the content comparison for Bool, Int, Float, Str, and List should be removed.
2. Use the hashing mechanism. However, @greschd raised several issues with this here:
   - hashes only take into account the immutable attributes of a node
   - the whole rehashing story doesn't mesh well with this (everything would need to be rehashed whenever code that touches hashing is updated)
   - for float types, the hash is more akin to an "almost equal", because we want to avoid unstable hashes when going from in-memory to the DB
3. Make the equality comparison consistent across all base types (also see this comment in Small changes to List and Dict data types API #4495). The only change would then be that Dict nodes also compare equal by content.
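The "almost equal" caveat of option 2 can be illustrated with a hypothetical content hash that rounds floats before hashing. The function toy_content_hash and the choice of 12 significant digits are assumptions for illustration, not AiiDA's hashing code. Two floats that differ with == can still share a hash.

```python
import hashlib


def toy_content_hash(value, sig_digits=12):
    """Hypothetical content hash that rounds floats before hashing.

    Rounding keeps the hash stable when a float picks up tiny
    representation differences, e.g. on a round trip through a database.
    """
    if isinstance(value, float):
        text = f"{value:.{sig_digits}g}"  # keep only sig_digits significant digits
    else:
        text = repr(value)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = 0.1 + 0.2
b = 0.3
print(a == b)  # False: the floats differ in the last bits
print(toy_content_hash(a) == toy_content_hash(b))  # True: equal after rounding
```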

Finally, there is also the question of whether other data types should compare equal by content, i.e. implement their own __eq__ method, or whether we should advise against this and stick to the base types only.

Pinging @aiidateam/plugin-developers for comments.

Answered by mbercx · Nov 3, 2021

sphuber · 2021-10-20T15:07:48Z · Maintainer

Thanks a lot for the summary @mbercx, very useful. Having read this again, I would definitely vote against option 2, as it would simply be too complex to come up with a consistent implementation that behaves intuitively in all cases for everyone. I think it is fine to suggest that users who want to know whether the content of two data nodes is identical define the measure of equality that suits them. We can document that they can use the hash, together with the caveats that come with it, such as only certain attributes being considered. I think this will be the most transparent.
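A user-defined measure of equality could look like the following minimal sketch. The function attributes_match is hypothetical. Plain dicts stand in for the attributes of two nodes, and how you fetch those attributes depends on your AiiDA version. Only the chosen keys are compared, and floats are compared with a tolerance.

```python
import math


def attributes_match(attrs_a, attrs_b, keys, rel_tol=1e-9):
    """Hypothetical user-side check: compare only the chosen attribute keys.

    Floats are compared with a relative tolerance; everything else with ==.
    """
    for key in keys:
        if key not in attrs_a or key not in attrs_b:
            return False
        va, vb = attrs_a[key], attrs_b[key]
        if isinstance(va, float) and isinstance(vb, float):
            if not math.isclose(va, vb, rel_tol=rel_tol):
                return False
        elif va != vb:
            return False
    return True


run_1 = {"energy": -10.000000000001, "units": "eV", "label": "first run"}
run_2 = {"energy": -10.0, "units": "eV", "label": "second run"}
print(attributes_match(run_1, run_2, keys=["energy", "units"]))  # True
print(attributes_match(run_1, run_2, keys=["label"]))  # False
```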

Then, whether we choose option 1 or 3: I think there is something to be said for either of them, and I would be fine with either. Ultimately we should just document this clearly. Removing the specific implementation of __eq__ for the base types would make all nodes behave consistently, which would be an easy message to convey. However, any data plugin is free to override __eq__, so there is no way we can guarantee this; maybe we should tell users that this is always an option. On the other hand, if we were to remove it and people need to compare these for equality, that would still be very easy: they can simply write if node_a.value == node_b.value instead of if node_a == node_b, which is not that bad.
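Under option 1, == and an explicit value comparison would behave as follows. This minimal sketch uses a hypothetical ToyInt whose == compares UUIDs only. It is not aiida-core's Int.

```python
import uuid


class ToyInt:
    """Hypothetical Int-like node under option 1: == compares UUIDs only."""

    def __init__(self, value):
        self.uuid = str(uuid.uuid4())
        self.value = value

    def __eq__(self, other):
        if isinstance(other, ToyInt):
            return self.uuid == other.uuid
        return NotImplemented

    def __hash__(self):
        return hash(self.uuid)


node_a, node_b = ToyInt(42), ToyInt(42)
print(node_a == node_b)  # False: different nodes, different UUIDs
print(node_a.value == node_b.value)  # True: compare the wrapped values explicitly
```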

That being said, given that these base types were designed more or less to mirror the Python built-in types, there is something to be said for them behaving the same under the comparison operator. Adding __eq__ for Dict therefore makes sense, since two Python dicts with identical content also compare equal.

Since either option would be fine for me, maybe the final decision can be based on backward compatibility: adopting 1 or 3 could both break existing code, but I think the chance of breaking code is smaller with 3 than with 1. So if we had to choose, I would probably go for 3.

TLDR:

- Option 2: definitely not, because the behavior would be unpredictable and inconsistent.
- Option 1 has the highest consistency, but nothing stops data plugins from overriding __eq__ and breaking that consistency.
- Option 3 has a clear concept: base equality is based on the UUID, and data plugins can implement more specific logic in __eq__. The base types shipped with aiida-core implement equality based on their value, mirroring the behavior of the analogous Python built-in types.

I opt for option 3, as it is the clearest, most consistent and most intuitive, while also having the smallest impact on backward compatibility.
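Here is a minimal sketch of the layered concept behind option 3. A hypothetical ToyNode base compares UUIDs, and a hypothetical plugin type ToyVectorData opts in to content-based equality. For anything else it falls back to the base rule. Neither class is part of aiida-core.

```python
import uuid


class ToyNode:
    """Hypothetical base class: default equality is based on the UUID."""

    def __init__(self):
        self.uuid = str(uuid.uuid4())

    def __eq__(self, other):
        if isinstance(other, ToyNode):
            return self.uuid == other.uuid
        return NotImplemented

    def __hash__(self):
        return hash(self.uuid)


class ToyVectorData(ToyNode):
    """Hypothetical plugin type that opts in to content-based equality."""

    def __init__(self, components):
        super().__init__()
        self.components = tuple(components)

    def __eq__(self, other):
        if isinstance(other, ToyVectorData):
            return self.components == other.components
        return super().__eq__(other)  # anything else: fall back to the base rule

    # A content-based __eq__ needs a content-based hash to keep the invariant
    # "a == b implies hash(a) == hash(b)".
    def __hash__(self):
        return hash(self.components)


print(ToyVectorData([1, 2]) == ToyVectorData([1, 2]))  # True: same content
print(ToyNode() == ToyNode())  # False: different UUIDs
```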


giovannipizzi · 2021-10-21T08:19:31Z · Maintainer

I agree with all points of @sphuber's comment above: I would go with 3 and document that we only do this for our base types (listing explicitly what a base type is, so that data types in aiida-core like StructureData or SingleFileData are not mistaken for "base" types).

The only thing I don't know is whether we should leave the option of comparing other types by value fully open (and then maybe implement it for a few more, e.g. SingleFileData), or discourage this behavior (meaning that we cannot prevent it, but we clearly document that we prefer people not to do it).

I guess I would go with the first: we can leave this open, document that most nodes will only be equal if the UUID is the same, and (after fixing Dict) decide whether to provide __eq__ for some other aiida-core nodes (not critical; we can also decide to limit it to base nodes only, which is simple to explain).


mbercx · 2021-11-03T10:04:57Z · Maintainer · Author

After discussing this again in the team meeting, the consensus is to move forward with making the Python base types consistently compare by content/value (option 3). So, we'll change/add the __eq__ method for the Dict data type.

For other data types, we will not implement equality comparison by content for now, but also not discourage others from doing so.
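Here is a minimal sketch of what content-based equality for a Dict-like node could look like. It uses a hypothetical ToyDict and is not the eventual aiida-core change. Like Python's built-in dict, the content is mutable, so the sketch makes instances unhashable rather than hashing on content.

```python
import uuid


class ToyDict:
    """Hypothetical Dict-like node that, per option 3, compares by content."""

    def __init__(self, dictionary):
        self.uuid = str(uuid.uuid4())
        self._dict = dict(dictionary)

    def __eq__(self, other):
        if isinstance(other, ToyDict):
            return self._dict == other._dict  # same keys and values, any order
        return NotImplemented

    # Mutable content: mirror the built-in dict and disable hashing
    __hash__ = None


x = ToyDict({"a": 1, "b": [1, 2]})
y = ToyDict({"b": [1, 2], "a": 1})
print(x == y)  # True: identical content, even though the UUIDs differ
```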

