Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array comparison in core.Structure.merge_sites, also allow int property to be merged instead of float alone, mode only allow full name #4198

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Nov 25, 2024

Summary


Warning

Minor behaviour change (fix) in merge_sites property handling

Only allowing float to be averaged may not be a good idea, we should include int as well, otherwise properties={"prop1": 100} would lead to the site property being reset to None while properties={"prop1": 100.0} doesn't, which is super confusing

navs2 = Structure.from_spacegroup(160, lattice, species, coords, site_properties=site_props)
navs2.insert(0, "Na", coords[0], properties={"prop1": 100.0})
navs2.merge_sites(mode="a")

Also the docstring is clarified to explain this behaviour.


Warning

mode only allow "full name" instead of initials after an one-year deprecation period

I believe it's beneficial to allow only full name ("sum", "delete", "average") instead of checking first letter only:

  • more readable: mode="sum" instead of mode="s"
  • consistent with currently type annotation:
    def merge_sites(self, tol: float = 0.01, mode: Literal["sum", "delete", "average"] = "sum") -> Self:

@@ -4746,7 +4746,7 @@ def merge_sites(self, tol: float = 0.01, mode: Literal["sum", "delete", "average
offset = self[i].frac_coords - coords
coords += ((offset - np.round(offset)) / (n + 2)).astype(coords.dtype)
for key in props:
if props[key] is not None and self[i].properties[key] != props[key]:
Copy link
Contributor Author

@DanielYang59 DanielYang59 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO tag: Using array_equal may not be a good idea either as the property could be (sequence of) floats

TODO2: test failure

# Test that we can average the site properties that are floats
lattice = Lattice.hexagonal(3.587776, 19.622793)
species = ["Na", "V", "S", "S"]
coords = [
[0.333333, 0.666667, 0.165000],
[0, 0, 0.998333],
[0.333333, 0.666667, 0.399394],
[0.666667, 0.333333, 0.597273],
]
site_props = {"prop1": [3.0, 5.0, 7.0, 11.0]}
navs2 = Structure.from_spacegroup(160, lattice, species, coords, site_properties=site_props)
navs2.insert(0, "Na", coords[0], properties={"prop1": 100.0})
navs2.merge_sites(mode="a")
assert len(navs2) == 12
assert 51.5 in [itr.properties["prop1"] for itr in navs2]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Property can actually be anything, including value supplied by user.

Copy link
Contributor Author

@DanielYang59 DanielYang59 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for stepping in, I'm here adding a note for myself to work on later.

Yes in this case using == for comparison may not be ideal when the value could be float or np.array, this is very similar to #4092 where we want to compare the equality of two dict whose value could be np.array

But I assume we would not be able to use np.allclose as the value could be non-numerical, so as far as I'm aware, using array_equal would be the best approach (though handling of float would be sub-optimal)

dist_mat = self.distance_matrix
# TODO: change the code the allow full name after 2025-12-01
# TODO2: add a test for mode value, currently it only checks if first letter is "s/a"
if mode.lower() not in {"sum", "delete", "average"} and mode.lower()[0] in {"s", "d", "a"}:
Copy link
Contributor Author

@DanielYang59 DanielYang59 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to hear more opinions on this, I guess it's beneficial to allow only full name ("sum", "delete", "average") instead of checking first letter only to make use of the IDE auto-complete feature and facilitate typing

Also using the full name would be more readable: mode="sum" instead of mode="s"

for n, i in enumerate(inds[1:]):
sp = self[i].species
if mode.lower()[0] == "s":
species += sp
offset = self[i].frac_coords - coords
coords += ((offset - np.round(offset)) / (n + 2)).astype(coords.dtype)
for key in props:
if props[key] is not None and self[i].properties[key] != props[key]:
if props[key] is not None and not np.array_equal(self[i].properties[key], props[key]):
if mode.lower()[0] == "a" and isinstance(props[key], float):
Copy link
Contributor Author

@DanielYang59 DanielYang59 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess only allowing float to be averaged may not be a good idea, we should include int as well, otherwise properties={"prop1": 100} would lead to the site property being reset to None while properties={"prop1": 100.0} doesn't, which is super confusing

navs2 = Structure.from_spacegroup(160, lattice, species, coords, site_properties=site_props)
navs2.insert(0, "Na", coords[0], properties={"prop1": 100.0})
navs2.merge_sites(mode="a")

Also the docstring would be clarified to explain this behaviour.

@DanielYang59 DanielYang59 changed the title Fix array comparison in core.Structure.merge_sites Fix array comparison in core.Structure.merge_sites, merge_sites also allow int property to be merged instead of float alone, mode only allow full name Nov 26, 2024
@DanielYang59 DanielYang59 changed the title Fix array comparison in core.Structure.merge_sites, merge_sites also allow int property to be merged instead of float alone, mode only allow full name Fix array comparison in core.Structure.merge_sites, also allow int property to be merged instead of float alone, mode only allow full name Nov 26, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review November 26, 2024 13:37
@@ -216,7 +216,7 @@ class SiteCollection(collections.abc.Sequence, ABC):
"""

# Tolerance in Angstrom for determining if sites are too close
DISTANCE_TOLERANCE = 0.5
DISTANCE_TOLERANCE: ClassVar[float] = 0.5
_properties: dict
Copy link
Contributor Author

@DanielYang59 DanielYang59 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NEED CONFIRM on 091c911: Is _properties misplaced? an instance variable maybe?

class SiteCollection(collections.abc.Sequence, ABC):
"""Basic SiteCollection. Essentially a sequence of Sites or PeriodicSites.
This serves as a base class for Molecule (a collection of Site, i.e., no
periodicity) and Structure (a collection of PeriodicSites, i.e.,
periodicity). Not meant to be instantiated directly.
"""
# Tolerance in Angstrom for determining if sites are too close
DISTANCE_TOLERANCE = 0.5
_properties: dict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Structure.merge_sites fails when selective_dynamics is property present
2 participants