Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: chemsys parameter not returning all entries in chemsys #918

Open
1 of 3 tasks
kavanase opened this issue Jul 5, 2024 · 7 comments
Open
1 of 3 tasks

[Bug]: chemsys parameter not returning all entries in chemsys #918

kavanase opened this issue Jul 5, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@kavanase
Copy link

kavanase commented Jul 5, 2024

Code snippet

# new API:
with MPRester("new_MP_API_key") as mpr:
    Cd_Te_PD_docs = mpr.materials.summary.search(chemsys="Cd-Te")
print(len(Cd_Te_PD_docs))  # should be 23, returns 11

What happened?

Related to the bug report in #859

Using the chemsys parameter in the search() method (specifically for mpr.materials.summary.search here but this affects all BaseRester.search() methods), not all entries in the specified chemical system are actually being returned.
Correct behaviour with legacy API:
image

But with the new API and setting chemsys = "Cd-Te", only Cd_xTe_y entries are returned, and not Cd or Te-only entries (which should also be returned):
image

Can test this for other material systems too, by just comparing the returned docs from summary.search(chemsys="X-Y") with the output of searching "X-Y" on the Materials Project website.

Version

v0.41.2

Which OS?

  • MacOS
  • Windows
  • Linux

Log output

No response

@kavanase kavanase added the bug Something isn't working label Jul 5, 2024
@kavanase
Copy link
Author

kavanase commented Jul 5, 2024

And also just to clarify, get_entries_in_chemsys for the new MPRester also works fine and as expected, it's just summary.search() that has the issue:
image

@kavanase
Copy link
Author

kavanase commented Jul 5, 2024

I've found the issue, it's that in get_entries_in_chemsys(), it uses this code to convert the input chemsys parameter to a format that works for the summary.search() methods:
https://github.com/materialsproject/api/blob/2c13d6a515972503b9b52447df6986974c5affd5/mp_api/client/mprester.py#L1179C9-L1187C60

if isinstance(elements, str):
        elements = elements.split("-")

elements_set = set(elements)  # remove duplicate elements

all_chemsyses = []
for i in range(len(elements_set)):
    for els in itertools.combinations(elements_set, i + 1):
        all_chemsyses.append("-".join(sorted(els)))

I think for consistency it would be best for this handling to be applied for all uses of chemsys, not just in this function? So the handling of chemsys across all the functions (and as used on the Materials Project website) is consistent?

@munrojm
Copy link
Member

munrojm commented Jul 11, 2024

The only issue is there needs to be a way to query for a single chemical system, and not just always query for what in this case turns out to be a list (e.g. ["Cd-Te", "Cd", "Te"]. Since the original rester class had the get_entries_in_chemsys method which queried for ALL entries in the parent chemical system as well as all of it's subsystems, we elected to keep it that way. We still will want to maintain the existing query functionality of summary.search. Perhaps this is an issue that can be fixed in the docs.

@kavanase
Copy link
Author

Ok!
Yeah then maybe just worth clarification in the docs/docstrings? Just because "chemical system" on the MP website and in the old methods meant everything in that chemical space (i.e. "Cd-Te" -> CdxTey where x/y are any combination of integers including zero) rather than the current where it means everything with that exact same combination of elements (but with variable stoichiometry); i.e. where x/y must be > 0

@mkhorton
Copy link
Member

Of course the responsibility lies with MP to make sure documentation is correct, but adding a note that the public docs accept PRs too if there is ever a quick change to suggest that might help provide clarification for other users: https://github.com/materialsproject/public-docs

I confess I would have also found this confusing. I suspect the underlying reason is because get_entries_in_chemsys is often used to quickly obtain necessary data to create a PhaseDiagram object, so is also useful to include subsystems.

@munrojm
Copy link
Member

munrojm commented Jul 11, 2024

Have changed the language of the docstring. Feel free to submit a PR if you feel it isn't enough @kavanase.

@kalvdans
Copy link

Have changed the language of the docstring.

For reference, this was 801c05c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants