[Bug]: ValidationError 6 validation errors for MoleculeSummaryDoc builder_meta.build_date #922

Closed
1 of 3 tasks
funihang opened this issue Jul 12, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@funihang

Code snippet

from mp_api.client import MPRester

with MPRester(api_key) as mpr:
    docs = mpr.molecules.summary.search()

What happened?

I am trying to download all molecules from the API using the code above. It works at first, but when the progress bar reaches 155361/221598, a ValidationError is raised. Could you please check it?

Version

mp-api 0.41.2

Which OS?

  • MacOS
  • Windows
  • Linux

Log output

ValidationError: 6 validation errors for MoleculeSummaryDoc
builder_meta.build_date
  Value error, Invalid isoformat string: '2023-11-07T22:35:04.718Z' [type=value_error, input_value={'$date': '2023-11-07T22:35:04.718Z'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
last_updated
  Input should be a valid datetime [type=datetime_type, input_value={'$date': '2023-11-07T22:35:04.718Z'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/datetime_type
origins.0.last_updated
  Value error, Invalid isoformat string: '2020-11-11T12:51:27.833Z' [type=value_error, input_value={'$date': '2020-11-11T12:51:27.833Z'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
origins.1.last_updated
  Value error, Invalid isoformat string: '2023-08-03T18:52:51.206Z' [type=value_error, input_value={'$date': '2023-08-03T18:52:51.206Z'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
origins.2.last_updated
  Value error, Invalid isoformat string: '2020-11-11T12:51:27.833Z' [type=value_error, input_value={'$date': '2020-11-11T12:51:27.833Z'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
has_props
  Input should be a valid list [type=list_type, input_value={'materials': True, 'ther...lse, 'substrates': True}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/list_type
funihang added the bug label Jul 12, 2024
@munrojm
Member

munrojm commented Jul 18, 2024

I will investigate this. For now, you can pass use_document_model=False to MPRester to fix the issue. Note that the data will be returned as dictionaries instead of MPDataDoc objects.
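A minimal sketch of this workaround, under stated assumptions: `use_document_model` is the argument named in this thread, while the wrapper function `fetch_molecule_dicts` is a hypothetical helper introduced only for illustration. Running the actual query needs network access and a valid API key.

```python
def fetch_molecule_dicts(api_key, **query):
    """Return molecule summary documents as plain dicts (no model validation)."""
    # Imported lazily so the helper can be defined without mp-api installed.
    from mp_api.client import MPRester

    # use_document_model=False skips conversion to MPDataDoc objects,
    # so the pydantic validation step that raises the error is not run.
    with MPRester(api_key, use_document_model=False) as mpr:
        return mpr.molecules.summary.search(**query)


# Example call (requires network access and a real key):
# docs = fetch_molecule_dicts("your-api-key")
```

Because validation is skipped, any malformed fields are passed through unchanged and have to be handled by the caller.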

@funihang
Author

> I will investigate this. For now, you can pass use_document_model=False to MPRester to fix the issue. Note that the data will be returned as dictionaries instead of MPDataDoc objects.

Thanks for your reply. I reinstalled my conda env and downgraded Python to 3.8. It works now, although I'm unsure which step resolved the issue.

@kavanase

Just to bump this, I'm also running into the same issue. Setting fields=["structure"] (all I need in this case) avoids it.

@tschaume
Member

Thanks for bumping this. We'll need to take a second look at it. As @munrojm mentioned, in the meantime, you can use the use_document_model=False argument to MPRester() to disable validation and return the data as simple dictionaries.
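When working with the raw dictionaries, date fields may still arrive in the wrapped `{'$date': ...}` form shown in the log output. The sketch below shows one way to normalize them with the standard library only. The helper name `parse_extended_date` is hypothetical, and it is an assumption that the unvalidated documents carry the same wrappers as in the error message. Note that `datetime.fromisoformat()` accepts a trailing `Z` only on Python 3.11 and later, which is one plausible reading of the "Invalid isoformat string" messages.

```python
from datetime import datetime


def parse_extended_date(value):
    """Convert {'$date': '<ISO 8601 string>'} or a bare ISO string to a datetime."""
    if isinstance(value, dict) and "$date" in value:
        value = value["$date"]
    if isinstance(value, str):
        # Rewrite a trailing "Z" as an explicit UTC offset so that
        # fromisoformat() also works on interpreters older than 3.11.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return datetime.fromisoformat(value)
    return value  # not a recognised wrapper or string; returned unchanged


# Value taken from the log output in this issue.
parsed = parse_extended_date({"$date": "2023-11-07T22:35:04.718Z"})
print(parsed.isoformat())
```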

@tschaume
Member

tschaume commented Dec 3, 2024

@kavanase I just released mp-api v0.44.0rc0 with a (temporary) fix for this. Could you give it a try? Thanks! Note that we've fixed the data behind the scenes already and this error shouldn't show up when you include a query (e.g. on nelements). Only full downloads without an explicit query are affected when using previous versions of the client.

@kalvdans

kalvdans commented Dec 3, 2024

@tschaume can you link to the PR that fixed the issue?

@tschaume
Member

tschaume commented Dec 3, 2024

@kalvdans it would be PR #947 but I'd consider it a temporary fix since I simply disabled the rerouting of full download requests from our MongoDB to our OpenData repositories. We're working on sync'ing the data in the OpenData repo and should be able to revert #947 with the upcoming data release. HTH.

@kavanase

kavanase commented Dec 4, 2024

Hi @tschaume, I just checked with the new release, but I'm still getting a ValidationError:
[image: screenshot of the ValidationError]

The workaround is fine for my use case btw, but I guess it would be nice for this simpler call to work without issue too.

Full traceback:

ValidationError                           Traceback (most recent call last)
Cell In[1], line 4
      1 from pymatgen.ext.matproj import MPRester
      3 with MPRester() as mpr:
----> 4     docs = mpr.molecules.summary.search()

File ~/Packages/api/mp_api/client/routes/molecules/summary.py:131, in MoleculesSummaryRester.search(self, charge, spin_multiplicity, nelements, chemsys, deprecated, elements, exclude_elements, formula, has_props, molecule_ids, num_chunks, chunk_size, all_fields, fields)
    123     query_params.update({"has_props": ",".join([i.value for i in has_props])})
    125 query_params = {
    126     entry: query_params[entry]
    127     for entry in query_params
    128     if query_params[entry] is not None
    129 }
--> 131 return super()._search(
    132     num_chunks=num_chunks,
    133     chunk_size=chunk_size,
    134     all_fields=all_fields,
    135     fields=fields,
    136     **query_params,
    137 )

File ~/Packages/api/mp_api/client/core/client.py:1182, in BaseRester._search(self, num_chunks, chunk_size, all_fields, fields, **kwargs)
   1160 """A generic search method to retrieve documents matching specific parameters.
   1161 
   1162 Arguments:
   (...)
   1177     A list of documents.
   1178 """
   1179 # This method should be customized for each end point to give more user friendly,
   1180 # documented kwargs.
-> 1182 return self._get_all_documents(
   1183     kwargs,
   1184     all_fields=all_fields,
   1185     fields=fields,
   1186     chunk_size=chunk_size,
   1187     num_chunks=num_chunks,
   1188 )

File ~/Packages/api/mp_api/client/core/client.py:1255, in BaseRester._get_all_documents(self, query_params, all_fields, fields, chunk_size, num_chunks)
   1241 list_entries = sorted(
   1242     (
   1243         (key, len(entry.split(",")))
   (...)
   1250     reverse=True,
   1251 )
   1253 chosen_param = list_entries[0][0] if len(list_entries) > 0 else None
-> 1255 results = self._query_resource(
   1256     query_params,
   1257     fields=fields,
   1258     parallel_param=chosen_param,
   1259     chunk_size=chunk_size,
   1260     num_chunks=num_chunks,
   1261 )
   1263 return results["data"]

File ~/Packages/api/mp_api/client/core/client.py:569, in BaseRester._query_resource(self, criteria, fields, suburl, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
    567         data["meta"]["total_doc"] = len(data["data"])
    568     else:
--> 569         data = self._submit_requests(
    570             url=url,
    571             criteria=criteria,
    572             use_document_model=not query_s3 and use_document_model,
    573             parallel_param=parallel_param,
    574             num_chunks=num_chunks,
    575             chunk_size=chunk_size,
    576             timeout=timeout,
    577         )
    578     return data
    580 except RequestException as ex:

File ~/Packages/api/mp_api/client/core/client.py:716, in BaseRester._submit_requests(self, url, criteria, use_document_model, chunk_size, parallel_param, num_chunks, timeout)
    703 remaining_docs_avail = {}
    705 initial_params_list = [
    706     {
    707         "url": url,
   (...)
    713     for crit in new_criteria
    714 ]
--> 716 initial_data_tuples = self._multi_thread(
    717     self._submit_request_and_process, initial_params_list
    718 )
    720 for data, subtotal, crit_ind in initial_data_tuples:
    721     subtotals.append(subtotal)

File ~/Packages/api/mp_api/client/core/client.py:938, in BaseRester._multi_thread(self, func, params_list, progress_bar)
    935 finished, futures = wait(futures, return_when=FIRST_COMPLETED)
    937 for future in finished:
--> 938     data, subtotal = future.result()
    940     if progress_bar is not None:
    941         if isinstance(data, dict):

File ~/miniconda3/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
    449     raise CancelledError()
    450 elif self._state == FINISHED:
--> 451     return self.__get_result()
    453 self._condition.wait(timeout)
    455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/miniconda3/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
    401 if self._exception:
    402     try:
--> 403         raise self._exception
    404     finally:
    405         # Break a reference cycle with the exception in self._exception
    406         self = None

File ~/miniconda3/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
     55     return
     57 try:
---> 58     result = self.fn(*self.args, **self.kwargs)
     59 except BaseException as exc:
     60     self.future.set_exception(exc)

File ~/Packages/api/mp_api/client/core/client.py:1010, in BaseRester._submit_request_and_process(self, url, verify, params, use_document_model, timeout)
   1007 # other sub-urls may use different document models
   1008 # the client does not handle this in a particularly smart way currently
   1009 if self.document_model and use_document_model:
-> 1010     data["data"] = self._convert_to_model(data["data"])
   1012 meta_total_doc_num = data.get("meta", {}).get("total_doc", 1)
   1014 return data, meta_total_doc_num

File ~/Packages/api/mp_api/client/core/client.py:1046, in BaseRester._convert_to_model(self, data)
   1036 def _convert_to_model(self, data: list[dict]):
   1037     """Converts dictionary documents to instantiated MPDataDoc objects.
   1038 
   1039     Args:
   (...)
   1044 
   1045     """
-> 1046     raw_doc_list = [self.document_model.model_validate(d) for d in data]  # type: ignore
   1048     if len(raw_doc_list) > 0:
   1049         data_model, set_fields, _ = self._generate_returned_model(raw_doc_list[0])

File ~/Packages/api/mp_api/client/core/client.py:1046, in <listcomp>(.0)
   1036 def _convert_to_model(self, data: list[dict]):
   1037     """Converts dictionary documents to instantiated MPDataDoc objects.
   1038 
   1039     Args:
   (...)
   1044 
   1045     """
-> 1046     raw_doc_list = [self.document_model.model_validate(d) for d in data]  # type: ignore
   1048     if len(raw_doc_list) > 0:
   1049         data_model, set_fields, _ = self._generate_returned_model(raw_doc_list[0])

File ~/miniconda3/lib/python3.10/site-packages/pydantic/main.py:509, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    507 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    508 __tracebackhide__ = True
--> 509 return cls.__pydantic_validator__.validate_python(
    510     obj, strict=strict, from_attributes=from_attributes, context=context
    511 )

ValidationError: 7 validation errors for MoleculeSummaryDoc
partial_charges.NONE.resp
  Input should be a valid list [type=list_type, input_value={'property_id': '90e05663...9, 0.532428, -0.265717]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.NONE.mulliken
  Input should be a valid list [type=list_type, input_value={'property_id': '26f3d39d...51, 0.01468, -0.094106]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.SOLVENT=WATER.resp
  Input should be a valid list [type=list_type, input_value={'property_id': 'b684db14...2, -0.244941, 0.444549]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.SOLVENT=WATER.mulliken
  Input should be a valid list [type=list_type, input_value={'property_id': 'c614aa0c...53, -0.740996, 0.28759]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_spins.NONE.mulliken
  Input should be a valid list [type=list_type, input_value={'property_id': '0bd15a66...86, 0.154226, 0.355101]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_spins.SOLVENT=WATER.mulliken
  Input should be a valid list [type=list_type, input_value={'property_id': '9d206e72...2941, 0.90596, 2.8e-05]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
has_props
  Input should be a valid list [type=list_type, input_value={'molecules': True, 'bond...True, 'vibration': True}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/list_type
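The `has_props` error above shows a dictionary of boolean flags arriving where the model expects a list. For anyone handling the raw dictionaries, a conversion along these lines may help. This is only a sketch: the helper name `flags_to_list` is hypothetical, and treating the keys flagged `True` as the list entries is an assumption, not something confirmed in this thread. The `example_prop` key in the sample input is made up for illustration.

```python
def flags_to_list(has_props):
    """Turn {'name': bool, ...} into a list of the names flagged True."""
    if isinstance(has_props, dict):
        return [name for name, present in has_props.items() if present]
    # Already list-like: return a plain list copy.
    return list(has_props)


# "molecules" and "vibration" appear in the error message; "example_prop" is invented.
example = {"molecules": True, "vibration": True, "example_prop": False}
print(flags_to_list(example))  # prints ['molecules', 'vibration']
```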

@tschaume
Member

tschaume commented Dec 4, 2024

Try using the officially supported mp_api client:

from mp_api.client import MPRester

with MPRester() as mpr:
    docs = mpr.molecules.summary.search(nelements=7)

We generally recommend not mixing the pymatgen and mp_api clients.

@kavanase

kavanase commented Dec 4, 2024

Hi @tschaume,
I may be doing something wrong, but using that code I still get a ValidationError:
[image: screenshot of the ValidationError]

@tschaume
Member

tschaume commented Dec 4, 2024

Which emmet-core version are you running? You might have to upgrade emmet-core since it contains the model definitions used for validation.
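One quick way to answer this is to print the installed versions of the relevant packages from the active environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The function name `installed_versions` and the default package list are illustrative choices, not part of mp-api.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("mp-api", "emmet-core", "pydantic", "pymatgen")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in the same interpreter or notebook kernel that raises the error avoids reporting versions from a different environment.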

@kavanase

kavanase commented Dec 5, 2024

I was using emmet-core-0.84.2rc4. I have now tried the latest PyPI release (emmet-core-0.84.2) and am still getting the same error.

@tschaume
Member

tschaume commented Dec 5, 2024

Thanks for checking @kavanase! We're in the middle of preparing a new data release that will hopefully fix it. We'll get back to you.

tschaume closed this as completed Dec 5, 2024
tschaume reopened this Dec 5, 2024
@tschaume
Member

@kalvdans @kavanase The new data release v2024.11.14 is out and the mp-api library has been updated accordingly. Upgrading to mp-api==0.44.0 should fix the issue.

5 participants