Support for NOMAD JSON? #223

Open
PaulWAyers opened this issue Mar 15, 2021 · 3 comments

@PaulWAyers
Member

We should consider supporting the NOMAD database's JSON format. This format has several advantages, chiefly its native interoperability with the NOMAD database and the large number of parsers and Python utilities it makes available to us.

I had thought about making sure NOMAD supports QCSchema, and that would work. But we could also support the Nomad format. I'm not sure which is better. NOMAD may be better for storing wavefunction data.

You can see more about the types of fields in the Nomad Schema here:
https://nomad-lab.eu/prod/rae/test/gui/metainfo
https://nomad-lab.eu/prod/rae/docs/metainfo.html
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-meta-info

Because JSON is large (and slow to parse), NOMAD internally uses a binary serialization format called MessagePack:
https://msgpack.org/
Toon suggested that msgpack should also be considered for the QCSchema in IOData.
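
To make the size argument concrete, here is a minimal sketch using only the Python standard library. It does not use MessagePack itself: it packs raw float64 values with `struct` as a stand-in for a binary encoding (MessagePack stores a float64 as one type byte plus 8 data bytes, so its size would be similar). The `coeffs` list is a made-up payload, not real wavefunction data.

```python
import json
import struct

# Hypothetical payload: 1000 double-precision numbers standing in for
# something like orbital coefficients.
coeffs = [0.1 * i + 1.0 / 3.0 for i in range(1000)]

# Text encoding: every float is written out as a decimal string.
json_bytes = json.dumps(coeffs).encode("utf-8")

# Binary encoding: 8 bytes per float64, little-endian, no separators.
binary_bytes = struct.pack(f"<{len(coeffs)}d", *coeffs)

print("JSON size (bytes):  ", len(json_bytes))
print("binary size (bytes):", len(binary_bytes))

# The binary round trip is exact, since no decimal conversion is involved.
restored = list(struct.unpack(f"<{len(coeffs)}d", binary_bytes))
print("round trip exact:", restored == coeffs)
```

For array-heavy data such as wavefunctions, the binary form is noticeably smaller than the JSON text and avoids float-to-string conversion on every read and write.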

I think @wilhadams may be well-positioned to assess pros vs. cons of QCSchema vs NomadSchema.

@PaulWAyers
Member Author

@wilhadams looked at this and came away impressed. We should definitely try to support reading/dumping NOMAD JSON. We might also look at helping NOMAD parse QCSchema, but it seems like NOMAD is far ahead of MolSSI on this one. It might be that converting NOMAD to QCSchema (through IOData) is more helpful.

We should also investigate (perhaps the answer is obvious) whether a NOMAD-compatible JSON can be directly uploaded to/downloaded from the NOMAD database.
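
As a rough illustration of what such a conversion layer might involve, here is a sketch that maps a QCSchema-style molecule (element `symbols` plus a flat `geometry` list in Bohr) onto a NOMAD-like system record in SI units. The function name and the output keys `atom_labels` and `atom_positions` are assumptions made for illustration; they would need to be checked against the NOMAD MetaInfo linked above. This is not IOData's API.

```python
# CODATA 2018 value of the Bohr radius in meters.
BOHR_TO_METER = 5.29177210903e-11


def qcschema_molecule_to_nomad_like(molecule):
    """Convert a QCSchema-style molecule dict to a NOMAD-like system dict.

    Hypothetical helper: the output key names are placeholders, not a
    verified NOMAD schema.
    """
    symbols = molecule["symbols"]
    flat = molecule["geometry"]  # flat [x1, y1, z1, x2, ...] in Bohr
    if len(flat) != 3 * len(symbols):
        raise ValueError("geometry must hold 3 coordinates per atom")
    # Reshape to one [x, y, z] triple per atom and convert Bohr to meters.
    positions = [
        [coord * BOHR_TO_METER for coord in flat[3 * i : 3 * i + 3]]
        for i in range(len(symbols))
    ]
    return {"atom_labels": list(symbols), "atom_positions": positions}


# Example input: a water molecule with coordinates in Bohr.
water = {
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, -0.13, 0.0, -1.49, 1.03, 0.0, 1.49, 1.03],
}
system = qcschema_molecule_to_nomad_like(water)
print(system["atom_labels"])
print(system["atom_positions"][0])
```

Most of the real work would be in agreeing on units and on the mapping between field names, which is where an intermediate representation such as IOData could help.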

@PaulWAyers
Member Author

PaulWAyers commented Mar 22, 2021

We should probably also think about supporting the Materials Project
https://materialsproject.org/
Their pymatgen utility supports Gaussian and VASP (among others), so it should not be too hard for us to use. I couldn't figure out how they store their data; it seems to be an object, but there is also a befuddle .json file there somewhere.
https://github.com/materialsproject/pymatgen
They mention that they are in the middle of a major refactor, so maybe it will be better soon. Right now NOMAD seems a lot better structured to me.

EDIT: It seems that Materials Project data is a subset of NOMAD data, so NOMAD should suffice.

@tovrstra
Member

I could not easily determine whether pymatgen also has its own serialization format like QCSchema or NOMAD. It seems they mainly use existing formats and the REST API of the Materials Project.

NOMAD is indeed impressive. It seems extensive in principle, but I'd have to try it to see how it works. Many of the entries in MetaInfo are not yet used in the database, so it is a bit difficult to see exactly how it works. In any case, it is certainly worth trying. It could be a good place to upload the databases of QC results that we use for benchmarking.


2 participants