Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setup and add new schema #1

Merged
merged 16 commits into from
Oct 31, 2024
Merged

Conversation

JosePizarro3
Copy link
Collaborator

@JosePizarro3 JosePizarro3 commented Jul 24, 2024

@jkshenton Sathya (sorry, I cannot tag him as he is not part of this repo),

I've been working on some changes in this repo. Maybe you can drop me a review so I can merge and add this new plugin into NOMAD? You can also add it then as part of the NOMAD OASIS and use it in your CCP-NC database.

I still need to:

  • Add testing for the parser under tests/test_parser.py
  • Fix workflow recognition with CASTEP output files

Changes 1 - new structure

I've adopted the new structure for plugins from NOMAD. This means a lot of changes, but the most important are the ones in pyproject.toml and the new folder structure under src. Some extra stuff:

  • You can release this project in PyPI and maintain it with versioning in the way you want.
  • I might need to do minor changes from time to time to reflect changes in nomad.
  • I added instructions in the README.md, so you can test if these work or not for you.
  • I added a MkDocs page with a NOMAD style, that you can of course delete or restructure at will.

Changes 2 - New schema nomad-simulations

I've changed the parsing to the new schema, nomad-simulations. Some points here:

  • You are welcome to adopt or not this schema, as we already talked. If you need more quantities or a different way of parsing, I suggest you work from something up. The example here would be how I used Outputs from the nomad-simulations schema to extend it with NMR specific properties, and how I used PhysicalProperty from this package to define such NMR properties.
  • Now, I think the properties are much nicer parsed, as they contained references to the specific atomic states, more similar to what you have in the magres file.
  • In FAIRmat we are developing right now a taxonomy for materials properties, so anyways these properties might end up defined in the nomad or nomad-simulations repo, as some other users might be using these anyways.
  • As you can see, these properties have some functionalities to extract information (like the isotropic part of the MagneticShielding tensor), and these could be extended. I left it for you to decide whether you want to do this here or in some other repo you have.
  • Ideally, and to keep a nice collaboration, these properties could be perfectly defined in the nomad-simulations repo. I will be happy if you want to contribute there and update these definitions in there. Other users might need them and it connects nicely with the taxonomy effort I comment on above.

@jkshenton
Copy link
Member

Thanks a lot @JosePizarro3 - really appreciate your continued help with this! The restructuring looks fine to me. The new schema (atoms state thing) will be much easier for us to work with I suspect and in general makes a lot of sense.

In terms of contributing to nomad-simulations, we are absolutely keen. The magres format spec document was already some effort to defining a standard schema for computed magres data for the community. I would be very happy to see it adapted for nomad generally. I count >13 periodic electronic structure codes that can calculate magnetic shielding tensors (one of the key properties) so having a general schema that other parsers can work with makes sense to me.

I've manually added @Sathya-S3 now and I think he's in a better position to review the specific code changes having tested out the initial version.

@JosePizarro3 JosePizarro3 self-assigned this Jul 24, 2024
@JosePizarro3
Copy link
Collaborator Author

Great, thanks @jkshenton !

In terms of contributing to nomad-simulations, we are absolutely keen. The magres format spec document was already some effort to defining a standard schema for computed magres data for the community. I would be very happy to see it adapted for nomad generally. I count >13 periodic electronic structure codes that can calculate magnetic shielding tensors (one of the key properties) so having a general schema that other parsers can work with makes sense to me.

Indeed, the magres format is a very easy case where, whatever you decided for the properties should be incorporated directly, while for the symmetry open questions, we should work a bit more on it.

Copy link
Collaborator

@Sathya-S3 Sathya-S3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on presenting the magres parser in such a polished state, Jose. My main comments are related to understanding the workflow links between CASTEP and magres files, which I have highlighted with the relevant code. I'm happy to approve the pull request after we have had a chance to chat with you about the workflow linking and how we can help with further magres parser development (Quantum Espresso workflows etc.).

I have integrated the magres parser as guided by the README instructions with a NOMAD Oasis instance and have verified that the file parsing works flawlessly. The test was performed with an ethanol CASTEP-magres workflow files archive.

pyproject.toml Show resolved Hide resolved
"Documentation" = "https://nomad-coe.github.io/nomad-parser-magres/"
"Homepage" = "https://github.com/CCP-NC/nomad-parser-magres"
"Bug Tracker" = "https://github.com/CCP-NC/nomad-parser-magres/issues"
"Documentation" = "https://CCP-NC.github.io/nomad-parser-magres/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation page "https://CCP-NC.github.io/nomad-parser-magres/" does not exist at the time of review. I assume this will be auto-rectified when the changes are merged into "master branch". Will run a quick verification after merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we solved this last time we talk, the docu page has to be deployed in the settings of the repo. Check this: https://www.mkdocs.org/user-guide/deploying-your-docs/#github-pages

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkshenton for your attention

src/nomad_parser_magres/parsers/parser.py Show resolved Hide resolved
if outputs is not None:
simulation.outputs.append(outputs)

# Try to resolve the `entry_id` and `mainfile` of other entries in the upload to connect the magres entry with the CASTEP or QuantumESPRESSO entry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the metadata section of file upload, there is a files section that lists all files included in the upload. Is this the only intended result? There is mention of linking the NMR workflow directly in the magres entry, could you please clarify how it is done? As per my previous comments, we see two separate file entries and workflows for the .castep and .magres files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are more to connect the castep with the magres entries

README.md Show resolved Hide resolved
@JosePizarro3
Copy link
Collaborator Author

@Sathya-S3 I will do a final clean up next week and push again. I think I can integrate your comments.

@JosePizarro3
Copy link
Collaborator Author

I totally forgot about this PR. @Sathya-S3 reminded me and asked me a couple of things, so the problem is that the workflow might have some issues. I can follow FAIRmat-NFDI/nomad-parser-wannier90#3 to fix it and this will be ready.

Sathya-S3 and others added 3 commits October 30, 2024 10:45
Added new CCPNCSimulation class

Fix dependencies

Added get_files utility function

Fix README
@JosePizarro3
Copy link
Collaborator Author

@Sathya-S3 @jkshenton

Ok, I finished with some changes. Note the changes in src/schema_packages/ccpnc_metada.py, src/schema_packages/package.py and the last lines in src/parsers/parser.py. What I did:

  • In ccpnc_metadata.py: fix some types (instead of tuples use JSON) and shapes (if the shape is empty, no need to define). Add 2 comments.
  • In package.py: add CCPNCSimulation at the end.
  • In parser.py: change import to use CCPNCSimulation instead of nomad_simulations.Simulation. Also, I added some lines at the end of the file to clarify how the parsing of the extra CCP-NC metadata could look like; if you decide to go with another option, that's fine for me as well :-)

If you both agree, we can merge this.

@jkshenton
Copy link
Member

Thanks so much again for your help with this, @JosePizarro3 ! I am happy to merge now and make any further changes in subsequent PRs.

@Sathya-S3 Sathya-S3 merged commit 831f3f0 into develop Oct 31, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants