Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate schema #2

Open
wants to merge 23 commits into
base: develop
Choose a base branch
from
Open

Populate schema #2

wants to merge 23 commits into from

Conversation

Bernadette-Mohr
Copy link
Collaborator

The basic parser functionalities work, but the parsing of CG representations still fails in nomad-simulations.

Copy link
Collaborator

@JFRudzinski JFRudzinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good 👍 Try to address the changes I mentioned, and also maybe there are a few things to discuss based on my comments.

Also, please add tests to make sure that as we move forward everything is working as intended. Ideally the tests should be small and modular. You will see in the H5MD parser that I am still using heavier testing based on an example file. This is not ideal for inside the module, and I need to fix this, but this kind of more extensive/specific test could be implemented into a script that you have externally to the code repo, so that you can make sure that your test file is parsed as you expect as the code develops.

Just let me know when you are ready for another review.

pyproject.toml Outdated
@@ -25,8 +25,11 @@ maintainers = [
{ name = "Bernadette Mohr", email = "[email protected]" },
]
license = { file = "LICENSE" }
dependencies = ["nomad-lab@git+https://github.com/nomad-coe/nomad.git@develop",
"nomad-simulations>=0.0.3", "gsd>=3.3.1"]
dependencies = [# "nomad-lab@git+https://github.com/nomad-coe/nomad.git@develop",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid pushing these type of local references to the repo.

Personally, I create a backup of the remote project.toml called project.toml.remote and then a backup of my local one project.toml.local. I never track the changes of project.toml. When I am developing I replace project.toml with local, and then replace it to remote when ready to push.

Perhaps there is a better way, you could ask Nathan or Esma.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a default .gitignore from the template that prevents all these egg files from being tracked and pushed to the repo. You should remove them.

.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.pyenv

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# for ease of dev
pyproject.toml.local
pyproject.toml.remote

# mainfile_mime_re=,
mainfile_name_re=r'^.*\.gsd$',
aliases=['nomad-parser-gsd', 'gsd'],
description='Entry point for the GSD parser. ',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason for condensing this description of what gsd is?

There used to be a variable called code_url or something like that which allowed you to provide references to the simulation code. You should check if that is still valid

code_name='GSD',
code_category='MD code',
# mainfile_binary_header_re=b'0x65DF65DF65DF65DF', #! Seems to be wrong, didn't cause issues initially.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the issue here? Can you remove this comment?


if TYPE_CHECKING:
#! TODO: Why is TYPE_CHECKING False?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would continue cleaning up the code, and once you are done with resolving other issues, you can tag Alvin here to ask about this.

_program_info['gsd_author_email'] = None
_program_info['gsd_creator_name'] = 'Sharon Glotzer'
_program_info['gsd_creator_version'] = (
'https://glotzerlab.engin.umich.edu/hoomd-blue/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this should just be include in the parser info as a code_URL I think

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe not this link, but the link to the gsd project. In principal the gsd file does not need to come from hoomd

except FileNotFoundError:
self.logger.warning(
'No additional logged data found, no user-defined data will be stored.'
def get_molecules_from_bond_list(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move all these type of functions into the MDParser, since they will also be used by H5MD? Or am I missing something

# self.parse_system_hierarchy(
# simulation.model_system[-1], topology, path_topology
# )
self._system_time_map = {} # ? Is this required? What is ist used for?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to define the connections between configurations and observables, which may be calculated at different intervals (i.e., the length of the system list and output list is different).

Since you are not yet dealing with any outputs you don't need to worry about this. I think you can completely remove it and we decide later how to deal with such things...

@@ -378,14 +658,29 @@ def write_to_archive(self) -> None:
name=self._program_dict['name'],
version=self._program_dict['version'],
)
simulation.x_gsd_version = self._program_dict['gsd_version']
simulation.x_gsd_schema = Program(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, let's remove the x_gsd stuff and simply extend Program() from nomad-simulations if we need more functionality, e.g., schema_version

@@ -378,14 +658,29 @@ def write_to_archive(self) -> None:
name=self._program_dict['name'],
version=self._program_dict['version'],
)
simulation.x_gsd_version = self._program_dict['gsd_version']
simulation.x_gsd_schema = Program(
name=self._program_dict.get('schema'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of object is schema?

@Bernadette-Mohr Bernadette-Mohr linked an issue Dec 10, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tasks retained from previous merge
2 participants