-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Populate schema #2
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good 👍 Try to address the changes I mentioned, and also maybe there are a few things to discuss based on my comments.
Also, please add tests to make sure that as we move forward everything is working as intended. Ideally the tests should be small and modular. You will see in the H5MD parser that I am still using heavier testing based on an example file. This is not ideal for inside the module, and I need to fix this, but this kind of more extensive/specific test could be implemented into a script that you have externally to the code repo, so that you can make sure that your test file is parsed as you expect as the code develops.
Just let me know when you are ready for another review.
pyproject.toml
Outdated
@@ -25,8 +25,11 @@ maintainers = [ | |||
{ name = "Bernadette Mohr", email = "[email protected]" }, | |||
] | |||
license = { file = "LICENSE" } | |||
dependencies = ["nomad-lab@git+https://github.com/nomad-coe/nomad.git@develop", | |||
"nomad-simulations>=0.0.3", "gsd>=3.3.1"] | |||
dependencies = [# "nomad-lab@git+https://github.com/nomad-coe/nomad.git@develop", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid pushing these type of local references to the repo.
Personally, I create a backup of the remote project.toml
called project.toml.remote
and then a backup of my local one project.toml.local
. I never track the changes of project.toml. When I am developing I replace project.toml
with local, and then replace it to remote when ready to push.
Perhaps there is a better way, you could ask Nathan or Esma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a default .gitignore
from the template that prevents all these egg files from being tracked and pushed to the repo. You should remove them.
.gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.pyenv
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# for ease of dev
pyproject.toml.local
pyproject.toml.remote
# mainfile_mime_re=, | ||
mainfile_name_re=r'^.*\.gsd$', | ||
aliases=['nomad-parser-gsd', 'gsd'], | ||
description='Entry point for the GSD parser. ', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason for condensing this description of what gsd is?
There used to be a variable called code_url
or something like that which allowed you to provide references to the simulation code. You should check if that is still valid
code_name='GSD', | ||
code_category='MD code', | ||
# mainfile_binary_header_re=b'0x65DF65DF65DF65DF', #! Seems to be wrong, didn't cause issues initially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the issue here? Can you remove this comment?
|
||
if TYPE_CHECKING: | ||
#! TODO: Why is TYPE_CHECKING False? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would continue cleaning up the code, and once you are done with resolving other issues, you can tag Alvin here to ask about this.
_program_info['gsd_author_email'] = None | ||
_program_info['gsd_creator_name'] = 'Sharon Glotzer' | ||
_program_info['gsd_creator_version'] = ( | ||
'https://glotzerlab.engin.umich.edu/hoomd-blue/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this should just be include in the parser info as a code_URL
I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe not this link, but the link to the gsd project. In principal the gsd file does not need to come from hoomd
except FileNotFoundError: | ||
self.logger.warning( | ||
'No additional logged data found, no user-defined data will be stored.' | ||
def get_molecules_from_bond_list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move all these type of functions into the MDParser, since they will also be used by H5MD? Or am I missing something
# self.parse_system_hierarchy( | ||
# simulation.model_system[-1], topology, path_topology | ||
# ) | ||
self._system_time_map = {} # ? Is this required? What is ist used for? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to define the connections between configurations and observables, which may be calculated at different intervals (i.e., the length of the system list and output list is different).
Since you are not yet dealing with any outputs you don't need to worry about this. I think you can completely remove it and we decide later how to deal with such things...
@@ -378,14 +658,29 @@ def write_to_archive(self) -> None: | |||
name=self._program_dict['name'], | |||
version=self._program_dict['version'], | |||
) | |||
simulation.x_gsd_version = self._program_dict['gsd_version'] | |||
simulation.x_gsd_schema = Program( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, let's remove the x_gsd
stuff and simply extend Program()
from nomad-simulations if we need more functionality, e.g., schema_version
@@ -378,14 +658,29 @@ def write_to_archive(self) -> None: | |||
name=self._program_dict['name'], | |||
version=self._program_dict['version'], | |||
) | |||
simulation.x_gsd_version = self._program_dict['gsd_version'] | |||
simulation.x_gsd_schema = Program( | |||
name=self._program_dict.get('schema'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of object is schema
?
The basic parser functionalities work, but the parsing of CG representations still fails in nomad-simulations.