Fix metadata parsing #120
e0d12d8 to 1807eff
Looks good.
Will you add the new character constraints to pyaerocom too, so that users get proper feedback if an invalid project/experiment name is chosen?
I don't think I am introducing any new character constraints, but rather following the unwritten constraints as they currently exist. The pyaerocom splitting of filenames would already break if these constraints aren't followed. That being said, as part of #121 I support transparently encoding the characters that aerovaldb requires for disambiguating metadata in file names, which would remove these character constraints moving forward. Maybe we can revisit the discussion on that PR?
Change Summary
Due to some unfortunate choices of separator characters in file names, it is in some instances difficult to properly derive the metadata of a file from the file path alone. Here I implement an ugly hack to make the parsing work for old templates, while changing the template (and encoding scheme) to prevent this issue in future experiment runs.
The hack relies on some assumptions:
- `network` must not contain `_` but can contain `-`
- `region` can contain `_`

This makes some changes which may break stuff, but I think it is necessary to be able to implement proper querying (see #117) which will allow us to remove a lot of the filename parsing in pyaerocom (which leads to breakage when aerovaldb changes storage location).
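To illustrate how the assumptions on `network` and `region` make old file names parseable, here is a minimal sketch. The stem layout `{region}_{network}-{obsvar}_{layer}` and the function name `parse_legacy_stem` are hypothetical and chosen only for illustration; they are not taken from aerovaldb. The sketch additionally assumes that `obsvar` contains neither `-` nor `_` and that `layer` contains no `_`.

```python
def parse_legacy_stem(stem: str) -> dict[str, str]:
    """Split a hypothetical legacy stem '{region}_{network}-{obsvar}_{layer}'.

    Parsing from the right is what makes the assumptions matter:
    - layer is assumed to contain no '_' (assumption for this sketch),
    - obsvar is assumed to contain no '-' or '_' (assumption for this sketch),
    - network must not contain '_' but may contain '-',
    - region may contain '_'.
    """
    # The last '_' separates the layer from everything before it.
    rest, layer = stem.rsplit("_", 1)
    # The last remaining '-' separates obsvar; earlier '-' belong to network.
    rest, obsvar = rest.rsplit("-", 1)
    # network has no '_', so the last '_' separates it from region.
    region, network = rest.rsplit("_", 1)
    return {
        "region": region,
        "network": network,
        "obsvar": obsvar,
        "layer": layer,
    }


parsed = parse_legacy_stem("NORTH_AMERICA_AERONET-Sun-od550aer_Column")
```

If `network` were allowed to contain `_`, the final split could no longer tell where `region` ends, which is why the constraint is needed for the old templates.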
For future runs, a new template string will be used which consistently uses `_` as separation character in filenames. Should `_` be included in an argument it will be encoded.

For context, this goes together with the following PRs, but trying to break it into smaller chunks for easier review:
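As a rough illustration of the idea of encoding `_` inside arguments so that `_` can serve as the only separator, consider the sketch below. The escape scheme (`%0` for `%`, `%1` for `_`) and all function names are assumptions made up for this example; the encoding actually used in aerovaldb may differ.

```python
def encode_arg(value: str) -> str:
    """Escape '%' and '_' so the result never contains a bare '_'."""
    # Escape the escape character first so decoding stays unambiguous.
    return value.replace("%", "%0").replace("_", "%1")


def decode_arg(value: str) -> str:
    """Reverse encode_arg."""
    return value.replace("%1", "_").replace("%0", "%")


def build_stem(*args: str) -> str:
    """Join encoded arguments with '_' as the only separator."""
    return "_".join(encode_arg(a) for a in args)


def split_stem(stem: str) -> list[str]:
    """Split on '_' and decode each part back to the original argument."""
    return [decode_arg(part) for part in stem.split("_")]


stem = build_stem("NORTH_AMERICA", "AERONET-Sun", "od550aer", "Column")
parts = split_stem(stem)
```

With a scheme of this kind, splitting on `_` always recovers the original arguments, so no argument needs to avoid particular characters.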
Related issue number
closes #119
Checklist