-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds multi format reader #250
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @lukaspie, I started to work on a generic format reader. We should align here and come up with a common json config file notation and then we can do the parsing from the this reader instead directly in our plugins.
Pull Request Test Coverage Report for Build 7988023443Details
💛 - Coveralls |
This sounds close to what the JsonMapReader does. It already supports multiple formats. Maybe we can bring that in as well. |
Indeed, it is very similar. The main difference I see is that the json map reader uses a mapping file where we want to use a combination file. It is a json which either contains literal values or loads data from another source via However, the way I want to design this is that you can always change how certain formats are read, i.e., that you specify a dict of file-endings + read functions (actually, this is how the YamlJsonReader already works). That way the json map reader could also be implemented in this fashion as an implementation for reading |
I think we should get rid of pickling eventually. I'll need to change the NXiv_temp example. If we will have a generic reader that can be customized like this, then we should streamline it and direct users to one unified place for such functions. I'll have a thorough look later into how the YamlJsonReader works. I don't mind making big changes to bring the two together. The less clutter we have the less we need to work on and users need to figure out. |
Yes, this is also my main endeavour with this PR. You can check the hall or transmission reader to see an implementation using this (I think it's easier to grasp from there). Here is the actual reader for hall: class HallReader(YamlJsonReader):
"""HallReader implementation for the DataConverter
to convert Hall data to Nexus."""
supported_nxdls: List[str] = ["NXroot"]
extensions = {
".txt": lambda fname: parse_txt(fname, "iso-8859-1"),
".json": parse_json,
".yml": lambda fname: parse_yml(fname, None, None),
".yaml": lambda fname: parse_yml(fname, None, None),
} I think it's easy and straightforward to understand and all of the parsing is offloaded into the The current plan with this MultiFormatReader is the following:
|
0f7644f
to
adedc85
Compare
Pull Request Test Coverage Report for Build 8689694136Details
💛 - Coveralls |
FWIW, I have taken quite a different approach in the XPS reader, with different classes for the individual sub-readers for different formats and the selection based on the file format happening in a different script. So if we implement this, I will have to change the XPS reader quite a bit. I would be happy to do so if we get more aligned across the different readers and if the config/eln file parsing is unified. |
I think we might be able to support both. As far as I understood it from a first glance you keep a dict with a reader class. So we could use this in the multi format reader to instantiate the class and invoke its read function whenever we find a class instead of a function. I think this is also a nice feature, that you can either specify a simple read function or a complete reader. But I would anyways keep the config parsing as a separate function. So you could also just reuse this in your code. |
Yeah, it's not actually a "reader" that's called in the sense that it inherits from BaseReader 😄 I have a three-layer structure: there is the Now that I think about it, maybe all of those sub-classes could be readers themselvers, inheriting from our I additionally have the problem that the extension is often not unique, i.e., many vendors have a .txt export, but all the files are actually different. But this logic could probably be handled by passing a function in the extensions dict that does this. I am actually doing that already. And finally, I also have a check that the file comes from the list of supported vendors. |
7ef301c
to
44c4928
Compare
There is also an idea to bring a unified json config and mapping to the dataconverter itself. This is still under discussion but I'm leaving this here as a reminder. |
@sherjeelshabih and @lukaspie, I finished the first draft of the MultiFormatReader now. I did not test it yet but I think we can use this as a basis to discuss about the structure. The general mechanism is as detailed above, that we use the There are also two special functions now: What this reader supports is a config dict with some special syntax which is noted as, e.g.,
Here is the mpes example config file as an implementation of the config file syntax above. I also had a look at the json map reader and from my view it basically does a
|
print(f"WARNING: File {file_path} does not exist, ignoring entry.") | ||
continue | ||
|
||
template.update(self.extensions.get(extension, lambda _: {})(file_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative: I have a function that only updates if there is nothing overwritten that already exists in the template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this. In my view it might be useful sometimes to allow overwriting, e.g., with the json dict. But I could also imagine giving the reader a flag to switch this behaviour and set one of the options as default.
What I don't understand: how are multiple entries handled? In the XPS reader, I need to always change the template key from Basically I need to iterate the config file and replace the key for each entry. I can do that inside the different functions, e.g. in |
Yes, this is a good comment. I also thought about this already but did not put it yet, because we just did not have this functionality in the mpes reader/config. |
This is how I currently have it. |
Eventually, bringing in this functionality that Rubel wants to implement anyway (see #255) would be very good as well. |
As a summary to our meeting yesterday, here's what I think would be nice to implement:
Feel free to extend if you have anything else. |
4083860
to
20f4182
Compare
Fixes #213
ToDo:
Will not be implemented here
Considerations