-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Refactoring to include outputs parsing #80
base: main
Are you sure you want to change the base?
Conversation
6d45efa
to
35a667f
Compare
…s. Made many functions as public since they could be useful to users and outside of their context.
…onversions should take place. In all other files, ase and pymatgen should not be used.
…ibility regarding the number of output files for each type of computation, e.g. NEB.
Several small changes are made to the `PwOutput` class: 1. Use `pathlib` instead of `os.path` for path definitions. 2. `.from_dir` method: Only look for files from the specified directory using list comprehension. 3. Adapt approach for identifying the `stdout` file to only use the first 5 lines to look for the `Program PWSCF` string. 4. Remove the try-except loop that simply catches a general exception and raises a `ValueError` instead. This would obscure the actual error without an immediate benefit. 5. Define the `d_out_xxx` variables as empty dictionaries before parsing, and redefine them in case the data is parsed successfully. 6. Allow the user to pass a `Path` instead of a string to specify the directory to parse from.
Currently the `PwStdoutParser` nor its parent class `BaseOutputFileParser` parse any of the Quantum ESPRESSO `stdout`. Here we add the `BaseStdoutParser` class that has some very minimal parsing features for parsing the code name and version, as well as the walltime of the calculation. A utility function is also added to convert the Quantum ESPRESSO walltime output into seconds. The constructor of the `BaseOutputFileParser` is also adapted to remove the default to `None`. Since there is currently no way of providing the output content after construction, it should be provided at that time for the class to be useful.
The first schema used for the XML output of Quantum ESPRESSO `pw.x` was still missing. Here it is added, along with a set of tests for the XML parsing of default runs.
@gbrunin thanks again for the hard work in restructuring the package. I've added some basic Now I'm adding some basic documentation, also on the design decisions. We should consider these carefully, particularly the UX, since it's hard to change these afterwards. I don't care too much about breaking backwards compatibility if we can converge on a more user-friendly tool. |
Also quick shoutout to @elinscott and his package for QE inputs: https://github.com/elinscott/qe_input_prototype He's already indicated he'd be fine with us integrating that work into |
@gbrunin I've also added some basic documentation and design notes. I'm still not quite sure about some of the APIs. I do like the general design now though, I'm just also wondering if:
I've also made a little image to represent the design: |
…ec to utils. Renamed BaseStdoutParser.parse_stdout_base to parse for compatibility with parent abstract class.
This currently serves no real purpose, as the executable can be easily identified from the class itself. In case we do find a use case later, it can always be added again easily.
I have changed the structure of the code to better reflect the distinction between input and output parsers. Concerning the inputs, all the tests that were already there still pass, it's only a matter of adapting the namespace. For the outputs, I created objects such as
PwOutput
, that inherit from an abstractBaseOutput
, that for now can be instantiated with afrom_dir
method (afrom_files
method could/should be added as well). A user with a job that ran in a given directory could get the outputs easily using this classmethod. In thesefrom_dir
methods, specific XML and/or standard outputParsers
would be used to get the results. EachParser
would parse a single file, and the logic of parsing and extracting the outputs from the different codes would have to be implemented in eachfrom_dir
method. For instance, aNebOutput.from_dir
would parse the standard output of the global computation and probably the standard outputs and/or XML files for each image. The extracted outputs would be stored as a simple dictionary and these objects would not rely on any external package.Then, in
qe_tools.extractors
, ASE and pymatgen objects could be constructed (e.g., ase.Atoms/pymatgen.Structure, band structures,...), allowing each to be optional dependencies.This is only the base logic of the new structure and many things remain to be implemented. The idea now is to see what breaks with this in aiida-quantumespresso and how the parsing could be moved from there to here. Then, more will be added depending on the needs.