Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactoring to include outputs parsing #80

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

gbrunin
Copy link

@gbrunin gbrunin commented Sep 9, 2024

I have changed the structure of the code to better reflect the distinction between input and output parsers. Concerning the inputs, all the tests that were already there still pass, it's only a matter of adapting the namespace. For the outputs, I created objects such as PwOutput, that inherit from an abstract BaseOutput, that for now can be instantiated with a from_dir method (a from_files method could/should be added as well). A user with a job that ran in a given directory could get the outputs easily using this classmethod. In these from_dir methods, specific XML and/or standard output Parsers would be used to get the results. Each Parser would parse a single file, and the logic of parsing and extracting the outputs from the different codes would have to be implemented in each from_dir method. For instance, a NebOutput.from_dir would parse the standard output of the global computation and probably the standard outputs and/or XML files for each image. The extracted outputs would be stored as a simple dictionary and these objects would not rely on any external package.
Then, in qe_tools.extractors, ASE and pymatgen objects could be constructed (e.g., ase.Atoms/pymatgen.Structure, band structures,...), allowing each to be optional dependencies.

This is only the base logic of the new structure and many things remain to be implemented. The idea now is to see what breaks with this in aiida-quantumespresso and how the parsing could be moved from there to here. Then, more will be added depending on the needs.

@mbercx mbercx force-pushed the develop branch 3 times, most recently from 6d45efa to 35a667f Compare November 21, 2024 12:41
mbercx and others added 12 commits November 21, 2024 14:09
…s. Made many functions as public since they could be useful to users and outside of their context.
…onversions should take place. In all other files, ase and pymatgen should not be used.
…ibility regarding the number of output files for each type of computation, e.g. NEB.
Several small changes are made to the `PwOutput` class:

1. Use `pathlib` instead of `os.path` for path definitions.
2. `.from_dir` method: Only look for files from the specified directory using list
   comprehension.
3. Adapt approach for identifying the `stdout` file to only use the first 5 lines to
   look for the `Program PWSCF` string.
4. Remove the try-except loop that simply catches a general exception and raises a
   `ValueError` instead. This would obscure the actual error without an immediate
   benefit.
5. Define the `d_out_xxx` variables as empty dictionaries before parsing, and redefine
   them in case the data is parsed successfully.
6. Allow the user to pass a `Path` instead of a string to specify the directory to
   parse from.
Currently the `PwStdoutParser` nor its parent class `BaseOutputFileParser` parse any
of the Quantum ESPRESSO `stdout`. Here we add the `BaseStdoutParser` class that has
some very minimal parsing features for parsing the code name and version, as well as the
walltime of the calculation. A utility function is also added to convert the Quantum
ESPRESSO walltime output into seconds.

The constructor of the `BaseOutputFileParser` is also adapted to remove the default
to `None`. Since there is currently no way of providing the output content after
construction, it should be provided at that time for the class to be useful.
The first schema used for the XML output of Quantum ESPRESSO `pw.x` was still missing.
Here it is added, along with a set of tests for the XML parsing of default runs.
@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

@gbrunin thanks again for the hard work in restructuring the package. I've added some basic stdout parsing, tests etc, and fixed up the package to run the CI properly on the currently active Python versions (3.9 - 3.13).

Now I'm adding some basic documentation, also on the design decisions. We should consider these carefully, particularly the UX, since it's hard to change these afterwards. I don't care too much about breaking backwards compatibility if we can converge on a more user-friendly tool.

@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

Also quick shoutout to @elinscott and his package for QE inputs:

https://github.com/elinscott/qe_input_prototype

He's already indicated he'd be fine with us integrating that work into qe-tools, so we should consider it in our design discussions.

@mbercx mbercx mentioned this pull request Nov 21, 2024
@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

@gbrunin I've also added some basic documentation and design notes. I'm still not quite sure about some of the APIs. I do like the general design now though, I'm just also wondering if:

  1. We should also add one class that can deal with both inputs and outputs. IIRC some of the output parsing requires, or can be significantly facilitated by having access to the inputs. Since we already use the "parser" nomenclature for the output parsers, I might call this e.g. PwCalc, and then it contains both the inputs and outputs.
  2. In time, we want a PwInput class, that allows for both generating an input file and parsing an existing one. I'm not sure if in this case we need to have the Parser class intermediary.
  3. We should already work a bit on separating the raw parsed data of the XML and actual useful outputs. The raw data is not really that user-friendly, clearly.

I've also made a little image to represent the design:

image

gbrunin and others added 2 commits November 22, 2024 11:01
…ec to utils. Renamed BaseStdoutParser.parse_stdout_base to parse for compatibility with parent abstract class.
This currently serves no real purpose, as the executable can be easily
identified from the class itself. In case we do find a use case later,
it can always be added again easily.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants