[WIP] Refactoring to include outputs parsing #80

gbrunin · 2024-09-09T13:14:43Z

I have changed the structure of the code to better reflect the distinction between input and output parsers. For the inputs, all the existing tests still pass; only the namespace needed adapting. For the outputs, I created objects such as PwOutput that inherit from an abstract BaseOutput and that, for now, can be instantiated with a from_dir classmethod (a from_files method could/should be added as well). A user whose job ran in a given directory could get the outputs easily using this classmethod. Inside each from_dir method, specific XML and/or standard output Parsers would be used to get the results. Each Parser would parse a single file, and the logic for parsing and extracting the outputs of each code would have to be implemented in the corresponding from_dir method. For instance, NebOutput.from_dir would parse the standard output of the global computation and probably the standard outputs and/or XML files of each image. The extracted outputs would be stored as a simple dictionary, and these objects would not rely on any external package.
Then, in qe_tools.extractors, ASE and pymatgen objects could be constructed (e.g., ase.Atoms/pymatgen.Structure, band structures, ...), so that both packages remain optional dependencies.
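To make the proposed layering concrete, here is a minimal sketch of an abstract BaseOutput with a from_dir classmethod and a PwOutput that delegates to a single-file parser. Only the names BaseOutput, PwOutput and from_dir come from the description above. The `StdoutParser` class, the `.out` suffix, the `job_done` key and the `JOB DONE` check are assumptions made for illustration, not the code in this PR.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class BaseOutput(ABC):
    """Holds the parsed results of one calculation as a plain dictionary."""

    def __init__(self, outputs: dict | None = None):
        self.outputs = outputs or {}

    @classmethod
    @abstractmethod
    def from_dir(cls, directory: str | Path) -> BaseOutput:
        """Build the output object from the files found in ``directory``."""


class StdoutParser:
    """Hypothetical single-file parser: one instance handles one file."""

    def __init__(self, content: str):
        self.content = content

    def parse(self) -> dict:
        # Placeholder logic (assumption): only report whether the job finished.
        return {"job_done": "JOB DONE" in self.content}


class PwOutput(BaseOutput):
    @classmethod
    def from_dir(cls, directory: str | Path) -> PwOutput:
        directory = Path(directory)
        outputs: dict = {}
        # Assumption: the standard output is stored with a ``.out`` suffix.
        for path in sorted(directory.glob("*.out")):
            outputs.update(StdoutParser(path.read_text()).parse())
        return cls(outputs=outputs)
```

In this sketch `PwOutput.from_dir("path/to/run").outputs` is a plain dictionary, so nothing here needs ASE or pymatgen.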

This is only the base logic of the new structure and many things remain to be implemented. The idea now is to see what breaks with this in aiida-quantumespresso and how the parsing could be moved from there to here. Then, more will be added depending on the needs.

Several small changes are made to the `PwOutput` class:

1. Use `pathlib` instead of `os.path` for path definitions.
2. `.from_dir` method: only look for files in the specified directory, using a list comprehension.
3. Identify the `stdout` file by looking for the `Program PWSCF` string in the first 5 lines only.
4. Remove the try-except block that simply catches a general exception and raises a `ValueError` instead. This would obscure the actual error without an immediate benefit.
5. Define the `d_out_xxx` variables as empty dictionaries before parsing, and reassign them if the data is parsed successfully.
6. Allow the user to pass a `Path` instead of a string to specify the directory to parse from.
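The `stdout` detection described in items 2 and 3 could look roughly like the sketch below. The function name `find_pw_stdout` and its `n_lines` parameter are hypothetical, and the actual method in `PwOutput` may differ.

```python
from __future__ import annotations

from itertools import islice
from pathlib import Path


def find_pw_stdout(directory: str | Path, n_lines: int = 5) -> Path | None:
    """Return the first file whose first ``n_lines`` lines mention ``Program PWSCF``."""
    directory = Path(directory)
    # Only consider regular files directly inside the specified directory.
    candidates = [path for path in sorted(directory.iterdir()) if path.is_file()]
    for path in candidates:
        # errors="ignore" keeps binary files from raising decoding errors.
        with path.open("r", errors="ignore") as handle:
            head = "".join(islice(handle, n_lines))
        if "Program PWSCF" in head:
            return path
    return None
```

Reading only the head of each file keeps the check cheap for large output files, and both `str` and `Path` arguments are accepted, as in item 6.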

Currently neither the `PwStdoutParser` nor its parent class `BaseOutputFileParser` parses any of the Quantum ESPRESSO `stdout`. Here we add the `BaseStdoutParser` class, which has some very minimal features for parsing the code name and version, as well as the walltime of the calculation. A utility function is also added to convert the Quantum ESPRESSO walltime output into seconds. The constructor of the `BaseOutputFileParser` is also adapted to remove the `None` default for the output content: since there is currently no way of providing the content after construction, it has to be provided at that time for the class to be useful.
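As an illustration of the walltime conversion, a minimal sketch follows. It assumes the walltime is printed as a sequence of number and unit tokens such as `1h 2m`, `4m 3.21s` or `0.52s`. The name `walltime_to_seconds` is hypothetical and is not the utility added in this commit.

```python
import re

# Seconds per unit for the tokens assumed to appear in the walltime string.
_UNIT_SECONDS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0}
# A number (optionally with decimals) followed by a single unit letter.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*([dhms])")


def walltime_to_seconds(walltime: str) -> float:
    """Convert a string such as ``'1h 2m'`` or ``'4m 3.21s'`` to seconds."""
    tokens = _TOKEN.findall(walltime)
    if not tokens:
        raise ValueError(f"Unrecognised walltime string: {walltime!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in tokens)
```

Raising a `ValueError` on an unrecognised string, rather than returning zero, makes format changes in the output visible early.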

mbercx · 2024-11-21T14:44:08Z

@gbrunin thanks again for the hard work in restructuring the package. I've added some basic stdout parsing, tests etc, and fixed up the package to run the CI properly on the currently active Python versions (3.9 - 3.13).

Now I'm adding some basic documentation, also on the design decisions. We should consider these carefully, particularly the UX, since it's hard to change these afterwards. I don't care too much about breaking backwards compatibility if we can converge on a more user-friendly tool.

mbercx · 2024-11-21T15:14:58Z

Also quick shoutout to @elinscott and his package for QE inputs:

https://github.com/elinscott/qe_input_prototype

He's already indicated he'd be fine with us integrating that work into qe-tools, so we should consider it in our design discussions.

mbercx · 2024-11-21T22:58:31Z

@gbrunin I've also added some basic documentation and design notes. I'm still not quite sure about some of the APIs. I do like the general design now though, I'm just also wondering if:

1. We should also add one class that can deal with both inputs and outputs. IIRC some of the output parsing requires, or can be significantly facilitated by having access to the inputs. Since we already use the "parser" nomenclature for the output parsers, I might call this e.g. PwCalc, and then it contains both the inputs and outputs.
2. In time, we want a PwInput class, that allows for both generating an input file and parsing an existing one. I'm not sure if in this case we need to have the Parser class intermediary.
3. We should already work a bit on separating the raw parsed data of the XML and actual useful outputs. The raw data is not really that user-friendly, clearly.

I've also made a little image to represent the design:

mbercx force-pushed the develop branch 3 times, most recently from 6d45efa to 35a667f on November 21, 2024 12:41

mbercx and others added 12 commits November 21, 2024 14:09

✨ First draft of PwParser

e95d85f

Modified the structure of the code to split between inputs and output…

17dfd54

…s. Made many functions as public since they could be useful to users and outside of their context.

Refactored structure with inputs/outputs.

9283644

Abstraction of base outputs, added ase and pymatgen files where the c…

3135fc4

…onversions should take place. In all other files, ase and pymatgen should not be used.

Added pycharm stuff in gitignore.

2ffcbc7

Added general parse method to output files.

bbc3e9e

Updated imports in tests.

d73b0f0

Added pymatgen and ase checks in respective files.

8ccf333

Refactoring with Outputs objects relying on Parsers. This allows flex…

3e58f13

…ibility regarding the number of output files for each type of computation, e.g. NEB.

✨🧪 Add missing qes-1.0.xsd schema and XML parsing tests

a72eb3f

The first schema used for the XML output of Quantum ESPRESSO `pw.x` was still missing. Here it is added, along with a set of tests for the XML parsing of default runs.

mbercx force-pushed the develop branch from 35a667f to 35a1cbc on November 21, 2024 13:10

🔧 Update dependencies; fix pre-commit & tests

de4f8c0

mbercx force-pushed the develop branch from f362f5d to de4f8c0 on November 21, 2024 13:54

📚 [WIP] Add some basic documentation and design notes

c31f570

mbercx mentioned this pull request Nov 21, 2024

✨ Add new PwParser class #79

Closed

gbrunin and others added 2 commits November 22, 2024 11:01

Use annotations from future for cleaner type hint. Moved qe_time_to_s…

90b3608

…ec to utils. Renamed BaseStdoutParser.parse_stdout_base to parse for compatibility with parent abstract class.

👌 Remove the executable input from the output classes

c737b43

This currently serves no real purpose, as the executable can be easily identified from the class itself. In case we do find a use case later, it can always be added again easily.
