More Flexible `Vasprun` Parsing #4075

kavanase · 2024-09-20T19:47:53Z

Feature Requested

Currently the `Vasprun` parser crashes with a `ParseError` if the vasprun.xml file is incomplete. Setting `exception_on_bad_xml=False` avoids the crash, but then most of the information is not parsed, even though it is still present in the file. This happens fairly often when large VASP calculations complete their SCF cycles but then crash at the last moment while writing long eigenvalue outputs or wavefunctions, due to memory or disk-space limits, etc.
It would be very useful if this information could still be pulled from the vasprun.xml file, particularly for large calculations where re-running the whole calculation just to get a properly-formatted output is quite inefficient (e.g. hybrid+SOC single-shot calculations on a large supercell which crashed at the last moment, with no wavefunction output).

Proposed Solution

This functionality should be achievable relatively easily by handling the XML elements that were left incomplete (opened but never closed).

As a rough demonstration of one possible approach, this code can be used to determine the current tag stack (the elements still open when parsing fails):

```python
import xml.etree.ElementTree as ET


def validate_tags(file_path):
    """Print (and return) the stack of tags still open when parsing stops."""
    tag_stack = []
    try:
        with open(file_path, "rb") as file:
            for event, elem in ET.iterparse(file, events=("start", "end")):
                if event == "start":
                    tag_stack.append(elem.tag)
                elif event == "end":
                    if tag_stack and tag_stack[-1] == elem.tag:
                        tag_stack.pop()
                    else:
                        print(f"Mismatched tag found: {elem.tag}")
                        break

    except ET.ParseError as e:
        # guard against an empty stack (e.g. an empty file) before indexing it
        last_open = tag_stack[-1] if tag_stack else None
        print(f"Parse error: {e}. Missing closing tag for {last_open} if stack is not empty.")
        if tag_stack:
            print(f"Current tag stack: {tag_stack}")
    return tag_stack
```

which, for the example partially-complete vasprun.xml I've provided, gives:

```
Parse error: no element found: line 9455, column 0. Missing closing tag for set if stack is not empty.
Current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']
```
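Since elements have to be closed innermost-first, the closing tags to append are just this stack read in reverse. A minimal sketch (`closing_tags` is an illustrative helper name, not part of pymatgen):

```python
def closing_tags(tag_stack):
    """Build the closing tags needed to terminate every still-open element.

    XML elements must be closed innermost-first, i.e. in reverse stack order.
    """
    return "".join(f"</{tag}>\n" for tag in reversed(tag_stack))


stack = ["modeling", "calculation", "eigenvalues", "array", "set", "set", "set"]
print(closing_tags(stack), end="")
```

For the stack above this produces the same seven closing tags that are written out by hand in the next snippet, so they would not need to be typed manually for each truncated file.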

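If I then append the matching closing tags (innermost first) to the file, parsing can proceed without issue, loading all the information available in the (incomplete) vasprun.xml. Ideally this would be done on a temporary copy rather than the original file, as noted in the TODO: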

```python
import xml.etree.ElementTree as ET

# open file and append closing tags for any missing ones:
# current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']
file_path = "vasprun.xml"
ionic_steps = []
with open(file_path, "a+") as file:
    # TODO: This should be a temp file copy, so as not to modify file on system
    # append closing tags for any missing ones (innermost first):
    file.write(
        "</set>\n</set>\n</set>\n</array>\n</eigenvalues>\n</calculation>\n</modeling>"
    )

    file.seek(0)  # move the file pointer back to the beginning to read content

    # loop adapted from pymatgen's Vasprun parsing; `vr` (a Vasprun instance) and
    # `_parse_ionic_step` come from pymatgen and are not defined here
    for event, elem in ET.iterparse(file, events=["start", "end"]):
        tag = elem.tag
        if event == "start" and tag == "calculation":
            parsed_header = True
        elif event == "end" and tag == "calculation":
            # only parse the ionic step once the element (and its children) is complete
            ionic_steps.append(_parse_ionic_step(vr, elem))
    ...
```
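To address the TODO without touching the file on disk, one option is to do the repair in memory. The sketch below is hypothetical (`open_tags` and `repair_truncated_xml` are not pymatgen API) and assumes the XML uses no namespaces and that `>` only appears as a tag delimiter, so anything after the last `>` can be dropped as a half-written tag or value:

```python
import io
import xml.etree.ElementTree as ET


def open_tags(data: bytes) -> list:
    """Return the stack of tags still open where the (possibly truncated) XML ends."""
    stack = []
    try:
        for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
            if event == "start":
                stack.append(elem.tag)
            else:
                stack.pop()  # iterparse only emits properly nested end events
    except ET.ParseError:
        pass  # truncated input: whatever is left on the stack was never closed
    return stack


def repair_truncated_xml(data: bytes) -> bytes:
    """Return a copy of `data` with closing tags appended for every open element.

    Assumptions: no XML namespaces (tags are written back verbatim), and '>' only
    appears as a tag delimiter, so text after the last '>' is a partial tag or value.
    """
    data = data[: data.rfind(b">") + 1]  # drop any half-written trailing tag/value
    closing = "".join(f"</{tag}>" for tag in reversed(open_tags(data)))
    return data + closing.encode()


truncated = b"<modeling><calculation><eigenvalues><set><r>1 2</r><r>3 4</r><r>5"
root = ET.fromstring(repair_truncated_xml(truncated))
print([r.text for r in root.iter("r")])
```

The repaired bytes can then be passed to `ET.iterparse(io.BytesIO(repaired), ...)` in place of the file object, so the original vasprun.xml is never modified. The trade-off is that the whole file is held in memory, which may matter for multi-GB outputs; note also that the last row here comes back empty, so array sizes still need checking.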

There are presumably far smarter ways of doing this; this is just a rough example showing how it could be achieved.
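One simpler but lossier variant skips the repair entirely and keeps only elements that were fully closed before the truncation point, since `iterparse` yields their end events before raising. `completed_elements` is a hypothetical helper for illustration, not pymatgen API:

```python
import io
import xml.etree.ElementTree as ET


def completed_elements(stream, tag):
    """Yield each fully-closed <tag> element, stopping quietly if the XML is truncated.

    Elements still open when the input ends (e.g. a half-written final
    <calculation>) are never yielded.
    """
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                yield elem
    except ET.ParseError:
        return  # truncated input: keep whatever was yielded before this point


truncated = (
    b"<modeling>"
    b"<calculation><energy>-1.0</energy></calculation>"
    b"<calculation><energy>-2.0</energy></calculation>"
    b"<calculation><energy>-3."
)
steps = list(completed_elements(io.BytesIO(truncated), "calculation"))
print([step.findtext("energy") for step in steps])
```

The cost is that the partially-written final `calculation` is discarded entirely, and in the crash scenario described above that is usually the step of interest, which is why appending closing tags recovers more.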

This behaviour could be enabled only when the user sets `exception_on_bad_xml=False`, which already emits a warning if the vasprun.xml is incomplete.

Example vasprun.xml where this is desirable:
vasprun.xml.gz

Relevant Information

Truncated output often leaves arrays (e.g. the eigenvalues/DOS) incomplete, so it might require a quick check that all arrays have the expected size, dropping any that do not?
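A minimal sketch of such a size check, assuming the usual vasprun.xml `<array>` layout (`<field>` entries naming the columns, nested `<set>` blocks, `<r>` rows) and that the expected counts are known from elsewhere (e.g. NBANDS and the numbers of k-points and spins for the eigenvalues). `array_is_complete` is a hypothetical helper, not pymatgen API:

```python
import xml.etree.ElementTree as ET


def array_is_complete(array_elem, n_sets_expected, n_rows_expected):
    """Return True if an <array> block looks fully written.

    Checks that there are `n_sets_expected` innermost <set> blocks (e.g. spins x k-points),
    each holding `n_rows_expected` <r> rows (e.g. NBANDS), and that every row has one
    value per <field>. The expected counts must be supplied by the caller.
    """
    n_cols = len(array_elem.findall("field"))
    # innermost sets are those with no nested <set> children
    innermost = [s for s in array_elem.iter("set") if s.find("set") is None]
    if len(innermost) != n_sets_expected:
        return False
    for s in innermost:
        rows = s.findall("r")
        if len(rows) != n_rows_expected:
            return False
        if any(len((r.text or "").split()) != n_cols for r in rows):
            return False
    return True


eigenvalues = ET.fromstring(
    "<array><field>eigene</field><field>occ</field>"
    "<set><set comment='spin 1'>"
    "<set comment='kpoint 1'><r>-1.0 1.0</r><r>2.0 0.0</r></set>"
    "<set comment='kpoint 2'><r>-1.1 1.0</r><r>2.1</r></set>"
    "</set></set></array>"
)
# kpoint 2's last row was cut off mid-line, so this array should be dropped
print(array_is_complete(eigenvalues, n_sets_expected=2, n_rows_expected=2))
```

Counting the innermost sets as well as the rows matters because truncation can fall exactly between two `<set>` blocks, leaving every surviving block internally complete.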


kavanase added the `enhancement` label (A new feature or improvement to an existing one) on Sep 20, 2024
