Feature Requested
Currently the Vasprun parser crashes (with ParseError) if the file is incomplete. Setting exception_on_bad_xml=False avoids the crash, but then most of the information is not parsed, even though it is still present in the file. This occurs fairly often when large VASP calculations complete the SCF cycles but then crash at the last moment while writing long eigenvalue outputs or wavefunctions, e.g. due to memory or disk-space limits.
It would be very useful if this information could still be extracted from the vasprun file, particularly for large calculations where re-running the whole calculation just to get a properly formatted output is quite inefficient (e.g. hybrid+SOC single-shot calculations on a large supercell which crashed at the last moment, with no wavefunction output).
Proposed Solution
This should be achievable relatively easily by handling the XML elements that were left unclosed when the file was cut off.
As a rough demonstration of one possible approach, this code can be used to determine the current tag stack:
    import xml.etree.ElementTree as ET


    def validate_tags(file_path):
        """Print the stack of tags that are still open when parsing stops."""
        tag_stack = []
        try:
            with open(file_path, "r") as file:
                for event, elem in ET.iterparse(file, events=("start", "end")):
                    if event == "start":
                        tag_stack.append(elem.tag)
                    elif event == "end":
                        if tag_stack and tag_stack[-1] == elem.tag:
                            tag_stack.pop()
                        else:
                            print(f"Mismatched tag found: {elem.tag}")
                            break
        except ET.ParseError as e:
            # guard against an empty stack so the error report itself cannot fail
            last_open = tag_stack[-1] if tag_stack else None
            print(f"Parse error: {e}. Missing closing tag for {last_open} if stack is not empty.")

        if tag_stack:
            print(f"Current tag stack: {tag_stack}")
which in the example partially-complete vasprun.xml I've provided gives:
Parse error: no element found: line 9455, column 0. Missing closing tag for set if stack is not empty.
Current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']
If I then append the matching closing tags (in reverse order of the stack) to a copy of the loaded file object, parsing can proceed without issue, loading all the information available in the (incomplete) vasprun.xml:
    import xml.etree.ElementTree as ET

    # open file and append closing tags for any missing ones:
    # current tag stack: ['modeling', 'calculation', 'eigenvalues', 'array', 'set', 'set', 'set']
    file_path = "vasprun.xml"
    ionic_steps = []
    with open(file_path, "a+") as file:
        # TODO: This should be a temp file copy, so as not to modify file on system
        # append closing tags for any missing ones:
        file.writelines([
            "</set>\n</set>\n</set>\n</array>\n</eigenvalues>\n</calculation>\n</modeling>"
        ])
        file.seek(0)  # move the file pointer back to the beginning to read content
        for event, elem in ET.iterparse(file, events=["start", "end"]):
            tag = elem.tag
            if tag == "calculation":
                parsed_header = True
                ionic_steps.append(_parse_ionic_step(vr, elem))
            ...
There are presumably far smarter ways of doing this; this is just a rough example showing how it could be achieved.
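As one way of addressing the TODO above (not touching the file on disk), here is a minimal sketch that builds the repaired XML in memory instead of appending to the original file. The helper names `open_tags` and `close_open_tags` are hypothetical and not part of pymatgen, and the sketch assumes the file was cut off between tags or inside element text, not in the middle of a tag such as `<r`.

```python
import io
import xml.etree.ElementTree as ET


def open_tags(xml_bytes):
    """Return the stack of elements still open where xml_bytes stops."""
    stack = []
    try:
        for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
            if event == "start":
                stack.append(elem.tag)
            else:
                # the parser only emits "end" for the innermost open element
                stack.pop()
    except ET.ParseError:
        # assumed to be a truncation error: whatever is left on the
        # stack was opened but never closed
        pass
    return stack


def close_open_tags(xml_bytes):
    """Return a repaired copy of xml_bytes; the input is left unchanged."""
    closing = "".join(f"</{tag}>\n" for tag in reversed(open_tags(xml_bytes)))
    return xml_bytes + b"\n" + closing.encode()


# Hypothetical usage on a truncated file:
# with open("vasprun.xml", "rb") as f:
#     repaired = close_open_tags(f.read())
# for event, elem in ET.iterparse(io.BytesIO(repaired), events=("start", "end")):
#     ...
```

Note that if the file ends in the middle of a row, the last row of the repaired XML will hold a partial (possibly mis-truncated) value, so the final array still needs to be validated before use.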
This could be enabled only when the user sets exception_on_bad_xml=False, which already emits a warning if the vasprun.xml is incomplete.
Example vasprun.xml where this is desirable: vasprun.xml.gz
Relevant Information
A truncated file often contains truncated arrays (e.g. of the eigenvalues/DOS), so this might require a quick check that all arrays have the expected size, dropping any that do not?
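A rough sketch of what such a size check could look like on the repaired XML tree. It assumes the eigenvalue data is stored as nested `<set>` blocks (identified by a `comment` attribute) containing `<r>` rows of whitespace-separated numbers; `drop_incomplete_sets` is a hypothetical helper, and the expected row and column counts (e.g. number of bands, and two values per band) would have to come from elsewhere in the file.

```python
import xml.etree.ElementTree as ET


def drop_incomplete_sets(parent, n_rows, n_cols):
    """Remove child <set> blocks of parent that lack n_rows rows of n_cols values.

    Returns the comment attributes of the removed blocks.
    """
    dropped = []
    for block in list(parent.findall("set")):
        rows = [(r.text or "").split() for r in block.findall("r")]
        if len(rows) != n_rows or any(len(row) != n_cols for row in rows):
            dropped.append(block.get("comment"))
            parent.remove(block)
    return dropped


# Example: the second k-point block was cut off partway through its last row.
spin = ET.fromstring(
    '<set comment="spin 1">'
    '<set comment="kpoint 1"><r> -1.0 1.0 </r><r> 2.0 0.0 </r></set>'
    '<set comment="kpoint 2"><r> -1.1 1.0 </r><r> 2.</r></set>'
    "</set>"
)
print(drop_incomplete_sets(spin, n_rows=2, n_cols=2))
```

A shape check like this cannot detect a row whose final number was cut off mid-digit while the column count stayed intact, so it may be safer to always discard the last row or block of a truncated array.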