Patch QE n_electrons #271

Merged
merged 10 commits into from
Jan 20, 2025
44 changes: 32 additions & 12 deletions electronicparsers/quantumespresso/parser.py
@@ -21,6 +21,7 @@
import re
from datetime import datetime
import os
from typing import Optional

from nomad.units import ureg
from nomad.parsing.file_parser.text_parser import TextParser, Quantity, DataTextParser
@@ -2037,8 +2038,18 @@ def str_to_sticks(val_in):
),
Quantity(
'number_of_electrons',
rf'number of electrons\s*=\s*({re_float})\s*(?:\(up:\s*({re_float})\s*,\s*down:\s*({re_float}))?',
Collaborator:

Can you send me the text that fails for this regex pattern? As far as I can see, it is generic enough for both cases. We should try to keep it in one quantity.

Contributor Author:

The pattern itself isn't failing. The schema only needs the first match. There was a bug where the parser passed along all three values, violating the shape. The new shape rules actually brought it to light, so we should maybe consider reprocessing QE calcs...

If for some God-forsaken reason the first number can't be extracted, it can still be reconstructed from the latter two.
It's very niche; I can remove that logic.

Collaborator:

Can you then simply put this logic in setting the value based on the length of the list?

Contributor Author:

I tried that first (it's in an older commit). The answer's no, since I don't know which value(s) are missing when the list has length 2 or 1. The only choice is whether to support this corruption recovery or not. Up to you.
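
The ambiguity raised in this comment can be illustrated with a small sketch. It assumes, as the comment implies, that the matched values arrive as a flat list with any missing entry simply dropped. `flatten` is a hypothetical helper written for this illustration and is not part of the parser.

```python
from typing import List, Optional


def flatten(
    total: Optional[float], up: Optional[float], down: Optional[float]
) -> List[float]:
    """Hypothetical helper: keep the order, drop whatever is missing."""
    return [value for value in (total, up, down) if value is not None]


# Two different kinds of corruption, same system (8 electrons, 5 up, 3 down).
missing_total = flatten(None, 5.0, 3.0)  # first element is the "up" count
missing_down = flatten(8.0, 5.0, None)   # first element is the total

# Both lists have length 2, so the length alone cannot say what index 0 means.
same_length = len(missing_total) == len(missing_down)
```

With keyed values (for example `{'up': 5.0, 'down': 3.0}`), the missing field is identifiable, which is what makes a fallback to up + down possible.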

Collaborator:

We can then make this a sub-parser to resolve the electron type. I think this is still a better way than introducing multiple quantities.
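
The sub-parser idea can be sketched with the standard `re` module alone: capture the whole line first, then resolve each value by name. This is only an illustration under stated assumptions. It does not use nomad's `TextParser`, `RE_FLOAT` is a simplified stand-in for the parser's `re_float`, the function names are hypothetical, and the sample lines are made up to resemble the output format.

```python
import re
from typing import Dict, Optional

# Simplified stand-in for the parser's re_float (assumption).
RE_FLOAT = r'[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?'

# Outer pattern: grab the whole line, mirroring the outer Quantity.
LINE = re.compile(r'(number of electrons\s*=[^\n]*)')

# Inner patterns: one named field each, mirroring the sub-parser quantities.
FIELDS = {
    'total': re.compile(rf'number of electrons\s*=\s*({RE_FLOAT})'),
    'up': re.compile(rf'up:\s*({RE_FLOAT})'),
    'down': re.compile(rf'down:\s*({RE_FLOAT})'),
}


def parse_n_electrons(text: str) -> Dict[str, float]:
    """Return whichever of total/up/down could be read from the line."""
    line_match = LINE.search(text)
    if line_match is None:
        return {}
    line = line_match.group(1)
    found: Dict[str, float] = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(line)
        if match is not None:
            found[name] = float(match.group(1))
    return found


def n_electrons_or_fallback(fields: Dict[str, float]) -> Optional[float]:
    """Prefer the total; otherwise reconstruct it from up + down."""
    if 'total' in fields:
        return fields['total']
    if 'up' in fields and 'down' in fields:
        return fields['up'] + fields['down']
    return None


complete = parse_n_electrons('number of electrons = 10.00 (up: 6.00, down: 4.00)')
corrupted = parse_n_electrons('number of electrons = (up: 6.00, down: 4.00)')
```

Unlike the truthiness checks in `get_n_electrons_safe`, this sketch tests for key presence, so a parsed value of zero would still count as found.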

dtype=float,
rf'(number of electrons\s*=[^\n]*)',
sub_parser=TextParser(
quantities=[
Quantity(
'total',
rf'number of electrons\s*=\s*({re_float})',
dtype=float,
),
Quantity('up', rf'up:\s*({re_float})', dtype=float),
Quantity('down', rf'down:\s*({re_float})', dtype=float),
],
),
),
Quantity(
'x_qe_number_of_states',
@@ -2790,6 +2801,19 @@ def __init__(self):
}
self._re_label = re.compile(r'([A-Z][a-z]?)')

def get_n_electrons_safe(self) -> Optional[float]:
n_electrons = self.out_parser.get('run', [])
if n_electrons:
n_electrons = n_electrons[0].get_header('number_of_electrons', {})
if total_n_electrons := n_electrons.get('total'):
return total_n_electrons
elif (up := n_electrons.get('up')) and (down := n_electrons.get('down')):
self.logger.warning(
'Number of electrons not found. Using spin up + down.'
)
return up + down
return None

def parse_scc(self, run, calculation):
sec_run = self.archive.run[-1]
initial_time = (
@@ -2909,8 +2933,11 @@ def parse_scc(self, run, calculation):
if np.array(fermi_energy).dtype == float:
sec_energy.fermi = fermi_energy * ureg.eV

n_electrons = run.get_header('number_of_electrons')
if homo is None and fermi_energy is None and n_electrons is None:
if (
homo is None
and fermi_energy is None
and self.get_n_electrons_safe() is None  # returns Optional[float]; len() would raise TypeError
):
self.logger.error('Reference energy is not defined')

for key in ['magnetization_total', 'magnetization_absolute']:
@@ -3489,14 +3516,7 @@ def parse_method(self, run):
if atom_sp[i] is not None:
setattr(sec_method_atom_kind, atom_species_names[i], atom_sp[i])

number_of_electrons = run.get_header('number_of_electrons')
if number_of_electrons is not None:
number_of_electrons = (
[number_of_electrons]
if isinstance(number_of_electrons, float)
else number_of_electrons
)
sec_method.electronic.n_electrons = number_of_electrons
sec_method.electronic.n_electrons = self.get_n_electrons_safe()

def init_parser(self):
self.out_parser.mainfile = self.filepath
2 changes: 1 addition & 1 deletion tests/test_quantumespressoparser.py
@@ -71,7 +71,7 @@ def test_scf(parser):
assert len(sec_method.dft.xc_functional.exchange) == 1
assert sec_method.x_qe_xc_igcc_name == 'pbc'
assert sec_method.dft.xc_functional.exchange[0].name == 'GGA_X_PBE'
assert sec_method.electronic.n_electrons[0] == 8
assert sec_method.electronic.n_electrons == 8
assert sec_method.electronic.n_spin_channels == 1
sec_atoms = sec_method.atom_parameters
assert len(sec_atoms) == 2