You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As discussed in e.g. these links [1,2] there are some regex that, with the default python (or perl, ...) implementation, can become exponentially slow with some inputs.
To reproduce, one can e.g. run the following, where cell_parameters_block_re is the one defined in qeinputparser.py:
cell_string="""CELL_PARAMETERS angstrom1.000000000 0.00000000 0.0000000000.000000000 1.00000000 0.0000000000.000000000 0.00000000 1.000000000"""printcell_parameters_block_re.match(cell_string) # does not end after >60secondsprintcell_parameters_block_re.match(cell_string+"\n") # ends instantaneously
Already using a Thompson NFA implementation improves the situation: e.g. I tried this from this BitBucket repo with pip install regex and did a import regex as re as a drop-in replacement for re when generating cell_parameters_block_re can alleviate the problem, bringing the time down to ~3sec.
This can be done, adding a dependency, still it's not perfect and probably our regex can be improved.
At the moment, I suggest to always append a newline at the end of the input for safety, and if we see that this is not enough, we open this issue again and replace re with regex (and/or improve the regex).
# No newline in this stringcell_string="""CELL_PARAMETERS angstrom1.000000000 0.00000000 0.0000000000.000000000 1.00000000 0.0000000000.000000000 0.00000000 1.000000000"""fromqe_tools.parsers.qeinputparserimportparse_cell_parametersprint(parse_cell_parameters(cell_string+'\n'))
At the moment, we added a 'workaround' that solves the problem in a known case.
Please report here any further occurrences of the problem - if we have no news in the next few months we can close this issue.
As discussed in e.g. these links [1,2] there are some regex that, with the default python (or perl, ...) implementation, can become exponentially slow with some inputs.
To reproduce, one can e.g. run the following, where
cell_parameters_block_re
is the one defined inqeinputparser.py
:Already using a Thompson NFA implementation improves the situation: e.g. I tried this from this BitBucket repo with
pip install regex
and did aimport regex as re
as a drop-in replacement forre
when generatingcell_parameters_block_re
can alleviate the problem, bringing the time down to ~3sec.This can be done, adding a dependency, still it's not perfect and probably our regex can be improved.
At the moment, I suggest to always append a newline at the end of the input for safety, and if we see that this is not enough, we open this issue again and replace
re
withregex
(and/or improve the regex).[1] https://stackoverflow.com/questions/25982466/why-does-this-take-so-long-to-match-is-it-a-bug
[2] https://swtch.com/~rsc/regexp/regexp1.html
Mentioning @lekah
The text was updated successfully, but these errors were encountered: