Arrow hangs wcEcoli complexation with certain sim seeds #48

Closed
1fish2 opened this issue Dec 8, 2021 · 6 comments · Fixed by #49
Labels
bug Something isn't working

Comments

@1fish2
Contributor

1fish2 commented Dec 8, 2021

Some wcEcoli sims can hang during complexation.

See CovertLab/wcEcoli#1229 including @tahorst's boiled down test case.

@1fish2 added the bug label Dec 8, 2021
@prismofeverything
Member

Hey @1fish2! I can't see the test case or the linked issue (private repo, ha), but this has come up before, during flagellar complexation if I recall. Gillespie is prone to explode under certain conditions if the exponent term in the choice calculation is too large. The solution is to find the offending reaction and decompose its stoichiometry into an equivalent problem with more steps. (I think the flagella had something like 170 identical subunits, which is what caused the problem; breaking it into two or more equivalent reactions fixed it.)

Beyond that, adding something to actually catch this error when or before it happens would be helpful. I thought we did that at some point, but maybe not; it's been a while.
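
To give a sense of how large that term can get, here is a minimal sketch. It assumes the propensity includes a binomial coefficient "count choose stoichiometry" for each reactant, which is an assumption about the calculation and not confirmed Arrow code. `fits_int64` is a hypothetical helper and the counts are made up for illustration.

```python
from math import comb

INT64_MAX = 2**63 - 1

def fits_int64(count, stoichiometry):
    """Return True if C(count, stoichiometry) fits in a signed 64-bit integer.

    Hypothetical pre-flight check; not part of Arrow's API.
    """
    return comb(count, stoichiometry) <= INT64_MAX

# One reaction consuming 170 identical subunits from a pool of 200 (made-up numbers).
single_step = fits_int64(200, 170)

# The same subunits consumed a few at a time across more reactions.
small_step = fits_int64(200, 5)
```

Under these assumptions the single 170-subunit step does not fit in 64 bits while the 5-subunit step does. A check of this kind could run over every reaction before the simulation starts, which is one way to catch the problem before it happens.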

1fish2 added a commit that referenced this issue Dec 9, 2021
This is @tahorst's test case to reproduce an Arrow hang.

Is it caused by a Gillespie algorithm blowup? By integer overflow? Something else?

**Note:** `make clean compile` prints an unexpected warning
```
building 'arrow.arrowhead' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include
```

**Note:** `test_flagella` also fails: `arrow/arrow.py:176: SimulationFailure`.
@1fish2
Contributor Author

1fish2 commented Dec 9, 2021

I copied @tahorst's test case into this repo, making it a minimal unit test.

This needs debugging. The cause might be the Gillespie algorithm blowup.

@1fish2
Contributor Author

1fish2 commented Dec 9, 2021

Note: make clean compile prints an unexpected warning

building 'arrow.arrowhead' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include

Note: test_flagella also fails: arrow/arrow.py:176: SimulationFailure.

@tahorst
Member

tahorst commented Dec 9, 2021

I have doubts this is due to overflow (or at least the overflow issue identified in #39), since the symptoms are different from what happened there. Also, in wcEcoli we run arrow twice (once on all possible molecules and a second time on a reduced set), and this issue happens with the reduced set after the larger set has already completed successfully, so propensities should be the same as or smaller than in the first run. From #39, it seems the overflow would happen in the propensity calculations, which should be the same regardless of the random seed for arrow, but this issue only pops up with certain random states. Or is it possible the overflow is seed dependent?

If it is overflow and/or negative counts, then maybe the algorithm keeps selecting the negative counts and drives them even more negative in an infinite loop that would slowly eat up memory as more events are recorded.
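
One way to surface that failure mode early, instead of looping, is to check the counts after every event. This is a minimal sketch, assuming counts and stoichiometry are plain lists of integers; `apply_reaction` is a hypothetical helper, not part of Arrow's API.

```python
def apply_reaction(counts, stoichiometry):
    """Return the updated counts, raising instead of letting any count go negative.

    Hypothetical guard for illustration only.
    """
    updated = [c + s for c, s in zip(counts, stoichiometry)]
    negative = [i for i, c in enumerate(updated) if c < 0]
    if negative:
        raise ValueError(f"reaction drives molecule(s) {negative} negative: {updated}")
    return updated

# Consumes two of molecule 0 and produces one of molecule 1.
counts = apply_reaction([2, 0], [-2, 1])

try:
    # No reactants are left, so a second application is rejected.
    apply_reaction(counts, [-2, 1])
    rejected = False
except ValueError:
    rejected = True
```

Failing fast like this turns a silent hang into an error that names the offending molecules.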

@prismofeverything
Member

Ah yeah, negative counts are the other failure mode. I thought we dealt with this before, but maybe there is still some lurking corner case getting triggered at this point. I've been able to debug these issues with simple print statements in the C before, but you may have more success with gdb in this case (still the GOAT): https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf

@1fish2
Contributor Author

1fish2 commented Dec 22, 2021

As Travis found, the bug occurred when the random value `point == 0`: the loop would then select the first reaction even if its propensity was 0.

PR #49 includes a unit test, the bug fix, and additional robustness checks.
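
The failure described above can be sketched in Python. This is a reconstruction from the description only, not the C code in Arrow, and the actual change in PR #49 may differ; `choose_buggy` and `choose_fixed` are hypothetical names. It assumes `point` is a uniform draw scaled by the total propensity, so it lies in [0, total).

```python
def choose_buggy(propensities, point):
    """Pick the first reaction whose cumulative propensity reaches `point`."""
    progress = 0.0
    for index, propensity in enumerate(propensities):
        progress += propensity
        # True immediately when point == 0, even if this propensity is 0.
        if progress >= point:
            return index
    return len(propensities) - 1

def choose_fixed(propensities, point):
    """Same walk, but a zero-propensity reaction can never be selected."""
    progress = 0.0
    last_possible = None
    for index, propensity in enumerate(propensities):
        if propensity <= 0.0:
            continue  # an impossible reaction is skipped outright
        progress += propensity
        last_possible = index
        if progress > point:  # strict comparison
            return index
    if last_possible is None:
        raise ValueError("no reaction has a positive propensity")
    # Fallback for rounding when point lands at or beyond the total.
    return last_possible

propensities = [0.0, 2.0, 1.0]  # the first reaction cannot fire
buggy_pick = choose_buggy(propensities, 0.0)
fixed_pick = choose_fixed(propensities, 0.0)
```

With `point == 0`, the buggy walk picks reaction 0 despite its zero propensity, while the fixed walk picks reaction 1. Because the draw depends on the random state, this also fits the observation that only certain seeds triggered the hang.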
