Revert quotes in input files that can foil Boost prog opts #644
Partially addresses libqueso#642 by partially reverting "Make boost po opt-in, convert all test input files to getpot format".
This reverts the input files modified in commit 00477bd.
Conflicts: configure.ac
Hmm, Travis reports that test_uqGaussianVectorRVClass fails, which for me is a stochastic failure I haven't seen in a while. Is this expected?
Well, it appears the seed for that test is taken from /dev/random, so I guess it's unlikely that two runs would be identical. But doesn't the stochastic test have a tolerance such that it should pass cross-platform?
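The point about the /dev/random seed can be illustrated with a small Python stand-in (not QUESO's own RNG or API): a generator seeded with a fixed value reproduces its draws exactly, while one seeded from OS entropy does not. The helper names `draw_samples` and `entropy_seed` are hypothetical, and the reproducibility shown here holds for Python's own Mersenne Twister only; whether a C++ or GSL generator matches across platforms is a separate question.

```python
import os
import random


def draw_samples(seed, n=5):
    """Draw n standard-normal samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def entropy_seed():
    """Build a seed from OS entropy, analogous to reading /dev/random."""
    return int.from_bytes(os.urandom(8), "little")


# A fixed seed reproduces exactly the same draws on every run.
fixed_a = draw_samples(1234)
fixed_b = draw_samples(1234)

# Entropy-based seeds (almost certainly) differ, so the draws differ too.
random_a = draw_samples(entropy_seed())
random_b = draw_samples(entropy_seed())

print(fixed_a == fixed_b)  # True
```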
Sadly, there is no such thing as "a tolerance such that it should pass cross-platform", only "a tolerance such that it should pass cross-platform N% of the time". Hence the pass with clang and failure with gcc this time. I don't know offhand how to "kick" a Travis check other than by doing an interactive rebase, minor commit message change, and force push. You can try that if you like. I'll build and test this branch on a couple of my own machines, and if it passes on both then I'll merge.
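To make "passes N% of the time" concrete, here is a minimal sketch assuming a hypothetical test that draws `n` Gaussian samples and passes when the sample mean lies within `tol` of zero. This is not the actual check in test_uqGaussianVectorRVClass; it only shows that any fixed tolerance corresponds to a pass probability strictly below 1.

```python
import math
import random


def pass_probability(tol, sigma, n):
    """Exact probability that |mean of n N(0, sigma^2) draws| < tol."""
    # The sample mean is N(0, sigma^2 / n), so standardize and use erf.
    z = tol * math.sqrt(n) / sigma
    return math.erf(z / math.sqrt(2.0))


def simulated_pass_rate(tol, sigma, n, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        mean = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(mean) < tol:
            passes += 1
    return passes / trials


n, sigma = 100, 1.0
tol = 1.96 * sigma / math.sqrt(n)  # tolerance of 1.96 standard errors
print(round(pass_probability(tol, sigma, n), 3))  # 0.95
print(simulated_pass_rate(tol, sigma, n, trials=5000))
```

With the tolerance set at 1.96 standard errors, such a test fails about one run in twenty, and requiring it to pass on two independent builds (say gcc and clang) drops the joint pass rate to roughly 0.95² ≈ 0.90.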
This worked to restart it, once I signed into Travis using my GitHub account: https://stackoverflow.com/questions/17606874/trigger-a-travis-ci-rebuild-without-pushing-a-commit
There might actually be something seriously wrong here. The failure seems to be outright deterministic for me on one system.
They mostly match in the diff, but the latter file has a thousand samples which the former file doesn't. What's especially weird is that they have TEN TIMES as many samples as the same runs on a different system, and neither matches what I'd have expected to see: doesn't the input file ask for an ip_mh_rawChain_size of 2e4, not 5e3 or 5e4??
Yes, I saw that behavior as well, though I think I saw it with test_intercomm0, not test_uqGaussianVectorRVClass. The latter passes in all my local builds. I also thought something was actually wrong, given that it consistently PASSed with GetPot and FAILed with BPO. However, I stopped digging when you told me it was an inconsistent regression test. I'm probably not qualified to root-cause it, but I can attempt to dig back in if needed, especially given some pointers on where to look.
Codecov Report
@@ Coverage Diff @@
## dev #644 +/- ##
==========================================
- Coverage 74.87% 74.84% -0.03%
==========================================
Files 312 312
Lines 23789 23789
==========================================
- Hits 17812 17805 -7
- Misses 5977 5984 +7
Sorry, I didn't mean to cause any confusion; it actually is test_intercomm0 that's failing (on one system) for me. test_uqGaussianVectorRVClass is working fine with every test configuration I throw at it.
Not sure what your team's development practices are w.r.t. this, but it seems that test_intercomm0 with BPO is a pre-existing instability on dev that is not affected by this PR, so we could merge it. Would you rather track and resolve it in this issue #642 / PR #644, #640, or should I create a new issue? I know it's a pain to keep track of second-order failures like "this PR doesn't make the existing failures any worse," so I'm happy to handle it however you like.
Yeah, test_intercomm0 is a failure on dev for me too. (Although again, only on one of the systems I tried...) I'll try bisecting it (although I swear it was working for me on the gpmsa_new_functional branch, which is based on dev!) and swapping around MPI stacks and see what I can figure out, but IMHO there's no reason it should hold up this PR. |
No, test_intercomm0 should pass every time. |
Chain size is 20000 and the filtered chain lag is 20, and 20000 / 20 = 1000.
QUESO appends output to output files rather than truncating them. I suspect this is clearly broken
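The sample-count arithmetic and the append behaviour can be checked with a short sketch. It assumes the filtered chain simply keeps every 20th raw sample, and uses a hypothetical output file `filtered_chain.txt` rather than QUESO's real output names; Python's open modes `"a"` and `"w"` stand in for appending versus truncating.

```python
import os
import tempfile

RAW_CHAIN_SIZE = 20000  # ip_mh_rawChain_size from the input file
FILTER_LAG = 20         # filtered-chain lag mentioned in the thread


def filtered_size(raw_size, lag):
    """Number of samples kept when keeping every `lag`-th raw sample."""
    return raw_size // lag


def write_chain(path, n_samples, mode):
    """Write one sample per line; mode 'a' appends, 'w' truncates."""
    with open(path, mode) as f:
        for i in range(n_samples):
            f.write(f"{i}\n")


def count_lines(path):
    with open(path) as f:
        return sum(1 for _ in f)


n = filtered_size(RAW_CHAIN_SIZE, FILTER_LAG)
print(n)  # 1000

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "filtered_chain.txt")  # hypothetical file name
    for _ in range(2):             # two runs writing the same output file
        write_chain(path, n, "a")
    appended = count_lines(path)   # stale samples from run 1 remain
    for _ in range(2):
        write_chain(path, n, "w")
    truncated = count_lines(path)  # only the last run survives

print(appended, truncated)  # 2000 1000
```

If each rerun appends, a file that should hold 1000 filtered samples grows by 1000 per run, which would produce sample counts that are multiples of the expected size.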
Oops, this is the URL I meant to paste in my previous comment: https://github.com/libqueso/queso/blob/dev/test/test_intercomm0/test_intercomm0_gravity_run.sh#L6
Disregard above. |
(Merging this will leave one failure in test_intercomm0, at least on my system, when BPO is enabled. My understanding, subject to stray cosmic rays, is that this test is expected to be stochastic.)