testRebotConnectionTimeouts is sometimes failing #176
Comments
Might be connected with #109, so please do both tickets together.
I fixed #109 and ran the tests in parallel on my machine with stress-ng in the background; no failures so far. I'd close this, and we can re-open it or create a new ticket if we see it again.
I just saw on Jenkins that the analysis job failed due to an error in one of the Rebot tests:
That's an unfortunate race condition in the shell script that I did not foresee and that, interestingly, didn't happen even with maxed-out CPUs.
Back in ChimeraTK-DeviceAccess-analysis Build 621: unknown location(0): fatal error in "testWriteTimeout": std::exception: Bad file descriptor
Looking at the Rebot client code: in very bad circumstances we might be sending the hello request to the server before the socket is actually connected. There is no check that the async_connect() has actually finished. But that again results in a ctk::runtime_error, not std::runtime_error.
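To make the missing check concrete, here is a minimal sketch of "do not send until the connect has completed". It uses plain POSIX sockets as a stand-in for the async_connect() call mentioned above, and it is not the Rebot client code. The helper names `connectAndWait` and `listenOnLoopback` are hypothetical and exist only for this illustration.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of the Rebot backend): start a non-blocking
// connect and hand out the descriptor only once the connection is established.
// Returns -1 on failure or timeout.
int connectAndWait(const sockaddr_in& addr, int timeoutMs) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if(fd < 0) return -1;

  int flags = ::fcntl(fd, F_GETFL, 0);
  if(flags < 0 || ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
    ::close(fd);
    return -1;
  }

  int rc = ::connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
  if(rc < 0 && errno != EINPROGRESS) {
    ::close(fd);
    return -1;
  }

  if(rc < 0) {
    // The connect is still in progress: wait until the socket becomes writable.
    pollfd p{fd, POLLOUT, 0};
    if(::poll(&p, 1, timeoutMs) <= 0) {
      ::close(fd);
      return -1;
    }
    // Writable does not yet mean connected; SO_ERROR holds the actual result.
    int err = 0;
    socklen_t len = sizeof(err);
    if(::getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
      ::close(fd);
      return -1;
    }
  }
  return fd; // only now is it safe to send the first request
}

// Hypothetical helper for trying the sketch: listen on an ephemeral loopback
// port and report the chosen address through addr. Returns -1 on failure.
int listenOnLoopback(sockaddr_in& addr) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if(fd < 0) return -1;
  addr = {};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0; // let the kernel pick a free port
  socklen_t len = sizeof(addr);
  if(::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 || ::listen(fd, 1) < 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}
```

With an asynchronous API the equivalent would be to issue the first write only from the connect completion handler, or after waiting for it, instead of right after starting the connect.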
The std::runtime_error points towards the dummy server, because problems in the client will result in ctk::runtime_error. EBADF is probably an attempt to access a not-yet-ready socket. The exception happens in the call to device->open(), since that is between the two checkpoints in lines 79 and 83. Maybe it is also related to the fact that this is the point where the server is started the second time in the test and there's some dangling initialization. I could not, however, really reproduce it: with load, without load, with or without ASAN, in Docker or outside Docker.
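For reference, EBADF is the errno value POSIX reports when a descriptor is not open. The sketch below provokes it deliberately by writing to an already closed descriptor and shows how such an error code can be wrapped in a std::system_error, which derives from std::runtime_error. Both function names are hypothetical, and this is only an illustration of the error class, not a reproduction of the failure seen on Jenkins.

```cpp
#include <cerrno>
#include <system_error>
#include <unistd.h>

// Hypothetical illustration: write to a descriptor that has already been
// closed and return the resulting errno value (0 if the write succeeded).
int writeToClosedDescriptor() {
  int fds[2];
  if(::pipe(fds) != 0) return -1;
  ::close(fds[0]);
  ::close(fds[1]);

  errno = 0;
  ssize_t n = ::write(fds[1], "x", 1); // fds[1] no longer refers to an open file
  return n < 0 ? errno : 0;
}

// Wrap an errno value in a standard exception. Its what() text comes from the
// platform's error message table and is therefore platform dependent.
std::system_error asException(int err) {
  return std::system_error(err, std::generic_category());
}
```

If the dummy server closes or resets its socket while another part of the test still uses the old descriptor, an error of this class would be one plausible outcome, but that is an assumption and not something the thread confirms.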
Under high system load, testRebotConnectionTimeouts fails on Jenkins, especially for release builds.
My previous suspicion that "std::exception: Bad file descriptor" is coming from Jenkins being under too much load and causing random tests to fail seems to be wrong. This exception is coming from the Rebot backend and is triggered by a race condition in the test.