-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use real-time sockets in federated execution #274
Conversation
9e8bbf6
to
d59b205
Compare
We should also investigate whether delayed ACKs can be problematic for us: https://en.wikipedia.org/wiki/Nagle%27s_algorithm#Interaction_with_delayed_ACK |
The |
Using the debugger, I just found a serious bug in federate.c where a local variable "result" was shadowed. I've pushed a fix to this branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks great to me. Let's see if my bug fix enables the tests to pass.
I think the bug was in Erling's change, not the original code, right? |
Yes, the bug was not in the original code (I initially thought it might have been, but nothing would have worked). |
…com:lf-lang/reactor-c into real-time-sockets
After Christian's fix of the error reporting system, I will now investigate this test failure and fix it so we can merge it soon. |
Co-authored-by: Marten Lohstroh <[email protected]>
@lhstrh, I got a test failure on a decentralized federated program on macOS. No debug output. Unable to reproduce in Ubuntu. Went away when re-running the tests. I have taken a close look at this PR and only reason I can come up with is either:
|
My bet would be on there being something fishy with the coordination... I don't think we can blame CI for this one (nor the changes in this PR). |
Then I will merge this and we will have to look at federated race conditions at another time |
Got a bug report from @lhstrh which I was able to reproduce in the following trivial federated program:
I observed a strange ~40msec latency between Src and Sink. I tracked it down be a communication latency between Src and the RTI. After alot of digging I have found out that there is something called Nagle algorithm, enabled by default, which bundles short TCP messages together to avoid too much network traffic. So it was delaying our messages. In this draft I have just added socket options for disabling with
TCP_NODELAY
. It is still a draft because we can discuss which sockets this should apply for, maybe all? After we decide, lets put creation of sockets innet_util.c
so we dont litter federate.c with these low-level socket option details.@lhstrh this PR should fix your issue. You should recompile and reinstall the RTI also based on this PR.