mitigate delays from rospy long-running callbacks (#1901) #11

@c-andy-martin c-andy-martin commented Jul 9, 2021

Long-running callbacks in rospy can cause extreme amounts of
buffering resulting in unnecessary delay, essentially ignoring the
queue_size setting. This can already be somewhat mitigated by setting
buff_size to be larger than the amount of data that could be buffered
by a long-running callback. However, setting buff_size to a correct
value is not possible for the user of the API if the amount of time
in the callback or the amount of data that would be transmitted is
unknown.
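
To illustrate why a correct buff_size is hard to pick, here is a rough
sketch of the arithmetic involved. The function name, its parameters, and
the `headroom` factor are hypothetical and not part of rospy; the point is
only that the estimate needs the message size, publish rate, and callback
runtime, which the subscriber often does not know.

```python
import math


def estimate_buff_size(msg_bytes, publish_hz, callback_seconds, headroom=2.0):
    """Rough lower bound (in bytes) for buff_size.

    Assumption: one recv() should be able to hold every message that is
    published while a single callback is running, plus some headroom.
    """
    backlog_msgs = math.ceil(publish_hz * callback_seconds)
    return int(backlog_msgs * msg_bytes * headroom)


# Example inputs (made up): 1 MB messages at 30 Hz with a 0.5 s callback.
print(estimate_buff_size(1_000_000, 30, 0.5))
```

If any of the three inputs is unknown or varies at runtime, no fixed
buff_size is safe, which is the case this change targets.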

Greatly mitigate the delays in such cases by altering the structure
of the receive logic. Instead of recv()ing up to buff_size data, then
calling the callbacks on every message received, interleave calling
recv() between each callback, enforcing queue_size as we go. Also,
drain all data currently available on each receive by following the
blocking recv() with a non-blocking recv(). While it is still
possible to have stale data, even with a queue_size of 1, it is less
likely, especially if the publisher of the data is on the same host.
Even if not, the staleness of the data with a queue_size of 1 is now
bounded by the runtime of the callback instead of by buff_size.
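
The restructured receive logic described above can be sketched roughly as
follows. This is a simplified illustration, not the actual rospy code:
`recv_available`, `receive_loop`, `demo`, and the fixed `MSG_SIZE` framing
are all assumptions made to keep the example self-contained.

```python
import socket
from collections import deque

MSG_SIZE = 4  # assumption: fixed-size messages keep the framing trivial


def recv_available(sock, buff_size, block):
    """Return (data, closed) with all bytes that are readable right now.

    If block is True, wait for the first chunk with a blocking recv(),
    then keep reading non-blocking until the socket is empty.
    """
    chunks = []
    closed = False
    if block:
        first = sock.recv(buff_size)          # blocking recv()
        if not first:
            return b"", True                  # peer closed the connection
        chunks.append(first)
    sock.setblocking(False)
    try:
        while True:                           # non-blocking recv() until drained
            chunk = sock.recv(buff_size)
            if not chunk:
                closed = True
                break
            chunks.append(chunk)
    except BlockingIOError:
        pass                                  # nothing more to read right now
    finally:
        sock.setblocking(True)
    return b"".join(chunks), closed


def receive_loop(sock, callback, queue_size, buff_size=65536):
    pending = deque(maxlen=queue_size)        # oldest messages are dropped first
    partial = b""
    closed = False
    while not closed or pending:
        if not closed:
            # Only block when there is nothing left to hand to the callback.
            data, closed = recv_available(sock, buff_size, block=not pending)
            partial += data
            while len(partial) >= MSG_SIZE:
                pending.append(partial[:MSG_SIZE])
                partial = partial[MSG_SIZE:]
        if pending:
            callback(pending.popleft())       # one callback, then recv() again


def demo():
    pub, sub = socket.socketpair()
    got = []

    def slow_callback(msg):
        got.append(msg)
        if len(got) == 1:
            # Two more messages arrive while the "slow" callback is running.
            pub.sendall(b"00020003")
            pub.close()

    pub.sendall(b"0001")
    receive_loop(sub, slow_callback, queue_size=1)
    sub.close()
    return got
```

With queue_size=1, `demo()` should deliver `0001` and then skip straight to
`0003`: the message that went stale during the callback is dropped on the
next receive pass instead of being processed.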

This mitigation was chosen over a complete fix to the problem
because a complete fix would involve a new thread to handle
callbacks. While a new thread would allow recv() to be running
all the time, even during the long callback, it is a more complex
solution. Since rospy is going to be replaced in ROS2, this more
tactical mitigation seems appropriate.
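
For comparison, the threaded alternative that was not chosen might look
something like the sketch below. `CallbackWorker` and its methods are
hypothetical names used for illustration only; nothing like this is added
by this change.

```python
import threading
from collections import deque


class CallbackWorker:
    """Runs callbacks on their own thread, fed from a bounded queue."""

    def __init__(self, callback, queue_size):
        self._callback = callback
        self._pending = deque(maxlen=queue_size)  # drops oldest when full
        self._cond = threading.Condition()
        self._closed = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, msg):
        """Called by the receive thread for every decoded message."""
        with self._cond:
            self._pending.append(msg)
            self._cond.notify()

    def close(self):
        """Stop once the queued messages have been handled."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:
                    return                        # closed and drained
                msg = self._pending.popleft()
            # The lock is released here, so submit() never waits on a callback.
            self._callback(msg)
```

The receive thread would call `submit()` after every recv(), so queue_size
is enforced continuously even while a callback runs, at the cost of an
extra thread and the locking shown here.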

This mitigates ros#1901


@c-andy-martin c-andy-martin requested a review from a team July 9, 2021 16:03
@c-andy-martin c-andy-martin self-assigned this Jul 9, 2021
@c-andy-martin c-andy-martin force-pushed the mitigate-rospy-long-running-callbacks branch from 007d9a6 to d32a9f6 Compare July 9, 2021 18:40
@c-andy-martin c-andy-martin force-pushed the mitigate-rospy-long-running-callbacks branch from d32a9f6 to 3174ca2 Compare July 12, 2021 15:30
Collaborator

@bobhenz-jabil bobhenz-jabil left a comment

This can already by somewhat mitigated by

Should be "This can already be somewhat mitigated by..."

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @c-andy-martin)

@c-andy-martin c-andy-martin deleted the mitigate-rospy-long-running-callbacks branch July 13, 2021 17:41
2 participants