-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async controllers #3
base: master
Are you sure you want to change the base?
Conversation
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
4fede49
to
a21ef6f
Compare
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/controller_interface_base.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
{ | ||
if (is_running()) | ||
{ | ||
async_update_stop_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deadlock due to lost wake up: stop_async_update()
could hang in thread_join()
if lines 177-178 are executed while thread_
is in line 209
Surround async_update_stop_
with a mutex in both ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already fixed by adding the mutex on both ends
ros2_control/controller_interface/include/controller_interface/async_function_handler.hpp
Lines 183 to 194 in 40efb15
void stop_async_update() | |
{ | |
if (is_running()) | |
{ | |
{ | |
std::unique_lock<std::mutex> lock(async_mtx_); | |
async_update_stop_ = true; | |
} | |
async_update_condition_.notify_one(); | |
thread_.join(); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any mutex protecting this line
ah yes it works but I consider a better practice to protect async_update_stop_
with a mutex because it makes harder this error:
while ((get_state_function_().id() == lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE ||
get_state_function_().id() ==
lifecycle_msgs::msg::State::TRANSITION_STATE_ACTIVATING) &&
!async_update_stop_)
{
doing_something_while_async_update_stop_is_true(); // <== my future me committing a bug
{
std::unique_lock<std::mutex> lock(async_mtx_);
async_update_condition_.wait(
lock, [this] { return trigger_in_progress_ || async_update_stop_; });
...
My suggestion is:
std::unique_lock<std::mutex> lock(async_mtx_);
while ((get_state_function_().id() == lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE ||
get_state_function_().id() ==
lifecycle_msgs::msg::State::TRANSITION_STATE_ACTIVATING) &&
!async_update_stop_)
{
async_update_condition_.wait(
lock, [this] { return trigger_in_progress_ || async_update_stop_; });
if (!async_update_stop_)
{
async_update_return_ = async_function_(current_update_time_, current_update_period_);
}
trigger_in_progress_ = false;
cycle_end_condition_.notify_all();
}
// Async related variables | ||
std::thread thread_; | ||
std::atomic_bool async_update_stop_{false}; | ||
std::atomic_bool trigger_in_progress_{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no atomic required: if line 195 is changed to lock.owns_lock() && !trigger_in_progress_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for performance reasons, now you can remove atomic from trigger_in_progress_
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
lock, [this] { return trigger_in_progress_ || async_update_stop_; }); | ||
if (async_update_stop_) | ||
{ | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we break here doing nothing makes threads hang if they're blocked in wait_for_trigger_cycle_to_finish()
. We must notify these waiters that we finished the cycle.
The steps that lead to the situation are:
- th1:
wait_for_trigger_cycle_to_finish()
- th2:
stop_async_update()
th1 could hang or crash (if stop_async_update()
has been called from ~AsyncFunctionHandler()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
controller_interface/include/controller_interface/async_function_handler.hpp
Outdated
Show resolved
Hide resolved
// Async related variables | ||
std::thread thread_; | ||
std::atomic_bool async_update_stop_{false}; | ||
std::atomic_bool trigger_in_progress_{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for performance reasons, now you can remove atomic from trigger_in_progress_
@@ -125,7 +127,7 @@ TEST_F(AsyncFunctionHandlerTest, check_triggering) | |||
ASSERT_TRUE(async_class.get_handler().is_initialized()); | |||
ASSERT_TRUE(async_class.get_handler().is_running()); | |||
ASSERT_FALSE(async_class.get_handler().is_stopped()); | |||
async_class.get_handler().wait_for_trigger_cycle_to_finish(); | |||
std::this_thread::sleep_for(std::chrono::microseconds(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally concurrency tests with random delays are suspicious.
So... why did you change that? I guess you observed a deadlock here.
If this is the case, either wait_for_trigger_cycle_to_finish()
needs some changes or we need to clarify its behavior (and change the test accordingly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to add this 1 microsecond delay in order to run the update cycle. If not, it immediately aborts the update cycle and fails in the assertion of the counter being 2
ros2_control/controller_interface/include/controller_interface/async_function_handler.hpp
Lines 228 to 233 in 40efb15
if (async_update_stop_) | |
{ | |
trigger_in_progress_ = false; | |
cycle_end_condition_.notify_one(); | |
break; | |
} |
ASSERT_TRUE(async_class.get_handler().is_initialized()); | ||
ASSERT_TRUE(async_class.get_handler().is_running()); | ||
ASSERT_FALSE(async_class.get_handler().is_stopped()); | ||
async_class.get_handler().wait_for_trigger_cycle_to_finish(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential deadlock due to lost wake up, there is no warranty that this thread is awaiting before anyc_class.thread_
is signaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought so, but open reading around, the conditional variable even before waiting., it checks if the predicate is true or false, if it is true, it continues directly without any signal. If it is false, then it starts to wait for the wakeup signal
I've just testing adding 200 ms delay before the wait and the tests pass (no deadlock)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding-delays-until-it-works(TM) is a bad approach because it relays heavily on uncontrolled conditions as computer load, OS (dynamic) scheduler policies, etc.
Basically your tests suffer of UB
ASSERT_TRUE(async_class.get_handler().is_initialized()); | ||
ASSERT_TRUE(async_class.get_handler().is_running()); | ||
ASSERT_FALSE(async_class.get_handler().is_stopped()); | ||
async_class.get_handler().wait_for_trigger_cycle_to_finish(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same potential deadlock between trigger()
and wait_for_trigger_cycle_to_finish()
Please add some delay in between to check my assumption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atzaros I added like 50 ms delay right after the trigger, and the tests seem to pass
} | ||
async_class.get_handler().stop_async_update(); | ||
|
||
// now the async update should be preempted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean with preemption?
*/ | ||
void join_async_update_thread() | ||
{ | ||
if (is_running()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling is_running()
and join()
from different threads could incur in a data race over thread_
's state, for instance calling wait_for_trigger_cycle_to_finish()
and join_async_update_thread()
concurrently.
I would document clearly that all class' functions using is_running()
and join()
cannot be used from different threads concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to say when we have more then 2 threads involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
void wait_for_trigger_cycle_to_finish() | ||
{ | ||
if (is_running()) | ||
{ | ||
std::unique_lock<std::mutex> lock(async_mtx_); | ||
cycle_end_condition_.wait(lock, [this] { return !trigger_in_progress_; }); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now reading at your tests I understand you want to use this function for detecting progress after a trigger has been issued.
I think what you need here is either a heartbeat [1] or just change your approach in tests using synchronization primitives instead of a plain counter
[1] it means some refactoring around trigger()
and wait_for_trigger_cycle_to_finish()
, the former returning the current heartbeat and the latter checking for any increase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synchronization primitives sounds like a better idea right? What would you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synch primitives in the tests.
If you walk that way, I would remove wait_for_trigger_cycle_to_finish()
from the API. It is harmful and, I think, it has been created just because you needed it for testing.
Having a measure of progress in the async's is useful for the controller manager to diagnose abnormal situations but it's a nice-to-have.
40efb15
to
0d04cd3
Compare
b5de991
to
f7d015d
Compare
Co-authored-by: atzaros <[email protected]>
6c80b9e
to
ebb69c5
Compare
No description provided.