Controller restart by switch_controller #1568

TakashiSato · 2024-06-14T12:52:30Z

This PR proposes the controller restart function as discussed in issue #674.

In ROS1, controller restart was implemented by passing true to the reset_controller argument of the ControllerManager::update function, as demonstrated here. This approach allowed the controller to be stopped, started, and updated consecutively within a single execution of the ControllerManager::update function, enabling continuous robot operation without interrupting control calculations. However, applying this method to the update function in ROS2 would cause an API break, significantly impacting users.

Alternatively, PR #677 describes the "simplest way" to perform a restart by specifying the same controller name for both start and stop in the switch_controller function. However, the current implementation of ControllerManager cannot handle this input correctly, making it unfeasible. Consequently, the current method involves executing stop first, followed by start, leading to a temporary stop period and causing discontinuity in the controller's operation.

To address this, the implementation has been modified to allow the switch_controller function to perform a restart even when the same controller is specified for both start and stop. Additionally, during the implementation, it was discovered that the current logic of switch_controller is complex and contains potential bugs, necessitating a refactor of its logic.

As a result, this PR includes a significant number of changes. However, I have confirmed that all existing tests, as well as additional tests for previously identified potential bugs, have passed without issues. Furthermore, new tests for the restart functionality have been created and passed successfully.

If you have any comments or feedback, please feel free to share your thoughts.

… deactivate followed by activate when the same controller is specified for both start and stop

codecov · 2024-06-15T19:44:58Z

Codecov Report

Attention: Patch coverage is 95.42744% with 23 lines in your changes missing coverage. Please review.

Project coverage is 88.49%. Comparing base (1ed61ef) to head (3721ad6).
Report is 6 commits behind head on master.

❗ Current head 3721ad6 differs from pull request most recent head 1865ab7

Please upload reports for the commit 1865ab7 to get more accurate results.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1568      +/-   ##
==========================================
+ Coverage   87.70%   88.49%   +0.79%     
==========================================
  Files         102      105       +3     
  Lines        8704     9015     +311     
  Branches      780      790      +10     
==========================================
+ Hits         7634     7978     +344     
+ Misses        790      758      -32     
+ Partials      280      279       -1

Flag	Coverage Δ
unittests	`88.49% <95.42%> (+0.79%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
.../include/controller_manager/controller_manager.hpp	`100.00% <ø> (ø)`
...chainable_controller/test_chainable_controller.cpp	`85.24% <100.00%> (+1.03%)`	⬆️
...chainable_controller/test_chainable_controller.hpp	`100.00% <ø> (ø)`
...r_manager/test/test_controller/test_controller.cpp	`95.45% <100.00%> (+0.71%)`	⬆️
...r_manager/test/test_controller/test_controller.hpp	`100.00% <ø> (ø)`
...ller_with_command/test_controller_with_command.hpp	`100.00% <100.00%> (ø)`
...t_controllers_chaining_with_controller_manager.cpp	`99.24% <100.00%> (+0.39%)`	⬆️
...ontroller_manager/test/test_restart_controller.cpp	`100.00% <100.00%> (ø)`
...ller_with_command/test_controller_with_command.cpp	`88.23% <88.23%> (ø)`
controller_manager/src/controller_manager.cpp	`78.82% <90.78%> (+4.26%)`	⬆️

christophfroehlich · 2024-06-16T09:18:16Z

With "restart" you mean deactivate and activate transition? Could you please comment on a usecase, where this is necessary? Isn't there something missing in the controller's interfaces if something like this is necessary?

TakashiSato · 2024-06-18T08:02:27Z

With "restart" you mean deactivate and activate transition?

Yes. I referred to the process of deactivating a controller from an active state and then reactivating it within the same update loop as a "restart."

Here are some use cases where this might be useful:

In common real robots like robotic arms, there are often functions to stop the robot when it receives an external signal input to the controller or when a self-diagnosis function detects a dangerous condition. These stopping methods include an "emergency stop," which cuts off power supply to the joint motors and reduces driving torque to zero (typically, electromagnetic brakes are also applied to hold the joints to prevent them from falling due to gravity), and a "protective stop," which continues to generate torque to hold the position after decelerating through joint control.

When these stopping functions are triggered in a robot using ros2_control, the real robot controller will override the commands from ros2_controllers for safety. Currently, ros2_control does not have a general method to detect such stop states or command overrides by the real-world robot controller, which can lead to a mismatch between the actual state of the robot and the commands from ros2_controllers.

While the JointTrajectoryController (JTC) has a function to hold position when the deviation between the robot's current state (e.g., position) and the command value exceeds a certain threshold (here), not all ros2_controllers have such closed-loop monitoring functions (e.g., ForwardCommandController). Even in JTC, deviations within the threshold may not be considered abnormal, allowing discrepancies between command and actual values. If the stop state is released under these conditions, there is a risk of sudden joint movement being triggered.

Therefore, in many cases, to re-enable joint control by ros2_control after an emergency stop or protective stop (particularly important), it is necessary to reset the command state of ros2_controllers (e.g., matching the command position with the current position, setting the command speed to zero, erase buffered value, etc.). As mentioned in the first comment of this PR, since there is no general method to do this in ros2_control, I proposed a workaround to "restart" the same controller by specifying it in switch_controller.

However, after reconsidering your feedback, it may not be necessary to perform the deactivate (stop) and activate (start) within the same update loop as I originally proposed (i.e., they could be executed by separate switch_controller requests). Here is how this relates to our robot use case:

In our robot, we have implemented processing in a custom HardwareComponent (SystemInterface) to switch the commands sent to the real robot controller based on the situation where the command_interface (Position/Velocity/Effort, etc.) is claimed. Related to this processing, we have implemented a decision process to disable the servo state and apply the electromagnetic brakes for safety when there are joints not claimed by any ros2_controllers, resulting in an undefined command_interface.

Therefore, if we handle deactivate and activate with separate switch_controller requests, the servo state would be disabled during the period when no ros2_controllers are assigned, triggering the decision process mentioned above. To avoid this (and reset the controller state while keeping the servo state enabled), I proposed executing deactivate and activate within the same update loop.

If "restart" is not considered useful and only adds complexity to the logic, I am open to revising the PR to focus on refactoring and bug fixes instead of the "restart" feature.

christophfroehlich · 2024-06-19T11:01:57Z

Wouldn't it be useful if your hardware component reports the emergency stop to the controller_manager, e.g. by returning an error in the read/write methods, and the controllers claiming the interfaces are stopped automatically? Then everything could be started again later on. but i'm not sure how far this is implemented already tbh.

saikishor · 2024-06-19T16:16:38Z

Hello!

I suggest doing as @christophfroehlich suggested. If there is this kind of error or the emergency stop, you can return return_type::ERROR either in the read cycle or in the write cycle or both. This will make the controller manager automatically deactivate all the active controllers that use the interfaces of the failing hardware.
Once you do the hardware lifecycle, you can restart the controllers automatically.

However, the servo_state you were mentioning earlier would be affected. Would the earlier explained process work for you? Or not?

Could you also please share with us the list of bugs you have addressed here?.

Thank you

christophfroehlich · 2024-06-19T19:43:58Z

Despite the discussion of your use case, we agreed that restarting should be possible.
@TakashiSato This PR has 2300 line changes, which makes it very hard to review. Could you maybe split this into several ones, and describe which bugs/changes you address with every PR?

TakashiSato · 2024-06-22T08:27:59Z

@christophfroehlich @saikishor
Thank you for your comments!

After reviewing the implementation and behavior of the method you suggested, where the hardware component's read/write returns return_type::ERROR, we believe this approach is feasible. However, since we had previously implemented the function to handle e_stop or p_stop requests and their release as a ros_controller (named safety_controller) based on our ROS1 implementation, we found it challenging to directly apply the aforementioned method, which deactivates all controllers.

Nevertheless, we believe this issue can be resolved by implementing functionality similar to safety_controller on the hardware_component side.

Additionally, we discovered that by returning return_type::DEACTIVATE in the hardware component's read/write, we can transition the hardware component to an inactive state without affecting the controllers.
(For reference, links to the related implementation are provided below:)

Using return_type::DEACTIVATE may allow us to retain functionality similar to safety_controller on the ros2_controller side. This approach seems particularly suitable for purposes such as protective stop with deceleration control or maintaining the servo_state.

The choice of approach will depend on our requirements and implementations, so we will consider methods that induce state transitions in the hardware component moving forward.

On the other hand, as @christophfroehlich mentioned, the controller restart function itself seems useful regardless of the above discussion.

Regarding the list of bugs I have addressed and the division of PRs, it may take some time to organize the information and code, but I plan to address this in the near future.

saikishor · 2024-06-26T05:49:20Z

@TakashiSato may be it's better to open separate PRs for separate bugs, so it's faster and easier to get them merged. What do you think?

TakashiSato · 2024-06-26T08:05:49Z

@saikishor
Thank you for your advice!
I'm sorry for the confusion caused by some of my commits in an effort to organize information.
I am planning to submit bug fixes and the implementation of restart (a version with fewer implementation changes) in separate PRs shortly.

TakashiSato · 2024-06-26T11:29:26Z

@christophfroehlich @saikishor
I have separated the changes in this PR into the following two PRs:

Bug fix for switch_controller when using controller chaining #1591: Bug fix for switch_controller
Controller restart by switch_controller (minimal implementation) #1592: Addition of restart_controllers functionality

As included in this PR, I am also considering proposing further refactoring related to the switch_controller logic. However, since the above two PRs address all functional issues, I may not proceed with the refactoring in the immediate future.

Should I close this PR?

christophfroehlich · 2024-07-03T16:42:10Z

Thanks, yes let's close this PR.

TakashiSato added 12 commits June 13, 2024 17:06

Modify switch_controllers to implement a reset operation by executing…

123cb10

… deactivate followed by activate when the same controller is specified for both start and stop

refactor

3989c24

fix a bug and add test

f9aa9ca

move internal function to anonymous namespace. add comment

ed0a187

remove debug code and add comment

bf9f2fc

fix logic and add test

8c41a0c

renamed from reset to restart

7319996

refactor and cleanup code

df79a64

fix the points indicated by pre-commit run

7c370a0

remove unused input arg

704990e

add dummy lambda definition to maintain ABI compatibility

0a956c1

modify comment

3721ad6

github-actions bot requested review from bijoua29, bmagyar, destogl, fmauch, livanov93, moriarty and VanshGehlot June 14, 2024 12:52

christophfroehlich requested a review from saikishor June 16, 2024 09:14

TakashiSato added 3 commits June 23, 2024 08:58

enhance chainig test

5cce35d

restored the test case that had been accidentally deleted

70d417e

fix test case numbers

b9dab55

removed unnecessary sorting of (de)activate request

1865ab7

This was referenced Jun 26, 2024

Bug fix for switch_controller when using controller chaining #1591

Open

Controller restart by switch_controller (minimal implementation) #1592

Open

christophfroehlich closed this Jul 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Controller restart by switch_controller #1568

Controller restart by switch_controller #1568

TakashiSato commented Jun 14, 2024

codecov bot commented Jun 15, 2024 •

edited

Loading

christophfroehlich commented Jun 16, 2024

TakashiSato commented Jun 18, 2024

christophfroehlich commented Jun 19, 2024

saikishor commented Jun 19, 2024

christophfroehlich commented Jun 19, 2024 •

edited

Loading

TakashiSato commented Jun 22, 2024

saikishor commented Jun 26, 2024

TakashiSato commented Jun 26, 2024

TakashiSato commented Jun 26, 2024

christophfroehlich commented Jul 3, 2024

Controller restart by switch_controller #1568

Controller restart by switch_controller #1568

Conversation

TakashiSato commented Jun 14, 2024

codecov bot commented Jun 15, 2024 • edited Loading

Codecov Report

christophfroehlich commented Jun 16, 2024

TakashiSato commented Jun 18, 2024

christophfroehlich commented Jun 19, 2024

saikishor commented Jun 19, 2024

christophfroehlich commented Jun 19, 2024 • edited Loading

TakashiSato commented Jun 22, 2024

saikishor commented Jun 26, 2024

TakashiSato commented Jun 26, 2024

TakashiSato commented Jun 26, 2024

christophfroehlich commented Jul 3, 2024

codecov bot commented Jun 15, 2024 •

edited

Loading

christophfroehlich commented Jun 19, 2024 •

edited

Loading