Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

follow_path action keeps on running after having canceled the goal #4659

Open
milidam opened this issue Sep 3, 2024 · 5 comments
Open

follow_path action keeps on running after having canceled the goal #4659

milidam opened this issue Sep 3, 2024 · 5 comments

Comments

@milidam
Copy link
Contributor

milidam commented Sep 3, 2024

Bug report

Required Info:

  • Operating System:
    • Custom Linux distribution
  • ROS2 Version:
    • Iron 2024-07-12
  • Version or commit hash:
    • navigation2 1.2.9
  • DDS implementation:
    • CycloneDDS

Steps to reproduce issue

In some cases, the follow_path action keeps on running after having canceled a navigate_to_pose goal.

Expected behavior

After canceling the navigate_to_pose goal, we would expect the controller_server to properly interrupt its follow_path action.

[controller_server-8] [INFO]: Goal was canceled. Stopping the robot.
[controller_server-8] [WARN]: [follow_path] [ActionServer] Client requested to cancel the goal. Cancelling.

Actual behavior

The bt_navigator gets a Failed to get result for compute_path_to_pose in node halt! or a Failed to get result for follow_path in node halt! error, and the controller_server keeps on trying to follow the last received path.

[bt_navigator-12] [INFO]: Starting path following
[controller_server-8] [INFO]: Received a goal, begin computing control effort.
[planner_server-9] [WARN]: Map update loop missed its desired rate of 5.0000Hz. Current loop rate is 0.9686Hz
[bt_navigator-12] [WARN]: Behavior tree engine cycle is slower than expected: 114 ms (timeout is 50 ms)
[bt_navigator-12] [WARN]: Time required to tick the tree is 114 ms
[bt_navigator-12] [WARN]: Timed out while waiting for action server to acknowledge goal request for follow_path
[bt_navigator-12] [INFO]: Pausing path following due to slow path computation (>500ms)
[planner_server-9] [INFO]: Attempting to find a path from (-12.62, 3.38) to (-54.58, 11.29).
[planner_server-9] [INFO]: SmacPlannerHybrid: Start pose in world: (-12.624103, 3.381263), in costmap (452, 103)
[planner_server-9] [INFO]: SmacPlannerHybrid: Goal pose in world: (-54.581326, 11.286437), in costmap (32, 182)
[app-21] [INFO]: Request to cancel goals on /navigate_to_pose was accepted by server, waiting for result
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 9.2498 Hz.
[bt_navigator-12] [ERROR]: Failed to get result for compute_path_to_pose in node halt!
[bt_navigator-12] [INFO]: Goal canceled
[bt_navigator-12] [WARN]: [navigate_to_pose] [ActionServer] Client requested to cancel the goal. Cancelling.
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 9.5395 Hz.
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 10.9403 Hz.

Additional information

In the case reported above, before that new instance of the Starting path following, in a previous attempt in the same instance of the navigate_to_pose behavior tree, the bt_navigator already encountered the following error

[bt_navigator-12] [ERROR]: Failed to get result for follow_path in node halt!

what could indicate that the bt_navigator might somehow be desynchronized with the controller_server?

Also, the map is quite big, and the planner_server takes some time to compute a path. The goal canceling occurred while the compute_path_to_pose action was still running.

@SteveMacenski
Copy link
Member

That occurs here [1], so the server timeout doesn't cancel the action in time. If you were to set this to say - 1 second - does that remove your issue? Also, what so you have your server_timeout set to - that could be unrealistically short.

It looks like this condition was added in [2] and may need to (generally speaking) have a longer timeout than the server timeout proportional to steady-state BT node ticking. If we're halting / exiting out a BT, we may find it better to have the rate exceeded a bit in exchange for a more realistic chance of those actions being canceled successfully.

Also, the map is quite big, and the planner_server takes some time to compute a path. The goal canceling occurred while the compute_path_to_pose action was still running.

This is definitely why. If you have a server timeout of 10ms and you've just started a planning request that is 100ms in length, then it won't always cancel in time. We added a cancel checker [3] in the Planner API so that we query in the planning loop if a cancel is requested to be more responsive, but that is not backported to Iron and you'll need to upgrade to Jazzy to receive that. That actually by itself might completely fix your issue.

[1]

if (callback_group_executor_.spin_until_future_complete(future_result, server_timeout_) !=

[2] bfb4506

[3] https://github.com/ros-navigation/navigation2/blob/main/nav2_navfn_planner/src/navfn_planner.cpp#L137

@SteveMacenski
Copy link
Member

@milidam have you had a chance to look into these items? I'd love to get feedback on the precise nature of it to write up a solution for you to test

@milidam
Copy link
Contributor Author

milidam commented Sep 10, 2024

Hi @SteveMacenski,
Our server_timeout is currently set to 100ms.
I haven't had time yet to try to reproduce the issue with an increased timeout.

Our planner's max_planning_time is 30.0s.
Does that also increase the risk of not being able to actually cancel it?

Note that migration to Jazzy is planned, but not yet started.

@SteveMacenski
Copy link
Member

SteveMacenski commented Sep 10, 2024

I haven't had time yet to try to reproduce the issue with an increased timeout.

Do you know when you would be able to? If you increased that to 500ms or 1s and it went away for Follow Path, that could be something we could replace that spin with since that cancel should only happen when halting an entire tree on task exit (which keeping the 10ms tick rate is unnecessary)

Does that also increase the risk of not being able to actually cancel it?

If you haven't upgraded, I think so. This is why I want you to check for me :-) But that shouldn't impact the follow path action, just computing path to pose -- since canceling path planning that is mid-execution is only supported in Jazzy and newer.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 11, 2024

@milidam any updates? It sounds like the issue is fixed in Jazzy and Iron is now EOL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants