Goals get randomly aborted #4778

Open
patham9 opened this issue Dec 3, 2024 · 6 comments
Labels
question Further information is requested

Comments

@patham9

patham9 commented Dec 3, 2024

Nav2 is not reliable for me: a goal often has to be re-sent several times before the robot manages to reach the goal location.
Specifically, goals are aborted seemingly at random. With the built-in TurtleBot 4 simulation, I would have expected reliability that matches the ROS 1 navigation stack.

Required Info:

  • Operating System:
    • Ubuntu 24
  • ROS2 Version:
    • ROS2 Jazzy
  • Version or commit hash: ros-jazzy-navigation2 1.3.2-1noble.20240922.123501
  • DDS implementation: Fast-RTPS

Steps to reproduce issue

Normal Turtlebot 4 bringup.

Expected behavior

No random aborts.

Actual behavior

Goals are aborted at random.

@SteveMacenski
Member

SteveMacenski commented Dec 3, 2024

Please provide logs, a situation description, what you've tried, and preferably something we can reproduce. Currently, this description doesn't provide what's needed to be actionable. I have no such issues in my Jazzy docker container with Nav2 with the current binaries. We have many system tests that run reliably and hundreds of companies using Nav2 in deployed applications, so I believe it is generally reliable 😉

But if we can find where the disconnect is for your application, then we can work to resolve it.

@SteveMacenski SteveMacenski added the question Further information is requested label Dec 3, 2024
@patham9
Author

patham9 commented Dec 3, 2024

Thank you for your quick response! Yes, I assumed as much given its widespread use; surprisingly, it just hasn't worked reliably for me so far. Maybe there is a violated timing assumption, as when the operating system is also running compute-heavy machine learning models, delay spikes can sometimes occur.
Just to confirm, your docker also runs binary version 1.3.2-1noble.20240922.123501?
Which logs would you prefer to see?
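
A rough way to test the timing hypothesis is to measure how late a periodic loop wakes up while the machine learning workload is running. The sketch below is not part of Nav2 and makes no claim about Nav2's internal timing; the 50 ms period, the 20 ms spike threshold, and all function names are assumptions chosen for illustration.

```python
import time


def wakeup_delays(timestamps, period):
    """Return how late each iteration started relative to an ideal fixed-rate schedule."""
    start = timestamps[0]
    return [t - (start + i * period) for i, t in enumerate(timestamps)]


def count_spikes(delays, threshold):
    """Count iterations that started more than `threshold` seconds late."""
    return sum(1 for d in delays if d > threshold)


def sample_timestamps(period=0.05, count=20):
    """Record monotonic timestamps of a loop that tries to run every `period` seconds."""
    stamps = []
    next_wakeup = time.monotonic()
    for _ in range(count):
        stamps.append(time.monotonic())
        next_wakeup += period
        # Sleep until the next scheduled wake-up; never pass a negative duration.
        time.sleep(max(0.0, next_wakeup - time.monotonic()))
    return stamps


if __name__ == "__main__":
    delays = wakeup_delays(sample_timestamps(), period=0.05)
    print(f"max delay: {max(delays) * 1000:.1f} ms")
    print(f"iterations more than 20 ms late: {count_spikes(delays, 0.020)}")
```

Running this once on an idle machine and once under the compute-heavy load gives a first indication of whether scheduling delays are large enough to matter.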

@SteveMacenski
Member

> Maybe there is a violated timing assumption, as when the operating system is also running compute-heavy machine learning models, delay spikes can sometimes occur.

If those spikes also trash your networking bandwidth, that could explain a lot. That would then have less to do with Nav2 and more to do with the resource allocation in your system. But what do your logs say? Just dump me the logs from your Nav2 stdout terminal (or isolate them to just the Nav2 bringup parts). I want to see "why" it's failing: whether it's network related, algorithmically not working well, tuning, etc.
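
To pull only the warnings and errors out of a saved stdout dump, a small filter like the following could help. It is a minimal sketch that assumes each log line carries its severity as a bracketed token such as `[WARN]` or `[ERROR]`; the function name and the file name in the usage comment are hypothetical.

```python
import re
import sys

# Assumption: severity appears as a bracketed token somewhere in the line.
SEVERITY = re.compile(r"\[(WARN|ERROR|FATAL)\]")


def filter_problems(lines):
    """Yield (line_number, severity, text) for every line at WARN level or above."""
    for number, line in enumerate(lines, start=1):
        match = SEVERITY.search(line)
        if match:
            yield number, match.group(1), line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 filter_log.py nav2_stdout.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, severity, text in filter_problems(handle):
            print(f"{number}: {text}")
```

Keeping the original line numbers makes it easy to go back to the full log and read the context around each failure.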

> Just to confirm, your docker also runs binary version 1.3.2-1noble.20240922.123501?

I see 1.3.3-1noble.20241115.195529

  • sudo docker run -it --net=host --privileged -v .:/root/jazzy_ws --volume="${XAUTHORITY}:/root/.Xauthority" --env="DISPLAY=$DISPLAY" -v="/tmp/.gazebo/:/root/.gazebo/" -v /tmp/.X11-unix:/tmp/.X11-unix:rw --shm-size=1000mb osrf/ros:jazzy-desktop-full
  • apt update
  • rosdep init
  • rosdep update
  • apt install ros-jazzy-nav2-bringup ros-jazzy-navigation2

Then run the normal demos. Everything seems fine for me, both in the demos and in the system tests.

@SteveMacenski
Member

@patham9 any update?

@patham9
Author

patham9 commented Dec 11, 2024

I apologize that it took me so long; thank you for the reminder.
I hope the event log is useful: https://gist.github.com/patham9/276120eaf098bb2eeec274cff01a056f
The first error in particular seems relevant: "Failed to create a plan from potential when a legal potential was found. This shouldn't happen."
Please let me know if this is helpful. I am happy to help investigate the issue further, as I have now found that it also appears for a colleague of mine.

@SteveMacenski
Member

That is very odd. I know you commented on the ticket with the same error (#4655). I've never personally seen this happen in all my years, and until that ticket was filed, no one had reported this problem in 10+ years.

The only 2 changes I see made anywhere near there are: (1) adding in cancel goal checkers and terminating if the planning time is exceeded. Are you seeing this happen when the planning time you set in your config file is exceeded by the planner? I could tentatively see the error state that it introduces not being properly handled in some case, which is a hole we can plug. Or (2) we removed the check for whether the start cell is occupied, so it could be that it's not getting a path because you're in occupied space. Are you seeing this happen in that case?

Please also provide your config file.
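
For case (2), one way to check whether the robot's start pose falls in occupied space is to convert the pose to a costmap cell and inspect its cost. The sketch below works on a plain flat list of costs rather than on any Nav2 API; the function names are hypothetical, and the cost constants are stated as assumptions about the nav2_costmap_2d convention that should be verified against your installed version.

```python
import math

# Assumed cost convention (verify against your nav2_costmap_2d version).
LETHAL_OBSTACLE = 254
INSCRIBED_INFLATED_OBSTACLE = 253


def world_to_cell(x, y, origin_x, origin_y, resolution):
    """Convert world coordinates in metres to integer (column, row) cell indices."""
    col = math.floor((x - origin_x) / resolution)
    row = math.floor((y - origin_y) / resolution)
    return col, row


def start_is_occupied(costs, width, height, cell,
                      threshold=INSCRIBED_INFLATED_OBSTACLE):
    """Return True if the cell is off the map or its cost is at or above `threshold`.

    `costs` is a flat, row-major list of cost values in the range 0..255.
    Note that with the default threshold, a value of 255 also counts as occupied.
    """
    col, row = cell
    if not (0 <= col < width and 0 <= row < height):
        return True  # a start pose outside the map cannot be planned from
    return costs[row * width + col] >= threshold


# Tiny 3x2 example grid: one lethal cell and one inscribed cell.
grid = [0, 0, LETHAL_OBSTACLE,
        0, INSCRIBED_INFLATED_OBSTACLE, 0]
cell = world_to_cell(0.1, -0.9, origin_x=-1.0, origin_y=-1.0, resolution=0.5)
print(cell, start_is_occupied(grid, 3, 2, cell))
```

Logging this check each time a goal is aborted would show whether the aborts coincide with the robot standing in or next to an obstacle cell.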
