Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely shutdown dynamic_param callback threads instead of only dyn_param_handler_.reset() #4496

Closed
GoesM opened this issue Jun 26, 2024 · 5 comments · Fixed by #4521
Closed

Comments

@GoesM
Copy link
Contributor

GoesM commented Jun 26, 2024

Bug report

Required Info:

  • Operating System:
    • ubuntu22.04
  • ROS2 Version:
    • humble & Iron
  • Version or commit hash:
    • the latest
  • DDS implementation:
    • the defaulted

Steps to reproduce issue

  1. Launch the navigation2 as following steps:
#!/bin/bash
export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
source install/setup.bash
export TURTLEBOT3_MODEL=waffle
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 
  1. sending request for change any dynamic_parameter of nav2_amcl, like following one:
ros2 param set amcl scan_topic scan
  1. deacitvate and cleanup nav2_amcl, like by press Ctrl+C

Expected behavior

no UAF bug

Actual behavior

UAF bug occured during on_cleanup() of nav2_amcl, the asan report is as following;

=================================================================
==354519==ERROR: AddressSanitizer: heap-use-after-free on address 0x617000096c30 at pc 0x7f60ad65b9f7 bp 0x7ffee9e82c80 sp 0x7ffee9e82c78
READ of size 8 at 0x617000096c30 thread T0
    #0 0x7f60ad4b1c00 in nav2_amcl::AmclNode::dynamicParametersCallback(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x174c00) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #1 0x7f60ad5628c6 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_memfun_deref, rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x2258c6) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60ad562542 in std::_Function_handler<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&), std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)> >::_M_invoke(std::_Any_data const&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x225542) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #3 0x7f60ae062401  (/opt/ros/humble/lib/librclcpp.so+0x10d401) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #4 0x7f60ae066f31 in rclcpp::node_interfaces::NodeParameters::set_parameters_atomically(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/opt/ros/humble/lib/librclcpp.so+0x111f31) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #5 0x7f60ae0e016c  (/opt/ros/humble/lib/librclcpp.so+0x18b16c) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #6 0x7f60ae09d757  (/opt/ros/humble/lib/librclcpp.so+0x148757) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #7 0x7f60ae0abcfe  (/opt/ros/humble/lib/librclcpp.so+0x156cfe) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #8 0x7f60ae03e375  (/opt/ros/humble/lib/librclcpp.so+0xe9375) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #9 0x7f60ae03bd59 in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) (/opt/ros/humble/lib/librclcpp.so+0xe6d59) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #10 0x7f60ae03c0c5 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe70c5) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #11 0x7f60ae04397f in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xee97f) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #12 0x7f60ae043b94 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) (/opt/ros/humble/lib/librclcpp.so+0xeeb94) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #13 0x55de5cf30a67 in main (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe1a67) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #14 0x7f60ac9abd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #15 0x7f60ac9abe3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #16 0x55de5ce70424 in _start (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x21424) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)

0x617000096c30 is located 48 bytes inside of 704-byte region [0x617000096c00,0x617000096ec0)
freed by thread T1 here:
    #0 0x55de5cf2e89d in operator delete(void*) (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xdf89d) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60ad4b5c9c in nav2_amcl::AmclNode::on_cleanup(rclcpp_lifecycle::State const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x178c9c) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60adf388ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

previously allocated by thread T0 here:
    #0 0x55de5cf2e03d in operator new(unsigned long) (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xdf03d) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60ad49e792 in nav2_amcl::AmclNode::initMessageFilters() (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x161792) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60ad492985 in nav2_amcl::AmclNode::on_configure(rclcpp_lifecycle::State const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x155985) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #3 0x7f60adf388ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

Thread T1 created by T0 here:
    #0 0x55de5cedc6ec in __interceptor_pthread_create (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x8d6ec) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60acca7328 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc328) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 0x7f60ae18607f  (/opt/ros/humble/lib/librcutils.so+0xa07f) (BuildId: cf5ab83f9f550b1f79433195029ab96f67a27c78)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x31e9f6) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058) in message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >::removeCallback(std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > > const&)
Shadow bytes around the buggy address:
  0x0c2e8000ad30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad60: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000ad70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2e8000ad80: fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ada0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000adb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000adc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000add0: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==354519==ABORTING

Additional information


The cause of the UAF bug:

The callback thread dynamicParametersCallback() of dynamic_handler_ is still running though nav2_amcl has been executing on_cleanup().
And on_cleanup() would release many resources of nav2_amcl but the callback thread would access those released resources, so that the UAF bug occurs.

@GoesM
Copy link
Contributor Author

GoesM commented Jun 26, 2024

However, I noticed that during deactivate(), dyn_handler_.reset() is executed to try to stop the dynamic parameter callback thread:

nav2_util::CallbackReturn
AmclNode::on_deactivate(const rclcpp_lifecycle::State & /*state*/)
{
RCLCPP_INFO(get_logger(), "Deactivating");
active_ = false;
// Lifecycle publishers must be explicitly deactivated
pose_pub_->on_deactivate();
particle_cloud_pub_->on_deactivate();
// reset dynamic parameter handler
dyn_params_handler_.reset();

But it seems that it doesn't actually achieve this, so I conducted the following experiment to verify:

Experiment:

Proving that the callback function continues to execute even after dyn_handler_.reset() and executor_thread_.reset():

Insert delay and log:

/**
  * @brief Callback executed when a parameter change is detected
  * @param event ParameterEvent message
  */
rcl_interfaces::msg::SetParametersResult
AmclNode::dynamicParametersCallback(
  std::vector<rclcpp::Parameter> parameters)
{
  std::lock_guard<std::recursive_mutex> cfl(mutex_);
  rcl_interfaces::msg::SetParametersResult result;
// insert
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback Called!");
std::this_thread::sleep_for(std::chrono::seconds(5));
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback still executing!");
//
...
...
nav2_util::CallbackReturn
AmclNode::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
  RCLCPP_INFO(get_logger(), "Cleaning up");

  executor_thread_.reset();
// insert 
RCLCPP_INFO(get_logger(), "-------------------After executor_thread_.reset()");
//

Then, run navigation2 and publish a dynamic parameter change request, such as ros2 param set amcl min_particles 500. Immediately after sending the request, press Ctrl+C to let navigation2 sequentially execute the on_deactivate() and on_cleanup() phases. We will see the following logs:

...
...
[amcl-5] [INFO] [1719331177.062007626] [amcl]: -----------------------dynamicParamCallback Called!
...
...
[amcl-5] [INFO] [1719331179.520602552] [rclcpp]: signal_handler(signum=2)
[amcl-5] [INFO] [1719331179.520758931] [amcl]: Running Nav2 LifecycleNode rcl preshutdown (amcl)
[amcl-5] [INFO] [1719331179.520976194] [amcl]: Deactivating
[amcl-5] [INFO] [1719331179.521025282] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331179.623240551] [amcl]: Cleaning up
[amcl-5] [INFO] [1719331179.623584548] [amcl]: -------------------After executor_thread_.reset()
[amcl-5] [INFO] [1719331179.636316062] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331182.062343761] [amcl]: -----------------------dynamicParamCallback still executing!
[amcl-5] [INFO] [1719331182.074333144] [amcl]: Destroying
[amcl-5] [ERROR] [1719331182.074424004] [amcl]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at ./src/rcl/publisher.c:389, at ./src/rcl_lifecycle.c:368
[amcl-5] [WARN] [1719331182.074453949] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(shuttingdown

It proves that the callback thread would be working even after dyn_handler_.reset() and executor_thread_.reset().

suggestion

SO perhaps we need to update the way to completely shutdown dyn_handler_ ?

@SteveMacenski
Copy link
Member

Perhaps we need to https://github.com/ros2/rclcpp/blob/8230d15ef7ffa9e824d7e0f9c623537ae331dc73/rclcpp/src/rclcpp/node.cpp#L475-L479 remove_on_set_parameters_callback rather than just resetting the std::function. That would apply to your other ticket (and all other Nav2 servers). If so, then you could close that other ticket and rename this one about that general problem. It should be pretty fast to batch out, there's about ~30 uses of the dynamic parameter callback, but it should all just be copy-paste adding 1 line before the reset:

node->remove_on_set_parameters_callback( dyn_handler_);
 dyn_handler_.reset();

@GoesM GoesM changed the title [nav2_amcl] UAF bug caused by dynamicParametersCallback executing even after dyn_handler_.reset() Completely shutdown dynamic_param callback threads instead of only dyn_param_handler_.reset() Jun 27, 2024
@GoesM
Copy link
Contributor Author

GoesM commented Jun 27, 2024

Sure! My future work will follow these steps:

  • Conduct experiments to ensure node->remove_on_set_parameters_callback(dyn_handler_); could shutdown the callback threads properly.
  • Review all other instances where the dynamic parameter callback is used and correct them as necessary.

Perhaps I'm not good at renaming issues, so feel free to modify it if it seems inappropriate. 😜

@SteveMacenski
Copy link
Member

That's perfect!

@SteveMacenski
Copy link
Member

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants