feat(timing violation monitor): add timing violation monitor node #2983

Closed
28 commits
96c4756
add tilde_msg
xygyo77 Dec 23, 2022
e242ae6
add tilde_msg
xygyo77 Dec 23, 2022
a8f5787
rm tilde_msg & add tilde_msg to package.xml(ndt_scan_matcher)
xygyo77 Dec 23, 2022
9dd277e
Add tier4_timing_violation_monitor_utils directory
y-okumura-isp Feb 9, 2023
2f69bd8
Add tier4_timing_violation_monitor_utils skeleton
y-okumura-isp Feb 9, 2023
8a6ddbf
Pass test
y-okumura-isp Feb 9, 2023
85886e1
Introduce MessageConsumptionNotifier to NDTScanMatcher
y-okumura-isp Feb 9, 2023
1fcebe6
use Message Tracking Tag instead of builtin Time
nabetetsu Feb 10, 2023
3522953
Delete LISENCE
y-okumura-isp Feb 14, 2023
93d591d
clean up folder structure
nabetetsu Feb 15, 2023
758ca2c
clean pre-commit failures
nabetetsu Feb 16, 2023
8ea26fc
chore(timing_violation_monitor): update copyright AD
nabetetsu Feb 17, 2023
a673138
chore(ndt_scan_matcher): avoid using abbreviation
nabetetsu Feb 17, 2023
9ae582f
chore(timing_violation_monitor): fix "Tier IV" to "TIER IV"
nabetetsu Feb 17, 2023
b4e9f39
fix: update package parameters
nabetetsu Feb 20, 2023
c3048da
feat/add timing violation monitor node (moved from tilde_lite reposit…
nabetetsu Feb 22, 2023
d828b2e
refactor: delete logs on stdout
nabetetsu Feb 27, 2023
5df1643
refactor: reduce abbreviations
nabetetsu Feb 27, 2023
1481c9f
doc(timing violation monitor): update doc to adapt to latest implemen…
nabetetsu Feb 27, 2023
59a28a5
refactor(timing violation monitor): delete debug code
nabetetsu Feb 28, 2023
576f220
chore(timing violation monitor): update copyright
nabetetsu Feb 28, 2023
21eb821
refactor(timing violation monitor): use autoware_package() in CMakeLi…
nabetetsu Feb 28, 2023
36cf2f9
chore(timing violatoin monitor): fix pre-commit fails
nabetetsu Feb 28, 2023
1ebf4ce
Update system/timing_violation_monitor/README.md
nabetetsu Mar 10, 2023
17f4562
refactor: delete calling timing violation monitor function to separat…
nabetetsu Mar 10, 2023
d5bc51c
refactor: clean up build tool dependency
nabetetsu Mar 10, 2023
23a715c
style(pre-commit): autofix
pre-commit-ci[bot] Mar 10, 2023
4cfbc2d
feat(timing violation monitor): update servirity to warn
nabetetsu Mar 16, 2023
38 changes: 38 additions & 0 deletions system/timing_violation_monitor/CMakeLists.txt
@@ -0,0 +1,38 @@
cmake_minimum_required(VERSION 3.8)
project(timing_violation_monitor)

find_package(autoware_cmake REQUIRED)
autoware_package()

ament_auto_add_executable(${PROJECT_NAME}
  src/timing_violation_monitor_core.cpp
  src/timing_violation_monitor_debug.cpp
  src/timing_violation_monitor_node.cpp
)

ament_auto_add_library(timing_violation_monitor_utils SHARED
  src/utils/message_consumption_notifier.cpp
  include/timing_violation_monitor_utils/message_consumption_notifier.hpp)

if(BUILD_TESTING)
  find_package(ament_cmake_gtest REQUIRED)
  find_package(ament_lint_auto REQUIRED)
  # the following line skips the linter which checks for copyrights
  # comment the line when a copyright and license is added to all source files
  set(ament_cmake_copyright_FOUND TRUE)
  # the following line skips cpplint (only works in a git repo)
  # comment the line when this package is in a git repo and when
  # a copyright and license is added to all source files
  set(ament_cmake_cpplint_FOUND TRUE)
  ament_lint_auto_find_test_dependencies()

  ament_add_gtest(test_message_consumption_notifier
    test/test_message_consumption_notifier.cpp)
  target_link_libraries(test_message_consumption_notifier
    timing_violation_monitor_utils)
endif()

ament_auto_package(INSTALL_TO_SHARE
  launch
  config
)
164 changes: 164 additions & 0 deletions system/timing_violation_monitor/README.md
@@ -0,0 +1,164 @@
# Timing violation monitor framework

A lightweight framework for timing violation detection

## Brief Description

This package provides a framework for detecting timing violations. The framework measures the response time on selected paths and checks whether it is within the expected deadline. Here, a path means a series of processing steps across nodes.

The following figure illustrates what the framework measures. The response time is the elapsed time from the start of `Node S` to the end of `Node E`, represented by the blue dotted lines `<---->`.

![Response time](./docs/images/response_time.png "Response time")

## Design

### Design overview

The initial design of the timing violation framework is intended for a small-start deployment and follows these four policies.

- Minimize changes to Autoware user code as much as possible
- Have the flexibility to add or delete paths
- Run in a different Linux process than the monitored entity in order to keep the monitored entity stable
- Avoid delays in the execution of the monitored entity

To fulfill these policies, the framework uses the existing timestamp in the header of topic messages.

![Basic idea](./docs/images/timing_violation_detection_basic_design.png)

The framework assumes that the timestamp, shown as `ts: t0` in the figure, is not changed as the message travels along the path.

The timing violation monitor receives the topic message, `/topic_e`, sent from the end node, `Node E`. The monitor uses this message to check whether a timing violation has occurred.

Autoware has several paths that do not change timestamps, such as Sensing, Perception, or Localization. If you apply the framework to such paths, you only need to add the timing violation monitor and don't need to change any user code.
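
The check described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the monitor's actual code: `DeadlineCheck` and `check_response_time` are hypothetical names, and the response time is assumed to be the receive time at the monitor minus the unchanged header timestamp.

```cpp
#include <cassert>
#include <chrono>

using Nanoseconds = std::chrono::nanoseconds;

// Hypothetical result type, introduced only for this sketch.
struct DeadlineCheck
{
  Nanoseconds response_time;  // elapsed time from the header stamp to reception
  bool violated;              // true if the response time exceeds the deadline
};

// Hypothetical helper: header_stamp is the timestamp set at the start of the
// path (assumed unchanged along the path), received_at is the time at which
// the monitor received the end node's message.
DeadlineCheck check_response_time(
  Nanoseconds header_stamp, Nanoseconds received_at, Nanoseconds deadline)
{
  assert(deadline > Nanoseconds::zero());
  const Nanoseconds response_time = received_at - header_stamp;
  return {response_time, response_time > deadline};
}
```

Because the only input taken from the monitored path is the header timestamp, a check of this kind needs no change to the nodes on the path.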

### Limitation

As mentioned above, the framework assumes that a path does not change the timestamp as the message travels along it. If the timestamp is replaced by another value somewhere in the path, the framework cannot be applied to that path.

Because of this limitation, the framework cannot be applied to every path. If you need to monitor arbitrary paths, [TILDE](https://github.com/tier4/TILDE) might be a good candidate.

### Other materials

The fundamentals of the timing violation framework are described in [this page](./docs/design_timing_violation_detection.md), which covers the requirements and the high-level design policy.

[Another page](./docs/internal_design.md) shows the design on which the implemented timing violation monitor is based. It might be helpful if you want to apply the framework to another node.

## Usage

To monitor a new path with the framework, follow these steps.

1. Define the path and its corresponding topic
2. Add `MessageTrackingNotifier` to the end of the path if necessary
3. Add the new path definition to the configuration file

### Define the path

The first step is to decide which path to monitor. As mentioned in the [Limitation](#limitation) section, you have to choose a path on which the input timestamp is never changed.

In this step, check whether the framework can be applied to the target path.

### Add `MessageTrackingNotifier` to the end of path

You can skip this step if the end node publishes a topic message and that topic is specified as the end of the path. In that case, specify the last output topic in the configuration file in the next step.

The second step is to add `MessageTrackingNotifier`, which is described as the add-on in [the design page](./docs/internal_design.md). `MessageTrackingNotifier` notifies the monitor when the topic data is consumed. It is needed when the end of the path does not publish any topic message.

To add `MessageTrackingNotifier` to a path, change the files as shown below.

1. Add `timing_violation_monitor` to your package.xml

```xml
<depend>timing_violation_monitor</depend>
```

2. Change the header file of end node as below

```cpp
// Statements which include header files.
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/point_cloud2.hpp>
#include <timing_violation_monitor_utils/message_consumption_notifier.hpp>  // *** Add this statement ***

// Assumption: PointCloud2 refers to the standard sensor_msgs message type.
using sensor_msgs::msg::PointCloud2;

class EndNode : public rclcpp::Node
{
public:
  EndNode();  // constructor

private:
  // callbacks
  void main_method();
  void receive_method();
  // ...

  // subscribers and publishers
  rclcpp::Subscription<PointCloud2>::SharedPtr sub_;
  rclcpp::Publisher<PointCloud2>::SharedPtr pub_;

  std::unique_ptr<timing_violation_monitor_utils::MessageConsumptionNotifier> notifier_;  // *** Add this statement ***
};
```

3. Add notifier execution on the end node

```cpp
// Constructor. rclcpp::Node has no default constructor, so a node name is
// passed to the base class ("end_node" is only a placeholder).
EndNode::EndNode() : rclcpp::Node("end_node")
{
  sub_ = this->create_subscription<PointCloud2>(...);
  pub_ = this->create_publisher<PointCloud2>(...);

  notifier_ = std::make_unique<timing_violation_monitor_utils::MessageConsumptionNotifier>(
    this, "notifier_topic_message_name", 10);  // *** Add this statement ***
}
// Other statements ....

// Definition of main_method().
void EndNode::main_method()
{
  // user code.
  // ...

  // when target topic is consumed.
  consume_message(message);
  notifier_->notify(message.header.stamp);  // *** Add this statement ***

  // user code.
  // ...
}
```

### Add new path definition to the configuration file

The timing violation monitor reads a configuration file to know which paths and topics to monitor. In this step, you write that configuration file.

A sample configuration file is shown below. To add a new path, append an item to `target_paths`.

```yaml
ros__parameters:
  diag_period_sec: 5.0 # frequency of report
  target_paths:
    ekf-to-ndt: # path name. Can be set to any name.
      topic: /localization/pose_estimator/for_tilde_interpolator_mtt # topic name published by MessageConsumptionNotifier.
      message_type: tilde_msg/msg/MessageTrackingTag # message type
      severity: warn # severity
      period: 100.0 # execution frequency of path
      deadline: 200.0 # deadline of response time
      violation_count_threshold: 5 # threshold to judge warn or not.

    pointcloudPreprocessor-to-ndt: # path name
      topic: /localization/pose_estimator/pose_with_covariance # topic name
      message_type: geometry_msgs/msg/PoseWithCovarianceStamped # message type
      severity: error # severity
      period: 100.0 # execution frequency of path
      deadline: 150.0 # deadline of response time
      violation_count_threshold: 1 # threshold to judge error or not.
```

In this sample, path `ekf-to-ndt` uses `MessageConsumptionNotifier` to notify the monitor when the topic data is consumed. Path `pointcloudPreprocessor-to-ndt`, on the other hand, needs no code changes.
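
How `deadline`, `violation_count_threshold`, and `severity` could interact is sketched below in plain C++. The semantics here are assumptions made for illustration, not taken from the implementation: a violation is counted when a response time exceeds `deadline`, and the configured severity is reported once the count within one report period reaches `violation_count_threshold`. `PathConfig` and `ViolationCounter` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical mirror of one entry under `target_paths`.
struct PathConfig
{
  std::string topic;
  std::string severity;           // "warn" or "error"
  double period;                  // same unit as in the configuration file
  double deadline;                // same unit as in the configuration file
  int violation_count_threshold;
};

// Hypothetical counter; the real monitor may judge violations differently.
class ViolationCounter
{
public:
  explicit ViolationCounter(PathConfig config) : config_(std::move(config))
  {
    assert(config_.violation_count_threshold > 0);
  }

  // Count one violation when a measured response time exceeds the deadline.
  void on_response_time(double response_time)
  {
    if (response_time > config_.deadline) {
      ++count_;
    }
  }

  // Assumed to be called once per report period: returns the configured
  // severity if the threshold was reached, "ok" otherwise, then resets.
  std::string report()
  {
    const bool exceeded = count_ >= config_.violation_count_threshold;
    count_ = 0;
    return exceeded ? config_.severity : "ok";
  }

private:
  PathConfig config_;
  int count_ = 0;
};
```

Under these assumptions, a threshold of 1 reports on the first violation in a period, while a larger threshold tolerates occasional late messages.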

## Output Message

The timing violation monitor publishes its results on the `/diagnostics` topic. `/diagnostics` is the common topic provided by ROS 2 [`diagnostic_updater`](https://github.com/ros/diagnostics), which also defines the message format.

<!-- prettier-ignore-start -->

!!! Note
    What this section describes is tentative.

<!-- prettier-ignore-end -->
@@ -0,0 +1,19 @@
/**:
  ros__parameters:
    diag_period_sec: 5.0 # frequency of report
    target_paths:
      ekf-to-ndt: # path name
        topic: /localization/pose_estimator/for_tilde_interpolator_mtt # topic name
        message_type: tier4_system_msgs/msg/MessageTrackingTag # message type
        severity: warn # severity
        period: 100.0 # execution frequency of path
        deadline: 200.0 # deadline of response time
        violation_count_threshold: 2 # threshold to judge warn or not.

      pointcloudPreprocessor-to-ndt: # path name
        topic: /localization/pose_estimator/pose_with_covariance # topic name
        message_type: geometry_msgs/msg/PoseWithCovarianceStamped # message type
        severity: warn # severity
        period: 100.0 # execution frequency of path
        deadline: 150.0 # deadline of response time
        violation_count_threshold: 1 # threshold to judge error or not.
@@ -0,0 +1,128 @@
# Design Overview

**NOTE: It has not been implemented yet.**

## Description

Autonomous driving is a real-time system. The **Topic State Monitor** checks the real-time constraints of the system. To uphold these constraints, a monitoring function is required that detects when nodes perform outside their tolerance and reacts appropriately.

![Introduction of Real-Time](images/introduction-realtime.drawio.svg)

This figure shows the high-level architecture of Autoware.
An autonomous driving system consists of a set of functions.
Each function would be implemented as a ROS node or a set of ROS nodes.
A set of functions would have a real-time constraint.

| [Path](<https://en.wikipedia.org/wiki/Path_(graph_theory)>) | Real-Time Constraint |
| --------------------------------------------------------------------------- | ------------------------- |
| Sensors > Sensing > Perception > Planning > Control > Vehicle I/F > Vehicle | Brake reaction distance |
| Planning > Control > Vehicle I/F > Vehicle | Braking distance accuracy |
| Vehicle > Vehicle I/F > Control > Vehicle I/F > Vehicle | Control accuracy |
| Sensors > Sensing > Localization | Localization accuracy |

This table shows examples of real-time constraints.
Each row shows a real-time constraint corresponding to a path in the architecture graph.

## Formulation

![Formulation of Real-Time](images/formulation-realtime.drawio.svg)

[A path in a graph is a finite sequence of edges which joins a sequence of vertices which are all distinct (and since the vertices are distinct, so are the edges)](<https://en.wikipedia.org/wiki/Path_(graph_theory)>).
The words `trail` and `walk` are not used in this context since each node and edge would have only one role in a path
even if they appear multiple times in the path.

A path is a unit that has a real-time constraint.
A system has a set of paths.

Each `Path_i` has a set of nodes. The starting point of `Path_i` is `Node S`. The end point of `Path_i` is `Node E`.
`Path_i` would have the other nodes `Node N` between `Node S` and `Node E`.
`Node S` and `Node E` would be the same node if a function that has a real-time
constraint is implemented in one node.

`Node S` is invoked (released) at every `p_i` period.
The release time of the `j`-th job of `Path_i` is `r_{i,j}`. `Node S` is executed immediately at the release time if the operating system scheduler schedules it at that time.
`Node S` is not executed immediately at the release time if other nodes are selected to be scheduled on the CPUs.

`Node N` and `Node E` can be executed once each node receives the topic it depends on in `Path_i`, i.e., the nodes of `Path_i` are executed sequentially.

The latency of the `j`-th job of `Path_i` is denoted by `l_{i,j}`, which is the time between `r_{i,j}` and the completion of the `j`-th job of `Node E`.
`l_{i,j}` may be larger than `p_i`.

The relative deadline of `Path_i` is `d_i`.
If `l_{i,j} <= d_i` for all `j`, `Path_i` meets its real-time constraint.

The period `p_i` and the relative deadline `d_i` do not have the parameter `j`
since this formulation assumes that `p_i` and `d_i` are static parameters,
which are not changed at run-time.
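
The condition `l_{i,j} <= d_i` for all `j` can be written as a short C++ check. This is a minimal sketch under the formulation above; `meets_real_time_constraint` is a hypothetical name, and the latencies are assumed to be already measured and expressed in the same unit as the deadline.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: latencies[j] plays the role of l_{i,j} and
// relative_deadline plays the role of d_i for one path Path_i.
bool meets_real_time_constraint(
  const std::vector<double> & latencies, double relative_deadline)
{
  assert(relative_deadline > 0.0);
  // Path_i meets its constraint only if every job finishes within d_i.
  return std::all_of(
    latencies.begin(), latencies.end(),
    [relative_deadline](double latency) { return latency <= relative_deadline; });
}
```

Note that the period `p_i` does not appear in the check: a job whose latency exceeds `p_i` still meets the constraint as long as it stays within `d_i`.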

## Requirements

| # | System Requirement | Related Component |
| ------------- | ---------------------------------------------------------------------- | ------------------- |
| Requirement 1 | The system shall detect deadline misses (i.e., `l_{i,j} > d_i`). | Topic State Monitor |
| Requirement 2 | The system can trigger some actions once deadline misses are detected. | Emergency Handler |

## Limitation

- The relative deadline `d_i` of `Path_i` has a static value. `d_i` does not change at runtime with, for example, the velocity of the ego vehicle. This limitation comes from the assumption that `d_i` shall be the minimum value that can be safe in a given ODD.
- The monitor does not interrupt callback or node execution.
- The monitor has no control over error handling outside of Autoware (e.g., normal detection using heartbeats on the hardware side).

## Design

The design of Topic State Monitor focuses on Requirement 1. Requirement 2 is designed in [Emergency Handler](https://github.com/autowarefoundation/autoware.universe/tree/main/system/emergency_handler).

![Introduction of Real-Time](images/design-realtime.drawio.svg)

Topic State Monitor subscribes to the topic published by `Node E` in `Path_i`.
Topic State Monitor detects a deadline miss of the `j`-th job of `Path_i`
if it has not received the topic by `r_{i,j} + d_i`.

Since Topic State Monitor may not receive the `j`-th message from `Node E`
before the deadline, it shall detect deadline misses by time-out.
Topic State Monitor saves `r_{i,j-1}` in the initialization phase to calculate
the absolute deadline of the `j`-th job in the detection phase.
The initialization phase and the detection phase are executed at every `j`.

The existing timestamp in each topic may not be suitable for deadline miss detection:
depending on each node's specification, the timestamp may be passed to the next node unchanged, or it may be overwritten in the outgoing topic.
To decide whether the existing timestamp can be used, check the specifications of all the nodes in the path.
If it cannot be used, add a new field for deadline miss detection.

### Early detection

If a user wants to detect a deadline miss in the middle of a path, the user can specify it by defining multiple paths as below.

- ex) Whole `Path_i` is defined as: **`Node S` -> `Node N1` -> `Node N2` -> `Node E`**
- early detect on Node S: define additional Path as `Node S`
- early detect on Node N2: define additional Path as `Node S` -> `Node N1` -> `Node N2`
- detect on Node E: `Node S` -> `Node N1` -> `Node N2` -> `Node E` (same as `Path_i`)

## Initialization Phase

1. `Node S` sets `r_{i,j-1}` into a topic, and publishes the topic into `Path_i`.
2. `Node N` and `Node E` relay `r_{i,j-1}` on topics without modification.
3. Topic State Monitor saves `r_{i,j-1}` for the Detection Phase.

## Detection Phase

1. Topic State Monitor calculates `r_{i,j} = r_{i,j-1} + p_i`.
2. Topic State Monitor calculates the absolute deadline = `r_{i,j} + d_i`.
3. If (1) Topic State Monitor has not received the topic in `Path_i` and
   (2) the current time exceeds the absolute deadline, Topic State Monitor publishes the deadline miss event via the `/diagnostics` topic.
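
The two phases can be sketched together in plain C++. This is an illustration of the design above rather than an implementation (the design is marked as not yet implemented): `DeadlineTimer` and its member names are hypothetical, all times are integers in one arbitrary unit, and publishing to `/diagnostics` is replaced by a boolean result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the initialization and detection phases for Path_i.
class DeadlineTimer
{
public:
  DeadlineTimer(std::int64_t period, std::int64_t relative_deadline)
  : period_(period), relative_deadline_(relative_deadline)
  {
    assert(period_ > 0 && relative_deadline_ > 0);
  }

  // Initialization phase: save r_{i,j-1} carried by the topic and start
  // waiting for the j-th message.
  void save_release_time(std::int64_t previous_release_time)
  {
    previous_release_time_ = previous_release_time;
    received_ = false;
  }

  // Called when the j-th message from Node E arrives.
  void on_message() { received_ = true; }

  // Detection phase, steps 1 and 2: r_{i,j} = r_{i,j-1} + p_i, then add d_i.
  std::int64_t absolute_deadline() const
  {
    return previous_release_time_ + period_ + relative_deadline_;
  }

  // Detection phase, step 3: a miss is reported only if nothing was received
  // and the current time has passed the absolute deadline.
  bool deadline_missed(std::int64_t now) const
  {
    return !received_ && now > absolute_deadline();
  }

private:
  std::int64_t period_;
  std::int64_t relative_deadline_;
  std::int64_t previous_release_time_ = 0;
  bool received_ = false;
};
```

In a real node, `deadline_missed` would presumably be evaluated from a timer callback so that a miss is detected even when no message ever arrives.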

## Implementation Challenge

- How Topic State Monitor handles `j = 1`
  - `r_{i,j-1}` is not defined at `j = 1`. `j = 1` is the system initialization. Topic State Monitor could use an `if`-guard or the ROS 2 node lifecycle to wait for the system initialization.
- How `Node S` gets `r_{i,j-1}` in the initialization phase
  - The design assumes that `Node S` sets `r_{i,j-1}`, not the start time of `Node S`, into the topic. If `Node S` is triggered periodically by, for example, sensor hardware, `r_{i,j-1}` should take the latency of the sensor into account.

## Another Design

This section describes another design that is not employed.

| Design | Reason why not employed |
| --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| Topic State Monitor gets `r_{i,1}`, and calculates `r_{i,j} = r_{i,1} + p_i * (j - 1)`. | `r_{i,j}` might slip forward or backward gradually if `p_i` stored in Topic State Monitor is not accurate. |
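
The drift named in the table can be made concrete with a small C++ sketch. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; times are integers in microseconds.

```cpp
#include <cassert>
#include <cstdint>

// Rejected design: extrapolate r_{i,j} from the first release time r_{i,1}
// using the period stored in the monitor.
std::int64_t extrapolated_release(
  std::int64_t first_release, std::int64_t stored_period, std::int64_t j)
{
  assert(j >= 1);
  return first_release + stored_period * (j - 1);
}

// Difference between the extrapolated and the true release time when the
// true period differs from the stored one; it grows linearly with j.
std::int64_t release_drift(
  std::int64_t stored_period, std::int64_t true_period, std::int64_t j)
{
  return extrapolated_release(0, stored_period, j) -
         extrapolated_release(0, true_period, j);
}
```

For example, with a stored period of 100000 us and a true period of 100010 us, the extrapolated release time is 10000 us early at `j = 1001`. The employed design recomputes `r_{i,j}` from the `r_{i,j-1}` carried by each topic, so such an error is not accumulated across jobs.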