
fix(system_monitor): prevent nethogs from monitoring all networks due to high CPU load (autowarefoundation#2474)

* fix(system_monitor): prevent nethogs from monitoring all networks due to high CPU load

Signed-off-by: ito-san <[email protected]>

* ci(pre-commit): autofix

* fix(system_monitor): fix include guards

Signed-off-by: ito-san <[email protected]>

* fix(system_monitor): fix build error

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): change lower camel case to snake case

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): fix clang-tidy errors and warnings

Signed-off-by: ito-san <[email protected]>

* ci(pre-commit): autofix

* fix(net_monitor): fix clang-tidy warnings

Signed-off-by: ito-san <[email protected]>

* ci(pre-commit): autofix

* fix(net_monitor): fix clang-tidy warnings

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): fix clang-tidy warnings

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): change C-style socket to boost::asio

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): fix clang-tidy warnings

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): fix clang-tidy warnings

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): first refactoring

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): refactoring

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): fix clang-tidy errors

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): update README

Signed-off-by: ito-san <[email protected]>

* fix(net_monitor): add lock guard to protect variable

Signed-off-by: ito-san <[email protected]>

Signed-off-by: ito-san <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2 people authored and TomohitoAndo committed Dec 23, 2022
1 parent 5b979a2 commit 56f9be9
Showing 12 changed files with 1,252 additions and 908 deletions.
3 changes: 2 additions & 1 deletion system/system_monitor/CMakeLists.txt
@@ -118,7 +118,8 @@ ament_auto_add_executable(hdd_reader
)

ament_auto_add_executable(traffic_reader
reader/traffic_reader/traffic_reader.cpp
reader/traffic_reader/traffic_reader_main.cpp
reader/traffic_reader/traffic_reader_service.cpp
)

find_library(NL3 nl-3 REQUIRED)
3 changes: 2 additions & 1 deletion system/system_monitor/README.md
@@ -73,7 +73,8 @@ Every topic is published in 1 minute interval.
| | HDD WriteIOPS |||| |
| | HDD Connection |||| |
| Memory Monitor | Memory Usage |||| |
| Net Monitor | Network Usage |||| |
| Net Monitor | Network Connection |||| |
| | Network Usage |||| Notification of usage only, normally error not generated. |
| | Network CRC Error |||| Warning occurs when the number of CRC errors in the period reaches the threshold value. The number of CRC errors that occur is the same as the value that can be confirmed with the ip command. |
| | IP Packet Reassembles Failed |||| |
| NTP Monitor | NTP Offset |||| |
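
The IP Packet Reassembles Failed row depends on a kernel counter of failed IP reassemblies. A minimal sketch of reading it, assuming the value comes from the `ReasmFails` column of the `Ip:` header/value line pair in `/proc/net/snmp` (the diff does not show where net_monitor actually reads it); `parse_reasm_fails` and `split_ws` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Split a line into whitespace-separated tokens.
std::vector<std::string> split_ws(const std::string & line)
{
  std::istringstream iss(line);
  std::vector<std::string> tokens;
  std::string tok;
  while (iss >> tok) {
    tokens.push_back(tok);
  }
  return tokens;
}

// Hypothetical helper: extract ReasmFails from the text of /proc/net/snmp.
// Assumed layout: an "Ip:" header line with column names, immediately
// followed by an "Ip:" line with the matching values.
std::optional<std::uint64_t> parse_reasm_fails(const std::string & snmp_text)
{
  std::istringstream in(snmp_text);
  std::string header;
  std::string values;
  while (std::getline(in, header)) {
    if (header.rfind("Ip:", 0) != 0) {
      continue;  // not the IP section
    }
    if (!std::getline(in, values)) {
      return std::nullopt;  // header without a value line
    }
    const auto names = split_ws(header);
    const auto nums = split_ws(values);
    for (std::size_t i = 0; i < names.size() && i < nums.size(); ++i) {
      if (names[i] == "ReasmFails") {
        return std::stoull(nums[i]);
      }
    }
    return std::nullopt;  // column not present
  }
  return std::nullopt;  // no "Ip:" section found
}
```

The function takes the file's text rather than a path, so it can be exercised without a live kernel; a caller would read `/proc/net/snmp` into a string first.
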
1 change: 0 additions & 1 deletion system/system_monitor/config/net_monitor.param.yaml
@@ -1,7 +1,6 @@
/**:
ros__parameters:
devices: ["*"]
traffic_reader_port: 7636
monitor_program: "greengrass"
crc_error_check_duration: 1
crc_error_count_threshold: 1
16 changes: 8 additions & 8 deletions system/system_monitor/docs/ros_parameters.md
@@ -61,14 +61,14 @@ mem_monitor:

net_monitor:

| Name | Type | Unit | Default | Notes |
| :-------------------------------- | :----------: | :-----: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------------------- |
| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
| usage_warn | float | %(1e-2) | 0.95 | Generates warning when network usage reaches a specified value or higher. |
| crc_error_check_duration | int | sec | 1 | CRC error check duration. |
| crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |
| reassembles_failed_check_duration | int | sec | 1 | IP packet reassembles failed check duration. |
| reassembles_failed_check_count | int | n/a | 1 | Generates warning when count of IP packet reassembles failed during IP packet reassembles failed check duration reaches a specified value or higher. |
| Name | Type | Unit | Default | Notes |
| :-------------------------------- | :----------: | :--: | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------- |
| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
| monitor_program                   | string       | n/a  | greengrass | Name of the program to be monitored by nethogs.                                                                                                      |
| crc_error_check_duration | int | sec | 1 | CRC error check duration. |
| crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |
| reassembles_failed_check_duration | int | sec | 1 | IP packet reassembles failed check duration. |
| reassembles_failed_check_count | int | n/a | 1 | Generates warning when count of IP packet reassembles failed during IP packet reassembles failed check duration reaches a specified value or higher. |
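
The two CRC parameters describe a windowed check: warn when the number of new CRC errors within `crc_error_check_duration` reaches `crc_error_count_threshold`. A minimal sketch of that logic, assuming the monitor samples a cumulative per-interface CRC error counter once per check cycle and that the duration is counted in such cycles; `CrcErrorWindow` is a hypothetical class, not the net_monitor implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <numeric>
#include <optional>

// Hypothetical sliding-window checker. Each update() receives the latest
// cumulative CRC error count (for example the value the ip command reports).
class CrcErrorWindow
{
public:
  CrcErrorWindow(std::size_t check_duration, std::uint64_t count_threshold)
  : check_duration_(check_duration), count_threshold_(count_threshold)
  {
  }

  // Returns true when the CRC errors added during the last
  // `check_duration_` cycles reach `count_threshold_`.
  bool update(std::uint64_t cumulative_count)
  {
    // The first sample only establishes a baseline.
    if (!last_) {
      last_ = cumulative_count;
      return false;
    }
    // Treat a counter reset (value went down) as zero new errors.
    const std::uint64_t delta =
      cumulative_count >= *last_ ? cumulative_count - *last_ : 0;
    last_ = cumulative_count;

    // Keep only the most recent `check_duration_` per-cycle deltas.
    deltas_.push_back(delta);
    while (deltas_.size() > check_duration_) {
      deltas_.pop_front();
    }
    const std::uint64_t total =
      std::accumulate(deltas_.begin(), deltas_.end(), std::uint64_t{0});
    return total >= count_threshold_;
  }

private:
  std::size_t check_duration_;
  std::uint64_t count_threshold_;
  std::optional<std::uint64_t> last_;
  std::deque<std::uint64_t> deltas_;
};
```

With the defaults (duration 1, threshold 1), a single new CRC error between two consecutive samples is enough to raise the warning. The same shape would fit the two `reassembles_failed_*` parameters.
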

## <u>NTP Monitor</u>

53 changes: 30 additions & 23 deletions system/system_monitor/docs/topics_net_monitor.md
@@ -1,5 +1,23 @@
# ROS topics: Net Monitor

## <u>Network Connection</u>

/diagnostics/net_monitor: Network Connection

<b>[summary]</b>

| level | message |
| ----- | -------------- |
| OK | OK |
| WARN | no such device |

<b>[values]</b>

| key | value (example) |
| --------------------- | ------------------- |
| Network [0-9]: status | OK / no such device |
| Network [0-9]: name   | wlp82s0             |

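The Network Connection status reports `no such device` for a configured interface that does not exist. A minimal sketch of such a check, assuming POSIX `if_nametoindex` (which returns 0 for an unknown interface name) as the existence test; `check_devices` is a hypothetical helper, and it does not expand the `*` wildcard that the `devices` parameter accepts:

```cpp
#include <net/if.h>  // if_nametoindex (POSIX)

#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: map each configured interface name to the status
// strings used in the Network Connection table ("OK" / "no such device").
std::vector<std::pair<std::string, std::string>> check_devices(
  const std::vector<std::string> & devices)
{
  std::vector<std::pair<std::string, std::string>> result;
  for (const auto & name : devices) {
    // if_nametoindex returns 0 when no interface with this name exists.
    const bool exists = if_nametoindex(name.c_str()) != 0;
    result.emplace_back(name, exists ? "OK" : "no such device");
  }
  return result;
}
```

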
## <u>Network Usage</u>

/diagnostics/net_monitor: Network Usage
@@ -38,41 +56,30 @@
| ----- | ------- |
| OK | OK |

<b>[values] program</b>
<b>[values when specified program is detected]</b>

| key | value (example) |
| -------------------------------- | ------------------------------------------- |
| nethogs [0-9]: PROGRAM | /lambda/greengrassSystemComponents/1384/999 |
| nethogs [0-9]: SENT (KB/Sec) | 1.13574 |
| nethogs [0-9]: RECEIVED (KB/Sec) | 0.261914 |

<b>[values] all</b>

| key | value (example) |
| --------------------- | -------------------------------------------------------------- |
| nethogs: all (KB/Sec) | python3.7/1520/999 0.274414 0.354883 |
| | /lambda/greengrassSystemComponents/1299/999 0.487305 0.0966797 |
| | sshd: muser@pts/5/15917/1002 0.396094 0.0585938 |
| | /usr/bin/python3.7/2371/999 0 0 |
| | /greengrass/ggc/packages/1.10.0/bin/daemon/906/0 0 0 |
| | python3.7/4362/999 0 0 |
| | unknown TCP/0/0 0 0 |
| nethogs [0-9]: program | /lambda/greengrassSystemComponents/1384/999 |
| nethogs [0-9]: sent (KB/Sec) | 1.13574 |
| nethogs [0-9]: received (KB/Sec) | 0.261914 |
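
These values are listed only when the program named by `monitor_program` (default `greengrass`) is detected. A minimal sketch of that filtering step, assuming nethogs trace-mode (`nethogs -t`) data lines have the form `<program>\t<sent>\t<received>`, as the example values suggest, and assuming a substring match on the program name; `NethogsRow` and `filter_nethogs_output` are hypothetical names:

```cpp
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One nethogs row: program, sent and received rates in KB/sec.
struct NethogsRow
{
  std::string program;
  double sent_kb_per_sec;
  double received_kb_per_sec;
};

// Hypothetical helper: keep only rows whose program contains `monitor_program`.
// The last two tab-separated fields are parsed as numbers, so program names
// containing spaces (e.g. "sshd: user@pts/5/...") stay intact. Lines without
// two tabs, such as "Refreshing:" headers, are skipped.
std::vector<NethogsRow> filter_nethogs_output(
  const std::string & output, const std::string & monitor_program)
{
  std::vector<NethogsRow> rows;
  std::istringstream in(output);
  std::string line;
  while (std::getline(in, line)) {
    const auto tab2 = line.rfind('\t');
    if (tab2 == std::string::npos || tab2 == 0) {
      continue;
    }
    const auto tab1 = line.rfind('\t', tab2 - 1);
    if (tab1 == std::string::npos) {
      continue;
    }
    const std::string program = line.substr(0, tab1);
    if (program.find(monitor_program) == std::string::npos) {
      continue;  // not the program we were asked to monitor
    }
    try {
      rows.push_back(
        {program, std::stod(line.substr(tab1 + 1, tab2 - tab1 - 1)),
         std::stod(line.substr(tab2 + 1))});
    } catch (const std::exception &) {
      // Non-numeric rate fields: not a data line, skip it.
    }
  }
  return rows;
}
```
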

<b>[values] error</b>
<b>[values when error is occurring]</b>

| key | value (example) |
| ----- | ----------------------------------------------------- |
| error | [nethogs -t] execve failed: No such file or directory |
| key | value (example) |
| ----- | ---------------------------------------- |
| error | execve failed: No such file or directory |

## <u>Network CRC Error</u>

/diagnostics/net_monitor: Network CRC Error

<b>[summary]</b>

| level | message |
| ----- | ------- |
| OK | OK |
| level | message |
| ----- | --------- |
| OK | OK |
| WARN | CRC error |

<b>[values]</b>
