[eventd]: Eventd failing to start due to rsyslogd on multi-asic #20775

Open
liamkearney-msft opened this issue Nov 13, 2024 · 1 comment
Labels: Issue for 202405, MSFT, Triaged

liamkearney-msft commented Nov 13, 2024

Description

On T2 / multi-asic chassis, eventd sometimes fails to start after a reboot, which leaves the system in a degraded state according to systemd. The failure is attributed to rsyslogd failing on its initial start, which cascades into eventd failing to start. rsyslogd is eventually restarted and comes up fine, but eventd does not have auto-restart configured, so it stays down. Manually restarting the eventd service recovers the system.

rsyslogd fails on its initial start with "Network is unreachable", likely because it comes up before the docker service is ready, i.e. it races with docker (on multi-asic chassis, rsyslogd attaches to the docker0 interface to pull the logs). Once rsyslogd is auto-restarted, which happens after the docker interface / service is up, it starts fine.

See below for journalctl output for rsyslogd:

Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd/udp: socket 11: sendto() error: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd: socket 11: error 101 sending via udp: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2302.0 try https://www.rsyslog.com/e/2007 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2302.0 try https://www.rsyslog.com/e/2359 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd/udp: socket 11: sendto() error: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd: socket 11: error 101 sending via udp: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2302.0 try https://www.rsyslog.com/e/2007 ]
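For triage across several reboots, the journal lines above can be tallied with a small script. This is only a sketch: the regular expression assumes the exact line shape shown above, and the names `LINE_RE` and `summarize` are made up for illustration, not part of any SONiC tooling.

```python
import re
from collections import Counter

# Assumed line shape: "<Mon DD HH:MM:SS> <host> rsyslogd[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
    r"rsyslogd\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def summarize(lines):
    """Count unreachable-network errors and omfwd suspend/resume events."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an rsyslogd journal line in the assumed shape
        msg = match.group("msg")
        if "Network is unreachable" in msg:
            counts["network_unreachable"] += 1
        if "suspended (module" in msg:
            counts["suspended"] += 1
        elif "resumed (module" in msg:
            counts["resumed"] += 1
    return counts
```

Feeding it the output of `journalctl -u rsyslog` (the unit name is an assumption) gives a quick view of how often the forwarding action was suspended before the network came up.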

Steps to reproduce the issue:

  1. Reboot a T2 / multi-asic chassis
  2. Run systemctl and see that eventd is down
  • Note: this can be semi-reliably reproduced with the platform_tests/test_reload_config::test_reload_configuration_checks test, although it is flaky because it's a race condition
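The check in step 2 can be scripted so that repeated reboots are easier to compare. A minimal sketch, assuming the units are named `eventd` and `rsyslog` (assumptions, adjust to the actual unit names); `unit_state` and `down_units` are hypothetical helper names:

```python
import subprocess


def unit_state(unit):
    """Return the state printed by `systemctl is-active <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive/failed units
    )
    return result.stdout.strip() or "unknown"


def down_units(units, probe=unit_state):
    """Map each unit that is not 'active' to its reported state."""
    states = {unit: probe(unit) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}


if __name__ == "__main__":
    # Unit names are assumptions for illustration.
    print(down_units(["eventd", "rsyslog"]))
```

The `probe` parameter exists only so the logic can be exercised without a running systemd.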

Describe the results you received:

eventd fails to start due to rsyslogd failing on initial start

Describe the results you expected:

eventd either auto-restarts on failure, or services are sequenced such that rsyslogd starts on first invocation (dependencies can be tricky here, as we don't want to drop logs on boot)
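To illustrate the sequencing option, here is a rough sketch of a readiness gate that waits for the docker0 interface to appear before a dependent service is started. It is not how SONiC currently orders these services; `wait_for_interface` is a hypothetical helper, and the presence of `/sys/class/net/docker0` is used as a stand-in for "docker networking is up", which does not guarantee that an address or route has been configured yet.

```python
import time
from pathlib import Path


def wait_for_interface(name, timeout_s=30.0, poll_s=0.5,
                       sysfs_net=Path("/sys/class/net")):
    """Poll until the network interface exists; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if (sysfs_net / name).exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Interface name taken from the description above; timeout is arbitrary.
    ready = wait_for_interface("docker0", timeout_s=60.0)
    print("docker0 present" if ready else "timed out waiting for docker0")
```

A gate like this trades boot-time log coverage for a clean first start, which is the tension noted above about not dropping logs on boot.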

Output of show version:

On a Nokia 7250 chassis:

SONiC Software Version: SONiC.internal-202405.107587339-62ab6b6719
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 62ab6b6719

/ 202405

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

This is probably related to issues #20544 and #20521

@liamkearney-msft

@arlakshm for visibility
