Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[active-active] no icmp probes with default route present #194

Open
lolyu opened this issue Mar 30, 2023 · 0 comments
Open

[active-active] no icmp probes with default route present #194

lolyu opened this issue Mar 30, 2023 · 0 comments
Assignees

Comments

@lolyu
Copy link
Contributor

lolyu commented Mar 30, 2023

issue description

It is observed that with the default route present, linkmgrd doesn't send icmp probes.

/var/log/syslog.34.gz:Mar 23 19:03:33.659448 WARNING write_standby: Applying state to interfaces {'Ethernet108': 'active', 'Ethernet112': 'active', 'Ethernet116': 'active', 'Ethernet12': 'active', 'Ethernet120': 'active', 'Ethernet16': 'active', 'Ethernet4': 'active', 'Ethernet44': 'active', 'Ethernet48': 'active', 'Ethernet52': 'active', 'Ethernet56': 'active', 'Ethernet68': 'active', 'Ethernet72': 'active', 'Ethernet76': 'active', 'Ethernet8': 'active', 'Ethernet80': 'active'}
/var/log/syslog.34.gz:Mar 23 19:03:34.074570 INFO lldp#lldpd[22]: MSAP has changed for port Ethernet76, sending a shutdown LLDPDU
/var/log/syslog.34.gz:Mar 23 19:03:34.074917 INFO lldp#supervisord: lldpd 2023-03-23T19:03:34 [INFO/lldp] MSAP has changed for port Ethernet76, sending a shutdown LLDPDU
/var/log/syslog.34.gz:Mar 23 19:03:34.270684 NOTICE swss#orchagent: :- nbrHandler: Processing neighbors for mux Ethernet76, enable 1, state 1
/var/log/syslog.34.gz:Mar 23 19:03:34.270684 NOTICE swss#orchagent: :- setState: [Ethernet76] Set MUX state from standby to active
/var/log/syslog.34.gz:Mar 23 19:03:34.273156 NOTICE swss#orchagent: :- updateNeighbor: Processing update on neighbor 10.50.147.28 for mux Ethernet76, add 1, state 1
/var/log/syslog.34.gz:Mar 23 19:03:34.287891 NOTICE swss#orchagent: :- updateNeighbor: Processing update on neighbor 2603:10b0:d11:8618::a32:931c for mux Ethernet76, add 1, state 1
/var/log/syslog.34.gz:Mar 23 19:03:34.306135 NOTICE swss#orchagent: :- addOperation: Mux State set to active for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:34.675441 INFO caclmgrd[1315]: dhcp packet mark update : '('Ethernet76', 'SET', (('mark', '0x67014'),))'
/var/log/syslog.34.gz:Mar 23 19:03:34.687475 INFO caclmgrd[6577]: DROP  all opt -- in * out *  0.0.0.0/0  -> 0.0.0.0/0   PHYSDEV match --physdev-in Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:34.699444 INFO caclmgrd[1315]: Update DHCP chain: iptables --delete DHCP -m physdev --physdev-in Ethernet76 -j DROP
/var/log/syslog.34.gz:Mar 23 19:03:35.073452 NOTICE swss#orchagent: :- addOperation: Mux setting State DB entry (hw state active, mux state active) for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:35.081328 INFO caclmgrd[1315]: mux cable update : '('Ethernet76', 'SET', (('state', 'active'),))'
/var/log/syslog.34.gz:Mar 23 19:03:40.752552 WARNING mux#linkmgrd: MuxManager.cpp:222 updatePortCableType: Ethernet76: Port cable type: active-active
/var/log/syslog.34.gz:Mar 23 19:03:40.761061 WARNING mux#linkmgrd: MuxManager.cpp:162 addOrUpdateMuxPort: Ethernet76: server IP: 10.50.147.28
/var/log/syslog.34.gz:Mar 23 19:03:40.765750 WARNING mux#linkmgrd: MuxManager.cpp:185 addOrUpdateMuxPortSoCAddress: Ethernet76: SoC IP: 10.50.147.29
/var/log/syslog.34.gz:Mar 23 19:03:40.777676 WARNING mux#linkmgrd: MuxPort.cpp:434 handleTsaEnable: port: Ethernet76, configuring mux mode due to tsa_enable notification from CONFIG DB. 
/var/log/syslog.34.gz:Mar 23 19:03:40.779564 WARNING mux#linkmgrd: MuxManager.cpp:207 updateMuxPortConfig: Ethernet76: Mux port config: auto
/var/log/syslog.34.gz:Mar 23 19:03:40.787474 WARNING mux#linkmgrd: MuxManager.cpp:262 addOrUpdateMuxPortLinkState: Ethernet76: link state: up
/var/log/syslog.34.gz:Mar 23 19:03:40.788014 WARNING mux#linkmgrd: MuxManager.cpp:288 addOrUpdateMuxPortMuxState: Ethernet76: state db mux state: active
/var/log/syslog.34.gz:Mar 23 19:03:40.788619 WARNING mux#linkmgrd: MuxPort.cpp:365 handleDefaultRouteState: port: Ethernet76, state db default route state: na
/var/log/syslog.34.gz:Mar 23 19:03:40.789398 WARNING mux#linkmgrd: MuxPort.cpp:365 handleDefaultRouteState: port: Ethernet76, state db default route state: ok
/var/log/syslog.34.gz:Mar 23 19:03:40.790520 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:176 handleMuxStateNotification: Ethernet76: state db mux state: Active
/var/log/syslog.34.gz:Mar 23 19:03:40.791360 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:77 activateStateMachine: Ethernet76: MUX port link prober initialized with server IP: 10.50.147.29, server MAC: 04:27:28:7a:00:4c
/var/log/syslog.34.gz:Mar 23 19:03:40.791485 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:86 activateStateMachine: Ethernet76: (P: Wait, M: Active, L: Up) -> (P: Active, M: Active, L: Up)
/var/log/syslog.34.gz:Mar 23 19:03:40.807267 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:917 setLabel: Ethernet76: Linkmgrd state is: Active Unhealthy
/var/log/syslog.34.gz:Mar 23 19:03:40.807868 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:917 setLabel: Ethernet76: Linkmgrd state is: Active Healthy
/var/log/syslog.34.gz:Mar 23 19:03:40.807923 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:336 handlePeerMuxStateNotification: Ethernet76: server side peer forwarding state : Active
/var/log/syslog.34.gz:Mar 23 19:03:40.872978 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1010 switchMuxState: Ethernet76: Switching MUX state to 'Standby'
/var/log/syslog.34.gz:Mar 23 19:03:40.873073 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1275 handleDefaultRouteStateNotification: Ethernet76: (P: Active, M: Active, L: Up) -> (P: Active, M: Standby, L: Up)
/var/log/syslog.34.gz:Mar 23 19:03:40.873165 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:917 setLabel: Ethernet76: Linkmgrd state is: Standby Unhealthy
/var/log/syslog.34.gz:Mar 23 19:03:40.873258 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:277 handleProbeMuxStateNotification: Ethernet76: Received unsolicited MUX state probe notification!
/var/log/syslog.34.gz:Mar 23 19:03:40.873803 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:581 handlePeerStateChange: Ethernet76: Received peer link prober event, new state: PeerActive
/var/log/syslog.34.gz:Mar 23 19:03:40.873953 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1010 switchMuxState: Ethernet76: Switching MUX state to 'Active'
/var/log/syslog.34.gz:Mar 23 19:03:40.873953 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:520 handleStateChange: Ethernet76: Received mux state event, new state: Active
/var/log/syslog.34.gz:Mar 23 19:03:41.107892 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:482 handleStateChange: Ethernet76: Received link prober event, new state: Unknown
/var/log/syslog.34.gz:Mar 23 19:03:41.108072 WARNING mux#linkmgrd: DbInterface.cpp:259 postLinkProberMetricsEvent: Ethernet76: posting link prober event link_prober_unknown_start
/var/log/syslog.34.gz:Mar 23 19:03:41.108280 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1010 switchMuxState: Ethernet76: Switching MUX state to 'Standby'
/var/log/syslog.34.gz:Mar 23 19:03:41.108458 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:502 handleStateChange: Ethernet76: (P: Active, M: Active, L: Up) -> (P: Unknown, M: Standby, L: Up)
/var/log/syslog.34.gz:Mar 23 19:03:41.110323 WARNING mux#linkmgrd: DbInterface.cpp:511 handlePostLinkProberMetrics: Ethernet76: posting link prober event link_prober_unknown_start
/var/log/syslog.34.gz:Mar 23 19:03:41.158618 NOTICE swss#orchagent: :- setState: [Ethernet76] Set MUX state from active to active
/var/log/syslog.34.gz:Mar 23 19:03:41.158957 NOTICE swss#orchagent: :- setState: [Ethernet76] Maintaining current MUX state
/var/log/syslog.34.gz:Mar 23 19:03:41.159113 NOTICE swss#orchagent: :- addOperation: Mux State set to active for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:41.251892 NOTICE swss#orchagent: :- setState: [Ethernet76] Set MUX state from active to standby
/var/log/syslog.34.gz:Mar 23 19:03:41.251940 NOTICE swss#orchagent: :- nbrHandler: Processing neighbors for mux Ethernet76, enable 0, state 2
/var/log/syslog.34.gz:Mar 23 19:03:41.282362 NOTICE swss#orchagent: :- addOperation: Mux State set to standby for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:41.407679 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1207 handleMuxWaitTimeout: Ethernet76: orchagent timed out responding to linkmgrd, current state: (P: Unknown, M: Standby, L: Up)
/var/log/syslog.34.gz:Mar 23 19:03:42.355681 NOTICE swss#orchagent: :- addOperation: Mux setting State DB entry (hw state standby, mux state standby) for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:42.356618 WARNING mux#linkmgrd: MuxManager.cpp:288 addOrUpdateMuxPortMuxState: Ethernet76: state db mux state: standby
/var/log/syslog.34.gz:Mar 23 19:03:42.356863 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:176 handleMuxStateNotification: Ethernet76: state db mux state: Standby
/var/log/syslog.34.gz:Mar 23 19:03:42.408051 INFO caclmgrd[1315]: mux cable update : '('Ethernet76', 'SET', (('state', 'standby'),))'
/var/log/syslog.34.gz:Mar 23 19:03:42.929087 NOTICE swss#orchagent: :- addOperation: Mux setting State DB entry (hw state standby, mux state standby) for port Ethernet76
/var/log/syslog.34.gz:Mar 23 19:03:42.929087 WARNING mux#linkmgrd: MuxManager.cpp:288 addOrUpdateMuxPortMuxState: Ethernet76: state db mux state: standby
/var/log/syslog.34.gz:Mar 23 19:03:42.929540 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:176 handleMuxStateNotification: Ethernet76: state db mux state: Standby
  • after a config mux active, link prober is reported active:
/var/log/syslog.32.gz:Mar 24 07:14:01.453764 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:1010 switchMuxState: Ethernet76: Switching MUX state to 'Active'
/var/log/syslog.32.gz:Mar 24 07:14:01.453764 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:253 handleMuxConfigNotification: Ethernet76: (P: Unknown, M: Standby, L: Up) -> (P: Unknown, M: Active, L: Up)
/var/log/syslog.32.gz:Mar 24 07:14:01.456502 NOTICE swss#orchagent: :- setState: [Ethernet76] Set MUX state from standby to active
/var/log/syslog.32.gz:Mar 24 07:14:01.457226 NOTICE swss#orchagent: :- nbrHandler: Processing neighbors for mux Ethernet76, enable 1, state 1
/var/log/syslog.32.gz:Mar 24 07:14:01.462369 NOTICE swss#orchagent: :- updateNeighbor: Processing update on neighbor 10.50.147.28 for mux Ethernet76, add 1, state 1
/var/log/syslog.32.gz:Mar 24 07:14:01.472146 WARNING mux#linkmgrd: DbInterface.cpp:259 postLinkProberMetricsEvent: Ethernet76: posting link prober event link_prober_active_start
/var/log/syslog.32.gz:Mar 24 07:14:01.472146 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:482 handleStateChange: Ethernet76: Received link prober event, new state: Active
/var/log/syslog.32.gz:Mar 24 07:14:01.472191 WARNING mux#linkmgrd: link_manager/LinkManagerStateMachineActiveActive.cpp:502 handleStateChange: Ethernet76: (P: Unknown, M: Active, L: Up) -> (P: Active, M: Active, L: Up)
/var/log/syslog.32.gz:Mar 24 07:14:01.472348 WARNING mux#linkmgrd: DbInterface.cpp:511 handlePostLinkProberMetrics: Ethernet76: posting link prober event link_prober_active_start
/var/log/syslog.32.gz:Mar 24 07:14:01.481870 NOTICE swss#orchagent: :- updateNeighbor: Processing update on neighbor 2603:10b0:d11:8618::a32:931c for mux Ethernet76, add 1, state 1
/var/log/syslog.32.gz:Mar 24 07:14:01.496055 NOTICE swss#orchagent: :- addOperation: Mux State set to active for port Ethernet76
/var/log/syslog.32.gz:Mar 24 07:14:01.505583 NOTICE swss#orchagent: :- addOperation: Mux setting State DB entry (hw state active, mux state active) for port Ethernet76
/var/log/syslog.32.gz:Mar 24 07:14:01.507186 INFO caclmgrd[1315]: mux cable update : '('Ethernet76', 'SET', (('state', 'active'),))'

analysis

  • there are two issues:
  1. linkmgrd does three toggles after initialization, the second toggle to active is unnecessary, and it is fixed by PR: [active-active] Add mux server state correction mechanism #191
  2. the icmp probe sending is stopped after the first handleDefaultRouteStateNotification with default route na, but the second default route event ok doesn't start the probe sending as expected. After the config mux active, heartbeats replies are received so it could be guaranteed that the icmp probe sending flag is enabled(as handleMuxConfigNotification calls shutdownOrRestartLinkProberOnDefaultRoute to restart/stop the icmp probes).
@lolyu lolyu self-assigned this Apr 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant