Zebra process crashes intermittently during 'config reload' on the DUT line cards #15803
Comments
@mlok-nokia please check whether a similar issue or fix exists in the FRR GitHub repository.
@mlok-nokia will help collect a stack trace using the debug image, and we can follow up with the FRR team afterwards. Please open an issue with the FRR team once we have all the information available.
@vmittal-msft Jul 11 19:39:20.466949 ixre-egl-board1 CRIT bgp0#ZEBRA[44]: Received signal 11 at 1689104360 (si_addr 0x4, PC 0x7f1489871646); aborting...
@vmittal-msft @arlakshm, we have the core and docker-fpm-frr-dbg.gz files, but I am unable to attach them here because they are too large. Can we put them in our team's shared link?
We also saw the zebra crash with this backtrace.
No. We did not find any related fix.
The issue has been raised on the FRRouting submodule: FRRouting/frr#14092
Why I did it
Fixes #15803
In SONiC chassis, routes have recursive nexthop resolution when the routes are learnt from remote linecard. In some cases after recursive nexthop resolution the number of nexthop for a route could reach 256. Zebra ran out of space when filling up 256 nexthops which causes zebra crash.
Work item tracking
Microsoft ADO (24997365):
How I did it
Create a patch to port FRRouting/frr#14096 which has change to ignore duplicate nexthop when filling up fpm message
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <[email protected]>
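The idea behind the fix described above (skip duplicate nexthops while filling a fixed-size message) can be illustrated with a minimal Python sketch. This is not the FRR patch itself, which is C code inside zebra; the function name `fill_nexthops`, the `(gateway, interface)` tuple representation, and the capacity value are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical nexthop representation: (gateway address, outgoing interface).
Nexthop = Tuple[str, str]

# Assumed buffer capacity, for illustration only; the real limit lives in FRR.
ASSUMED_CAPACITY = 256


def fill_nexthops(resolved: Iterable[Nexthop],
                  capacity: int = ASSUMED_CAPACITY) -> List[Nexthop]:
    """Copy nexthops into a bounded list, skipping duplicates.

    Recursive resolution can yield the same nexthop several times.
    Dropping repeats before they consume a slot keeps the list within
    capacity in cases where the unique set is small.
    """
    seen = set()
    out: List[Nexthop] = []
    for nh in resolved:
        if nh in seen:
            continue  # duplicate: do not spend a slot on it
        if len(out) >= capacity:
            break  # never write past the end of the buffer
        seen.add(nh)
        out.append(nh)
    return out


if __name__ == "__main__":
    # 300 resolved entries, but only 3 distinct nexthops.
    resolved = [("10.0.0.%d" % (i % 3 + 1), "Ethernet0") for i in range(300)]
    print(len(fill_nexthops(resolved)))
```

The explicit capacity check is kept alongside deduplication so that the sketch stays within bounds even when the number of distinct nexthops exceeds the assumed limit.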
Description
On a T2 chassis line card, running 'sudo config reload -y' intermittently crashes the 'zebra' process and generates a core file. The crash occurs roughly once in 30 attempts.
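Because the crash is intermittent, it can help to estimate how many reload attempts a verification run needs before a clean result is meaningful. The sketch below assumes each 'config reload' is an independent trial with a crash probability of about 1 in 30 (the rough rate quoted above); both the independence assumption and the helper name `attempts_needed` are illustrative, not part of the report.

```python
import math


def attempts_needed(crash_probability: float, confidence: float) -> int:
    """Smallest n such that at least one crash is seen with the given
    confidence, assuming independent attempts:
        1 - (1 - p)**n >= confidence
    """
    if not 0.0 < crash_probability < 1.0:
        raise ValueError("crash_probability must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    n = math.log(1.0 - confidence) / math.log(1.0 - crash_probability)
    return math.ceil(n)


if __name__ == "__main__":
    # Roughly 1 crash in 30 attempts, 95% confidence of seeing it at least once.
    print(attempts_needed(1 / 30, 0.95))
```

Under these assumptions, about 89 consecutive crash-free reloads would be needed before concluding, at 95% confidence, that a build no longer shows the original failure rate.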
sonic-buildimage-msft commit:
Azure/sonic-buildimage-msft@6f19e12
The following logs are seen in the bgp docker when the crash happens.
Crash logs:
The generated zebra core and the frr logs are attached for reference.
frr.zip
zebra.1689104360.44.0.core.gz
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):