namespace vrf's and deletion have issues #13902

Closed
donaldsharp opened this issue Jul 1, 2023 · 3 comments
Labels
topotest_ci_bug triage Needs further investigation

Comments

@donaldsharp
Member

When a namespace that FRR is actively using is deleted, zebra can crash. The namespace deletion event removes the struct nlsock *nl for that namespace from the kernel_netlink_nlsock hash. If FRR has already scheduled a batch of dplane contexts to send to the kernel in that namespace, the later lookup of the struct nlsock * in the hash finds no entry, and zebra crashes when it uses the resulting NULL pointer (note nl=0x0 in frame #0 of the backtrace below).

$1 = 21
(gdb) bt
#0  0x0000560384c691ee in netlink_recv_msg (nl=0x0, msg=0x7f5956ca4780) at zebra/kernel_netlink.c:948
#1  0x0000560384c69e74 in nl_batch_read_resp (bth=0x7f5956ca4860) at zebra/kernel_netlink.c:1312
#2  0x0000560384c6a486 in nl_batch_send (bth=0x7f5956ca4860) at zebra/kernel_netlink.c:1496
#3  0x0000560384c6a94b in kernel_update_multi (ctx_list=0x7f5956ca4900) at zebra/kernel_netlink.c:1704
#4  0x0000560384ca37ff in kernel_dplane_process_func (prov=0x56038531ca50) at zebra/zebra_dplane.c:6333
#5  0x0000560384ca4139 in dplane_thread_loop (event=0x7f5956ca4ae0) at zebra/zebra_dplane.c:6768
#6  0x00007f5957d4c334 in event_call (thread=0x7f5956ca4ae0) at lib/event.c:1995
#7  0x00007f5957cbe6a3 in fpt_run (arg=0x5603854dbcd0) at lib/frr_pthread.c:296
#8  0x00007f5957cbe0b2 in frr_pthread_inner (arg=0x5603854dbcd0) at lib/frr_pthread.c:145
#9  0x00007f5957894b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#10 0x00007f5957926a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
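
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence described above. It is not zebra code: `demo_nlsock`, `demo_table`, and every `demo_*` function are hypothetical stand-ins, and keying the table by a small integer is an assumption made only for illustration. The point is that once the entry has been deleted, a later lookup yields NULL, so any consumer has to check the result before using it.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_KEYS 8

/* Hypothetical stand-in for zebra's struct nlsock; not the real layout. */
struct demo_nlsock {
	int sock;
	bool in_use;
};

/* Hypothetical stand-in for the kernel_netlink_nlsock hash. */
struct demo_nlsock demo_table[DEMO_MAX_KEYS];

/* Register a socket under a key (done when the namespace comes up). */
void demo_nlsock_insert(unsigned int key, int sock)
{
	if (key >= DEMO_MAX_KEYS)
		return;
	demo_table[key].sock = sock;
	demo_table[key].in_use = true;
}

/* Drop the entry (done when the namespace deletion event arrives). */
void demo_nlsock_delete(unsigned int key)
{
	if (key < DEMO_MAX_KEYS)
		demo_table[key].in_use = false;
}

/* Returns NULL when no live entry exists for the key. */
struct demo_nlsock *demo_nlsock_lookup(unsigned int key)
{
	if (key >= DEMO_MAX_KEYS || !demo_table[key].in_use)
		return NULL;
	return &demo_table[key];
}

/*
 * Guarded consumer: reports an error (-1) instead of dereferencing
 * a NULL pointer when the entry vanished after work was queued.
 */
int demo_recv_msg(unsigned int key)
{
	struct demo_nlsock *nl = demo_nlsock_lookup(key);

	if (nl == NULL)
		return -1;
	return nl->sock;
}
```

A NULL check like this would only turn the crash into an error return; it does not address the contexts that are still queued for the deleted namespace.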

On namespace shutdown, when namespaces are used as VRFs, the queued dplane contexts should be sifted through, and those that target the namespace being deleted should be removed from the data that is queued to be sent to the kernel.
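
As a rough sketch of that sifting step, the function below unlinks every queued context that targets a given namespace from a singly linked list. Everything here is an assumption for illustration: `demo_ctx`, its `ns_id` and `op` fields, and `demo_ctx_list_purge_ns` are hypothetical and do not correspond to the real zebra dplane context type or list API. In zebra itself the queue is shared with the dplane pthread, so any real version would also need the appropriate locking.

```c
#include <stddef.h>

/* Hypothetical queued dataplane context; not zebra's real context type. */
struct demo_ctx {
	unsigned int ns_id;    /* namespace this update is destined for */
	int op;                /* placeholder for the requested operation */
	struct demo_ctx *next;
};

/*
 * Unlink every context that targets ns_id from the list at *head.
 * Unlinked contexts are chained onto *dropped (in reverse order) so the
 * caller can free them or report them as failed. Returns how many were
 * removed. Contexts for other namespaces keep their relative order.
 */
size_t demo_ctx_list_purge_ns(struct demo_ctx **head, unsigned int ns_id,
			      struct demo_ctx **dropped)
{
	size_t removed = 0;
	struct demo_ctx **pp = head;

	*dropped = NULL;
	while (*pp != NULL) {
		struct demo_ctx *cur = *pp;

		if (cur->ns_id == ns_id) {
			*pp = cur->next;      /* unlink from the queue */
			cur->next = *dropped; /* push onto the dropped chain */
			*dropped = cur;
			removed++;
		} else {
			pp = &cur->next;
		}
	}
	return removed;
}
```

Handing the removed contexts back to the caller, rather than freeing them inside the function, leaves room to report each one as failed to whoever queued it.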

@donaldsharp donaldsharp added triage Needs further investigation topotest_ci_bug labels Jul 1, 2023
@donaldsharp
Member Author

I'm seeing this crash in our CI system in all of the namespace-based tests in the topotest system.


This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot

frrbot bot commented Dec 29, 2023

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Jan 5, 2024
@frrbot frrbot bot removed the autoclose label Jan 5, 2024