Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd,lib: fix logging from rpki_create_socket() #15265

Merged
merged 2 commits into from
Feb 3, 2024

Conversation

louis-6wind
Copy link
Contributor

Fix the following crash when logging from rpki_create_socket():

#0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f6e21723798 in core_handler (signo=6, siginfo=0x7f6e1e502ef0, context=0x7f6e1e502dc0) at lib/sigevent.c:248
#2
#3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4 0x00007f6e2144e537 in __GI_abort () at abort.c:79
#5 0x00007f6e2176348e in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:670
#6 0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
#7 0x00007f6e21762da8 in vzlog_notls (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:425
#8 0x00007f6e217632fb in vzlogx (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:627
#9 0x00007f6e217621f5 in zlog (prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed") at lib/zlog.h:73
#10 0x00007f6e21763596 in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:687
#11 0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
#12 0x00007f6e21762da8 in vzlog_notls (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:425
#13 0x00007f6e217632fb in vzlogx (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:627
#14 0x00007f6e21a3f774 in zlog_ref (xref=0x7f6e21a50040 <_xref.68>, fmt=0x7f6e21a4999f "getaddrinfo: debug") at ./lib/zlog.h:84
#15 0x00007f6e21a451b2 in rpki_create_socket (_cache=0x55729149cc30) at bgpd/bgp_rpki.c:1337
#16 0x00007f6e2120e7b7 in tr_tcp_open (tr_socket=0x5572914d1520) at rtrlib/rtrlib/transport/tcp/tcp_transport.c:111
#17 0x00007f6e2120e212 in tr_open (socket=0x5572914b5e00) at rtrlib/rtrlib/transport/transport.c:16
#18 0x00007f6e2120faa2 in rtr_fsm_start (rtr_socket=0x557290e17180) at rtrlib/rtrlib/rtr/rtr.c:130
#19 0x00007f6e218b7ea7 in start_thread (arg=) at pthread_create.c:477
#20 0x00007f6e21527a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

rpki_create_socket() is a hook function called from the rtrlib library. The issue arises because rtrlib initiates its own separate pthread in which it runs the hook, which does not establish an FRR RCU context. Consequently, this leads to failures in the logging mechanism that relies on RCU.

Initialize a new FRR pthread context from the rtrlib pthread with a valid RCU context to allow logging from the rpki_create_socket() and dependent functions.

Link: #15260
Fixes: a951752 ("bgpd: create cache server socket in vrf")

Copy link
Member

@donaldsharp donaldsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code change creates inherent maintainability issues. Let's try to abstract the startup of pthreads such that something else that starts pthreads can get the same behavior by calling one function, once the pthread hits our code.

bgpd/bgp_rpki.c Outdated
@@ -1307,6 +1321,15 @@ static int rpki_create_socket(void *_cache)
if (!CHECK_FLAG(vrf->status, VRF_ACTIVE) || vrf->vrf_id == VRF_UNKNOWN)
return -1;

/* Create FRR pthread context to allow logging */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not good from my perspective

a) Every new non FRR controlled/started pthread has to know a list of things to do with this change
b) As new items are added in frr_pthread.c, someone will have to remember to go into bgp_rpki.c and add new shit.

What I would do is create abstract the bits and bobs in frr_pthread.c into a generic startup function that is static and then have a extern function that is named something like frr_pthread_non_controlled_startup() that bgp_rpki.c will call. This way bgp_rpki.c doesn't care about the internals of frr pthreads and when a new pthread is introduced by another library in the system, it will be an easy one function call to make.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback
Changes done

External libraries can re-enter the FRR code through a hook function. A
crash occurs when logging from this hook function if the library has
initiated a new pthread, as the FRR RCU context is not initialized for
this thread.

Add frr_pthread_non_controlled_startup() function to initialize a valid
RCU context within a FRR pthread context, originating from an external
pthread.

Signed-off-by: Louis Scalbert <[email protected]>
Fix the following crash when logging from rpki_create_socket():

> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f6e21723798 in core_handler (signo=6, siginfo=0x7f6e1e502ef0, context=0x7f6e1e502dc0) at lib/sigevent.c:248
> #2  <signal handler called>
> #3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> FRRouting#4  0x00007f6e2144e537 in __GI_abort () at abort.c:79
> FRRouting#5  0x00007f6e2176348e in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:670
> FRRouting#6  0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
> FRRouting#7  0x00007f6e21762da8 in vzlog_notls (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:425
> FRRouting#8  0x00007f6e217632fb in vzlogx (xref=0x0, prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed", ap=0x7f6e1e504248) at lib/zlog.c:627
> FRRouting#9  0x00007f6e217621f5 in zlog (prio=2, fmt=0x7f6e217afe50 "%s:%d: %s(): assertion (%s) failed") at lib/zlog.h:73
> FRRouting#10 0x00007f6e21763596 in _zlog_assert_failed (xref=0x7f6e2180c920 <_xref.16>, extra=0x0) at lib/zlog.c:687
> FRRouting#11 0x00007f6e216b1eda in rcu_read_lock () at lib/frrcu.c:294
> FRRouting#12 0x00007f6e21762da8 in vzlog_notls (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:425
> FRRouting#13 0x00007f6e217632fb in vzlogx (xref=0x7f6e21a50040 <_xref.68>, prio=4, fmt=0x7f6e21a4999f "getaddrinfo: debug", ap=0x7f6e1e504878) at lib/zlog.c:627
> FRRouting#14 0x00007f6e21a3f774 in zlog_ref (xref=0x7f6e21a50040 <_xref.68>, fmt=0x7f6e21a4999f "getaddrinfo: debug") at ./lib/zlog.h:84
> FRRouting#15 0x00007f6e21a451b2 in rpki_create_socket (_cache=0x55729149cc30) at bgpd/bgp_rpki.c:1337
> FRRouting#16 0x00007f6e2120e7b7 in tr_tcp_open (tr_socket=0x5572914d1520) at rtrlib/rtrlib/transport/tcp/tcp_transport.c:111
> FRRouting#17 0x00007f6e2120e212 in tr_open (socket=0x5572914b5e00) at rtrlib/rtrlib/transport/transport.c:16
> FRRouting#18 0x00007f6e2120faa2 in rtr_fsm_start (rtr_socket=0x557290e17180) at rtrlib/rtrlib/rtr/rtr.c:130
> FRRouting#19 0x00007f6e218b7ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
> FRRouting#20 0x00007f6e21527a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

rpki_create_socket() is a hook function called from the rtrlib library.
The issue arises because rtrlib initiates its own separate pthread in which
it runs the hook, which does not establish an FRR RCU context. Consequently,
this leads to failures in the logging mechanism that relies on RCU.

Initialize a new FRR pthread context from the rtrlib pthread with a
valid RCU context to allow logging from the rpki_create_socket() and
dependent functions.

Link: FRRouting#15260
Fixes: a951752 ("bgpd: create cache server socket in vrf")
Signed-off-by: Louis Scalbert <[email protected]>
Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the documentation on rcu.rst?

@donaldsharp donaldsharp merged commit 3a6c312 into FRRouting:master Feb 3, 2024
8 of 9 checks passed
@donaldsharp
Copy link
Member

@ton31337 I raised a new PR with some doc changes in it. I wanted this bug fix in now. We can work on the doc in PR #15287

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants