Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: Fix read after free in labeled_unicast on shutdown #15583

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

donaldsharp
Copy link
Member

Fix:

==2937693== Invalid read of size 2
==2937693== at 0x2EFD44: bgp_delete_listnode (bgp_table.c:170)
==2937693== by 0x2EFB22: bgp_dest_unlock_node (bgp_table.c:81)
==2937693== by 0x26BAC1: check_bgp_lu_cb_unlock (bgp_labelpool.c:208)
==2937693== by 0x26BC58: bgp_lp_finish (bgp_labelpool.c:243)
==2937693== by 0x1FE006: bgp_exit (bgp_main.c:256)
==2937693== by 0x1FDD99: sigint (bgp_main.c:163)
==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117)
==2937693== by 0x49D5912: event_fetch (event.c:1782)
==2937693== by 0x4955BBF: frr_run (libfrr.c:1216)
==2937693== by 0x1FEA22: main (bgp_main.c:543)
==2937693== Address 0x91af6cc is 60 bytes inside a block of size 96 free'd
==2937693== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2937693== by 0x4968C57: qfree (memory.c:130)
==2937693== by 0x2EFC55: bgp_node_destroy (bgp_table.c:116)
==2937693== by 0x49CB7DB: route_node_free (table.c:76)
==2937693== by 0x49CB8A2: route_table_free (table.c:111)
==2937693== by 0x49CB6E5: route_table_finish (table.c:46)
==2937693== by 0x2EFA15: bgp_table_unlock (bgp_table.c:35)
==2937693== by 0x2EFA6F: bgp_table_finish (bgp_table.c:44)
==2937693== by 0x3625FA: bgp_free (bgpd.c:4072)
==2937693== by 0x353858: bgp_unlock (bgpd.h:2525)
==2937693== by 0x362411: bgp_delete (bgpd.c:4034)
==2937693== by 0x1FDF13: bgp_exit (bgp_main.c:205)
==2937693== by 0x1FDD99: sigint (bgp_main.c:163)
==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117)
==2937693== by 0x49D5912: event_fetch (event.c:1782)
==2937693== by 0x4955BBF: frr_run (libfrr.c:1216)
==2937693== by 0x1FEA22: main (bgp_main.c:543)
==2937693== Block was alloc'd at
==2937693== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2937693== by 0x4968B05: qcalloc (memory.c:105)
==2937693== by 0x2A5667: bgp_node_get (bgp_table.h:239)
==2937693== by 0x2A9C36: bgp_afi_node_get (bgp_route.c:154)
==2937693== by 0x2B4BA3: bgp_update (bgp_route.c:4225)
==2937693== by 0x3D5255: bgp_nlri_parse_label (bgp_label.c:452)
==2937693== by 0x290CBF: bgp_nlri_parse (bgp_packet.c:345)
==2937693== by 0x29616F: bgp_update_receive (bgp_packet.c:2453)
==2937693== by 0x29ACEE: bgp_process_packet (bgp_packet.c:4022)
==2937693== by 0x49D5F18: event_call (event.c:2011)
==2937693== by 0x4955BA6: frr_run (libfrr.c:1217)
==2937693== by 0x1FEA22: main (bgp_main.c:543)

This is occurring because the shutdown code is hard deleting the route node but labeled unicast has a pointer to it. Free up the labeled unicast data before hard free'ing the route nodes.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ton31337
Copy link
Member

@Mergifyio backport dev/10.0

Copy link

mergify bot commented Mar 20, 2024

backport dev/10.0

🟠 Waiting for conditions to match

  • merged [📌 backport requirement]

@ton31337
Copy link
Member

Seems new ASAN issue:

    ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55ac304015a1 bp 0x7ffeee56b310 sp 0x7ffeee56b270 T0)
    ==13463==The signal is caused by a READ memory access.
    ==13463==Hint: address points to the zero page.
        #0 0x55ac304015a0 in bgp_lp_release bgpd/bgp_labelpool.c:472
        #1 0x55ac304028e5 in bgp_label_per_nexthop_free bgpd/bgp_labelpool.c:1738
        #2 0x55ac30414e09 in bgp_mplsvpn_path_nh_label_unlink bgpd/bgp_mplsvpn.c:1298
        #3 0x55ac3041d980 in vpn_leak_from_vrf_withdraw_all bgpd/bgp_mplsvpn.c:2009
        #4 0x55ac3058aa55 in vpn_leak_prechange bgpd/bgp_mplsvpn.h:264
        #5 0x55ac3058aa55 in bgp_delete bgpd/bgpd.c:3862
        #6 0x55ac3033ef9d in bgp_exit bgpd/bgp_main.c:202
        #7 0x55ac3033ef9d in sigint bgpd/bgp_main.c:163
        #8 0x7f317a486289 in frr_sigevent_process lib/sigevent.c:117
        #9 0x7f317a4b1e74 in event_fetch lib/event.c:1782
        #10 0x7f317a3ea15f in frr_run lib/libfrr.c:1216
        #11 0x55ac30340b81 in main bgpd/bgp_main.c:543
        #12 0x7f3179412c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
        #13 0x55ac3033e9d9 in _start (/usr/lib/frr/bgpd+0x2d99d9)

@riw777
Copy link
Member

riw777 commented Mar 26, 2024

looks like a compile error or some such, running just the one test again to see

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@riw777
Copy link
Member

riw777 commented Apr 16, 2024

looks like a compile error or some such ... needs to be looked at :-(

@riw777
Copy link
Member

riw777 commented May 28, 2024

error seems to be:

<body>
<!--StartFragment-->
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
-- | --
16-Apr-2024 11:55:17 | Unable to publish artifact [ErrorLog]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 21.96 ms
16-Apr-2024 11:55:17 | Publishing an artifact: AddressSanitizerError
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [AddressSanitizerError]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 8.044 ms
16-Apr-2024 11:55:17 | Publishing an artifact: Topotest Details
16-Apr-2024 11:55:17 | Unable to publish artifact [Topotest Details]: the source directory /tmp/topotests does not exist.
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 503.3 μs
16-Apr-2024 11:55:17 | Publishing an artifact: TopotestLogs
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [TopotestLogs]:

<!--EndFragment-->
</body>
</html>```

looks like this is after all the tests pass (?)

@riw777
Copy link
Member

riw777 commented Jul 16, 2024

Failure is here --

24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘show_ip_ospf_interface_sub’:
-- | --
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:3743:3: note: #pragma message: Use all fields following ospfEnabled from interfaceIp hierarchy
24-Jun-2024 21:25:02 | CPP_NOTICE(
24-Jun-2024 21:25:02 | ^~~~~~~~~~
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘ospf_show_gr_helper_details’:
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:10525:3: note: #pragma message: Remove deprecated json key: restartSupoort
24-Jun-2024 21:25:02 | CPP_NOTICE("Remove deprecated json key: restartSupoort")

I suspect this needs to be rebased ...

@riw777
Copy link
Member

riw777 commented Jul 16, 2024

@Mergifyio rebase

Copy link

mergify bot commented Jul 16, 2024

rebase

✅ Branch has been successfully rebased

@riw777 riw777 force-pushed the labeled_unicast_invalid branch from 618a34b to 22ba8b4 Compare July 16, 2024 15:03
@frrbot frrbot bot added the bgp label Jul 16, 2024
@riw777
Copy link
Member

riw777 commented Jul 24, 2024

sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution

and then it's downhill from there ... trying to rerun just this failing test to see if we can get past this

@riw777
Copy link
Member

riw777 commented Aug 27, 2024

ci:rerun

@riw777
Copy link
Member

riw777 commented Sep 24, 2024

This needs to be rebased and we need to get these CI failures cleared ...

@donaldsharp donaldsharp force-pushed the labeled_unicast_invalid branch from 22ba8b4 to 58499e3 Compare November 4, 2024 19:22
@riw777
Copy link
Member

riw777 commented Nov 26, 2024

Seems like we have a crash now ... :-(

E Failed: New core[s] found: /tmp/topotests/ospf_sr_topo1.test_ospf_sr_topo1/rt6/ospfd_core-sig_6-pid_28560.dmp

@riw777
Copy link
Member

riw777 commented Dec 17, 2024

@Mergifyio rebase

Fix:

==2937693== Invalid read of size 2
==2937693==    at 0x2EFD44: bgp_delete_listnode (bgp_table.c:170)
==2937693==    by 0x2EFB22: bgp_dest_unlock_node (bgp_table.c:81)
==2937693==    by 0x26BAC1: check_bgp_lu_cb_unlock (bgp_labelpool.c:208)
==2937693==    by 0x26BC58: bgp_lp_finish (bgp_labelpool.c:243)
==2937693==    by 0x1FE006: bgp_exit (bgp_main.c:256)
==2937693==    by 0x1FDD99: sigint (bgp_main.c:163)
==2937693==    by 0x49BA8BB: frr_sigevent_process (sigevent.c:117)
==2937693==    by 0x49D5912: event_fetch (event.c:1782)
==2937693==    by 0x4955BBF: frr_run (libfrr.c:1216)
==2937693==    by 0x1FEA22: main (bgp_main.c:543)
==2937693==  Address 0x91af6cc is 60 bytes inside a block of size 96 free'd
==2937693==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2937693==    by 0x4968C57: qfree (memory.c:130)
==2937693==    by 0x2EFC55: bgp_node_destroy (bgp_table.c:116)
==2937693==    by 0x49CB7DB: route_node_free (table.c:76)
==2937693==    by 0x49CB8A2: route_table_free (table.c:111)
==2937693==    by 0x49CB6E5: route_table_finish (table.c:46)
==2937693==    by 0x2EFA15: bgp_table_unlock (bgp_table.c:35)
==2937693==    by 0x2EFA6F: bgp_table_finish (bgp_table.c:44)
==2937693==    by 0x3625FA: bgp_free (bgpd.c:4072)
==2937693==    by 0x353858: bgp_unlock (bgpd.h:2525)
==2937693==    by 0x362411: bgp_delete (bgpd.c:4034)
==2937693==    by 0x1FDF13: bgp_exit (bgp_main.c:205)
==2937693==    by 0x1FDD99: sigint (bgp_main.c:163)
==2937693==    by 0x49BA8BB: frr_sigevent_process (sigevent.c:117)
==2937693==    by 0x49D5912: event_fetch (event.c:1782)
==2937693==    by 0x4955BBF: frr_run (libfrr.c:1216)
==2937693==    by 0x1FEA22: main (bgp_main.c:543)
==2937693==  Block was alloc'd at
==2937693==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2937693==    by 0x4968B05: qcalloc (memory.c:105)
==2937693==    by 0x2A5667: bgp_node_get (bgp_table.h:239)
==2937693==    by 0x2A9C36: bgp_afi_node_get (bgp_route.c:154)
==2937693==    by 0x2B4BA3: bgp_update (bgp_route.c:4225)
==2937693==    by 0x3D5255: bgp_nlri_parse_label (bgp_label.c:452)
==2937693==    by 0x290CBF: bgp_nlri_parse (bgp_packet.c:345)
==2937693==    by 0x29616F: bgp_update_receive (bgp_packet.c:2453)
==2937693==    by 0x29ACEE: bgp_process_packet (bgp_packet.c:4022)
==2937693==    by 0x49D5F18: event_call (event.c:2011)
==2937693==    by 0x4955BA6: frr_run (libfrr.c:1217)
==2937693==    by 0x1FEA22: main (bgp_main.c:543)

This is occurring because the shutdown code is hard deleting the
route node but labeled unicast has a pointer to it.  Free up
the labeled unicast data before hard free'ing the route nodes.

Signed-off-by: Donald Sharp <[email protected]>
Copy link

mergify bot commented Dec 17, 2024

rebase

✅ Branch has been successfully rebased

@riw777 riw777 force-pushed the labeled_unicast_invalid branch from 58499e3 to 990e48e Compare December 17, 2024 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants