Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: fix table heap-after-free crash #16614

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

louis-6wind
Copy link
Contributor

Fix a heap-after-free that causes zebra to crash even without address-sanitizer. To reproduce:

echo "100 my_table" | tee -a /etc/iproute2/rt_tables
ip route add blackhole default table 100
ip route show table 100
ip l add red type vrf table 100
ip l del red
ip route del blackhole default table 100

==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
READ of size 1 at 0x606000154f54 thread T0
#0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
#1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
#2 0x7fa32474d783 in route_node_get lib/table.c:283
#3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
#4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
#5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
#6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
#7 0x7fa32476689c in event_call lib/event.c:1996
#8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#9 0x55b0e4e6c32a in main zebra/main.c:526
#10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
#11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)

0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
freed by thread T0 here:
#0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
#1 0x7fa324668d8f in qfree lib/memory.c:130
#2 0x7fa32474c421 in route_table_free lib/table.c:126
#3 0x7fa32474bf96 in route_table_finish lib/table.c:46
#4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
#5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
#6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
#7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
#8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
#9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
#10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
#11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
#12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
#13 0x7fa32476689c in event_call lib/event.c:1996
#14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#15 0x55b0e4e6c32a in main zebra/main.c:526
#16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
#0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fa324668c4d in qcalloc lib/memory.c:105
#2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
#3 0x7fa32474e73c in route_table_init lib/table.c:512
#4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
#5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
#6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
#7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
#8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
#9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
#10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
#11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
#12 0x7fa32476689c in event_call lib/event.c:1996
#13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#14 0x55b0e4e6c32a in main zebra/main.c:526
#15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")

@louis-6wind
Copy link
Contributor Author

ci:rerun

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 45bdb43 to 7c3d151 Compare August 22, 2024 09:41
@louis-6wind louis-6wind marked this pull request as ready for review August 22, 2024 09:42
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 2 times, most recently from ac9b780 to 311df79 Compare August 22, 2024 15:22
@github-actions github-actions bot added size/M and removed size/S labels Aug 22, 2024
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 311df79 to 345a1a3 Compare August 22, 2024 15:54
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what is being fixed here - can you explain what you think is going wrong, and why you think this is the right fix? It seems ... surprising to try to change the vrf ids of route_entries. The crash backtrace looks like there's an array cell or a hash/tree entry that should have been NULL'd.

We should handle the possibility that dataplane results refer to prefixes or to vrfs that are no longer present - but of course we have to be able to detect that condition.

Or, a table-id should only be in one vrf - is that the issue, that we're not handling the migration of the tableid among vrfs, or failing to force that id to share fate with a vrf?

@donaldsharp
Copy link
Member

I actually think it's an ok way to solve the problem. Perhaps though Louis should spend a few minutes expanding the comments in the code, as that I had to spend a bit of time understanding the change

Effectively this code moves the table(and all the pointers from various data structures) into and out of the default vrf or the vrf created, when the non-default vrf is created or deleted. This way the pointers for the vrf always are correct from a data structure point of view and we will not get any type of use after free that was happening.

It does look like to me, though, that there are address sanitizer issues that were found with this approach. At that point I stopped looking at it and was waiting for Louis to get back to it and fix the problems.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

ok - but there are things I don't think I get.
the config example creates a vrf with a table, then the vrf is deleted. our normal practice is that when a vrf is deleted, all the resources in that vrf are deleted. how do we detect the case where that "vrf RED table 100" isn't just being deleted, and decide not to free the RED routes, but move them to some other vrf?

the config example uses table 100, in the default vrf, I guess, then uses it for a vrf. is the correct thing to do to delete everything, and start with an empty table 100 in the new vrf? or is the correct thing to "move" ... things from one vrf to the new vrf? do we have other cases where we just "move stuff" among vrfs?

I actually think it's an ok way to solve the problem. Perhaps though Louis should spend a few minutes expanding the comments in the code, as that I had to spend a bit of time understanding the change

If we want to have a function that says "move this table from vrf A to vrf B", or "move the routes (and ... anything else?) from vrf A to vrf B", then that should be a function. just sticking a loop in that pokes at the cells in an re is ... awkward.

Effectively this code moves the table(and all the pointers from various data structures) into and out of the default vrf or the vrf created, when the non-default vrf is created or deleted. This way the pointers for the vrf always are correct from a data structure point of view and we will not get any type of use after free that was happening.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

Looking a little more, I think the problem is introduced early on: we have a route in vrf default, table 100; we are informed that vrf RED table 100 has been created - and that vrf/zvrf takes over the table, but we just ... leave the route in that table, which has sort of become the property of vrf RED. I think we should fix that - and also decide what to do about cleanup. maybe kernel routes are a special case, that need special handling, since the kernel behavior here is sort of ... distinctive?

@donaldsharp
Copy link
Member

fix it to what? I actually disagree that it's vrf default table 100.
it's just table 100.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

every route (re, rn) has a vrf - and in the case of this config example, that is the default vrf.
and when the config creates the new vrf, those routes ... just sort of float around, in a table that new represents RED, but pointing to default. and there's at least one nhg too, in this case, that's also left in "default".

fix it to what? I actually disagree that it's vrf default table 100. it's just table 100.

@donaldsharp
Copy link
Member

I spent some time talking w/ Mark about this on Friday and I also spent some time testing this fix out. I believe that some of the issues that Mark was seeing was a result of another problem that was unrelated( and fixed recently as well ). I'm pretty happy with this at the moment.

@louis-6wind
Copy link
Contributor Author

Thank you for the time you spent on this @donaldsharp and @mjstapp
I try to take some times to answer to you this week

@donaldsharp
Copy link
Member

During testing this change introduces these new memory leaks:

commit fb23d4e2b364525cfbea7768ac6fe22d69fd1b16
Author: Louis Scalbert <[email protected]>
Date:   Thu Aug 22 11:12:29 2024 +0200

    zebra: fix table heap-after-free crash
    
    Fix a heap-after-free that causes zebra to crash even without
    address-sanitizer. To reproduce:

@donaldsharp
Copy link
Member

so after some more thought I am pretty happy with this, we need to fix the memory leaks introduced as well as I think we need a topotest for this situation.

@donaldsharp
Copy link
Member

2024-09-04 09:41:14,405 WARNING: topo: r1: zebra exited leaving pidfile /var/run/frr/zebra.pid (562470)
r1: zebra has memory leaks:
zebra: 
  ## showing active allocations in memory group libfrr
zebra:     Nexthop                       :      4 *        152
zebra: 
  ## showing active allocations in memory group Native message allocations
zebra: 
  ## showing active allocations in memory group logging subsystem
zebra: 
  ## showing active allocations in memory group Label Manager
zebra: 
  ## showing active allocations in memory group Table Manager
zebra: 
  ## showing active allocations in memory group SRv6 Manager
zebra: 
  ## showing active allocations in memory group zebra
zebra:     Zebra DPlane Ctx              :      4 *       1520

Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there's something missing - here's an example:

$ sudo ip route add blackhole default table 111
$ sudo ip link add RED type vrf table 111

and the frr show output:

ubu-24-arm# do sho ip route table all
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 254:
K>* 0.0.0.0/0 [0/1002] via 10.211.55.2, enp0s5, src 10.211.55.6, weight 1, 00:02:54
C>* 2.2.2.0/24 is directly connected, ANNIE, weight 1, 00:02:54
L>* 2.2.2.1/32 is directly connected, ANNIE, weight 1, 00:02:54
C>* 3.3.3.0/24 is directly connected, BETTY, weight 1, 00:02:54
L>* 3.3.3.1/32 is directly connected, BETTY, weight 1, 00:02:54
K * 10.211.55.0/24 [0/1002] is directly connected, enp0s5, weight 1, 00:02:54
C>* 10.211.55.0/24 is directly connected, enp0s5, weight 1, 00:02:54
K>* 10.211.55.2/32 [0/100] is directly connected, enp0s5, weight 1, 00:02:54
L>* 10.211.55.6/32 is directly connected, enp0s5, weight 1, 00:02:54

ubu-24-arm# 
ubu-24-arm# do sho ip route table 111
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 111:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39

ubu-24-arm# do sho ip route vrf RED
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF RED:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:53

ubu-24-arm#

@@ -216,8 +220,21 @@ static int zebra_vrf_disable(struct vrf *vrf)
* we no-longer need this pointer.
*/
for (safi = SAFI_UNICAST; safi <= SAFI_MULTICAST; safi++) {
zebra_router_release_table(zvrf, zvrf->table_id, afi,
safi);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear about removing this: if there's an empty table, wouldn't it be ok to call 'release' on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now only remove it it is empty

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 345a1a3 to d0588ad Compare September 20, 2024 15:05
@github-actions github-actions bot added size/L and removed size/M labels Sep 20, 2024
@louis-6wind
Copy link
Contributor Author

I still think there's something missing - here's an example:

$ sudo ip route add blackhole default table 111
$ sudo ip link add RED type vrf table 111

and the frr show output:

ubu-24-arm# do sho ip route table all
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 254:
K>* 0.0.0.0/0 [0/1002] via 10.211.55.2, enp0s5, src 10.211.55.6, weight 1, 00:02:54
C>* 2.2.2.0/24 is directly connected, ANNIE, weight 1, 00:02:54
L>* 2.2.2.1/32 is directly connected, ANNIE, weight 1, 00:02:54
C>* 3.3.3.0/24 is directly connected, BETTY, weight 1, 00:02:54
L>* 3.3.3.1/32 is directly connected, BETTY, weight 1, 00:02:54
K * 10.211.55.0/24 [0/1002] is directly connected, enp0s5, weight 1, 00:02:54
C>* 10.211.55.0/24 is directly connected, enp0s5, weight 1, 00:02:54
K>* 10.211.55.2/32 [0/100] is directly connected, enp0s5, weight 1, 00:02:54
L>* 10.211.55.6/32 is directly connected, enp0s5, weight 1, 00:02:54

ubu-24-arm# 
ubu-24-arm# do sho ip route table 111
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 111:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39

ubu-24-arm# do sho ip route vrf RED
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF RED:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:53

ubu-24-arm#

You mean that the following ouput was incorrect, right ?

VRF default table 111:

I now display:

Table 111:

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 3 times, most recently from de9161f to 48fcd48 Compare September 26, 2024 07:35
Copy link

github-actions bot commented Oct 5, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 48fcd48 to 40516f6 Compare October 9, 2024 16:16
@frrbot frrbot bot added the libfrr label Oct 9, 2024
@louis-6wind louis-6wind marked this pull request as draft October 9, 2024 16:16
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 3 times, most recently from a5caeb6 to 4b620e6 Compare October 16, 2024 13:24
@louis-6wind louis-6wind marked this pull request as ready for review October 16, 2024 16:13
@louis-6wind
Copy link
Contributor Author

@donaldsharp @mjstapp the pull request is now ready.
I hope I have addresses all your remarks

Copy link

github-actions bot commented Nov 5, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 4b620e6 to 4fbd101 Compare November 19, 2024 16:03
Fix a heap-after-free that causes zebra to crash even without
address-sanitizer. To reproduce:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100

Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).

When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
VRF mismatch is unexpected in the code and creates crashes and memory
related issues.

Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.

Instead, at VRF disabling, do not release the table but reassign it to
the default VRF. At VRF enabling, change the table owner back to the
appropriate VRF.

> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
>     #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
>     #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
>     #2 0x7fa32474d783 in route_node_get lib/table.c:283
>     #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
>     FRRouting#4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
>     FRRouting#5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
>     FRRouting#6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
>     FRRouting#7 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#9 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>     FRRouting#11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
>     #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
>     #1 0x7fa324668d8f in qfree lib/memory.c:130
>     #2 0x7fa32474c421 in route_table_free lib/table.c:126
>     #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
>     FRRouting#4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
>     FRRouting#5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
>     FRRouting#6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
>     FRRouting#7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
>     FRRouting#8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
>     FRRouting#9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
>     FRRouting#10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
>     FRRouting#11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     FRRouting#12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     FRRouting#13 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#15 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
>     #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa324668c4d in qcalloc lib/memory.c:105
>     #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
>     #3 0x7fa32474e73c in route_table_init lib/table.c:512
>     FRRouting#4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
>     FRRouting#5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
>     FRRouting#6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
>     FRRouting#7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
>     FRRouting#8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
>     FRRouting#9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
>     FRRouting#10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     FRRouting#11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     FRRouting#12 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#14 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <[email protected]>
When a VRF is deleted, the kernel retains only its own routing entries
in the former VRF table and removes all others.q

This change ensures that routing entries created by FRR daemons are also
removed from the former zebra VRF table when the VRF is disabled.

To test:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip l add du0 type dummy
> ifconfig du0 192.168.0.1/24 up
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <[email protected]>
When a routing table (RT) already has a default route before being
assigned to a VRF, the default route vanishes in zebra after the VRF
assignment.

> root@router:~# ip route add blackhole default table 100
> root@router:~# ip route show table 100
> blackhole default
> root@router:~# vtysh -c 'show ip route table 100'
> [...]
> VRF default table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), 00:00:05
> root@router:~# ip l add red type vrf table 100
> root@router:~# vtysh -c 'show ip route table 100'
> root@router:~#

Do not override the default route if it exists.

Signed-off-by: Louis Scalbert <[email protected]>
Fix vanished blackhole route when kernel routes are updated.

> root@router# echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> root@router# ip l add du0 type dummy
> root@router# ifconfig du0 192.168.0.1/24 up
> root@router# ip route add blackhole default table 100
> root@router# ip route show table 100
> blackhole default
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), weight 1, 00:00:05
> root@router# ip l add red type vrf table 100
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), weight 1, 00:00:16
> root@router# ip l set du0 master red
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> C>* 192.168.0.0/24 is directly connected, du0, weight 1, 00:00:02
> L>* 192.168.0.1/32 is directly connected, du0, weight 1, 00:00:02
> root@router# ip route show table 100
> blackhole default
> 192.168.0.0/24 dev du0 proto kernel scope link src 192.168.0.1
> local 192.168.0.1 dev du0 proto kernel scope host src 192.168.0.1
> broadcast 192.168.0.255 dev du0 proto kernel scope link src 192.168.0.1

Fixes: d528c02 ("zebra: Handle kernel routes appropriately")
Signed-off-by: Louis Scalbert <[email protected]>
Test table ID move to a VRF and the removal of the VRF.

Signed-off-by: Louis Scalbert <[email protected]>
Add a flag to mean that a route entry is owned by a table ID and not by
a VRF. If the VRF associated to the table ID is deleted, the route
entry must not be deleted.

Update to tests with new flag. 2057 is in hexa 0x809, meaning that the
new flag has been to some prefix.

Signed-off-by: Louis Scalbert <[email protected]>
At VRF disabling, keep the route entries that was associated to its
table ID but not to the VRF itself. Kernel flushes these entries so we
need to reinstall them.

Signed-off-by: Louis Scalbert <[email protected]>
Test table ID move to a VRF and the removal of the VRF.

Signed-off-by: Louis Scalbert <[email protected]>
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch from 4fbd101 to 736a1c7 Compare November 20, 2024 16:16
@louis-6wind
Copy link
Contributor Author

ci:rerun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants