Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport some Fixes to 8.5 #14851

Closed
wants to merge 349 commits into from
Closed

Conversation

donaldsharp
Copy link
Member

commit a0c4ec2 (HEAD -> backport_itis, donald/backport_itis)
Author: Donatas Abraitis [email protected]
Date: Sun Oct 29 22:44:45 2023 +0200

bgpd: Ignore handling NLRIs if we received MP_UNREACH_NLRI

and:
commit 01f232c
Author: Donatas Abraitis [email protected]
Date: Fri Oct 27 11:56:45 2023 +0300

bgpd: Treat EOR as withdrawn to avoid unwanted handling of malformed attrs

are missing on 8.5

ton31337 and others added 30 commits March 18, 2023 11:31
Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit a1e5381)
Before this patch, if we destroy `any` flag for a prefix-list entry, we always
set destination as 0.0.0.0/0 and/or ::/0.

This means that, if we switch from `ip prefix-list r1-2 seq 5 deny any` to
`ip prefix-list r1-2 seq 5 permit 10.10.10.10/32` we will have
`permit any` eventually, which broke ACLs.

Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit 61c07b9)
….5/pr-13024

lib: Adjust only any flag for prefix-list entries if destroying (backport FRRouting#13024)
Clean up some memory leaks found in ospf_apiserver.c  Also
a crash in the original implementation.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 7773d00)
upstream commit 67765a2 has incorect
address family check which prevent from
deleting src/dst config under pbr rule.

Ticket:#3405024
Issue:3405024
Testing Done:

Config:

pbr-map map6 seq 1
 match src-ip 2000::200:100:100:0/96
 match dst-ip 2000::100:100:100:0/96
 set nexthop-group group3

Before:
torc-12(config)# pbr-map map6 seq 1
torc-12(config-pbr-map)# no match src-ip 2000::200:100:100:0/96
Cannot mismatch families within match src/dst

After:
torc-12(config)# pbr-map map6 seq 1
torc-12(config-pbr-map)# no match src-ip 2000::200:100:100:0/96
torc-12(config-pbr-map)#

Signed-off-by: Chirag Shah <[email protected]>
(cherry picked from commit 0349488)
The new_rtrs variable was just generated and then dropped.  Let's
fix that entirely

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit eb7e140)
Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 52ff0e3)
….5/pr-13026

pbrd:fix mismatching in match src-dst (backport FRRouting#13026)
Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit d8986f0)
The same as 61c07b9, but forgot to put IPv6
in place.

Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit 14c1e0a)
….5/pr-13049

lib: IPv6 prefix-list entry handling with `any` (backport FRRouting#13049)
Crash:

(gdb) bt
0  0x00007fee27de15cb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
1  0x00007fee280ecd9c in core_handler (signo=11, siginfo=0x7ffe56001bb0, context=<optimized out>) at lib/sigevent.c:264
2  <signal handler called>
3  0x0000555e321c41b2 in prefix_rd2str (prd=0x10, buf=buf@entry=0x7ffe56002080 "27.0.0.R\340\373\062\062^U", size=size@entry=28) at bgpd/bgp_rd.c:168
4  0x0000555e321c431a in printfrr_prd (buf=0x7ffe560021a0, ea=<optimized out>, ptr=<optimized out>) at bgpd/bgp_rd.c:224
5  0x00007fee2812069b in vbprintfrr (cb_in=cb_in@entry=0x7ffe56002330, fmt0=fmt0@entry=0x555e3229a3ad " RD: %pRD\n", ap=ap@entry=0x7ffe560023d8) at lib/printf/vfprintf.c:564
6  0x00007fee28122ef7 in vasnprintfrr (mt=mt@entry=0x7fee281cb5e0 <MTYPE_VTY_OUT_BUF>, out=out@entry=0x7ffe560023f0 " RD: : R\n", outsz=outsz@entry=1024, fmt=fmt@entry=0x555e3229a3ad " RD: %pRD\n", ap=ap@entry=0x7ffe560023d8) at lib/printf/glue.c:103
7  0x00007fee28103504 in vty_out (vty=vty@entry=0x555e33f82d10, format=format@entry=0x555e3229a3ad " RD: %pRD\n") at lib/vty.c:190
8  0x0000555e32185156 in bgp_evpn_es_show_entry_detail (vty=0x555e33f82d10, es=0x555e33c38420, json=<optimized out>) at bgpd/bgp_evpn_mh.c:2655
9  0x0000555e32188fe5 in bgp_evpn_es_show (vty=vty@entry=0x555e33f82d10, uj=false, detail=true) at bgpd/bgp_evpn_mh.c:2721
notice prd=0x10 in #3.  This is because in bgp_evpn_mh.c we are sending &es->es_base_frag->prd.

There is one spot in the code where during output the es->es_base_frag is checked for non nullness
Let's just make sure it's right in all the places.

Signed-off-by: Donald Sharp <[email protected]>
…l_deref_8.5

bgpd: Prevent Null pointer deref when outputting data
Due to the wrong input argv id, "argv[idx_word]->arg"
fetched in-correctly and it clears all the route-maps instead of
specific one.

Now correct argv id is passed to clear the given route-map counters.

Also, use RMAP_NAME which allows to show list of configured
route-maps in the system.

After Fix:-

Ticket:#3407773
Issue:3407773

Testing: UT done

Before:
TORC11# clear route-map counters
  <cr>
  WORD  route-map name

After:

TORC11# clear route-map counters
  <cr>
  RMAP_NAME  route-map name
     my-as

Signed-off-by: Chirag Shah <[email protected]>
Signed-off-by: Sindhu Parvathi Gopinathan's <[email protected]>
(cherry picked from commit 463110f)
When deleting a bfd peer during shutdown, let's ensure
that any scheduled events are actually stopped.

==7759== Invalid read of size 4
==7759==    at 0x48BF700: _bfd_sess_valid (bfd.c:419)
==7759==    by 0x48BF700: _bfd_sess_send (bfd.c:470)
==7759==    by 0x492F79C: thread_call (thread.c:2008)
==7759==    by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759==    by 0x1C739B: main (bgp_main.c:550)
==7759==  Address 0xfb687a4 is 4 bytes inside a block of size 272 free'd
==7759==    at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==7759==    by 0x48BFA5A: bfd_sess_free (bfd.c:535)
==7759==    by 0x2B7034: bgp_peer_remove_bfd (bgp_bfd.c:339)
==7759==    by 0x29FF8A: peer_free (bgpd.c:1160)
==7759==    by 0x29FF8A: peer_unlock_with_caller (bgpd.c:1192)
==7759==    by 0x2A0506: peer_delete (bgpd.c:2633)
==7759==    by 0x208190: bgp_stop (bgp_fsm.c:1639)
==7759==    by 0x20C082: bgp_event_update (bgp_fsm.c:2751)
==7759==    by 0x492F79C: thread_call (thread.c:2008)
==7759==    by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759==    by 0x1C739B: main (bgp_main.c:550)
==7759==  Block was alloc'd at
==7759==    at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==7759==    by 0x48F53AF: qcalloc (memory.c:116)
==7759==    by 0x48BF98D: bfd_sess_new (bfd.c:397)
==7759==    by 0x2B76DC: bgp_peer_configure_bfd (bgp_bfd.c:298)
==7759==    by 0x2B76DC: bgp_peer_configure_bfd (bgp_bfd.c:279)
==7759==    by 0x29BA06: peer_group2peer_config_copy (bgpd.c:2803)
==7759==    by 0x2A3D96: peer_create_bind_dynamic_neighbor (bgpd.c:4107)
==7759==    by 0x2A4195: peer_lookup_dynamic_neighbor (bgpd.c:4239)
==7759==    by 0x21AB72: bgp_accept (bgp_network.c:422)
==7759==    by 0x492F79C: thread_call (thread.c:2008)
==7759==    by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759==    by 0x1C739B: main (bgp_main.c:550)

tl;dr -> Effectively, in this test setup we have 300 dynamic bgp
sessions all of which are using bfd.  When a peer collision is detected
or we remove the peers, if an event has been scheduled but not actually
executed yet the event event was not actually being stopped, leaving
the bsp pointer on the thread->arg and causing a crash when it is
executed.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit f83431c)
….5/pr-13061

lib: on bfd peer shutdown actually stop event (backport FRRouting#13061)
….5/pr-13064

lib: fix clear route-map command (backport FRRouting#13064)
Issue:
When a netns is deleted, since zebra doesn’t receive interface down/delete
notifications from kernel, it manually deletes the interface without removing
the association between zebra_l3vni and the interface that is being deleted
(i.e it deletes the interface without setting “zl3vni->vxlan_if” to NULL).

Later, during the deletion of netns, when zl3vni_rmac_uninstall() is called to
uninstall the remote RMAC from the kernel, zebra ends up accessing stale
“zl3vni->vxlan_if” pointer, which now points to freed memory.
This was causing heap use-after-free.

Fix:
Before zebra starts deleting the interfaces when it receives netns delete notification,
appropriate functions() are being called to remove the association between evpn structs
and interface and set “zl3vni->vxlan_if” to NULL. This ensures that when
zl3vni_rmac_uninstall() is called during netns deletion, it will bail because
“zl3vni->vxlan_if” is NULL.

Signed-off-by: Pooja Jagadeesh Doijode <[email protected]>
(cherry picked from commit 7eefea9)
….5/pr-13062

zebra: Fix for heap-use-after-free in EVPN (backport FRRouting#13062)
Prevent a use after free and tell the bfd subsystem
we are shutting down in staticd.

./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460:==2264460==ERROR: AddressSanitizer: heap-use-after-free on address 0x61f000004698 at pc 0x7f65d1eb11b2 bp 0x7ffdbface490 sp 0x7ffdbface488
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-READ of size 4 at 0x61f000004698 thread T0
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #0 0x7f65d1eb11b1 in zclient_bfd_command lib/bfd.c:307
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #1 0x7f65d1eb20f5 in _bfd_sess_send lib/bfd.c:507
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #2 0x7f65d20510aa in thread_call lib/thread.c:1989
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #3 0x7f65d2051f0a in _thread_execute lib/thread.c:2081
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #4 0x7f65d1eb271b in _bfd_sess_remove lib/bfd.c:544
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #5 0x7f65d1eb278d in bfd_sess_free lib/bfd.c:553
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #6 0x7f65d1eb5400 in bfd_protocol_integration_finish lib/bfd.c:1029
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #7 0x7f65d1f42f77 in hook_call_frr_fini lib/libfrr.c:41
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #8 0x7f65d1f494a1 in frr_fini lib/libfrr.c:1199
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #9 0x563b7abefd76 in sigint staticd/static_main.c:70
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #10 0x7f65d200ef91 in frr_sigevent_process lib/sigevent.c:115
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #11 0x7f65d204fac6 in thread_fetch lib/thread.c:1758
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #12 0x7f65d1f49377 in frr_run lib/libfrr.c:1184
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #13 0x563b7abefed1 in main staticd/static_main.c:160
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #14 0x7f65d1b92d09 in __libc_start_main ../csu/libc-start.c:308
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-    #15 0x563b7abefa99 in _start (/usr/lib/frr/staticd+0x15a99)
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 7a185ac)
….5/pr-13073

staticd: Tell bfd that we are shutting down (backport FRRouting#13073)
Description:
	After area range config, summary lsas are aggerated to configured
	route but later it was being flushed instead of the actual summary
	lsa. This was seen when prefix-id of the aggregated route is same
	as one of the actual summary route.
	Here, aggregated summary lsa need to be returned to set the flag
	SUMMARY_APPROVE after originating aggregated summary lsa but its not.
	Which is being cleaned up as part of unapproved summary cleanup.
	Corrected this now.

Issue: FRRouting#13028

Signed-off-by: Rajesh Girada <[email protected]>
(cherry picked from commit c8c1a24)
We have this valgrind trace:

==1125== Invalid read of size 4
==1125==    at 0x170A7D: pim_if_delete (pim_iface.c:203)
==1125==    by 0x170C01: pim_if_terminate (pim_iface.c:80)
==1125==    by 0x174F34: pim_instance_terminate (pim_instance.c:68)
==1125==    by 0x17535B: pim_vrf_terminate (pim_instance.c:260)
==1125==    by 0x1941CF: pim_terminate (pimd.c:161)
==1125==    by 0x1B476D: pim_sigint (pim_signals.c:44)
==1125==    by 0x4910C22: frr_sigevent_process (sigevent.c:133)
==1125==    by 0x49220A4: thread_fetch (thread.c:1777)
==1125==    by 0x48DC8E2: frr_run (libfrr.c:1222)
==1125==    by 0x15E12A: main (pim_main.c:176)
==1125==  Address 0x6274d28 is 1,192 bytes inside a block of size 1,752 free'd
==1125==    at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1125==    by 0x174FF1: pim_vrf_delete (pim_instance.c:181)
==1125==    by 0x4925480: vrf_delete (vrf.c:264)
==1125==    by 0x4925480: vrf_delete (vrf.c:238)
==1125==    by 0x49332C7: zclient_vrf_delete (zclient.c:2187)
==1125==    by 0x4934319: zclient_read (zclient.c:4003)
==1125==    by 0x492249C: thread_call (thread.c:2008)
==1125==    by 0x48DC8D7: frr_run (libfrr.c:1223)
==1125==    by 0x15E12A: main (pim_main.c:176)
==1125==  Block was alloc'd at
==1125==    at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1125==    by 0x48E80AF: qcalloc (memory.c:116)
==1125==    by 0x1750DA: pim_instance_init (pim_instance.c:90)
==1125==    by 0x1750DA: pim_vrf_new (pim_instance.c:161)
==1125==    by 0x4924FDC: vrf_get (vrf.c:183)
==1125==    by 0x493334C: zclient_vrf_add (zclient.c:2157)
==1125==    by 0x4934319: zclient_read (zclient.c:4003)
==1125==    by 0x492249C: thread_call (thread.c:2008)
==1125==    by 0x48DC8D7: frr_run (libfrr.c:1223)
==1125==    by 0x15E12A: main (pim_main.c:176)

and you do this series of events:

a) Create a vrf, put an interface in it
b) Turn on pim on that interface and turn on pim in that vrf
c) Delete the vrf
d) Do anything with the interface, in this case shutdown the system

The move of the interface to a new vrf is leaving the pim_ifp->pim pointer pointing
at the old pim instance, which was just deleted, so the instance pointer was freed.

Let's clean up the pim pointer in the interface pointer as well.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit e60308f)
….5/pr-13065

ospfd: Fixing Summary origination after range configuration (backport FRRouting#13065)
It's not 4 bytes, it was assuming the same as Graceful-Restart tuples.

LLGR has more 3 bytes (Long-lived Stale Time).

Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit b1d33ec)
….5/pr-13088

pimd: Fix use after free issue for ifp's moving vrfs (backport FRRouting#13088)
….5/pr-13100

bgpd: Check 7 bytes for Long-lived Graceful-Restart capability (backport FRRouting#13100)
donaldsharp and others added 10 commits October 25, 2023 15:35
….5/pr-14645

bgpd: A couple more bgpd crashes on malformed attributes (backport FRRouting#14645)
'detail' and 'josn' keyword is given as an optional parameter
for cli arguments. Hence 'detail' keyword was consider as a
pbr 'name' for "show pbr map detail json" command.

Before Fix:

```
cumulus#
cumulus# show pbr map detail json
[
]
cumulus#
```

After Fix:
```
cumulus# show pbr map detail json
[
  {
    "name":"MAP1",
    "valid":false,
    "policies":[
      {
        "id":1,
        "sequenceNumber":10,
        "ruleNumber":309,
        "vrfUnchanged":false,
        "installed":false,
        "installedReason":"Invalid Src or Dst",
        "vrfName":"default"
      }
    ]
  }
]
cumulus#
```

Ticket:#3638600

Issue:3638600

Testing: UT done

Signed-off-by: Sindhu Parvathi Gopinathan's <[email protected]>
(cherry picked from commit d621c36)
Fix the documentation for the pbr map command with
correct syntax.

Signed-off-by: Chirag Shah <[email protected]>
(cherry picked from commit df3d91f)
….5/pr-14665

pbrd: fix show pbr map detail json (backport FRRouting#14665)
Display ptmStatus in correct order in show interface json
output.

Signed-off-by: Sindhu Parvathi Gopinathan's <[email protected]>
(cherry picked from commit aa6dab0)
….5/pr-14681

zebra: display ptmStatus order in interface json (backport FRRouting#14681)
Currently in the single nexthop case w/ evpn sending
down via the FPM the encap type is not being set
for the nexthop.

This looks like the result of some code reorg for the
nexthop happened but the fpm failed to be accounted for.
Let's just move the encap type encoding to where it
will happen.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 4ac659f)
….5/pr-14675

zebra: Add encap type when building packet for FPM (backport FRRouting#14675)
…attrs

Treat-as-withdraw, otherwise if we just ignore it, we will pass it to be
processed as a normal UPDATE without mandatory attributes, that could lead
to harmful behavior. In this case, a crash for route-maps with the configuration
such as:

```
router bgp 65001
 no bgp ebgp-requires-policy
 neighbor 127.0.0.1 remote-as external
 neighbor 127.0.0.1 passive
 neighbor 127.0.0.1 ebgp-multihop
 neighbor 127.0.0.1 disable-connected-check
 neighbor 127.0.0.1 update-source 127.0.0.2
 neighbor 127.0.0.1 timers 3 90
 neighbor 127.0.0.1 timers connect 1
 !
 address-family ipv4 unicast
  neighbor 127.0.0.1 addpath-tx-all-paths
  neighbor 127.0.0.1 default-originate
  neighbor 127.0.0.1 route-map RM_IN in
 exit-address-family
exit
!
route-map RM_IN permit 10
 set as-path prepend 200
exit
```

Send a malformed optional transitive attribute:

```
import socket
import time

OPEN = (b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"
b"\xff\xff\x00\x62\x01\x04\xfd\xea\x00\x5a\x0a\x00\x00\x01\x45\x02"
b"\x06\x01\x04\x00\x01\x00\x01\x02\x02\x02\x00\x02\x02\x46\x00\x02"
b"\x06\x41\x04\x00\x00\xfd\xea\x02\x02\x06\x00\x02\x06\x45\x04\x00"
b"\x01\x01\x03\x02\x0e\x49\x0c\x0a\x64\x6f\x6e\x61\x74\x61\x73\x2d"
b"\x70\x63\x00\x02\x04\x40\x02\x00\x78\x02\x09\x47\x07\x00\x01\x01"
b"\x80\x00\x00\x00")

KEEPALIVE = (b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"
b"\xff\xff\xff\xff\xff\xff\x00\x13\x04")

UPDATE = bytearray.fromhex("ffffffffffffffffffffffffffffffff002b0200000003c0ff00010100eb00ac100b0b001ad908ac100b0b")

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.2', 179))
s.send(OPEN)
data = s.recv(1024)
s.send(KEEPALIVE)
data = s.recv(1024)
s.send(UPDATE)
data = s.recv(1024)
time.sleep(100)
s.close()
```

Reported-by: Iggy Frankovic <[email protected]>
Signed-off-by: Donatas Abraitis <[email protected]>
If we receive MP_UNREACH_NLRI, we should stop handling remaining NLRIs if
no mandatory path attributes received.

In other words, if MP_UNREACH_NLRI received, the remaining NLRIs should be handled
as a new data, but without mandatory attributes, it's a malformed packet.

In normal case, this MUST not happen at all, but to avoid crashing bgpd, we MUST
handle that.

Reported-by: Iggy Frankovic <[email protected]>
Signed-off-by: Donatas Abraitis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.