Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frr-reload: fix single-line context command processing #12542

Closed
wants to merge 3,933 commits into from

Conversation

askorichenko
Copy link
Contributor

@askorichenko askorichenko commented Dec 19, 2022

If a single-line command shares same syntax with a context command (e.g. 'bgp graceful-shutdown', "Graceful shutdown can be configured per BGP instance or globally for all of BGP."),
then frr-reload.py has a chance to apply intended top-level command to a wrong context. Example of 'frr-reload.py --test' output:

Loading Config object from file /etc/frr/bgpd.conf
LINE bgp graceful-shutdown    : single-line context
LINE router bgp 65001         : enter context ['router bgp 65001']
LINE bgp default ipv4-unicast : add to context ['router bgp 65001']
...
Loading Config object from vtysh show running
...
Lines To Add
============
router bgp 65001
bgp default ipv4-unicast
bgp graceful-shutdown

The diff file composed of these lines makes 'vtysh' interpret 'bgp graceful-shutdown' belonging to 'router bgp 65001' context and performs a local BGP instance graceful-shutdown instead.

With this patch frr-reload.py keeps track of the context depth for the previous command and writes a single-line context command into the diff file prepended with appropriate amount of 'exit's from the previous context.

All 3 modes (--reload, --test, --test --test-reset) were modified.

Signed-off-by: Alexander Skorichenko [email protected]

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Dec 19, 2022

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-8961/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-8962/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it might be okay with this patch, overall, especially this command bgp graceful-shutdown can't be applied both globally and per instance. I'm curious what is the specific configuration to handle here in general?

@askorichenko
Copy link
Contributor Author

Thanks. We have FRR-configuration process implemented via config rewrite and calling to frr-reload.py,
in the case of two BGP routers only one was affected by intended "global" bgp graceful-shutdown command.
e.g. when reverting changes by loading saved bgpd.conf.

  • One way to run across this issue is to have some nonempty running config and reload from saved config. Then frr-reload starts shuffling commands with respect to existing contexts, and usually puts bgp graceful-shutdown behind some router bgp context.
  • The other is to use some obsoleted/changed command, like the option bgp default ipv4-unicast from the commit message (I know it should be hidden by default), but it keeps popping up in frr-reload forcing the graceful-shutdown to the wrong context. I found it handy when tested, the point is to have some config file changes that accompany bgp graceful-shutdown command and bring it to the wrong context.

Just to make sure your remark is clear to me, you are saying that bgp graceful-shutdown can't be applied globally if it's already called per some instance (and vice versa). Ok, if you meant something else, please explain.

@ton31337
Copy link
Member

ton31337 commented Jan 2, 2023

Overall, I like the idea of tracking depth and exit appropriately.

Just to make sure your remark is clear to me, you are saying that bgp graceful-shutdown can't be applied globally if it's already called per some instance (and vice versa). Ok, if you meant something else, please explain.

Yep, it can't be globally and per instance. Mutually exclusive.

@askorichenko
Copy link
Contributor Author

Hi,
Is there some final conclusion on the patch?
Thanks!

# If current line is a single-line context command
# prepend it with proper number of 'exit's from
# previous context. This prevents global 'bgp graceful-shutdown'
# from running in a local 'router bgp ...' context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should add exit statements for lots of scenarios that don't involve bgp graceful-shutdown, right? If so, please remove this comment.

@github-actions github-actions bot added the rebase PR needs rebase label Nov 13, 2023
donaldsharp and others added 17 commits November 14, 2023 08:18
I am seeing this output:
2022/12/16 09:16:00.206 BGP: [MNE5N-K0G4Z] Resetting peer (null) due to change in addpath config

Switch over to %pBP

Signed-off-by: Donald Sharp <[email protected]>
Currently when zebra receives a RA with optional types, note
the optional types that we are ignoring.

Signed-off-by: Donald Sharp <[email protected]>
Zebra has a shutdown setup where it asks the dplane to shutdown but can
still be processing data.  This is especially true if something the dplane
is listening on receives data that will be processed by the main dplane thread
from netlink.   When zebra_finalize is called it is possible that a bit
of data comes in before the zebra_dplane_shutdown() function is called
and the memory freed in ns_walk_func() causes the main dplane event
to crash when it cannot find the ns data anymore.

Reverse the order, stop the zebra dplane pthread and then free the
memory associated with the namespaces.

Signed-off-by: Donald Sharp <[email protected]>
The wrong parameter is passed in `inet_ntop()` of `zfpm_log_route_info()` in
old fpm module, so the display of gateway is always wrong. Just remove
that extra ampersand.

Additionally, use "none" as gateway value for the case of no gateway.

Signed-off-by: anlan_cs <[email protected]>
The function bgp_packet_mpattr_prefix_size had an if/else
body that allowed people to add encoding types to bgpd
such that we could build the wrong size packets.  This
was exposed recently in commit:
0a9705a

Where it was discovered flowspec was causing bgp update
messages to exceed the maximum size and the peer to
drop the connection.  Let's be proscriptive about this
and hopefully make it so that things don't work when
someone adds a new safi to the system ( and they'll have
to update this function ).

Signed-off-by: Donald Sharp <[email protected]>
This function was just using default: case statements for
the encoding of nlri's to a peer.  Lay out all the different
cases and make things fail hard when a dev escape is found.

Signed-off-by: Donald Sharp <[email protected]>
The function bgp_packet_mpattr_prefix was using an if statement
to encode packets to the peer.  Change it to a switch and make
it handle all the cases and fail appropriately when something
has gone wrong.  Hopefully in the future when a new afi/safi
is added we can catch it by compilation breaking instead of
weird runtime errors

Signed-off-by: Donald Sharp <[email protected]>
Fix an SA warning about a possibly-uninited local.

Signed-off-by: Mark Stapp <[email protected]>
Changes:
JSON support added for below commands,
     - show ip nht route-map vrf all json
     - show ip nht route-map vrf <name> json
     - show ipv6 nht route-map vrf all json
     - show ipv6 nht route-map vrf <name> json
     - show ipv6 nht route-map json
     - show ip nht route-map json

Testing Done: Unit testing completed.

tor-1# show ip nht route-map vrf default json
{
  "afi":"ipv4",
  "vrfs":{
	"default":{
	  "protocols":{
		"system":"none",
		"kernel":"none",
		"connected":"connected-policy",
		"static":"none",
		"rip":"none",
		"ripng":"none",
		"ospf":"none",
		"ospf6":"none",
		"isis":"none",
		"bgp":"bgp-policy",
		"pim":"none",
		"eigrp":"none",
		"nhrp":"none",
		"hsls":"none",
		"olsr":"none",
		"table":"none",
		"ldp":"none",
		"vnc":"none",
		"vnc-direct":"none",
		"vnc-rn":"none",
		"bgp-direct":"none",
		"bgp-direct-to-nve-groups":"none",
		"babel":"none",
		"sharp":"none",
		"pbr":"none",
		"bfd":"none",
		"openfabric":"none",
		"vrrp":"none",
		"zebra":"none",
		"frr":"none",
		"wildcard":"none",
		"any":"none"
	  }
	}
  }
}

tor-1# show ip nht route-map vrf all json
{
  "afi":"ipv4",
  "vrfs":{
	"default":{
	  "protocols":{
		"system":"none",
		"kernel":"none",
		"connected":"connected-policy",
		"static":"none",
		"rip":"none",
		"ripng":"none",
		"ospf":"none",
		"ospf6":"none",
		"isis":"none",
		"bgp":"bgp-policy",
		"pim":"none",
		"eigrp":"none",
		"nhrp":"none",
		"hsls":"none",
		"olsr":"none",
		"table":"none",
		"ldp":"none",
		"vnc":"none",
		"vnc-direct":"none",
		"vnc-rn":"none",
		"bgp-direct":"none",
		"bgp-direct-to-nve-groups":"none",
		"babel":"none",
		"sharp":"none",
		"pbr":"none",
		"bfd":"none",
		"openfabric":"none",
		"vrrp":"none",
		"zebra":"none",
		"frr":"none",
		"wildcard":"none",
		"any":"none"
	  }
	},
	"mgmt":{
	  "protocols":{
		"system":"none",
		"kernel":"none",
		"connected":"none",
		"static":"none",
		"rip":"none",
		"ripng":"none",
		"ospf":"none",
		"ospf6":"none",
		"isis":"none",
		"bgp":"none",
		"pim":"none",
		"eigrp":"none",
		"nhrp":"none",
		"hsls":"none",
		"olsr":"none",
		"table":"none",
		"ldp":"none",
		"vnc":"none",
		"vnc-direct":"none",
		"vnc-rn":"none",
		"bgp-direct":"none",
		"bgp-direct-to-nve-groups":"none",
		"babel":"none",
		"sharp":"none",
		"pbr":"none",
		"bfd":"none",
		"openfabric":"none",
		"vrrp":"none",
		"zebra":"none",
		"frr":"none",
		"wildcard":"none",
		"any":"none"
	  }
	},
	"sym_1":{
	  "protocols":{
		"system":"none",
		"kernel":"none",
		"connected":"none",
		"static":"none",
		"rip":"none",
		"ripng":"none",
		"ospf":"none",
		"ospf6":"none",
		"isis":"none",
		"bgp":"bgp-policy",
		"pim":"none",
		"eigrp":"none",
		"nhrp":"none",
		"hsls":"none",
		"olsr":"none",
		"table":"none",
		"ldp":"none",
		"vnc":"none",
		"vnc-direct":"none",
		"vnc-rn":"none",
		"bgp-direct":"none",
		"bgp-direct-to-nve-groups":"none",
		"babel":"none",
		"sharp":"none",
		"pbr":"none",
		"bfd":"none",
		"openfabric":"none",
		"vrrp":"none",
		"zebra":"none",
		"frr":"none",
		"wildcard":"none",
		"any":"none"
	  }
	}
  }
}

tor-1# show ipv6 nht route-map vrf default json
{
  "afi":"ipv6",
  "vrfs":{
	"default":{
	  "protocols":{
		"system":"none",
		"kernel":"kernel-policy",
		"connected":"connected-policy",
		"static":"none",
		"rip":"none",
		"ripng":"none",
		"ospf":"none",
		"ospf6":"none",
		"isis":"none",
		"bgp":"none",
		"pim":"none",
		"eigrp":"none",
		"nhrp":"none",
		"hsls":"none",
		"olsr":"none",
		"table":"none",
		"ldp":"none",
		"vnc":"none",
		"vnc-direct":"none",
		"vnc-rn":"none",
		"bgp-direct":"none",
		"bgp-direct-to-nve-groups":"none",
		"babel":"none",
		"sharp":"none",
		"pbr":"none",
		"bfd":"none",
		"openfabric":"none",
		"vrrp":"none",
		"zebra":"none",
		"frr":"none",
		"wildcard":"none",
		"any":"none"
	  }
	}
  }
}

Ticket:#3229016
Issue:3229016

Signed-off-by: Sindhu Parvathi Gopinathan <[email protected]>
show ip ospf border-routers json support added.
commands:
  - show ip ospf vrf default border-routers json
  - show ip ospf vrf all border-routers json
  - show ip ospf border-routers json

Testing Done: Unit testing completed.

rut# show ip ospf vrf all border-routers json
{
  "default":{
    "vrfName":"default",
    "vrfId":0,
    "routers":{
      "0.0.0.8":{
        "routeType":"R ",
        "cost":10,
        "area":"0.0.0.1",
        "routerType":"abr",
        "nexthops":[
          {
            "ip":"12.0.0.2",
            "via":"swp1"
          }
        ]
      },
      "0.0.0.9":{
        "routeType":"R ",
        "cost":10,
        "area":"0.0.0.1",
        "routerType":"abr",
        "nexthops":[
          {
            "ip":"12.0.1.2",
            "via":"swp2"
          }
        ]
      }
    }
  }
}

rut#
rut# show ip ospf vrf all border-routers json
{
  "default":{
	"vrfName":"default",
	"vrfId":0,
        "routers":{
  	  "0.0.0.15":{
	    "routeType":"R ",
	    "cost":30,
	    "area":"0.0.0.0",
	    "routerType":"abr",
	    "nexthops":[
		{
		  "ip":"11.0.0.2",
		  "via":"br1"
		}
	     ]
	  }
      }
  }
}

rut# show ip ospf border-routers json
{
  "routers":{
    "0.0.0.15":{
      "routeType":"R ",
      "cost":30,
      "area":"0.0.0.0",
      "routerType":"abr",
      "nexthops":[
        {
	  "ip":"11.0.0.2",
          "via":"br1"
	}
      ]
   }
 }
}

Ticket:#3229017
Issue:3229017

Co-authored-by: Chirag Shah <[email protected]>
Signed-off-by: Sindhu Parvathi Gopinathan <[email protected]>
…ster and slave.

Problem Statement:
=================
Memory leak backtraces

2022-11-23 01:51:10,525 - ERROR: ==842== 1,100 (1,000 direct, 100 indirect) bytes in 5 blocks are definitely lost in loss record 29 of 31
2022-11-23 01:51:10,525 - ERROR: ==842==    at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13B64B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13B64B: ospf6_dbdesc_recv (ospf6_message.c:1038)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13B64B: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13B64B: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,526 - ERROR: ==842==    by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,526 - ERROR: ==842==    by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,526 - ERROR: ==842==

2022-11-23 01:51:10,524 - ERROR: Found memory leak in module ospf6d
2022-11-23 01:51:10,525 - ERROR: ==842== 220 (200 direct, 20 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 31
2022-11-23 01:51:10,525 - ERROR: ==842==    at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13BBCE: ospf6_dbdesc_recv_master (ospf6_message.c:760)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13BBCE: ospf6_dbdesc_recv (ospf6_message.c:1036)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13BBCE: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x13BBCE: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,525 - ERROR: ==842==    by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,525 - ERROR: ==842==

RCA:
====
These memory leaks are beacuse of last lsa in neighbour's request_list is not
getting freed beacuse of lsa lock. The last request has an addtional lock which
is added as a part of ospf6_make_lsreq, this lock needs to be removed
in order for the lsa to get freed.

Fix:
====
Check and remove the lock on the last request in all the functions.

Signed-off-by: Manoj Naragund <[email protected]>
*** CID 1530035:  Null pointer dereferences  (FORWARD_NULL)
/bgpd/bgp_updgrp_packet.c: 756 in subgroup_update_packet()
750                              * position.
751                              */
752                             mpattr_pos = stream_get_endp(s);
753
754                             /* 5: Encode all the attributes, except MP_REACH_NLRI
755                              * attr. */
>>>     CID 1530035:  Null pointer dereferences  (FORWARD_NULL)
>>>     Passing null pointer "path" to "bgp_packet_attribute", which dereferences it.
756                             total_attr_len = bgp_packet_attribute(
757                                     NULL, peer, s, adv->baa->attr, &vecarr, NULL,
758                                     afi, safi, from, NULL, NULL, 0, 0, 0, path);
759
760                             space_remaining =
761                                     STREAM_CONCAT_REMAIN(s, snlri, STREAM_SIZE(s))

Signed-off-by: Donatas Abraitis <[email protected]>
After `free()`ing a table also set it to NULL so when the instance
release function is called we know whether the pointer is valid or not.

Signed-off-by: Rafael Zalamena <[email protected]>
a) if show_function happened to be NULL we would leak json memory
b) json_lsa_type was being allocated but only used in the default case, leaking memory
c) json output would sometimes produce text output and that is incorrect

Signed-off-by: Donald Sharp <[email protected]>
donaldsharp and others added 2 commits November 14, 2023 08:18
Commit: 3cdb03f
changed the vty_json output to not be pretty printing.
The previous commit in the tree added vty_json_no_pretty
let's use that instead

Signed-off-by: Donald Sharp <[email protected]>
If a single-line command shares same syntax with a context command
(e.g. 'bgp graceful-shutdown', "Graceful shutdown can be configured
per BGP instance or globally for all of BGP."),
then frr-reload.py has a chance to apply intended top-level command
to a wrong context. Example of 'frr-reload.py --test' output:

Loading Config object from file /etc/frr/bgpd.conf
LINE bgp graceful-shutdown    : single-line context
LINE router bgp 65001         : enter context ['router bgp 65001']
LINE bgp default ipv4-unicast : add to context ['router bgp 65001']
...
Loading Config object from vtysh show running
...
Lines To Add
============
router bgp 65001
bgp default ipv4-unicast
bgp graceful-shutdown

The diff file composed of these lines makes 'vtysh' interpret
'bgp graceful-shutdown' belonging to 'router bgp 65001' context
and performs a local BGP instance graceful-shutdown instead.

With this patch frr-reload.py keeps track of the context depth
for the previous command and writes a single-line context command
into the diff file prepended with appropriate amount of 'exit's
from the previous context.

All 3 modes (--reload, --test, --test --test-reset) were modified.

Signed-off-by: Alexander Skorichenko <[email protected]>
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@qlyoung
Copy link
Member

qlyoung commented Nov 14, 2023

Looks like a rebase or something went wrong; please rebase it again

@askorichenko
Copy link
Contributor Author

The issue was fixed with tools: always append "exit" in frr-reload.py #10136 hence the rebase conflict.
Thanks to Igor Ryzhov.
Closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.