-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bgp filter fun #15466
Bgp filter fun #15466
Conversation
@Mergifyio backport dev/10.0 stable/9.1 stable/9.0 stable/8.5 stable/8.4 |
✅ Backports have been created
|
f93a9ba
to
64b5d2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
64b5d2e
to
c0cdeda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this part of the code be executed when withdraw previously announced default route to the neighbor (I think no matter if it comes from other peer or from default-originate. Maybe the comment in the "if(withdraw)" scope are not correct) ?
I dont clearly understand the logic in subgroup_default_originate() , but maybe is wrong to work with "attr" structure here. "attr" is a something new and blank init and populated above every time when you enter into the function, also it do not carry previously sets attr (maybe ?)
814 struct attr attr; 815 struct attr *new_attr = &attr;
Also seems "attr" will be change from default-originate route map if there are such, which will affect and re-announced default route learned from other peer (if there are such).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute is initially set to a default value then the route-map( if any ) will modify it, thus it is needed. The attribute is used as a way of knowing what to send to a peer so it is needed even in the withdraw case.. That is the purpose of the attr. The changes I made did uncover an additional bug where the aspath in subgroup_default_originate was attempting to free the wrong aspath, in some cases. This was masked by the excessive intern of the attr data structure in this call path. In any event, with the addition of change 1872695559d438. I am happier with the results now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your explanation Donald ! In all cases the patch you are offer is good, should be applied and seems will fix #15459 (most likely they had 0.0.0.0/0 ::/0 among their 6400 testing routes). When backport it to 9.1 and 8.5 I will redo all my debugs and tests against #14828 , but im nearly sure this patch will not solve it. Maybe I am wrong but imagine this situation:
Router1 -- Announce 0.0.0.0/0 --> Router2 -- Announce 0.0.0.0/0 (any from 2) --> Router3 Also generate 0.0.0.0/0 with route-map
If R1 withdraw 0.0.0.0.0/0 I believe subgroup_default_originate() will be called with withdraw flag. Because Router2 have default-originate route-map XXX , attr will be changed (will enter in the if() scope take a look above your change in the code), and withdraw will flush/unintern completely other attributes (pointers in the attr structure) not these was previously set (maybe for self generated default route? not for re-announced, default route). subgroup_default_originate() is called on a fib-out ? (changes on the attr are made after the changes from neighbor output route-map ? ). Since a week testing fix (not sure it is proper, but no massive leak as before) for #14828, (the very first version of your patch you post is applied also), and it change also code in subgroup_default_originate() (before your change) maybe is best to take a look, could need to change something else too, or at least will tell me If im wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the chain of events that would cause a problem. Can you recreate this in a topotest? or more explicitly lay out the steps of your scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how or where to write o topotest (I am not familiar with all of that). Will try to explain more exactly (sorry it is so long) with code...
Here is the all subgroup_default_originate() function taken from stable/9.1/pr-15466 (with your changes), I will put comments in the code to can better explain what I mean...
void subgroup_default_originate(struct update_subgroup *subgrp, int withdraw) { struct bgp *bgp; struct attr attr = { 0 }; struct attr *new_attr = &attr; //Ivaylo new_attr = attr, all changes to new_attr will affect to attr and vice versa struct aspath *aspath; struct prefix p; struct peer *from; struct bgp_dest *dest; struct bgp_path_info *pi; struct peer *peer; struct bgp_adj_out *adj; route_map_result_t ret = RMAP_DENYMATCH; route_map_result_t new_ret = RMAP_DENYMATCH; afi_t afi; safi_t safi; safi_t safi_rib; int pref = 65536; int new_pref = 0; if (!subgrp) return; peer = SUBGRP_PEER(subgrp); afi = SUBGRP_AFI(subgrp); safi = SUBGRP_SAFI(subgrp); if (!(afi == AFI_IP || afi == AFI_IP6)) return; if (safi == SAFI_LABELED_UNICAST) safi_rib = SAFI_UNICAST; else safi_rib = safi; bgp = peer->bgp; from = bgp->peer_self; bgp_attr_default_set(&attr, bgp, BGP_ORIGIN_IGP); /* make coverity happy */ assert(attr.aspath); aspath = attr.aspath; attr.med = 0; attr.flag |= ATTR_FLAG_BIT(BGP_ATTR_MULTI_EXIT_DISC); if ((afi == AFI_IP6) || peer_cap_enhe(peer, afi, safi)) { /* IPv6 global nexthop must be included. */ attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL; /* If the peer is on shared nextwork and we have link-local nexthop set it. */ if (peer->shared_network && !IN6_IS_ADDR_UNSPECIFIED(&peer->nexthop.v6_local)) attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL; } // Ivaylo: In my debugs to catch #14828 (testing with 9.1 current stable) I put a simple: //zlog_err("We are here now"); // Ivaylo: And I see in the logs this message on every announce/withdraw/update message we receive // and/or announce no matter if it is default route (self generated and/or received) // or some other prefix. (It is another question, but do we really need to do this?) Obviously // if the .name ( char[] ) is not 0x00 (the default-originate route-map string name) // we will enter in the if scope. if (peer->default_rmap[afi][safi].name) { struct bgp_path_info tmp_pi = {0}; tmp_pi.peer = bgp->peer_self; SET_FLAG(bgp->peer_self->rmap_type, PEER_RMAP_TYPE_DEFAULT); /* Iterate over the RIB to see if we can announce * the default route. We announce the default * route only if route-map has a match. */ for (dest = bgp_table_top(bgp->rib[afi][safi_rib]); dest; dest = bgp_route_next(dest)) { if (!bgp_dest_has_bgp_path_info_data(dest)) continue; for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { struct attr tmp_attr = attr; tmp_pi.attr = &tmp_attr; // Ivaylo: route_map_apply_ext(), allocate new memory (on the heap) in tmp_pi.attr community/lcom/ecom + // sets their values if they are set in the route-map associated with default-originate clause in the // config. Also if the comm/lcomm/ecomm strings are changed in the RM, we will get a memory leak, because // we will lost their pointers to call free(), nor we can free them here in this stage yet. new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(dest), pi, &tmp_pi, &new_pref); if (new_ret == RMAP_PERMITMATCH) { // Ivaylo: I have no idea what for are new_pref and pref maybe some kind of preference or ? But if we don not // enter in this if() we never will free tmp_pi.attr comm/lcomm/ecomm and this cause #14828 if (new_pref < pref) { pref = new_pref; // IVAYLO: Here we flush new_attr ( the &attr !) So all what have been set before to it gone (eventually the old attr // data is still here if the route_map_apply_ext() (not changed)/copied it from the tmp_pi.attr ), and intern new // data in the attr structure based on tmp_pi.attr data. From this point to the end you will work with this &attr, // Donald. bgp_attr_flush(new_attr); new_attr = bgp_attr_intern( tmp_pi.attr); bgp_attr_flush(tmp_pi.attr); } subgroup_announce_reset_nhop( (peer_cap_enhe(peer, afi, safi) ? AF_INET6 : AF_INET), new_attr); ret = new_ret; } else bgp_attr_flush(&tmp_attr); //Ivaylo: We need to call flush on &tmp_attr = tmp_pi.attr no matter by any condition in the loop, because // on the next iteration we will eventually lost new assigned memory (because of route_map_apply_ext() ) // , and will have huge leak. I am not sure if bgp_attr_intern() allocate new heap memory and copy data in it, // or just copy the pointers (in attr). If second we, have not free() (call flush), because will do a mess. } } bgp->peer_self->rmap_type = 0; if (ret == RMAP_DENYMATCH) { /* * If its a implicit withdraw due to routemap * deny operation need to set the flag back. * This is a convertion of update flow to * withdraw flow. */ if (!withdraw && (!CHECK_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE))) SET_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE); withdraw = 1; } } /* Check if the default route is in local BGP RIB which is * installed through redistribute or network command */ memset(&p, 0, sizeof(p)); p.family = afi2family(afi); p.prefixlen = 0; dest = bgp_safi_node_lookup(bgp->rib[afi][safi_rib], safi_rib, &p, NULL); // Ivaylo: from this point down is your changes, Donald. Because the declarations on the top (attr = new_attr), // and my comments up, I hope explained better why there could be cases when you may work with wrong // attr data down. (I may be wrong of course, because I am not familiar with all the picture) // By the way, before your changes I think there was a leak, because attr never been flush in if(withdraw) // scope, and maybe can explains the leak on withdraw in #15459 (if among their testing routes was default // route) if (withdraw) { /* Withdraw the default route advertised using default * originate */ if (CHECK_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE)) subgroup_default_withdraw_packet(subgrp); UNSET_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE); /* If default route is present in the local RIB, advertise the * route */ if (dest) { for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { if (!CHECK_FLAG(pi->flags, BGP_PATH_SELECTED)) continue; if (subgroup_announce_check(dest, pi, subgrp, bgp_dest_get_prefix( dest), &attr, NULL)) { if (!bgp_adj_out_set_subgroup(dest, subgrp, &attr, pi)) bgp_attr_flush(&attr); } else bgp_attr_flush(&attr); } bgp_dest_unlock_node(dest); } } else { if (!CHECK_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE)) { /* The 'neighbor x.x.x.x default-originate' default will * act as an * implicit withdraw for any previous UPDATEs sent for * 0.0.0.0/0 so * clear adj_out for the 0.0.0.0/0 prefix in the BGP * table. */ if (dest) { /* Remove the adjacency for the previously * advertised default route */ adj = adj_lookup( dest, subgrp, BGP_ADDPATH_TX_ID_FOR_DEFAULT_ORIGINATE); if (adj != NULL) { /* Clean up previous advertisement. */ if (adj->adv) bgp_advertise_clean_subgroup( subgrp, adj); /* Free allocated information. */ adj_free(adj); } bgp_dest_unlock_node(dest); } /* Advertise the default route */ if (bgp_in_graceful_shutdown(bgp)) bgp_attr_add_gshut_community(new_attr); SET_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE); subgroup_default_update_packet(subgrp, new_attr, from); } } aspath_unintern(&aspath); }
At all I think this function is mess and its usage is mess, but more I wrote in the #14828 . Unfortunately I dont have the time and I need to learn and know much more for the all work flow to can rewrite it as I think it should be. Maybe during you write your patch, you have much better idea than me which/how work, and will greatly appreciate if you can help me fix #14828 , without screw up something else.
c0cdeda
to
edc632b
Compare
Can we squash "fixup" commit? |
…some-cases (#18246) Why I did it FRR issue FRRouting/frr#15459 Add patch for FRR PR for 202305 FRRouting/frr#15466 Work item tracking Microsoft ADO (number only): 26876083 How I did it Add patch for FRR PR FRRouting/frr#15466 How to verify it Run the original case locally
edc632b
to
38acef0
Compare
Seems valid failures yet. |
38acef0
to
5a43725
Compare
Currently in subgroup_default_originate the attr.aspath is set in bgp_attr_default_set, which hashs the aspath and creates a refcount for it. If this is a withdraw the subgroup_announce_check and bgp_adj_out_set_subgroup is called which will intern the attribute. This will cause the the attr.aspath to be set to a new value finally at the bottom of the function it intentionally uninterns the aspath which is not the one that was created for this function. This reduces the other aspath's refcount by 1 and if a clear bgp * is issued fast enough the aspath for that will be removed and the system will crash. Signed-off-by: Donald Sharp <[email protected]>
If unsuppress-map is setup for outgoing peers, consider that policy is being applied as for RFC 8212. Signed-off-by: Donald Sharp <[email protected]>
Customer has this valgrind trace: Direct leak of 2829120 byte(s) in 70728 object(s) allocated from: 0 in community_new ../bgpd/bgp_community.c:39 1 in community_uniq_sort ../bgpd/bgp_community.c:170 2 in route_set_community ../bgpd/bgp_routemap.c:2342 3 in route_map_apply_ext ../lib/routemap.c:2673 4 in subgroup_announce_check ../bgpd/bgp_route.c:2367 5 in subgroup_process_announce_selected ../bgpd/bgp_route.c:2914 6 in group_announce_route_walkcb ../bgpd/bgp_updgrp_adv.c:199 7 in hash_walk ../lib/hash.c:285 8 in update_group_af_walk ../bgpd/bgp_updgrp.c:2061 9 in group_announce_route ../bgpd/bgp_updgrp_adv.c:1059 10 in bgp_process_main_one ../bgpd/bgp_route.c:3221 11 in bgp_process_wq ../bgpd/bgp_route.c:3221 12 in work_queue_run ../lib/workqueue.c:282 The above leak detected by valgrind was from a screenshot so I copied it by hand. Any mistakes in line numbers are purely from my transcription. Additionally this is against a slightly modified 8.5.1 version of FRR. Code inspection of 8.5.1 -vs- latest master shows the same problem exists. Code should be able to be followed from there to here. What is happening: There is a route-map being applied that modifes the outgoing community to a peer. This is saved in the attr copy created in subgroup_process_announce_selected. This community pointer is not interned. So the community->refcount is still 0. Normally when a prefix is announced, the attr and the prefix are placed on a adjency out structure where the attribute is interned. This will cause the community to be saved in the community hash list as well. In a non-normal operation when the decision to send is aborted after the route-map application, the attribute is just dropped and the pointer to the community is just dropped too, leading to situations where the memory is leaked. The usage of bgp suppress-fib would would be a case where the community is caused to be leaked. Additionally the previous commit where an unsuppress-map is used to modify the outgoing attribute but since unsuppress-map was not considered part of outgoing policy the attribute would be dropped as well. This pointer drop also extends to any dynamically allocated memory saved by the attribute pointer that was not interned yet as well. So let's modify the return case where the decision is made to not send the prefix to the peer to always just flush the attribute to ensure memory is not leaked. Fixes: FRRouting#15459 Signed-off-by: Donald Sharp <[email protected]>
5a43725
to
addff17
Compare
Bgp filter fun (backport #15466)
Bgp filter fun (backport #15466)
Why I did it Porting fix for FRRouting/frr#15459 Adding patch for PR FRRouting/frr#15466 Work item tracking Microsoft ADO (number only): How I did it Ported the fix FRRouting/frr#15466 to 8.5.1 How to verify it Running the test_stress_route and ensure no memory leak
Porting PR #18272 to 202311. Why I did it Porting fix for FRRouting/frr#15459 Adding patch for PR FRRouting/frr#15466 Work item tracking Microsoft ADO (number only): How I did it Ported the fix FRRouting/frr#15466 to 8.5.1 How to verify it Running the test_stress_route and ensure no memory leak
Why I did it Upgrading FRR 8.5.4 to include latest fixes. Work item tracking Microsoft ADO (number only): How I did it New patches that were added: Patch FRR Pull request Issue fixed 0024-lib-use-snmp-s-large-fd-sets-for-agentx.patch FRRouting/frr#13396 FRRouting/frr#14143 0025-bgp-community-memory-leak-fix.patch FRRouting/frr#15466 FRRouting/frr#15459 0026-bgp-fib-suppress-announce-fix.patch FRRouting/frr#15634 FRRouting/frr#15626 0027-lib-Do-not-convert-EVPN-prefixes-into-IPv4-IPv6-if-n.patch FRRouting/frr#15418 FRRouting/frr#14419 Removed patches: Patch Upstream FRR commit that is present in 8.5.4 0019-zebra-Abstract-dplane_ctx_route_init-to-init-route-w.patch FRRouting/frr@3f01977 0020-zebra-Fix-crash-when-dplane_fpm_nl-fails-to-process-.patch FRRouting/frr@fe5f624 0022-bgpd-Don-t-read-the-first-byte-of-ORF-header-if-we-a.patch FRRouting/frr@3515178 0023-bgpd-Make-sure-we-have-enough-data-to-read-two-bytes.patch FRRouting/frr@460ee93 0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch FRRouting/frr@f291f1e 0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch FRRouting/frr@8a4a88c 0026-zebra-Add-encap-type-when-building-packet-for-FPM.patch FRRouting/frr@f0f7b28 0028-bgpd-Check-mandatory-attributes-more-carefully-for-U.patch FRRouting/frr@21418d6 0029-bgpd-Handle-MP_REACH_NLRI-malformed-packets-with-ses.patch FRRouting/frr@30b5c2a 0030-bgpd-Treat-EOR-as-withdrawn-to-avoid-unwanted-handli.patch FRRouting/frr@01f232c 0031-bgpd-Ignore-handling-NLRIs-if-we-received-MP_UNREACH.patch FRRouting/frr@a0c4ec2 0032-zebra-Fix-fpm-multipath-encap-addition.patch FRRouting/frr@10a9a5f Realigned patches: Old Patch New patch 0005-Add-support-of-bgp-l3vni-evpn.patch 0005-Add-support-of-bgp-l3vni-evpn.patch 0021-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch 0019-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch 0027-zebra-Fix-non-notification-of-better-admin-won.patch 0020-zebra-Fix-non-notification-of-better-admin-won.patch Disable-ipv6-src-address-test-in-pceplib.patch 0021-Disable-ipv6-src-address-test-in-pceplib.patch cross-compile-changes.patch 0022-cross-compile-changes.patch 0033-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch 0023-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch How to verify it Running sonic-mgmt test suite.
see individual commits
a) Allows unsuppress-maps to be part of the filtering system that would allow outgoing data to a peer.
b) Fix memory leak associated with late decision not to send data to peer in bgp