-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgpd from version 8.4 to 9.1 (current stable) memory leak when using default-originate route map (set large-community / set community) #14828
Comments
can we see the routemaps surrounding large community usage? |
Sure, I send you the complete bgpd.conf on your email, because it could be related and to the other config options I have no idea yet. Just spot bgpd crashing periodically couple of hours ago and so far it is all the debug info I have... |
More info:
|
More debug info: The memory leak come between versions 8.3.1 and 8.4.0 ! Here are some tests # show memory version 8.3.0 community : 500 40 20000 502 20080 community val : 500 variable 17808 502 17968 community str : 480 8192 3936000 481 3944200 Large Community : 1892 40 75696 1893 75736 <--- no change after 10 minutes work Large Community display string: 158 8192 1295600 158 1295600 Large Community value : 1892 variable 46240 1893 46312 <--- no change after 10 minutes work 8.3.1 community : 500 40 20016 501 20056 community val : 500 variable 17808 501 17944 community str : 480 8192 3936000 480 3936000 community-list : 35 56 1960 35 1960 community-list name : 35 variable 840 35 840 community-list entry : 35 56 2072 35 2072 community-list config : 32 variable 768 32 768 community-list handler : 1 120 136 1 136 Large Community : 1891 40 75640 1892 75680 <--- no change after 10 minutes work Large Community display string: 157 8192 1287400 158 1295600 Large Community value : 1891 variable 46232 1892 46256 <--- no change after 10 minutes work 8.4.0 community : 500 40 20000 501 20056 community val : 500 variable 17776 501 17912 community str : 482 8192 3952400 482 3952400 community-list : 35 56 1960 35 1960 community-list name : 35 variable 840 35 840 community-list entry : 35 56 2152 35 2152 community-list config : 32 variable 768 32 768 community-list handler : 1 120 136 1 136 Large Community : 116733 40 4680856 116734 4680896 <-- and keep growing with the time Large Community display string: 158 8192 1295600 158 1295600 Large Community value : 116733 variable 2810616 116734 2810640 <-- and keep growing with the time All statistics I am showing are collected after all neighbors was connected and all prefixes accepted/announced ! I dont know yet which part of the code cause memory leak when just set large-community, but after half a day compile and change version I can confirm the memory leak exists in all frr versions starting from 8.4.0. In first look it seems no changes in bgp_lcommunity.c and bgp_lcommunity.h. If anyone have some idea or memory what could cause this will greatly appreciate to share (will save me a lot of time to debug and reinvent the hot water :). |
After a lot of lurking and testing found the commit which makes the problem: In my config frr sets large community on the route map associated to default-originate . Maybe that cause memory leak and constantly very fast increasing LC structures , maybe that patch mess and another route maps I dont know yet (not very easy to change config logic in production environment just for testing). Meanwhile maybe someone more familiar (I will have to learn and get know all the code around first) can take a look over that commit and may have idea what is wrong ? UPDATE: ! ! Zebra configuration saved from vty ! 2024/02/04 10:25:24 ! ! loaded from 8.2.2 frr version 9.0.1 frr defaults traditional ! hostname AS1 password tt enable password tt log file /var/log/frr/bgpd.log ! ! ! router bgp 65501 bgp router-id 10.185.1.31 no bgp ebgp-requires-policy no bgp default ipv4-unicast no bgp graceful-restart notification bgp graceful-restart-disable no bgp network import-check neighbor 10.185.1.33 remote-as 65503 neighbor 10.185.1.33 description R1 neighbor 10.185.1.33 solo ! address-family ipv4 unicast network 10.1.2.0/24 network 10.1.3.0/24 network 10.1.4.0/24 neighbor 10.185.1.33 activate neighbor 10.185.1.33 default-originate route-map DEFR neighbor 10.185.1.33 soft-reconfiguration inbound neighbor 10.185.1.33 route-map T4-IN in neighbor 10.185.1.33 route-map T4-OUT out exit-address-family ! address-family ipv6 unicast exit-address-family ! exit ! access-list telnet-filter seq 5 permit 127.0.0.1/32 access-list telnet-filter seq 10 deny any ! bgp large-community-list standard defroute seq 1 permit 65503:300:300 ! ! route-map T4-IN permit 10 set large-community 65503:500:330 additive exit ! route-map T4-OUT permit 5 match large-community defroute set large-community 65501:300:330 exit ! route-map T4-OUT permit 10 set large-community 65501:400:330 exit ! route-map DEFR permit 10 set large-community 65501:300:300 exit ! ! line vty access-class telnet-filter exit ! After established connection with 10.185.1.33: AS1# show memory ... Large Community : 28 40 1136 28 1136 Large Community display string: 5 variable 904 5 904 Large Community value : 28 variable 912 28 912 ... AS1# clear bgp ipv4 10.185.1.33 wait to connection with the peer to be establish AS1# show memory ... Large Community : 35 40 1416 35 1416 Large Community display string: 5 variable 904 5 904 Large Community value : 35 variable 1080 35 1264 ... As you can see structures for LC are increased, and something is not freed. This leak apply to all frr versions after commit 8cb56fb . With snapshot frr-455b860f9db5e5be186a7d1606283b3a02989c64.zip (right before 8cb56 commit) I dont see increasing of the memory because Large Communities. |
More debug: The memory leak comes when use clause "set" in default-originate route map policy and is not present only when setting large-community. Setting community string in the default-originate RM also trigger huge memory leak (community structures start increasing). The worst is that once you put "set" clause in the route-map associated to default-originate the leak is triggered. After that nor delete the route-map itself nor removing route-map from the neighbor section stop increasing the memory in the bgpd process, the only you can do is to restart all frr daemons. Also I believe the things get worse if default route is learned and from other neighbor (even it is filtered on the neighbor input with prefix-list). I experienced and very strange behaviours when changing set clauses in the RM associated with default-originate. All can be simulated with very simple config (such I posted above) in a controlled environment. I manually remove commit 8cb56fb from current stable 9.1 and for more than 3 hours dont observe massive memory increase of the bgpd process. Of course changes on the fly in default-originate RM do not work as expected, but frr is usable and stable (for now). Maybe it is good to revert that commit and update all bin packages from version 8.3.2 up to the current stable. I believe many routers in the internet which uses frr announce default-originate and not few of them uses route-map with "set" clauses. I will keep looking and knowing the code, but it is very complicated and unclear. I dont think there is a simple quick fix, maybe there is a need of total rewrite of the default-originate RM algorithm. Or someone really who knows the work logic in details to take a look. |
I'll test this and let you know. |
Thank you very much Donatas ! To saiarcot896: |
I think maybe have an idea what cause the leak.... new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(dest), pi, &tmp_pi, &new_pref); This call route_map_apply_ext from lib/routemap.c ... /* * set cmds return RMAP_OKAY or * RMAP_ERROR. We do not care if * set succeeded or not. So, ignore * return code. */ (void)(*set->cmd->func_apply)( set->value, prefix, set_object); which call route_set_lcommunity -> lcommunity_dup -> lcommunity_new -> calloc new memory because in subgroup_default_originate we treat attr like blank new structure which need to be first initialized. Seems subgroup_default_originate functions is also called when there are withdraw/announce of the default route learned and from a peer. The check for applying of the default-originate route map ( if (peer->default_rmap[afi][safi].name)... ) is done (only by "if there is a name of the route-map" ?!? not by any real change or RM name change ?) every time in the if scope and not all pointers are freed ( call bgp_attr_flush() ) in these tmp structures, also there is mess how they are used. I dont understand what is the idea behind the "pref" pointer some sort of preference or ? At all It is posible to be done some properly free, but this is ugly and truly I think its not right. The logic is wrong, and maybe some functionality will not work this way. By me and from user's perspective the logic should be this: There are 2 ways to send default route to a peer:
Finally call subgroup_default_originate() which set attrs and need flags (from output route map if there are such), related with the default route which to be announce/withdraw to the peer. Please correct me if I am wrong somewhere in the logic or missing something |
Yes, already integrated Donald's patch in my testing 9.1 but it will not resolve this leak. I know what cause it, and testing since yesterday ugly (most likely and not very proper) but fast fix. The problem is in subgroup_default_originate(). When call route_map_apply_ext() it allocate new memory for data inside the stucture tmp_pi.attr (tmp_pi.attr = tmp_attr), which we lost because never free. Here is my test change in bgp_updgrp_adv.c -> subgroup_default_originate() new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(dest), pi, &tmp_pi, &new_pref); if (new_ret == RMAP_PERMITMATCH) { if (new_pref < pref) { pref = new_pref; bgp_attr_flush(new_attr); new_attr = bgp_attr_intern( tmp_pi.attr); // bgp_attr_flush(&tmp_attr); } subgroup_announce_reset_nhop( (peer_cap_enhe(peer, afi, safi) ? AF_INET6 : AF_INET), new_attr); ret = new_ret; } // else // bgp_attr_flush(&tmp_attr); bgp_attr_flush(&tmp_attr); } So far can confirm few things:
And now the long part :) In the current situation there are maybe and other potential problems: for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { struct attr tmp_attr = attr; tmp_pi.attr = &tmp_attr; So far as my knowledge (very basic) are over the compilers it depending on versions, bugs compile optimization options and e.t.c. on each iteration in the loop you may or may not get new memory for the "tmp_attr" , this memory may be freed on the scope exit or on the function exit (again differ from compiler to compiler). Which means potential stack crash (depending how much stack memory the thread/process can use) Also it is not very clear if the all memory of tmp_attr will be zeroed or will be filled with some random data in it (the compiler may or may not generate assembler code which will copy the all attr in tmp_attr, may copy only populated data in the attr). So I think best will be to get out of the loop scope declaration, zeroing ( memset(&tmp_attr,0,sizeof(struct attr));), assign the pointer tmp_pi.attr = &tmp_attr , and in the loop do memcpy(&tmp_attr, &attr, sizeof(struct attr)); on each iteration. Yes longer for write but much safer for use :). Have to check and "attr" family functions in attr.c , maybe to use them is best approach for work with struct attr. Also maybe is good to have signalling when there are change in the default-originate route map, because we dont have such. |
@ton31337 if (peer->default_rmap[afi][safi].name) { struct bgp_path_info tmp_pi = {0}; tmp_pi.peer = bgp->peer_self; SET_FLAG(bgp->peer_self->rmap_type, PEER_RMAP_TYPE_DEFAULT); /* Iterate over the RIB to see if we can announce * the default route. We announce the default * route only if route-map has a match. */ for (dest = bgp_table_top(bgp->rib[afi][safi_rib]); dest; dest = bgp_route_next(dest)) { if (!bgp_dest_has_bgp_path_info_data(dest)) continue; for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { struct attr tmp_attr = attr; tmp_pi.attr = &tmp_attr; new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(dest), pi, &tmp_pi, &new_pref); if (new_ret == RMAP_PERMITMATCH) { if (new_pref < pref) { pref = new_pref; bgp_attr_flush(new_attr); new_attr = bgp_attr_intern( tmp_pi.attr); bgp_attr_flush(tmp_pi.attr); } subgroup_announce_reset_nhop( (peer_cap_enhe(peer, afi, safi) ? AF_INET6 : AF_INET), new_attr); ret = new_ret; } else bgp_attr_flush(&tmp_attr); } } bgp->peer_self->rmap_type = 0; Change with: /* During my tests and debug, spot a situation where we have .name but * not and .map (almost every time on the start of the process). * Maybe init is in other thread which starts slower. Don know how to * handle such situation, but in all cases it is safer to enter. */ if (peer->default_rmap[afi][safi].name && peer->default_rmap[afi][safi].map) { struct bgp_path_info tmp_pi; struct attr tmp_attr; struct bgp_dest *pm_dest = NULL; struct bgp_path_info *pm_pi = NULL; memset(&tmp_pi, 0, sizeof(struct bgp_path_info)); memset(&tmp_attr, 0, sizeof(struct attr)); tmp_pi.peer = bgp->peer_self; tmp_pi.attr = &tmp_attr; SET_FLAG(bgp->peer_self->rmap_type, PEER_RMAP_TYPE_DEFAULT); /* Let's try to get lowest rule number in the route-map * Put it in IFs statements to protect from illegal reads. */ int lrm_rule = 0; if( peer->default_rmap[afi][safi].map->head ) if( peer->default_rmap[afi][safi].map->head->pref ) lrm_rule = peer->default_rmap[afi][safi].map->head->pref; /* This semi-conditional logic is sh.. - only on prefix list. * Much more logical/flexible/usable from user point * is to handle default route announce in the peer[s] output * route-map, in the default-originate route-map only to * permit set clauses for self generated default route. * But what is that is. Lets iterate over the RIB (can be * millions times) to check if we will have permit match * to announce default route. */ for (dest = bgp_table_top(bgp->rib[afi][safi_rib]); dest; dest = bgp_route_next(dest)) { if (!bgp_dest_has_bgp_path_info_data(dest)) continue; for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) { /* route_map_apply_ext() allocates new memory * in tmp_pi.attr ! Better to write new func * with the same logic (much faster as well) * to do only check, and when we know what to * do then to call route_map_apply_ext() */ new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(dest), pi, &tmp_pi, &new_pref); if (new_ret == RMAP_PERMITMATCH) { /* pref is the rule number in the RM * On which we have match */ if (new_pref < pref) { pref = new_pref; pm_dest = dest; pm_pi = pi; } ret = new_ret; } /* _MUST_ be flush on each itteration ! */ bgp_attr_flush(&tmp_attr); } /* We know which is lowest rule number (!0) in the RM * and we get match on it. No need to spin more here. */ if(ret == RMAP_PERMITMATCH && lrm_rule && lrm_rule == pref) break; } /* Final apply RM, Moving this from the loops above for speed. * pm_pi and pm_dest are pointers, must check them if they * still are here (in can they are released by another thread) */ if(ret == RMAP_PERMITMATCH && pm_pi && pm_dest ) { memcpy(&tmp_attr, &attr, sizeof(struct attr)); new_ret = route_map_apply_ext( peer->default_rmap[afi][safi].map, bgp_dest_get_prefix(pm_dest), pm_pi, &tmp_pi, &new_pref); bgp_attr_flush(new_attr); new_attr = bgp_attr_intern(&tmp_attr); bgp_attr_flush(&tmp_attr); subgroup_announce_reset_nhop( (peer_cap_enhe(peer, afi, safi) ? AF_INET6 : AF_INET), new_attr); } else ret = RMAP_DENYMATCH; bgp->peer_self->rmap_type = 0; This patch also will have significant speed impact in cases where there are no match clause in the default-originate RM. If you have idea how to check what is the condition of the first rule (permit/deny) will be great. So if for example we have in the config. route-map XXX deny 1 exit where obviously is pointless to spin these loops, ( in case of full internet bgp tables can take ~ 2 millions * RM-rules-number iterations). |
bgpd# show version
FRRouting 9.0.1 (gazgw) on Linux(6.1.0-12-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
system top reports:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15022 frr 20 0 6820200 6.4g 5640 S 0.3 20.4 1:31.66 bgpd
1668 frr 20 0 636332 186880 2040 S 0.0 0.6 59:08.28 zebra
bgpd -> show memory
~ 10 minutes later
system top reports:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15022 frr 20 0 11.2g 11.0g 5640 S 0.3 35.4 2:29.07 bgpd
1668 frr 20 0 636332 186880 2040 S 0.0 0.6 59:09.59 zebra
show memory
IPv4 Unicast Summary (VRF default):
BGP router identifier XXX.XXX.XXX.XXX, local AS number XXXXXX vrf-id 0
BGP table version 128817
RIB entries 235456, using 43 MiB of memory
Peers 13, using 265 KiB of memory
Peer groups 4, using 256 bytes of memory
IPv6 Unicast Summary (VRF default):
BGP router identifier XXX.XXX.XXX.XXX, local AS number XXXXXX vrf-id 0
BGP table version 254868
RIB entries 371565, using 68 MiB of memory
Peers 13, using 265 KiB of memory
Peer groups 4, using 256 bytes of memory
The OS is linux debian bookworm (all is upgrade to current stable). FRR is installed from package ( repository: https://deb.frrouting.org/frr)
The bgpd system memory usage keep growing until it not consume all the available memory (more than 25G after a hour or two) and kernel's oom-killer just kill the process. From what I see largest memory usage is from large community structures, in the config bgpd do set and match on every peer and peer-group (in the route maps).
I tried to debug with valgrind but it is hard, because that router is in production environment.
The text was updated successfully, but these errors were encountered: