OSPF does not fully free up MTYPE memory on shutdown as it should #14855

Closed
donaldsharp opened this issue Nov 21, 2023 · 2 comments · Fixed by #14870
Labels
triage Needs further investigation

Comments

@donaldsharp
Member

While running our topotests, ospfd reports a number of memory allocations that are still active (i.e. leaked) at shutdown:

ospfd: 
  ## showing active allocations in memory group libfrr
ospfd:     Buffer                        :      2 *         24
ospfd:     Hash                          :      2 * (variably sized)
ospfd:     Hash Index                    :      1 * (variably sized)
ospfd:     Interface                     :      6 *        280
ospfd:     Connected                     :      5 *         48
ospfd:     Link List                     :     31 *         40
ospfd:     Link Node                     :     24 *         24
ospfd:     Prefix                        :      5 *         56
ospfd:     Stream                        :      4 * (variably sized)
ospfd:     Route table                   :     12 *         56
ospfd:     Route node                    :      2 *        120
ospfd:     Typed-hash bucket             :      2 * (variably sized)
ospfd:     VRF                           :      1 *        216
ospfd:     Zclient                       :      2 *       3248
ospfd: 
  ## showing active allocations in memory group logging subsystem
ospfd: 
  ## showing active allocations in memory group ospfd
ospfd:     BFD configuration data        :      1 *         76
ospfd:     OSPF if info                  :      6 *         40
ospfd:     OSPF if params                :      7 *        144
ospfd:     OSPF MPLS parameters          :      6 *        248
ospfd:     OSPF Extended parameters      :      6 *        136
ospfd:     OSPF opaque function table    :      6 *        112

As far as I can tell, this is a result of the OSPF GR (graceful restart) changes that were made. This should be cleaned up.

donaldsharp added the "triage" (Needs further investigation) label Nov 21, 2023
@donaldsharp
Member Author

As a note, this might not be a complete list of all the ospfd MTYPE leaks, since this is only one of the tests being run. There are probably more that should be looked at and handled.

rwestphal self-assigned this Nov 22, 2023
@IvayloJ

IvayloJ commented Nov 23, 2023

I don't know if there is any relation, but I observed strange behavior with GR (FRR 9.0.1) in bgpd (sorry for going off topic, but it could be useful). When 'graceful-restart-disable' is set per neighbor, after some time I see this in the logs: "buffer_flush_available: write error on fd XX: Bad file descriptor". That makes me think something is wrong with the in-memory structure that holds the peer connection info (where the fd is stored), maybe a wrong pointer swap for that structure (if the logic on a new peer connection is to forget the old one and create a new one), or something similar. I haven't debugged it enough to tell exactly.

rwestphal added a commit to opensourcerouting/frr that referenced this issue Nov 24, 2023
The ospfd cleanup code is relatively complicated given the need to
appropriately handle the "max-metric router-lsa on-shutdown (5-100)"
command. When that command is configured and an OSPF instance is
unconfigured, the removal of the instance should be deferred to allow
other routers sufficient time to find alternate paths before the
local Router-LSAs are flushed. When ospfd is killed, however, deferred
shutdown shouldn't take place and all instances should be cleared
immediately.

This commit fixes a problem where ospf_deferred_shutdown_finish()
was prematurely exiting the daemon when no instances were left,
inadvertently preventing ospf_terminate() from clearing the ospfd
global variables. Additionally, the commit includes code refactoring
to enhance readability and maintainability.

Fixes FRRouting#14855.

Signed-off-by: Renato Westphal <[email protected]>
rwestphal added a commit to opensourcerouting/frr that referenced this issue Dec 1, 2023
cscarpitta pushed a commit to cscarpitta/frr that referenced this issue Feb 9, 2024

3 participants