-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zebra final shutdown finally #14811
Zebra final shutdown finally #14811
Conversation
ci:rerun |
b0e588d
to
49bbbd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - I'd like to look into some of the zebra changes, there's already quite a bit of mpls cleanup (for example) in other places, may make sense just to enhance that.
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
On shutdown go through and ensure that any contexts the dplane provider holds are freed. Signed-off-by: Donald Sharp <[email protected]>
The MTYPE_BGP memory type was being over used as both the handler for the bgp instance itself as well as memory associated with name strings. Let's separate out the two. Signed-off-by: Donald Sharp <[email protected]>
When bgp is shutting down, it calls bgp_fsm_change_status on everything including a self peer, which goes through and cleans the tables of the self peer data structures as if it's a real peer. Add a bit of code to just not do the work at all. This allows unlocks to flow a bit further and for the self peer to be deleted on shutdown. Signed-off-by: Donald Sharp <[email protected]>
a) Rename rib_init to zebra_rib_init() to better follow how things are named b) on shutdown cycle through the rib_dplane_q and free up any contexts sitting in it. Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
49bbbd7
to
71f7ecb
Compare
ci:rerun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
I guess it depends on whether we want to get these in now, or wait on the other enhancement pr's? |
I'd like to see this merged now. At least to get rid of all the memleaks |
It appears that when this was merged with master it blew up the ASAN reports we had spent so much time getting clean. I am unclear at this time as to why things procedded forward after the master branch began failing what where suppose to be fatal ASAN tests. You can see the merge to master and the leaks in the ASAN1 report at https://ci1.netdef.org/browse/FRR-FRR-7233 |
I had said I would look into this, but I thought this was a different PR. :) I'll let @donaldsharp address this first. Why wasn't this failure flagged on the pre-merge CI test though? |
It appears as if some communications between Martin and myself was not clear and the only test that became fatal was post-merge into master. Though it is still unclear how even this failure went unaddressed for 30 days. Sadly with the leaks caused by this PR in the tree we can not attempt to turn on a pre-merge fatal state as then ALL PR runs would fail. |
This PR was all about fixing leaks, so it's odd/ironic that it caused them. In any case presumably we can get them fixed soon and enable failure in CI going forward. |
My gut feeling from reading through some of the ASAN reports is that this "fix" mearly freed top level structures that had pointers inside them to other data. Most of the ASAN reports are for a massive slew of "indirect" leeks. I tried to create a revert PR from this one, but that fails, probably due to intervening changes. IMHO, we should first get these changes all backed out and get back to the clean ASAN state ASAP, fix the CI to cause pre-PR runs to be fatal for ASAN, then go back after what this purposed to fix. If my "git" foo was only better... |
See individual commits but more examples of memory leaks on shutdown that we need to clean up.