Workaround for stupid systemd-networkd behaviour #2479
Comments
I have run some tests, and the deletion of IP addresses, routes and routeing rules is in every case detected by keepalived. If such an event occurs (and it shouldn't, because the addresses, routes and rules are keepalived's and not anyone else's), then keepalived will revert to backup state, and almost certainly then become master again (the exception would be if there is a higher priority VRRP instance that was held back from becoming master due to nopreempt).

There really is no excuse for any other process, whether it be systemd-networkd or not, to delete addresses, routes or rules that do not belong to it (in other words, that it did not create). A while ago we requested, and were allocated, a routeing protocol identifier for keepalived (value 18), and all routes and rules installed by keepalived are specified with that protocol id (see the description of protocol in the ip-route(8) man page). Unfortunately there is no equivalent for IP addresses.

@zviratko Can you please provide some specific examples of the problems you are experiencing. In other words, provide your keepalived configuration files, along with what actions are happening/commands being executed that cause the problem, what impact it has on keepalived, and ideally the keepalived log entries at the time.
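To illustrate the protocol id mentioned above: since keepalived tags its routes and rules with protocol 18, another tool could in principle tell them apart from its own. The following is only a minimal sketch of that idea, not anything keepalived or systemd-networkd actually runs. It assumes route entries shaped like the JSON printed by `ip -j route show`, where the owner appears in a "protocol" field either by name or by number; the helper name `keepalived_owned` and the sample data are hypothetical.

```python
import json

# Routeing protocol id allocated to keepalived, per the comment above.
KEEPALIVED_PROTO_ID = 18


def keepalived_owned(entries):
    """Return the entries whose "protocol" field names keepalived.

    The field may hold the symbolic name or the bare number, depending on
    whether the local protocol table knows the name (assumption).
    """
    wanted = {"keepalived", str(KEEPALIVED_PROTO_ID)}
    return [e for e in entries if str(e.get("protocol", "")) in wanted]


# Hypothetical sample, shaped like `ip -j route show` output.
# All addresses are from documentation ranges.
SAMPLE = json.loads("""
[
  {"dst": "default", "gateway": "192.0.2.1", "dev": "eth0", "protocol": "dhcp"},
  {"dst": "198.51.100.0/24", "gateway": "192.0.2.254", "dev": "eth0", "protocol": "keepalived"},
  {"dst": "203.0.113.0/24", "dev": "eth1", "protocol": "18"},
  {"dst": "192.0.2.0/24", "dev": "eth0", "protocol": "kernel"}
]
""")

if __name__ == "__main__":
    # List only the routes that belong to keepalived.
    for route in keepalived_owned(SAMPLE):
        print(route["dst"], "dev", route["dev"])
```

A network manager applying this kind of filter before cleaning up "foreign" entries would leave keepalived's routes and rules alone. Whether systemd-networkd can be configured to do so is a separate question, discussed further down the thread.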
I see (and understand). Some docs talk about "reinstating" addresses and routes, but I wasn't able to confirm whether it was really implemented that way.

Interesting note about nopreempt: I have it set so that my firewalls don't flip-flop (it should stick to the last healthy node). Not sure what the correct setup for that is then? Keepalived for sure either didn't notice a rule missing or didn't transition to BACKUP due to my misconfiguration.

I'm not sure I can provide anything truly reproducible, except trying to delete something by hand (which I'm willing to do one day during maintenance, as this is in production). Sometimes when a VM goes up/down and its interface is deleted, or when I do "networkctl reload", or maybe on a full moon, systemd-networkd just decides to delete something. Keepalived usually transitioned to BACKUP; this was probably the first time it didn't, yet a crucial ip rule was missing.

Over time, I added:

Now I also added to systemd-networkd config, which should prevent it from deleting routes and rules.
The weird thing is, that sometimes it (networkd) just doesn't do that and everything works. Sometimes it goes crazy. But that's not really an issue for this repository (it is too civilized for this debate). I know the "right" thing to do is to boycott systemd or at least not use the networkd component, but it's going to be hard (and there's nothing to switch to unless I want to run Gentoo with openrc/netrc). configfile below (public IPs redacted)
Just a couple of comments on your configuration.
I said in my previous post that it was not possible to set a "protocol" for ip addresses in the way that can be done for routes and rules. I have since discovered that kernel commit 47f0bd503210 added exactly that feature, which first appeared in Linux v5.18 and was first supported by v6.4.0 of the iproute utility. I will add support for this in keepalived.
Thank you for taking a look
I did both of these in an attempt to make keepalived as "lenient" as possible. I am surprised this is the only obvious extra stuff that's in there :-) Btw, with this config, keepalived sometimes just doesn't execute the backup script on failover. Sadly it was not reproducible, and it occurred only after a firewall had been running in MASTER state for some time (like a week). I didn't open an issue because I know the right thing to do is to use the FIFO, but in case you see anything in there that might be causing that... It could at least be useful to log that keepalived is trying to execute the script (or isn't, for some reason), as all I can say is that it never reached the first line in the script.
Cool, but systemd-networkd still requires configuration for that (and there's no filtering for "ignore proto keepalived"). Maybe it would be better to use "proto kernel" by default when running under systemd so it gets ignored? I can't imagine anyone not wanting everything to survive systemd-networkd interference. It took me a good while to realize what is happening...
I know issues connected to this have been discussed before, but could keepalived maybe better handle systemd-networkd deleting things on reload?
In particular
Unfortunately, systemd-networkd is not only the engine behind most other stuff (netplan, networkmanager), but also the most featureful network manager if one needs stuff like vlan aware bridges, routing rules, special network settings (and it is somewhat declarative in its behaviour which is nice).
It would be better for systemd-networkd to allow fixing this (it already somewhat does for VIPs but rules just disappear on me), but that's not feasible (I would file a bug in their GitHub but Lennart banned me for making a good argument years ago and it would get ignored anyway because they know better).
Feel free to include a derogatory log message aimed at systemd when keepalived fixes stuff in this instance :-)
Thanks.