
keepalived coredump #2502

Open
swimlessbird opened this issue Nov 8, 2024 · 10 comments


@swimlessbird

swimlessbird commented Nov 8, 2024

Describe the bug
Bringing a network interface down while reloading keepalived causes a core dump.

To Reproduce
Possibly by running ip link set dev xxx down and then reloading keepalived.
The interface has the unicast_src_ip address configured on it.

Expected behavior
no coredump

Keepalived version

Keepalived v2.2.4 (08/21,2021)

Copyright(C) 2001-2021 Alexandre Cassen, <[email protected]>

Built with kernel headers for Linux 5.10.0
Running on Linux 5.10.0-136.12.0.86.h1905.eulerosv2r12.x86_64 #1 SMP Wed Jul 3 06:00:00 UTC 2024
Distro: EulerOS 2.0 (SP12x86_64)

configure options: --build=x86_64-openEuler-linux-gnu --host=x86_64-openEuler-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-sha1 --with-init=systemd --enable-snmp --enable-snmp-rfc build_alias=x86_64-openEuler-linux-gnu host_alias=x86_64-openEuler-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/generic-hardened-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection  LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/generic-hardened-ld

Config options:  LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd

System options:  VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK

Distro (please complete the following information):

  • Name: openEuler
  • Version: 2203lts
  • Architecture: x86_64

Details of any containerisation or hosted service (e.g. AWS)
no

Configuration file:

global_defs {
    router_id 1
    script_user root
    vrrp_garp_master_refresh 15
}
vrrp_instance LVS_HW {
    state BACKUP
    interface eth0
    virtual_router_id 1
    priority 20
    advert_int 1
    nopreempt
    virtual_ipaddress {
        xxx dev eth0
        xxx  dev eth1
        xxx  dev eth0
    }
    virtual_ipaddress_excluded {
    }
    notify_master "xxx"
    notify_backup "xxx"
    notify_fault "xxx"
    unicast_src_ip 192.167.18.108
    unicast_peer {
        192.167.18.107
    }
}
include vsconf/*.conf
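Since the crash depends on whether unicast_src_ip (192.167.18.108 in the configuration above) is actually assigned to an interface when keepalived reloads, it can help to check that just before sending the reload. The following is a minimal C sketch, assuming Linux/glibc getifaddrs(); the function name ipv4_addr_is_configured() is hypothetical and not part of keepalived, which learns about address changes via netlink instead (see the "Deassigned address" log lines).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return true if the IPv4 address in addr_str is
 * currently assigned to some interface. If ifname_out is non-NULL, the
 * name of the interface holding the address is copied into it. */
static bool ipv4_addr_is_configured(const char *addr_str, char *ifname_out, size_t ifname_len)
{
    struct in_addr want;
    struct ifaddrs *ifap, *ifa;
    bool found = false;

    assert(addr_str != NULL);
    if (inet_pton(AF_INET, addr_str, &want) != 1)
        return false;                 /* not a valid dotted-quad address */
    if (getifaddrs(&ifap) == -1)
        return false;                 /* cannot enumerate interfaces */

    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;                 /* skip IPv6, link-layer, unaddressed */
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            if (ifname_out != NULL && ifname_len > 0) {
                strncpy(ifname_out, ifa->ifa_name, ifname_len - 1);
                ifname_out[ifname_len - 1] = '\0';
            }
            found = true;
            break;
        }
    }
    freeifaddrs(ifap);
    return found;
}
```

For example, if ipv4_addr_is_configured("192.167.18.108", name, sizeof name) returns false right before a reload, keepalived's bind of unicast_src will fail in the way shown by the "bind unicast_src ... failed 99" log lines.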

Notify and track scripts

NA

System Log entries

Nov  5 10:15:30 EulerOS ifdown[497431]: You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
Nov  5 10:15:30 EulerOS ifdown[497432]: 'network-scripts' will be removed from distribution in near future.
Nov  5 10:15:30 EulerOS ifdown[497433]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: Deassigned address 192.167.18.108 from interface eth0
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: Netlink reports eth0 down
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Entering FAULT STATE
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: Deassigned address fe80::f816:3eff:fe80:685c from interface eth0
Nov  5 10:15:30 EulerOS ifup[497499]: You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
Nov  5 10:15:30 EulerOS ifup[497500]: 'network-scripts' will be removed from distribution in near future.
Nov  5 10:15:30 EulerOS ifup[497501]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: Netlink reports eth0 up
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Entering BACKUP STATE
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: LVS_HW: sending gratuitous ARP for 192.167.18.108
Nov  5 10:15:30 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.108
Nov  5 10:15:31 EulerOS Keepalived_vrrp[484707]: Assigned address fe80::f816:3eff:fe80:685c for interface eth0
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Receive advertisement timeout
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Entering MASTER STATE
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) setting VIPs.
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Sending/queueing gratuitous ARPs on eth0 for 192.167.18.110
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.110
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.110
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.110
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.110
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Sending gratuitous ARP on eth0 for 192.167.18.110

Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: Reloading
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Reloading
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Registering Kernel netlink reflector
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Registering Kernel netlink command channel
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Assigned address 192.167.18.110 for interface eth0
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: Assigned address fe80::f816:3eff:fe80:685c for interface eth0
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) setting VIPs.
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: bind unicast_src 192.167.18.108 failed 99 - Cannot assign requested address
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW): entering FAULT state (src address not configured)
Nov  5 10:15:34 EulerOS Keepalived_vrrp[484707]: (LVS_HW) Entering FAULT STATE
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: Warning: Virtual server FWM 99: protocol specified for fwmark - protocol will be ignored
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: WARNING - fwmark virtual server FWM 99, real server [192.167.18.107]:none:5000 has port specified - clearing
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: Warning: Virtual server FWM 98: protocol specified for fwmark - protocol will be ignored
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: WARNING - fwmark virtual server FWM 98, real server [192.167.18.107]:none:8443 has port specified - clearing
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: Gained quorum 1+0=1 <= 2 for VS FWM 99
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: Gained quorum 1+0=1 <= 2 for VS FWM 98
Nov  5 10:15:34 EulerOS Keepalived_healthcheckers[484862]: reload has finished
Nov  5 10:15:34 EulerOS Keepalived[484706]: Reload complete
Nov  5 10:15:34 EulerOS Keepalived[484706]: Keepalived_vrrp exited due to segmentation fault (SIGSEGV).
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Please report a bug at https://github.com/acassen/keepalived/issues
Nov  5 10:15:34 EulerOS Keepalived[484706]:  and include this log from when keepalived started, a description
Nov  5 10:15:34 EulerOS Keepalived[484706]:  of what happened before the crash, your configuration file and the details below.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Also provide the output of keepalived -v, and whether keepalived is being
Nov  5 10:15:34 EulerOS Keepalived[484706]:  run in a container or VM.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  A failure to provide all this information may mean the crash cannot be investigated.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  If you are able to provide a stack backtrace with gdb that would really help.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Source version 2.2.4
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Built with kernel headers for Linux 5.10.0
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Running on Linux 5.10.0-136.12.0.86.h1905.eulerosv2r12.x86_64 #1 SMP Wed Jul 3 06:00:00 UTC 2024
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Command line: 'keepalived' '-D' '-p' '/var/run/keepalived.pid' '-r' '/var/run/vrrp_lvs.pid' '-c'
Nov  5 10:15:34 EulerOS Keepalived[484706]:                '/var/run/checkers_lvs.pid'
Nov  5 10:15:34 EulerOS Keepalived[484706]:  configure options: --build=x86_64-openEuler-linux-gnu --host=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --infodir=/usr/share/info --enable-sha1 --with-init=systemd --enable-snmp
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --enable-snmp-rfc build_alias=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     host_alias=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -fstack-protector-strong -grecord-gcc-switches
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -specs=/usr/lib/rpm/generic-hardened-cc1 -m64 -mtune=generic
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -fasynchronous-unwind-tables -fstack-clash-protection  LDFLAGS=-Wl,-z,relro
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -Wl,-z,now -specs=/usr/lib/rpm/generic-hardened-ld
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Config options: LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd
Nov  5 10:15:34 EulerOS Keepalived[484706]:  System options: VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
Nov  5 10:15:34 EulerOS Keepalived[484706]: pid 484707 exited due to segmentation fault (SIGSEGV).
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Please report a bug at https://github.com/acassen/keepalived/issues
Nov  5 10:15:34 EulerOS Keepalived[484706]:  and include this log from when keepalived started, a description
Nov  5 10:15:34 EulerOS Keepalived[484706]:  of what happened before the crash, your configuration file and the details below.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Also provide the output of keepalived -v, and whether keepalived is being
Nov  5 10:15:34 EulerOS Keepalived[484706]:  run in a container or VM.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  A failure to provide all this information may mean the crash cannot be investigated.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  If you are able to provide a stack backtrace with gdb that would really help.
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Source version 2.2.4
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Built with kernel headers for Linux 5.10.0
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Running on Linux 5.10.0-136.12.0.86.h1905.eulerosv2r12.x86_64 #1 SMP Wed Jul 3 06:00:00 UTC 2024
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Command line: 'keepalived' '-D' '-p' '/var/run/keepalived.pid' '-r' '/var/run/vrrp_lvs.pid' '-c'
Nov  5 10:15:34 EulerOS Keepalived[484706]:                '/var/run/checkers_lvs.pid'
Nov  5 10:15:34 EulerOS Keepalived[484706]:  configure options: --build=x86_64-openEuler-linux-gnu --host=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --infodir=/usr/share/info --enable-sha1 --with-init=systemd --enable-snmp
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     --enable-snmp-rfc build_alias=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     host_alias=x86_64-openEuler-linux-gnu
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -fstack-protector-strong -grecord-gcc-switches
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -specs=/usr/lib/rpm/generic-hardened-cc1 -m64 -mtune=generic
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -fasynchronous-unwind-tables -fstack-clash-protection  LDFLAGS=-Wl,-z,relro
Nov  5 10:15:34 EulerOS Keepalived[484706]:                     -Wl,-z,now -specs=/usr/lib/rpm/generic-hardened-ld
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Config options: LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd
Nov  5 10:15:34 EulerOS Keepalived[484706]:  System options: VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE
Nov  5 10:15:34 EulerOS Keepalived[484706]:                  GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
Nov  5 10:15:34 EulerOS Keepalived[484706]: VRRP child process(484707) died: Respawning
Nov  5 10:15:34 EulerOS Keepalived[484706]:  Please log an issue at https://github.com/acassen/keepalived/issues/
Nov  5 10:15:34 EulerOS Keepalived[484706]:  and include a full copy of your keepalived configuration files, and
Nov  5 10:15:34 EulerOS Keepalived[484706]:  copies of the keepalived system log entries around the time this happened

Did keepalived coredump?

Yes. Stack trace:
 (gdb) bt
#0  0x000055da36d6785e in vrrp_send_pkt (vrrp=vrrp@entry=0x55da385a2280, peer=peer@entry=0x55da385a2e20) at vrrp.c:1477
#1  0x000055da36d69090 in vrrp_send_adv (vrrp=vrrp@entry=0x55da385a2280, prio=prio@entry=0 '\000') at vrrp.c:1526
#2  0x000055da36d694e1 in vrrp_state_leave_master (vrrp=vrrp@entry=0x55da385a2280, advF=advF@entry=true) at vrrp.c:1804
#3  0x000055da36d771f9 in down_instance (vrrp=vrrp@entry=0x55da385a2280) at vrrp_track.c:546
#4  0x000055da36d6d59c in open_sockpool_socket (sock=sock@entry=0x55da38593190) at vrrp.c:2507
#5  0x000055da36d71df0 in vrrp_open_sockpool (l=0x55da3859b8a0) at vrrp_scheduler.c:526
#6  vrrp_dispatcher_init (thread=<optimized out>) at vrrp_scheduler.c:563
#7  0x000055da36d9ae5e in thread_call (thread=0x55da3859b7a0) at scheduler.c:1975
#8  process_threads (m=0x55da385939d0) at scheduler.c:2039
#9  0x000055da36d9b875 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2168
#10 0x000055da36d5f1e7 in start_vrrp_child () at vrrp_daemon.c:1138
#11 0x000055da36d27886 in start_keepalived (thread=<optimized out>) at main.c:572
#12 0x000055da36d9ae5e in thread_call (thread=0x55da38593950) at scheduler.c:1975
#13 process_threads (m=0x55da385932d0) at scheduler.c:2039
#14 0x000055da36d9b875 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2168
#15 0x000055da36d299f1 in keepalived_main (argc=<optimized out>, argv=0x7ffc7c3db958) at main.c:2733
#16 0x00007f9f89b67390 in ?? () from /usr/lib64/libc.so.6
#17 0x00007f9f89b6743c in __libc_start_main () from /usr/lib64/libc.so.6
#18 0x000055da36d26775 in _start ()


NULL pointer dereference in vrrp_send_pkt() (vrrp->sockets is NULL):
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `keepalived -D -p /var/run/keepalived.pid -r /var/run/vrrp_lvs.pid -c /var/run/c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055da36d6785e in vrrp_send_pkt (vrrp=vrrp@entry=0x55da385a2280, peer=peer@entry=0x55da385a2e20) at vrrp.c:1477
1477            return sendmsg(vrrp->sockets->fd_out, &msg, (peer) ? 0 : MSG_DONTROUTE);
Missing separate debuginfos, use: dnf debuginfo-install file-libs-5.41-2.h2.eulerosv2r12.x86_64 glibc-2.34-105.h38.eulerosv2r12.x86_64 iptables-libs-1.8.7-11.h8.eulerosv2r12.x86_64 libmnl-1.0.5-1.h2.eulerosv2r12.x86_64 libnftnl-1.2.0-2.h1.eulerosv2r12.x86_64 libnl3-3.7.0-1.h7.eulerosv2r12.x86_64 libpcap-1.10.1-3.eulerosv2r12.x86_64 libxcrypt-4.4.26-4.h2.eulerosv2r12.x86_64 lm_sensors-3.6.0-5.h2.eulerosv2r12.x86_64 net-snmp-5.9.1-5.h4.eulerosv2r12.x86_64 net-snmp-libs-5.9.1-5.h4.eulerosv2r12.x86_64 openssl-libs-1.1.1m-15.h41.eulerosv2r12.x86_64 perl-libs-5.34.0-6.h11.eulerosv2r12.x86_64 zlib-1.2.11-22.h7.eulerosv2r12.x86_64
(gdb) list
1472            if (vrrp->family == AF_INET && do_checksum_debug)
1473                    check_tx_checksum(vrrp, peer);
1474    #endif
1475
1476            /* Send the packet */
1477            return sendmsg(vrrp->sockets->fd_out, &msg, (peer) ? 0 : MSG_DONTROUTE);
1478    }
1479
1480    /* Allocate the sending buffer */
1481    static void
(gdb) p vrrp->sockets
$1 = (sock_t *) 0x0
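The gdb session shows that vrrp->sockets is NULL when vrrp_send_pkt() reaches sendmsg() at vrrp.c:1477, so reading ->fd_out faults. To illustrate the failure mode, here is a defensive guard sketch. The sock_t and vrrp_t below are hypothetical, heavily trimmed stand-ins (only the field names sockets and fd_out come from the gdb output), and send_pkt_guarded() is not keepalived's API. A guard like this only avoids the crash; the underlying problem is that the instance tries to send an advert at all when it has no outbound socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, heavily trimmed stand-ins for keepalived's types; only the
 * field names "sockets" and "fd_out" are taken from the gdb output. */
typedef struct {
    int fd_in;    /* hypothetical: inbound descriptor */
    int fd_out;   /* outbound descriptor passed to sendmsg() */
} sock_t;

typedef struct {
    sock_t *sockets;   /* NULL when the outbound socket could not be opened or bound */
} vrrp_t;

/* Same call as the failing line 1477, but refuse to dereference a missing
 * socket. Returns sendmsg()'s result, or -1 with errno set to EBADF when
 * there is no usable outbound descriptor. */
static ssize_t send_pkt_guarded(const vrrp_t *vrrp, const struct msghdr *msg, int have_peer)
{
    assert(vrrp != NULL && msg != NULL);

    if (vrrp->sockets == NULL || vrrp->sockets->fd_out < 0) {
        errno = EBADF;
        return -1;
    }
    /* Unicast peers are routed normally; multicast adverts use MSG_DONTROUTE. */
    return sendmsg(vrrp->sockets->fd_out, msg, have_peer ? 0 : MSG_DONTROUTE);
}
```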

Additional context
NA

@swimlessbird
Author

Thank you very much for looking into this.

@pqarmitage
Collaborator

pqarmitage commented Nov 9, 2024

@swimlessbird Please ignore my earlier request; I hadn't noticed that you had already shown that vrrp->sockets == NULL.

@pqarmitage
Collaborator

I have reproduced this problem with keepalived v2.2.4. The problem is that that version did not properly handle the unicast_src_ip being deleted: it did not put the VRRP instance into fault state when the address was deleted. When keepalived reloads its configuration, it opens the outbound socket and attempts to bind it to the unicast_src_ip, which of course fails. It then closes the socket, so vrrp->sockets is not set (it stays NULL), and the later sendmsg() in vrrp_send_pkt() dereferences that NULL pointer.
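The bind failure described above can be reproduced in isolation. This sketch assumes Linux with net.ipv4.ip_nonlocal_bind set to 0, and uses an unprivileged UDP socket instead of the raw VRRP socket keepalived opens, since bind() rejects a non-local IPv4 source address in the same way for both. 192.0.2.1 (TEST-NET-1) stands in for an address that is not assigned to any interface, and open_bound_socket() is a hypothetical helper. On Linux EADDRNOTAVAIL is 99, matching "failed 99 - Cannot assign requested address" in the logs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket and bind it to source address src.
 * Returns the fd on success. On failure the socket is closed (the
 * "open, bind fails, close" sequence) and -1 is returned with errno kept. */
static int open_bound_socket(const char *src)
{
    struct sockaddr_in sin;
    int fd, saved;

    assert(src != NULL);
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = 0;                       /* any port; only the address matters */
    if (inet_pton(AF_INET, src, &sin.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        saved = errno;                      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;                          /* caller has no descriptor to store */
    }
    return fd;
}
```

Because the descriptor is closed on failure, the caller has nothing to keep, which corresponds to vrrp->sockets being left NULL in the reported crash.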

This problem has been resolved in v2.3.2 (the latest version), although I have not tracked down the specific commit that fixed it. keepalived now puts the VRRP instance into fault state if the unicast_src_ip is deleted, and brings the VRRP instance out of fault state when the IP address is added back (provided that everything else is OK).

I suggest you upgrade keepalived to v2.3.2 to resolve this issue. You will find many improvements and bug fixes since v2.2.4 was released over 3 years ago.

@swimlessbird
Author

I have reproduced this problem with keepalived v2.3.2.
The log:

Nov 18 19:57:10 localhost systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.1504] audit: op="connections-load" args="/etc/sysconfig/network-scripts/ifcfg-eth2" pid=2073144 uid=0 result="success"
Nov 18 19:57:12 localhost ifdown[2073155]: You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
Nov 18 19:57:12 localhost ifdown[2073156]: 'network-scripts' will be removed from distribution in near future.
Nov 18 19:57:12 localhost ifdown[2073157]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.2475] device (eth2): state change: activated -> deactivating (reason 'user-requested', sys-iface-state: 'managed')
Nov 18 19:57:12 localhost dbus-daemon[897]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=1166 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.2489] audit: op="device-disconnect" interface="eth2" ifindex=4 pid=2073168 uid=0 result="success"
Nov 18 19:57:12 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Nov 18 19:57:12 localhost dbus-daemon[897]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 18 19:57:12 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.2661] device (eth2): state change: deactivating -> disconnected (reason 'user-requested', sys-iface-state: 'managed')
Nov 18 19:57:12 localhost kernel: [202614.807486] print_fib6_table_status: 1 callbacks suppressed
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.2723] policy: set-hostname: current hostname was changed outside NetworkManager: 'localhost.localdomain'
Nov 18 19:57:12 localhost NetworkManager[1166]: <info>  [1731931032.2725] policy: set-hostname: set hostname to 'glen-06-b2-v4wan-168105-cust346.vm9.cable.virginm.net' (from address lookup)
Nov 18 19:57:12 localhost dbus-daemon[897]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.3' (uid=0 pid=1166 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Nov 18 19:57:12 localhost systemd[1]: Starting Hostname Service...
Nov 18 19:57:12 localhost dbus-daemon[897]: [system] Successfully activated service 'org.freedesktop.hostname1'
Nov 18 19:57:12 localhost systemd[1]: Started Hostname Service.
Nov 18 19:57:12 localhost systemd-hostnamed[2073176]: Hostname set to <localhost.localdomain> (static)
Nov 18 19:57:13 localhost systemctl[2073218]: [systemctl reload keepalived] called by PID 2073136 (bash test.sh)
Nov 18 19:57:13 localhost systemd[1]: Reloading LVS and VRRP High Availability Monitor...
Nov 18 19:57:13 localhost Keepalived[2072747]: Reloading ...
Nov 18 19:57:13 localhost Keepalived[2072747]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 18 19:57:13 localhost Keepalived[2072747]: Configuration file /etc/keepalived/keepalived.conf
Nov 18 19:57:13 localhost systemd[1]: Reloaded LVS and VRRP High Availability Monitor.
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Reloading
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: vrrp_ipsets has been specified but not vrrp_iptables - vrrp_ipsets will be ignored
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: (LVS_HW) setting VIPs.
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: (LVS_HW) Sending/queueing gratuitous ARPs on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: (LVS_HW) Sending/queueing gratuitous ARPs on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 172.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: Sending gratuitous ARP on eth1 for 80.4.67.91
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: bind unicast_src 55.55.55.1 failed 99 - Cannot assign requested address
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: (LVS_HW): entering FAULT state (src address not configured)
Nov 18 19:57:13 localhost Keepalived_vrrp[2072749]: (LVS_HW) Entering FAULT STATE
Nov 18 19:57:13 localhost Keepalived[2072747]: Reload complete
Nov 18 19:57:13 localhost kernel: [202615.686541] keepalived[2072749]: segfault at 2c ip 00005600bb2b441e sp 00007ffe749f5620 error 4 in keepalived[5600bb276000+7f000]
Nov 18 19:57:13 localhost kernel: [202615.686555] Code: 00 31 d2 4c 8d 5c 24 10 eb 16 66 90 4c 89 54 24 10 31 d2 4c 8d 5c 24 10 c7 44 24 18 10 00 00 00 49 8b 81 58 02 00 00 4c 89 de <8b> 78 2c e8 5a 25 fc ff 48 8b 8c 24 58 01 00 00 64 48 2b 0c 25 28
Nov 18 19:57:13 localhost systemd[1]: Started Process Core Dump (PID 2073222/UID 0).
Nov 18 19:57:13 localhost Keepalived[2072747]: pid 2072749 exited due to segmentation fault (SIGSEGV).
Nov 18 19:57:13 localhost Keepalived[2072747]:  Please report a bug at https://github.com/acassen/keepalived/issues
Nov 18 19:57:13 localhost Keepalived[2072747]:  and include this log from when keepalived started, a description
Nov 18 19:57:13 localhost Keepalived[2072747]:  of what happened before the crash, your configuration file and the details below.
Nov 18 19:57:13 localhost Keepalived[2072747]:  Also provide the output of keepalived -v, and whether keepalived is being
Nov 18 19:57:13 localhost Keepalived[2072747]:  run in a container or VM.
Nov 18 19:57:13 localhost Keepalived[2072747]:  A failure to provide all this information may mean the crash cannot be investigated.
Nov 18 19:57:13 localhost Keepalived[2072747]:  If you are able to provide a stack backtrace with gdb that would really help.
Nov 18 19:57:13 localhost Keepalived[2072747]:  Source version 2.3.2
Nov 18 19:57:13 localhost Keepalived[2072747]:  Built with kernel headers for Linux 5.10.0
Nov 18 19:57:13 localhost Keepalived[2072747]:  Running on Linux 5.10.0-182.0.0.95.h2358.eulerosv2r13.x86_64 #1 SMP Fri Nov 15 16:12:28 UTC 2024
Nov 18 19:57:13 localhost Keepalived[2072747]:  Command line: '/usr/sbin/keepalived' '-D'
Nov 18 19:57:13 localhost Keepalived[2072747]:  configure options: --build=x86_64-openEuler-linux-gnu --host=x86_64-openEuler-linux-gnu
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --infodir=/usr/share/info --enable-sha1 --with-init=systemd --enable-snmp
Nov 18 19:57:13 localhost Keepalived[2072747]:                     --enable-snmp-rfc build_alias=x86_64-openEuler-linux-gnu
Nov 18 19:57:13 localhost Keepalived[2072747]:                     host_alias=x86_64-openEuler-linux-gnu
Nov 18 19:57:13 localhost Keepalived[2072747]:                     PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe
Nov 18 19:57:13 localhost Keepalived[2072747]:                     -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
Nov 18 19:57:13 localhost Keepalived[2072747]:                     -fstack-protector-strong -grecord-gcc-switches
Nov 18 19:57:13 localhost Keepalived[2072747]:                     -specs=/usr/lib/rpm/generic-hardened-cc1 -m64 -mtune=generic
Nov 18 19:57:13 localhost Keepalived[2072747]:                     -fasynchronous-unwind-tables -fstack-clash-protection  LDFLAGS=-Wl,-z,relro
Nov 18 19:57:13 localhost Keepalived[2072747]:                     -Wl,-z,now -specs=/usr/lib/rpm/generic-hardened-ld
Nov 18 19:57:13 localhost Keepalived[2072747]:  Config options: LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT
Nov 18 19:57:13 localhost Keepalived[2072747]:                  SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3
Nov 18 19:57:13 localhost Keepalived[2072747]:                  IPROUTE_ETC_DIR=/etc/iproute2.d INIT=systemd SYSTEMD_NOTIFY
Nov 18 19:57:13 localhost Keepalived[2072747]:  System options: VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES
Nov 18 19:57:13 localhost Keepalived[2072747]:                  RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID
Nov 18 19:57:13 localhost Keepalived[2072747]:                  RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE
Nov 18 19:57:13 localhost Keepalived[2072747]:                  RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE
Nov 18 19:57:13 localhost Keepalived[2072747]:                  FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA
Nov 18 19:57:13 localhost Keepalived[2072747]:                  IPTABLES NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY
Nov 18 19:57:13 localhost Keepalived[2072747]:                  IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE
Nov 18 19:57:13 localhost Keepalived[2072747]:                  VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF
Nov 18 19:57:13 localhost Keepalived[2072747]:                  SO_MARK
Nov 18 19:57:13 localhost Keepalived[2072747]: VRRP child process(2072749) died: Respawning
Nov 18 19:57:13 localhost Keepalived[2072747]:  Please log an issue at https://github.com/acassen/keepalived/issues/
Nov 18 19:57:13 localhost Keepalived[2072747]:  and include a full copy of your keepalived configuration files, and
Nov 18 19:57:13 localhost Keepalived[2072747]:  copies of the keepalived system log entries around the time this happened
Nov 18 19:57:13 localhost Keepalived[2072747]: Restart of VRRP process delayed 0 seconds to limit respawn rate
Nov 18 19:57:13 localhost Keepalived[2072747]: Starting VRRP child process, pid=2073224
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: Registering Kernel netlink reflector
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: Registering Kernel netlink command channel
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: vrrp_ipsets has been specified but not vrrp_iptables - vrrp_ipsets will be ignored
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: Assigned address 172.4.67.91 for interface eth1
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: Registering gratuitous ARP shared channel
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: (LVS_HW) removing VIPs.
Nov 18 19:57:13 localhost Keepalived[2072747]: read eventfd count 1, num_reloading 0
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: bind unicast_src 55.55.55.1 failed 99 - Cannot assign requested address
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: (LVS_HW): entering FAULT state (src address not configured)
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: (LVS_HW) Entering FAULT STATE
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: (LVS_HW) removing VIPs.
Nov 18 19:57:13 localhost Keepalived_vrrp[2073224]: VRRP sockpool: [ifindex(  3), family(IPv4), proto(112), fd(-1,-1) , unicast, address(55.55.55.1)]
Nov 18 19:57:13 localhost NetworkManager[1166]: <info>  [1731931033.2422] policy: set-hostname: current hostname was changed outside NetworkManager: 'localhost'
Nov 18 19:57:13 localhost systemd-coredump[2073223]: Process 2072749 (keepalived) of user 0 dumped core.
Nov 18 19:57:13 localhost systemd[1]: [email protected]: Deactivated successfully.
Nov 18 19:57:15 localhost NetworkManager[1166]: <info>  [1731931035.1572] audit: op="connections-load" args="/etc/sysconfig/network-scripts/ifcfg-eth2" pid=2073237 uid=0 result="success"
Nov 18 19:57:15 localhost ifup[2073248]: You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
Nov 18 19:57:15 localhost ifup[2073249]: 'network-scripts' will be removed from distribution in near future.
Nov 18 19:57:15 localhost ifup[2073250]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.

The keepalived configuration:

[root@localhost ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id 1
    script_user root
    vrrp_garp_master_refresh 15
}
vrrp_instance LVS_HW {
    state BACKUP
    interface eth1
    virtual_router_id 1
    priority 10
    advert_int 1
    nopreempt
    virtual_ipaddress {
        172.4.67.91/24  dev eth1
        80.4.67.91/24  dev eth1
    }
    virtual_ipaddress_excluded {
    }
    unicast_src_ip 55.55.55.1
    unicast_peer {
        55.55.55.2
    }
}

The interface configuration:

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPADDR=55.55.55.1
GATEWAY=55.55.55.1
NETMASK=255.255.254.0
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eth2
DEVICE=eth2
ONBOOT=yes
ZONE=trusted

The reproduction script:

#!/bin/bash
ifdown eth2 &
sleep 1
systemctl reload keepalived &
sleep 2
ifup eth2

@swimlessbird
Author

@pqarmitage This problem can also be reproduced with keepalived v2.3.2.
We could add a check before vrrp->sockets is used, e.g.:

vrrp_send_pkt(vrrp_t * vrrp, unicast_peer_t *peer)
                check_tx_checksum(vrrp, peer);
 #endif

+       if (!vrrp->sockets)
+               return -1;
+
        /* Send the packet */
        return sendmsg(vrrp->sockets->fd_out, &msg, (peer) ? 0 : MSG_DONTROUTE);
 }

@pqarmitage
Collaborator

@swimlessbird I was testing something very similar to this yesterday and was not getting a segfault, although the behaviour I was seeing was not quite as I expected.

I think the check for !vrrp->sockets could just as well be done at the beginning of vrrp_send_pkt(), avoiding the unnecessary work of setting up msg if the packet cannot be sent.

Was the segfault in the return sendmsg(vrrp->sockets->fd_out, &msg, (peer) ? 0 : MSG_DONTROUTE); line, presumably because vrrp->sockets is NULL?

I will have a look at this this evening and hope to have a patch by tomorrow.

@pqarmitage pqarmitage reopened this Nov 18, 2024
@pqarmitage
Collaborator

@swimlessbird Having looked at your keepalived config and the ifcfg-eth2 and reproduce scripts I am a bit confused. The vrrp instance is configured to use eth1, but the 55.55.55.0/23 subnet and the ifdown/up in the script are using eth2.

Is the way you are wanting to work that the VRRP packets are sent over eth2, but the virtual_ipaddresses are configured on eth1? If so, then I think that might be something that we hadn't anticipated in the code.

@swimlessbird
Author

Yes, I want the VRRP packets to be sent over eth2, with the virtual_ipaddresses configured on eth1.

@swimlessbird
Author

Yes, the segfault occurs because vrrp->sockets is NULL. Here is the gdb output:

Core was generated by `/usr/sbin/keepalived -D'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005600bb2b441e in vrrp_send_pkt (vrrp=vrrp@entry=0x5600ce80f8f0, peer=peer@entry=0x5600ce810780) at vrrp.c:1519
1519            return sendmsg(vrrp->sockets->fd_out, &msg, (peer) ? 0 : MSG_DONTROUTE);
Missing separate debuginfos, use: dnf debuginfo-install file-libs-5.41-3.h1.eulerosv2r13.x86_64 glibc-2.34-143.h20.eulerosv2r13.x86_64 iptables-libs-1.8.7-14.h3.eulerosv2r13.x86_64 libcap-2.61-6.h2.eulerosv2r13.x86_64 libgcrypt-1.10.2-1.h2.eulerosv2r13.x86_64 libgpg-error-1.46-1.eulerosv2r13.x86_64 libmnl-1.0.5-2.eulerosv2r13.x86_64 libnftnl-1.2.0-6.eulerosv2r13.x86_64 libnl3-3.7.0-2.h2.eulerosv2r13.x86_64 libxcrypt-4.4.26-5.h1.eulerosv2r13.x86_64 lm_sensors-3.6.0-7.h1.eulerosv2r13.x86_64 lz4-1.9.3-3.eulerosv2r13.x86_64 net-snmp-5.9.1-6.h8.eulerosv2r13.x86_64 net-snmp-libs-5.9.1-6.h8.eulerosv2r13.x86_64 openssl-libs-1.1.1wa-2.h19.eulerosv2r13.x86_64 perl-libs-5.34.0-13.h3.eulerosv2r13.x86_64 systemd-libs-249-63.h18.eulerosv2r13.x86_64 xz-libs-5.2.5-3.eulerosv2r13.x86_64 zlib-1.2.11-24.h4.eulerosv2r13.x86_64
(gdb) bt
#0  0x00005600bb2b441e in vrrp_send_pkt (vrrp=vrrp@entry=0x5600ce80f8f0, peer=peer@entry=0x5600ce810780) at vrrp.c:1519
#1  0x00005600bb2b5ee6 in vrrp_send_adv (vrrp=vrrp@entry=0x5600ce80f8f0, prio=prio@entry=0 '\000') at vrrp.c:1568
#2  0x00005600bb2b668a in vrrp_state_leave_master (vrrp=vrrp@entry=0x5600ce80f8f0, advF=advF@entry=true) at vrrp.c:1857
#3  0x00005600bb2c4cf9 in down_instance (vrrp=vrrp@entry=0x5600ce80f8f0) at vrrp_track.c:548
#4  0x00005600bb2ba63c in open_sockpool_socket (sock=sock@entry=0x5600ce800820) at vrrp.c:2610
#5  0x00005600bb2bf56a in vrrp_open_sockpool (l=0x5600ce80b4d0) at vrrp_scheduler.c:562
#6  vrrp_dispatcher_init (thread=<optimized out>) at vrrp_scheduler.c:599
#7  0x00005600bb2ead14 in thread_call (thread=0x5600ce802ad0) at scheduler.c:2078
#8  process_threads (m=0x5600ce802d60) at scheduler.c:2145
#9  0x00005600bb2eb5c1 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2267
#10 0x00005600bb2aaa95 in start_vrrp_child () at vrrp_daemon.c:1179
#11 0x00005600bb2ead14 in thread_call (thread=0x5600ce7faf20) at scheduler.c:2078
#12 process_threads (m=0x5600ce7fff30) at scheduler.c:2145
#13 0x00005600bb2eb5c1 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2267
#14 0x00005600bb27ac5c in keepalived_main (argc=<optimized out>, argv=0x7ffe749f6578) at main.c:2869
#15 0x00007f7a223e7090 in ?? () from /usr/lib64/libc.so.6
#16 0x00007f7a223e713c in __libc_start_main () from /usr/lib64/libc.so.6
#17 0x00005600bb277815 in _start ()
(gdb) p vrrp->sockets
$1 = (sock_t *) 0x0
(gdb)

@pqarmitage
Collaborator

The configuration you have listed above doesn't do what you want. Since you have specified interface eth1, the VRRP packets are sent on eth1. If you want the VRRP packets to be sent over eth2, you need to specify interface eth2. The VIPs will still be configured on eth1, since you have specified that against each VIP.

The minimal actions necessary to cause this problem are: ip addr del 55.55.55.1/23 dev eth2 and then systemctl reload keepalived. Because you are using the ifup/ifdown scripts, downing the interface also removes its configured addresses; the interface does not actually need to be down for this problem to occur. It also doesn't matter whether you specify eth1 or eth2 as the interface; the problem will still occur. In fact, you don't even need to specify an interface, and keepalived will use the interface that the unicast_src_ip address is configured on.

The basic problem is that we do not track unicast_src_ip addresses being deleted. On the face of it, everything still looks OK after the unicast_src_ip is deleted, since keepalived keeps sending adverts. Unfortunately, since the address has been deleted, it cannot receive any adverts sent to the (deleted) address, and so the vrrp instance(s) will transition to master if they were in backup state, potentially causing multiple masters (split brain situation).

I am going to have to change the code so that deleting the unicast_src_ip address from an interface (actually, deleting the last one configured) puts the VRRP instance into fault state, and adding the first such address brings the instance back up (opening the sockets if necessary).

There is also a socket option that can help: IP_FREEBIND. With this option set, we can bind to the unicast_src_ip even if it is not currently configured.
