When using IPv6 address, segmentation fault can occur due to incorrectly using IPv4 ping #149

Closed
nuno-silva opened this issue Mar 20, 2020 · 17 comments

@nuno-silva
Contributor

Describe the bug
Spine runs and is able to populate (some?) graphs, but ends up crashing with SIGSEGV after (apparently) failing to resolve an IPv6 address.

To Reproduce
Steps to reproduce the behaviour:

  1. Install cacti-spine 1.2.10
  2. Configure spine
  3. Run spine (as root), using gdb
  4. See SIGSEGV

Expected behaviour
Spine should not segfault. This was running fine using spine 1.2.8.

Server

  • OS: Gentoo Linux
  • Spine 1.2.10

Compiling

  • compiler: gcc version 9.2.0 (Gentoo Hardened 9.2.0-r2 p3)
  • autoconf: autoconf (GNU Autoconf) 2.69
  • glibc: ldd (Gentoo 2.29-r7 p8) 2.29
  • source: release (cacti-spine-1.2.10.tar.gz)

Additional context
gdb log

2020/03/20 23:01:54 - SPINE: Poller[1] PID[15304] WARNING: Error resolving host my-ipv6-host.example.com (Name or service not known)
free(): invalid pointer
--Type <RET> for more, q to quit, c to continue without paging--

Thread 216 "spine" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fb723ca0700 (LWP 17768)]
0x00007fb7250d35e1 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fb7250d35e1 in raise () from /lib64/libc.so.6
#1  0x00007fb7250bc539 in abort () from /lib64/libc.so.6
#2  0x00007fb72511a228 in ?? () from /lib64/libc.so.6
#3  0x00007fb725121f7a in ?? () from /lib64/libc.so.6
#4  0x00007fb725123ae4 in ?? () from /lib64/libc.so.6
#5  0x00007fb72518a008 in freeaddrinfo () from /lib64/libc.so.6
#6  0x000055997380c272 in init_sockaddr (name=0x7fb723c7a410, 
    hostname=0x7fb71803b780 "my-ipv6-host.example.com", port=7) at ping.c:878
#7  0x000055997380aa53 in ping_icmp (host=0x7fb718027eb0, ping=0x7fb718011c70) at ping.c:367
#8  0x0000559973809fde in ping_host (host=0x7fb718027eb0, ping=0x7fb718011c70) at ping.c:69
#9  0x0000559973800e39 in poll_host (host_id=558, host_thread=1, last_host_thread=1, host_data_ids=0, 
    host_time=0x7fb723c9fdd0 "2020-03-20 23:01:54", host_errors=0x7fb723c9fd80, host_time_double=1584745314.5383811)
    at poller.c:571
#10 0x00005599737ff595 in child (arg=0x559973d5a290) at poller.c:77
#11 0x00007fb7253c4427 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fb7251a1b2f in clone () from /lib64/libc.so.6

Also note that my-ipv6-host.example.com is a placeholder I substituted for privacy. The real name is a valid DNS name that resolves to a valid AAAA record.

What I see in cacti.log is

2020/03/20 23:25:52 - SPINE: Poller[Main Poller] PID[15682] FATAL: Spine Encountered An Unhandled Exception Signal Number: '6' [0, Success] (Spine thread)
2020/03/20 23:25:52 - SPINE: Poller[Main Poller] PID[15682] WARNING: Error resolving host my-ipv6-host.example.com (Name or service not known)
@TheWitness
Member

Does it have a valid v4 address? What ping method are you using? What about the Cacti host, does it have an IPv6 address?

TheWitness added a commit that referenced this issue Mar 21, 2020
Segmentation fault in spine 1.2.10 due to IPv6 address using IPv4 ICMP ping
@TheWitness TheWitness changed the title Segmentation fault in spine 1.2.10 Segmentation fault in spine 1.2.10 due to IPv6 address using IPv4 ICMP ping Mar 21, 2020
@TheWitness
Member

Okay, I've pushed an update to the 1.2.x branch. Please test using that branch. The following options make testing a bit simpler:

./spine -R -V 3 -S -f host_id -l host_id

Replace host_id with the ID of the IPv6 host.

@TheWitness
Member

Please also answer the other questions. If the host also supports IPv4, we could always use its IPv4 address for ping. Long term, we just need to support ICMP, TCP, and UDP over IPv6, which we do not support today.

@nuno-silva
Contributor Author

Thank you for your work.
The cacti host/device in question does not have an IPv4 address. It's a DNS name that resolves to an IPv6 address only (this is intentional; it's a device to monitor IPv6 connectivity). Here are some of the settings for the device:

  • SNMP Version: Not In Use
  • Downed Device Detection: Ping
  • Ping Method: ICMP Ping
  • Ping Timeout Value: 500
  • Ping Retry Count: 1

After opening the issue, I disabled the device as a workaround to stop the SIGSEGV.
I'll test your patch tomorrow once I get a chance and report back.

@nuno-silva
Contributor Author

I recompiled spine with the patches from release/1.2.10...0a5cba7 and the SIGSEGV is gone.
However, this created more problems:

  1. the IPv6-only device I mentioned before appears offline in cacti, despite being online. The cacti interface (which I believe uses cmd.php) is able to ping it ("ICMP Ping Success (7.111 ms)"). Here's the spine output:
spine -R -V 3 -S -f 558 -l 558
SPINE: Using spine config file [/etc/spine.conf]
SPINE: Version 1.2.10 starting
NOTE: Spine will support multithread device polling.
DEBUG: Initial Value of Active Threads is 0
SPINE: Active Threads is 1, Pending is 1
SPINE: Active Threads is 2, Pending is 2
DEBUG: In Poller, About to Start Polling of Device for Device ID 558
Device[0] HT[1] Total Time: 0.0014 Seconds
POLLER: Active Threads is 1, Pending is 1
Device[558] HT[1] NOTE: There are '2' Polling Items for this Device
Device[558] HT[1] Total Time: 0.0027 Seconds
Device[558] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
DEBUG: The Value of Active Threads is 0 for Device ID 558
POLLER: Active Threads is 0, Pending is 0
SPINE: The Final Value of Threads is 0
Time: 0.3146 s, Threads: 4, Devices: 2
  2. all my devices that use a DNS name with both A and AAAA records (IPv4+IPv6) and ICMP Ping no longer work. E.g.:
2020/03/22 19:29:59 - SPINE: Poller[Main Poller] PID[26479] Device[google-public-dns-b.google.com] Hostname[google-public-dns-b.google.com] ERROR: HOST EVENT: Device is DOWN Message: PING: Device is IPV6. Please use the SNMP ping options only.

Note: google-public-dns-b.google.com is a real device I monitor (it has only a "ping latency" graph) and, obviously, has no SNMP available. You can use it to test this on your end if necessary.
You can probably add one IPv6-only name to your /etc/hosts to test the original problem; it would be equivalent to my my-ipv6-host.example.com device:

2001:4860:4860::8888  my-ipv6-host.example.com  # Google DNS

TL;DR: the ICMP method is now broken whenever a device contains an AAAA record, even if it also contains an A record.

@TheWitness
Member

So, if a device has an ipv4, do you want to bias to that address?

@TheWitness
Copy link
Member

When you ping6 one of these hosts, you don't need to specify an outbound interface, do you?

TheWitness added a commit that referenced this issue Mar 22, 2020
@TheWitness
Member

Please answer the two questions, but with this update you should be all set. There is still no ping6, but answering the questions will help. It's not a big deal, just time, and I don't really want to enable IPv6 on my home router (I need FW rules first).

@nuno-silva
Contributor Author

So, if a device has an ipv4, do you want to bias to that address?

If ICMP is implemented for IPv6, it's better to use IPv6. If it's not implemented/working, at least fallback to IPv4 when the device also has an IPv4 address. This is better than having devices with both IPv4/IPv6 not work at all unless they're using SNMP.

When you ping6 one of these hosts, you don't need to specify an outbound interface, do you?

I do have two interfaces on the machine, but one of them is for a local network only. I do not need to specify an outbound interface:

myhost ~ # ping6 -c1 google-public-dns-b.google.com
PING google-public-dns-b.google.com(dns.google (2001:4860:4860::8844)) 56 data bytes
64 bytes from dns.google (2001:4860:4860::8844): icmp_seq=1 ttl=53 time=11.3 ms

--- google-public-dns-b.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 11.340/11.340/11.340/0.000 ms

I'll test it and report back. Thanks.

@TheWitness
Member

Cool

@nuno-silva
Contributor Author

Ok, so:

  • the dual-stacked (IPv4+IPv6) ICMP-only hosts are working again. That's good.
  • the IPv6-only host does not segfault, but appears offline. I understand spine has no support yet for ICMPv6 (which is unfortunate, but understandable). However, having IPv6-only devices appear offline (Device[558] PING Result: PING: Device is IPV6. Please use the SNMP ping options only.) prevents IPv6-enabled scripts or Data Input Methods from running at all; this was also the case before the SIGSEGV was introduced. Unfortunately, SNMP ping is not an option for devices that I don't manage or that don't support it.

That being said, I think supporting ICMPv6 is another issue in itself and this issue "as-is" is resolved.
On the other hand, I think the comments I made in both commits related to this issue (5644db7, 4155ecb) are relevant and should be addressed before closing, as they may cause other issues in the future.

Once again, thank you for your time and work.

@TheWitness
Member

TheWitness commented Mar 23, 2020

Yeah, working on it. Before I fix the ping issue, I want to see how other OSS projects are handling the stack issues. Most seem to do everything through the IPv6 stack and simply carry IPv4 addresses inside the IPv6 API. So, I'll give that a look, and then I have to enable IPv6 again on my router. Not a big deal, I just have to reset a bunch of NAT rules to protect the household. That way I'll be able to test properly.

@TheWitness
Member

Those hosts should be fine, just use SNMP ping instead of ICMP|TCP|UDP for now.

TheWitness added a commit that referenced this issue Mar 23, 2020
This should finish up what is currently in scope.  Long term, instead of returning the addr type, return a structure holding the addr_type and the socket information so that we don't have to call getaddrinfo() a second time.  That structure can be used in the various ping_* functions.
@TheWitness
Member

BTW, ICMPv6 support has been logged for some time in #127. If you feel up to doing a pull request, just let me know you are working on it.

@nuno-silva
Contributor Author

Can't use SNMP ping for hosts that don't have SNMP running :)

One use case is monitoring IPv6-only hosts, for example, ipv6.test-ipv6.nl. In this case, TCP ping would probably work (since this is a public web server), but for a third-party router with no open ports, that's not an option.
I tried TCP ping on my IPv6-only device and it also fails with PING Result: PING: Device is IPV6. Please use the SNMP ping options only.

As for the pull-request: I don't have the time right now, but should I have some free time on my hands in the future, I will consider working on ICMPv6 and will let you know.

TheWitness added a commit that referenced this issue Mar 23, 2020
@TheWitness
Member

@nuno-silva, no problem. I'm stuck at the house right now. Implementing ICMP|TCP|UDP ping is not that complicated; I've just never had a need for it. With that said, next I'll change the function to return the correct socket structure and then pass that structure to the various pings. It should be done before the end of the week unless something gets in the way. I might not even do that. I'll have to look at it during the week.

@TheWitness
Copy link
Member

Close this when you are satisfied.

nuno-silva added a commit to nuno-silva/spine that referenced this issue Apr 1, 2020
netniV pushed a commit that referenced this issue Apr 1, 2020
@netniV netniV changed the title Segmentation fault in spine 1.2.10 due to IPv6 address using IPv4 ICMP ping When using IPv6 address, segmentation fault can occur due to incorrectly using IPv4 ping Apr 6, 2020