False Positive #912

Open
piflav opened this issue Jan 26, 2022 · 10 comments

@piflav

piflav commented Jan 26, 2022

Hi,
For a few months I've been using SimpleMonitor to monitor 150+ hosts from a Windows Server.
The setup is very basic: a ping every minute, plus Pushover notifications and an HTML status page:

[HostName]
type=ping
host=172.x.y.z
tolerance=5

It works fine, but I've noticed that when a host is down for a long time, another one is often reported as going up and down every 10 to 15 minutes, even though no packet was actually lost (I checked by pinging directly from the command line).
The false positive seems to be reported for the host that comes immediately before the really-down one in the configuration file.
For example:

#Host reported flapping even if UP
[Host-A]
type=ping
host=172.x.y.z
tolerance=5

#Host DOWN since long time
[Host-B]
type=ping
host=172.x.y.z
tolerance=5

If I comment out the Host-B configuration, the problem disappears.
My Python knowledge is very limited, so I didn't go through the code to find where the problem could be.

Thanks

@jamesoff jamesoff added the bug label Jan 26, 2022
@jamesoff
Owner

Interesting; could you let me know what version you're using (and which Python version)?

Is it always the host above the failed one which flaps? Any feel for roughly how long "Host-B" would need to be down for the problem to manifest?

@piflav
Author

piflav commented Jan 26, 2022 via email

@jamesoff
Owner

Thanks for the info, I'll have a go at reproducing it. Hope the workaround of disabling/removing the long-term down host is ok for you for now.

@piflav
Author

piflav commented Feb 3, 2022

Hi,
Today I noticed another issue, probably related to the same bug.
About 5 of the 150+ monitored hosts report a ping time of 0.000ms, which is impossible because they are hundreds of km away.
As an example, here are the logs for hosts at the same location:

2022-02-03 15:01:38+01:00 FR-Saint-Denis-VRRP: ok (0.000s) (Ping time 15.584ms)
2022-02-03 15:01:38+01:00 FR-Saint-Denis-LAN1: ok (0.000s) (Ping time 0.000ms)
2022-02-03 15:01:38+01:00 FR-Saint-Denis-LAN2: ok (0.000s) (Ping time 15.616ms)
2022-02-03 15:01:38+01:00 FR-Saint-Denis-L3: ok (0.000s) (Ping time 15.626ms)
2022-02-03 15:01:38+01:00 FR-Saint-Denis-LB1: ok (0.000s) (Ping time  0.000ms)
2022-02-03 15:01:38+01:00 FR-Saint-Denis-LB2: ok (0.000s) (Ping time 15.621ms)

I suspect it's a bug in ping3, maybe because I'm asking it to ping 150+ hosts every minute and the time between pings is too short.

How do you manage that?
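
To check whether the 0.000ms readings always come from the same monitors, the log can be scanned and the zero readings tallied per host. This is only a minimal sketch: the line layout is inferred from the excerpt above, and `LINE_RE` and `zero_ping_counts` are hypothetical names, not part of SimpleMonitor.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log excerpt in this thread:
# "<date> <time+tz> <monitor name>: ok (<run time>s) (Ping time <ms>ms)"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<name>[^:]+): ok \([\d.]+s\) \(Ping time\s+(?P<ms>[\d.]+)ms\)$"
)


def zero_ping_counts(lines):
    """Count, per monitor name, how many 'ok' lines report a 0.000ms ping."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and float(match.group("ms")) == 0.0:
            counts[match.group("name")] += 1
    return counts


sample = [
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-VRRP: ok (0.000s) (Ping time 15.584ms)",
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-LAN1: ok (0.000s) (Ping time 0.000ms)",
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-LAN2: ok (0.000s) (Ping time 15.616ms)",
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-L3: ok (0.000s) (Ping time 15.626ms)",
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-LB1: ok (0.000s) (Ping time  0.000ms)",
    "2022-02-03 15:01:38+01:00 FR-Saint-Denis-LB2: ok (0.000s) (Ping time 15.621ms)",
]
print(zero_ping_counts(sample))
```

Running this over a full day's log instead of `sample` would show whether the zero readings stick to a fixed set of hosts or move around.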

@jamesoff
Owner

jamesoff commented Feb 4, 2022

Agreed, that is odd. Not sure what's going on there, but if it's legit I want that network :)

Is it always those hosts?

Could you maybe try changing them to the host monitor? This is the original one for pinging hosts and works by actually running ping rather than being implemented in Python.

@piflav
Author

piflav commented Feb 4, 2022

I've replaced ping with host, and everything seems to work as expected, even when a host has been down for an hour.
One thing to report: no round-trip details appear on the HTML page or in the logs.
I haven't set ping_regexp or time_regexp (left at the automatic defaults).

FYI, the output of the ping command on the server is:

C:\>ping -n 1 -w 1000 172.20.51.1

Pinging 172.20.51.1 with 32 bytes of data:
Reply from 172.20.51.1: bytes=32 time=26ms TTL=56

Ping statistics for 172.20.51.1:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 26ms, Maximum = 26ms, Average = 26ms
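
Since the comment mentions `ping_regexp` and `time_regexp`, the missing round-trip detail may come down to a pattern not matching this output. The sketch below shows one way the English Windows output above could be parsed. The two patterns and `parse_ping` are illustrative assumptions, not SimpleMonitor's built-in defaults.

```python
import re

# Hypothetical patterns written for the English Windows output shown in
# this thread; they are NOT SimpleMonitor's defaults.
SUCCESS_RE = re.compile(r"Received = 1")
TIME_RE = re.compile(r"Average = (?P<ms>\d+)ms")


def parse_ping(output):
    """Return (is_up, round_trip_ms); round_trip_ms is None if not found."""
    is_up = SUCCESS_RE.search(output) is not None
    match = TIME_RE.search(output)
    round_trip_ms = int(match.group("ms")) if match else None
    return is_up, round_trip_ms


sample = """Pinging 172.20.51.1 with 32 bytes of data:
Reply from 172.20.51.1: bytes=32 time=26ms TTL=56

Ping statistics for 172.20.51.1:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 26ms, Maximum = 26ms, Average = 26ms
"""
print(parse_ping(sample))
```

A localized Windows install prints these labels in another language, so patterns like these would need adjusting there.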

@jamesoff
Owner

jamesoff commented Feb 4, 2022

Glad that's fixed the weird behaviour for those hosts. It should include the ping time in the detail field; I'll take a look to see if I can see why it isn't.

@piflav
Author

piflav commented Dec 28, 2022

Using ping or host makes no difference.
The temporary workaround is to disable multithreading with -j 1.

@jamesoff
Owner

Thanks for the update. I'm also seeing this with a couple of my monitors recently (I have some kit unplugged so it's definitely not going to be up, despite what SimpleMonitor is occasionally reporting ;)

Interesting to know disabling multithreading helps, I'll have a look upstream at the library I'm using for it to see if there's any fix.

@jamesoff
Owner

That didn't take long to track down; the library has an issue with multithreading: kyan001/ping3#26

I wonder if I can support both (multithreading and correct pings) by keeping all the ping monitors on one thread 🤔
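
The "all ping monitors on one thread" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `run_monitors` and the `(name, kind, check)` tuples are hypothetical and do not reflect how SimpleMonitor actually schedules its monitors.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_monitors(monitors, max_workers=4):
    """Run checks in a thread pool, keeping all "ping" checks on one worker.

    `monitors` is a list of (name, kind, check) tuples, where `check` is a
    zero-argument callable. This structure is hypothetical.
    """
    ping = [m for m in monitors if m[1] == "ping"]
    other = [m for m in monitors if m[1] != "ping"]

    def run_serially(group):
        # Runs inside a single pool task, so every check in the group
        # executes on the same thread, one after another.
        return [(name, check()) for name, _kind, check in group]

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ping_future = pool.submit(run_serially, ping)
        other_futures = [(name, pool.submit(check)) for name, _kind, check in other]
        for name, value in ping_future.result():
            results[name] = value
        for name, future in other_futures:
            results[name] = future.result()
    return results


# Demo: each "check" just reports the id of the thread it ran on.
demo = [
    ("Host-A", "ping", threading.get_ident),
    ("Host-B", "ping", threading.get_ident),
    ("web", "http", threading.get_ident),
]
out = run_monitors(demo)
print(out["Host-A"] == out["Host-B"])  # both ping checks ran on one thread
```

The trade-off is that the ping checks no longer run in parallel with each other, so with 150+ ping monitors a run takes at least the sum of their individual durations, including the timeouts of any hosts that are down.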
