False positives in monitoring: what if we configured it correctly?

Team Leader

As we saw in our previous article, unmanaged false positives can have a significant impact on the operating costs of an IT system.

The number one cause of false positives: a poorly tuned configuration that cannot absorb the usual hiccups of an information system.

A few examples:

  • A few ping packets get "lost" and a device is flagged as faulty.
  • A slightly too long RTA (round-trip average) on a device ping, caused by a network overload, flags the device as faulty.
  • A network share that does not respond for a few minutes puts a service in the Unknown state.
  • Disk space is filled for a few minutes by a backup waiting to be copied to tape, and the disk space check goes to Alert or Critical.

The parameters to be adjusted to limit false positives

Thresholds

First step in limiting false positives: setting the thresholds.

A disk space threshold should not be set the same way for a 200 GB disk as for a 2 TB disk.

If a device sits in a DMZ or is reached over a VPN, the alert thresholds on the RTA must be raised. 10 ms, the default value in ServiceNav, may not be appropriate.
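To see why a single percentage does not suit every disk, the short sketch below converts a desired free-space reserve into a usage threshold for each disk. The function name `usage_threshold_pct` and the 20 GB reserve are illustrative assumptions, not ServiceNav settings.

```python
def usage_threshold_pct(disk_size_gb: float, reserve_gb: float) -> float:
    """Return the usage percentage at which only `reserve_gb` remains free."""
    if disk_size_gb <= 0 or reserve_gb < 0 or reserve_gb > disk_size_gb:
        raise ValueError("reserve must be between 0 and the disk size")
    return 100.0 * (1.0 - reserve_gb / disk_size_gb)


# Hypothetical reserve of 20 GB, applied to the two disk sizes from the text.
for size_gb in (200, 2000):
    pct = usage_threshold_pct(size_gb, reserve_gb=20)
    print(f"{size_gb} GB disk: alert at {pct:.0f}% used")
```

With the same 20 GB reserve, the 200 GB disk alerts at 90% used and the 2 TB disk at 99% used. A flat 90% threshold would flag the larger disk while it still has 200 GB free.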

Additional checks

Second step in reducing false positives: additional checks.

What could be more frustrating than receiving a notification that a server is DOWN, rushing to it, and, in the time it takes to connect, receiving a notification that the server is UP again? All this because 2 ping packets got "lost" on the network.

The point of additional checks is therefore to ask ServiceNav to verify X times, at Y-minute intervals, that the Alert/Critical/Unknown state still holds before putting the item (device or service) in a non-OK state and launching the complete alert processing chain.
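The confirmation logic can be sketched as a small state tracker. This is a minimal illustration, not ServiceNav's implementation: the class `ConfirmedState` and its fields are hypothetical, each call to `observe` stands for one check run Y minutes after the previous one, and the sketch assumes the first failing result counts as the first of the X checks.

```python
from dataclasses import dataclass


@dataclass
class ConfirmedState:
    """Tracks one monitored item; all names here are hypothetical."""
    required_checks: int      # X: consecutive non-OK results needed
    confirmed: str = "OK"     # state shown on dashboards and notified
    _streak: int = 0          # consecutive non-OK results seen so far

    def observe(self, result: str) -> bool:
        """Feed one check result; return True when a notification should fire."""
        if result == "OK":
            recovered = self.confirmed != "OK"
            self.confirmed = "OK"
            self._streak = 0
            return recovered  # notify a recovery only if an alert was raised
        self._streak += 1
        if self._streak >= self.required_checks and self.confirmed != result:
            self.confirmed = result
            return True
        return False


item = ConfirmedState(required_checks=3)
results = ["CRITICAL", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
print([item.observe(r) for r in results])
# → [False, False, False, False, False, True, True]
```

The first two Critical results are absorbed because an OK arrives before the third check. Only the run of three triggers a notification, followed by one recovery notification.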

For example:

Here we monitor the RTA of a ping and alert if it exceeds the critical threshold (in red).

As can be seen, the RTA regularly exceeds the threshold for a few moments, but then quickly returns to normal. The connection between the ServiceNavBox and the remote device probably needs work, but there is no need to open a ticket every time the check goes Critical.

We therefore decided to configure 3 additional checks at 1-minute intervals for this device. In other words, before an alert is raised and displayed on the operations dashboards, the device's RTA must stay above the threshold for 3 minutes in a row.

Result: a single Critical event (the "hole" around 4 pm) instead of several dozen.
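The effect of requiring several consecutive checks can be reproduced on a toy series. The samples below are made up for illustration and are not the measurements behind the graph; `count_alerts` and the 10 ms threshold are likewise assumptions.

```python
from itertools import groupby


def count_alerts(rta_ms, threshold_ms, required_checks):
    """Count runs of consecutive over-threshold samples of at least `required_checks`."""
    alerts = 0
    for over, run in groupby(rta_ms, key=lambda v: v > threshold_ms):
        if over and sum(1 for _ in run) >= required_checks:
            alerts += 1
    return alerts


# Made-up one-minute samples in ms, NOT the article's real data.
samples = [8, 14, 9, 7, 15, 8, 9, 13, 16, 18, 12, 9, 8, 11, 7]
THRESHOLD_MS = 10  # hypothetical critical threshold

print(count_alerts(samples, THRESHOLD_MS, required_checks=1))  # → 4
print(count_alerts(samples, THRESHOLD_MS, required_checks=3))  # → 1
```

On this toy series, alerting on every breach raises 4 alerts, while requiring 3 consecutive breaches keeps only the sustained one.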

In conclusion

ServiceNav therefore offers simple and effective ways to reduce false positives: adjusting thresholds and configuring additional checks. When you know the cost of processing a false positive, it is worth spending a few minutes adjusting your configuration.

And beyond that, can ServiceNav help me identify the items that need to be handled as a priority? Of course it can! We will cover that in our next article, to be published soon on our blog.
