It's a Trap!
Actually, no, it's not a trap. As of version 3.19, ServiceNav supports trap event management!
This much-requested feature lets you collect events generated by your equipment, assign a status based on their content, and then integrate the results into a weather service, dashboard or report.
How does it work?
A challenge for ServiceNav
Trap management has been under consideration at Coservit for a few months now. We knew that most equipment can generate alerts, and that some equipment can only be monitored this way. After a minor configuration change on the ServiceNav Box and an upgrade to version 3.19, ServiceNav can now process traps.
Before we begin, what is a Trap?
To keep it simple, the SNMP protocol works in two ways:
- Active: the supervision box sends a "GET" request to collect information, and the monitored server responds. This is the standard mode of operation for supervision.
- Passive: the equipment does not wait to be queried. As soon as an alert occurs, it sends an SNMP packet (a trap) containing information about that alert.
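To make the passive mode concrete, here is a minimal Python sketch of a receiver that simply waits for datagrams. It relies on the fact that SNMP traps are conventionally sent over UDP to port 162. The function names are hypothetical, the payload is returned as raw bytes without any SNMP decoding, and none of this describes how the ServiceNav Box is actually implemented.

```python
import socket
from typing import Tuple


def open_trap_socket(host: str = "127.0.0.1", port: int = 162) -> socket.socket:
    """Open a UDP socket to receive traps (162 is the conventional trap port).

    Binding to a port below 1024 usually requires elevated privileges.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock: socket.socket, timeout: float = 5.0) -> Tuple[str, bytes]:
    """Block until one datagram arrives; return the sender IP and raw payload."""
    sock.settimeout(timeout)
    data, (sender_ip, _sender_port) = sock.recvfrom(65535)
    return sender_ip, data
```

The receiver never asks for anything: it only reacts when the equipment decides to send, which is the difference from the active "GET" mode.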
A problem with the way it works
1 - In most supervision tools, the implementation of the passive mode is binary:
- No alert: status is OK.
- Arrival of an alert: the status becomes critical.
Whatever the severity or content of the event, the result is a critical status.
Alert management needs to be more precise than that.
2 - Once a status has gone critical, it does not automatically return to OK.
We want to avoid a checkpoint that stays red permanently, as this would be counterproductive.
It is also important that users do not have to take any manual action to reset the checkpoint status.
A solution for every problem
It is essential that the information collected is relevant and useful, both for technical operations and for management.
To avoid raising a critical alert on every event, the plugin lets you filter the content of the trap using customizable patterns.
If a particular string is found in the trap, the status is adjusted accordingly.
When a trap is received, its content is compared to the patterns supplied as parameters. Each user can define what they consider a critical alert.
Filtering can be done on an OID, words or a phrase.
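The pattern filtering described above can be sketched in a few lines of Python. This is an illustration only: the pattern values, the made-up OID, the `classify_trap` name and the choice to return `None` for a trap that matches nothing are all assumptions, not ServiceNav's actual parameters or behavior.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns: an OID, a word or a phrase, as regular expressions.
PATTERNS: Dict[str, List[str]] = {
    "CRITICAL": [r"power supply failure", r"1\.3\.6\.1\.4\.1\.9999\.1\.2"],
    "WARNING": [r"temperature high"],
}


def classify_trap(trap_text: str,
                  patterns: Dict[str, List[str]] = PATTERNS) -> Optional[str]:
    """Return the status of the first matching pattern, most severe first."""
    for status in ("CRITICAL", "WARNING"):
        for pattern in patterns.get(status, []):
            if re.search(pattern, trap_text, re.IGNORECASE):
                return status
    return None  # assumption: an unmatched trap does not change the status
```

Checking the critical patterns first means that a trap matching both lists is reported as critical; that ordering is a design choice of this sketch.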
The problem of alert relevance is now solved. But how do we make the checkpoint turn green again once the alert has been handled? And how do we know when an alert is over?
Some equipment sends traps when there is a problem and then a "reverse" trap indicating that the equipment has returned to its nominal state.
Thanks to the OK pattern, you can define an OID or a character string in the trap that indicates an OK status, so no manual action is needed.
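As a rough sketch of how an OK pattern could bring a checkpoint back to green, the Python function below tests the OK pattern first and a critical pattern second. Both patterns, the function name and the evaluation order are assumptions made for illustration.

```python
import re

# Hypothetical patterns for a device that reports both failure and recovery.
OK_PATTERN = r"link up"
CRITICAL_PATTERN = r"link down"


def next_status(current: str, trap_text: str) -> str:
    """Return the checkpoint status after receiving one trap."""
    if re.search(OK_PATTERN, trap_text, re.IGNORECASE):
        return "OK"  # the "reverse" trap clears the alert
    if re.search(CRITICAL_PATTERN, trap_text, re.IGNORECASE):
        return "CRITICAL"
    return current  # assumption: unrelated traps leave the status unchanged
```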
For other equipment that only sends notifications when there is a problem, we have implemented a timeout system.
When a trap is received, the checkpoint goes into Warning or Critical status. If no other trap is received before the timeout set in the parameters expires, the checkpoint automatically returns to OK status.
Trap checkpoints are therefore perfectly autonomous.
For example, you might decide that an alert should remain in supervision for no more than 30 minutes, with follow-up handled through the creation of a ticket.
After this time, the checkpoint will switch to OK to allow for the possible receipt of another alert.
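The timeout behavior can be sketched as a small Python class. The class and method names are hypothetical, and timestamps are passed in explicitly (in seconds) so the logic is easy to follow; a real implementation would more likely read a clock such as `time.monotonic()`. The 30-minute default mirrors the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrapCheckpoint:
    timeout_seconds: float = 30 * 60  # 30 minutes, as in the example above
    status: str = "OK"
    last_trap_at: Optional[float] = None

    def on_trap(self, status: str, now: float) -> None:
        """Record a trap that matched a Warning or Critical pattern."""
        self.status = status
        self.last_trap_at = now  # each new trap restarts the timeout

    def current_status(self, now: float) -> str:
        """Return the status, falling back to OK once the timeout has expired."""
        expired = (
            self.last_trap_at is not None
            and now - self.last_trap_at >= self.timeout_seconds
        )
        if self.status != "OK" and expired:
            self.status = "OK"
        return self.status
```

A trap received at `now=0` keeps the checkpoint in its alert status until `now=1800`, unless another trap arrives in the meantime and restarts the countdown.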
If an email notification is associated with the checkpoint, the user is notified that an event has occurred even if the checkpoint is back to OK, and can intervene when available.
Intelligent management and customizable text output
A trap is made up of several variables (for example the time of reception, the OID and the text).
Users can choose which of these appear in the text output displayed in the ServiceNav interface.
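To illustrate the idea of building the text output from trap variables, here is a minimal Python sketch. The `$N` placeholder syntax, the `render_output` name and the sample values are all hypothetical and are not ServiceNav's actual syntax; the sketch only shows the principle of substituting numbered variables into a template.

```python
import re
from typing import Sequence


def render_output(template: str, variables: Sequence[str]) -> str:
    """Replace each $N placeholder with the N-th trap variable (1-based)."""

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if 1 <= index <= len(variables):
            return variables[index - 1]
        return match.group(0)  # leave unknown placeholders untouched

    return re.sub(r"\$(\d+)", substitute, template)


# Made-up trap: reception time, OID, text.
trap = ["2019-11-20 10:15:00", "1.3.6.1.4.1.9999.1.2", "Fan failure"]
print(render_output("Trap $2: $3", trap))
# prints "Trap 1.3.6.1.4.1.9999.1.2: Fan failure"
```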
If the OID of the trap is in variable 2 and the text is in variable 3, the text output can be:
An example to illustrate this
With this configuration:
If the trap received is:
The checkpoint will be:
If, a few minutes later, the equipment sends a trap indicating that the server is healthy again:
The checkpoint will be set to OK using the pattern:
As you can see, this feature opens supervision up to new equipment and reflects our commitment to continuously improving ServiceNav.
A webinar presenting this feature in more detail will be held on December 5 and 11. Sign up!