Ensure and maintain the availability of your ServiceNav Box fleet

Purpose of this documentation

The central platform is the heart of ServiceNav, but the monitoring boxes are the eyes of the system.
The ServiceNav Boxes (SNB, or supervision boxes) are used to:

  • Collect monitoring information on the customer's LAN or from an external source.
  • Transmit the collected data to the central platform via the VPN tunnel.
  • Send email alerts (independently of VPN tunnel access).
  • Receive instructions given by the user via the web interface (immediate checks, acknowledgements, configuration reloads).

It is therefore essential that the supervision boxes remain available at all times.

This documentation explains how to prevent SNB unavailability and how to solve the most common problems when they occur.

Supervise your ServiceNav Boxes

It will come as no surprise that the best way to prevent equipment failure is to supervise it.
That is exactly what ServiceNav is for!

When setting up a ServiceNav Box, the first two things to put in place are:

  • Self-supervision of the box via the equipment model ServiceNav Box - self-supervision.
    In terms of resources, it is very important to monitor the usage of:

    • CPU load: about 1 vCPU per 1000 control points (to be adapted according to the control points used). A lack of CPU makes the box unstable and delays the execution of control points.
    • RAM: a lack of RAM can prevent control points from being executed and can cause the nagios or openvpn services to fail, which stops the supervision.
    • Disk space: a lack of disk space causes the file system to switch to read-only mode and makes the supervision unstable, or even stops it.
  • Cross-supervision via another box with the equipment model ServiceNav Box - Supervision by supervisor.
    The control point Box-Live-Status checks that the supervised box has sent supervision data within the last X minutes.
    If this control point becomes CRITICAL, the supervised SNB is no longer sending data to the central platform, so the statuses shown on the web interface are no longer current. It is imperative to take action to restore communication.
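
As a rough illustration of the resource points above, the following sketch applies the sizing rule of about 1 vCPU per 1000 control points and classifies a usage percentage against thresholds. The function names and the 80/90 thresholds used in the usage note are hypothetical and are not ServiceNav settings; the equipment models mentioned above remain the supported way to supervise a box.

```shell
#!/bin/sh
# Sketch only: function names and thresholds are hypothetical, not ServiceNav settings.

# Minimum vCPU count for a number of control points, using the rule of thumb
# of about 1 vCPU per 1000 control points (rounded up, never less than 1).
required_vcpus() {
    points=$1
    vcpus=$(( (points + 999) / 1000 ))
    [ "$vcpus" -lt 1 ] && vcpus=1
    echo "$vcpus"
}

# Classify a usage percentage against warning and critical thresholds.
classify_usage() {
    used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo CRITICAL
    elif [ "$used" -ge "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# Used percentage of the file system holding a path, as reported by df.
disk_used_percent() {
    df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}
```

On a box, `required_vcpus 2500` could be compared with the output of `nproc`, and `classify_usage "$(disk_used_percent /)" 80 90` would report the state of the root file system against the assumed thresholds.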

Important: supervision and maintenance of the boxes are the responsibility of the customer.

For more details, a webinar entirely dedicated to the boxes and their supervision is available: How to monitor your ServiceNav Boxes.
The supervision of the boxes is also described at the end of the following documentation: Installation of an SNB.

Solving problems with a ServiceNav Box

Even though monitoring greatly reduces the risks, a ServiceNav Box may still become unavailable.
The following section presents some common scenarios and how to solve them.

Scenarios

  • The box cannot connect to the VPN tunnel, has high latency in the VPN tunnel, or loses its connection intermittently.
    -> Follow the solution: Check network access.
  • All control points are in Indeterminate status.
    -> Follow the solution: Check network access.
    -> If the problem is still not solved: follow Restart remoteOperationBox and nagios.
  • The checks performed by a ServiceNav Box have a very old timestamp.
    -> Follow the solution: Restart remoteOperationBox and nagios.
  • Unable to reload the configuration on a ServiceNav Box.
    -> Follow the solution: Restart remoteOperationBox and nagios.
  • Acknowledgements are not taken into account.
    -> Follow the solution: Restart remoteOperationBox and nagios.
  • Immediate checks launched from the web interface are not taken into account.
    -> Follow the solution: Restart remoteOperationBox and nagios.

Solutions

Check network access

  1. Check the SNB's resources (CPU load, RAM, disk space) and allocate more if necessary.
  2. Check that the box's clock is correct, using the date command.
  3. Ensure that no firewall rules have been changed or deleted recently.
  4. Check that the box has outbound access to the ServiceNav VPN port of the central platform.
    For the platform https://servicenav.io -> telnet vpn.servicenav.io $(awk -F '[ ]' 'NR==42 {print int($3)}' /etc/openvpn/client.conf)
    For the platform https://azure.servicenav.io -> telnet vpn-azure.servicenav.io $(awk -F '[ ]' 'NR==42 {print int($3)}' /etc/openvpn/client.conf)
    For an OnPremise platform -> telnet
    Example of functional access: [screenshot not reproduced]

    If access fails, make the necessary changes at the firewall level.
  5. Make sure that the LAN IP address of the box is not also assigned to another machine on the same network.
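
The port check in step 4 can be sketched as a small script. This is only an illustration under stated assumptions: it assumes that /etc/openvpn/client.conf contains a standard OpenVPN directive of the form `remote <host> <port>` (the commands above read the port from line 42 instead), that the tunnel uses TCP as the telnet test implies, and that bash and timeout are installed on the box. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the config layout is an assumption.

# Print the port of the first "remote <host> <port>" directive in an OpenVPN
# client configuration, instead of relying on a fixed line number.
vpn_port_from_conf() {
    awk '$1 == "remote" && $3 != "" { print int($3); exit }' "$1"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
tcp_port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For the platform https://servicenav.io, a call such as `tcp_port_open vpn.servicenav.io "$(vpn_port_from_conf /etc/openvpn/client.conf)" && echo "VPN port reachable"` would play the same role as the telnet test. Like telnet, it only tests TCP connectivity.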

Restart remoteOperationBox and nagios

The remoteOperationBox process handles the sending and receiving of messages between the box and the central platform.
If it stops working:

  • The supervision data collected by the box will no longer be sent to the central platform.
  • All actions performed on the web interface towards the box will no longer be transmitted to it.

The nagios process schedules the control points. It communicates with remoteOperationBox to take into account the immediate checks and acknowledgements requested from the web interface.

Perform the following operations:

  • Connect to the ServiceNav Box with an SSH client.
  • Stop the remoteOperationBox process:
    • Execute: service remoteOperationBox stop
    • Check that no more processes are running: ps aux | grep remoteOperationBox
    • If some instances are still running, kill them manually: kill, or kill -9 if they do not terminate.
  • Stop the nagios process:
    • Execute: service nagios stop
    • Check that no more processes are running: ps aux | grep nagios (nagios may take a little time to stop; repeat the ps command several times).
    • If nagios processes are still running, kill them manually: kill, or kill -9 if they do not terminate.
  • At this point, neither remoteOperationBox nor nagios should be running, and the ps output should list no process for either of them.
  • Restart the nagios service: service nagios start
  • Restart the remoteOperationBox service: service remoteOperationBox start, then verify that 6 instances of the service are present.
  • Check on the web interface that the application is working again.
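
The sequence above can be sketched as a script, to be run as root on the box. This is a minimal sketch and not an official ServiceNav tool: the function names, retry counts and delays are hypothetical, and it deliberately stops instead of killing processes, so that kill or kill -9 remains a manual decision. Processes are counted from a snapshot of `ps -eo args=`, which matches any command line containing the name; avoid saving the script under a file name that contains nagios or remoteOperationBox, or it would count itself.

```shell
#!/bin/sh
# Sketch only: function names, retry counts and delays are hypothetical.

# Count the lines read from stdin that contain a fixed string.
count_matching() {
    grep -cF -- "$1" || true
}

# Count running processes whose command line contains a name.
# The snapshot is taken before grep starts, so grep never counts itself.
count_running() {
    snapshot=$(ps -eo args=)
    printf '%s\n' "$snapshot" | count_matching "$1"
}

# Poll until no matching process is left; return 1 if some remain.
wait_until_stopped() {
    name=$1 attempts=${2:-10} delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        [ "$(count_running "$name")" -eq 0 ] && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Stop both services, confirm they are gone, then start nagios first.
restart_supervision() {
    service remoteOperationBox stop
    wait_until_stopped remoteOperationBox || {
        echo "remoteOperationBox still running, kill it manually" >&2
        return 1
    }
    service nagios stop
    wait_until_stopped nagios || {
        echo "nagios still running, kill it manually" >&2
        return 1
    }
    service nagios start
    service remoteOperationBox start
    sleep 5
    echo "remoteOperationBox instances: $(count_running remoteOperationBox) (6 expected)"
}
```

After loading these functions in a root shell, calling `restart_supervision` runs the whole sequence; the final check on the web interface still has to be done by hand.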

If the problems persist after restarting both services, please contact ServiceNav support.

Ensuring the recovery of a ServiceNav Box

Three use cases:

  • The ServiceNav Box is completely unusable despite a reboot, and it is impossible to connect to it via SSH or via a local console.
    -> Follow this documentation: Ensure the PRA of a ServiceNav Box. See the chapter "Complete replacement of a ServiceNav Box".
  • Migrate a defective, but still accessible, ServiceNav Box to a new one.
    -> Follow this documentation: ServiceNav Box Migration.
  • Roll back the ServiceNav Box from a backup.
    -> Follow this documentation: Ensure the PRA of a ServiceNav Box. See the chapter "Rollback from a ServiceNav Box backup".
