How-To Set up and Administrate SONARPLEX High Availability
Use Case
Setting up a high availability (HA) monitoring environment between two SONARPLEX devices.
What HA does and does not do
High availability is a function that synchronizes two SONARPLEX devices by copying the configuration, all RRD graphs and all log files from a master to a slave SONARPLEX device. In case of a failure of the master (e.g. hardware malfunction or power outage), the slave loads the synchronized data and takes over the role of the former master.
This does not include an automatic IP-address failover! All IPs remain unchanged, so make sure the slave has access to all of the configured hosts that are being monitored!
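Because the slave keeps its own IP address after a failover, it can help to confirm in advance that every monitored host is reachable from the slave's network segment. The following is a minimal sketch of such a check and is not part of SONARPLEX. It assumes the services of interest accept TCP connections; the function names and the target list in the comment are hypothetical. Run it from a machine in the same network segment as the slave.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


def unreachable_targets(targets, timeout=3.0):
    """Return the (host, port) pairs from targets that could not be reached."""
    return [(host, port) for host, port in targets
            if not is_reachable(host, port, timeout)]


# Example usage with a hypothetical target list (replace with your own hosts):
#   targets = [("172.16.0.100", 4192)]
#   for host, port in unreachable_targets(targets):
#       print(f"NOT reachable from this segment: {host}:{port}")
```

A TCP connect only shows that the route and port are open. It does not prove that a particular check would succeed, so treat a clean result as a necessary condition rather than a guarantee.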
When the real master is back online, the slave recognizes it and automatically falls back to the slave state. At that moment, all RRD graphs and log files collected in the meantime are re-synced to the real master.
Only RRD graphs and logs are synced back. Configuration changes are not synced back! Any configuration changes made while the slave acted as master will be lost. If you made such changes, create a backup of the configuration so you can restore it on the real master after switching back.
Example Setup
The instructions are based on the example setup as described below.
SONARPLEX HA Master
This is the SONARPLEX HA master device, which contains the productive host and service checks.
Parameter | Value |
---|---|
IP-Address | 172.16.0.100 |
azeti Agent Port | 4192 |
azeti Agent Password | SamplePW |
SONARPLEX HA Slave
This is the SONARPLEX HA slave device which is completely empty and contains no productive information.
Parameter | Value |
---|---|
IP-Address | 172.16.0.200 |
The slave runs only a single HA service check to maintain HA capabilities. This service check should run at short intervals (3-5 minutes) to reduce outage times and gaps in logs and graphs.
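To get a rough feeling for how the check interval affects the time until the slave takes over, the arithmetic can be sketched as follows. This is only an estimate under stated assumptions, not documented SONARPLEX behavior: it assumes the slave needs three consecutive failed check runs (WARNING, 1st hard CRITICAL, 2nd hard CRITICAL) spaced one check interval apart, and it ignores retry intervals and the reboot time itself. The function name and parameters are illustrative.

```python
from typing import Tuple


def failover_delay_minutes(check_interval_min: float,
                           failed_checks_needed: int = 3) -> Tuple[float, float]:
    """Return (best_case, worst_case) minutes from master failure to slave reboot.

    Assumption: one state change per check run, checks evenly spaced.
    Best case: the master fails just before a check run.
    Worst case: the master fails just after a check run.
    """
    if check_interval_min <= 0 or failed_checks_needed < 1:
        raise ValueError("interval must be positive and at least one check is needed")
    best = (failed_checks_needed - 1) * check_interval_min
    worst = failed_checks_needed * check_interval_min
    return best, worst


for interval in (3, 5):
    best, worst = failover_delay_minutes(interval)
    print(f"{interval} min interval: roughly {best:g} to {worst:g} min until reboot")
```

Under these assumptions, moving from a 5-minute to a 3-minute interval shortens the worst case from 15 to 9 minutes, which is why a short interval is recommended for the HA service check.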
The general procedure is as follows:
- Configure the SONARPLEX HA master as the productive device. No further configuration is necessary.
- Configure the SONARPLEX HA slave with just the host of the HA master. Add a service check with the template "check_azeti_ha" to that host.
- Let the SONARPLEX HA slave run this service check to start synchronization.
Step-by-step guide
SONARPLEX HA Master
- Open the Administration Web Interface > Configuration > Network > Agent Configuration
- Set Agent Password to "SamplePW" (without quotes)
- Click to save the configuration.
SONARPLEX HA Slave
- Open the Administration Web Interface > Configuration > Setup > Hosts
- Create a new host with the IP of the SONARPLEX HA master, in this case 172.16.0.100.
- Create a new service with the plugin "check_azeti_ha" and add the newly created host to the list of hosts to check.
- Change the normal check interval to a value between 3 and 5 minutes to reduce gaps due to failover transitions.
- Set Password (Optional) to the given SONARPLEX HA master Agent password "SamplePW" (without quotes).
- Click to save the configuration.
Verify the setup
To verify the functionality of the high availability setup on your SONARPLEX devices, follow these steps:
On SONARPLEX HA Slave
- Open User Webinterface > Monitoring > Services
- Check that the newly created HA service check reports "OK" with a note on synchronized files, for example:
Failover behavior: automatic, HA mode 0: Monitor process is up, last complete syncronization: 2014-08-20 08:13:28 (302 files)
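If you want to evaluate this status line in a script, for example to alert when the last complete synchronization is too old, it can be parsed as sketched below. The pattern is derived only from the single example line above, so it is an assumption that other HA modes and firmware versions use the same layout. The names `HA_STATUS` and `parse_ha_status` are illustrative.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern inferred from the one example line in this guide (assumption).
# "sync\w*" accepts the spelling used in the example output as well as
# "synchronization".
HA_STATUS = re.compile(
    r"Failover behavior: (?P<behavior>\w+), "
    r"HA mode (?P<mode>\d+): (?P<state>.+?), "
    r"last complete sync\w*: "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<files>\d+) files\)"
)


def parse_ha_status(line: str) -> Optional[dict]:
    """Parse a check_azeti_ha status line; return None if it does not match."""
    match = HA_STATUS.search(line)
    if match is None:
        return None
    return {
        "behavior": match["behavior"],
        "mode": int(match["mode"]),
        "state": match["state"],
        "last_sync": datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S"),
        "files": int(match["files"]),
    }


sample = ("Failover behavior: automatic, HA mode 0: Monitor process is up, "
          "last complete syncronization: 2014-08-20 08:13:28 (302 files)")
status = parse_ha_status(sample)
print(status["mode"], status["last_sync"], status["files"])
```

Returning `None` for unmatched lines keeps the caller in control of how to treat unexpected output instead of raising inside the parser.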
On SONARPLEX HA Master
- Once the synchronization is complete, shut down the SONARPLEX HA master device to test HA functionality.
Final result
Let the SONARPLEX HA slave's service check "check_azeti_ha" run through the following HA modes:
Check result | HA mode |
---|---|
OK | HA mode 0: Monitor process is up |
WARNING | HA mode 1: Machine seems to be down |
1st hard CRITICAL | HA mode 2: Machine seems to be down |
2nd hard CRITICAL | Machine reboots with HA master configuration |
At this point, the SONARPLEX HA slave reboots with all configurations, log files and graphs from the last complete synchronization.
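The escalation in the table above can be modeled as a small state progression. The sketch below is a simplified illustration, not the actual plugin logic: it assumes each failed check run advances exactly one stage and that a successful check returns the slave to the OK stage. The names `HA_STAGES`, `next_stage` and `simulate` are hypothetical.

```python
# Stages copied from the table: (check result, HA mode text).
HA_STAGES = [
    ("OK", "HA mode 0: Monitor process is up"),
    ("WARNING", "HA mode 1: Machine seems to be down"),
    ("1st hard CRITICAL", "HA mode 2: Machine seems to be down"),
    ("2nd hard CRITICAL", "Machine reboots with HA master configuration"),
]


def next_stage(stage: int, master_check_ok: bool) -> int:
    """Return the index of the stage that follows one check run.

    Assumption: a failed check advances one stage (capped at the last one),
    and a successful check resets to stage 0.
    """
    if master_check_ok:
        return 0
    return min(stage + 1, len(HA_STAGES) - 1)


def simulate(check_results) -> int:
    """Apply a sequence of check outcomes (True = master OK), starting at OK."""
    stage = 0
    for ok in check_results:
        stage = next_stage(stage, ok)
    return stage


# Three failed check runs in a row lead to the reboot stage.
stage = 0
for run in range(1, 4):
    stage = next_stage(stage, master_check_ok=False)
    result, mode = HA_STAGES[stage]
    print(f"check run {run}: {result} -> {mode}")
```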
How to trigger a failover manually
There are three ways to trigger a manual failover in case of a functionality test or a scheduled maintenance. These all result in the slave SONARPLEX becoming the master SONARPLEX.
Shut down the SONARPLEX HA master
The first method is to shut down the SONARPLEX HA master completely. To do this, follow these steps:
- Open the Administration Web Interface > Status > Summary
- Click on "Click here to shut your appliance down"
- Wait for the SONARPLEX HA slave to take over
Stop the SONARPLEX HA master monitoring process
The second way to trigger a manual failover is to stop the monitoring process on the SONARPLEX HA master:
- Open the Administration Web Interface > Status > Monitor
- Click on "Stop the monitor process"
- Wait for the SONARPLEX HA slave to take over
Disconnect from the network (not recommended)
The last way is to disconnect the SONARPLEX HA master from the network so that the SONARPLEX HA slave can no longer reach it. This is not recommended: the monitoring process on the former SONARPLEX HA master keeps running, so you can end up with inconsistent graphs and logs once the connection has been re-established.
- Disconnect the ethernet connection from the SONARPLEX HA master
- Wait for the SONARPLEX HA slave to take over
How to switch back after failover
If a failover has taken place and all problems have been solved, you may wish to switch back to the original state of the setup (former slave being slave and former master being master again). To do this, start both SONARPLEX appliances and connect them to your network so that they can communicate with each other. The former SONARPLEX HA slave will notice that the SONARPLEX HA master is back online and automatically synchronize the collected graphs and log files. After completion, it will restore its former configuration, in which the only check is the HA master check.
Only log files and graphs are synchronized back to the SONARPLEX HA master! Configuration changes made on the SONARPLEX HA slave while it was running as HA master are not transferred back and will be lost! If you made changes to the configuration, create a backup and save it prior to the switchback.
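The asymmetry of the switchback can be illustrated with a toy model. This is purely a conceptual sketch and does not reflect how SONARPLEX stores or transfers data: the `Appliance` class, the `switch_back` function and all sample values are hypothetical. It shows that graphs and logs are merged back into the master while configuration changes made on the slave are left behind.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    """Toy stand-in for a SONARPLEX device (hypothetical structure)."""
    config: dict = field(default_factory=dict)
    graphs: set = field(default_factory=set)
    logs: set = field(default_factory=set)


def switch_back(master: Appliance, slave: Appliance) -> dict:
    """Merge graphs and logs into the master; return config changes that are lost.

    The master's configuration is deliberately left untouched, mirroring the
    rule that configuration changes are not synced back.
    """
    master.graphs |= slave.graphs
    master.logs |= slave.logs
    return {key: value for key, value in slave.config.items()
            if master.config.get(key) != value}


master = Appliance(config={"check_interval": 5}, graphs={"g1"}, logs={"l1"})
# State of the slave while it was acting as master (sample values).
slave = Appliance(config={"check_interval": 3, "new_host": "172.16.0.50"},
                  graphs={"g1", "g2"}, logs={"l1", "l2"})

lost = switch_back(master, slave)
print("config keys lost on switchback:", sorted(lost))
```

Anything reported as lost in this model corresponds to what you would need to carry over manually from a configuration backup.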
How to proceed in case of hardware failures/RMA
If one of your HA members suffers a hardware failure and needs to be replaced, contact support@azeti.net for assistance in creating an RMA. The following steps describe the procedure after you have received your RMA replacement device.
- Boot the new SONARPLEX device and connect it to the network
- Set up an IP-address that is different from that of the original SONARPLEX HA master
- Open the Administration Web Interface > System > Update / Patch
- Install all updates and patches to match the version of your running original SONARPLEX HA slave
- Open the Administration Web Interface > System > Backup
- Create a full backup on the original SONARPLEX HA slave (which is the temporary master at this moment) and save it locally
- Open the Administration Web Interface > System > Restore
- Restore this backup on your new SONARPLEX device containing all parts except "System" (this would otherwise overwrite the IP-address)
- Open the Administration Web Interface > Configuration > Network > Ethernet configuration
- Change the IP-address of your new SONARPLEX device to match the one from the old SONARPLEX HA master
- After this change has been committed, the SONARPLEX HA slave will switch back to slave mode
If this takes too long, you can instead swap the roles of the two HA devices. This way, the running device becomes the full SONARPLEX HA master and the unconfigured replacement device becomes the new SONARPLEX HA slave.
- Open the Administration Web Interface > Configuration > Network > Ethernet configuration
- Change the IP-address of your running HA device to the IP of the former SONARPLEX HA master
- Set up the new SONARPLEX device as you did the first time by setting up the SONARPLEX HA slave IP-address
- Configure the new SONARPLEX HA slave as mentioned in the Step-by-step guide