Disaster Recovery
To protect against a disaster that affects the entire Primary Monitoring Centre, it's recommended that an offsite disaster recovery (DR) location be established. The DR site should be able to be brought online simply and quickly and provide all essential monitoring services. There is also a requirement for little or no data loss to occur.
While it is technically possible to set up a fully redundant system with automatic failover to the DR site, the general consensus is that this is not recommended. The process of failing over to the DR site should involve some manual intervention. Using the guidelines below, this process can be made very simple and fast.
There are three main components to consider when planning for your DR location: data replication and SQL services, remote infrastructure, and signalling redirection.
Data Replication and SQL Services
This section covers how to ensure that a copy of the database (an exact copy, or at least a very recent one) is available for use at the DR site when needed. There are three options available:
Asynchronous Availability Groups
Requires: SQL Server Enterprise (when used with HA at the primary site), or Standard (with no HA)
Where Availability Groups are used, an additional node can be created at the DR site to perform data transfer using asynchronous replication. Availability Groups ensure a live copy of the data will be available at the offsite location when required, with little or no data loss.
Along with the actual data being transported to the DR site, the SQL Services must also be started at the DR site. When the DR site is required, the SQL node at the DR site will need to be manually failed over. This will start the SQL services, and bring the database online. The manual failover can be included in the Promote DR Utility (see below).
Figure: DR Data Replication
While it is possible to configure the cluster so the SQL node at the DR site will fail over automatically, this requires additional hardware (a witness) and carries the risk that the DR node fails over unexpectedly. For these reasons this solution is not normally implemented.
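As an illustration of the manual failover step described above, the sketch below builds the T-SQL statement that SQL Server documents for forcing failover to an asynchronous-commit replica. It is a minimal sketch, not the Promote DR Utility's actual implementation: the availability group name `PatriotAG` is a hypothetical placeholder, and the statement would need to be run on the SQL node at the DR site.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def forced_failover_sql(availability_group: str) -> str:
    """Build the T-SQL that forces failover to the replica it is run on.

    With asynchronous replication the DR replica may be slightly behind,
    which is why the statement names the possibility of data loss.
    """
    return (
        f"ALTER AVAILABILITY GROUP {quote_identifier(availability_group)} "
        "FORCE_FAILOVER_ALLOW_DATA_LOSS;"
    )


# "PatriotAG" is a hypothetical availability group name.
print(forced_failover_sql("PatriotAG"))
```

Generating the statement rather than hard-coding it keeps the group name in one place if the failover is later scripted.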
SQL Log Shipping
If an async Availability Group is not possible, the data can be transferred to the offsite location using SQL log shipping. Log shipping transfers database changes at regular intervals, so some data will be lost (between one and two intervals' worth).
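To make the "one to two intervals" figure concrete, the following sketch computes the expected data-loss window for a given shipping interval. The 15 minute interval is only an example value, not a recommendation from this guide.

```python
from datetime import timedelta


def log_shipping_loss_window(interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return the (minimum, maximum) expected data loss for a shipping interval.

    Follows the rule of thumb stated above: between one and two intervals.
    """
    return interval, 2 * interval


# Example value only: logs shipped every 15 minutes.
low, high = log_shipping_loss_window(timedelta(minutes=15))
print(f"Expected data loss: between {low} and {high}")
```

Shortening the interval narrows the window proportionally, at the cost of more frequent transfers to the DR site.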
Log Shipping is compatible with both SQL Server's Basic Availability Groups and Always On Failover Clustering, as well as non-clustered primary sites.
With Log Shipping, the SQL database at the DR site is normally in a state where it can't be accessed, and it needs to be brought online when failing over to the DR site. This step can be included in the Promote DR Utility (see below).
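Bringing a log-shipped database online usually means restoring any log backups that have been copied but not yet applied, then recovering the database. The sketch below generates that sequence of standard T-SQL `RESTORE` statements. The database name and file path are hypothetical, and the caller is assumed to pass the pending backups in the order they were taken.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_string(value: str) -> str:
    """Return a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def recovery_statements(database: str, pending_log_backups: list[str]) -> list[str]:
    """Build the T-SQL to apply outstanding log backups and bring the database online."""
    db = quote_identifier(database)
    # Each outstanding log backup is applied without recovering the database.
    statements = [
        f"RESTORE LOG {db} FROM DISK = {quote_string(path)} WITH NORECOVERY;"
        for path in pending_log_backups
    ]
    # The final statement recovers the database so it can be accessed.
    statements.append(f"RESTORE DATABASE {db} WITH RECOVERY;")
    return statements


# Hypothetical database name and backup path.
for statement in recovery_statements("Patriot", [r"D:\LogShip\Patriot_0001.trn"]):
    print(statement)
```

Once the database has been recovered, no further log backups can be applied to it, so this step belongs only in a real failover.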
SQL Database Mirroring
Database mirroring could also be used to create a replicated database at the DR site, with the following restrictions:
- Microsoft has marked Database Mirroring for deprecation, so this should be considered before investing in this solution.
- Only one duplicate can be made using SQL Mirroring. If Mirroring is already set up to make a backup copy at the primary location, it can't be used to make a duplicate at the DR site.
When configuring SQL mirroring, it must be set up in asynchronous mode to work reliably to the remote DR site. With Mirroring, the SQL database at the DR site is normally in a state where it can't be accessed, and it needs to be brought online when failing over to the DR site. This step can be included in the Promote DR Utility (see below).
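For an asynchronous mirror, bringing the DR copy online is typically done by forcing service on the mirror server once the principal is unavailable. The sketch below builds the corresponding standard T-SQL statement. The database name `Patriot` is a hypothetical placeholder, and this is an illustration rather than the utility's own code.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def force_mirror_service_sql(database: str) -> str:
    """Build the T-SQL, run on the mirror server, that forces the mirror into service.

    Transactions not yet sent to the mirror are lost, as the option name states.
    """
    return (
        f"ALTER DATABASE {quote_identifier(database)} "
        "SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;"
    )


# "Patriot" is a hypothetical database name.
print(force_mirror_service_sql("Patriot"))
```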
Remote Infrastructure
When failover to the DR location is required, a server running the Patriot Data Service and Patriot Task Service will be required at the offsite location.
Separate Remote Backup Server
A separate server (Patriot DR Server) must be located at the offsite location with the Patriot Data Service and Patriot Task Service installed.
The Patriot services on the Patriot DR Server should be set to manual start-up mode and left in a non-running state. When failing over to the backup location, the Patriot services need to be started (see Promote DR Utility below), along with the backup alarm receivers.
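The service handling described above can be sketched with the Windows `sc` command. The service names below are hypothetical (check the actual names in the Windows Services console), and the start order shown is an assumption. The functions only build and run commands; nothing is executed until `start_services` is called on the DR server.

```python
import subprocess

# Hypothetical service names; substitute the names shown in the Services console.
PATRIOT_SERVICES = ["PatriotDataService", "PatriotTaskService"]


def manual_startup_command(service: str) -> list[str]:
    """Command that sets a service to manual start-up ("demand" in sc terms)."""
    return ["sc", "config", service, "start=", "demand"]


def start_command(service: str) -> list[str]:
    """Command that asks Windows to start a service."""
    return ["sc", "start", service]


def start_services(services: list[str], run=subprocess.run) -> None:
    """Start each service in order, raising if any command fails."""
    for service in services:
        run(start_command(service), check=True)


# Show the commands without running them.
for name in PATRIOT_SERVICES:
    print(" ".join(manual_startup_command(name)))
    print(" ".join(start_command(name)))
```

Note that `sc start` returns once the start request is accepted, so a production script may also want to confirm that each service has reached the running state.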
Where Availability Groups are used with asynchronous replication, the Backup Data Service should be configured to connect to the SQL availability group virtual network name (this same name spans across both sites). In all other cases, it should be configured to connect to the SQL Server at the DR site.
Backup Receivers
The redundant alarm receivers located at the offsite location must be configured to communicate with the Patriot DR Server. These alarm receivers should be left disabled or offline during normal operation to ensure no signals are inadvertently received through them. Most alarm panels allow for a secondary receiver number; however, this should never be directed to the DR site. If the primary network or PSTN lines were temporarily down, signals sent to the secondary number would still need to be received at the Primary site.
Backup Receiver Tasks
An appropriate receiver task will also need to be preconfigured to run on the backup task service. This can be set up from any workstation, and the details will be replicated to the offsite location using whichever replication method has been set up. Ensure the correct computer name (the name of the server at the offsite location) is entered, and that the Backup setting is checked in each task. Once these backup tasks are in place, failover becomes a very simple process of just starting the Patriot services.
Figure: Backup Tasks
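The two settings called out above (computer name and the Backup setting) are easy to get wrong, so a pre-failover check can be useful. The sketch below is purely illustrative: `ReceiverTask` and its fields are hypothetical stand-ins for however the task details are exported or recorded, not a Patriot API.

```python
from dataclasses import dataclass


@dataclass
class ReceiverTask:
    """Hypothetical record of a backup receiver task's key settings."""
    name: str
    computer_name: str
    backup: bool


def misconfigured_backup_tasks(tasks: list[ReceiverTask], dr_server_name: str) -> list[str]:
    """Return a description of each setting that would stop a task running at the DR site."""
    problems = []
    for task in tasks:
        # Windows computer names are compared without regard to case.
        if task.computer_name.lower() != dr_server_name.lower():
            problems.append(
                f"{task.name}: computer name is {task.computer_name!r}, "
                f"expected {dr_server_name!r}"
            )
        if not task.backup:
            problems.append(f"{task.name}: Backup setting is not checked")
    return problems


# Hypothetical task list and server name.
tasks = [
    ReceiverTask("IP Receiver (DR)", "PATRIOT-DR", True),
    ReceiverTask("PSTN Receiver (DR)", "PATRIOT-PRIMARY", False),
]
for problem in misconfigured_backup_tasks(tasks, "patriot-dr"):
    print(problem)
```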
Remote Workstations
An adequate number of workstations will also need to be made available at the DR site. Alternatively, remote workstations could be configured on the DR Servers, so operators can connect remotely to the DR site. All these workstations should be pre-configured to connect to the Patriot Data Service running at the DR site.
Signalling Redirection
Signalling redirection involves redirecting the alarm signals from reporting to the primary location to reporting to the DR location. It is assumed that for each alarm receiver used at the primary location, a duplicate receiver will be located at the DR location.
There are various solutions to signalling redirection. Some options include:
IP Reporting with Three Contact Points (Preferred Solution)
If the alarm system is reporting using a modern IP-based reporting system that is capable of at least three contact points, two points can be located at the primary monitoring station, and an additional point can be set up at the DR location. This gives dual-path reporting to the primary location, and an alternate third path when failing over to the DR location. The level of automation and the failover process depend on the service provider in question. For instance, with Permacon, the third connection would need to be manually started by calling the 24-hour support service or by manually failing over using a service provider application. Inner Range Multipath, on the other hand, will begin communicating with the DR site as soon as the Patriot Task Service is activated at the DR site and begins acknowledging heartbeats.
A backup task should be pre-configured to run on the Patriot DR Server for each different IP receiver located at the DR Site, and also for each type of alarm panel reporting directly to Patriot. See Backup Receiver Tasks above.
Figure: IP Reporting - Three Contact Points
IP Reporting with only Two Contact Points
Where the IP device only has two points of contact (primary and secondary), you can set the secondary to point to the DR location. As the secondary path is likely to be used in cases where the Primary Monitoring Site has not gone offline, Patriot can install a Task Service at the DR location. This will forward the signals on to the Primary location when connected, and fall back to the DR location when DR failover has been initiated. Using this system we can effectively replicate the three contact point setup above.
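The forwarding behaviour described above amounts to a small routing decision. The sketch below models it under stated assumptions: the names are hypothetical, and the `HOLD` outcome (primary unreachable but DR not yet promoted) is an assumption added for completeness, since the text does not say what the Task Service does in that case.

```python
from enum import Enum


class Destination(Enum):
    PRIMARY = "primary"
    DR = "dr"
    HOLD = "hold"  # Assumption: the text does not describe this case.


def route_signal(primary_connected: bool, dr_promoted: bool) -> Destination:
    """Decide where a signal arriving at the DR-site Task Service should go."""
    if dr_promoted:
        # DR failover has been initiated, so signals stay at the DR site.
        return Destination.DR
    if primary_connected:
        # Normal operation: forward on to the Primary site.
        return Destination.PRIMARY
    return Destination.HOLD


print(route_signal(primary_connected=True, dr_promoted=False).value)
print(route_signal(primary_connected=False, dr_promoted=True).value)
```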
A backup task should be pre-configured to run on the Patriot DR Server for each different IP receiver located at the DR Site, and also for each type of alarm panel reporting directly to Patriot. See Backup Receiver Tasks above.
Figure: IP Reporting - Two Contact Points
IP Reporting with a Single Contact Point
Where the IP device can only report to one contact point there is no great solution, and these devices should generally be avoided.
If the device supports DNS, the DNS records can be updated to redirect signals as required. DNS can be slow to update, depending on the settings used (such as the record's TTL) and the DNS provider.
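Because DNS changes take time to propagate, it can help to confirm where a reporting hostname currently resolves before relying on it. The sketch below uses Python's standard `socket.getaddrinfo`; the hostname and addresses in the commented example are hypothetical. It reflects only what the local resolver returns, which may differ from what the alarm devices see.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the set of IPv4 addresses the local resolver gives for a hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def dns_points_to(hostname: str, expected_addresses: set[str], resolver=resolve_ipv4) -> bool:
    """True if every address returned for the hostname is one of the expected ones."""
    resolved = resolver(hostname)
    return bool(resolved) and resolved <= set(expected_addresses)


# Hypothetical usage after updating the record to the DR site address:
# dns_points_to("alarms.example.com", {"203.0.113.10"})
```

The `resolver` parameter exists so the check can be exercised without network access, or pointed at a different lookup method.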
Your network service provider(s) may also have options to redirect IP signalling to the DR site.
IP Panel Reporting Directly to a Patriot Task
The solution used depends on the panel's ability to support multiple server addresses. If multiple addresses can be configured, the secondary server address should be configured to connect to the Patriot Task Service running at the DR site, which will allow redirection to the appropriate Data Service.
Remote Call Divert
Where alarm receivers are connected to phone lines, the phone lines can be diverted to the DR location using a remote call divert provider (available in some countries) or through your telecommunication provider. The method of call divert should be pre-arranged with your telecommunication provider. The redundant alarm receiver must be configured in exactly the same way (identical handshakes, line card and receiver numbers duplicated) as the receiver at the Primary site.
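Since the DR receiver must mirror the primary receiver exactly, a simple comparison of the recorded settings can catch drift between the two. The sketch below is illustrative only: the dictionary keys are hypothetical names for the settings listed above, and the values would come from your own configuration records.

```python
# Hypothetical names for the settings that must match between sites.
REQUIRED_MATCHING_KEYS = ("handshakes", "line_card_number", "receiver_number")


def receiver_mismatches(primary: dict, dr: dict, keys=REQUIRED_MATCHING_KEYS) -> dict:
    """Return {setting: (primary_value, dr_value)} for every setting that differs."""
    return {
        key: (primary.get(key), dr.get(key))
        for key in keys
        if primary.get(key) != dr.get(key)
    }


# Hypothetical recorded settings for one receiver at each site.
primary_receiver = {"handshakes": ["1400Hz", "2300Hz"], "line_card_number": 1, "receiver_number": 1}
dr_receiver = {"handshakes": ["1400Hz", "2300Hz"], "line_card_number": 2, "receiver_number": 1}
print(receiver_mismatches(primary_receiver, dr_receiver))
```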
A backup task should be pre-configured to run on the Patriot DR Server for each redundant receiver located at the DR Site. See Backup Receiver Tasks above.
Promote DR Utility
There are several steps required to bring the DR site online. To make this as easy as possible, these steps can all be scripted and set up in the Patriot Promote DR Utility located at the DR site. This makes the process as easy as a single button click. It can be configured to perform the following actions:
- Bring SQL Online. Performs one of the following actions, depending on what's required: SQL cluster manual failover, Log Shipping bring database online, or Database Mirroring bring database online.
- Start Patriot Services.
Once done, the only other actions required are:
- Switch on any physical or software alarm receivers at the DR site.
- Perform remote call divert on any phone line based alarm receivers.
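The overall sequence can be pictured as an ordered list of steps that stops at the first failure, so that services are not started against a database that did not come online. The sketch below is a generic illustration of that idea, not the Promote DR Utility itself; the step names and the placeholder actions are hypothetical.

```python
from typing import Callable, Iterable

Step = tuple[str, Callable[[], None]]


def promote_dr(steps: Iterable[Step], log=print) -> list[str]:
    """Run each step in order and return the names of the steps that completed.

    An exception from any step propagates, so later steps are not attempted.
    """
    completed = []
    for name, action in steps:
        log(f"Running: {name}")
        action()
        completed.append(name)
    return completed


# Placeholder actions; real ones would run the SQL failover and start the services.
steps = [
    ("Bring SQL online", lambda: None),
    ("Start Patriot services", lambda: None),
]
print(promote_dr(steps))
```

The manual follow-up actions (switching on receivers and performing call divert) are deliberately left out of the scripted list, matching the split described above.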