Backup and Restore Method for an HA Cluster across Regions Using AWS AMI: Restore

EXPRESSCLUSTER Official Blog

December 1st, 2023

Machine translation is used partially for this article. See the Japanese version for the original article.

Introduction

For a mirror-disk-type HA cluster built with EXPRESSCLUSTER X on Amazon Web Services (hereinafter called "AWS"), we tried restoring a backed-up cluster to another region, assuming a situation where a failure of the region you usually use makes business continuity difficult. This method can also be used for disaster recovery (DR) in the event of a large-scale disaster that affects an entire region.

This article describes the restore procedure, the second half of the overall backup and restore procedures. For the backup procedure, refer to the backup article.

When restoring from AMIs for cross-region disaster recovery, the following recovery patterns are possible:

  • After restoration, change the network and EXPRESSCLUSTER resource settings to recover business operations as the replica of an HA cluster.
  • Restore only some servers as a standalone server instead of an HA cluster.
This time, we will use the first pattern and recover business operations as a replica of the HA cluster.

Last updated: January 19th, 2023
  • *Added steps to enable/disable EBS fast snapshot restore before and after EC2 restore.

1. Things to Consider When Restoring

1.1 Concept of Licensing When Backing up and Restoring

We will explain the concept of licensing when backing up and restoring images containing EXPRESSCLUSTER X.

The EXPRESSCLUSTER X license terms do not allow a single license to exist in multiple locations at the same time, so restoring (cloning) another virtual machine from a licensed image usually violates the terms. However, the following two cases are allowed as exceptions.

  • Register extra licenses for the number of virtual machines to be replicated at the time of backing up
  • After restoring from the image, quickly replace the licenses for the old environment with the licenses for the new environment
This time, we will replace the licenses after restoring.

1.2 Copying Data of Mirror Disk After Restore

Since the AMIs used in this restore procedure were created from EC2 instances that had been stopped at the same time beforehand, the mirror disk data on both servers is identical (for the detailed procedure, refer to the backup article). When restoring from AMIs created at such a complete quiescent point, a full copy of the mirror data is not needed after the restore.

On the other hand, if the mirror disks differ between servers, for example because the backups were taken at different times, an additional procedure to fully copy the mirror data is necessary. This pattern is not covered in this article.

2. HA Cluster Configuration to Restore

This time, we will use a two-node mirror-disk-type HA cluster in a cloud environment (AWS) as an example. The HA cluster configuration is basically the same as in the backup article except for the following resource setting changes, so refer to the backup article for the detailed configuration. The resource settings are changed to make the Oregon region the explicit connection destination, so that clients do not unexpectedly connect to the N. Virginia region when it recovers from the failure.

- EXPRESSCLUSTER Resource Settings (the differences from the backup article)

  • AWS Virtual IP resource (awsvip)
    - IP Address: 172.32.0.1
  • AWS DNS resource (awsdns)
    - Resource Record Set Name: oregon.expresscluster.local.

The AWS environment before restoring is as follows.
Assuming that the N. Virginia region is unavailable due to a failure, the network resources for building an HA cluster in a VPC have already been created in the Oregon region, and the AMIs copied from the N. Virginia region are stored there.

Next, the configuration of the AWS environment and HA cluster after restoring is as follows. It is a two-node active-standby cluster in which server01 and server02 share data by placing the application data on the mirror disk, the same as in the N. Virginia region that was the backup target. The server that holds the VIP (virtual IP address) and the virtual hostname (Route 53 DNS record), that is, the server accessed by clients, is the active server, and the other is the standby server.

3. Restore Procedure

3.1 Launching EC2 Instances from AMIs

Create the EC2 instances from the AMIs. If necessary, enable EBS fast snapshot restore before creating the EC2 instances.

3.1.1 Enable EBS Fast Snapshot Restore (Optional)

For EBS volumes that have just been created from EBS snapshots, the storage blocks must be pulled down from Amazon S3 and written to the volume on first access. This takes time, so I/O performance degrades the first time each block is accessed. EBS volumes created when launching EC2 instances from user-created AMIs are no exception. If no special measures are taken, this degradation may slow service responses or cause monitoring errors due to processing delays when operations are resumed after restoration.

This time, we enable EBS fast snapshot restore for the backed-up EBS snapshots before restoring, so that the specified I/O performance is delivered immediately after the EC2 instances start. Note that activation takes 60 minutes per TiB to complete, and that at most 50 snapshots per region can be activated simultaneously.
If I/O performance is not an issue, this procedure can be omitted.
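
As a rough planning aid, the activation time can be estimated from the snapshot size. The following is a minimal sketch that relies only on the figure quoted above (60 minutes per TiB). The function name `fsr_enable_minutes` is hypothetical, and actual times may differ.

```shell
# Estimate how long enabling EBS fast snapshot restore may take,
# assuming a rate of 60 minutes per TiB (1 TiB = 1024 GiB).
# Argument: snapshot size in GiB. Output: minutes, rounded up.
fsr_enable_minutes() {
    echo $(( ($1 * 60 + 1023) / 1024 ))
}

# Example: a 100 GiB snapshot
fsr_enable_minutes 100
```

For a 100 GiB snapshot this prints 6, that is, about six minutes before the volume can be expected to deliver full performance.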

In the AWS Management Console, select an EBS snapshot that makes up the AMI to be restored, and open [Actions] - [Manage fast snapshot restore]. From [Fast snapshot restore settings], check the availability zone where you plan to restore the EC2 instance, and select [Enable]. Follow the same procedure to enable EBS fast snapshot restore for all EBS snapshots that make up the AMIs to be restored.
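
The same operation can also be done with the AWS CLI instead of the console. The following is a sketch, not part of the procedure verified in this article. The function names `enable_fsr` and `fsr_state` and all IDs in the usage example are hypothetical placeholders, and the AWS CLI is assumed to be installed with sufficient permissions.

```shell
# Enable EBS fast snapshot restore for one or more snapshots in one AZ.
# Usage: enable_fsr <region> <availability-zone> <snapshot-id>...
enable_fsr() {
    region=$1
    az=$2
    shift 2
    aws ec2 enable-fast-snapshot-restores \
        --region "$region" \
        --availability-zones "$az" \
        --source-snapshot-ids "$@"
}

# Show the availability zone and current state of fast snapshot restore
# for one snapshot, to check whether activation has completed.
# Usage: fsr_state <region> <snapshot-id>
fsr_state() {
    aws ec2 describe-fast-snapshot-restores \
        --region "$1" \
        --filters "Name=snapshot-id,Values=$2" \
        --query 'FastSnapshotRestores[].[AvailabilityZone,State]' \
        --output text
}
```

For example, `enable_fsr us-west-2 us-west-2a snap-0123456789abcdef0` followed by `fsr_state us-west-2 snap-0123456789abcdef0` enables the feature and then reports its progress.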

3.1.2 Launching EC2 Instance

In the AWS Management Console, select the AMI copied to the Oregon region, then click [Launch Instance from AMI] from the above menu to open the EC2 Instance launch wizard. Configure the IAM role, security groups, etc. as necessary in the wizard. At this time, it is necessary to select a key pair before launching an EC2 instance. In the case of a Linux environment, set the same key pair as the one used in the backup environment. Since the Windows environment uses the same password as the backup environment, setting a key pair is not mandatory.
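
For reference, a launch from the command line might look like the following sketch. It is an alternative to the wizard, not the method used in this article. The function name `launch_from_ami` is hypothetical, and the AMI ID, instance type, subnet, security group, and key pair name must be replaced with values from your environment.

```shell
# Launch one EC2 instance in the Oregon region from a copied AMI
# and print the ID of the new instance.
# Usage: launch_from_ami <ami-id> <instance-type> <subnet-id> <security-group-id> <key-name>
launch_from_ami() {
    aws ec2 run-instances \
        --region us-west-2 \
        --image-id "$1" \
        --instance-type "$2" \
        --subnet-id "$3" \
        --security-group-ids "$4" \
        --key-name "$5" \
        --count 1 \
        --query 'Instances[0].InstanceId' \
        --output text
}
```

If an IAM role is needed, the `--iam-instance-profile` option of `aws ec2 run-instances` can be added in the same way.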

3.1.3 Disable EBS Fast Snapshot Restore (Optional)

Additional charges will be incurred while EBS fast snapshot restore is enabled.
Therefore, if EBS fast snapshot restore was enabled, disable it.

In the AWS Management Console, select an EBS snapshot for which EBS fast snapshot restore was enabled in section 3.1.1, and open [Actions] - [Manage fast snapshot restore]. From [Fast snapshot restore settings], check the availability zones that have been enabled and select [Disable]. Follow the same procedure for every EBS snapshot that was enabled in section 3.1.1.
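
Because charges continue for as long as the feature stays enabled, it can help to list what is still enabled before and after disabling. The following AWS CLI sketch illustrates this. The function names `list_enabled_fsr` and `disable_fsr` are hypothetical, and the snapshot IDs in the usage example are placeholders.

```shell
# List snapshot ID / availability zone pairs for which fast snapshot
# restore is currently in the "enabled" state.
# Usage: list_enabled_fsr <region>
list_enabled_fsr() {
    aws ec2 describe-fast-snapshot-restores \
        --region "$1" \
        --filters Name=state,Values=enabled \
        --query 'FastSnapshotRestores[].[SnapshotId,AvailabilityZone]' \
        --output text
}

# Disable fast snapshot restore for one or more snapshots in one AZ.
# Usage: disable_fsr <region> <availability-zone> <snapshot-id>...
disable_fsr() {
    region=$1
    az=$2
    shift 2
    aws ec2 disable-fast-snapshot-restores \
        --region "$region" \
        --availability-zones "$az" \
        --source-snapshot-ids "$@"
}
```

For example, run `list_enabled_fsr us-west-2`, then `disable_fsr us-west-2 us-west-2a snap-0123456789abcdef0` for each listed pair.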

3.2 Setting up AWS Environment

When new EC2 instances are created from AMIs, the "Source / destination check" of their network interfaces is enabled. In the AWS Management Console, choose [Actions] - [Networking] - [Change source/destination check], check [Stop] under "Source / destination checking", and click [Save].
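
The same setting can be changed from the AWS CLI, as in the following sketch. The function names are hypothetical, and the ENI ID passed in must be the one of each restored instance.

```shell
# Turn off the source/destination check on one network interface.
# Usage: disable_src_dst_check <eni-id>
disable_src_dst_check() {
    aws ec2 modify-network-interface-attribute \
        --region us-west-2 \
        --network-interface-id "$1" \
        --no-source-dest-check
}

# Print the current source/destination check setting for confirmation.
# Usage: show_src_dst_check <eni-id>
show_src_dst_check() {
    aws ec2 describe-network-interfaces \
        --region us-west-2 \
        --network-interface-ids "$1" \
        --query 'NetworkInterfaces[0].SourceDestCheck' \
        --output text
}
```

Run both functions once for each of the two restored instances.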

Based on information such as the ENI IDs and private IP addresses of the EC2 instances created in "3.1 Launching EC2 Instances from AMIs", configure the VPC route tables for the AWS Virtual IP resource and Route 53 for the AWS DNS resource. For details on how to set up the AWS environment, refer to the following guides.

[Reference]
Documentation - Setup Guides
  • Windows > Cloud > Amazon Web Services > EXPRESSCLUSTER X 5.0 for Windows HA Cluster Configuration Guide for Amazon Web Services
  • Linux > Cloud > Amazon Web Services > EXPRESSCLUSTER X 5.0 for Linux HA Cluster Configuration Guide for Amazon Web Services
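
As an illustration of the route table part, the following sketch adds a route that sends traffic for the VIP to the ENI of the active server. It assumes the VIP from section 2 (172.32.0.1) is routed as a /32 destination. The function name `set_vip_route` and the route table and ENI IDs are hypothetical, so follow the configuration guides above for the authoritative settings.

```shell
# Add a /32 route for the VIP that targets the ENI of the active server.
# Usage: set_vip_route <route-table-id> <vip> <eni-id>
set_vip_route() {
    aws ec2 create-route \
        --region us-west-2 \
        --route-table-id "$1" \
        --destination-cidr-block "$2/32" \
        --network-interface-id "$3"
}
```

For example: `set_vip_route rtb-0123456789abcdef0 172.32.0.1 eni-0123456789abcdef0`. If a route for the same destination already exists, `aws ec2 replace-route` takes the same three options.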

3.3 Setting up the OS

Configure the settings on the OS of each EC2 instance. First, confirm that the hostnames are the same as in the backup environment. If they are not, change them to match the backup environment.

Next, confirm that the partitions used by the mirror disk are recognized in the same way as in the backup environment, and reconfigure the AWS CLI on the OS.

3.3.1 OS Settings (Windows)

Confirm that the partitions used by the mirror disk are recognized in the same way as in the backup environment.
From [Computer Management] - [Disk Management], check that the drive letters of the disks used for the mirror disk have not changed, and that the file system of each partition is shown as RAW (which means access control by the mirror disk driver is in effect). If the GUID has changed, for example because the partition format is MBR instead of GPT, access control by the mirror disk driver does not work and each drive is accessible. In that case, reset the drive letters according to the mirror disk resource settings.

Configure the AWS CLI.
If the OS of the AMI you use is Windows Server 2016 or Windows Server 2019, EC2Launch (v1) is installed by default, and the routes must be reconfigured when restoring to another subnet. Run the following command.

> Import-Module c:\ProgramData\Amazon\EC2-Windows\Launch\Module\Ec2Launch.psm1; Add-Routes

If the OS of the AMI you use is Windows Server 2022 or later, EC2Launch v2 is installed by default, so you do not need to run the above command.

Next, run "aws configure" with administrator privileges to change the default region of the AWS CLI to the Oregon region (us-west-2). Settings other than the default region should be set according to your environment.

> aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [us-east-1]: us-west-2
Default output format [text]:

3.3.2 OS Settings (Linux)

This time, since the disks are configured with LVM, use the lvdisplay command or similar to confirm that the partitions are recognized correctly. In a non-LVM environment, check with the lsblk or parted command.

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/clp_md2/cp
  LV Name                cp
  VG Name                clp_md2
  LV UUID                XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX
  LV Write Access        read/write
  LV Creation host, time server01, 2022-11-08 06:04:41 +0000
  LV Status              available
  (Omitted)

Next, run "aws configure" with root privileges to change the default region of the AWS CLI to the Oregon region (us-west-2). Settings other than the default region should be set according to your environment.

# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [us-east-1]: us-west-2
Default output format [text]:
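
If only the default region needs to change, a non-interactive alternative is `aws configure set`, sketched below. The function name `set_default_region` is hypothetical. As with the interactive command, run it as the same user (root here) that was used for the original AWS CLI configuration.

```shell
# Change the default region of the AWS CLI, then print the stored
# value so the change can be confirmed.
# Usage: set_default_region <region>
set_default_region() {
    aws configure set region "$1" &&
    aws configure get region
}
```

For the environment in this article, the call would be `set_default_region us-west-2`.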

3.4 Replacing EXPRESSCLUSTER Licenses

On the OS of each EC2 instance, replace the EXPRESSCLUSTER licenses with the licenses for the restore environment (the HA cluster in the Oregon region).
The following examples first delete all licenses registered on the server where the command is run, and then register all license files with the extension ".key" in the current directory. Do the same on each server.

3.4.1 Replacing EXPRESSCLUSTER Licenses (Windows)

Execution example
> clplcnsc -d -a
Are you sure to remove the license? [y/n] y
<Serial No : eval-base-sv1> License delete succeeded.
<Serial No : eval-repl-sv1> License delete succeeded.

Processed license num (success: 2, error: 0).

> clplcnsc -i *.key
<Filename : eval_base_50_sv3.key>
  <Serial No : eval-base-sv3> License registration succeeded.

<Filename : eval_repl_50_sv3.key>
  <Serial No : eval-repl-sv3> License registration succeeded.
Command succeeded.
But the license was not applied to all the servers in the cluster
because there are one or more servers that are not started up.

Processed license num (success: 2, error: 0).

3.4.2 Replacing EXPRESSCLUSTER Licenses (Linux)

Execution example
# clplcnsc -d -a
Are you sure to remove the license? [y/n] y
<Serial No : eval-base-sv1> License delete succeeded.
<Serial No : eval-repl-sv1> License delete succeeded.

Processed license num (success: 2, error: 0).

# clplcnsc -i *.key
<Filename : /home/ec2-user/eval_base_50_sv3.key>
  <Serial No : eval-base-sv3> License registration succeeded.

<Filename : /home/ec2-user/eval_repl_50_sv3.key>
  <Serial No : eval-repl-sv3> License registration succeeded.
Command succeeded.
But the license was not applied to all the servers in the cluster
because there are one or more servers that are not started up.

Processed license num (success: 2, error: 0).
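
Since the old licenses are deleted before the new ones are registered, it may be worth confirming that the new license files are actually present first. The following Linux sketch wraps the two commands shown above with such a check. The function name `replace_licenses` is hypothetical, and the only clplcnsc options used are those in the execution example.

```shell
# Replace licenses only if at least one .key file exists.
# Usage: replace_licenses [directory]   (defaults to the current directory)
replace_licenses() {
    # Expand the .key files in the directory into the positional parameters.
    set -- "${1:-.}"/*.key
    if [ ! -e "$1" ]; then
        echo "No .key files found; licenses left unchanged." >&2
        return 1
    fi
    # Delete all registered licenses, then register the new files.
    clplcnsc -d -a && clplcnsc -i "$@"
}
```

Run `replace_licenses` in the directory that holds the new license files. The deletion still asks for confirmation as in the execution example.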

3.5 Setting up EXPRESSCLUSTER

On Cluster WebUI of the master server, change the EXPRESSCLUSTER settings according to the restore environment.

3.5.1 Changing EXPRESSCLUSTER Settings (Windows)

Connect to Cluster WebUI using the IP address of the EC2 instance of the restored master server and select [Config mode]. At this time, the message "Failed to obtain the cloud the environment information on one or more servers." may be displayed, but you can ignore it. From [Cluster Properties] - [Interconnect], change the IP addresses of each server according to the restore environment. Then click [Apply the Configuration File] in the Config mode. Do not change any settings other than the IP addresses at this point. A message requesting a manual OS restart is displayed when the settings are applied, but ignore it and continue with the following steps.

After the IP address settings are applied, continue with the following setting changes.
Open the properties of the AWS virtual IP resource, and on the [Details] tab, change the [IP address], [VPC ID], and [ENI ID] in the common/individual settings according to the restore environment.
Open the properties of the AWS DNS resource, and on the [Details] tab, change the [Hosted Zone ID], [Resource Record Set Name], and [IP address] in the common/individual settings according to the restore environment.
Open the properties of the mirror disk resources, and from [Tuning] on the [Details] tab, uncheck [Execute the initial mirror construction].
Select each server from [Servers that can run the group] on the [Details] tab of the mirror disk resource, click [Edit], and then click [Obtain Information] - [Connect] to confirm that the data partition and cluster partition settings are correct. If you reset the drive letters in "3.3 Setting up the OS", reselect the partitions according to the changed information.

For X 4.1/X 4.2
In addition to the above settings, open [Group properties] of the failover group and select [Manual Startup] for [Startup Attribute] on the [Attribute] tab.

After changing these settings, click [Apply the Configuration File] in the Config mode.

3.5.2 Changing EXPRESSCLUSTER Settings (Linux)

Connect to Cluster WebUI using the IP address of the EC2 instance of the restored master server and select [Config mode]. At this time, the message "Failed to obtain the cloud the environment information on one or more servers." may be displayed, but you can ignore it. From [Cluster Properties] - [Interconnect], change the IP addresses of each server according to the restore environment. Then click [Apply the Configuration File] in the Config mode. Do not change any settings other than the IP addresses at this point.

After the IP address settings are applied, continue with the following setting changes.
Open the properties of the AWS virtual IP resource, and on the [Details] tab, change the [IP address], [VPC ID], and [ENI ID] in the common/individual settings according to the restore environment.
Open the properties of the AWS DNS resource, and on the [Details] tab, change the [Hosted Zone ID], [Resource Record Set Name], and [IP address] in the common/individual settings according to the restore environment.
Open the properties of the mirror disk resources, and from [Tuning] on the [Details] tab, uncheck [Execute the initial mirror construction].

After changing these settings, click [Apply the Configuration File] in the Config mode.

3.6 Post-processing for Restore of EXPRESSCLUSTER

On the OS of each EC2 instance, perform the EXPRESSCLUSTER post-processing for restore so that the servers can start as an HA cluster. This procedure assumes that a full copy of the mirror data is not required after the restore.
The steps vary depending on the version of EXPRESSCLUSTER X.

3.6.1 Post-processing for Restore of EXPRESSCLUSTER (Windows)

For X 4.3 or later
Run the clprestore command with the "--skip-copy" option. The server restarts automatically when the command completes.

> clprestore --post --skip-copy
Command succeeded.
clprestore.bat : Changing the setting of cluster services to Auto Startup.
clprestore.bat : Rebooting...
Reboot server01 : Command succeeded.

For X 4.2
Run the clpsvcctrl command to set the required EXPRESSCLUSTER services to start automatically. Then restart the server with the clpdown command.

> clpsvcctrl --enable core
> clpdown -r

On the OS of the master server after restarting, perform the forced mirror recovery of mirror disk resources (md1, md2). After that, start the failover group. (This operation can also be performed from Cluster WebUI.)

> clpmdctrl --force server01 md1
Command succeeded.
> clpmdctrl --force server01 md2
Command succeeded.
> clpgrp -s failover
Command succeeded.

For X 4.1
Use the command provided by the OS to set the required EXPRESSCLUSTER service to start automatically. Then restart the server.

> sc.exe config clppm start= auto
[SC] ChangeServiceConfig SUCCESS
> shutdown -r -t 0

On the OS of the master server after restarting, perform the forced mirror recovery of mirror disk resources (md1, md2). After that, start the failover group. (This operation can also be performed from Cluster WebUI.)

> clpmdctrl --force server01 md1
Command succeeded.
> clpmdctrl --force server01 md2
Command succeeded.
> clpgrp -s failover
Command succeeded.

3.6.2 Post-processing for Restore of EXPRESSCLUSTER (Linux)

For X 4.3 or later
Run the clprestore command with the "--skip-copy" option. The server restarts automatically when the command completes.

# clprestore.sh --post --skip-copy
Mirror info will be set as default.
The main handle on initializing mirror disk <md1> success.
The main handle on initializing mirror disk <md2> success.
Initializing mirror disk complete.
clprestore.sh : Changing the setting of cluster services to Auto Startup.
clprestore.sh : Rebooting...

For X 4.2
Run the clpsvcctrl command to set the required EXPRESSCLUSTER services to start automatically. Then restart the server with the clpdown command.

# clpsvcctrl.sh --enable core
# clpdown -r

For X 4.1
Use the commands provided by the OS to set the required EXPRESSCLUSTER services to start automatically. Then restart the server.

# systemctl enable clusterpro
Created symlink /etc/systemd/system/multi-user.target.wants/clusterpro.service → /usr/lib/systemd/system/clusterpro.service.
# systemctl enable clusterpro_md
Created symlink /etc/systemd/system/multi-user.target.wants/clusterpro_md.service → /usr/lib/systemd/system/clusterpro_md.service.
# reboot

3.7 Changing EXPRESSCLUSTER Settings After Restore Completed

On the Cluster WebUI of the master server, revert the settings changed at the time of restore as necessary. This time, the settings need to be reverted only in the EXPRESSCLUSTER X 4.1/4.2 for Windows environment.

Connect to Cluster WebUI using the IP address of the EC2 instance of the restored master server and select [Config mode]. Open [Group properties] of the failover group and select [Automatic Startup] for [Startup Attribute] on the [Attribute] tab.

After changing this setting, click [Apply the Configuration File] in the Config mode.

  • *The [Execute the initial mirror construction] setting changed in "3.5 Setting up EXPRESSCLUSTER" does not affect the operation after the restore is completed, so there is no need to revert the setting.

4. Checking the Operation

Check that the client machine in the Oregon region can access the HA cluster using the VIP address (172.32.0.1) and DNS name (oregon.expresscluster.local), and that the restored data can be successfully accessed on each server.

  1. Start the failover group on server01.
  2. From the client machine, access the VIP (172.32.0.1) and DNS name (oregon.expresscluster.local) and check that server01 is accessible.
  3. Check that the data at the time of backup can be accessed successfully on server01.
  4. Use Cluster WebUI or the clpgrp command to manually move the failover group to server02.
  5. From the client machine, access the VIP (172.32.0.1) and DNS name (oregon.expresscluster.local) and check that server02 is accessible.
  6. Check that the data at the time of backup can be accessed successfully on server02.
We were able to confirm that business operations were recovered in the HA cluster restored in the Oregon region.
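
The DNS part of steps 2 and 5 can be checked from a Linux client with a small helper such as the following sketch. It assumes that the record registered by the AWS DNS resource points to the IP address configured for the currently active server. The function name `resolves_to` and the IP address in the usage example are hypothetical.

```shell
# Succeed if <name> resolves to <expected-ip> on this client.
# Usage: resolves_to <name> <expected-ip>
resolves_to() {
    # Take the first address returned by the system resolver.
    resolved=$(getent hosts "$1" | awk '{ print $1; exit }')
    [ "$resolved" = "$2" ]
}
```

For example, `resolves_to oregon.expresscluster.local 10.0.10.100 && echo OK`, run once while server01 is active and again after moving the failover group to server02 with that server's address, shows whether the record follows the active server.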

Conclusion

In this article, we explained the procedure for restoring an HA cluster using AMIs. If you copy the created AMIs to another region in advance, you can continue business by restoring the cluster to that region when the entire region you usually use becomes unavailable.
For the backup procedure, refer to the backup article.

If you are considering introducing the configuration described in this article, you can validate it with the trial module of EXPRESSCLUSTER. Please do not hesitate to contact us if you have any questions.