April 26th, 2024
Machine translation is used partially for this article. See the Japanese version for the original article.
Introduction
We tried building an HA cluster that uses the forced stop resource in a multi-region environment on Amazon Web Services (hereinafter called "AWS").
The forced stop resource forcibly stops the active server from outside, for example by using a CLI, when a heartbeat timeout is detected between cluster servers. Using it together with network partition resolution (hereinafter called "NP resolution") resources prevents both-system activation more reliably.
Since EXPRESSCLUSTER X 5.0, the forced stop resource can be used in an AWS environment simply by specifying parameters. However, for an HA cluster in a multi-region environment, a little ingenuity is required to use the forced stop resource. This article introduces the procedure for setting up the forced stop resource in a multi-region environment.
This article is a supplement to the previously published Introducing the Function of EXPRESSCLUSTER: Forced Stop Resource article, so we recommend that you read that article first.
1. Forced Stop Resource on a Multi-Region Environment
There are several types of forced stop resource, and the appropriate one depends on the environment in which the HA cluster is built.
In an AWS environment, a forced stop resource with a "Type" of "AWS" is usually used. When the standby instance detects a heartbeat timeout with the active instance, this forced stop resource runs an AWS CLI command on the standby instance to forcibly stop the active instance.
However, as of EXPRESSCLUSTER X 5.2, this forced stop resource assumes that the instances to be stopped are in the same region, so it cannot stop instances in a different region.
For a multi-region configuration, use a forced stop resource with a "Type" of "Custom" instead. A custom forced stop resource runs a registered script that can contain arbitrary processing. By specifying in the script the region in which the target instance exists, the instance can be stopped even across regions.
2. Configuration Procedure for Forced Stop Resource
This section introduces the configuration procedure for a custom forced stop resource in a multi-region environment.
For more information about the forced stop resource, refer to the Reference Guide.
Documentation - Manuals
- EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Reference Guide
- -> 7. Forced stop resource details
- EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Reference Guide
- -> 7. Forced stop resource details
AWS CLI commands are executed in the custom forced stop resource script. Therefore, preparation such as installing the AWS CLI on each instance that constitutes the HA cluster is required in advance. For more information about how to use the AWS CLI with EXPRESSCLUSTER, refer to the Getting Started Guide.
Documentation - Manuals
- EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Getting Started Guide
- -> 6. Notes and Restrictions
- -> 6.2 Before installing EXPRESSCLUSTER
- -> 6.2.16 Time synchronization in the AWS environment
- -> 6.2.17 IAM settings in the AWS environment
- EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Getting Started Guide
- -> 6. Notes and Restrictions
- -> 6.3 Before installing EXPRESSCLUSTER
- -> 6.3.19 Time synchronization in the AWS environment
- -> 6.3.20 IAM settings in the AWS environment
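As a rough pre-flight check for the preparation described above, the following sketch reports whether a command is available on the PATH. The helper name `require_cmd` is hypothetical and not part of EXPRESSCLUSTER. It only confirms that the AWS CLI is installed. The IAM permissions still have to be set up as described in the Getting Started Guide, and the scripts in this article call `ec2 stop-instances` and `ec2 describe-instances`, so the instance presumably needs to be allowed to run both.

```shell
#!/bin/bash
# Hypothetical pre-flight helper (not part of EXPRESSCLUSTER):
# report whether the given command can be found on the PATH.
require_cmd() {
    local name=$1
    if command -v "${name}" > /dev/null 2>&1; then
        echo "found: ${name}"
        return 0
    fi
    echo "missing: ${name}"
    return 1
}

# The forced stop scripts in this article depend on the AWS CLI.
require_cmd aws || echo "Install the AWS CLI before configuring the forced stop resource."
```

Run this on every instance that will belong to the HA cluster, because each server may have to stop the other one.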
2.1 Registering an Account (Windows Only)
This operation is not necessary on Linux, so proceed to the next section, "2.2 Configuring Fencing".
On Windows, you need to execute the forced stop resource script as the Administrator user. Therefore, register the Administrator user information in EXPRESSCLUSTER.
From the Cluster WebUI, open "Cluster Properties" and click the "Add" button on the "Account" tab.
Enter "Administrator" in "User Name" and the Administrator password in "Password".
Press the "OK" button.
2.2 Configuring Fencing
From the Cluster WebUI, open "Cluster Properties" and select the "Fencing" tab.
For NP resolution, select the PING or HTTP method and set targets against which NP resolution can be performed from both server1 and server2.
Select "Custom" from the "Type" pull-down for "Forced Stop" and press the "Properties" button.
Select "server1" on the "Server List" tab and press the "Add" button.
Configure server2 using the same procedure.
On the "Forced Stop" tab, calculate and set the maximum waiting time when performing a forced stop as "Forced Stop Timeout" using the following formula. In this article, we will use 180 seconds as an example.
- CHECK_LOOP_MAX: Maximum number of attempts to check for stop
- SLEEP_WAIT: Interval to check for stop (seconds)
The variables are the same as those set in the script in "2.3 Editing the Script".
Allow for the overhead of running the AWS CLI, and set a number of seconds that leaves some margin.
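The calculation can be sketched as follows. This is an assumption-based illustration rather than the exact formula: it assumes the timeout must cover at least CHECK_LOOP_MAX × SLEEP_WAIT seconds of polling plus a margin for AWS CLI overhead. The variable `CLI_MARGIN` and its value of 30 seconds are hypothetical, chosen so that the result matches the 180 seconds used in this article.

```shell
#!/bin/bash
# Values taken from the example scripts in "2.3 Editing the Script".
CHECK_LOOP_MAX=50   # maximum number of attempts to check for stop
SLEEP_WAIT=3        # interval to check for stop (seconds)

# Assumption: extra seconds reserved for AWS CLI overhead (hypothetical value).
CLI_MARGIN=30

# Longest time the stop check loop can spend sleeping.
polling_time=$(( CHECK_LOOP_MAX * SLEEP_WAIT ))

# Candidate value for "Forced Stop Timeout".
forced_stop_timeout=$(( polling_time + CLI_MARGIN ))

echo "Polling time: ${polling_time} s"
echo "Forced Stop Timeout: ${forced_stop_timeout} s"
```

If you change CHECK_LOOP_MAX or SLEEP_WAIT in the script, recalculate the timeout so that it stays larger than the polling time.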
Also, if you do not want a failover to occur when the forced stop fails, check "Disable Group Failover When Execution Fails".
The settings for the "Script" tab are continued in the next section.
2.3 Editing the Script
Write the forced stop processing in the script that the custom forced stop resource executes.
The script used in the custom forced stop resource must describe both the behavior during periodic checks and the behavior when a forced stop is executed. Use the environment variable [CLP_FORCESTOP_MODE] to branch between the two operations. If the [CLP_FORCESTOP_MODE] value is 0, a periodic check is performed, and if it is 1, a forced stop is performed.
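The branching described above can be sketched in isolation as follows. The function name `forcestop_branch` is hypothetical and exists only for illustration. Like the example scripts in this article, the sketch treats any value other than "1" as a periodic check.

```shell
#!/bin/bash
# Hypothetical helper: print which branch a custom forced stop script
# would take for a given CLP_FORCESTOP_MODE value.
forcestop_branch() {
    local mode=$1
    if [ "${mode}" = "1" ]; then
        # Executed when a forced stop is actually requested
        echo "forced-stop"
    else
        # Executed for the periodic check (mode 0, or anything else)
        echo "periodic-check"
    fi
}

# Outside EXPRESSCLUSTER the variable is not set, so default to 0 here.
forcestop_branch "${CLP_FORCESTOP_MODE:-0}"
```

In a real script, each branch would contain the actual processing instead of an `echo`, and the script's exit code would tell EXPRESSCLUSTER whether the operation succeeded.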
Below are example scripts for the Windows and Linux versions.
2.3.1 Windows Version Script
Select "forcestop.bat" in the "Script" tab and press the "Edit" button.
Enter the contents of the script below.
rem * forcestop.bat
rem ***************************************
cd %~dp0
PowerShell ".\forcestop.ps1; exit $lastexitcode"
set ret=%ERRORLEVEL%
echo ret: %ret%
exit /b %ret%
Also, press the "Add" button to add the PowerShell script that will be called from "forcestop.bat".
Enter "forcestop.ps1" in "Script File Name".
It is not necessary to specify the "Browse File Path".
Press the "Save" button.
Select "forcestop.ps1", press the "Edit" button, and enter the contents of the script below. For the variable INSTANCES, specify the host name, instance ID, and region of each instance that constitutes the HA cluster. For the variables CHECK_LOOP_MAX and SLEEP_WAIT, set the maximum number of attempts to check for stop and the interval to check for stop, respectively.
#* forcestop.ps1
#***************************************
##########################################
# Configuration
# Instance information definition
# Format:
# "<Hostname>" = "<Instance ID> <Instance region>";
$INSTANCES = @{
"server1" = "i-xxxxxxxxxxxxxxxxx us-east-1";
"server2" = "i-yyyyyyyyyyyyyyyyy us-west-2";
}
# Maximum number of attempts to check for stop
$CHECK_LOOP_MAX = 50
# Interval to check for stop (seconds)
$SLEEP_WAIT = 3
##########################################
# Get the state of the specified instance
function get_node_status($id, $region) {
    aws ec2 describe-instances `
        --region $region `
        --instance-ids $id `
        --query 'Reservations[].Instances[].State.Name' `
        --output text
}
# Forced stop the specified instance
function stop_node($id, $region, $opt) {
    aws ec2 stop-instances `
        --region $region `
        --instance-ids $id `
        $opt
}
if ( $env:CLP_FORCESTOP_MODE -ne "1" ) {
    # Operation during periodic check
    # Determine instance ID and region from local server name
    $target = $env:CLP_SERVER_LOCAL
    if ( ! $INSTANCES.ContainsKey($target) ) {
        # Instance information definition corresponding to server name cannot be found
        exit 1
    }
    $info = $INSTANCES[$target].split()
    $instanceid = $info[0]
    $region = $info[1]
    # Forced stop test
    $result = stop_node $instanceid $region --dry-run 2>&1 | `
        Select-String "DryRunOperation"
    if ( $null -eq $result ) {
        # Forced stop processing test failure
        exit 2
    }
} else {
    # Operation when forced stop is performed
    # Determine instance ID and region from down server name
    $target = $env:CLP_SERVER_DOWN
    if ( ! $INSTANCES.ContainsKey($target) ) {
        # Instance information definition corresponding to server name cannot be found
        exit 4
    }
    $info = $INSTANCES[$target].split()
    $instanceid = $info[0]
    $region = $info[1]
    # Forcibly stop the down server
    stop_node $instanceid $region --force > $null
    if ( $lastexitcode -ne 0 ) {
        # Forced stop processing failure
        exit 5
    }
    # Stop check
    $loop_count = 0
    while ( $loop_count -lt $CHECK_LOOP_MAX ) {
        $status = get_node_status $instanceid $region
        if ( $lastexitcode -eq 0 -and $status -eq "stopped" ) {
            # Confirm stop
            break
        }
        Start-Sleep -Seconds $SLEEP_WAIT
        $loop_count++
    }
    if ( $loop_count -ge $CHECK_LOOP_MAX ) {
        # Stop check timeout occurred
        exit 6
    }
}
exit 0
Select "Administrator" for "Exec User" at the bottom of the "Script" tab.
When the settings are complete, press the "OK" button to save. After saving, press the "OK" button to complete the "Cluster Properties" settings.
2.3.2 Linux Version Script
Select "forcestop.sh" in the "Script" tab and press the "Edit" button.
Enter the contents of the script below.
For the variable AWS_CLI, set the path of the installed AWS CLI.
For the variable INSTANCES, specify the host name, instance ID, and region of each instance that constitutes the HA cluster.
For the variables CHECK_LOOP_MAX and SLEEP_WAIT, set the maximum number of attempts to check for stop and the interval to check for stop, respectively.
#***************************************
#* forcestop.sh
#***************************************
##########################################
# Configuration
# Absolute path for AWS CLI
AWS_CLI="/bin/aws"
# Instance information definition
# Format:
# [<Hostname>]="<Instance ID> <Instance region>"
declare -A INSTANCES=(
[server1]="i-xxxxxxxxxxxxxxxxx us-east-1"
[server2]="i-yyyyyyyyyyyyyyyyy us-west-2"
)
# Maximum number of attempts to check for stop
CHECK_LOOP_MAX=50
# Interval to check for stop (seconds)
SLEEP_WAIT=3
##########################################
# Get the state of the specified instance
get_node_status() {
    local id=$1
    local region=$2
    ${AWS_CLI} ec2 describe-instances \
        --region ${region} \
        --instance-ids ${id} \
        --query 'Reservations[].Instances[].State.Name' \
        --output text
}
# Forced stop the specified instance
stop_node() {
    local id=$1
    local region=$2
    local opt=$3
    ${AWS_CLI} ec2 stop-instances \
        --region ${region} \
        --instance-ids ${id} \
        ${opt}
}
if [ "${CLP_FORCESTOP_MODE}" != "1" ]; then
    # Operation during periodic check
    # Determine instance ID and region from local server name
    info=(${INSTANCES["${CLP_SERVER_LOCAL}"]})
    if [ ${#info[@]} -eq 0 ]; then
        # Instance information definition corresponding to server name cannot be found
        exit 1
    fi
    instanceid="${info[0]}"
    region="${info[1]}"
    # Forced stop test
    stop_node "${instanceid}" "${region}" --dry-run 2>&1 | \
        grep -q "DryRunOperation" >& /dev/null
    if [ $? -ne 0 ]; then
        # Forced stop processing test failure
        exit 2
    fi
else
    # Operation when forced stop is performed
    # Determine instance ID and region from down server name
    info=(${INSTANCES["${CLP_SERVER_DOWN}"]})
    if [ ${#info[@]} -eq 0 ]; then
        # Instance information definition corresponding to server name cannot be found
        exit 4
    fi
    instanceid="${info[0]}"
    region="${info[1]}"
    # Forcibly stop the down server
    stop_node "${instanceid}" "${region}" --force > /dev/null
    if [ $? -ne 0 ]; then
        # Forced stop processing failure
        exit 5
    fi
    # Stop check
    loop_count=0
    while [ ${loop_count} -lt ${CHECK_LOOP_MAX} ]
    do
        status=$(get_node_status "${instanceid}" "${region}")
        if [ $? -eq 0 -a "${status}" == "stopped" ]; then
            # Confirm stop
            break
        fi
        sleep ${SLEEP_WAIT}
        let loop_count++
    done
    if [ ${loop_count} -ge ${CHECK_LOOP_MAX} ]; then
        # Stop check timeout occurred
        exit 6
    fi
fi
exit 0
When the settings are complete, press the "OK" button to save. After saving, press the "OK" button to complete the "Cluster Properties" settings.
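When troubleshooting, it can help to translate the exit codes of the example scripts back into their meaning. The following sketch lists the codes used by both the Windows and Linux scripts in this article. The function name `describe_forcestop_exit` is hypothetical, and the wording of the messages is our own summary of the script comments.

```shell
#!/bin/bash
# Hypothetical helper: explain the exit codes returned by the example
# forced stop scripts in this article.
describe_forcestop_exit() {
    case "$1" in
        0) echo "success" ;;
        1) echo "periodic check: no INSTANCES entry for the local server" ;;
        2) echo "periodic check: stop-instances dry run failed" ;;
        4) echo "forced stop: no INSTANCES entry for the down server" ;;
        5) echo "forced stop: stop-instances command failed" ;;
        6) echo "forced stop: instance did not reach the stopped state in time" ;;
        *) echo "unknown exit code: $1" ;;
    esac
}

describe_forcestop_exit 2
```

For example, assuming the script has been saved as `forcestop.sh`, running it by hand with `CLP_FORCESTOP_MODE=0` and `CLP_SERVER_LOCAL=server1` set and then passing `$?` to this function shows whether the periodic check would succeed. The periodic check only performs a dry run, so no instance is stopped by it.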
3. Checking the Operation
After setting up the forced stop resource, we check the operation of the HA cluster by causing a network partition.
To cause a network partition, we set up network ACLs to block all communication between the VPCs of the servers in the HA cluster. For details on checking the operation, refer to "4. Checking the Operation at the Time of NP Resolution" in [2022 Edition] How to Prevent Both-System Activation on AWS (Windows/Linux). When reading that article, replace "communication between Availability Zones" with "communication between VPCs in different regions".
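One possible way to add such blocking rules from the command line is sketched below. It is an illustration under assumptions, not the procedure used in this article: the function name `print_block_commands`, the network ACL ID, the region, and the peer CIDR are all hypothetical placeholders. The sketch only prints the `aws ec2 create-network-acl-entry` commands so that they can be reviewed before anything is changed.

```shell
#!/bin/bash
# Hypothetical helper: print (do not execute) AWS CLI commands that add
# deny-all entries for a peer CIDR to a network ACL, in both directions.
print_block_commands() {
    local acl_id=$1
    local region=$2
    local peer_cidr=$3
    local rule_number=$4
    local direction
    for direction in --ingress --egress; do
        # "--protocol -1" means all protocols.
        echo "aws ec2 create-network-acl-entry" \
             "--region ${region}" \
             "--network-acl-id ${acl_id}" \
             "${direction}" \
             "--rule-number ${rule_number}" \
             "--protocol -1" \
             "--rule-action deny" \
             "--cidr-block ${peer_cidr}"
    done
}

# Placeholder values: replace them with those of your own environment.
print_block_commands "acl-xxxxxxxxxxxxxxxxx" "us-east-1" "10.1.0.0/16" 1
```

Network ACL rules are evaluated starting from the lowest rule number, so the deny entries need a lower number than the existing allow entries to take effect. Remove the entries after the test to restore communication.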
Conclusion
This time, we tried building an HA cluster using a custom forced stop resource in a multi-region environment on AWS. By using a custom forced stop resource, the standby instance can forcibly stop the active instance when a heartbeat timeout is detected, even in a multi-region configuration.
If you are considering introducing the configuration described in this article, you can validate it with the trial module of EXPRESSCLUSTER. Please do not hesitate to contact us if you have any questions.