
We Tried Forced Stop Resource on AWS Multi-Region Environment (Windows/Linux)

EXPRESSCLUSTER Official Blog

April 26th, 2024

Parts of this article were machine-translated. See the Japanese version for the original article.

Introduction

We tried building an HA cluster that uses a forced stop resource in a multi-region environment on Amazon Web Services (hereinafter called "AWS").

When a heartbeat timeout is detected between cluster servers, the forced stop resource forcibly stops the active server from the outside, for example by using a CLI. Using it together with network partition resolution (hereinafter called "NP resolution") resources prevents both-system activation (split-brain) more reliably.

Since EXPRESSCLUSTER X 5.0, the forced stop resource can be used in an AWS environment simply by specifying parameters. However, using the forced stop resource in an HA cluster that spans multiple regions requires a little ingenuity. This article introduces the procedure for setting up the forced stop resource in a multi-region environment.

This article supplements the previously published article "Introducing the Function of EXPRESSCLUSTER: Forced Stop Resource", so we recommend reading that article first.


1. Forced Stop Resource on a Multi-Region Environment

Several types of forced stop resource are available, depending on the environment in which the HA cluster is built.
In an AWS environment, a forced stop resource whose "Type" is "AWS" is normally used. When the standby instance detects a heartbeat timeout with the active instance, this resource runs an AWS CLI command on the standby instance to forcibly stop the active instance.

Forced stop in an AWS region

However, as of EXPRESSCLUSTER X 5.2, the forced stop resource assumes that the instances to be stopped are in the same region, so it cannot stop an instance that exists in a different region.

Forced stop in different AWS regions

Instead, use a forced stop resource whose "Type" is "Custom". A custom forced stop resource runs a registered script that can describe any processing, so by specifying in the script the region where the target instance exists, the target instance can be stopped even when it is in another region.
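To make the idea concrete, here is a minimal bash sketch of the core step such a script performs: look up the target host's instance ID and region, then pass that region explicitly to aws ec2 stop-instances. The instance IDs are placeholders, and the function name build_stop_command is hypothetical; it only prints the command instead of running it.

```shell
#!/bin/bash
# Hypothetical host -> "<Instance ID> <Instance region>" table (placeholder IDs)
declare -A INSTANCES=(
  [server1]="i-0123456789abcdef0 us-east-1"
  [server2]="i-0fedcba9876543210 us-west-2"
)

# Print (not run) the AWS CLI command that would forcibly stop the given host.
# Passing --region explicitly is what lets an instance in us-east-1 stop one in us-west-2.
build_stop_command() {
  local host=$1 id region
  if [ -z "${host}" ] || [ -z "${INSTANCES[$host]+set}" ]; then
    echo "no instance information for host: '${host}'" >&2
    return 1
  fi
  read -r id region <<< "${INSTANCES[$host]}"
  echo "aws ec2 stop-instances --region ${region} --instance-ids ${id} --force"
}

# Example: server1 preparing to stop server2 in another region
build_stop_command server2
```

Run on server1, this prints aws ec2 stop-instances --region us-west-2 --instance-ids i-0fedcba9876543210 --force. The scripts in section 2.3 follow the same lookup-then-stop pattern but actually execute the command.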

Custom forced stop in different AWS regions

2. Configuration Procedure for Forced Stop Resource

This section introduces the configuration procedure for a custom forced stop resource in a multi-region environment.
For more information on the forced stop resource, refer to the Reference Guide.

[Reference]
Documentation - Manuals
  • EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Reference Guide
      -> 7. Forced stop resource details
  • EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Reference Guide
      -> 7. Forced stop resource details

The custom forced stop resource script executes AWS CLI commands, so advance preparation, such as installing the AWS CLI on each instance in the HA cluster, is required. For more information on using the AWS CLI with EXPRESSCLUSTER, refer to the Getting Started Guide.
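One gap worth checking during this preparation: the periodic check in the scripts in section 2.3 only dry-runs a stop of the local instance (CLP_SERVER_LOCAL), while an actual forced stop targets the instance in the other region. The following is a minimal sketch, assuming bash on each server, placeholder instance IDs, and a hypothetical function name preflight_check. It dry-runs a stop of every instance in its own region, so that an IAM policy or AWS CLI setup problem affecting the remote region shows up before a failover does.

```shell
#!/bin/bash
# AWS CLI to use; "aws" on PATH is an assumption, set the absolute path as in forcestop.sh.
AWS_CLI=${AWS_CLI:-aws}

# Hypothetical host -> "<Instance ID> <Instance region>" table (placeholder IDs)
declare -A INSTANCES=(
  [server1]="i-0123456789abcdef0 us-east-1"
  [server2]="i-0fedcba9876543210 us-west-2"
)

# Dry-run a stop of every instance in its own region and report the result.
# EC2 answers an authorized dry run with a "DryRunOperation" error, so the
# output is searched for that string (as forcestop.sh does) instead of the exit code.
preflight_check() {
  local host id region rc=0
  for host in "${!INSTANCES[@]}"; do
    read -r id region <<< "${INSTANCES[$host]}"
    if "${AWS_CLI}" ec2 stop-instances --region "${region}" \
        --instance-ids "${id}" --dry-run 2>&1 | grep -q "DryRunOperation"; then
      echo "OK: ${host} (${id}, ${region})"
    else
      echo "NG: ${host} (${id}, ${region})" >&2
      rc=1
    fi
  done
  return "${rc}"
}

preflight_check || echo "Dry run failed for at least one instance" >&2
```

Running it on both server1 and server2 with the real instance IDs should print an OK line for each instance. Any NG line points to the AWS CLI installation, credentials, or IAM settings described in the Getting Started Guide sections listed below.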

[Reference]
Documentation - Manuals
  • EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Getting Started Guide
      -> 6. Notes and Restrictions
        -> 6.2 Before installing EXPRESSCLUSTER
          -> 6.2.16 Time synchronization in the AWS environment
          -> 6.2.17 IAM settings in the AWS environment
  • EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Getting Started Guide
      -> 6. Notes and Restrictions
        -> 6.3 Before installing EXPRESSCLUSTER
          -> 6.3.19 Time synchronization in the AWS environment
          -> 6.3.20 IAM settings in the AWS environment

2.1 Registering an Account (Windows Only)

This operation is not necessary on Linux, so proceed to the next section, "2.2 Configuring Fencing".
On Windows, you need to execute the forced stop resource script as the Administrator user. Therefore, register the Administrator user information in EXPRESSCLUSTER.

From the Cluster WebUI, open "Cluster Properties" and click the "Add" button on the "Account" tab.

Enter "Administrator" in "User Name" and the Administrator password in "Password".
Press the "OK" button.

Add Administrator user

2.2 Configuring Fencing

From the Cluster WebUI, open "Cluster Properties" and select the "Fencing" tab.
For NP resolution, select the PING or HTTP method and specify targets that both server1 and server2 can reach.
Select "Custom" from the "Type" pull-down for "Forced Stop" and press the "Properties" button.

Select "server1" on the "Server List" tab and press the "Add" button.

Configure server2 using the same procedure.

On the "Forced Stop" tab, set "Forced Stop Timeout", the maximum time to wait for a forced stop to complete, to a value that satisfies the following formula. This article uses 180 seconds as an example, which exceeds the 50 × 3 = 150 seconds given by the values in the example scripts.

Forced Stop Timeout Seconds > CHECK_LOOP_MAX × SLEEP_WAIT
  •   CHECK_LOOP_MAX: Maximum number of attempts to check for stop
  •   SLEEP_WAIT: Interval to check for stop (seconds)

CHECK_LOOP_MAX and SLEEP_WAIT are the same variables that are set in the script in "2.3 Editing the Script".
Leave enough margin to cover the overhead of running the AWS CLI commands.
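The formula can be checked with a little shell arithmetic. This sketch uses the example values from the scripts in section 2.3 (CHECK_LOOP_MAX=50, SLEEP_WAIT=3); the helper names min_wait and timeout_is_sufficient are hypothetical.

```shell
#!/bin/bash
# Example values from the stop-check loop in the forced stop scripts
CHECK_LOOP_MAX=50
SLEEP_WAIT=3

# Total sleep time of the stop-check loop: CHECK_LOOP_MAX x SLEEP_WAIT seconds
min_wait() {
  echo $(( $1 * $2 ))
}

# Succeed only if the Forced Stop Timeout is strictly greater than the loop's sleep time
timeout_is_sufficient() {
  local timeout=$1 loop_max=$2 sleep_wait=$3
  [ "${timeout}" -gt "$(min_wait "${loop_max}" "${sleep_wait}")" ]
}

echo "loop wait: $(min_wait "${CHECK_LOOP_MAX}" "${SLEEP_WAIT}") s"
if timeout_is_sufficient 180 "${CHECK_LOOP_MAX}" "${SLEEP_WAIT}"; then
  echo "180 s leaves $(( 180 - $(min_wait "${CHECK_LOOP_MAX}" "${SLEEP_WAIT}") )) s of margin"
fi
```

With these values it prints loop wait: 150 s and 180 s leaves 30 s of margin. That margin has to absorb one stop-instances call plus up to 50 describe-instances calls against the remote region. It may therefore be worth timing a describe-instances call from each server before settling on a value.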

Also, if you do not want a failover to occur when the forced stop fails, check "Disable Group Failover When Execution Fails".

The "Script" tab is configured in the next section.

2.3 Editing the Script

Next, describe the forced stop processing in the script executed by the custom forced stop resource.

The script used by the custom forced stop resource must implement two behaviors: the periodic check and the forced stop itself. Branch between them on the environment variable CLP_FORCESTOP_MODE: a value of 0 means a periodic check is being performed, and a value of 1 means a forced stop is being performed.
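Stripped to its skeleton, that branching looks like this in bash. This is a sketch only. The echo lines are placeholders standing in for the real processing, the function name forcestop_main is hypothetical, and the AWS calls and exit codes are omitted. The variable names are the ones the scripts in this article read. Like those scripts, it treats any value other than 1 as a periodic check.

```shell
#!/bin/bash
# Minimal skeleton of a custom forced stop script (real processing omitted).
# CLP_FORCESTOP_MODE, CLP_SERVER_LOCAL and CLP_SERVER_DOWN are read from the environment.
forcestop_main() {
  if [ "${CLP_FORCESTOP_MODE}" != "1" ]; then
    # Periodic check: work on the local server (placeholder action)
    echo "check ${CLP_SERVER_LOCAL}"
  else
    # Forced stop: work on the server that went down (placeholder action)
    echo "stop ${CLP_SERVER_DOWN}"
  fi
}

forcestop_main
```

For example, CLP_FORCESTOP_MODE=1 CLP_SERVER_DOWN=server2 forcestop_main prints stop server2.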

The following sections show example scripts for Windows and Linux.

2.3.1 Windows Version Script

Select "forcestop.bat" in the "Script" tab and press the "Edit" button.

Enter the contents of the script below.

rem ***************************************
rem *                 forcestop.bat
rem ***************************************

cd %~dp0
PowerShell ".\forcestop.ps1; exit $lastexitcode"
set ret=%ERRORLEVEL%
echo ret: %ret%
exit /b %ret%

Also, press the "Add" button to add the PowerShell script that will be called from "forcestop.bat".

Enter "forcestop.ps1" in "Script File Name".
It is not necessary to specify the "Browse File Path".
Press the "Save" button.

Adding forcestop.ps1 script

Select "forcestop.ps1", press the "Edit" button, and enter the contents of the script below. In the variable INSTANCES, specify the host name, instance ID, and region of each instance in the HA cluster. In the variables CHECK_LOOP_MAX and SLEEP_WAIT, set the maximum number of attempts to check for stop and the interval between checks (in seconds), respectively.

#***************************************
#*                 forcestop.ps1
#***************************************

##########################################
# Configuration

# Instance information definition
# Format:
#   "<Hostname>" = "<Instance ID> <Instance region>";
$INSTANCES = @{
    "server1" = "i-xxxxxxxxxxxxxxxxx us-east-1";
    "server2" = "i-yyyyyyyyyyyyyyyyy us-west-2";
}

# Maximum number of attempts to check for stop
$CHECK_LOOP_MAX = 50

# Interval to check for stop (seconds)
$SLEEP_WAIT = 3

##########################################

# Get the state of the specified instance
function get_node_status($id, $region) {
    aws ec2 describe-instances `
        --region $region `
        --instance-ids $id `
        --query 'Reservations[].Instances[].State.Name' `
        --output text
}

# Forcibly stop the specified instance
function stop_node($id, $region, $opt) {
    aws ec2 stop-instances `
        --region $region `
        --instance-ids $id `
        $opt
}

if ( $env:CLP_FORCESTOP_MODE -ne "1" ) {
    # Operation during periodic check
    # Determine instance ID and region from local server name
    $target = $env:CLP_SERVER_LOCAL
    if ( ! $INSTANCES.ContainsKey($target) ) {
        # Instance information definition corresponding to server name cannot be found
        exit 1
    }

    $info = $INSTANCES[$target].split()
    $instanceid = $info[0]
    $region = $info[1]

    # Forced stop test
    $result = stop_node $instanceid $region --dry-run 2>&1 | `
        Select-String "DryRunOperation"
    if ( $null -eq $result ) {
        # Forced stop processing test failure
        exit 2
    }

} else {
    # Operation when forced stop is performed
    # Determine instance ID and region from down server name
    $target = $env:CLP_SERVER_DOWN
    if ( ! $INSTANCES.ContainsKey($target) ) {
        # Instance information definition corresponding to server name cannot be found
        exit 4
    }

    $info = $INSTANCES[$target].split()
    $instanceid = $info[0]
    $region = $info[1]

    # Forcibly stop the down server
    stop_node $instanceid $region --force > $null
    if ( $lastexitcode -ne 0 ) {
        # Forced stop processing failure
        exit 5
    }

    # Stop check
    $loop_count = 0
    while ( $loop_count -lt $CHECK_LOOP_MAX ) {
        $status = get_node_status $instanceid $region
        if ( $lastexitcode -eq 0 -and $status -eq "stopped" ) {
            # Confirm stop
            break
        }

        Start-Sleep -Seconds $SLEEP_WAIT

        $loop_count++
    }

    if ( $loop_count -ge $CHECK_LOOP_MAX ) {
        # Stop check timeout occurred
        exit 6
    }
}

exit 0

Select "Administrator" for "Exec User" at the bottom of the "Script" tab.

When the settings are complete, press the "OK" button to save. After saving, press the "OK" button to complete the "Cluster Properties" settings.

2.3.2 Linux Version Script

Select "forcestop.sh" in the "Script" tab and press the "Edit" button.

Enter the contents of the script below.
In the variable AWS_CLI, set the absolute path of the installed AWS CLI.
In the variable INSTANCES, specify the host name, instance ID, and region of each instance in the HA cluster.
In the variables CHECK_LOOP_MAX and SLEEP_WAIT, set the maximum number of attempts to check for stop and the interval between checks (in seconds), respectively.

#! /bin/bash
#***************************************
#*                  forcestop.sh
#***************************************

##########################################
# Configuration

# Absolute path for AWS CLI
AWS_CLI="/bin/aws"

# Instance information definition
# Format:
#   [<Hostname>]="<Instance ID> <Instance region>"
declare -A INSTANCES=(
  [server1]="i-xxxxxxxxxxxxxxxxx us-east-1"
  [server2]="i-yyyyyyyyyyyyyyyyy us-west-2"
)

# Maximum number of attempts to check for stop
CHECK_LOOP_MAX=50

# Interval to check for stop (seconds)
SLEEP_WAIT=3

##########################################

# Get the state of the specified instance
get_node_status() {
  local id=$1
  local region=$2
  ${AWS_CLI} ec2 describe-instances \
    --region ${region} \
    --instance-ids ${id} \
    --query 'Reservations[].Instances[].State.Name' \
    --output text
}

# Forcibly stop the specified instance
stop_node() {
  local id=$1
  local region=$2
  local opt=$3
  ${AWS_CLI} ec2 stop-instances \
    --region ${region} \
    --instance-ids ${id} \
    ${opt}
}

if [ "${CLP_FORCESTOP_MODE}" != "1" ]; then
  # Operation during periodic check
  # Determine instance ID and region from local server name
  info=(${INSTANCES["${CLP_SERVER_LOCAL}"]})
  if [ ${#info[@]} -eq 0 ]; then
    # Instance information definition corresponding to server name cannot be found
    exit 1
  fi
  instanceid="${info[0]}"
  region="${info[1]}"

  # Forced stop test
  stop_node "${instanceid}" "${region}" --dry-run 2>&1 | \
    grep -q "DryRunOperation" >& /dev/null
  if [ $? -ne 0 ]; then
    # Forced stop processing test failure
    exit 2
  fi

else
  # Operation when forced stop is performed
  # Determine instance ID and region from down server name
  info=(${INSTANCES["${CLP_SERVER_DOWN}"]})
  if [ ${#info[@]} -eq 0 ]; then
    # Instance information definition corresponding to server name cannot be found
    exit 4
  fi
  instanceid="${info[0]}"
  region="${info[1]}"

  # Forcibly stop the down server
  stop_node "${instanceid}" "${region}" --force > /dev/null
  if [ $? -ne 0 ]; then
    # Forced stop processing failure
    exit 5
  fi

  # Stop check
  loop_count=0
  while [ ${loop_count} -lt ${CHECK_LOOP_MAX} ]
  do
    status=$(get_node_status "${instanceid}" "${region}")
    if [ $? -eq 0 -a "${status}" == "stopped" ]; then
      # Confirm stop
      break
    fi

    sleep ${SLEEP_WAIT}
    let loop_count++
  done

  if [ ${loop_count} -ge ${CHECK_LOOP_MAX} ]; then
    # Stop check timeout occurred
    exit 6
  fi
fi

exit 0

When the settings are complete, press the "OK" button to save. After saving, press the "OK" button to complete the "Cluster Properties" settings.
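forcestop.sh reports a missing host only as exit code 1 or 4, and a mistyped instance ID or region surfaces only when the AWS CLI call fails. As an optional extra step, the INSTANCES table can be checked on its own. The sketch below assumes the usual formats: "i-" followed by 8 or 17 hexadecimal digits, and region names such as us-east-1. The function names valid_entry and validate_instances are hypothetical. It checks the format only, not whether the instances actually exist.

```shell
#!/bin/bash
# Instance information table in the same format as forcestop.sh (placeholder values)
declare -A INSTANCES=(
  [server1]="i-xxxxxxxxxxxxxxxxx us-east-1"
  [server2]="i-yyyyyyyyyyyyyyyyy us-west-2"
)

# Check one "<Instance ID> <Instance region>" value against the assumed formats.
valid_entry() {
  local id region extra
  local id_re='^i-([0-9a-f]{8}|[0-9a-f]{17})$'
  local region_re='^[a-z]{2}(-[a-z]+)+-[0-9]+$'
  read -r id region extra <<< "$1"
  [ -z "${extra}" ] || return 1
  [[ ${id} =~ ${id_re} ]] || return 1
  [[ ${region} =~ ${region_re} ]]
}

# Report every host whose entry does not match; return 1 if any entry fails.
validate_instances() {
  local host rc=0
  for host in "${!INSTANCES[@]}"; do
    if ! valid_entry "${INSTANCES[$host]}"; then
      echo "invalid entry for ${host}: ${INSTANCES[$host]}" >&2
      rc=1
    fi
  done
  return "${rc}"
}

validate_instances || echo "Fix INSTANCES before using the script" >&2
```

With the placeholder values shown, both entries are reported, which is a useful reminder that i-xxxxxxxxxxxxxxxxx and i-yyyyyyyyyyyyyyyyy still need to be replaced with real instance IDs.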

3. Checking the Operation

After setting up the forced stop resource, check the operation of the HA cluster by creating a network partition.

To create the network partition, we configured Network ACLs to block all communication between the VPCs of the HA cluster servers. For details on checking the operation, refer to "4. Checking the Operation at the Time of NP Resolution" in [2022 Edition] How to Prevent Both-System Activation on AWS (Windows/Linux). When reading that section, substitute "communication between VPCs in different regions" for "communication between Availability Zones".
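If you prefer the AWS CLI to the console for creating the partition, the deny rules could be added roughly as sketched below. The network ACL IDs, CIDR blocks, and rule number 90 are hypothetical. Network ACL rules are evaluated from the lowest rule number upward, so the deny rule needs a lower number than any rule that currently allows the traffic. The function nacl_commands (also hypothetical) only prints the commands so they can be reviewed before running. Only the peer VPC's CIDR is denied, so NP resolution targets and the EC2 API endpoint should stay reachable as long as they are not reached through the peer VPC.

```shell
#!/bin/bash
# Hypothetical rule number; must be lower than the rules that allow the traffic
RULE_NUMBER=90

# Print the AWS CLI commands that deny all traffic to and from a peer VPC CIDR
# on one network ACL. Pass "delete" as the 4th argument to print the cleanup commands.
nacl_commands() {
  local region=$1 acl_id=$2 peer_cidr=$3 action=${4:-create}
  local direction
  for direction in --ingress --egress; do
    if [ "${action}" = "delete" ]; then
      echo "aws ec2 delete-network-acl-entry --region ${region}" \
           "--network-acl-id ${acl_id} --rule-number ${RULE_NUMBER} ${direction}"
    else
      echo "aws ec2 create-network-acl-entry --region ${region}" \
           "--network-acl-id ${acl_id} --rule-number ${RULE_NUMBER} ${direction}" \
           "--protocol -1 --rule-action deny --cidr-block ${peer_cidr}"
    fi
  done
}

# server1's VPC (us-east-1) blocks server2's VPC, and vice versa (placeholder IDs and CIDRs)
nacl_commands us-east-1 acl-0aaaaaaaaaaaaaaaa 10.2.0.0/16
nacl_commands us-west-2 acl-0bbbbbbbbbbbbbbbb 10.1.0.0/16
```

After the test, printing the same calls with delete as the fourth argument gives the commands that remove the deny rules again.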

Conclusion

In this article, we built an HA cluster using a custom forced stop resource in a multi-region AWS environment. With a custom forced stop resource, the standby instance can forcibly stop the active instance when a heartbeat timeout is detected, even when the two instances are in different regions.

If you are considering the configuration described in this article, you can validate it with the trial module of EXPRESSCLUSTER. Please do not hesitate to contact us if you have any questions.