
Introducing the Function of EXPRESSCLUSTER: Forced Stop Resource

EXPRESSCLUSTER Official Blog

March 17th, 2023

Machine translation is used partially for this article. See the Japanese version for the original article.

Introduction

As a new function of EXPRESSCLUSTER X 5.0, forced stop resources for Amazon Web Services (hereinafter called "AWS") and Oracle Cloud Infrastructure (hereinafter called "OCI") have been added.

By configuring a forced stop resource, you can easily force a downed server to stop from the outside when the remaining server (normal server) in the cluster detects, through heartbeat timeout, that the server is down.
In EXPRESSCLUSTER X 4.3 and earlier, forcibly stopping a server in an AWS or OCI environment required preparing your own script. With the addition of the forced stop resource, this can now be configured easily through the GUI.

In this article, we introduce the forced stop resource and how to set it up.


1. What is a Forced Stop Resource?

By configuring a forced stop resource, you can forcibly stop a downed server when the remaining server (normal server) in the cluster detects, through heartbeat timeout, that the server is down. The forced stop resource to use depends on the type of environment in which the cluster is built (physical machine, virtual machine, or cloud), so configure the forced stop resource that matches your environment.

In EXPRESSCLUSTER X 4.3 and earlier, there were two mechanisms: the forced stop function, usable in physical and virtual environments, and the force-stop script, for environments that the forced stop function did not support.
In EXPRESSCLUSTER X 5.0, the forced stop function and the force-stop script have been consolidated into the forced stop resource.
EXPRESSCLUSTER X 5.0 also adds forced stop resources for AWS and OCI environments.

Note: Scripts are written differently for the forced stop resource of EXPRESSCLUSTER X 5.0 than for the force-stop script of EXPRESSCLUSTER X 4.3 and earlier. Refer to the following for an example of force-stop script settings for EXPRESSCLUSTER X 4.3 and earlier.

The forced stop resource performs two types of operations: "Performing a forced stop" and "Periodically checking if the target can be forcibly stopped". The details and execution timing of each operation are as follows.

Performing a forced stop:

  • Uses the functions of a device or infrastructure system that manages server status to forcibly stop a downed server.
  • Executed when a server down caused by heartbeat timeout is detected and a failover group that was running on the downed server is started on another server. The forced stop resource is not executed if the server is stopped normally (e.g., from Cluster WebUI), or if no failover group was running on the downed server and therefore no failover occurs.
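The trigger conditions above can be summarized as a small decision function. The following is an illustrative sketch only, not how EXPRESSCLUSTER implements it internally; the function name "should_force_stop" and its argument values are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: models the trigger conditions described above.
# should_force_stop is a hypothetical helper, not part of EXPRESSCLUSTER.
#   $1: how the server was detected as down: "heartbeat_timeout" or "normal_stop"
#   $2: whether a failover group that ran on the downed server is being
#       started on another server: "yes" or "no"
# Returns 0 if a forced stop would be executed, 1 otherwise.
should_force_stop() {
  down_reason=$1
  group_failover=$2

  # A normal stop (e.g., from Cluster WebUI) never triggers a forced stop
  if [ "${down_reason}" != "heartbeat_timeout" ]; then
    return 1
  fi

  # Without a failover of a group from the downed server, nothing is executed
  if [ "${group_failover}" != "yes" ]; then
    return 1
  fi

  return 0
}
```

For example, "should_force_stop heartbeat_timeout yes" succeeds, while "should_force_stop normal_stop yes" does not.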

Periodically checking if the target can be forcibly stopped:

  • Checks whether a forced stop can be performed by communicating with the device or infrastructure system used to forcibly stop a server. Based on the result, the forced stop resource shows its status as "Normal" (forced stop can be executed) or "Error" (forced stop is not feasible).
  • Performed periodically while the cluster service is running.

1.1 Need for Forced Stop Resource

In EXPRESSCLUSTER X, network partition resolution (hereinafter called "NP resolution") can be configured to prevent both-system activation. Please refer to the following article for details on NP resolution.

[Reference]
Introduction to HA Clusters -Glossary 3-
-> Network Partitions
-> Network Partitions Resolution

However, even with NP resolution configured, both-system activation may occur if failover proceeds as usual in the following failure cases:

  • The server was recognized as down because its OS stalled (hung up).
  • Heartbeat communication between the servers is lost, but NP resolution can still be performed.

To prevent both-system activation even in such failures, the forced stop resource is used to stop the server that was determined to be down.

2. Configuration Procedure for Forced Stop Resource

This section introduces the configuration procedures for "AWS Forced Stop", "OCI Forced Stop", and "Custom Forced Stop" among the forced stop resources available in EXPRESSCLUSTER X 5.0.
The AWS and OCI forced stop resources can be configured simply by entering information about the instances to be stopped.
The custom forced stop resource lets you use a script, but the script must be written differently from the force-stop script of EXPRESSCLUSTER X 4.3 and earlier, so we also introduce an example script.

For more information on the forced stop resource, refer to the Reference Guide.

[Reference]
Documentation - Manuals
  • EXPRESSCLUSTER X 5.0 > EXPRESSCLUSTER X 5.0 for Windows > Reference Guide
  • -> 7. Forced stop resource details
  • EXPRESSCLUSTER X 5.0 > EXPRESSCLUSTER X 5.0 for Linux > Reference Guide
  • -> 7. Forced stop resource details

2.1 Configuration Procedure for AWS

When using the AWS forced stop resource, set the instance ID of each instance in the HA cluster. Pre-settings, such as installing the AWS CLI on each cluster instance, are also required. For details, refer to the Reference Guide listed in "2. Configuration Procedure for Forced Stop Resource".
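Before entering the IDs in Cluster WebUI, it can help to check them on each instance. The following is a minimal sketch, assuming the AWS CLI is installed and configured as the pre-settings require; the script and its function names are hypothetical helpers, not part of EXPRESSCLUSTER. It checks the ID format ("i-" followed by 8 or 17 hexadecimal characters) and, when IDs are passed as arguments, asks the AWS CLI for each instance's current state.

```shell
#!/bin/sh
# Hypothetical pre-check helper for the AWS forced stop resource settings.
# Not part of EXPRESSCLUSTER; assumes the AWS CLI is installed and configured.

# Return 0 if $1 looks like an EC2 instance ID:
# "i-" followed by 8 (older format) or 17 hexadecimal characters.
is_valid_aws_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Print the current state (e.g., "running") of the instance whose ID is $1.
show_aws_instance_state() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].State.Name' \
    --output text
}

# Run the checks only for instance IDs passed as arguments.
for id in "$@"; do
  if ! is_valid_aws_instance_id "${id}"; then
    echo "invalid instance ID format: ${id}" >&2
    exit 1
  fi
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI (aws) not found in PATH" >&2
    exit 1
  fi
  show_aws_instance_state "${id}" || exit 1
done
```

Run it with the real instance IDs of "server1" and "server2" as arguments; an error from the AWS CLI at this stage usually points to a missing pre-setting such as credentials or region.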

Start Cluster WebUI and switch to [Config mode]. Configure the forced stop resource in [Cluster Properties].
Select the [Fencing] tab, select "AWS" from the [Type] pull-down under [Forced Stop], and click [Properties].

Note: See also the following for an example of how the AWS NP resolution resource and the forced stop resource configured on the [Fencing] tab behave.


On the [Server List] tab, select "server1" and click [Add].


For [Instance ID], specify the instance ID of "server1" (e.g. i-xxxxxxxxxxxxxxxxx).


Add "server2" and set its instance ID in the same way. Then click [OK] to complete the configuration.

2.2 Configuration Procedure for OCI

When using the OCI forced stop resource, set the instance ID of each instance in the HA cluster. Pre-settings, such as installing the OCI CLI on each cluster instance, are also required. For details, refer to the Reference Guide listed in "2. Configuration Procedure for Forced Stop Resource".
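A similar check can be done for OCI. This is a minimal sketch, assuming the OCI CLI is installed and configured on each instance; the helper names are hypothetical and not part of EXPRESSCLUSTER. The format check only confirms that the value looks like a compute instance OCID (it starts with "ocid1.instance." and has further dot-separated fields), and the state query uses the OCI CLI's standard "--query" and "--raw-output" options.

```shell
#!/bin/sh
# Hypothetical pre-check helper for the OCI forced stop resource settings.
# Not part of EXPRESSCLUSTER; assumes the OCI CLI is installed and configured.

# Return 0 if $1 looks like an OCI compute instance OCID, i.e. it starts
# with "ocid1.instance." followed by at least two more dot-separated fields.
is_valid_oci_instance_ocid() {
  case "$1" in
    ocid1.instance.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the lifecycle state (e.g., "RUNNING") of the instance whose OCID is $1.
show_oci_instance_state() {
  oci compute instance get \
    --instance-id "$1" \
    --query 'data."lifecycle-state"' \
    --raw-output
}

# Run the checks only for OCIDs passed as arguments.
for ocid in "$@"; do
  if ! is_valid_oci_instance_ocid "${ocid}"; then
    echo "not an instance OCID: ${ocid}" >&2
    exit 1
  fi
  if ! command -v oci >/dev/null 2>&1; then
    echo "OCI CLI (oci) not found in PATH" >&2
    exit 1
  fi
  show_oci_instance_state "${ocid}" || exit 1
done
```

The pattern check is deliberately loose; the "oci compute instance get" call is what actually confirms that the OCID exists and is reachable with the configured credentials.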

Start Cluster WebUI and switch to [Config mode]. Configure the forced stop resource in [Cluster Properties].
Select the [Fencing] tab, select "OCI" from the [Type] pull-down under [Forced Stop], and click [Properties].


On the [Server List] tab, select "server1" and click [Add].


For [Instance ID], specify the instance ID of "server1" (e.g. ocid1.instance.oc1.us-ashburn-1.xxxx).


Add "server2" and set its instance ID in the same way. Then click [OK] to complete the configuration.

2.3 Configuration Procedure for Using a Script

If you want to execute a forced stop in an environment that the other forced stop resource types do not support, configure "Custom Forced Stop".
The custom forced stop resource is executed at different timings from the conventional force-stop script, so force-stop scripts created for EXPRESSCLUSTER X 4.3 and earlier generally cannot be used as-is.

The following describes the configuration procedure and key points for "Custom Forced Stop".

Start Cluster WebUI and switch to [Config mode]. Configure the forced stop resource in [Cluster Properties].
Select the [Fencing] tab, select "Custom" from the [Type] pull-down under [Forced Stop], and click [Properties].


On the [Server List] tab, select "server1" and click [Add].


Add "server2" in the same way.

On the [Forced Stop] tab, set [Forced Stop Timeout] to the maximum time to wait for the forced stop to complete. If you do not want failover to occur when the forced stop fails, select the [Disable Group Failover When Execution Fails] check box.
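If the script itself waits for the server to stop (as the example script later in this section does), it seems prudent to make sure its worst-case run time stays below [Forced Stop Timeout], assuming the timeout covers the whole script run. The following back-of-the-envelope helper is a hypothetical sketch; the function name and all input values are illustrative, not EXPRESSCLUSTER settings or defaults.

```shell
#!/bin/sh
# Hypothetical helper: check that a custom forced stop script's worst-case
# run time fits within the [Forced Stop Timeout] value (all values in seconds).
#   $1: worst-case time of the stop request command
#   $2: maximum number of stop-check attempts (e.g., CHECK_LOOP_MAX)
#   $3: seconds per check attempt (sleep interval plus check command time)
#   $4: [Forced Stop Timeout] value
# Prints the worst-case time; returns 0 if it is below the timeout, 1 otherwise.
fits_forced_stop_timeout() {
  worst=$(( $1 + $2 * $3 ))
  echo "${worst}"
  [ "${worst}" -lt "$4" ]
}
```

For example, "fits_forced_stop_timeout 10 240 1 300" prints 250 and succeeds, whereas the same script against a 200-second timeout would not fit.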


The script executed by the custom forced stop resource can either be saved in EXPRESSCLUSTER or placed anywhere on the server.

In this article, we save the script in EXPRESSCLUSTER.
To use a script placed elsewhere on the server instead, select "User Application" and specify the path of the script.

On the [Script] tab, select "forcestop.sh" and click [Edit].


A script used for the custom forced stop resource must implement both the "Performing a forced stop" operation and the "Periodically checking if the target can be forcibly stopped" operation. Use the environment variable [CLP_FORCESTOP_MODE] to branch between the two.
The value of [CLP_FORCESTOP_MODE] is 0 when the custom forced stop resource performs "Performing a forced stop", and 1 when it performs "Periodically checking if the target can be forcibly stopped".
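The branch can also be written with "case", which makes it easy to reject unexpected values. This is a minimal sketch; "do_forced_stop", "check_stoppable", and "dispatch_forcestop" are hypothetical placeholder names whose bodies you would replace with commands for your infrastructure. The final guard runs the dispatch only when [CLP_FORCESTOP_MODE] is set, which lets the functions be loaded and tried by hand.

```shell
#!/bin/sh
# Minimal sketch of the CLP_FORCESTOP_MODE branch in a custom forced stop script.
# do_forced_stop and check_stoppable are hypothetical placeholders.

do_forced_stop() {
  # Placeholder: request the infrastructure to stop server "$1"
  # and wait until it is confirmed to be stopped
  return 0
}

check_stoppable() {
  # Placeholder: confirm that the infrastructure system can be reached
  return 0
}

# Dispatch on the mode passed as $1 (normally "${CLP_FORCESTOP_MODE}").
dispatch_forcestop() {
  case "$1" in
    0) do_forced_stop "${CLP_SERVER_DOWN}" ;;  # "Performing a forced stop"
    1) check_stoppable ;;                      # "Periodically checking ..."
    *) echo "unexpected CLP_FORCESTOP_MODE: $1" >&2; return 1 ;;
  esac
}

# Dispatch only when the mode variable is set (i.e., when run as the
# forced stop script), so the functions above can also be tested by hand.
if [ -n "${CLP_FORCESTOP_MODE+set}" ]; then
  dispatch_forcestop "${CLP_FORCESTOP_MODE}"
  exit $?
fi
```

The exit status of the selected branch becomes the script's exit status, so a failing stop or check is reported without extra bookkeeping.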

The following is an example script for the Linux version.
The example uses the fictitious commands "ex-check-node" and "ex-stop-node", which are assumed to work as follows.

  • ex-check-node: Queries the infrastructure system for the status of a server.
  • ex-stop-node: Requests the infrastructure system to stop a server.

#!/bin/sh
#***************************************
#             forcestop.sh
#***************************************

# Maximum number of stop check attempts (one attempt per second)
CHECK_LOOP_MAX=240

if [ "${CLP_FORCESTOP_MODE}" = "0" ]; then
  # Operation during "Performing a forced stop"
  # Forcibly stop the downed server
  ex-stop-node --force --node "${CLP_SERVER_DOWN}"
  ret=$?
  if [ ${ret} -ne 0 ]; then
    # Failure of forced stop processing
    exit 2
  fi

  # Stop check: wait until the downed server is confirmed to be stopped
  CHECK_LOOP_COUNT=0
  while [ ${CHECK_LOOP_COUNT} -lt ${CHECK_LOOP_MAX} ]
  do
    ex-check-node --node "${CLP_SERVER_DOWN}"
    ret=$?
    if [ ${ret} -eq 0 ]; then
      # Stop confirmed
      break
    fi

    sleep 1
    # POSIX arithmetic ("let" is not available in every /bin/sh)
    CHECK_LOOP_COUNT=$((CHECK_LOOP_COUNT + 1))
  done

  if [ ${ret} -ne 0 ]; then
    # Stop check timed out
    exit 3
  fi
else
  # Operation during "Periodically checking if the target can be forcibly stopped"
  # Check whether a forced stop can be executed (command execution check)
  ex-check-node
  ret=$?
  if [ ${ret} -ne 0 ]; then
    # Command execution check failed
    exit 1
  fi
fi

exit 0

After writing the script, click [OK] to save it. Then click [OK] again to complete the settings.
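Because "ex-check-node" and "ex-stop-node" are fictitious, the example script cannot be run as-is. One way to try its branches and exit codes outside a real environment is to put stub commands with those names on PATH. The following is a minimal sketch under that assumption; "make_stub", "run_with_stubs", "STUB_CHECK_RC", and "STUB_STOP_RC" are hypothetical names used only for this test harness, and "mktemp -d" is assumed to be available (it is on common Linux distributions).

```shell
#!/bin/sh
# Hypothetical test harness for a custom forced stop script. It places stub
# versions of the fictitious commands "ex-check-node" and "ex-stop-node" on
# PATH, so the script's branches and exit codes can be tried locally.

# Create an executable stub named $2 in directory $1. The stub logs its
# arguments to stderr and exits with the value of the variable named $3
# (0 if that variable is unset or empty).
make_stub() {
  cat > "$1/$2" <<EOF
#!/bin/sh
echo "stub: $2 \$*" >&2
exit "\${$3:-0}"
EOF
  chmod +x "$1/$2"
}

# Run the script $1 with CLP_FORCESTOP_MODE=$2 and CLP_SERVER_DOWN=$3.
# STUB_CHECK_RC / STUB_STOP_RC choose the stubs' exit codes.
# Returns the script's exit code.
run_with_stubs() {
  stub_dir=$(mktemp -d) || return 1
  make_stub "${stub_dir}" ex-check-node STUB_CHECK_RC
  make_stub "${stub_dir}" ex-stop-node STUB_STOP_RC

  rc=0
  PATH="${stub_dir}:${PATH}" \
    STUB_CHECK_RC="${STUB_CHECK_RC:-0}" STUB_STOP_RC="${STUB_STOP_RC:-0}" \
    CLP_FORCESTOP_MODE="$2" CLP_SERVER_DOWN="$3" \
    sh "$1" || rc=$?

  rm -rf "${stub_dir}"
  return "${rc}"
}
```

For example, set "STUB_STOP_RC=1" and run "run_with_stubs ./forcestop.sh 0 server2" to exercise the path where the stop request fails, then compare the exit code with the value you expect. Keep in mind that a failing check stub makes the stop check loop in the example script retry for up to CHECK_LOOP_MAX seconds.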

Conclusion

In this article, we introduced the forced stop resource and how to set it up.
In EXPRESSCLUSTER X 5.0, the conventional forced stop function and force-stop script have been consolidated into the forced stop resource, and dedicated forced stop resources have been added for AWS and OCI environments, making forced stop much easier to configure in those environments.

If you are considering the configuration described in this article, you can validate it with the trial module of EXPRESSCLUSTER. Please do not hesitate to contact us if you have any questions.