[2024 Edition] Introduction of the Service Startup Delay Time Setting Feature

November 20th, 2024

Machine translation is used partially for this article. See the Japanese version for the original article.

Introduction

This article introduces the Service Startup Delay Time setting of EXPRESSCLUSTER X and explains how to calculate an appropriate delay time.
It is a revised edition (2024 edition) of a previous article, which did not cover all the considerations that apply to the Service Startup Delay Time.
Regardless of the version of EXPRESSCLUSTER X you use, please refer to this article instead of the previous one.

1. Reasons for the Necessity of a Startup Delay
1.1 Not Recognizing the Restart of the Peer Server's OS
1.2 Both-System Activation Occurs
1.3 Forced Stop Execution During Cluster Startup Process
1.4 Resource Activation Before the Shared Disk is Fully Powered On
2. Calculating and Setting the Service Startup Delay Time
3. Checking the Operation
3.1 Windows
3.2 Linux

1. Reasons for the Necessity of a Startup Delay

If the startup delay time for the services of EXPRESSCLUSTER X is not appropriately set, the following issues may arise:

Failure to recognize the OS reboot of the peer server

Both-system activation occurring

Forced Stop execution during cluster startup processing

Resources becoming active before the shared disk has completed startup

In EXPRESSCLUSTER X 4.3 and earlier, adjusting the time until the EXPRESSCLUSTER X services start required delaying the OS startup or, on Windows, using the armdelay command. However, delaying the OS startup has the drawback of also delaying services that are not clustered by EXPRESSCLUSTER X.

EXPRESSCLUSTER X 5.0 and later provide a setting that delays the startup of the EXPRESSCLUSTER X services after each server is powered on. This makes it possible to delay only the EXPRESSCLUSTER X services without affecting the OS startup time.

The appropriate Service Startup Delay Time varies depending on the cluster configuration. In Section 1.1 and subsequent sections, we will explain the necessity of this setting and provide the methods for calculating the required delay time.

If you are using a version prior to EXPRESSCLUSTER X 5.0, you will need to adjust the OS startup time.
Please interpret the Service Startup Delay Time settings in the following explanations as adjustments to the OS startup time.

For more details on the Service Startup Delay Time settings and OS startup time adjustments, please refer to the Installation & Configuration Guide.

[Reference]

Documentation - Manuals

EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Installation & Configuration Guide

-> 2. Determining a system configuration

-> 2.6.3 Adjustment of time for EXPRESSCLUSTER services to start up (Required)

EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Installation & Configuration Guide

-> 2. Determining a system configuration

-> 2.8.5 Adjustment of time for EXPRESSCLUSTER services to start up (Required)

1.1 Not Recognizing the Restart of the Peer Server's OS

Suppose you restart a server in order to trigger a failover. If the cluster service on the restarted server finishes starting within the combined time of the Heartbeat Timeout and the Heartbeat Interval, the peer server sees the heartbeat as uninterrupted, does not recognize the restart, and no failover occurs.

[Figure: No failover occurs because the heartbeat timeout is not detected]

To avoid this issue, it is necessary to delay the startup time of EXPRESSCLUSTER X services so that it exceeds the combined time of the Heartbeat Timeout and Heartbeat Interval.
Therefore, please add the following time to the Service Startup Delay Time:

Heartbeat Timeout + Heartbeat Interval
* Default value for Heartbeat Timeout: 30 seconds (Windows), 90 seconds (Linux)
* Default value for Heartbeat Interval: 3 seconds

[Figure: The heartbeat times out and failover occurs]
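
As a worked illustration of the value above, the following minimal Python sketch adds the Heartbeat Timeout and the Heartbeat Interval using the default values quoted in this section. The function name and the dictionary are hypothetical and are not part of EXPRESSCLUSTER X; substitute the values actually configured in your cluster.

```python
# Default values quoted in Section 1.1 (seconds).
DEFAULT_HEARTBEAT_TIMEOUT = {"windows": 30, "linux": 90}
DEFAULT_HEARTBEAT_INTERVAL = 3


def heartbeat_term(timeout_s: int, interval_s: int = DEFAULT_HEARTBEAT_INTERVAL) -> int:
    """Return Heartbeat Timeout + Heartbeat Interval in seconds."""
    return timeout_s + interval_s


for os_name, timeout_s in DEFAULT_HEARTBEAT_TIMEOUT.items():
    print(os_name, heartbeat_term(timeout_s))
# windows 33
# linux 93
```

With the defaults, the heartbeat portion of the delay is therefore 33 seconds on Windows and 93 seconds on Linux.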

1.2 Both-System Activation Occurs

When the standby node in EXPRESSCLUSTER X detects a heartbeat loss from the active node, it executes Network Partition Resolution (hereinafter called NP resolution). For more information on NP resolution, please refer to the following article.

[Reference]

Introduction to HA Clusters -Glossary 3-

While NP resolution usually completes in a short time, NP resolution using the DISK method may take tens of seconds to complete.

If a failure causes the active node to restart, the standby node detects a heartbeat timeout and executes NP resolution. With the DISK method, the EXPRESSCLUSTER X services on the restarted active node may start before NP resolution completes. In that case the failover group starts on both servers, which results in both-system activation.

To avoid this issue, it is necessary to delay the startup time of EXPRESSCLUSTER X services on the active node until the NP resolution using the DISK method is complete.
Therefore, please add the following time to the Service Startup Delay Time:

DISK NP resolution resource IO Wait Time + 10 seconds
* Default value for DISK NP resolution resource IO Wait Time: 80 seconds

Please note that the DISK NP resolution resource is a feature available only for the Windows version and does not need to be considered for the Linux version. For more details on the DISK NP resolution resource, please refer to the following Reference Guide.

[Reference]

Documentation - Manuals

EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Reference Guide

-> 6. Details on network partition resolution resources

-> 6.2 Understanding network partition resolution by DISK method

1.3 Forced Stop Execution During Cluster Startup Process

EXPRESSCLUSTER X 5.0 or later provides the Forced stop resource by default to prevent both-system activation. For details about the Forced stop resource, please refer to the following article.

[Reference]

Introducing the Function of EXPRESSCLUSTER: Forced Stop Resource

When the Forced stop resource is used and the conditions for a forced stop are met, the standby node forcibly stops the active node before failover takes place, which prevents both-system activation. Because the forced stop relies on functions of the physical server, the virtual platform, or the cloud platform, stopping the active node may take some time depending on the situation.

If a failure causes the active node to restart while the standby node is executing the forced stop, the forced stop may hit the active node while its failover group is in the middle of starting, which can cause a further failure.

[Figure: Forced stop occurs while the failover group is starting]

To avoid this issue, it is necessary to delay the startup time of EXPRESSCLUSTER X services on the active node until the Forced stop resource completes the forced stop.
Therefore, please add the following time to the Service Startup Delay Time:

Forced Stop Timeout + Time to Wait for Stop to Be Completed
* When using a custom forced stop resource, calculate the Time to Wait for Stop to Be Completed as 0 seconds.
* The default values for the Forced Stop Timeout and the Time to Wait for Stop to Be Completed vary depending on the type of Forced stop resource.
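
The rule above, including the special case for a custom forced stop resource, can be sketched in Python as follows. The function name is hypothetical, and the numeric values are placeholders rather than product defaults, because the defaults depend on the type of Forced stop resource.

```python
def forced_stop_term(timeout_s: int, wait_for_stop_s: int, custom: bool = False) -> int:
    """Return Forced Stop Timeout + Time to Wait for Stop to Be Completed.

    For a custom forced stop resource, the wait time is counted as 0 seconds.
    """
    if custom:
        wait_for_stop_s = 0
    return timeout_s + wait_for_stop_s


# Placeholder values; check the actual settings of your Forced stop resource.
print(forced_stop_term(timeout_s=10, wait_for_stop_s=10))               # 20
print(forced_stop_term(timeout_s=10, wait_for_stop_s=10, custom=True))  # 10
```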

For more details on the Forced stop resource, please refer to the following Reference Guide.

[Reference]

Documentation - Manuals

EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Windows > Reference Guide

-> 7. Forced stop resource details

EXPRESSCLUSTER X 5.2 > EXPRESSCLUSTER X 5.2 for Linux > Reference Guide

-> 7. Forced stop resource details

1.4 Resource Activation Before the Shared Disk is Fully Powered On

When you power on the shared disk and the servers to start the cluster system, the shared disk may not yet be available when the EXPRESSCLUSTER X services start. If EXPRESSCLUSTER X starts without recognizing the shared disk, the resources that use the shared disk fail to activate.
Therefore, it is necessary to set the Service Startup Delay Time to be longer than the following time:

The time it takes for the shared disk to become available after power is applied.

If the shared disk is made available before the server starts, or if the configuration does not use a shared disk, then it is not necessary to set the Service Startup Delay Time related to this matter.

2. Calculating and Setting the Service Startup Delay Time

This section explains the final calculation method for determining the "Service Startup Delay Time" for each cluster configuration explained in Section 1.

First, add together the values explained in Sections 1.1 to 1.3 as follows:

Formula 1
Heartbeat Timeout + Heartbeat Interval
(+ DISK NP resolution resource IO Wait Time + 10 seconds)
(+ Forced Stop Timeout + Time to Wait for Stop to Be Completed)

* If DISK NP resolution resources are not used, the addition on the second line is unnecessary.
* If Forced stop resources are not used, the addition on the third line is unnecessary.

Next, compare Formula 1 with the value explained in Section 1.4 (the time it takes for the shared disk to become available after power is applied), and set the larger value as the Service Startup Delay Time.

If Formula 1 >= Value in Section 1.4:

-> Set the value from Formula 1

If Formula 1 < Value in Section 1.4:

-> Set the value explained in Section 1.4
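
The calculation described in this section can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, the Forced stop contribution is passed in as a single precomputed total, and the 180-second shared disk time in the second call is an invented example value.

```python
from typing import Optional


def service_startup_delay(
    heartbeat_timeout_s: int,
    heartbeat_interval_s: int,
    disk_np_io_wait_s: Optional[int] = None,    # None: DISK NP resolution resource not used
    forced_stop_total_s: Optional[int] = None,  # None: Forced stop resource not used
    shared_disk_ready_s: int = 0,               # 0: no shared disk, or disk is up before the server
) -> int:
    """Return the larger of Formula 1 and the Section 1.4 value, in seconds."""
    formula_1 = heartbeat_timeout_s + heartbeat_interval_s
    if disk_np_io_wait_s is not None:
        formula_1 += disk_np_io_wait_s + 10
    if forced_stop_total_s is not None:
        formula_1 += forced_stop_total_s
    return max(formula_1, shared_disk_ready_s)


# Windows defaults with a DISK NP resolution resource and no Forced stop resource.
print(service_startup_delay(30, 3, disk_np_io_wait_s=80))  # 123
# Same cluster, but the shared disk needs 180 seconds to become available.
print(service_startup_delay(30, 3, disk_np_io_wait_s=80, shared_disk_ready_s=180))  # 180
```

In the first call Formula 1 (30 + 3 + 80 + 10 = 123 seconds) is the larger value; in the second call the shared disk time is larger and is used instead.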

The calculated Service Startup Delay Time can be set using the Cluster WebUI.
Launch the Cluster WebUI, switch to "Config Mode," open the cluster properties, and select the "Timeout" tab. Enter the delay in seconds (0 to 9999) in "Service Startup Delay Time."
The following screen is from the Windows version, but the same settings can be made from the same location in the Linux version.

3. Checking the Operation

After the OS has started, verify that the EXPRESSCLUSTER X service starts after the number of seconds specified in the Service Startup Delay Time has elapsed. In this verification, we are using EXPRESSCLUSTER X 5.0 (internal version: Windows 13.00, Linux 5.0.0-1) and have set the Service Startup Delay Time to "300 seconds" as a test.

After restarting the OS, verify the time when the OS started and the time when the EXPRESSCLUSTER X service started.
Since the verification procedures differ between Windows and Linux, the procedures for each are described separately.

3.1 Windows

To verify the server's startup time in Windows, check the "System" event log.

Next, check the "Application" event log for the message indicating that the cluster service has started, along with the timestamp.

Refer to the following log entry:
"Cluster service has been started properly."

In this verification, it can be confirmed that the cluster service startup completes 5 minutes and 34 seconds (334 seconds) after the OS startup time.
* Note that there may be a slight variation in the timestamps recorded in the logs.

3.2 Linux

To verify the server's startup time in Linux, use the following command:

# last reboot

Next, use the following command to check the timestamp when the EXPRESSCLUSTER X service started:

# cat /var/log/messages | grep "Starting the cluster daemon"

In this verification, it can be confirmed that the EXPRESSCLUSTER X service startup completes more than 5 minutes (300 seconds) after the OS startup time.
* Note that the search string in the command is case-sensitive.
* There may be a slight variation in the timestamps recorded in the logs.
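
On either OS, the comparison comes down to subtracting the OS startup time from the service startup time. The following Python sketch shows that arithmetic. The function name, the timestamp format, and the two timestamps are hypothetical; they are chosen so that the result matches the 334-second gap reported in the Windows check.

```python
from datetime import datetime


def elapsed_seconds(os_started: str, service_started: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Return the whole seconds between OS startup and cluster service startup."""
    delta = datetime.strptime(service_started, fmt) - datetime.strptime(os_started, fmt)
    return int(delta.total_seconds())


configured_delay_s = 300
# Hypothetical timestamps, copied by hand from the logs.
elapsed = elapsed_seconds("2024-11-20 10:00:00", "2024-11-20 10:05:34")
print(elapsed, elapsed >= configured_delay_s)  # 334 True
```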

Conclusion

This article introduced the feature for configuring the Service Startup Delay Time of EXPRESSCLUSTER X, along with how to calculate the appropriate delay time.

With this feature, you can easily configure the service startup delay for EXPRESSCLUSTER X from the Cluster WebUI without delaying the OS startup time. We encourage you to consider using it.

If you are considering introducing the configuration described in this article, you can validate it with the trial module of EXPRESSCLUSTER. Please do not hesitate to contact us if you have any questions.

