August 27th, 2021
Machine translation is used partially for this article. See the Japanese version for the original article.
Introduction
This time, we will learn EXPRESSCLUSTER-specific terms. Many of these words may be unfamiliar, but they are key to understanding how EXPRESSCLUSTER works.
Let's do our best.
Group Resources
Resources are the individual components that make up one business operation in an HA cluster. In EXPRESSCLUSTER, each of these resources is controlled by a group resource.
The following are some of the resources that can be controlled by group resources:
- Applications, services
- The disk that stores the shared data, such as a shared disk or a mirror disk
- Destination information for an active server, such as floating IP addresses and virtual host names
Failover Group
A failover group is a collection of the group resources required to perform a single independent business operation in an HA cluster, and it is the unit of failover. Bringing together the resources that a business operation needs makes an HA cluster easier to configure and operate, because start/stop operations, failover, and so on can be handled as one series of operations. The HA cluster provides the control needed to keep failover groups and group resources running on an active server. This allows clients to access the business without being aware of which server they connect to or where the data resides.
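The idea of handling several resources as one unit can be illustrated with a minimal Python sketch. This is not EXPRESSCLUSTER code: the class names, the resource names, and the rule of starting in list order and stopping in reverse order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupResource:
    """Hypothetical stand-in for one resource (address, disk, application)."""
    name: str
    active: bool = False

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False


@dataclass
class FailoverGroup:
    """Hypothetical model: resources that are started, stopped, and moved together."""
    name: str
    resources: List[GroupResource] = field(default_factory=list)
    current_server: Optional[str] = None

    def start(self, server: str) -> None:
        # Assumption of this sketch: resources start in list order.
        for resource in self.resources:
            resource.start()
        self.current_server = server

    def stop(self) -> None:
        # Assumption of this sketch: resources stop in reverse order.
        for resource in reversed(self.resources):
            resource.stop()
        self.current_server = None

    def failover(self, target_server: str) -> None:
        # The whole group moves as one unit: stop everything, then start elsewhere.
        self.stop()
        self.start(target_server)


group = FailoverGroup(
    "failover1",
    [GroupResource("floating-ip"), GroupResource("mirror-disk"), GroupResource("application")],
)
group.start("server1")
group.failover("server2")
print(group.current_server, [r.active for r in group.resources])
```

In this sketch a single call to failover() moves every resource, which mirrors the point that the failover group, not the individual resource, is the unit of failover.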

Monitor Resources
Monitor resources monitor the health of specified targets. If an error is detected in a target, the monitor resource performs the recovery action needed to continue the business (for example, restarting group resources or failover).
In addition to the targets controlled by group resources (such as applications and floating IP addresses), other resources that the business requires (network, OS, etc.) can be specified as monitoring targets.
Depending on the monitor resource, monitoring also runs on the standby server so that errors there are detected and reported. This lets you check whether the standby server is ready to start and run the business.
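The relationship between error detection and recovery actions can be sketched as follows. This is a simplified illustration, not the product's behavior: the function run_monitor, the max_restarts parameter, and the policy of trying restarts before failover are assumptions of this sketch.

```python
from typing import Callable, List


def run_monitor(
    check: Callable[[], bool],
    restart: Callable[[], None],
    failover: Callable[[], None],
    max_restarts: int = 1,
) -> List[str]:
    """Check a target repeatedly; restart on error, fail over if restarts do not help."""
    actions: List[str] = []
    restarts = 0
    while not check():
        if restarts < max_restarts:
            restart()
            restarts += 1
            actions.append("restart")
        else:
            failover()
            actions.append("failover")
            break
    return actions


# Hypothetical target that stays unhealthy until the group is failed over.
state = {"healthy": False}


def check_target() -> bool:
    return state["healthy"]


def restart_resource() -> None:
    pass  # In this example, restarting does not fix the fault.


def fail_over() -> None:
    state["healthy"] = True


actions = run_monitor(check_target, restart_resource, fail_over, max_restarts=2)
print(actions)  # ['restart', 'restart', 'failover']
```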

Heartbeat
One of the basic functions of an HA cluster is for the servers to verify that each other is working properly. The mechanism for this is called heartbeat. If the heartbeat stops, for example because a server has been powered down, a recovery action such as failover is performed.
In addition to networks, heartbeat paths can include shared disks and serial ports (COM ports). By combining network and non-network paths to configure multiple heartbeats, you can make an HA cluster less affected by individual failures and more resistant to failure.
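Why multiple heartbeat paths help can be shown with a small sketch. The function name peer_is_silent, the path names, and the timeout values are hypothetical; the sketch assumes the other server is treated as unreachable only when every configured path has been silent longer than the timeout.

```python
from typing import Dict


def peer_is_silent(last_seen: Dict[str, float], now: float, timeout: float) -> bool:
    """Return True only when every heartbeat path has been silent longer than timeout."""
    if not last_seen:
        raise ValueError("at least one heartbeat path is required")
    return all(now - seen > timeout for seen in last_seen.values())


# The LAN heartbeat stopped 25 s ago, but the disk heartbeat arrived 7 s ago:
# a single path failure does not make the peer look dead.
one_path_down = peer_is_silent({"lan": 100.0, "disk": 118.0}, now=125.0, timeout=10.0)

# Both paths have been silent longer than the timeout.
all_paths_down = peer_is_silent({"lan": 100.0, "disk": 110.0}, now=125.0, timeout=10.0)

print(one_path_down, all_paths_down)  # False True
```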

Network Partitions
A network partition is a state in which all communication channels between the servers have failed, so the servers are cut off from each other. Network partitions are also called "split-brain".
HA clusters that cannot handle network partitions cannot distinguish between a communication path failure and a server failure. As a result, multiple servers may access the same resource, such as a shared disk, at the same time and corrupt the data.

Network Partition Resolution
As a countermeasure, it is first important to make network partitions less likely to occur by using the multiple types of heartbeat described in the "Heartbeat" section. If a network partition still occurs, network partition resolution is performed.
Network partition resolution is triggered when the heartbeat from the other server is lost, and determines whether the other server is down or a network partition has occurred. To do so, each server tries to access a predetermined target (network device, shared disk, or server) and judges from the result whether it is a network partition. For example, with ping network partition resolution, each server tries to communicate with a network device on the clients' access path.
A server that can communicate with the device determines that it should continue the business, and failover is performed. A server that cannot communicate determines that it is the one that has been partitioned, and shuts down. This prevents corruption of shared data.
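The decision each server makes can be summarized in a short sketch. It models only the logic described above; the names Decision and resolve are hypothetical, and the actual reachability check (for example, a ping to the network device) is represented by a boolean input rather than real network access.

```python
from enum import Enum


class Decision(Enum):
    KEEP_RUNNING = "heartbeat is alive, no action"
    TAKE_OVER = "continue the business and fail over"
    SHUT_DOWN = "shut down to protect shared data"


def resolve(heartbeat_alive: bool, target_reachable: bool) -> Decision:
    """Decide what one server does, based on its own view of the cluster."""
    if heartbeat_alive:
        return Decision.KEEP_RUNNING
    if target_reachable:
        # Heartbeat lost but the target answers: this server is still connected.
        return Decision.TAKE_OVER
    # Heartbeat lost and the target is unreachable: this server is the isolated one.
    return Decision.SHUT_DOWN


# Network partition: neither server receives a heartbeat.
# server1 can still reach the target device, server2 cannot.
server1 = resolve(heartbeat_alive=False, target_reachable=True)
server2 = resolve(heartbeat_alive=False, target_reachable=False)
print(server1.name, server2.name)  # TAKE_OVER SHUT_DOWN
```

Because the two servers reach different decisions, only one of them continues to access the shared data.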
Conclusion
This time, we learned EXPRESSCLUSTER-specific terms. Next time, we will talk about the EXPRESSCLUSTER tools.