On the road again

Problem: I/O fencing cannot be started or configured because of a pre-existing split-brain error when a CP server (CPS) is used as the coordination point on VCS 6.0.

Error Message

# vxfenconfig -c
Log Buffer: 0xffffffff88d63b40
VXFEN vxfenconfig NOTICE Driver will use customized fencing - mechanism cps
VXFEN vxfenconfig ERROR V-11-2-1043 Detected a preexisting split brain. Unable to join cluster.
 

Cause

Scenario 1:

The VxFEN driver prevents an ejected node from rejoining the cluster after the failure of the private network links and before the private network links are repaired.

For example, suppose a cluster of system 1 and system 2 is functioning normally when the private network links break, and system 1 is the ejected system. When system 1 restarts before the private network links are restored, its membership configuration does not show system 2. But when it attempts to register with the coordination points (here, the CP server), it discovers that system 2 is registered with them. Given this conflicting information about system 2, system 1 does not join the cluster and returns an error from vxfenconfig that resembles the following:

"VXFEN vxfenconfig ERROR V-11-2-1043 Detected a preexisting split brain. Unable to join cluster."

 

Scenario 2:

If the cluster details are absent from the coordination point server, VxFEN fails with a pre-existing split-brain message (2433060).
When you start server-based I/O fencing, the node may fail to join the cluster and print error messages similar to the following.

In the /var/VRTSvcs/log/vxfen/vxfen.log file:

"VXFEN vxfenconfig ERROR V-11-2-1043 Detected a preexisting split brain. Unable to join cluster.
operation failed. CPS ERROR V-97-1400-446 Un-authorized user cpsclient@galaxy, domaintype vx; not allowing action"

The VxFEN daemon on the application cluster queries the coordination point server to check whether the cluster members, as seen in the GAB membership, are registered with the CP server. If the application cluster cannot contact the CP server for any reason, fencing cannot determine the registrations on the CP server and conservatively assumes a pre-existing split-brain.
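
The decision described in both scenarios can be pictured as a simple comparison of two lists of node IDs. The sketch below is only an illustration of that reasoning, not the way VxFEN is implemented; the function name safe_to_join, the space-separated inputs, and the literal word "unknown" for an unreachable CP server are all assumptions made for this example.

```shell
# Hypothetical illustration only; the real check happens inside VxFEN.
# $1 = node IDs this node sees in its own membership (space-separated)
# $2 = node IDs registered on the CP server (space-separated),
#      or the word "unknown" if the CP server could not be queried
# Returns 0 if joining looks safe, 1 if a pre-existing split brain is assumed.
safe_to_join() {
    if [ "$2" = "unknown" ]; then
        return 1                 # Scenario 2: registrations cannot be read
    fi
    members=" $1 "
    for id in $2; do
        case "$members" in
            *" $id "*) ;;        # registered node is also a visible member
            *) return 1 ;;       # Scenario 1: registered node that is not visible
        esac
    done
    return 0
}
```

With system 1 as node 0 and system 2 as node 1, calling safe_to_join "0" "0 1" fails, which corresponds to Scenario 1: node 1 is registered but node 0 cannot see it.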

Solution

Scenario 1:

Clear the registration for system 2, or clear all stale fencing keys, from the CP server. Use the cpsadm command, which clears a registration on a CP server and can be run from either a VCS node or the CP server:

# cpsadm -s cp_server -a unreg_node -c cluster_name -n nodeid

  • "cp_server" is the virtual IP address, or virtual hostname, on which the CP server is listening
  • "cluster_name" is the name of the VCS cluster
  • "nodeid" is the node ID of the VCS cluster node

Note: The environment variables CPS_USERNAME and CPS_DOMAINTYPE must be set when cpsadm is run from a VCS node (see the EXAMPLE at the end).

Ensure that fencing is not already running on a node before clearing its registration on the CP server. After removing all stale registrations, the joiner node will be able to join the cluster.

You can check for stale registrations on the CP server using this command:

# /opt/VRTScps/bin/cpsadm -s <cps_server_vip> -a list_membership -c <cluster_name>
List of registered nodes: 0 1
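
To act on that output in a script, the node IDs can be pulled out of the "List of registered nodes" line. This is a minimal sketch that assumes the output has exactly the single-line form shown above; the function name registered_node_ids is made up for this illustration.

```shell
# Reads list_membership output on stdin and prints one node ID per line.
# Assumes a line of the form "List of registered nodes: 0 1".
registered_node_ids() {
    sed -n 's/^List of registered nodes:[[:space:]]*//p' |
        tr -s ' ' '\n' |
        sed '/^$/d'          # drop empty lines left by trailing spaces
}
```

For example, piping the list_membership command above into registered_node_ids prints 0 and 1 on separate lines for the sample output shown.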

 

Scenario 2:

Before you attempt to start VxFEN on the application cluster, ensure that the cluster details such as cluster name, UUID, nodes, and privileges are added to the CP server.

The following files and commands help you determine the details mentioned above.

To find the cluster name from main.cf:
# grep "cluster" /etc/VRTSvcs/conf/config/main.cf
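
Because grep "cluster" can also match unrelated lines in main.cf, a narrower extraction may be useful. The sketch below assumes the cluster definition has the form "cluster <name> (" at the start of a line; the function name vcs_cluster_name is hypothetical, and the file path is a parameter so it can be tried on a copy.

```shell
# Prints the cluster name from a main.cf-style file.
# Assumes a definition line of the form: cluster <name> (
vcs_cluster_name() {
    awk '$1 == "cluster" {
             sub(/\(.*$/, "", $2)   # tolerate "name(" written without a space
             print $2
             exit
         }' "${1:-/etc/VRTSvcs/conf/config/main.cf}"
}
```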

To find the UUID:
# cat /etc/vx/.uuids/clusuuid

To find the nodes:
# cat /etc/llthosts
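
The node ID needed for the "-n nodeid" argument of cpsadm can be looked up from this file. This is a small sketch, assuming the usual llthosts layout of one "<node_id> <node_name>" pair per line; the function name llt_node_id is an assumption for this example.

```shell
# Prints the LLT node ID for a given node name.
# $1 = node name, $2 = llthosts-style file (defaults to /etc/llthosts)
# Exits non-zero if the name is not listed.
llt_node_id() {
    awk -v name="$1" '
        $2 == name { print $1; found = 1 }
        END        { exit !found }
    ' "${2:-/etc/llthosts}"
}
```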

To verify privileges, use cpsadm to list the users currently added on the CP server:
# /opt/VRTScps/bin/cpsadm -s <cps_server_vip> -a list_users

The CPS configuration file (auto generated):
# cat /etc/vxcps.conf

Use these log files to help determine which parameter is preventing fencing from starting with CPS:

The files /var/VRTSvcs/log/vxfen/vxfend_[ABC].log contain logs that may be useful for understanding and troubleshooting fencing-related issues on a VCS cluster (application cluster) node.

The files /var/VRTScps/log/cpserver_[ABC].log contain logs that may be useful for understanding and troubleshooting fencing-related issues on the CP server.
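
To narrow these logs down, it can help to show only lines that carry an error message ID such as "ERROR V-11-2-1043". The sketch below assumes error lines in these logs contain the text "ERROR V-" followed by a digit, as in the messages quoted earlier; the function name fencing_errors is hypothetical.

```shell
# Prints only the lines with an "ERROR V-<number>..." message ID
# from the log files given as arguments.
fencing_errors() {
    grep -h 'ERROR V-[0-9]' "$@"
}
```

For example, on a VCS node: fencing_errors /var/VRTSvcs/log/vxfen/vxfend_[ABC].log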

Once you have confirmed that all of the above information is correct, you can start fencing.
 

EXAMPLE:

export CPS_DOMAINTYPE=vx

export  CPS_USERNAME=cpsclient@lnx-rosmed1

/opt/VRTScps/bin/cpsadm -s 10.28.89.54  -a unreg_node -c teoco -n 5

 

Repeat the same for all other nodes.
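
When several nodes have stale registrations, the commands can be generated for review before anything is run. This sketch only prints the cpsadm lines and does not execute them; the function name print_unreg_commands is made up for this example, and the command form is the one used in the EXAMPLE above.

```shell
# Prints (does not run) one unreg_node command per node ID.
# $1 = CP server address, $2 = cluster name, remaining arguments = node IDs
print_unreg_commands() {
    cps=$1
    cluster=$2
    shift 2
    for id in "$@"; do
        printf '/opt/VRTScps/bin/cpsadm -s %s -a unreg_node -c %s -n %s\n' \
            "$cps" "$cluster" "$id"
    done
}
```

For example, print_unreg_commands 10.28.89.54 teoco 4 5 prints two commands. Run them only after checking the list, with CPS_USERNAME and CPS_DOMAINTYPE exported, and only for nodes on which fencing is not running.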

Then:

/etc/init.d/vxfen restart

Or reboot the node.
