The nodes of the HAE cluster monitor each other. If a cluster node does not meet the probing agent's requirements, the other node shuts it down.
Setting the AWS stonith-action to poweroff shuts down the defective cluster node and leaves it powered off. This expedites a takeover on AWS.
With the default setting, reboot, the stonith agent waits until the reboot has completed successfully. This delays the reconfiguration of the SAP HANA database.
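As a minimal sketch, the stonith-action property could be set with the crm shell as shown here. The `run` wrapper and the `DRY_RUN` variable are illustrative helpers, not part of the cluster software; by default the script only prints the command. Recent Pacemaker releases document `off` as the value and treat `poweroff` as a legacy alias, so check which spelling your version expects.

```shell
#!/bin/sh
# Sketch only: prints the command unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: "poweroff" (as described above) or "reboot" (the default).
set_stonith_action() {
    run crm configure property stonith-action="$1"
}

set_stonith_action poweroff
```

Run it with `DRY_RUN=0` as super user on a cluster node to apply the setting.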
A faulty cluster node has to be re-integrated into the cluster manually, because it first has to be investigated why the node did not operate as expected. The cluster can also be configured to restart the second (faulty) node automatically. This bears the risk, however, that the remaining node is harmed by an incorrectly acting second (faulty) node.
The second (faulty) node is reconfigured through the following steps:
- Restart the node through the AWS console.
- Investigate the node after the reboot and fix any defect found.
- Boot SAP HANA manually. Check the instance health. Fix any defect found. Shut down SAP HANA.
- Configure SAP HANA to be a secondary node to the new master node.
- Start SAP HANA as secondary node.
- Restart the HAE cluster with the command "service pacemaker start" as super user. This process can take several minutes.
- Verify that all cluster services operate correctly.
The takeover is now complete. The roles of the two cluster nodes have been flipped.
The SAP HANA database is now protected against future failure events.
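
The re-integration steps listed above can be sketched as a command checklist. This is an illustration under stated assumptions, not a tested procedure: the instance ID, the `ha1adm` user, the host name, instance number and site name are hypothetical placeholders, the AWS CLI call stands in for the AWS console, and the `hdbnsutil -sr_register` option names differ between SAP HANA revisions. The steps run on different hosts and need manual checks in between, so the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints a command checklist unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical placeholders; replace with the values of your landscape.
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"  # EC2 ID of the faulty node
SIDADM="${SIDADM:-ha1adm}"                         # <sid>adm user of SAP HANA
PRIMARY_HOST="${PRIMARY_HOST:-hana-node1}"         # new master node
INSTANCE_NO="${INSTANCE_NO:-00}"                   # SAP HANA instance number
SITE_NAME="${SITE_NAME:-SITE2}"                    # replication site name

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run a command as the SAP HANA administration user.
as_sidadm() {
    run su - "$SIDADM" -c "$1"
}

reintegrate_node() {
    # Start the powered-off node (issued from an admin host, not the node).
    run aws ec2 start-instances --instance-ids "$INSTANCE_ID" || return 1
    # Investigating the node is a manual step and is not scripted here.
    # Boot SAP HANA, check its health by hand, then shut it down again.
    as_sidadm "HDB start" || return 1
    as_sidadm "HDB stop" || return 1
    # Register the node as secondary to the new master node.
    as_sidadm "hdbnsutil -sr_register --remoteHost=$PRIMARY_HOST --remoteInstance=$INSTANCE_NO --replicationMode=sync --name=$SITE_NAME" || return 1
    # Start SAP HANA as secondary node.
    as_sidadm "HDB start" || return 1
    # Restart the HAE cluster on this node (as super user).
    run service pacemaker start || return 1
    # One-shot cluster status to verify that all services are running.
    run crm_mon -1
}

reintegrate_node
```

In practice each printed command would be run by hand, with the result checked before moving on to the next step.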