Setting up HPE Serviceguard

MC/ServiceGuard A.12.40.00 Enterprise Edition tested on CentOS/RHEL7

Network

all nodes incl. quorum

vi /etc/hosts

x.x.x.x      sg1.localdomain sg1
x.x.x.x      sg2.localdomain sg2
x.x.x.x      qs.localdomain qs
x.x.x.253    dns
x.x.x.254    gw
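
a quick way to confirm that every cluster name made it into the hosts file on each node. This is only a sketch: the helper name missing_hosts is hypothetical and it assumes the usual "address name [alias...]" layout.

```shell
# missing_hosts FILE NAME... : print every NAME that has no entry in FILE.
# Hypothetical helper; FILE is laid out like /etc/hosts.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, skip the address field, then look for an exact name match.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file"; then
            echo "$name"
        fi
    done
}

# Example: missing_hosts /etc/hosts sg1 sg2 qs
```

no output means all names are present.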

synchronize the clocks, unless those are XEN guests

#ntpdate -u ntp.obspm.fr
ntpdate -u ru.pool.ntp.org
yum install ntp

SSH keys

all nodes but quorum

ssh-keygen -t ed25519
cat ~/.ssh/id_ed25519.pub 
vi ~/.ssh/authorized_keys
ssh sg1
ssh sg2
ping -c1 qs
ping -c1 sg1
ping -c1 sg2

Requirements

rpm -q \
xinetd \
sg3_utils \
net-snmp \
lm_sensors \
tog-pegasus \
libnl \
authd \
perl \
perl-Sys-Syslog \
krb5-libs \
zlib \
lsscsi \
systemd \
| grep ^package
yum install xinetd net-snmp lm_sensors tog-pegasus libnl perl-Sys-Syslog
yum install java
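
the rpm -q check above can be narrowed down to bare package names, which is handy for feeding yum. A minimal sketch, assuming rpm reports a missing package as "package NAME is not installed"; the function name not_installed is made up for this example.

```shell
# not_installed: read `rpm -q` output on stdin, print only the missing package names.
# rpm reports a missing package as "package NAME is not installed".
not_installed() {
    awk '$1 == "package" && $0 ~ /is not installed$/ { print $2 }'
}

# Example: rpm -q xinetd authd lsscsi | not_installed
```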

authd

chkconfig --list
systemctl list-unit-files | grep auth
systemctl start auth.socket
systemctl enable auth.socket

MAKE SURE YOU ARE RUNNING THE LATEST UPDATED KERNEL

rpm -qa | grep kernel
uname -r
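
comparing the two outputs by eye works, but it can also be scripted. A sketch, assuming GNU sort with version ordering (-V) is available; newest_kernel and running_latest are hypothetical helper names.

```shell
# newest_kernel: read kernel release strings on stdin, print the highest one.
# Relies on GNU sort -V (version sort).
newest_kernel() {
    sort -V | tail -n 1
}

# running_latest RUNNING: succeed if RUNNING equals the newest release on stdin.
running_latest() {
    [ "$1" = "$(newest_kernel)" ]
}

# Example:
#   rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel | running_latest "$(uname -r)" \
#       || echo "reboot into the newest kernel first"
```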

we will need this to compile the deadman module

yum install kernel-headers kernel-devel
yum groupinstall 'Development Tools'

XEN/PV fix

ERROR: Retrieval of node UUID failed. Unable to retrive UUID on node sg1.
Check if "dmidecode" returns valid UUID.

ERROR: Retrieval of node SMBIOS failed. Unable to retrive SMBIOS version on node sg1.
Check if "dmidecode" returns valid SMBIOS version.

cmcheckconf complains that dmidecode returns neither a UUID nor an SMBIOS version. These nodes are XEN/PV guests, so there is no SMBIOS to query and dmidecode has nothing to report. Let’s fake it.

mv /usr/sbin/dmidecode /usr/sbin/dmidecode.dist
vi /usr/sbin/dmidecode
#!/bin/sh
cat <<EOF
SMBIOS 3.0.0 present.

System Information
        UUID: 00000000-0000-0000-0000-000000000002
EOF
chmod +x /usr/sbin/dmidecode

check

/usr/sbin/dmidecode
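
the UUID in the stub above ends in 2, which suggests one value per node. Assuming that each node should indeed get its own UUID (an assumption, not something cmcheckconf was seen to enforce here), the stub output can be generated from a node number. Both function names are hypothetical.

```shell
# fake_dmi NODE_NUMBER: print the stub dmidecode output for one node.
# Assumption: each node gets its own UUID, derived here from a small integer.
fake_dmi() {
    printf 'SMBIOS 3.0.0 present.\n\nSystem Information\n'
    printf '        UUID: 00000000-0000-0000-0000-%012d\n' "$1"
}

# dmi_uuid: extract the UUID value from dmidecode-style output on stdin.
dmi_uuid() {
    awk '$1 == "UUID:" { print $2 }'
}

# Example: fake_dmi 1 | dmi_uuid
```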

Serviceguard Installation

on all nodes but quorum

mkdir lala/
mount -o loop A.12.40.00_Enterprise_Edition_for_Red_Hat_Enterprise_Linux_7_6_SLES_15_12_11_Oracle_Linux_7_BB097-11006_EVAL.iso lala/

cd /root/lala/
less Readme_Before_Install.txt

cd /root/lala/RedHat/RedHat7/Serviceguard/x86_64/
rpm -ivh serviceguard-license-A.12.40.00-0.rhel7.x86_64.rpm
rpm -ivh serviceguard-A.12.40.00-0.rhel7.x86_64.rpm

the post-install script should work fine as long as /usr/src/kernels/ holds the sources matching the running kernel. If the package post-install script failed and you fixed your kernel version afterwards, you can still recover manually: build the deadman module and load it

ls -lF /lib/modules/`uname -r`/source
ls -lF /lib/modules/`uname -r`/build
ls -ldF /usr/src/kernels/`uname -r`/

cd /usr/local/cmcluster/drivers/
#make clean
make -j16 modules
make modules_install
#modprobe -r deadman
depmod -a
modprobe deadman
lsmod | grep dead
cat /proc/misc | grep dead
ls -lF /dev/deadman
mkinitrd --force /boot/initramfs-`uname -r`.img `uname -r`
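
the /proc/misc check above can be turned into a pass/fail test, e.g. for use in a script run on every node. A sketch only; deadman_registered is a hypothetical name, and it assumes /proc/misc lines read "minor name".

```shell
# deadman_registered [FILE]: succeed if a "deadman" misc device is listed.
# FILE defaults to /proc/misc, whose lines read "<minor> <name>".
deadman_registered() {
    awk '$2 == "deadman" { found = 1 } END { exit !found }' "${1:-/proc/misc}"
}

# Example: deadman_registered || echo "deadman module is not loaded"
```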

Node tuning

on all nodes but quorum

#cp -pi /etc/lvm/lvm.conf /etc/lvm/lvm.conf.dist
#vi /etc/lvm/lvm.conf
#{hosttags = 1}

ls -lF /usr/local/cmcluster/bin/
cat /etc/cmcluster.conf
vi ~/.bashrc 

PATH=$PATH:/usr/local/cmcluster/bin
source /etc/cmcluster.conf

source ~/.bashrc
cp -pi /etc/man_db.conf /etc/man_db.conf.dist
vi /etc/man_db.conf

MANDATORY_MANPATH                       /usr/local/cmcluster/doc/man
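
the same line can be appended without opening an editor, and without adding it twice on a re-run. A minimal sketch; add_manpath is a hypothetical helper and takes the file to edit as its argument.

```shell
# add_manpath FILE: append the Serviceguard man path unless it is already there.
add_manpath() {
    line='MANDATORY_MANPATH /usr/local/cmcluster/doc/man'
    # Match the directive regardless of how much whitespace separates the fields.
    if ! grep -Eq '^MANDATORY_MANPATH[[:space:]]+/usr/local/cmcluster/doc/man[[:space:]]*$' "$1"; then
        echo "$line" >> "$1"
    fi
}

# Example: add_manpath /etc/man_db.conf
```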

and check

which cmquerycl
man cmquerycl

handy symlink

cd ~/
ln -s /usr/local/cmcluster/conf

Quorum Installation

on the quorum server only

cd /root/lala/RedHat/RedHat7/QuorumServer/x86_64/
rpm -ivh serviceguard-qs-A.12.40.00-0.rhel7.x86_64.rpm

ls -lF /usr/local/qs/conf/
vi /usr/local/qs/conf/qs_authfile

sg1
sg2

rpm -ql serviceguard-qs-A.12.40.00-0.rhel7.x86_64
ls -lF /var/log/qs/
systemctl list-unit-files | grep qs
systemctl status qs.service
systemctl start qs.service
systemctl enable qs.service
pgrep -a qs
netstat -lntup | grep qs

Cluster setup

on all nodes but quorum

vi /usr/local/cmcluster/conf/cmclnodelist

sg1 root
sg2 root

on one node only

cd /usr/local/cmcluster/conf/
ls -lF cmcluster.conf.default
cmquerycl -h
cmquerycl -q qs -n sg1 -n sg2 -C cluster1.conf.full

we are getting some warnings, as we defined neither a dedicated heartbeat network nor redundant NICs

- 2 or more heartbeat networks OR
- 1 heartbeat network with local switch (HP-UX Only) OR
- 1 heartbeat network using APA with 2 trunk members (HP-UX Only) OR
- 1 heartbeat network using bonding (mode 1 or 4) with 2 slaves (Linux Only)

but it does not prevent us from proceeding

egrep -v '^[[:space:]]*(#|$)' cluster1.conf.full > cluster1.conf
vi cluster1.conf
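
for reference, the filter used above drops blank lines and comment lines (including indented ones) while keeping the indentation of the remaining lines. The sketch below wraps the same pattern in a function (strip_comments is a made-up name) and uses grep -E, which is equivalent to egrep.

```shell
# strip_comments: drop blank lines and lines whose first non-blank character is '#'.
# Same pattern as the egrep call, written with grep -E.
strip_comments() {
    grep -Ev '^[[:space:]]*(#|$)'
}

# Example: strip_comments < cluster1.conf.full > cluster1.conf
```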

define a cluster name (cannot start with cluster, don’t ask me why)

CLUSTER_NAME            sgcluster1
HOSTNAME_ADDRESS_FAMILY         IPV4
QS_HOST                 qs
QS_POLLING_INTERVAL     300000000
NODE_NAME               sg1
  NETWORK_INTERFACE     eth0
    HEARTBEAT_IP        x.x.x.x
NODE_NAME               sg2
  NETWORK_INTERFACE     eth0
    HEARTBEAT_IP        x.x.x.x
MEMBER_TIMEOUT          14000000
AUTO_START_TIMEOUT      600000000
NETWORK_POLLING_INTERVAL        2000000
SUBNET 10.1.1.0
  IP_MONITOR OFF
MAX_CONFIGURED_PACKAGES         300
ROOT_DISK_MONITOR               OFF
USER_NAME       ANY_USER
USER_HOST       ANY_SERVICEGUARD_NODE
USER_ROLE       MONITOR
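
the timing values above are expressed in microseconds, which is hard to read at a glance. A small sketch that prints them in seconds; timings_in_seconds is a hypothetical helper and only looks at the four parameters named in its pattern.

```shell
# timings_in_seconds FILE: print each timing parameter with its value in seconds.
# The values in the cluster configuration file are in microseconds.
timings_in_seconds() {
    awk '
        $1 ~ /^(MEMBER_TIMEOUT|AUTO_START_TIMEOUT|NETWORK_POLLING_INTERVAL|QS_POLLING_INTERVAL)$/ {
            printf "%s %gs\n", $1, $2 / 1000000
        }
    ' "$1"
}

# Example: timings_in_seconds cluster1.conf
```

with the values shown here, MEMBER_TIMEOUT comes out as 14 seconds and QS_POLLING_INTERVAL as 300 seconds.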

check

cmcheckconf -C cluster1.conf

copy the ASCII configuration file to the other nodes, e.g. here from sg2 to sg1

scp cluster1.conf sg1:/usr/local/cmcluster/conf/

and apply

cmapplyconf -k -C cluster1.conf

the cluster should now be configured, with status down

cmviewcl -v
cmquerycl -v

Ready to go

all nodes but quorum

start the daemon and enable it at boot time (ideally on all nodes one right after the other)

systemctl start cmcluster.init.service
systemctl enable cmcluster.init.service

join the cluster at startup

cp -pi /usr/local/cmcluster/conf/cmcluster.rc /usr/local/cmcluster/conf/cmcluster.rc.dist
vi /usr/local/cmcluster/conf/cmcluster.rc

AUTOSTART_CMCLD=1
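
the same change can be made without an editor, which helps when repeating it on several nodes. A sketch, assuming the file already contains an AUTOSTART_CMCLD= assignment at the start of a line; enable_autostart is a hypothetical helper.

```shell
# enable_autostart FILE: set AUTOSTART_CMCLD=1 in a cmcluster.rc-style file.
# Runs in a subshell so its temporary variable does not leak to the caller.
enable_autostart() (
    new=$(mktemp) || exit 1
    # Rewrite an existing assignment whatever its current value,
    # then write the result back in place to keep the file's permissions.
    sed 's/^AUTOSTART_CMCLD=.*/AUTOSTART_CMCLD=1/' "$1" > "$new" && cat "$new" > "$1"
    rm -f "$new"
    # Fail if there was no assignment to rewrite.
    grep -q '^AUTOSTART_CMCLD=1$' "$1"
)

# Example: enable_autostart /usr/local/cmcluster/conf/cmcluster.rc
```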

reboot the cluster nodes

shutdown -r now

Operations

cmviewcl -v
cmquerycl -v

Resources

HPE Serviceguard for Linux Trial Software https://www.hpe.com/emea_europe/en/resources/servers/serviceguard-linux-trial.html

Serviceguard Quorum Server https://h20392.www2.hpe.com/portal/swdepot/displayProductInfo.do?productNumber=B8467BA

HPE Serviceguard for Linux Repositories https://downloads.linux.hpe.com/SDR/project/sglx/

Subscribe your system to the sglx-base repository https://downloads.linux.hpe.com/SDR/project/sglx/sglx-base/

Serviceguard for Linux Continentalcluster Patches / Updates https://downloads.linux.hpe.com/SDR/project/sglx/sglx-cc/

Subscribe your system to the sglx-qs repository https://downloads.linux.hpe.com/SDR/project/sglx/sglx-qs/

HPE Serviceguard for Linux - Product documentation https://h20195.www2.hpe.com/v2/default.aspx?cc=us&lc=en&oid=376220

Use of a Quorum Server as the Cluster Lock https://docstore.mik.ua/manuals/hp-ux/en/B3936-90078/ch01s06.html

Repository Listing https://downloads.linux.hpe.com/SDR/repo/sglx-qs/rhel/7/x86_64/current/

tutorials

HP service-guard installation for linux on a single cluster node https://linuxindiablog.wordpress.com/2014/06/09/hp-service-guard-installation-for-linux-on-a-single-cluster-node/

HP service-guard installation for linux on a single cluster node http://ramblingsofanindiantechie.blogspot.com/2014/06/hp-service-guard-installation-for-linux.html

Linux service guard https://m.blog.naver.com/woomun/220266431105

misc troubleshooting

Serviceguard for Linux - The cmviewcl Command Does Not Work, if we Can Not Use the rpm Command! https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02898778

deadman

HP Serviceguard for Linux - cmclconfd: Unable to Open /dev/deadman https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02753088

LINUX - What Is the Purpose of LINUX Serviceguard Deadman Driver? https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c03630952

retpoline

CONFIG_RETPOLINE: Avoid speculative indirect branches in kernel https://cateee.net/lkddb/web-lkddb/RETPOLINE.html

How to check if Linux kernel is “Retpoline” enabled or not? https://unix.stackexchange.com/questions/435778/how-to-check-if-linux-kernel-is-retpoline-enabled-or-not

rhel vs centos

Converting system between RHEL and CentOS https://negativo17.org/converting-system-between-rhel-and-centos/

Converting CentOS 6 to RHEL 6 https://www.endpoint.com/blog/2011/12/22/converting-centos-6-to-rhel-6

RedHat Version file https://www.centos.org/forums/viewtopic.php?t=41699

How to get CentOS 8/7/6, Fedora 29-25, RHEL 8/7/6 release version https://computingforgeeks.com/how-to-get-centos-fedora-rhel-release-version/

Accidentally deleted /etc/redhat-release file https://unix.stackexchange.com/questions/209820/accidentally-deleted-etc-redhat-release-file

