Cluster Monitor
Recently I had another customer with a very reliable HACMP
environment suffer an outage. It goes like this: the systems run in
a blacked-out computer room with no operators messing about, and the
backup node failed for some reason. That in itself is not a problem,
but no one noticed for a month. When the primary system then hit a
hardware fault, the resources had nowhere to fail over to, and the
result was a total system outage.

So I put together a little script that checks whether this is the
last active node in the cluster and, if it is, sends email to someone
to complain about being lonely. We set it to run from root's crontab
every couple of hours. Just for fun, here is the script. Please
change the email address; I don't want lots of clusters complaining
to me that they are lonely.

Cheers, Lee

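#!/usr/bin/ksh
# ClusterMon: mail a warning when this is the only active cluster node.
HADIR=/usr/es/sbin/cluster/utilities
NODE=`$HADIR/get_local_nodename`
LOG=/tmp/ClusterMon.log
MAILTO="lee@matildasystems.com"    # change this to your own address

# List the active nodes and drop this node's own name from the list.
OTHERNODE=`$HADIR/clgetactivenodes -n "$NODE" | grep -v "$NODE"`

if [[ -z $OTHERNODE ]]
then
    # No other active node was found: send the alert.
    mail -s "Message from ClusterMon" "$MAILTO" <<data
Help!
I seem to be all alone.
My other node is missing.
Please investigate.
Thanks ClusterMon
data
else
    # At least one other node is active: note it in the log.
    # $OTHERNODE is left unquoted so several names land on one line.
    TS=`date`
    echo "$0 $TS I seem to have the company of" $OTHERNODE >> "$LOG"
fi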
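
For the crontab side, here is a rough sketch of the sort of entry we
mean by "every couple of hours". The install path
/usr/local/bin/ClusterMon.ksh is only an assumption for illustration,
so use wherever you actually saved the script. The hours are spelled
out as a list because not every cron accepts step values such as */2.

```shell
# Hypothetical install location of the script; adjust to your system.
SCRIPT=/usr/local/bin/ClusterMon.ksh

# Run at minute 0 of every second hour. The script sends its own mail
# and writes its own log, so cron's copy of the output is discarded.
CRON_LINE="0 0,2,4,6,8,10,12,14,16,18,20,22 * * * $SCRIPT >/dev/null 2>&1"

# Print the line so it can be pasted in with "crontab -e" as root.
echo "$CRON_LINE"
```

Run it as any user to see the line, then add that line to root's
crontab by hand. Nothing here touches the crontab itself.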