Matilda Systems Corporation | High Availability Resources
  

 

Shell Script Snippets


This page contains various shell script snippets (segments) which may prove useful within customization scripts (e.g. application start and stop scripts) for IBM's AIX-based HACMP product. The snippets are intended to be used in Korn Shell (/bin/ksh) scripts although they'll probably work in Bourne Shell (/bin/bsh) scripts unless indicated otherwise.

These snippets are intended to operate on AIX and were developed and tested on AIX 4.3.2 and/or AIX 4.3.3. Some of the examples below may not work on other Unices or older versions of AIX.

If you'd like to help improve this list of shell snippets, click here.

This page is part of the Matilda Team's HACMP Resources Collection. The home page of the collection is located here.

IMPORTANT: read the disclaimer BEFORE you use any information provided in this collection.


Is my shared volume group online?

The following sequence will determine if the sharedvg volume group is currently online (often useful in application start scripts):
if lsvg -o | grep -q -w sharedvg ; then
    echo sharedvg is online
else
    echo sharedvg is offline
fi
Note the use of the -w option on the grep invocation - this ensures that if you have a sharedvg and a sharedvg2 volume group then the grep only finds the sharedvg line (if it exists).
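As a rough illustration of what -w buys you, the sketch below runs the same grep against canned text rather than real lsvg -o output. The vg_in_list function name and the sample volume group list are made up for this example.

```shell
# vg_in_list NAME: succeed if NAME appears as a whole word in the text
# on standard input (a stand-in here for the output of "lsvg -o").
vg_in_list() {
    grep -q -w "$1"
}

# Only rootvg and sharedvg2 are "online" in this canned list:
# -w stops sharedvg from matching the sharedvg2 line.
if printf 'rootvg\nsharedvg2\n' | vg_in_list sharedvg ; then
    echo sharedvg is online
else
    echo sharedvg is offline
fi
```

One caveat: -w only treats letters, digits and underscores as part of a word, so a name such as sharedvg.old would still match. If your naming scheme allows that sort of collision, grep -x (match the whole line) is stricter, assuming lsvg -o prints nothing but the name on each line.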

If you need to do something if the volume group is offline and don't need to do anything if it is online then use this:

if lsvg -o | grep -q -w sharedvg ; then
    :	# null command if the volume group is online
else
    echo sharedvg is offline
fi
Some people don't like the null command in the above example. They may prefer the following alternative:
lsvg -o | grep -q -w sharedvg
if [ $? -ne 0 ] ; then
    echo sharedvg is offline
fi
Although we're not particularly keen on the null command in the first approach, we really don't like the use of $? in if tests since it is far too easy for the command generating the $? value to become separated from the if test (a classic example of how this happens is if you add an echo command immediately before the if command when you're debugging the script). If we find ourselves needing to test the exit status of a command in an if test then we either use the command itself as the if test (as in the first approach) or we do the following:
lsvg -o | grep -q -w sharedvg
rval=$?
if [ $rval -ne 0 ] ; then
    echo sharedvg is offline
fi
In our opinion (yours may vary), this makes it much more obvious that the exit status of the grep command is important and must be preserved.
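To make the hazard concrete, here is a small sketch of the debugging-echo accident. The vg_check function is a made-up stand-in that always fails, as the lsvg/grep pipeline would when sharedvg is offline; fragile and robust are likewise names invented for this example.

```shell
# Stand-in for "lsvg -o | grep -q -w sharedvg" when sharedvg is offline.
vg_check() {
    return 1
}

# Fragile: the debugging echo succeeds, so $? is 0 by the time it is tested.
fragile() {
    vg_check
    echo "debug: checked sharedvg" >&2
    if [ $? -ne 0 ] ; then
        echo offline
    else
        echo online
    fi
}

# Robust: the exit status is saved before anything else can overwrite it.
robust() {
    vg_check
    rval=$?
    echo "debug: checked sharedvg" >&2
    if [ $rval -ne 0 ] ; then
        echo offline
    else
        echo online
    fi
}

echo "fragile says: ` fragile `"    # online, which is wrong
echo "robust says: ` robust `"      # offline
```

The two functions differ only in the rval=$? line, yet the fragile one reports the volume group as online because it ends up testing the exit status of the echo.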

Starting a non-root process from within an application start script

A common requirement in an application start script is the need to start a program and/or shell script which is to be run by a non-root userid. This snippet does the trick:
su - dbadmin -c "/usr/local/db/startmeup.sh"
        
This will run the startmeup.sh script in a process owned by the dbadmin user. Note that it is possible to pass parameters to the script/program as well:
su - dbadmin -c "/usr/local/db/startmeup.sh PRODDB"
This runs the startmeup.sh script with a parameter indicating which database is to be started.

A bit of formalism never hurts when it comes time later to do script maintenance. For example, use shell variables to specify the username and the command to be invoked:

DBUSER=dbadmin
DBNAME=PRODDB
STARTCMD="/usr/local/db/startmeup.sh $DBNAME"
su - $DBUSER -c "$STARTCMD"
This makes it easy to change the username, database name or start command (this is particularly important if any of these appear more than once within the application start script).

The double quotes around $STARTCMD in the su command are necessary as the command to be executed must be passed as a single parameter to the su command's -c option.
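The effect of the quotes is easy to see without involving su at all. In the sketch below, count_args is a hypothetical stand-in that simply reports how many parameters it received.

```shell
# count_args: report how many parameters were passed (a stand-in for su).
count_args() {
    echo $#
}

STARTCMD="/usr/local/db/startmeup.sh PRODDB"

count_args -c "$STARTCMD"    # prints 2: -c plus the whole command string
count_args -c $STARTCMD      # prints 3: the command string was split in two
```

Without the quotes, the shell splits the value at the space, so PRODDB arrives as a separate parameter instead of being part of the command string handed to -c.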

Killing processes owned by a user

A common requirement in application stop scripts is the need to terminate all processes owned by a particular user. The following snippet terminates all processes owned by the dbadmin user (this could be part of an application stop script that corresponds to the previous snippet that started the DB as dbadmin).
DBUSER=dbadmin
kill ` ps -u $DBUSER -o pid= `
Since a simple kill is rarely enough and a kill -9 is a rather rude way to start a conversation, the following sequence might be useful:
DBUSER=dbadmin
kill ` ps -u $DBUSER -o pid= `
sleep 10
kill -9 ` ps -u $DBUSER -o pid= `
To see how this works, just enter the ps command on its own (ps -u dbadmin -o pid=). It produces output along these lines:
12276
12348
Note that the equal sign in the pid= part is important as it eliminates the normal PID title which would appear at the top of the column of output. I.e. without the equal sign, you'd get this:
  PID
12276
12348
Passing the literal string PID to the kill command is a bad idea: kill would complain about it on every run, and scripts which routinely produce error messages make it much more difficult to know if things are working correctly.
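A related wrinkle: if the user has no processes left, the backquoted ps produces nothing and kill is invoked with no PIDs at all, which typically earns you a usage message, i.e. another routine error message. A minimal sketch of a guard follows; signal_pids is a name invented for this example and is not part of AIX or HACMP.

```shell
# signal_pids SIGNAL [PID ...]
# Send SIGNAL to the listed PIDs; quietly do nothing if the list is empty.
signal_pids() {
    sig=$1
    shift
    if [ $# -eq 0 ] ; then
        return 0
    fi
    kill -$sig "$@"
}

# Intended use in a stop script (left commented out so that this sketch
# is harmless if pasted into a shell):
#   DBUSER=dbadmin
#   signal_pids TERM ` ps -u $DBUSER -o pid= `
#   sleep 10
#   signal_pids KILL ` ps -u $DBUSER -o pid= `
```

Note that this only silences the "nothing to kill" case. A process that exits between the ps and the kill can still produce a "no such process" complaint, which is harmless.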

Terminating processes which are using a filesystem

Yet another common requirement in application stop scripts is the need to terminate all processes using a filesystem to ensure that the unmount of the filesystem works cleanly. The following snippet will terminate (with a SIGKILL signal) all processes using the filesystem in the /dev/sharedlv logical volume:
fuser -k /dev/sharedlv
Reminder: if you use fuser in this fashion, make sure that you specify the name of the filesystem's logical volume and not the filesystem's mount point. If you specify the mount point then you'll kill only processes which happen to be using the mount point directory itself (i.e. you won't get an error message but you won't get the desired result either).

Note that one should not use fuser to query entities residing on an NFS mounted filesystem as fuser will hang if the NFS server is unavailable. Of course, if your cluster node is relying on a currently down NFS server then you have far bigger problems than this to worry about.

A more complete example of an application stop script

A real application stop script usually combines several of the snippets above. For example, a script along the following lines could be used to first gently and then forcibly terminate the database processes started in the earlier example, and then to clear out anything still using the shared filesystems:
#!/bin/ksh

DBUSER=dbadmin
STOPCMD="/usr/local/db/stopdb.sh"

# ask nicely
su - $DBUSER -c "$STOPCMD"

# wait twenty seconds and then get rude
sleep 20
kill ` ps -u $DBUSER -o pid= `

# wait ten more seconds and then get violent
sleep 10
kill -9 ` ps -u $DBUSER -o pid= `

# terminate any processes using our two shared filesystems
fuser -k /dev/sharedlv1
fuser -k /dev/sharedlv2

# make sure that our exit status is 0
exit 0
        

Determining if a process exists

Sometimes you need to check if a particular process still exists. If you know the process's pid then the following will do the trick (assumes that the shell variable $pid contains the process's pid):
if kill -0 $pid 2> /dev/null ; then
    echo process $pid still exists
fi
This takes advantage of a little-used feature of Unix signals. Sending signal 0 to a process never affects the process but the sender of the signal is told if the signal would have been delivered if the signal number had been greater than 0. In the context of the kill command, the command succeeds with an exit status of 0 if a signal could have been delivered and fails with an error message and a non-zero exit status if the signal couldn't have been delivered (the purists will argue that an exit status of 0 from the kill command indicates that the signal was delivered; they'll then point out that the delivery of signal 0 to a process doesn't do anything; does anybody really care?).

Note that you are never allowed to send signals to processes that you don't own unless you're root so make sure that the above test is performed as root or that you own the process which you're checking for (unless you're not root and are really interested in whether or not you own the process in which case the snippet above does the job quite nicely).
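The same test can be used to replace a fixed sleep with a bounded wait. The sketch below polls kill -0 once a second until the process is gone or a time limit runs out. wait_for_exit and its two parameters are hypothetical names for this illustration and, as noted above, the answer is only meaningful when run as root or as the owner of the process.

```shell
# wait_for_exit PID SECONDS
# Return 0 once PID no longer exists, or 1 if it is still around after
# roughly SECONDS seconds.
wait_for_exit() {
    pid=$1
    limit=$2
    elapsed=0
    while kill -0 $pid 2> /dev/null ; do
        if [ $elapsed -ge $limit ] ; then
            return 1
        fi
        sleep 1
        elapsed=` expr $elapsed + 1 `
    done
    return 0
}

# The current shell is certainly alive, so this gives up after about
# two seconds and takes the else branch.
if wait_for_exit $$ 2 ; then
    echo shell $$ has exited
else
    echo shell $$ is still running
fi
```

In a stop script, something like this could follow the polite kill so that the kill -9 is only reached if the process really is refusing to go away.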

Supporting lots of IP addresses within a resource group

(the need for this hack has been largely eliminated by the IPAT via IP aliasing feature introduced in HACMP 4.5)

Here's a shell script and example control file that allows you to support multiple IP addresses as "resources" within a resource group. Use a symbolic link to give the script two names - mkalias and rmalias. Describe the IP addresses that you need set up in the config file.

The script is somewhat long so I've put it in a separate file here (this version replaced the old version on 2001/09/07). The example control file is:

# alias      | HACMP Base  data                        |   futures  
# IP-Address | IP-Label  | IP-Address  | Netmask       | Client | Server 
#-------------------------------------------------------------------------
192.168.100.1  appsvc     192.168.12.1  255.255.255.0      XX 	XX
192.168.100.2  appsvc     192.168.12.1  255.255.255.0      XX 	XX
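
The real script lives in a separate file and is not reproduced on this page. Purely as an illustration of how a control file in this layout could be read from a shell script, the following sketch skips comment and blank lines and prints the alias address, base address and netmask from each entry. The list_aliases function and its output format are invented for this example; they are not an excerpt from the actual mkalias/rmalias script.

```shell
# list_aliases FILE
# Print "alias base netmask" for every data line in FILE.
list_aliases() {
    while read aliasip label baseip netmask rest ; do
        case "$aliasip" in
            ""|"#"*) continue ;;    # blank line or comment
        esac
        echo "$aliasip $baseip $netmask"
    done < "$1"
}

# Example (assumes a file named alias.data in the current directory):
#   list_aliases alias.data
```

Each output line carries the fields a script would presumably need to add or remove one alias on the interface that holds the base address.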
        
More detailed notes may appear here shortly but the general idea is:
  • define a service IP address in each resource group that needs aliases.

  • create an alias.data file that lists the aliases that you need associated with each service IP address.

  • edit the script to point at the alias.data file.

  • define the rmalias script variant as a pre event for the swap_adapter_complete, acquire_takeover_addr and acquire_service_addr events.

  • define the mkalias script variant as a post event for the swap_adapter_complete, acquire_takeover_addr, acquire_service_addr and the various release_*_addr events.
Test the resulting cluster very carefully.

This is still a "work in progress". For example, it doesn't yet handle a DARE operation. Please send us ideas for improvements and check back in a while for a better version (and hopefully a detailed explanation of how it works).

 

IMPORTANT: If you lack the appropriate skills, experience and/or competency, are unwilling to take responsibility for your actions, or if you don't like these disclaimers then don't use this information.