Highly Available Crontab

Disclaimer: The information here discusses making changes to the HACMP configuration. No warranty of suitability or functionality is given; proceed at your own risk.

Lee Elston, Matilda Systems


Purpose:

This is in response to a customer request to modify HACMP to add crontab control. There appear to be two separate issues: first, a cron job being launched on a node while the application is present on another node, and second, keeping the crontab entries synchronized.


Thoughts & Ideas:

  1. Cron uses the cron.allow and cron.deny files. These are checked before a crontab entry is processed, so altering the allow and deny files provides instant gratification. Adding or changing the actual cron files without going through the crontab command will not update the cron processing (for example, files renamed to denied users still run). If changes are made to the files directly, the cron daemon needs an incentive to re-read them.

  2. In large shops many contractors come and go, giving assistance to the customer. Although skilled in their specific areas, they tend not to be HA aware, and cron entries are often added or changed without considering HACMP. Requiring every cron job to be HA aware is too large a request, as it would involve educating too many vendors; it is better to make HACMP more aware of its environment. To address the first issue, cron jobs being attempted while the resource group is absent, two new event scripts and a control file have been added. The first event script scans the control file for user IDs associated with HACMP application servers and adds these users to the “/var/adm/cron/cron.deny” file, effectively stopping their cron jobs from running. When an application server is started, the second script scans the control file; if the starting application server matches an entry, the associated users are removed from the cron.deny file, allowing their cron processes to run.

  3. Crontab synchronization. As of version 5.2, HACMP can keep files synchronized through file collections. Entries are added by file name and HACMP keeps the files identical across nodes. There may be a time lag between the two systems, but that is adjustable. The only catch is that cron may not notice that a file has been altered. To compensate, the script that allows cron jobs for specific users to run also gives the cron daemon a kick to restart it.

  4. A planned feature of HACMP 5.3 is to synchronize files to nodes as they come online, if required. This can enhance the HA crontabs by ensuring consistency between the crontab entries.
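
The cron.deny handling described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the actual event scripts: the control file path, its "appserver:user1,user2" line format, and the function names are all assumptions made for this example. Only the cron.deny path comes from the text above.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS (not from the original scripts):
#   - control file format:  <application_server>:<user>[,<user>...]
#   - lines starting with '#' are comments
#   - HA_CRON_CTL path and all function names are hypothetical

CRON_DENY="${CRON_DENY:-/var/adm/cron/cron.deny}"
HA_CRON_CTL="${HA_CRON_CTL:-/usr/local/ha/ha_cron.ctl}"

# Print the users mapped to one application server, one per line.
# With no argument, print the users of every application server.
ha_cron_users() {
    awk -F: -v app="$1" '
        /^#/ || NF < 2 { next }
        app == "" || $1 == app {
            n = split($2, u, ",")
            for (i = 1; i <= n; i++) if (u[i] != "") print u[i]
        }' "$HA_CRON_CTL" | sort -u
}

# Add users (read from stdin) to cron.deny, skipping ones already listed.
ha_cron_deny() {
    touch "$CRON_DENY"
    while read -r user; do
        grep -qxF "$user" "$CRON_DENY" || echo "$user" >> "$CRON_DENY"
    done
}

# Remove users (read from stdin) from cron.deny.
# The file is rewritten in place so its ownership and mode are kept.
ha_cron_allow() {
    [ -f "$CRON_DENY" ] || return 0
    while read -r user; do
        grep -vxF "$user" "$CRON_DENY" > "$CRON_DENY.tmp.$$"
        cat "$CRON_DENY.tmp.$$" > "$CRON_DENY"
        rm -f "$CRON_DENY.tmp.$$"
    done
}
```

With these helpers, denying every HA-managed user would be "ha_cron_users | ha_cron_deny", and re-enabling the users of one application server would be "ha_cron_users app1 | ha_cron_allow". The cron daemon kick from item 3 is left out here because it depends on how cron is supervised on the node.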


The task at hand


For the first iteration of HA crontabs we will have to set things up carefully. If this becomes a popular feature I will consider making it more robust and using HACMP-type ODM structures. Who knows, maybe IBM will add this to HACMP.

  1. A pre-event must be added to node_up_local to reset the crontabs. The logic here is that if node up is running, this node was previously down and holds no resources, so no crontab entries related to HACMP application servers should be present.

  2. A post-event must be added to the application_start event to allow the crontab files for the specific application that just started. Putting the cron step in the post-event allows us to check whether the application start was successful. Note that an unsuccessful application start is application monitoring's problem, not ours. If application monitoring decides to move the application server, we will comply.

  3. A pre-event to application_stop must be added to disable the current HA crontab entries.

  4. The most time-consuming part of this installation is creating a file collection for the crontab entries. Unfortunately, at this time we must list separately each file we wish to have synchronized to the other cluster nodes. ** Note: there is an option to add a cron job to the system that automatically keeps the HACMP file collection complete for the files in the crontab directory. As the author, I'm not sure that is a good idea.

  5. Lastly, create the control file that links cron.deny user IDs to the application server name.

  6. Copy the files to all nodes (or use file collections), synchronize the cluster and test thoroughly.
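
The three event hooks from steps 1 to 3 could be driven from a single script along the following lines. This is a sketch under stated assumptions: the function names and the mode keywords are hypothetical, ha_cron_action is a stand-in that only prints what it would do, and the argument layout for the post-event (exit status first, then application server names) is an assumption that should be checked against your HACMP release.

```shell
#!/bin/sh
# Sketch only; every name here is hypothetical.

# Stand-in for the real cron.deny edits: prints the intended action.
ha_cron_action() {
    echo "cron $1: ${2:-all HA users}"
}

# Usage: ha_cron_hook <reset|enable|disable> [args...]
ha_cron_hook() {
    mode="$1"; shift
    case "$mode" in
        reset)    # pre-event for node_up_local: deny every HA-managed user
            ha_cron_action deny ;;
        enable)   # post-event for application start
            # ASSUMPTION: $1 is the exit status of the start event and the
            # remaining arguments are application server names.
            status="$1"; shift
            if [ "$status" -eq 0 ]; then
                for app in "$@"; do ha_cron_action allow "$app"; done
            fi ;;
        disable)  # pre-event for application stop
            for app in "$@"; do ha_cron_action deny "$app"; done ;;
        *)
            echo "ha_cron_hook: unknown mode '$mode'" >&2 ;;
    esac
    return 0      # cron housekeeping should never fail the HACMP event
}
```

Keeping all three hooks in one script means there is a single file to copy to each node or to add to a file collection. A failed application start leaves the users denied, which matches the intent of checking the start result in the post-event.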


FAQ

Q: Does this customization cause support issues with IBM?

A: No. The facility for adding pre- and post-events to HACMP has existed for many years, although few customers actually use it. The most important thing to remember is to make sure your event scripts handle the necessary error conditions and return a zero exit code to HACMP. As with any customization, document it and give support the information up front when reporting problems. Consider additions to clverify to document and verify the customizations (we can discuss that another day).
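
One way to follow the advice about error handling and zero exit codes is to run each step of an event script through a small wrapper, sketched below. The wrapper name and the log file path are assumptions for illustration; they are not part of HACMP.

```shell
#!/bin/sh
# Sketch only. HA_CRON_LOG and ha_cron_safely are hypothetical names.
HA_CRON_LOG="${HA_CRON_LOG:-/tmp/ha_cron.log}"

# Run a command, append its output to the log, record any failure,
# and always return 0 so the calling HACMP event is not failed.
ha_cron_safely() {
    "$@" >> "$HA_CRON_LOG" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "$(date) WARNING: '$*' failed with exit code $rc" >> "$HA_CRON_LOG"
    fi
    return 0
}
```

A step in an event script would then be written as "ha_cron_safely some_command args". The failure is still recorded in the log, so it can be handed to support when reporting a problem.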