Matilda Systems Corporation | High Availability Resources
Mon, December 6, 2004   

 
The wrong way to create a mksysb tape



This page is an informal part of the Matilda Team's HACMP Resources Collection. The home page of the collection is located here.


This is a true story. The name of the author has been withheld to protect the guilty.

Let's begin . . .

I've no idea if there are any gems of wisdom here. I am sure that there are lessons of some sort buried in here somewhere.

Some background is probably in order. My mission was to recover a cluster node with a trashed ODM (and other problems). The sick node was still bootable but had been messed up in ways that I wasn't interested in trying to fix since reloading the OS would be quicker, easier and more likely to end up with something stable.

My plan was to create a mksysb of the surviving node and use it to restore the sick node. I also didn't want to spend a whole lot of time doing this as I had other things that I wanted to do with the cluster once it was back up and running.

I should point out that the cluster in question was a test cluster consisting of a pair of rather old "classic" MCA RS/6000s. This partly explains the rather informal steps that I took and also explains (in ways that I won't get into here) why the sick node was sick.

Step one

Since part of the plan was to keep things simple, I decided to use an old (non-IBM) SCSI tape drive that I happened to have handy. The first attempt came to a sudden halt when I (re-)discovered that the built-in SCSI on my RS/6000 model 380 is fast-wide single ended. Although I've got a lot of SCSI cables in my basement, I don't have any 68-pin fast wide ones. I decided to put off the project until I could purchase an appropriate SCSI cable.

Step one - phase II

Just before heading out the next day to buy an appropriate cable, I had a closer look at the fast-wide SCSI connector on the model 380. It turns out that it is a wide (i.e. 16-bit) variant of a somewhat strange connector that IBM used back in the Microchannel days (somewhat like a shrunken Centronics printer connector). It's vaguely like, but not quite the same as, the sort-of Centronics-like SCSI connectors that IBM is using on some of the newer pSeries systems.

I've no idea how much such a cable costs but it wouldn't be cheap. I also wasn't likely to be able to just walk into a store somewhere and buy such a cable "off the shelf".

Sidebar:
Here's a list of the SCSI connector types that are on the external SCSI cables that I own (in approximately oldest to newest order):
  • 50 pins in three staggered rows in a really big D shaped connector (used on Sun 3's about 15 years ago)
  • 50 pin PC-style "Centronics" (looks like a large Centronics printer connector; used on ISA bus SCSI controllers back in the ISA days)
  • 50 pin MCA PC-style Centronics connector (used by IBM on MCA SCSI adapters, which are too small for the conventional PC-style Centronics connector)
  • 25 pin square connector (used on Apple Powerbooks)
  • 25 pin D connector (looks exactly like the computer end of a PC parallel port cable; used by Apple)
  • 50 pin mini-DIN (often called "the SCSI-2 connector"; used on a lot of systems)
  • 68 pin mini-DIN (used for fast-wide SCSI in many situations)
This doesn't include the collection of different internal SCSI cables that I've got lying around the house.

Here's a list of the connector types which I know exist but which I don't have:

  • 68 pin PC-style Centronics (what's on my model 380)
  • 68 pin mini-Centronics (used on newer IBM 43Ps and others (for LVD busses?))
  • 50 pin small PCCard connector (used on PCMCIA/PCCard SCSI controllers)
This definitely reminds me of the line:
The great thing about standards is that there are so many to choose from.

Step two

I now had a choice - I could remove the tape drive from its case and turn it into a temporary internal tape drive or I could "borrow" an 8-bit MCA SCSI controller from another old RS/6000. Not liking the idea of possibly damaging the tape drive's case (it's fairly tightly molded plastic), I opted for the second alternative. Sure enough, ten minutes later I had a SCSI controller on the model 380 that I could plug the tape drive into.

I booted up the machine, brought it down to single user mode and wrote the mksysb tape.

Step three

I shut down the model 380 and moved the tape drive to the "sick" RS/6000 model 370. Turning the key to "service" so that it would boot from the tape drive, I powered on the machine.

It came up in full multi-user mode (this was a clue that I would eventually realize was important).

Now, anyone who's done much with the old MCA RS/6000s knows that they have a three position key switch on their front panel. Booting the machine with the key set to "service" causes the machine to boot from the tape or CD-ROM drive if they contain bootable media and causes the machine to boot into a diagnostics mode otherwise. Booting with the key set to "normal" causes the machine to boot into multi-user mode. Setting the key to "secure" causes any attempt to boot the machine to fail (i.e. the machine is "secure"). The other fact to keep in mind is that mksysb tapes generated on MCA-style RS/6000s are bootable.
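
For readers who think better in code, the key switch rules just described can be written down as a small decision function. This is only a sketch of the behaviour as described in this story, not anything taken from IBM firmware; the names `Key` and `boot_outcome` are made up for the illustration.

```python
from enum import Enum


class Key(Enum):
    """The three positions of the front-panel key switch."""
    NORMAL = "normal"
    SERVICE = "service"
    SECURE = "secure"


def boot_outcome(key: Key, bootable_media_present: bool) -> str:
    """Return what the machine does at power-on, per the rules in the text."""
    if key is Key.SECURE:
        # Secure: any attempt to boot fails.
        return "refuses to boot"
    if key is Key.SERVICE:
        # Service: boot from tape or CD-ROM if bootable media is present,
        # otherwise fall into diagnostics.
        if bootable_media_present:
            return "boots from tape or CD-ROM"
        return "diagnostics"
    # Normal: ordinary multi-user boot, whatever is in the drives.
    return "multi-user"
```

Note that no combination involving "service" produces a multi-user boot. A machine that comes up multi-user with the key apparently at "service" is therefore telling you that the firmware does not see the key where you think it is.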

Well, I misread the situation and concluded that the reason that it had come up in multi-user mode was because the tape drive that I'd chosen was somehow incompatible with the boot firmware in the model 370.

Step four

No problem - I've got an old 8mm tape drive that was originally an internal tape drive in the model 380 that we're talking about (i.e. it doesn't have compatibility issues to worry about). I'd removed it from the model 380 and put it into an external case since an external 8mm tape drive seemed more useful at the time. As I was soon to discover, this was the first time that the 8mm tape drive had been used since being put into the external case . . .

It took a few minutes to remember where I'd put the 8mm drive and get it connected to the model 380 but before long I was ready to proceed.

I booted up the model 380 and quickly discovered that it thought that there were seven 8mm tape drives installed on the machine. Fortunately, I've been playing the SCSI game long enough to know that this meant that the tape drive's SCSI id was the same as the SCSI controller's SCSI id (i.e. they were both set to 7). This was notwithstanding the fact that the SCSI id selector at the back of the case was set to 3.
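
The check that would have caught this can be sketched in a few lines of Python. It is an illustration only: the function name and the device names are hypothetical, and the ids passed in must be the ids the devices are actually using, which (as this step shows) is not necessarily what the selector on the case claims.

```python
from typing import Dict, List


def scsi_id_conflicts(controller_id: int,
                      device_ids: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each SCSI id claimed more than once to the names claiming it."""
    claims: Dict[int, List[str]] = {controller_id: ["controller"]}
    for name, scsi_id in device_ids.items():
        claims.setdefault(scsi_id, []).append(name)
    # Keep only the ids that have more than one claimant.
    return {scsi_id: names for scsi_id, names in claims.items()
            if len(names) > 1}
```

With the controller at id 7, `scsi_id_conflicts(7, {"8mm tape": 7})` reports the clash on id 7, while `scsi_id_conflicts(7, {"8mm tape": 3})` returns an empty dictionary.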

Step five

I shut down the model 380, powered everything down, opened up the external case that the 8mm tape drive was in and flipped around the jumper cables that connect the tape drive to the SCSI id selector on the back of the external case (I'd put the jumper cables on backwards months earlier but hadn't actually used the drive again until this night).

Powering everything back up and booting AIX, AIX now understood that there was actually only one tape drive installed. I brought the system down to maintenance mode and made sure that all of the file systems that I wanted to back up were still mounted (I got burned by that once and I try to not repeat the same mistake too many times since there are lots of new mistakes that haven't been made yet).
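
The "are they all still mounted?" check is easy to automate. The following is a minimal sketch, assuming a Python interpreter is available on the machine; the function name `not_mounted` is made up for this example, and the list of mount points is whatever you expect to find in the backup. It relies on the standard library's `os.path.ismount`, with the test function passed as a parameter so that the logic can be exercised without touching real file systems.

```python
import os
from typing import Callable, Iterable, List


def not_mounted(mount_points: Iterable[str],
                is_mounted: Callable[[str], bool] = os.path.ismount) -> List[str]:
    """Return the entries of mount_points that are not currently mounted."""
    return [mp for mp in mount_points if not is_mounted(mp)]
```

Calling `not_mounted(["/usr", "/var", "/home"])` just before starting the backup and refusing to continue if the result is non-empty is one way to avoid getting burned a second time.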

Everything was in order so I started the mksysb command. I soon had a mksysb tape of the model 380 on a data-grade 8mm tape (keeping in mind that "soon" when applied to anything involving an 8mm tape drive is a term which bears little relationship to the conventional English definition of "soon").

Things were starting to look up . . .

Step six

I then shut down the model 380, powered everything down and removed the tape drive.
Sidebar:
With the exception of devices which are explicitly designed to be hot-pluggable, never connect or disconnect anything from a SCSI bus when any device on the bus is powered up.

I've destroyed a Jaz drive by removing a SCSI drive from the Jaz drive's SCSI bus while the Jaz drive was turned on. I've also heard that it is possible to blow the fuses on a SCSI controller by doing this.

Moving over to the model 370, I shut it down and powered it off. I then removed the model 370's external CD-ROM drive and replaced it with the external 8mm tape drive.

After checking that the key was set to "service", I booted the 370. A few minutes later, the 370 was up in multi-user mode (i.e. it hadn't booted from the 8mm tape drive). Logging in as "root", I asked AIX where it thought the key had been set at boot time. AIX reported that the key had been set to "normal".
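
The story does not say which command was used to ask AIX about the key. One candidate on key-switch machines is `bootinfo -k`, which is understood to print 1 for secure, 2 for service and 3 for normal; treat that mapping as an assumption and check the documentation for your AIX level. Under that assumption, the comparison between where the key physically sits and what the OS reports might be sketched like this (the names `KEY_POSITIONS` and `key_mismatch` are hypothetical):

```python
# Assumed meaning of the number printed by `bootinfo -k` (verify locally).
KEY_POSITIONS = {1: "secure", 2: "service", 3: "normal"}


def key_mismatch(physical_position: str, reported_code: int) -> bool:
    """True when the OS-reported key position differs from the physical one."""
    reported = KEY_POSITIONS.get(reported_code, "unknown")
    return reported != physical_position
```

In this step the key was physically at "service" while the OS reported "normal", so `key_mismatch("service", 3)` is true. A mismatch like that points at the hardware between the key and the firmware rather than at the tape drive.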

Step seven

Suspecting that maybe the tape drive had the wrong SCSI id, I set the tape drive's SCSI id to 4 since that is what the model 370's external IBM CD-ROM drive had been set to and I knew that I could boot the model 370 from the external CD-ROM drive since I'd done that a number of times. (The fact that the bootlist command on AIX 5L insisted on reporting absolutely nothing when I tried to query the model 370's bootlist didn't improve my mood.)

I flipped the key back and forth between "service" and "normal" a few times to convince myself that the key was properly set to "service". I then powered everything back up and booted the machine. A few minutes later, it was back up in multi-user mode.

Step eight

Starting to suspect that there might be something physically wrong with the lock, I opened up the case and took a look. Sure enough, it turned out that the three-position electrical sensor on the back of the lock wasn't firmly attached to the lock. Taking a very close look, I discovered that turning the key didn't change where the sensor thought the key was set. AIX was reporting that the key was set to "normal" since that's where the sensor believed it was set, regardless of where the key was actually set. This must have been a recent development as I had booted this machine from the CD-ROM only a few weeks earlier.

I re-seated the sensor, put the cover back on the machine, turned everything back on and booted the machine.

A few minutes later, the machine booted into diagnostic mode (i.e. the key sensor was now working properly but the boot firmware didn't believe that the tape drive had bootable media).

Step nine

Starting to suspect that maybe AIX 5L doesn't create bootable mksysb tapes, I cabled the external CD-ROM drive onto the SCSI chain after the tape drive and set the tape drive back to SCSI id 3. The plan was now to reload the OS from the mksysb by first booting via the CD-ROM drive (i.e. mimic how a mksysb restore is done on PCI-based RS/6000s).

I inserted the bootable AIX 5L CD into the CD-ROM drive, made sure that the mksysb tape was properly loaded into the tape drive, powered everything back on and booted the machine.

A few minutes later, it was back up in diagnostic mode.

Step ten

Taking a close look at things, I discovered that the 8mm tape drive wasn't plugged into the model 370 (I had disconnected the tape drive to get it out of the way when I'd opened up the case to check on the lock). Since the CD-ROM drive was connected to the 8mm tape drive, the CD-ROM drive wasn't connected to the system unit either.
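
The reason one loose cable took out both drives is simply how a daisy chain works: a device is visible to the host only if every link between it and the host is connected. A toy model, with hypothetical names throughout, makes the point:

```python
from typing import List, Tuple


def reachable_devices(chain: List[Tuple[str, bool]]) -> List[str]:
    """Return the devices the host can see on a daisy-chained bus.

    chain lists the devices outward from the host; each entry is
    (name, cabled_to_previous), where "previous" is the host for the
    first entry and the preceding device for every later one.
    """
    reachable = []
    for name, cabled_to_previous in chain:
        if not cabled_to_previous:
            # Everything past a missing link is cut off from the host.
            break
        reachable.append(name)
    return reachable
```

With the tape drive unplugged from the system unit, `reachable_devices([("8mm tape", False), ("CD-ROM", True)])` returns an empty list, which matches what the boot firmware saw.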

I powered down the 370, plugged in the tape drive, removed the bootable AIX 5L CD and the CD-ROM drive from the chain (making sure that the chain was properly terminated), turned everything back on and rebooted.

SUCCESS! The model 370 booted from the tape drive and I was able to restore from the mksysb.

IMPORTANT: there were a number of tasks that needed to be completed once the OS was reloaded and before HACMP could be started on the model 370. Hopefully, these steps will be described elsewhere on this site someday.

Conclusion

Sigh!

IMPORTANT: If you lack the appropriate skills, experience and/or competency, are unwilling to take responsibility for your actions, or if you don't like these disclaimers then don't use this information.