Sometimes events make me feel like there is just a smidge of Nostradamus in me.
Two years ago in the fall, I came home to my dorm room and was greeted by a slew of identical error messages emanating from my Linux box named Araval: { DriveReady SeekComplete Error }. I mostly ignored them, and didn’t understand their extreme importance to me until much too late. For one split second I was greeted by the Windows Recovery Angel and he recommended that I restart my PC to correct the error.
Don’t get me wrong, if Araval were running Windows this would be good, solid advice, but this was Linux. Linux rarely, if ever, needs to be rebooted. I power cycled it and poof, it refused to recognize my 4-month-old 80GB IBM Deskstar hard drive. This was the drive that contained my music, movies, and my /home partition. For those of you who are not *NIX folks, the /home partition is roughly the equivalent of a superfolder that holds the My Documents folder and personal settings for all of the users. Good thing my Windows partition, /boot partition, and Linux system partitions were still operational. I was able to quickly bring my system back up to the point where I could at least start browsing the internet to understand what had just happened to me.
Those { DriveReady SeekComplete Error } messages that I was ignoring were actually indicators that my hard drive had been acting a bit suicidal and was now standing on a window ledge of the Empire State Building. The moment I turned off the computer was the last moment that hard drive would ever spend alive.
I stumbled across a forum where there was a discussion running about the IBM Deskstar line of hard drives. Many of the posters had nicknamed them DeathStar drives due to their tendency towards pushing up daisies. These hard drives have a fundamental flaw that causes them to completely exhaust their internal sector-redirect tables. When those tables fill up, my favorite error message begins to broadcast, and the drive will only keep working for as long as it stays powered on.
I should rewind for a moment and explain this table idea. The hard drive manufacturers know that random things will befall a drive over the course of its life, and that a few dozen sectors on that drive will become faulty over time. This would ordinarily mean that the drive would start attempting to store data in bad locations and that data would be lost forever. Modern drives can detect when something wonky happens to a sector and make an entry in the redirect table pointing the old sector to some newly chosen spare sector. With the table in place, the bad sector is never touched again, since any access to it is simply performed somewhere else.
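For the programmers in the audience, the redirect-table idea can be sketched in a few lines of Python. This is only a toy model of the description above, not how any real drive firmware works: the class name `RemappingDrive`, its methods, and the fixed pool of spare sectors are all made up for illustration.

```python
class RemappingDrive:
    """Toy model of a drive with a sector-redirect table (hypothetical)."""

    def __init__(self, num_sectors, num_spares):
        self.storage = {}  # physical sector number -> data
        # Spare sectors live just past the normal ones (an assumption).
        self.spares = list(range(num_sectors, num_sectors + num_spares))
        self.redirect = {}  # bad sector -> spare sector that replaces it

    def mark_bad(self, sector):
        """Record that a sector went bad and point it at a spare."""
        if sector in self.redirect:
            return
        if not self.spares:
            # The situation described above: no room left in the table.
            raise RuntimeError("redirect table exhausted")
        self.redirect[sector] = self.spares.pop(0)

    def _resolve(self, sector):
        # Any access to a bad sector is quietly performed somewhere else.
        return self.redirect.get(sector, sector)

    def write(self, sector, data):
        self.storage[self._resolve(sector)] = data

    def read(self, sector):
        return self.storage.get(self._resolve(sector))


drive = RemappingDrive(num_sectors=100, num_spares=2)
drive.mark_bad(7)
drive.write(7, b"my music")
print(drive.read(7))      # the data comes back as usual
print(drive.redirect[7])  # but it actually lives in a spare sector
```

Once the spare pool runs dry, `mark_bad` has nowhere left to point a failing sector, which is roughly the corner my Deskstar had painted itself into.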
I learned a lesson from this ordeal: always have recent backups. I lost a ton of data when that drive died since I had been neglectful of backing up my personal files. It felt like a part of me had died along with Mr. Deskstar. I immediately purchased two 60GB Maxtor drives and a RAID level 1 card to give me a bit more peace of mind for the future (a RAID-1 array writes the same data to two or more drives whenever data is written to the single logical drive the system sees, essentially guarding against a hard drive failure with a constant backup).
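The mirroring idea is simple enough to sketch in Python as well. Again, this is a toy under stated assumptions and not what a RAID card actually does internally: the `Mirror` class and its `fail` and `rebuild` methods are hypothetical names, and each "disk" is just a dictionary.

```python
class Mirror:
    """Toy RAID-1: every write goes to every healthy disk (hypothetical)."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]  # block -> data
        self.failed = set()  # indexes of disks that have died

    def _healthy(self):
        return [i for i in range(len(self.disks)) if i not in self.failed]

    def write(self, block, data):
        # One logical write becomes one write per healthy disk.
        for i in self._healthy():
            self.disks[i][block] = data

    def read(self, block):
        # Any surviving disk can answer the read.
        healthy = self._healthy()
        if not healthy:
            raise IOError("all mirrors have failed")
        return self.disks[healthy[0]][block]

    def fail(self, index):
        # Simulate a dead drive: its contents are gone.
        self.failed.add(index)
        self.disks[index].clear()

    def rebuild(self, index):
        # Copy everything from a surviving disk onto the replacement.
        survivors = [i for i in self._healthy() if i != index]
        if not survivors:
            raise IOError("no healthy disk to rebuild from")
        self.disks[index] = dict(self.disks[survivors[0]])
        self.failed.discard(index)


array = Mirror(copies=2)
array.write(0, b"music")
array.fail(0)            # one drive dies
print(array.read(0))     # the data is still readable from the other
array.rebuild(0)         # a replacement drive is filled from the survivor
```

The point of the sketch is that losing one disk costs nothing as long as at least one copy survives long enough to rebuild from.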
Over the course of the past three days, I’ve been thinking about building a new computer to effectively upgrade Araval to something more modern. Just before dinner today, I bought most of the core components such as the CPU, the motherboard, and the graphics card. At approximately 10pm this evening, my case started emitting a strange clicking sound, a lot like a fast-switching relay. Two major file read/write operations were running on the RAID disks at the time when my IM client froze and my kernel gave me a { DriveReady SeekComplete Error } again, this time followed by an { Unrecoverable } secondary error code and indications that the error was occurring in RAID drive 1.
Naturally, my first gut reaction (more like the reaction you get when an alarm clock sound is played and you cringe in physical and psychological pain) was to NOT REBOOT THE COMPUTER, at least until the source of the offending sound was located. Unfortunately, I did not locate it before the system froze completely, forcing a shutdown.
After my shock, I started to think about what little I knew of the problem: (a) the error was in only one of the RAID drives, and (b) the sound was very mechanical and came from the middle front of the case. I convinced myself that if the error was in the hard drives, it was most likely a fatal health issue, and that it affected only one of the two mirrored RAID drives. It took about 4 hours of screwing, unscrewing, cleaning, and transplanting to determine that I had some slightly-more-than-mild data corruption on my Windows drive due to the sudden shutdown and that RAID drive 1 had suffered a mechanical failure.
Since the RAID card’s BIOS provides a feature to rebuild the first disk once I can find a replacement, I can have the array up and running as good as new in about two shakes of a lamb’s tail. Ten minutes spent on Maxtor’s website yielded me a free replacement for the damaged drive that should arrive next week.
Most times when planning for the worst, be it buying insurance policies, buying a fireproof safe, filling your spare tire before vacation, or backing up your data, you never actually expect the worst to happen. Ironically, in my case, 2 years after I invested in a RAID-1 array for “insurance” purposes, it has performed beautifully and saved my data, just as I had planned.
On a more humorous note, I think that Araval was jealous that I was building a new computer.