Friday, October 05, 2007

Hard Drive Failures

I know I haven't posted in a little while. My last post was the day before my South Haven show. The following weekend I was sick, and then the weekend after that was my Northville show. I planned to post an update here, but I had a few things to catch up on around the house, so that came first. Then, to completely distract me, my computer suffered a hard drive failure. And not just a single failure, but 2 hard drives at once.

In the many years I've been using computers, I've been lucky enough to never experience a hard drive failure. I've had several drives that worked reliably through as much as 8 years of use, with never a single problem. Yet, I knew my luck had to be running short, and with my photography business starting to take off, I figured a hard drive failure would be a bad thing. Even if I had all my data backed up, the time to get a new drive, reinstall applications, and restore all the data would be a terrible distraction if it happened in the middle of the show season.

Protecting data: Backups and RAID

To protect myself from such a problem, in addition to backing up all my data periodically, I also bought a second drive and utilized the RAID controller on my motherboard.

For those not familiar with the concept of RAID, it basically means having multiple hard drives appear to the computer as a single hard drive. There are several ways to configure a RAID system. Some give you better performance, some give you data redundancy, and some give you both.

I had my computer set up with a RAID-1 configuration, which means that both hard drives were identical copies (known as mirrors) of each other. Every last file is exactly the same at all times. If one of the drives fails, you can remove it from the system and continue to use your computer with no data loss. Then, you get another drive (ASAP!), put it in the computer, and all the data gets copied from the old hard drive to the new one and you are protected once again.
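For the programmers in the audience, here is a toy sketch of the mirroring idea in Python. It is only a model under simple assumptions: the "drives" are in-memory dictionaries, and the class and method names (MirroredPair, fail, replace) are made up for illustration. A real RAID controller works on disk blocks in hardware or in a driver.

```python
class MirroredPair:
    """Toy model of a RAID-1 mirror: two 'drives' held as in-memory dicts."""

    def __init__(self):
        # Each drive maps a block number to its bytes; None marks a removed drive.
        self.drives = [{}, {}]

    def write(self, block, data):
        # Every write goes to every working drive, so the copies stay identical.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block):
        # Any working drive can serve the read.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise IOError("no working drive left")

    def fail(self, index):
        # Simulate pulling a dead drive out of the machine.
        self.drives[index] = None

    def replace(self, index):
        # Rebuild: copy everything from the surviving drive onto the new one.
        survivor = next(d for d in self.drives if d is not None)
        self.drives[index] = dict(survivor)


pair = MirroredPair()
pair.write(0, b"photos")
pair.fail(0)                   # one drive dies
still_there = pair.read(0)     # the mirror still serves the data
pair.replace(0)                # new drive goes in and is rebuilt from the survivor
```

Note that nothing in this model helps if both copies are lost or damaged at once, which is why the mirror is not a substitute for backups.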

One risk you run is that you could lose both hard drives at once, so you definitely need some sort of backup plan in addition to RAID to handle that. All of my data was burned to DVDs periodically. So I was prepared for the worst, but I certainly wasn't expecting it. I mean, what are the chances of both drives failing at once?
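For a rough feel of the odds, here is a back-of-the-envelope sketch in Python. The numbers are hypothetical, not measurements: it assumes a 3% chance of a drive failing in any given year, a one-week wait for a replacement, and (most importantly) that the two drives fail independently of each other at a constant rate. It estimates the chance that the surviving drive also dies while you are waiting for the new one.

```python
def survivor_failure_chance(annual_failure_rate, window_days):
    """Chance a drive fails within window_days, assuming a constant failure rate."""
    # Surviving a full year has probability (1 - annual_failure_rate), so
    # surviving a fraction of a year is that value raised to the fraction.
    survive = (1.0 - annual_failure_rate) ** (window_days / 365.0)
    return 1.0 - survive


# Hypothetical numbers: 3% annual failure rate, one week to get a replacement.
chance = survivor_failure_chance(0.03, 7)
print(f"{chance:.4%}")
```

Under those assumptions the chance is very small. The weak point is the independence assumption: two drives of the same age in the same machine share heat, power, and a controller, so their failures may not be independent at all.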

Really....what are the chances?

Well, one day when I powered up my machine, something in the computer was making a strange sound. I thought it was a fan, but determined it was actually a hard drive. So I powered down, made sure the connections and fasteners were all tight, and then turned my computer back on. That hard drive started making a clicking sound and kept restarting the computer before it could even get a chance to start booting Windows. So I took the drive out, and just as I expected, the RAID system worked. My computer booted up and everything was intact.

I started looking into my options for replacing the drive, trying to see if I should buy the same model, a different brand of the same size, or maybe take the opportunity to upgrade to a larger size. The next morning, when I turned on my computer, a few programs complained that their config files were corrupted. Windows screwing up a file isn't exactly unheard of (and maybe it had something to do with the failed drive causing errors), so I ran scandisk to fix the problem and thought little of it. Later that day, I ended up with more problems. I knew that was more than coincidence, so I ran a surface scan on the disk and it found a bad sector and fixed it (marked it as unusable). That concerned me, but I figured it was fixed, so that was that. However, a few hours later, more errors began turning up. I ran scandisk again and it found more bad sectors, and again it fixed them. A few hours later, the same thing happened again. The hard drive was obviously toast.

Why did this happen?

So, how exactly does something like this happen? My best guess is that my RAID controller (which is built into the motherboard) isn't a very good one. I suspect that the second drive to fail had actually failed first, several months before, but that the RAID controller was detecting the problem, "fixing" it by getting the data from the other drive, and then (most important of all) not bothering to tell me that there was any sort of problem. Once the good drive failed, it was no longer able to cover up the problem and things broke down quickly from there.
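To show the difference between quietly repairing a problem and reporting it, here is a small Python sketch of a mirror check that speaks up. It is purely illustrative: the drives are stand-in dictionaries, the scrub function is a made-up name, and this is not how my motherboard's controller (or any particular controller) actually works.

```python
def scrub(drive_a, drive_b):
    """Compare two mirrored 'drives' (dicts of block -> bytes) and list blocks that differ."""
    mismatches = []
    for block in sorted(set(drive_a) | set(drive_b)):
        if drive_a.get(block) != drive_b.get(block):
            mismatches.append(block)
    return mismatches


# Stand-in data: block 1 has gone bad on the second drive.
good = {0: b"photos", 1: b"invoices"}
bad = {0: b"photos", 1: b"inv\x00ices"}

problems = scrub(good, bad)
if problems:
    # The important part: tell the user instead of silently papering over it.
    print("WARNING: mirror copies differ at blocks", problems)
```

A check like this cannot tell which copy is the correct one, but a warning months earlier would have been enough to replace the bad drive while the other one was still healthy.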

The end result

Anyway, I've got 2 new hard drives now and I've got most things restored. It looks like I didn't lose any of my photography-related stuff. Anything of that sort that did get damaged had already been backed up. I only ended up losing a tiny bit of data from a single unimportant file that is updated once a month and hadn't been backed up recently.

Even for that one file....though it isn't important, it would be nice to have, so I have an ace up my sleeve. Though the data was ruined on the drive with the bad sectors, I suspect the copy on the other drive (the one that won't start up) is probably intact. But how do I get it if the drive won't start up? Well, there is a well-known and often successful trick for getting data off a drive that fails in such a manner. Put the drive in a ziplock bag, put it in the freezer for 24 hours, and then when it's nice and cold, quickly plug it into the computer, boot it up and copy the data. It doesn't always work, but often does. It's not known for sure why this works, but the common belief is that when the metal gets cold and contracts ever so slightly, it's enough to get something to line up where it wouldn't before. Whatever the reason, if this trick ends up working, you usually have anywhere from 5 to 30 minutes to get the data off the drive before it stops working again. Even then, people have reported success with refreezing it and repeating the process several times, getting a little more data each time.
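With a window that short, it pays to copy the most important files first and to be able to pick up where you left off after each refreeze. Here is a minimal Python sketch of that idea. The function name rescue and its arguments are my own invention for illustration, and it assumes the revived drive shows up as an ordinary folder you can read from.

```python
import os
import shutil


def rescue(src_root, dst_root, wanted):
    """Copy files in priority order, skipping ones already rescued and ones that fail."""
    failed = []
    for rel_path in wanted:  # list the most important files first
        src = os.path.join(src_root, rel_path)
        dst = os.path.join(dst_root, rel_path)
        if os.path.exists(dst):
            continue  # already rescued on an earlier attempt
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        tmp = dst + ".part"
        try:
            shutil.copyfile(src, tmp)
            os.replace(tmp, dst)  # only a complete copy gets the final name
        except OSError:
            failed.append(rel_path)  # leave it for the next freeze cycle
            if os.path.exists(tmp):
                os.remove(tmp)
    return failed
```

You would call it with the dying drive's folder, a folder on a healthy drive, and a list of relative file paths in order of importance. Whatever comes back in the returned list is what to try again after the next trip to the freezer.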

Anyway, with things starting to get back to normal, I hope to make a few posts soon about how the end of my show season turned out.
