My desktop computer was pieced together with parts purchased almost four years ago. It has six SATA ports, and I discovered (after the warranty period) that two of those ports (one of the three SATA controllers) were unreliable. I've got four SATA drives in this box, plus the IDE boot drive. Those SATA drives hold my digital photos, so they're frequently filling up and getting replaced or augmented.
Fast forward to last month. I took one of my old USB backup drives out of that role (where it had worked flawlessly) and tried to add it to the desktop box using one of those flaky ports (blame it on my flaky memory). This is a Western Digital Caviar Green 2TB drive. The filesystem on that drive got corrupted a few days later, and I remembered the SATA port issue. I pulled the smallest drive out of the box and moved this new drive (the largest in the box) onto the reliable port it had been using. I assumed all would be well. It was not.
The filesystem got corrupted again after a couple of days in use. I tried rearranging ports, swapping cables, etc. The simple SMART tests showed everything was fine. Without any indication that this drive was at fault, I didn't think I had any chance at a warranty replacement (the drive was only halfway through its three-year warranty). Besides, this drive had worked fine until I first plugged it into the flaky SATA port.
Fortunately, I had the good sense to move my internal mail server off this box to another machine before things got really bad.
The Asus M2N-SLI Deluxe motherboard BIOS (like the board itself) was four years old, so I figured I'd upgrade to the most recent BIOS and see if that affected anything. It did... unfortunately.
Most manufacturer BIOS update utilities are meant to run under Windows or from a bootable DOS floppy. To do the update from Linux, you can use the "flashrom" utility instead. It's available as a package on many distributions, and it needs to be run as root. First, save a copy of the current BIOS onto some removable media like a USB flash drive:
$ flashrom -r backup_bios.bin
Then you write the new BIOS onto the device:
$ flashrom -wv new_bios.bin
DON'T INTERRUPT THIS PROCESS, because if it fails, you just bricked your mobo. Mine succeeded, so I rebooted to see if it helped. Upon reboot, I got through the power-on self test (POST), but the machine refused to boot off either the internal drive or a CD.
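Since the backup is the only way back if a new BIOS misbehaves, it can be worth making sure the backup itself is trustworthy before writing anything. Here's a minimal sketch that reads the chip twice and only reports success if both reads are identical. It assumes a current flashrom release, which expects the programmer to be named explicitly with "-p internal" (older releases, like the one I used above, defaulted to it), and it must be run as root. The function names and the second file name are hypothetical.

```sh
#!/bin/sh
# Sketch: take two BIOS backups and refuse to trust them unless they match.
# Assumes a current flashrom that wants the programmer named explicitly
# ("-p internal" = the motherboard's own flash chip) and is run as root.
# backup_bios_check.bin and both function names are hypothetical.

# Succeed only if both files are non-empty and byte-for-byte identical.
images_match() {
    [ -s "$1" ] && [ -s "$2" ] && cmp -s "$1" "$2"
}

# Read the flash chip twice into directory $1 and compare the two reads.
backup_bios() {
    dir=$1
    flashrom -p internal -r "$dir/backup_bios.bin" || return 1
    flashrom -p internal -r "$dir/backup_bios_check.bin" || return 1
    if images_match "$dir/backup_bios.bin" "$dir/backup_bios_check.bin"; then
        echo "backup OK: $dir/backup_bios.bin"
    else
        echo "the two reads differ; do not flash" >&2
        return 1
    fi
}
```

With the USB stick mounted at, say, /media/usbstick (a placeholder path), you'd run `backup_bios /media/usbstick && flashrom -p internal -w new_bios.bin`. Current flashrom releases verify the chip contents automatically after a write unless you pass -n, so the separate verify flag is no longer needed.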
I assumed that I needed to revert to the old BIOS. Since the machine wouldn't boot, I needed to do this one from removable media. I don't have a floppy drive, so I made my USB thumb drive bootable using the makebootfat utility (again, available as an OS package) on my Linux laptop, following the description on this web site.
I found out later that this mobo's BIOS features the "EZ Flash 2" option, which allows you to update the BIOS off a removable device directly from the BIOS setup menu. Pretty slick.
Alas, once I got the flash drive ready, the machine refused to even POST. The chassis fans would spin up, but no beep... nothing. That's bad.
I assumed that the BIOS update didn't work as well as I'd hoped. I tried unplugging all the hard drives, removing all the PCI cards except the vid card, and removing all but the minimum amount of RAM. Still no POST. Finally, I unplugged the power supply and removed the CMOS battery for a minute. Upon reinserting it and plugging everything back in, everything booted up just fine again. Looks like the CMOS reset is what it needed. Probably should have done that right after the first BIOS update.
I rebooted and reinitialized the previously flaky hard drive. This time I ran a 9-hour extended SMART test overnight, and unlike the simple SMART test, this one showed three parameters in the "pre-fail" state. You can run that test via the "Disk Utility" GUI, but I did it through the GSmartControl tool, which is also available as a package on many distributions and allowed me to save the results to a file. Armed with that pre-fail information, I phoned Western Digital, and quickly had a replacement hard drive on its way under warranty. I just had to pay $6 return shipping for the dead unit, which I can live with. WD's support rep was very friendly and easy to work with, and never batted an eye at my failure claim.
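If you'd rather stay on the command line, smartmontools' smartctl can start the same extended test (`smartctl -t long /dev/sdX`), show the result once it finishes (`smartctl -l selftest /dev/sdX`), and dump the attribute table (`smartctl -A /dev/sdX`), where /dev/sdX is a placeholder for your drive. One thing worth knowing when reading that table: "Pre-fail" in its TYPE column is the attribute's category, and an attribute is actually failing when its normalized VALUE drops to or below THRESH. Below is a minimal filter for that condition, assuming the standard smartctl -A column layout; the function name is hypothetical, and this is not necessarily how GSmartControl decides what to flag.

```sh
#!/bin/sh
# Sketch: list SMART attributes whose normalized VALUE is at or below the
# vendor THRESH, reading "smartctl -A" output on stdin.
# Assumes smartmontools' standard attribute table layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_attrs() {
    awk '
        # Only table rows start with a numeric attribute ID.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0
            thresh = $6 + 0
            # A threshold of 0 means the attribute can never "fail".
            if (thresh > 0 && value <= thresh)
                print $2, "value=" value, "thresh=" thresh, $7
        }'
}
```

Usage (as root): `smartctl -A /dev/sdX | failing_attrs`. No output means no attribute has reached its threshold.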
That's where I am right now. I'm hoping the new drive arrives before the weekend so that I can spend it adding the drive to the array and restoring everything from backup. FYI, it takes about nine hours to extract 2TB off a backup drive. I can't wait to be past this ordeal, as everything else has been at a standstill while I couldn't trust my home filesystem. Thank goodness for backups.
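For planning a restore like this, the nine-hour figure for 2TB implies an average of roughly 62 MB/s (2,000,000,000,000 bytes / 32,400 s ≈ 61.7 MB/s). A tiny helper makes that arithmetic reusable; it's a hypothetical function that assumes decimal units (as drive makers use) and rounds down, because shell arithmetic is integer-only.

```sh
#!/bin/sh
# Sketch: average throughput implied by a copy or restore, in decimal MB/s.
# Integer arithmetic, so the result is rounded down.
# Usage: throughput_mb_s BYTES HOURS
throughput_mb_s() {
    bytes=$1
    hours=$2
    echo $(( bytes / (hours * 3600) / 1000000 ))
}

# 2TB (decimal) restored in about nine hours:
throughput_mb_s 2000000000000 9   # prints 61
```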
While I'm on the subject, many people have their favorite brand of hard drive and won't go with anything else. I actually prefer to have a variety of different devices, just so I don't get stuck with a bad lot of drives. I currently have drives from WD, Seagate, Samsung, and Hitachi in my desktop box alone -- not including the various laptops and rack-mount servers around the house. Until this WD drive failed, I hadn't seen an unusual failure rate from any of them, and I've still got other WD drives in service that are doing just fine. My oldest drive is my Seagate 120GB IDE boot drive, which has racked up 6.6 years of powered-on time and is still going strong.
Incidentally, "green" drives are great for low-power, low-heat environments like USB drives, but they suck as the primary drive in an active system. They're SOOOO SLOWWW! I wouldn't buy one specifically for that role again, but since I already had it on hand, it wasn't worth buying a new one.
If you've got any questions or comments on the process above, please speak up in the comments below. I'm always happy to help.