Tuesday, November 27, 2012

New computer backup hardware

My day job is computer programming and system administration, usually at small to medium-sized businesses.  As such, backing up data and planning for disaster recovery is part of my job description.  As you'd expect, that aspect of my job flows over to my home network as well.  I've used the same overly-redundant backup scenario for my computer data for several years now.  I finally outgrew my current hardware recently and had to tweak my setup a little bit to help it scale better at a reasonable cost.  Read on for details.

I outlined my previous setup in an earlier post on the Prairie Rim Images blog.  In short, all of the machines in my house get backed up to my desktop workstation / server.  It's the big dog on the block, with more storage space than all the others put together.  The main drives in that machine get backed up every night to another set of drives in that same computer which are usually off-line during normal operation.  If a primary drive fails, I could simply mount the backup drive & continue working.

To protect against theft or destruction of my entire house, I also had two encrypted USB hard drives that I kept at my office across town.  I'd alternate which one I brought home each night to sync up with my workstation.  Worst case, if my house disappeared in a puff of blue smoke, I'd still have one copy of my data at the office that was no more than a couple days old.

Fast forward to 2012, when my disk usage is growing by about 750GB per year, thanks largely to the RAW+JPEG images from the 18MP camera I use in my photography side job.  The state of the art in single-spindle hard drives was barely keeping pace with my off-site storage needs, and my current 3TB drives were 98% full.  I could have replaced them with 4TB drives, but that would have lasted me just over a year... and what would I do with the 3TB drives?  The near-line backup drives inside my workstation were also full, and the machine had no more empty SATA ports or drive bays to which I could add more drives (like the 3TB ones I had just outgrown in the USB enclosures).  In fact, I was still using a small, slow, 7.5-year-old IDE hard drive as my boot disk because I didn't have any free SATA ports to use for a new one.  It was time for the incremental upgrades to stop.

I toyed with some more expensive solutions that would have provided plenty of convenience with no additional storage purchases for the next 4 or so years.  However, given some other expenses in our lives (like the new house) I couldn't bring myself to shell out that kind of cash just for a little convenience.

The solution upon which I settled is actually a little simpler than my old solution, and only cost me about $400 (down from $1300+ for the more convenient option).

New Vantec NexStar MX enclosure
First, I had to address the off-site backups.  I decided to get a pair of dual-bay USB3.0 / eSATA drive enclosures, namely the Vantec NexStar MX NST-400MX-S3R (review here).  These will reportedly handle a 4TB drive (the largest currently available) in each bay, and can present the two drives as individuals, concatenate them (JBOD), stripe them (RAID 0), or mirror them (RAID 1).  They cost about $80 each, and get stellar reviews.

Into each Vantec, I placed one of the 3TB drives from my old USB enclosures, plus one of the 1.5TB drives that used to be combined to form my in-workstation backup.  Striping the pair would have given great performance, but would have capped me at 3TB (double the smallest drive), which is no better than I had with the old setup.  I instead concatenated them to create a single 4.5TB filesystem.  When that fills up in a couple years, I'll replace the little 1.5TB drives (which will be getting rather old anyway) with something larger.  Dual 3TB drives would give me 6TB per enclosure, which should last me another four years at my current rate of increase.
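The capacity trade-off between the enclosure modes is easy to check with a few lines of Python.  This is only a rough sketch: the function names are made up for illustration, the RAID 0 rule (every member limited to the smallest drive) is the enclosure behavior described above, and the 3TB of existing data and 750GB per year of growth are the approximate figures quoted elsewhere in this post.

```python
def usable_tb(drives_tb, mode):
    """Usable capacity in TB for a set of drives in a given enclosure mode."""
    if mode == "jbod":
        # Concatenation: the capacities simply add up.
        return sum(drives_tb)
    if mode == "raid0":
        # Stripe: each member is limited to the size of the smallest drive.
        return min(drives_tb) * len(drives_tb)
    if mode == "raid1":
        # Mirror: only one (smallest) drive's worth of space is usable.
        return min(drives_tb)
    raise ValueError(f"unknown mode: {mode}")


def years_until_full(capacity_tb, used_tb, growth_tb_per_year):
    """Years of headroom left at a constant growth rate."""
    return (capacity_tb - used_tb) / growth_tb_per_year


drives = [3.0, 1.5]  # the drive pair in each enclosure
for mode in ("jbod", "raid0", "raid1"):
    print(mode, usable_tb(drives, mode), "TB")

# Headroom for today's 3TB + 1.5TB JBOD, and for a future pair of 3TB drives.
print(years_until_full(usable_tb([3.0, 1.5], "jbod"), 3.0, 0.75), "years")
print(years_until_full(usable_tb([3.0, 3.0], "jbod"), 3.0, 0.75), "years")
```

Under those assumptions the concatenated pair comes out at 4.5TB with about two years of headroom, and a pair of 3TB drives at 6TB with about four.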

[Update: In October 2014, Vantec informed me that this enclosure will handle dual 6TB drives.  They merely update their published specs to match the largest drives available at the time.  I'm now running one 4TB and one 6TB drive, giving me 10TB in a single enclosure.]

Curiously, I have found that I get better throughput from the enclosure if I have the drives configured as JBOD than if they're striped together.  That sounds like a shortcoming in the enclosure's firmware.  When striped across two drives, I'm only getting about 80 MB/s, but when a JBOD configuration is reading off just one drive at a time, I'm getting over 100 MB/s.

The only down side of the Vantecs is that they're about an inch thicker than my old, single-bay USB enclosures, so they won't fit inside the protective case in which I transport them from home to work.  The solution to this problem proved more elusive than expected, and will be detailed in another post.  Basically, I added internal padding to a $10 lunchbox.

Removing the old backup drives from my workstation freed up two SATA ports.  Previously, a 2-disk stripe (1.5TB + 2TB disks) housed my /home filesystem (filled mostly with photos).  I augmented that with a third 2TB disk, giving me a 5.5TB /home partition.  Again, that should last me at least three years before I need to add more spindles.  Because I'm short on SATA ports and drive bays, I chose to configure the three drives as a stripe (RAID 0) instead of RAID 5.  I've already got a couple backups of this data, and nothing I do at home is so critical that I can't tolerate a day of down time if I lose a disk & have to rebuild the filesystem.  Besides, a 3-spindle stripe is really bloody fast--on par with a typical SSD, but considerably larger and cheaper.

A word about hard drive choice is in order.  I try to keep a variety of drive models & manufacturers in my system to protect me in case a design flaw causes one type of drive to fail catastrophically.  I'm currently running drives from Western Digital, Hitachi, Seagate, and Samsung.  I've used Maxtor and IBM in the past.  As of today, IBM's drive business has become Hitachi, which is now owned by WD.  Seagate owns Samsung, so there are really only three major manufacturers of spinning media out there today.

During the last year, I've had to replace one WD and one Samsung (post-buyout) drive.  My experience with WD support was wonderful.  Seagate's support was a nightmare.  You can read all about that in an earlier post.  I'm not inclined to give Seagate any of my money any time soon.

In addition, if you look closely at the hard drive specs, Seagate appears to be catering toward the cheap, careless masses, while WD is designing for a slightly more discerning clientèle.  Seagate's warranties are typically only 1 or 3 years, and the drives are priced dirt cheap.  For the new 2TB drive I put into my workstation, I chose a WD Caviar Black drive, which has a 5-year warranty and costs almost twice as much as the cheapest Seagate 2TB drive.  As painful as it was to deal with Seagate customer support, the last thing I want is to make it an annual occurrence.

All that said, one of my USB enclosures now contains a WD and a Hitachi.  The other houses a WD and a Seagate.  The /home stripe in my workstation is comprised of two WD's (different models) and one Seagate.  I think that's sufficiently diverse.

New Kingston SSD and WD Caviar drives
If you're keeping score at home, you know that I still had one free SATA port.  I used that to replace my 7.5-year-old (that's about 180 in drive years) Seagate 120GB IDE (PATA) boot disk with a new Kingston HyperX 3K 120GB SSD.  At $80, it cost about 67 cents per GB, compared to 8 cents per GB for the 2TB WD Caviar.  SSD's are still pretty expensive, but the good ones are amazingly fast.  Do pay attention to the performance specs, because not all SSD's are created equal.  A good SSD can push 500 MB/s.  A good spinning disk is down below 150 MB/s.  Cheap versions of either can cut those numbers in half.  After installing the new OS on the SSD, my machine now boots in just 14 seconds.
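For anyone who wants to run the same price comparison on their own drives, here is the arithmetic as a small Python sketch.  The $80 and 120GB figures come from this post; the $160 price for the 2TB drive is an assumption, backed out from the 8 cents per GB figure rather than taken from a receipt, and the function name is just for illustration.

```python
def cents_per_gb(price_dollars, capacity_gb):
    """Price per gigabyte, in cents."""
    return 100.0 * price_dollars / capacity_gb


ssd = cents_per_gb(80, 120)      # Kingston HyperX 3K 120GB at $80
hdd = cents_per_gb(160, 2000)    # 2TB WD Caviar; $160 is an assumed price

print(f"SSD: {ssd:.0f} cents/GB")
print(f"HDD: {hdd:.0f} cents/GB")
print(f"The SSD costs about {ssd / hdd:.1f}x as much per GB")
```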

The one place in this setup where I do want near-real-time redundancy is for the boot disk on my workstation.  I do still run a few public services off this workstation, and even if my /home partition pukes, I still want to be able to log in to correct the problem or mount my USB backup drive instead.  By moving my boot disk to a SATA port, I can bring in another newer (but still old) 400GB IDE drive to act as an in-machine warm spare for the boot disk.  It'll get synced up nightly whether I tell it to or not, and I can boot from it simply by unplugging the other boot disk.
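I haven't said which tool does that nightly sync, so treat the following as one possible sketch rather than my actual script.  It assumes rsync is the sync tool, that the warm spare is mounted at a hypothetical /mnt/bootspare/, and that a scheduler such as cron runs it each night.  The Python wrapper only builds the command line; it defaults to a dry run so that nothing gets deleted by accident.

```python
import shlex


def build_sync_command(source="/", target="/mnt/bootspare/", dry_run=True):
    """Build an rsync command that mirrors the boot filesystem to a spare disk.

    The source and target paths are hypothetical placeholders.
    """
    cmd = [
        "rsync",
        "-aHAX",              # archive mode, plus hard links, ACLs, and xattrs
        "--one-file-system",  # don't wander into /home, /proc, or other mounts
        "--delete",           # remove files on the spare that no longer exist
    ]
    if dry_run:
        cmd.append("--dry-run")  # show what would change without touching anything
    cmd += [source, target]
    return cmd


# Print the command for review; hand the list to subprocess.run() to execute it.
print(shlex.join(build_sync_command()))
```

Copying files is only part of the job.  For the spare to be bootable on its own, it also needs a boot loader installed and an fstab that points at the right device, neither of which this sketch covers.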

"Holy cow," you're saying.  "Why did you spend $400 and go to all that trouble ferrying drives around instead of just putting your backup on the cloud?"

Fair question.  I actually used that system long ago, before I did much photography.  Let's do a little math.  I currently have about 3TB of data that needs to be backed up, and I add about 750GB of new data every year.  On a heavy weekend of photography, I can shoot over 50GB of images.  At my current home network upload speed (upload is usually far slower than download), it would take me 7.2 days to upload 50GB of data, and that's if nobody in my house used our Internet connection for anything else during that time.  Uploading the initial 3TB would take 434 days.  Yes, you can often ship a hard drive to the cloud managers in order to prime the pump, but then you incur the expense of a hard drive.  I'd also have to upgrade to a considerably faster business-class Internet connection to make this setup work.  My annual cost would then be far more than what I've currently invested in USB drives.

Oh, and don't forget that you'll then have to download all 3TB of this data again if you ever need to restore a drive.  That'll take 11 days over a 25 Mbps downlink.  How badly do you need to get access to that data?  Do you think your file ownership, time stamps, and permissions will be retained during that process?  I'm betting not.
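If you'd like to run the same math for your own connection, it boils down to a short Python sketch.  The upload speed is an assumption: I didn't quote mine above, but roughly 640 kbps is the rate that reproduces the 7.2-day and 434-day figures.  Sizes are treated as decimal gigabytes, and the calculation ignores protocol overhead and anyone else using the link.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_gb, link_mbps):
    """Days needed to move size_gb (decimal GB) over a link of link_mbps megabits/s."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / SECONDS_PER_DAY


UPLOAD_MBPS = 0.64    # assumed uplink speed, inferred from the figures in this post
DOWNLOAD_MBPS = 25    # downlink speed quoted in this post

print(f"50GB weekend upload:  {transfer_days(50, UPLOAD_MBPS):.1f} days")
print(f"3TB initial upload:   {transfer_days(3000, UPLOAD_MBPS):.0f} days")
print(f"3TB restore download: {transfer_days(3000, DOWNLOAD_MBPS):.1f} days")
```

Plug in your own speeds and data sizes to see whether cloud backup is practical for you.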

I know not everybody has as much critical data as I do.  Unless you're into photography or video editing, most heavy users are just archiving movies or things that they can easily re-rip from the original optical media if they lose a disk.  The vast majority of home computer users probably measure their critical data in GB rather than TB.  For those people, cloud storage works quite well.  However, for those heavy users like me, I hope my setup provides some insight into how to back up your own data.

If you've got any more specific questions, or if you'd like to share your own backup scenario, please speak up in the comments below.

1 comment:

  1. Agree, back when my WD HD died eons ago, it was just in the warranty period, and return (but not recovery) to WD was totally painless. And they promptly shipped out a replacement, without a fuss.

