Friday, May 30, 2014

Fixing a Linux software RAID that comes up degraded at boot

I have a lot of storage space on my Linux desktop box at home.  The bulk of it is spread across two mdadm software RAID 5 arrays: one with 3 disks and one with 5.  Occasionally, the 5-disk array comes up in degraded mode after the machine boots.  After much frustration, I think I've finally found the fix for this.

The problem seems to date back to 2011, at least in Ubuntu.  I'm currently running Ubuntu 13.10 (saucy).  The system starts booting normally, but then hangs partway through.  If you've got the right display settings, you'll see the message:

"WARNING: There appears to be one or more degraded RAID devices"

and you'll be given the option to boot anyway in degraded mode or drop into a recovery shell so you can fix the problem.  Since my problem array housed neither my root disk nor my home partition, I booted anyway in degraded mode so that I could fix the problem in a sane user environment.
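
If you'd rather the machine never stop at that prompt, Ubuntu's mdadm initramfs hook can be told to boot degraded arrays automatically.  This is just a sketch based on the stock mdadm package of that era; check the scripts under /usr/share/initramfs-tools on your own system to confirm the variable name:

# /etc/initramfs-tools/conf.d/mdadm
BOOT_DEGRADED=true

(Rebuild the initramfs afterwards, as shown further down.)  The one-off equivalent is adding "bootdegraded=true" to the kernel command line.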

Running "mdadm --detail /dev/md127" indicated that 3 of the 5 disks in my RAID 5 had been dropped from the array.  Not good.

# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Mon Dec 30 08:52:36 2013
     Raid Level : raid5
     Array Size : 5860022272 (5588.55 GiB 6000.66 GB)
  Used Dev Size : 1465005568 (1397.14 GiB 1500.17 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Sat May 24 10:33:51 2014
          State : clean 
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : tatooine:photo  (local to host tatooine)
           UUID : 970006f0:3c77ee31:aa4bb4f9:e883b5d0
         Events : 243

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      removed
       3       8       65        3      removed
       5       8      145        4      removed
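
Before blaming the hardware, it's worth a quick check that the dropped members are actually healthy.  Something along these lines works (the device names are only examples, and the SMART query assumes smartmontools is installed):

cat /proc/mdstat                 # which arrays and members the kernel currently sees
sudo mdadm --examine /dev/sdd1   # inspect the md superblock on a dropped member
sudo smartctl -H /dev/sdd        # SMART health verdict for the underlying drive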


A little research indicated that the drives were most likely fine, and that this was a timing issue: the drives simply weren't ready yet when the kernel tried to assemble the RAID array.  The fix is to add a couple of lines to the initialization script "/usr/share/initramfs-tools/scripts/mdadm-functions".  In the "degraded_arrays()" function, add "udevadm settle" and "sleep 5" at the top, as shown below:

degraded_arrays()
{
    # Added: wait for udev to finish processing device events, then give
    # the drives a few extra seconds before testing the arrays
    udevadm settle
    sleep 5
    mdadm --misc --scan --detail --test >/dev/null 2>&1
    return $((! $?))
}
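
Since mdadm-functions gets bundled into the initramfs, the edit won't actually do anything until the initramfs image is rebuilt.  On Ubuntu that should just be:

sudo update-initramfs -u    # rebuild the initramfs for the running kernel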

This gives the drives in your array time to get their act together before the system tries to assemble the RAID.  On my next reboot, the drives all came back as expected.
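
If a member ever stays listed as "removed" after a reboot, it can usually be put back by hand instead of waiting for another boot.  A sketch, again with example device names:

sudo mdadm /dev/md127 --re-add /dev/sdd1   # re-attach using the existing superblock
sudo mdadm /dev/md127 --add /dev/sdd1      # fallback if --re-add is refused; triggers a full resync
watch cat /proc/mdstat                     # follow the recovery progress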
