Failed Software RAID-5

Hello all,
I set up a server running openSUSE 11.1 a few weeks back for my own use at home. It was configured with 4x1TB hard drives, and I used software RAID to set the four drives up as a RAID-5 array. This morning I turned on the server to find it had rebooted itself, possibly from a power failure or brownout, and failed to come back up. The software RAID is giving me an error, and I have spent all of today searching the internet for a way to get the array back up, but have come up with nothing. Any help would be appreciated. Here is some info on what is going on:

  • Physically, the hard drives are fine - I did a full sector-by-sector scan on each one using Western Digital's SMART utilities and they all passed without a flaw.
  • /dev/md0 is a 16 GB RAID-0 array for swap, consisting of /dev/sda2, /dev/sdb2, /dev/sdc2, /dev/sdd2 - this array works fine.
  • /dev/md1 is a 3 TB RAID-5 array for the OS and data, consisting of /dev/sda3, /dev/sdb3, /dev/sdc3, /dev/sdd3 - this is the array giving me issues (illustrative mdadm.conf entries for both arrays are sketched just after this list).
  • The /boot partition is /dev/sda1.
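
To be concrete about the layout, the arrays above correspond to mdadm.conf entries along these lines. This is only a sketch: the md1 UUID is taken from the --examine output further down, while the md0 UUID is a placeholder because I don't have it in front of me.

# illustrative /etc/mdadm.conf entries for the layout described above (sketch only)
ARRAY /dev/md0 UUID=<md0 UUID goes here>
ARRAY /dev/md1 metadata=1.0 UUID=e9a0da25:0bce6c41:0330678f:44257bba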

When I boot, I receive the error message below and am then dumped into an sh shell early in the boot process. I have access to some basic command-line utilities, such as mdadm, but a lot of the usual tools are missing.
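
The array and member details further down were gathered from that shell with commands along these lines (a rough sketch; the initrd environment is minimal, so availability may vary):

# gather the basic md state from the emergency shell
$cat /proc/mdstat                # kernel's summary of the md devices (if /proc is mounted)
$mdadm --detail /dev/md1         # array-level view (output pasted below)
$mdadm --examine /dev/sda3       # per-member superblock (repeat for sdb3, sdc3, sdd3)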


md: md1 stopped.
md: bind<sdb3>
md: bind<sdc3>
md: bind<sdd3>
md: bind<sda3>
raid5: device sda3 operational as raid disk 0
raid5: device sdd3 operational as raid disk 3
raid5: device sdc3 operational as raid disk 2
raid5: device sdb3 operational as raid disk 1
raid5: allocated 4288kB for md1
raid5: raid level 5 set md1 active with 4 out of 4 devices, algorithm 0
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sda3
 disk 1, o:1, dev:sdb3
 disk 2, o:1, dev:sdc3
 disk 3, o:1, dev:sdd3
md1: bitmap file is out of date, doing full recovery
md1: bitmap initialisation failed: -5
md1: failed to create bitmap (-5)
mdadm: failed to RUN_ARRAY /dev/md/1: Input/output error.
invalid root filesystem -- exiting to /bin/sh
$

Here is the info for my array and the four disks attached to it:


$mdadm -D /dev/md1
/dev/md1:
        Version : 1.00
  Creation Time : Thu Jun 18 15:15:16 2009
     Raid Level : raid5
  Used Dev Size : 972462464 (927.41 GiB 995.80 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sun Jun 28 10:50:07 2009
          State : active, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 128K

           Name : linux:1
           UUID : e9a0da25:0bce6c41:0330678f:44257bba
         Events : 6303

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       4       8       51        3      active sync   /dev/sdd3

$mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : e9a0da25:0bce6c41:0330678f:44257bba
           Name : linux:1
  Creation Time : Thu Jun 18 15:15:16 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1944925016 (927.41 GiB 995.80 GB)
     Array Size : 5834774784 (2782.24 GiB 2987.40 GB)
  Used Dev Size : 1944924928 (927.41 GiB 995.80 GB)
   Super Offset : 1944925272 sectors
          State : clean
    Device UUID : eabd52cf:8b404ce5:57000a4a:74617399

Internal Bitmap : -233 sectors from superblock
    Update Time : Sun Jun 28 10:50:07 2009
       Checksum : 5858713e - correct
         Events : 6303

         Layout : left-asymmetric
     Chunk Size : 128K

    Array Slot : 0 (0, 1, 2, failed, 3)
   Array State : Uuuu 1 failed

$mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : e9a0da25:0bce6c41:0330678f:44257bba
           Name : linux:1
  Creation Time : Thu Jun 18 15:15:16 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1944925016 (927.41 GiB 995.80 GB)
     Array Size : 5834774784 (2782.24 GiB 2987.40 GB)
  Used Dev Size : 1944924928 (927.41 GiB 995.80 GB)
   Super Offset : 1944925272 sectors
          State : active
    Device UUID : dbe61e90:6a957602:8ad6d54c:b561a4f6

Internal Bitmap : -233 sectors from superblock
    Update Time : Sun Jun 28 10:50:07 2009
       Checksum : 964bc582 - correct
         Events : 6303

         Layout : left-asymmetric
     Chunk Size : 128K

    Array Slot : 1 (0, 1, 2, failed, 3)
   Array State : uUuu 1 failed

$mdadm --examine /dev/sdc3
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : e9a0da25:0bce6c41:0330678f:44257bba
           Name : linux:1
  Creation Time : Thu Jun 18 15:15:16 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1944925016 (927.41 GiB 995.80 GB)
     Array Size : 5834774784 (2782.24 GiB 2987.40 GB)
  Used Dev Size : 1944924928 (927.41 GiB 995.80 GB)
   Super Offset : 1944925272 sectors
          State : active
    Device UUID : cbd92876:238d1eb7:4bc2e26e:ca7d581a

Internal Bitmap : -233 sectors from superblock
    Update Time : Sun Jun 28 10:50:07 2009
       Checksum : 76beb802 - correct
         Events : 6303

         Layout : left-asymmetric
     Chunk Size : 128K

    Array Slot : 2 (0, 1, 2, failed, 3)
   Array State : uuUu 1 failed

$mdadm --examine /dev/sdd3
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : e9a0da25:0bce6c41:0330678f:44257bba
           Name : linux:1
  Creation Time : Thu Jun 18 15:15:16 2009
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1944925016 (927.41 GiB 995.80 GB)
     Array Size : 5834774784 (2782.24 GiB 2987.40 GB)
  Used Dev Size : 1944924928 (927.41 GiB 995.80 GB)
   Super Offset : 1944925272 sectors
          State : active
    Device UUID : b8eaa407:238e9d80:7b8e7eef:a7b56aaa

Internal Bitmap : -233 sectors from superblock
    Update Time : Sun Jun 28 10:50:07 2009
       Checksum : e267cdfe - correct
         Events : 6303

         Layout : left-asymmetric
     Chunk Size : 128K

    Array Slot : 4 (0, 1, 2, failed, 3)
   Array State : uuuU 1 failed

Note: /dev/sdd3 says it is using slot 4, but the valid slots appear to be 0, 1, 2, 3 and failed - that looks off to me. The examine output also says the array state has 1 failed HDD, but I can't work out which one it is, and the output of mdadm -D /dev/md1 says none are failed.
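
For what it's worth, this is roughly how I lined those fields up across the four members to spot the inconsistency (the full --examine output is above, so this is just a convenience):

# compare the slot, state and event count across all four members
$mdadm --examine /dev/sd[abcd]3 | grep -E 'Events|Array Slot|Array State'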

I have tried forcing the array to assemble, but the result is the same as at boot (an explicit form of the command is sketched after the output):

$mdadm -A -f /dev/md1
md: md1 stopped.
md: bind<sdb3>
md: bind<sdc3>
md: bind<sdd3>
md: bind<sda3>
raid5: device sda3 operational as raid disk 0
raid5: device sdd3 operational as raid disk 3
raid5: device sdc3 operational as raid disk 2
raid5: device sdb3 operational as raid disk 1
raid5: allocated 4288kB for md1
raid5: raid level 5 set md1 active with 4 out of 4 devices, algorithm 0
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sda3
 disk 1, o:1, dev:sdb3
 disk 2, o:1, dev:sdc3
 disk 3, o:1, dev:sdd3
md1: bitmap file is out of date, doing full recovery
md1: bitmap initialisation failed: -5
md1: failed to create bitmap (-5)
mdadm: failed to RUN_ARRAY /dev/md/1: Input/output error.
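
In case it makes a difference, my understanding is that the explicit equivalent of that shorthand - stopping the partially assembled array first and naming the members directly - would be something like the following; treat it as a sketch rather than something I can vouch for:

# stop the half-assembled array, then retry assembly naming each member explicitly
$mdadm --stop /dev/md1
$mdadm --assemble --force /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3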

Does anyone have any suggestions on what else I could try to get this array up so I can get my data off of it? This is my first experience with MD; I have used a lot of hardware RAID solutions but never software RAID before. I am beginning to fear my data may be gone forever. Any help would be very much appreciated. Even if I cannot recover my data, finding the cause would be very useful, so that I have some information on which to decide whether to stick with MD or go back to hardware RAID.
Thank you,
David

I have exactly the same problem here. The server was working fine, but it suddenly froze and became unreachable, so I had to shut it down manually (by cutting the power). On reboot, md2 (RAID1) became unavailable with the same error as yours.

My layout is as follows:
/dev/md0 - /dev/sd(abcd)1 - RAID1 (boot)
/dev/md1 - /dev/sd(abcd)2 - RAID0 (swap)
/dev/md2 - /dev/sd(abcd)3 - RAID1 (/) *FAILS ON BOOT
/dev/md3 - /dev/sd(abcd)4 - RAID10 (file server)

When booting from a live CD, all of the MD arrays except md2 start fine. I could even back up the data from md3.
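
From the live CD I bring things up with roughly the following (a sketch from memory; md2 is the only array that refuses to start):

# assemble every array described by the on-disk superblocks, then check the result
mdadm --assemble --scan
cat /proc/mdstat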

When I run mdadm --detail /dev/md2, it shows the state as “active, Not Started”. If I examine each of the sd(abcd)3 members, some show “active” and others “clean”.

Since it is RAID1, if I mount sda3 manually it shows all the data, so the data is still there on the drives. When I try to start the array (even using --force), it says either “no superblock” or “mdadm: failed to RUN_ARRAY /dev/md2: Input/output error”, and yet mdadm --detail still reports it with the status shown above (“active, Not Started”).
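
To be concrete, the manual mount I use to reach the data on a single RAID1 member looks roughly like this (read-only, to be safe; the mount point name is just an example):

# mount one RAID1 member directly and read-only, to copy the data off
mkdir -p /mnt/md2-member
mount -o ro /dev/sda3 /mnt/md2-member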

This seems to be a common problem in Linux (or openSUSE), because it is the second time it has happened to me in less than 3 months. The other time it was a RAID5 and I lost all the data. I thought that with RAID1 I would be safer, but that is the array that got messed up.

I hope someone can help us.
Thanks,