RAID5 crash, how to recover?

Hello!

I have a huge problem with my file server (openSUSE 11.3 - 64bit, kernel-2.6.34.7-0.7-default). I’ve just installed an Intel SASUC8I card, connected 3 of my 7 Samsung 2TB drives to it, and after about an hour it dropped 2 of the disks.

I’ve managed to trace the problem to the card’s BIOS, which I’ve replaced with the non-RAID edition, so it should work fine with the kernel’s software RAID now.

The problem is that I can’t find a way to “un-fail” these 2 disks. I’m more than positive that these drives are just fine; only the controller was misbehaving. The dropout couldn’t have created any data inconsistency either, since the 2 drives dropped out at virtually the same time and no writing was being done at the time.

I’ve tried add/re-add, and I get either “mdadm: cannot get array info for /dev/md0” or “mdadm: add new device failed for /dev/sdi1 as 7: Invalid argument” (depending on whether the array is running or stopped; in either case, mdstat reports it as inactive).
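For reference, these are roughly the commands I was trying (with /dev/sdi1 as one of the dropped members; the attempts for the other disk looked the same):

mdadm /dev/md0 --re-add /dev/sdi1
mdadm /dev/md0 --add /dev/sdi1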

For a normal or forced assemble, I get “mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array”.
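The assemble attempts looked roughly like this (device letters as in my setup, sdc1 through sdi1):

mdadm --assemble /dev/md0 /dev/sd[cdefghi]1
mdadm --assemble --force /dev/md0 /dev/sd[cdefghi]1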

I’ve been googling like crazy, and also trying to get info from mdadm’s help and man page, but nothing seems to deal with such a freak accident.

Another interesting thing is that if I reboot the system, mdstat shows md0 as inactive, but lists all the devices with no flags. It’s only after a run command that it changes to the 5 remaining devices, all with (S) flags.
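The sequence, roughly:

cat /proc/mdstat        # md0 inactive, all 7 members listed, no flags
mdadm --run /dev/md0    # does not get the array going in my case
cat /proc/mdstat        # now only the 5 remaining members, each marked (S)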

Alternatively: does anyone know where the device failure info is stored? If I could somehow remove this information from the system (even by reinstalling the OS), I should be able to reassemble the array… Or is it stored in the member drives’ superblocks?
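(For anyone wondering the same thing: you can see what each member’s superblock records about the array with --examine, e.g. for one of the dropped disks:

mdadm --examine /dev/sdi1

It prints the device’s role, the array state and the event counter as that member last saw them.)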

About 80% of this array’s data is backed up, so if all else fails, I can restore most of its content, but I’d much prefer to reassemble this one as a whole, since there was absolutely no chance of data corruption.

Any and all help is much appreciated! Thank you!

Got it all back after working on it continuously for like 10-12 hours :)

This was the right command (the disk order looks a bit messed up, but it does matter!):

mdadm --create /dev/md0 --assume-clean --level=5 --metadata=1.0 --chunk=1024 --parity=left-asymmetric --raid-devices=7 /dev/sdh1 /dev/sd[cdefi]1 /dev/sdg1
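A word of caution on --assume-clean: it writes fresh superblocks and skips the resync, so before trusting the recreated array it pays to verify it read-only first, for example (the filesystem type here is just an example, yours may differ):

fsck.ext4 -n /dev/md0        # read-only filesystem check, assuming ext4
mount -o ro /dev/md0 /mnt    # or mount read-only and spot-check some files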

I’ve created the original RAID5 with the YaST installer, so it was a bit of a pain to track down all these non-default params…
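In hindsight, most of them can be read back from a surviving member’s superblock before recreating, something like:

mdadm --examine /dev/sdc1 | grep -E 'Version|Chunk Size|Layout'

(field names may vary a bit between mdadm versions, but the metadata version, chunk size and parity layout are all in there).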

The biggest help was this page: https://raid.wiki.kernel.org/index.php/RAID_Recovery#Restore_array_by_recreating_.28after_multiple_device_failure.29, maybe I should have started there in the first place :)

If someone is interested in the subject, I think I can be of further assistance. I got into this really deep, and the above is the relevant essence of my research.

You can get into quite a pickle with software RAID, but I think it’s still more recoverable than hardware RAID :)