Hello!
I have a huge problem with my file server (openSUSE 11.3, 64-bit, kernel 2.6.34.7-0.7-default). I’ve just installed an Intel SASUC8I card, connected 3 of my 7 Samsung 2 TB drives to it, and after about an hour it dropped 2 of the disks.
I’ve managed to trace the problem to the card’s BIOS, which I’ve replaced with the non-RAID edition, so it should work fine with the kernel RAID now.
The problem is that I can’t find a way to “un-fail” these 2 disks. I’m more than positive that these drives are just fine; only the controller was misbehaving. The dropout couldn’t have created any data inconsistency either, since the 2 drives dropped out at virtually the same time and nothing was being written at that moment.
I’ve tried add/re-add and get either “mdadm: cannot get array info for /dev/md0” or “mdadm: add new device failed for /dev/sdi1 as 7: Invalid argument” (depending on whether the array is run or stopped; in either case, mdstat reports it as inactive).
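For reference, the commands I tried looked roughly like this (device names are illustrative, except that sdi1 is indeed one of the two dropped members):

  mdadm /dev/md0 --re-add /dev/sdi1   # tried with the array both stopped and run
  mdadm /dev/md0 --add /dev/sdi1      # same two errors, depending on array state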
For a normal or forced assemble, I get “mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array”.
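The assemble attempts, for completeness (again, the member device list is from memory, so treat it as illustrative):

  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sd[c-i]1
  mdadm --assemble --force /dev/md0 /dev/sd[c-i]1   # same result as the normal assemble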
I’ve been googling like crazy, and also trying to get info from mdadm’s help and man page, but nothing seems to deal with such a freak accident.
Another interesting thing is that if I reboot the system, mdstat shows md0 as inactive but lists all the devices with no flags. It’s only after a run command that it changes to the 5 remaining devices, all with (S) flags.
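In case my description is unclear, the exact sequence is simply:

  cat /proc/mdstat       # after reboot: md0 inactive, all members listed, no flags
  mdadm --run /dev/md0   # refuses to start the array
  cat /proc/mdstat       # now only the 5 remaining devices, all marked (S)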
Alternatively: does anyone know where device failure info is stored? If I could somehow remove this information from the system (even by reinstalling the OS), I should be able to reassemble the array… or is it stored in the member drives’ superblocks?
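This is how I’ve been inspecting the per-drive metadata, which is where I suspect that state lives (device names illustrative as before):

  mdadm --examine /dev/sdi1                        # dump one member’s md superblock
  mdadm --examine /dev/sd[c-i]1 | grep -i events   # compare event counters across members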
About 80% of this array’s data is backed up, so if all else fails, I can restore most of its content, but I’d much prefer to reassemble this one as a whole, since there was absolutely no chance of data corruption.
Any and all help is much appreciated! Thank you!