Lost RAID after power outage

I’m running SUSE 11.1 (64-bit) with a 2-disk software RAID 1.

A power outage caused all sorts of hard disk problems. fsck fixed all the issues except for the RAID. Now I get the following messages during boot:

md: md0: raid array is not clean -- starting background reconstruction
md: raid1 personality registered for level 1
raid1: raid set md0 active with 2 out of 2 mirrors
md0: bitmap file is out of date, doing full recovery
md0: failed to create bitmap (-5)
mdadm: failed to RUN_ARRAY /dev/md/0: Input/output error
                                                         failed
Checking file systems...
fsck 1.41 (01-Sep-2008)
fsck.ext3: Invalid argument while trying to open /dev/md0
/dev/md0:
The superblock could not be read or does not describe a correct ext2 filesystem.  If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
bootsplash: status on console 0 changed to on
                                                        failed

I tried running e2fsck as suggested, but it gives me the same “superblock could not be read” error.
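(For what it’s worth, the -b 8193 offset in that message only applies to filesystems with 1 KiB blocks; on a 4 KiB-block filesystem the first backup superblock normally sits at 32768. A dry run of mke2fs lists the expected locations without writing anything, assuming the filesystem was created with default parameters. None of that will help, though, if /dev/md0 itself never assembled, which is what the RUN_ARRAY error above suggests.)

    mke2fs -n /dev/md0          # dry run only: prints where the backup superblocks would be, writes nothing
    e2fsck -b 32768 /dev/md0    # example offset for a 4 KiB-block filesystem (assumption)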

This RAID is where I keep the important data that I don’t want to lose (which is why it’s on a RAID).

Any advice on how to retrieve my data would be greatly appreciated.

Update:

After extensive web searching, I found that “releasing” the RAID with “mdadm --stop” makes it so that the individual disks can be mounted directly as ext3.
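In case the exact commands help anyone searching later, it was something like the following, with /dev/sdc1 as one of the RAID 1 members and /mnt/recovery as a scratch mount point (your names will differ). This presumably works for RAID 1 because each member carries a complete copy of the filesystem and the md metadata sits at the end of the partition:

    mdadm --stop /dev/md0                        # release the member disks from the broken array
    mkdir -p /mnt/recovery
    mount -t ext3 -o ro /dev/sdc1 /mnt/recovery  # mount one member read-only and copy the data off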

This means I can at least read the data on either disk (and make sure it’s all backed up), so now my challenge is to get these two working disks back into a RAID 1 configuration, preferably without losing all my data in the process.

Is this possible? I’ve done a bunch of research on the mdadm command, but haven’t found anything that allows me to form a RAID out of two existing disks without reformatting one or both of them.

Thanks again.

I would recommend that you find whichever drive has the best copy of the data, and keep that one as the source for recovering the data onto the other drive.

fail the other drive and remove it from the array
do an mdadm --zero-superblock on the bad drive.

Then add the drive back to the array.

Thanks, mark54g. It’s weird because I never received a message saying one or the other drive was bad, only an error stating that the RAID device (md0) was bad. As far as I know both drives are fine… Is there a way to check which one is affected?

Can you please verify that these are the steps I should take, with the RAID as md0 and sdd1 as one of the member devices:

  1. Use mdadm to fail one of the drives (mdadm /dev/md0 --fail /dev/sdd1)
  2. Use mdadm to remove that device from the array (mdadm /dev/md0 --remove /dev/sdd1)
  3. Zero out the device (mdadm --zero-superblock /dev/sdd1)
  4. Add device back into array (mdadm /dev/md0 --add /dev/sdd1)
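Strung together, the whole sequence would presumably look like this, with a progress check added at the end (sdd1 being the member that gets rebuilt):

    mdadm /dev/md0 --fail /dev/sdd1      # mark the chosen member as failed
    mdadm /dev/md0 --remove /dev/sdd1    # take it out of the array
    mdadm --zero-superblock /dev/sdd1    # wipe only its md metadata, not the filesystem data
    mdadm /dev/md0 --add /dev/sdd1       # re-add it; md then starts a full resync
    cat /proc/mdstat                     # shows resync progress and an ETA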

Just curious, but what would happen if I chose the wrong drive to do this with? Also, what would happen if I used the mdadm --assemble command to build the array from the existing disks without doing the --zero, etc.?

Sorry for all the questions, and thanks again for your reply.

If you don’t have the superblock data, then it should not assemble without a force, and even then I’m not sure.
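For completeness, a forced assemble would look something like the line below (member names as assumed earlier). --force tells mdadm to go ahead despite out-of-date or inconsistent superblocks, which is exactly what makes it risky if one drive really is stale:

    mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1   # overrides the stale-metadata checks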

I thought you were going to mount the drives individually to inspect them. That would be the best way to figure out which drive to keep.

Also, run smartctl -a on the drive to look for errors or failures.
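Something like this, against the whole-disk devices (names assumed):

    smartctl -a /dev/sdc    # check Reallocated_Sector_Ct, Current_Pending_Sector, and the error log
    smartctl -a /dev/sdd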

The --zero-superblock does not zero the drive, but rather removes the RAID metadata from it. It basically strips the drive of its RAID identity.

If both drives are good, then you are simply doing a time-costly resync.
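If you want to see exactly what that step touches, dump the md metadata before and after the zero (member name assumed):

    mdadm --examine /dev/sdd1       # prints the md superblock: UUID, event count, array state
    mdadm --zero-superblock /dev/sdd1
    mdadm --examine /dev/sdd1       # now reports no md superblock; the filesystem data is untouched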

When I mount the drives individually they show no errors. I’ll run smartctl -a and see what I get.

Thanks again!

O.K., here’s where I am:

smartctl says “No Errors Logged” for both drives.
fsck and e2fsck on both disks return “clean”
fdisk has no errors

Rebuilt the array per your suggestion and let it resync. Now:

“cat /proc/mdstat” says the RAID is active
“mdadm -D /dev/md0” returns “Superblock is Persistent”
“mdadm --examine” says all is good for both drives (/dev/sdc1 and /dev/sdd1).
fsck says O.K. for md0
fdisk says O.K. for md0

md0 is mountable and readable without errors.

I thought I was good to go, but when I rebooted it still gave me the superblock error and again booted into maintenance mode; md0 was not assembled. I repeated the above steps, but upon reboot the md0 device was not recognized, even though it was mountable and readable after the assembly and prior to the reboot.
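In hindsight I wonder whether the boot-time configuration (/etc/mdadm.conf plus the initrd on SUSE) was still describing the old array UUID, which would explain an array that assembles fine by hand but is not found at boot. Just a guess, but refreshing both would have looked roughly like this (SUSE default paths assumed, with the array assembled at the time):

    mdadm --detail --scan                        # prints an ARRAY line with the array's current UUID
    mdadm --detail --scan >> /etc/mdadm.conf     # append it, then edit out any stale ARRAY lines
    mkinitrd                                     # rebuild the initrd so the md setup at boot sees the new metadata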

At this point I dissolved the RAID. I’ve mounted one of the two disks, and will use the other for periodic backups. (I know I probably should have had more patience, but I’ve been unable to boot into X for 2 days and have things I need to do…)

I’m frustrated because the main reason I decided to set up a RAID was data security, but it seems to me that the RAID itself is what caused all the trouble I’ve been through for the past few days. (Evidenced by the fact that all the disks included in the RAID are error-free with all data intact, the other non-RAID disks affected by the power outage were running again after a simple repair, and several other people in my Google searches had very similar issues with similar setups that they were not able to resolve.)

Thank you again for all your help, mark54g. Perhaps I’ll get a hardware RAID controller and try again sometime, but for now this software RAID seems to have a lot of issues.