Two 1000GB HDDs in RAID-1 cause boot failure

Please excuse a long first post to this forum. Better to include too
much system information than too little.

Summary

I’ve had a frustrating week trying to replace the two 500GB drives in a
RAID-1 array with new 1000GB drives. Everything I’ve turned up in
numerous Google searches has failed to produce a solution.

The system will not boot when both 1000GB drives are in place. It fails
with a complaint about a bad superblock on /dev/md0 (the /boot array),
even though /dev/md0 is used in the boot process:


fsck.ext3: Invalid argument while trying to open /dev/md0
/dev/md0:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
   e2fsck -b 8193 <device>

fsck.ext3: /dev/md0 failed (status 0x8). Run manually!
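
For anyone following along: once /dev/md0 is assembled by hand in the
repair shell (the command I use is shown further down), its ext3
superblock can be inspected read-only. A minimal sketch of such checks,
nothing destructive:

dumpe2fs -h /dev/md0     # print just the superblock summary
e2fsck -n /dev/md0       # read-only check, answers "no" to everything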

Yet I can boot successfully when either 1000GB drive is installed
together with either 500GB drive. The files should be the same across
all four HDDs because of the RAID synchronisation.

Context

I’m running openSUSE 10.3 (I plan to upgrade once the HDD replacement
is done) with the 2.6.22.19-0.2-default kernel.

The RAID arrays are:


DEVICE    content         fs          Component partitions
/dev/md0  /boot           ext3        /dev/sda1 /dev/sdb1
/dev/md1  /               ext3        /dev/sda2 /dev/sdb2
/dev/md2  swap            linux-swap  /dev/sda3 /dev/sdb3
/dev/md3  LVM user space  reiserfs    /dev/sda5 /dev/sdb5

as created by the hardware supplier, except for my reformatting of the
user area.

Here’s my /etc/fstab.


/dev/md1             /                    ext3       acl,user_xattr        1 1
/dev/md0             /boot                ext3       acl,user_xattr        1 2
/dev/md2             swap                 swap       defaults              0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
/dev/fd0             /media/floppy        auto       noauto,user,sync      0 0
/dev/phaethon/home   /home                reiserfs   acl,user_xattr        2 1
/dev/phaethon/soft   /soft                reiserfs   acl,user_xattr        2 2
/dev/phaethon/data   /data                reiserfs   acl,user_xattr        2 3
/dev/phaethon/drdata /drdata              reiserfs   acl,user_xattr        2 4

I migrated the RAID-1 arrays by failing and removing each of the
partitions on one 500GB HDD with mdadm, physically replacing that HDD
with a 1000GB drive, then adding and resyncing each partition with
mdadm (a sketch of the per-partition commands follows the mdadm.conf
listing below). This seemed to work fine. Here’s my /etc/mdadm.conf.


DEVICE partitions
ARRAY /dev/md0 level=raid1 UUID=f9be8b7b:f75c07ab:929d8e0b:6cb74544
ARRAY /dev/md1 level=raid1 UUID=b064c616:e00cac44:c9361c99:cb68b594
ARRAY /dev/md2 level=raid1 UUID=f034e886:7f5fc3e9:0a675f97:3b727903
ARRAY /dev/md3 level=raid1 UUID=03b7460a:2e03b65c:27620c58:f58f9a0d
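
For each drive swap, the per-partition commands were along these lines
(a sketch from memory, shown only for /dev/md0 and /dev/sdb1; the same
was repeated for md1, md2 and md3 with the matching partitions):

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# power down, swap the 500GB drive for the 1000GB one, partition it, reboot
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat     # wait for the resync to finish and show [UU]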

I also ran **grub** with the normal


root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)

as given in various HOWTOs.
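
For completeness, the grub shell session looked roughly like this (a
sketch, not a transcript; the find line is only a sanity check that
both drives carry /grub/stage1 on their first partition):

grub --no-floppy
grub> find /grub/stage1     # should report (hd0,0) and (hd1,0)
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
grub> quit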

After the failed boot, in repair-filesystem mode, /proc/mdstat excludes
/dev/md0 and only / is mounted. If I reactivate /dev/md0 with, say,

mdadm -Ac partitions /dev/md0 -m dev

then /dev/md0 reappears in /proc/mdstat and I can mount it. Here is /proc/mdstat after doing that:

Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] 
md3 : active raid1 sdb5[3]
      465202100 blocks super 1.0 [2/1] [U_]
      bitmap: 154/444 pages [616KB], 512KB chunk

md0 : active raid1 sdb1[0] sda1[1]
      136448 blocks [2/2] [UU]
      
md1 : active raid1 sdb2[3]
      20972784 blocks super 1.0 [2/1] [U_]
      bitmap: 85/161 pages [340KB], 64KB chunk

md2 : active(auto-read-only) raid1 sdb3[3] sda3[2]
      2104500 blocks super 1.0 [2/2] [UU]
      bitmap: 0/9 pages [0KB], 128KB chunk

unused devices: <none>

I notice that the bitmap and the superblock from the original /dev/md0
on the 500GB drives are gone. That might be because I used

mdadm --zero-superblock /dev/sda1
mdadm --zero-superblock /dev/sdb1

as suggested in a few places to cure this RAID boot problem.
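
One way to check exactly what metadata the md0 components carry now
(note that the other arrays show "super 1.0" in /proc/mdstat while md0
does not) would be something like:

mdadm --detail /dev/md0     # metadata version, bitmap and state of the assembled array
mdadm -E /dev/sda1          # superblock as stored on each component partition
mdadm -E /dev/sdb1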

As far as I can tell there is nothing wrong with the /boot partitions on
either 1000GB drive. Here is the partition table of /dev/sda:


Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000ed1bb

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          17      136521   fd  Linux raid autodetect
/dev/sda2              18        2628    20972857+  fd  Linux raid autodetect
/dev/sda3            2629        2890     2104515   fd  Linux raid autodetect
/dev/sda4            2891      121601   953546107+   5  Extended
/dev/sda5            2891       60806   465210238+  fd  Linux raid autodetect
/dev/sda6           60807      121601   488335806   8e  Linux LVM

The partition layout on /dev/sdb is identical.
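
(For reference, comparing, or cloning, the partition tables is easy
with sfdisk; a minimal sketch, assuming the two disks appear as
/dev/sda and /dev/sdb:)

sfdisk -d /dev/sda > sda.table     # dump the partition table as text
sfdisk -d /dev/sdb > sdb.table
diff sda.table sdb.table           # should differ only in the device names
# to clone the layout onto a freshly installed replacement disk:
# sfdisk /dev/sdb < sda.table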

I can mount /boot manually from either /dev/sda1 or /dev/sdb1, and
**fsck** checks on those individual component partitions of /dev/md0
show no problems. Indeed I can boot successfully from those partitions
on either 1000GB drive, provided the other HDD in the RAID-1 is one of
the 500GB drives. (Note this was only possible after I had applied a
patch to /lib/mkinitrd/scripts/setup-md.sh, shown below.)


#!/bin/bash
#
#%stage: softraid
#
mdblockdev=

for bd in $blockdev ; do
    # get information about the current blockdev
    update_blockdev $bd
    mdconf=$(mdadm -Db $bd 2> /dev/null | sed -n "s@/dev/md[0-9]*@/dev/md$blockminor@p")
    if  -n "$mdconf" ] ; then
	md_tmpblockdev=$(mdadm -Dbv $bd 2> /dev/null | sed -n "1D;s/,/ /g;s/^ *devices=\(.*\)/\1/p")
	md_dev=${bd##/dev/}
	mdblockdev="$mdblockdev $md_tmpblockdev"
	eval md_conf_${md_dev}=\"$mdconf\"
	md_devs="$md_devs $md_dev"
	root_md=1
    else
	mdblockdev="$mdblockdev $bd"
    fi
done

blockdev="$mdblockdev"

if  -n "$root_md" ] ; then
    need_mdadm=1
    echo "DEVICE partitions" > $tmp_mnt/etc/mdadm.conf
    ( [ -s /etc/mdadm.conf ] && egrep -v '^DEVICE ' /etc/mdadm.conf
    for md in $md_devs; do
        eval echo \$md_conf_$md 
    done
    ) | sort -u >> $tmp_mnt/etc/mdadm.conf
fi

save_var need_mdadm
save_var root_md
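
After patching that script, the initrd has to be regenerated before the
change takes effect. A sketch of the rebuild, plus a check of which
mdadm.conf ended up inside the image (this assumes the usual
/boot/initrd symlink and the gzipped-cpio initrd format that 10.3 uses):

mkinitrd                    # rebuilds the initrd(s) for the installed kernels
# list the embedded mdadm.conf; all four ARRAY lines should be present
zcat /boot/initrd | cpio -i --to-stdout etc/mdadm.conf 2>/dev/null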

What’s baffling me is that booting fails only when both 1000GB drives
are installed.

I’ve tried the steps described in the “Lost Raid after power outage”
thread on the openSUSE Forums, to no avail.

Solutions?

I’m wondering whether there is some other bug, or a missing patch, that
prevents the two 1000GB drives from working together during the boot
process, or whether there is some mkinitrd step I’ve missed; many of
the tutorials use commands that are not available on my system. I
suppose it’s possible to disable the failing fsck, but that’s not ideal
and is new ground for me. I could also try to recreate /dev/md0 with
the RAID superblock and bitmap in place (a sketch of what I have in
mind follows).
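
If recreating /dev/md0 does turn out to be the way to go, I imagine it
would look something like the following. This is a sketch only:
--create rewrites the RAID metadata, so I would back up /boot from one
component first, and the old md0 apparently used 0.90 metadata (no
"super 1.0" line in /proc/mdstat), hence the explicit --metadata=0.90.

mount /dev/sda1 /mnt && cp -a /mnt /root/boot-backup && umount /mnt   # safety copy of /boot

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda1 /dev/sdb1
mdadm --grow /dev/md0 --bitmap=internal     # put the write-intent bitmap back

The new array would get a new UUID, so the ARRAY line for md0 in
/etc/mdadm.conf would need updating (mdadm --detail --scan prints the
new one), followed by another mkinitrd run.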

Any ideas what’s awry? I’d welcome suggestions to pinpoint the
problem, if not solutions.

If you’ve reached this far, thank you for reading.

Just to double-check the steps:
(1) Booted with both 500GB disks
(2) You failed /dev/sdb1. Removed it from md0. Powered down.
(3) Physically replaced 2nd 500GB with 1000GB.
(4) Rebooted. Added /dev/sdb1 to md0. Waited till sync complete.
(5) You failed /dev/sda1. Removed it from md0. Powered down.
(6) Physically replaced 1st 500GB with 1000GB.
(7) Rebooted. Added /dev/sda1 to md0.

Yes, that’s the synopsis. There was a lot of waiting, typically a
couple of hours to resync, for each HDD switch.