Raid1 problem

I recently moved a 320GB RAID 1 array from one machine to a new machine. The old machine was running Mandriva 2007.0; the new one is running openSUSE 11 64-bit (much better OS BTW, not that you didn't already know that :wink: ).

I can use:

mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1

and the raid1 assembles correctly and is ready to be mounted with:

mount /dev/md0 /data

At that point everything works as expected. The problem I have is I cannot figure out how to get the darn thing to mount at boot.

/etc/sysconfig/mdadm has the following in it:

BOOT_MD_USE_MDADM_CONFIG=yes

I have edited /etc/mdadm.conf to have:

DEVICE /dev/sdb1 /dev/sdc1
ARRAY /dev/md0 UUID=big_long_string_found_old_host auto=yes

(and yes, the big_long_string matches what is found in the output of mdadm --examine /dev/sdb1 and /dev/sdc1).

Then I can assemble it as root with:

mdadm -As /dev/md0 

and it assembles correctly and mounts correctly.

With that in mdadm.conf, at boot I see the following in dmesg:

EXT3-fs: unable to read superblock
md: md0 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: raid1 personality registered for level 1
raid1: raid set md0 active with 2 out of 2 mirrors

By contrast, when I do the assemble and mount manually I see this in dmesg:

EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.

There is no entry in mtab or fstab for it at this point, because I can't even get it to assemble at boot.

I think the mtab entry will look like this once I insert it (as it was on the old system):

/dev/md0 /data ext3 rw 0 0

And I'm not sure what the fstab entry should be.

How do I get this to auto assemble and mount at boot time?

This is not elegant, but it works . . .

Since the array is RAID 1, you can instruct grub to boot the kernel from one of the individual drives rather than from the array itself, using the conventional grub syntax and device.map alignment. I have /boot on its own partition, in a RAID 1 array; sda1 + sdb1 = md0. In device.map, (hd0) is mapped to /dev/sda. And then of course in the boot stanza, the "root=" clause on the kernel line points to the root array, i.e., "root=/dev/md1". FWIW.
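For illustration, the relevant pieces on my machine look something like this (the title and the kernel/initrd paths are just placeholders):

# /boot/grub/device.map
(hd0) /dev/sda

# /boot/grub/menu.lst boot stanza
title openSUSE 11
    root (hd0,0)
    kernel /vmlinuz root=/dev/md1
    initrd /initrd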

I neglected to mention that the RAID array does not contain the boot or OS partitions. The system boots from a separate drive.

So I only need to get the raid array running as previously mentioned without regard to booting.

First of all, /etc/fstab will contain a normal mount line, except that the device will be an md device. The RAID assembly is transparent to fstab, in other words.
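For your array, assuming it is ext3 and you mount it at /data, the line would look something like:

/dev/md0   /data   ext3   acl,user_xattr   1 2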

Here's the /etc/mdadm.conf from a machine I maintain that has two RAID arrays:

DEVICE partitions
ARRAY /dev/md0 level=raid1 UUID=97a86a39:7f5ef385:863f7dc3:dc245982
ARRAY /dev/md1 level=raid1 UUID=5ef1f4fd:6a3974ed:f9c3dad2:bd169594

mdadm will read /proc/partitions to find the partitions with the correct UUIDs to assemble as directed.
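Incidentally, you don't have to copy the UUIDs by hand; mdadm can print ready-made ARRAY lines for every array it can see (the exact output format varies a bit between mdadm versions):

mdadm --examine --scan

and you can append them straight to the config file:

mdadm --examine --scan >> /etc/mdadm.conf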

And you need these settings in /etc/sysconfig/mdadm:


MDADM_SCAN=yes
MDADM_CONFIG="/etc/mdadm.conf"
BOOT_MD_USE_MDADM_CONFIG=yes

Have you double-checked at boot that the kernel is loading the md driver you need? Or whether it is choking at that point?

By the way, my setup is exactly as @ken_yap describes.

Changing /etc/mdadm.conf from

DEVICE /dev/sdb1 /dev/sdc1

to

DEVICE partitions

did not solve the problem. The other parameters you mentioned for /etc/sysconfig/mdadm are all the same as mine.

After boot, with any of the above, I cannot directly mount the raid array with:

mount /dev/md0 /data

If I try I get:

mount: you must specify the filesystem type

but even specifying auto with -t, it says basically the same thing. If I specify ext3, I get this:

mount: wrong fs type, bad option, bad superblock on /dev/md0, 
missing codepage or helper program, or other error (could 
this be the IDE device....)

and at that point I see this in dmesg output:

EXT3-fs: unable to read superblock

But if I issue:

mdadm -As /dev/md0

and then run the mount command as above, it mounts and is readable.

I guess ultimately I could stick the commands in /etc/init.d/boot.local:

mdadm -As /dev/md0
mount /dev/md0 /data

but I don't think that's an ideal solution.

In reference to the module loading at boot: how would I check?

Thx for the help thus far...

Do you have the raid modules in your initrd, as mingus725 pointed out? For example, in my case the machine in question has raid1 as one of the modules in the INITRD_MODULES variable in /etc/sysconfig/kernel. If you do not, you would have to add it, then rebuild the initrd with mkinitrd.
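Roughly like this (the rest of the module list is elided; keep whatever is already in the variable):

# /etc/sysconfig/kernel
INITRD_MODULES="... raid1 ..."

# then rebuild the initrd and reboot
mkinitrd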

The arrays are mounted by lines in /etc/fstab which look like this:

/dev/md0        /       reiserfs        acl,user_xattr 1 1

So the type and options are specified there.

It looks like your arrays have not been assembled by the time the mount is attempted, which is why you needed -As. Perhaps the -As triggered the loading of the missing module. You have to get the scan and assembly to work earlier.

PS: Strictly speaking, as you are booting off another disk, you don't need to have the raid module(s) in the initrd; they can also be in MODULES_LOADED_ON_BOOT in /etc/sysconfig/kernel. And you can check whether the raid modules are loaded before the mount with the lsmod command.
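For example:

lsmod | grep raid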

Just a bit of follow-up on @ken_yap's post . . .

My fstab entries are:

/dev/md1	/	ext3	acl,user_xattr	1	1
/dev/md0	/boot	ext3	acl,user_xattr	1	2
/dev/md2	/home	ext3	acl,user_xattr	1	2

I do not have the raid1 module built into the initrd, only raid0. I don't remember how long ago, but I noticed that I did not need raid1 in the initrd. It appears the kernel is now loading the raid1 and raid456 modules automatically, perhaps triggered by the kernel seeing that the root partition is RAID (and interestingly, I don't use levels 4, 5, or 6, but that module is loaded anyway). If you watch the kernel messages (hit Escape to drop the green screen after the boot starts) you can't miss it. Or, after booting, at the terminal command line just do:

dmesg | grep raid

or, to paginate the entire log:


dmesg | more

I looked in /etc/sysconfig/kernel and raid1 was not present. I added it to the INITRD_MODULES line just before ext3. Then I ran mkinitrd (no parameters) and it spit out info but no errors. Then rebooted.

After boot, I see in lsmod output:

raid1      43136 0

All day today (even before the initrd change) I no longer see the md0 attempts in the dmesg output. Odd, as I was seeing them yesterday and don't remember changing anything outside of what I've been documenting.

I still can't mount it without assembling it.

Here is what's in my /etc/sysconfig/mdadm:

MDADM_DELAY=60
MDADM_MAIL="root@localhost"
MDADM_PROGRAM=""
MDADM_RAIDDEVICES=""
MDADM_SCAN=yes
MDADM_CONFIG="/etc/mdadm.conf"
MDADM_SEND_MAIL_ON_START=no
MDADM_DEVICE_TIMEOUT="60"
BOOT_MD_USE_MDADM_CONFIG=yes

Here is what's in my /etc/sysconfig/kernel now:

INITRD_MODULES="processor thermal stat_nv fan jbd raid1 ext3 edd"

Here is the current /etc/mdadm.conf:

DEVICE /dev/sdb1 /dev/sdc1
ARRAY /dev/md0 UUID=5ae26599:48ff33b8:bf469295:afb91df9 auto=yes

Now I can run the assemble and mount commands via the CLI, but the assembly still doesn't appear to be happening at boot. What gives? :bangshead:

I saved the output of lsmod just after reboot, then saved it again just after issuing the assemble command with mdadm. The differences are here, but I don't know what they mean:
Before:


sd_mod   47280 8
raid1    43136 0
sata_nv  46860 7

After:


sd_mod   47280 12
raid1    43136 1
sata_nv  46860 9

Currently no entry in mtab or fstab.

Further, it's only AFTER issuing the assemble command that I see this dmesg output:


md: md0 stopped
md: bind<sdb1>
md: bind<sdc1>
raid1: raid set md0 active with 2 out of 2 mirrors

I moved raid1 from INITRD_MODULES to MODULES_LOADED_ON_BOOT (raid1 is the only one). Reran mkinitrd (no errors), and rebooted.

I see the following in dmesg output:

md: raid1 personality registered for level 1

Still can't mount right after boot. The mdadm assemble command must be executed first; then the mount command works.

I gotta be missing something obvious :\ ...

The third column lists how many users the module currently has. A value of zero means the module is loaded but not in use. So the first list shows the raid1 module loaded at that time but not being used, while the second list shows it being used by one array.

I'm wondering if there is another module (or modules) that must be loaded in the initrd. Are you quite sure that is "stat_nv" and not "sata_nv"? And you may need sd_mod in that initrd, too.

After rebuilding the initrd and booting, do the "dmesg | more" and look for the module being invoked or anything related.

Btw, I don't think the array gets assembled each time. IIRC, once that's done it's done, unless it needs to be changed.

The "raid . . . personality" line is what you should see in the kernel messages. Switching the module to "on boot" means that a modprobe is done later in the boot sequence, while in the initrd it loads with the kernel. Everything you're seeing suggests to me that there is still a module not loading at boot.

You're right, that's a typo. It is sata_nv, not stat_nv.

Wouldn't any additional modules that it needed have shown up in the lsmod output after I ran the assemble?

After boot, there is a device /dev/md0. I'm wondering if that device should be removed (i.e., is it bad)?

Isn't sd_mod for SD cards and MMC cards? This machine doesn't have either.

The kernel may be loading the module later.

sd_mod is the SCSI disk driver, and SATA devices are handled through the SCSI layer. Try adding it to your initrd; it can't hurt.
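Same drill as before (existing module list elided):

# /etc/sysconfig/kernel
INITRD_MODULES="... sd_mod ..."
mkinitrd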

Device nodes are created at boot by the kernel.

How about posting back your fstab?

/etc/fstab:


/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part5 swap                 swap       defaults              0 0
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part1 /                    ext3       acl,user_xattr        1 1
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part10 /data2               ext3       acl,user_xattr        1 2
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part7 /home                ext3       acl,user_xattr        1 2
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part9 /tmp                 ext3       acl,user_xattr        1 2
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part6 /usr                 ext3       acl,user_xattr        1 2
/dev/disk/by-id/scsi-SATA_ST3250620NS_9QE7FHEW-part8 /var                 ext3       acl,user_xattr        1 2
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0

If the device node is created at boot, it has a strange date:

brw-r----- 1 root disk 9, 0 Jun  6 18:26 /dev/md0

Where is the mounting of the array in fstab? Everything I see is an individual partition mounted by device ID. As posted earlier, the arrays need to be mounted by device name (they can also be mounted by volume label or UUID). Try changing fstab, per the examples above.

Re the md0 block device: Right, I forgot certain device types are retained. Should not be a problem unless permissions were changed.

I said there was no entry in fstab or mtab. I added this to fstab:


/dev/md0        /data       ext3        acl,user_xattr 1 2

It resulted in a hung boot (dropped to maintenance mode). There was a message on screen about not being able to read the superblock of md0, with a few more lines about what might be wrong. Couldn't capture it for pasting.

Right. That's an oops; I thought that after @ken_yap's post you had added it in. It won't mount at boot without being in fstab, so that at least addresses the "not at boot" question.

So it appears the remaining issue is md not being able to read the superblock. It may be that there is something about the array specific to the machine you removed it from that renders it invalid on the new machine. Take a look at the mdadm man page re the HOMEHOST setting; IIRC that gets written into the superblock, and perhaps there is more. There are mdadm commands which rebuild the superblock; I would try that next. If all that fails, since this is a RAID 1 array (i.e., the partitions are identical), I would break the array altogether and recreate it (after having a verified backup). YaST can create arrays, but I don't know if it can do so from unformatted partitions (synchronization); that may have to be done from the command line.
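If you do end up recreating it, the command-line sketch would be something like this (destructive: it wipes the md superblocks, so only after that verified backup; devices as in your posts):

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdc1
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1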

Agreed. The thing that confuses me is that it assembles and mounts manually with no errors about the superblock.

Consider this from the mdadm man page:


When using Auto-Assemble, only arrays tagged for the given homehost will be assembled.

I will look at mdadm HOMEHOST as mentioned, but I think I'm just going to back the data up to the /data1 partition on the boot drive (there's plenty of room), then drop and recreate the array. I would want to back it up again anyway before messing with the superblock, and I'm getting tired of messing with it.
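If I do try the superblock route first, it looks from the man page like assemble can rewrite the homehost; something like this (untested on my end, and the hostname is just a placeholder):

mdadm --assemble /dev/md0 --update=homehost --homehost=newhost /dev/sdb1 /dev/sdc1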

Thx for all the help so far. I'll report back with the outcome.