DMRAID device numbers are different in grub than in suse 11.1

Strange problem with suse 11.1. DMRAID devices are numbered differently in grub than they are once suse has booted.

I have the following drive layout:
first dmraid set (raid1):
128MB ext3 /boot
rest of drive as LVM

second dmraid set (raid1):
whole drive as LVM

I previously had openSUSE 11.0 on this machine in the same configuration and it worked fine. Today I did a clean install of 11.1, ran zypper up, and rebooted. With the net-install CD still in the drive, it defaults to booting from the hard disk and the system comes up just fine. When I take the CD out, grub complains with “(hd0,0)/message file not found”, and the grub menu then appears without the normal green background. When I select a kernel to boot, it fails with Error 17 and reports that (hd0,0) is not a valid ext3 partition (partition type 0x8e).

I was able to find a workaround, but it is strange enough that I wanted to post it, both so that others can benefit from the morning I spent on this and so that perhaps a bug can be identified.

Here we can see that menu.lst defaults to the first entry (0), which lives on (hd0,0).
c07b00:/boot/grub # cat /boot/grub/menu.lst

# Modified by YaST2. Last modification on Mon Mar 23 14:27:48 EDT 2009

default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: xen###
title Xen -- openSUSE 11.1 - 2.6.27.19-3.2
root (hd0,0)
kernel /xen.gz
module /vmlinuz-2.6.27.19-3.2-xen root=/dev/vg00/dom0_root resume=/dev/vg00/dom0_swap splash=silent showopts vga=0x314
module /initrd-2.6.27.19-3.2-xen

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 11.1 - 2.6.27.19-3.2
root (hd0,0)
kernel /vmlinuz-2.6.27.19-3.2-default root=/dev/vg00/dom0_root resume=/dev/vg00/dom0_swap splash=silent showopts vga=0x314
initrd /initrd-2.6.27.19-3.2-default

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 11.1 - 2.6.27.19-3.2
root (hd0,0)
kernel /vmlinuz-2.6.27.19-3.2-default root=/dev/vg00/dom0_root showopts ide=nodma apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x314
initrd /initrd-2.6.27.19-3.2-default

This shows that (hd0) is defined as /dev/disk/by-id/raid-ddf1_0 (and (hd1) as raid-ddf1_1):
c07b00:/boot/grub # cat device.map
(hd0) /dev/disk/by-id/raid-ddf1_0
(hd1) /dev/disk/by-id/raid-ddf1_1
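If you want to double-check the mapping from the running system, the grub shell can be asked which device it thinks holds the files grub needs (this goes through device.map, so it shows the Linux-side ordering; in my case the mismatch only shows up in the boot-time grub). Something like:

c07b00:/boot/grub # grub --batch <<EOF
find /message
find /vmlinuz-2.6.27.19-3.2-default
quit
EOF

find prints every (hdX,Y) that contains the named file.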

Here I list all of my /dev/disk/by-id/raid* devices; you can see that there are two partitions on ddf1_0 and one partition on ddf1_1. This matches the disk layout I described at the top of this post:
c07b00:/boot/grub # ls -l /dev/disk/by-id/raid*
lrwxrwxrwx 1 root root 10 2009-03-23 12:00 /dev/disk/by-id/raid-ddf1_0 -> ../../dm-0
lrwxrwxrwx 1 root root 10 2009-03-23 12:00 /dev/disk/by-id/raid-ddf1_0-part1 -> ../../dm-2
lrwxrwxrwx 1 root root 10 2009-03-23 12:00 /dev/disk/by-id/raid-ddf1_0-part2 -> ../../dm-3
lrwxrwxrwx 1 root root 10 2009-03-23 12:00 /dev/disk/by-id/raid-ddf1_1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 2009-03-23 12:00 /dev/disk/by-id/raid-ddf1_1-part1 -> ../../dm-4
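If you want to see how those dm-# devices map back to the dmraid sets, dmraid and dmsetup can show it directly (output omitted here):

c07b00:/boot/grub # dmraid -s          # summary of the active RAID sets
c07b00:/boot/grub # dmsetup ls --tree  # dm devices and how they stack on each other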

Just to be sure of what is on the partitions, here is the output of file -s for each dm-# device:
c07b00:/boot/grub # file -s /dev/dm-0
/dev/dm-0: x86 boot sector; partition 1: ID=0x83, active, starthead 1, startsector 63, 273042 sectors; partition 2: ID=0x8e, starthead 0, startsector 273105, 976221855 sectors

c07b00:/boot/grub # file -s /dev/dm-1
/dev/dm-1: x86 boot sector; partition 1: ID=0x8e, active, starthead 1, startsector 63, 976494897 sectors

c07b00:/boot/grub # file -s /dev/dm-2
/dev/dm-2: Linux rev 1.0 ext3 filesystem data (needs journal recovery)

c07b00:/boot/grub # file -s /dev/dm-3
/dev/dm-3: LVM2 (Linux Logical Volume Manager) , UUID: QvbdqflmuBfuKChq4KRQ3FFmzJtd5tg

c07b00:/boot/grub # file -s /dev/dm-4
/dev/dm-4: LVM2 (Linux Logical Volume Manager) , UUID: nmEYFuE4J3PepRHev5MtiRB7mBgM2qf
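blkid gives a quicker view of the same thing if you prefer, printing the filesystem type and UUID for each device (output omitted):

c07b00:/boot/grub # blkid /dev/dm-2 /dev/dm-3 /dev/dm-4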

c07b00:/boot/grub # grep boot /etc/fstab
/dev/mapper/ddf1_0_part1 /boot ext3 acl,user_xattr 1 2

The above demonstrates that my only ext3 partition is /boot on the first dmraid set (raid-ddf1_0-part1, i.e. dm-2), which is exactly what grub SHOULD be, and IS, configured to use.
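If you are not sure that /dev/mapper/ddf1_0_part1 (the name in fstab) and /dev/disk/by-id/raid-ddf1_0-part1 really are the same device, comparing major:minor numbers settles it; both should show the numbers of dm-2 (the exact mapper name is just what dmraid/udev created on my box, so yours may differ slightly):

c07b00:/boot/grub # stat -Lc '%t:%T %n' /dev/mapper/ddf1_0_part1 /dev/dm-2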

So if it is all configured correctly, what is the problem? After the failed boot, while sitting at the grub menu, I hit ‘e’ and looked at what partitions were available. What was really weird is that hd0 had one partition and hd1 had two: compare that with the listings above and you will see it is backwards. At this point in the boot, the dmraid devices come up in a different order, so (hd0,0) points to an LVM partition instead of my ext3 /boot partition, which also explains the 0x8e partition type in the Error 17 message.
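For anyone who wants to poke at this themselves: from the grub menu you can press ‘c’ for a command line and inspect the drives directly, something along these lines:

grub> geometry (hd0)
grub> geometry (hd1)
grub> find /message

geometry lists each partition with its partition type, and find shows which (hdX,Y) actually contains the /boot contents at that point in the boot.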

To solve this problem (well, more of an ugly workaround than a solution, really), I tell grub to boot from (hd1,0), which gets me back into suse, and then change my menu.lst to reference (hd1,0). Now it reboots with no problem. However, if you were to look at my configuration files afterwards, they look broken, because once suse comes up the order has switched back and hd0 is again the disk with the ext3 partition.
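Concretely: at the failed boot you can press ‘e’ on the menu entry, edit the root line to (hd1,0), and boot it with ‘b’; the permanent version of the workaround is the same change in menu.lst. The first stanza would end up looking roughly like this (the gfxmenu line presumably needs the same change if you want the green background back):

gfxmenu (hd1,0)/message

title Xen -- openSUSE 11.1 - 2.6.27.19-3.2
root (hd1,0)
kernel /xen.gz
module /vmlinuz-2.6.27.19-3.2-xen root=/dev/vg00/dom0_root resume=/dev/vg00/dom0_swap splash=silent showopts vga=0x314
module /initrd-2.6.27.19-3.2-xen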

The same machine in the same configuration did not have this problem with 10.3 or 11.0.

Hopefully this helps someone.
