How to make both RAID1 disks bootable using GRUB2

Hi,
I have a very basic setup of two disks, sda and sdb, in RAID1, partitioned as md0 /, md1 /home and md2 swap.
The system uses a legacy BIOS.

I want to make sdb bootable so that, if sda fails, the system can simply boot off sdb without the need for faffing about with a rescue disk.

I read in many posts that all that was needed was to copy the first sector (512 bytes) of sda to sdb using dd if=/dev/sda of=/dev/sdb bs=512 count=1. I seem to recall using that method successfully myself some years ago.

Having done that, I pulled the power cable from sda to simulate a failure, expecting that on restart it would boot from sdb. Wrong: all I get is the word GRUB endlessly repeating and filling the entire screen.
I suspect those instructions are meant for the original GRUB only, and that more is now required to make it work with the newer and more complex GRUB2.

This is where it gets weird… I shut down, reconnect sda and power up, expecting it to boot from sda as normal. However, once again all I get is the word GRUB endlessly filling the screen… huh???
Seemingly it is still trying to boot from the non-bootable sdb.

The only way I can get it to boot at this point is to disconnect sdb; it then happily boots from sda again. If I then shut down and reconnect sdb, everything is back to normal and it boots fine again… weird!

Since then I have had a look in the YaST bootloader module and I see there is an option to ‘enable RAID redundancy’ for GRUB, which I selected in hope, but it has made no difference at all. The system still will not boot from sdb after an sda failure, and once again I have to go through the whole performance of disconnecting sdb, booting from sda, then shutting down and re-adding sdb.

In a real situation where sda fails, not being able to boot from sdb rather reduces the usefulness of RAID, so I would like to make this work if at all possible.
Any help or ideas to fix this, or explanations for the weird behaviour I’m seeing, would be greatly appreciated.

Thanks in advance
Chris

I should also add that during the OS install I chose to install GRUB in the MBR. A first attempt using the offered option of installing it on /dev/md0 left me with a non-booting system. Why even offer this option if it isn’t going to work?

To begin with, I am not an expert at all. But my idea would be that one should tell the BIOS that it should use sdb instead of sda (the BIOS has its own notation for the disks, of course). Copying the MBR together with the partition table (which, by the way, assumes that both partition tables are exactly the same, otherwise you introduce a problem; general question: wouldn’t copying only the first 446 bytes be better?) is IMHO meant to ensure that GRUB works, but the BIOS comes first.

Or at least the BIOS must have the boot sequence first disk, second disk, … configured.
Just an idea.
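On the 446-versus-512-byte question: sector 0 of an MBR disk holds 446 bytes of boot code, then the 64-byte partition table (four 16-byte entries), then the 2-byte 0x55AA boot signature. Below is a minimal sketch of copying only the boot code, rehearsed on two scratch image files rather than on real disks; disk-a.img and disk-b.img are made-up stand-ins for sda and sdb. It copies sector 0 and nothing else, which in this thread was evidently not enough on its own to make sdb bootable.

```shell
#!/bin/sh
# Sketch: rehearse the MBR copy on scratch image files, not real disks.
# disk-a.img / disk-b.img are hypothetical stand-ins for /dev/sda / /dev/sdb.
set -eu

# Two 1 MiB "disks" (2048 sectors of 512 bytes): one random, one all zeros.
dd if=/dev/urandom of=disk-a.img bs=512 count=2048 2>/dev/null
dd if=/dev/zero    of=disk-b.img bs=512 count=2048 2>/dev/null

# Sector 0 layout: bytes 0-445 boot code, 446-509 partition table,
# 510-511 the 0x55AA signature.
# Copy only the boot code. conv=notrunc stops dd from truncating the
# target image file (it makes no difference on a real block device).
dd if=disk-a.img of=disk-b.img bs=446 count=1 conv=notrunc 2>/dev/null

# The full-sector variant would use bs=512 count=1 instead; that also
# overwrites disk-b's partition table with disk-a's, which is only
# harmless when both tables are identical.
```

On real hardware the equivalent would be dd if=/dev/sda of=/dev/sdb bs=446 count=1, run as root and only after double-checking which disk is which.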

Hi, thanks for your thoughts.

Both partition tables should be exactly the same: the system consists of two mirrored 250 GB discs in RAID1.
I don’t believe the BIOS is the issue here. With sda (sata0) removed, the BIOS automatically selects the next available hard disc (sata1) to boot from. Were that not the case, I would not be seeing the word GRUB endlessly printing on my screen.

Chris

Let us check the GRUB configuration to see whether it points to one of the disks instead of the mirror devices.

Again, I am no GRUB2 expert, but the configuration (generated, do not edit it!) seems to be in /boot/grub2/grub.cfg. You can only read that as root.

When I do

grep '/dev/disk/' /boot/grub2/grub.cfg

I get several references such as

/dev/disk/by-id/ata-Hitachi_HDT725032VLA380_VFJ201R23XUEXW-part1

That is clearly pointing to my disk. How is that in your case?
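A rough sketch that extends the grep above so it prints each distinct device path a GRUB config mentions (by-id, by-uuid or by-label links, md arrays, or plain sdX names). That makes it easier to spot a config tied to one physical disk rather than to the mirror. The function name list_disk_refs and the regex are my own assumptions, not part of GRUB; pass the config path, e.g. /boot/grub2/grub.cfg for GRUB2 or /boot/grub/menu.lst for GRUB legacy.

```shell
#!/bin/sh
# Sketch: list the distinct device paths referenced in a GRUB config file.
# list_disk_refs is a hypothetical helper name; $1 is the config path.
list_disk_refs() {
    # -o prints only the matched text, -E enables the alternation.
    # Matches stop at whitespace or a double quote.
    grep -oE '/dev/(disk/by-[a-z]+/[^[:space:]"]+|md[0-9]+|sd[a-z][0-9]*)' "$1" |
        sort -u
}
```

For example, as root: list_disk_refs /boot/grub/menu.lst. A list showing only /dev/md… entries points at the mirror; a /dev/disk/by-id/… entry is tied to one specific drive.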

Ahh OK, so the first thing I now realise is that I seem to have GRUB and not GRUB2! I could have sworn SUSE 11.4 used GRUB2…

So what I actually have is /etc/grub.conf:

setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,0)
setup --stage2=/boot/grub/stage2 --force-lba (hd1) (hd1,0)

This looks OK to me, as it seems to suggest that if hd0 (sda) is not present, it will move on to hd1 (sdb) and should then boot from that…

Chris

Hm, you did not tell us at all which version of openSUSE you use, let alone that it is a rather old (and unsupported) version. >:)

I doubt that anything in /etc/grub.conf is used at boot time. The only things "known" at boot time are in /boot/grub. And if I remember correctly from the GRUB days long ago, menu.lst and device.map in there are the crucial configuration files.

Yes, I’m using 11.4 because this PC will be running a license server, and tbh 11.4 was one of the most stable SUSE versions I ever ran. In fact I was running 11.4 until a few months ago on my personal machine, which I upgraded to 13.1, and I have regretted it ever since…

However, I found this in another thread:

1) Find the stage1 file:

grub> find /boot/grub/stage1
 (hd0,0)
 (hd1,0)
grub>

The output could be different, depending on the partition where /boot is located.

2) Assuming your disks are /dev/sda (hd0) and /dev/sdb (hd1) and you have GRUB installed in the MBR of /dev/sda, do the following to install GRUB into the MBR of /dev/sdb:

grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)

This tells GRUB to treat /dev/sdb as hd0 (the first disk in the system).
Thus, if the first fails, the second will play the role of the first one, and so the MBR will be correct.

This worked, and the system now boots from sdb with sda disconnected. Of course mdadm reports that the array is running degraded with sda as the sole drive, so it really has fooled the system into thinking that sdb is sda!

Cheers
Chris

In a RAID it would probably be better to put generic boot code in the MBR, install GRUB in the boot partition and set the boot flag on that partition. You also need to use a disk ID that is not manufacturer-specific; maybe use labels so that GRUB does not point to the wrong drive.

My >:) above was not so much about the fact that you may have reasons to run 11.4; it was about the fact that you did not tell us, your potential helpers, that you run 11.4, thus more or less making fools of us.

Please, next time you ask something, do not forget to include the most basic pieces of information. We are all doing this in our spare time and like to spend that spare time efficiently.