grub version disk numbering

I’m having trouble getting a system to boot. Specifically, I think the MBR is trashed and I don’t know how to repair it. I’m scared to experiment :(

The system has a PATA DVD drive, a SATA drive via a motherboard connector and two 3ware RAID controllers. I can boot systems from the DVD drive but when I try to boot from the hard drive grub says Error 2 (which is a stage 1.5 error, I think).

The m/b SATA drive has both a 10.3 and an 11.1 system installed, on partitions 2 and 3 respectively. Partition 1 is a swap partition. The RAID arrays are whole-disk LVM systems containing data. I want to run the 10.3 system. The BIOS is set to boot from the DVD then the m/b SATA, and NOT to boot from the RAIDs.

Two issues worry me: grub’s disk numbering and grub version dependency. If I boot into the rescue system on the 10.3 install DVD it shows the partitioned disk as /dev/sda. If I boot into the rescue system on the 11.1 install DVD it shows the partitioned disk as /dev/sdc. I suppose that means something in the kernel changed?

If I run grub in the 11.1 rescue system, I can say “find /boot/grub/menu.lst” and it reports two files on (hd2,1) and (hd2,2). If I run grub in the 10.3 rescue system it says error 15 file not found.

I’m scared to use 11.1 grub to setup hd2 because when I’ve tried previously I’ve managed to overwrite the LVM metadata on one of the RAID arrays.

Is there some way I might get the find command in the 10.3 grub to work? Or is there some non-destructive way to test what the 11.1 grub will do?

I’ve read several how-tos, forum messages, grub manual etc but am still confused!

Thanks in advance, Dave

(hd2,1) and (hd2,2) are the partitions 2 and 3 you refer to.

If you feel unsure, examine the menu.lst files in /boot/grub/ on the respective partitions and see what entries the installed 10.3 and 11.1 systems have in the menu for booting.
See if the hd references in the boot entries match your findings with the respective DVDs.

Also, why not get Parted Magic and see how that reads it
Downloads

Using Parted Magic an Introduction - openSUSE Forums

Re-Install Grub Quickly with Parted Magic - openSUSE Forums

Anyway, the code for grub on the MBR is only concerned with the location of the menu, not its content.
And my guess is, your current grub is/was installed by 11.1 and has an entry to boot 10.3 - But your 10.3 menu probably doesn’t have an entry for 11.1
So it’s the 11.1 menu you should consider when doing the code for installing grub to the MBR
Which if I read correctly is: (hd2,2)

But with Parted Magic you can mount those partitions and read the files just to make sure.
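
A minimal sketch of that check from a live session, assuming the 11.1 root partition shows up as /dev/sdc3 (as it does in the 11.1 rescue system; Parted Magic may name it differently) and that /mnt is free. `show_boot_entries` is a hypothetical helper name, not part of any tool. Mounting read-only keeps the check non-destructive.

```shell
# Hypothetical helper: print only the title/root/kernel lines of a menu.lst.
# These are the lines that carry the (hdX,Y) references and Linux device names.
show_boot_entries() {
    grep -E '^[[:space:]]*(title|root|kernel)[[:space:]]' "$1"
}

# Example use (device name is an assumption; mounted read-only so nothing is written):
#   mount -o ro /dev/sdc3 /mnt
#   show_boot_entries /mnt/boot/grub/menu.lst
#   umount /mnt
```

Repeating this for partition 2 lets you compare the 10.3 and 11.1 menus side by side.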

Yes, that I do understand.

If you feel unsure, examine the menu.lst files in /boot/grub/ on the respective partitions and see what entries the installed 10.3 and 11.1 systems have in the menu for booting.
See if the hd references in the boot entries match your findings with the respective DVDs.

I’m not sure I understand you. My problem is with loading grub itself, not with the menu.lst. So I’m not sure what looking in menu.lst will tell me? The hd numbers in the menu.lst may be wrong and are in any case affected by the device.map files, I think? I want to go back to first principles and install grub without depending on linux, then when I have a working grub I can tweak the contents of menu.lst and/or device.map to boot the systems I want with the appropriate linux device names.
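
For reference, device.map is a plain table from GRUB drive names to Linux device files, and the grub shell run under Linux consults it to decide what (hd0) means. A sketch of a map that pins the motherboard SATA disk to (hd0), assuming the BIOS boots that disk first and that the running kernel calls it /dev/sdc (as the 11.1 rescue system does here):

```
(hd0)   /dev/sdc
```

As far as I understand GRUB Legacy, a grub shell started with this map should only know about the drives listed in it, so leaving the RAID devices out is deliberate. Treat that as an assumption to verify rather than a guarantee.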

Also, why not get Parted Magic and see how that reads it
Downloads

Using Parted Magic an Introduction - openSUSE Forums

Re-Install Grub Quickly with Parted Magic - openSUSE Forums

I was just about to try SuperGrubDisk, which I think may be what Parted Magic uses? I’m not familiar with PartedMagic; how does it run grub - does the grub use BIOS calls directly or are they emulated through a linux kernel or somesuch? It’s exactly that step I don’t trust.

Anyway, the code for grub on the MBR is only concerned with the location of the menu, not its content.
And my guess is, your current grub is/was installed by 11.1 and has an entry to boot 10.3 - But your 10.3 menu probably doesn’t have an entry for 11.1
So it’s the 11.1 menu you should consider when doing the code for installing grub to the MBR
Which if I read correctly is: (hd2,2)

But with Parted Magic you can mount those partitions and read the files just to make sure.

I don’t think the MBR is in a partition? And as I say, I don’t trust the hd numbers returned by linux kernels since they seem to differ between kernel versions. I need to be sure the hard disk number is that seen directly by the grub I’m going to use, which is most likely the 11.1 one as you say.

Thanks, Dave

Parted Magic is a live Linux CD, I think the menu also has a supergrubdisk option.
You call grub from a RoxTerm on the running live session.

I don’t think the MBR is in a partition?
I didn’t say that did I?!
:) I know what the MBR is.

I want to go back to first principles and install grub without depending on linux
You will have to explain this to my limited understanding.
===============================

BTW
I’m not sure I would spend too much time on 10.3, as its support has ended.

I do not think you have a grub stage 1 (MBR) problem. So reinstalling grub stage 1 likely will not fix problem but might give an error to help pin point the real problem. Stage 1 has done its job and has passed control of booting to stage 1.5. (My opinion due to error #2)

You could have a corrupted stage1.5 or stage 2 file,(?) or menu.lst has an error.

I can understand your concern with 10.3 and 11.1 not seeing the HDDs the same way and 10.3 not finding menu.lst, but I have no help to offer.

Note for below: I have never used raid so advice may be bad.
Can you disconnect the raid easily, if so might do so. ( I would not boot past menu.lst to be on the safe side )

Grub doesn’t rely on any Linux kernel. It is not Linux specific. You don’t need Linux to use Grub. But you need a partition formatted with a filesystem Grub is able to read, to hold the Grub files (everything under /boot/grub in Linux). I’ve been using Grub on FreeBSD and NetBSD systems without any Linux installed.

And as I say, I don’t trust the hd numbers returned by linux kernels since they seem to differ between kernel versions.

This is wise. I don’t either. The order in which different I/O controllers are searched is compilation-dependent, and so are the hd numbers.

That’s what I was afraid of - so it’s using the linux kernel to access the disks.

I didn’t say that did I?!
:) I know what the MBR is.

Well you said it was (hd2,2)

I’m not sure I would spend too much time on 10.3, as its support has ended.

Indeed but 11.1 was a dog. I’ll try again with 11.2 real-soon-now but my priority at the moment is to get this server online again.

I agree. I hoped that reinstalling grub would fix the problem wherever it was.

You could have a corrupted stage1.5 or stage 2 file,(?) or menu.lst has an error.

Since stage 1.5 is the one reporting the error (which I read as a device-does-not-exist error number), I guess the failure comes before stage 2 and definitely before menu.lst. A bad menu.lst will normally produce at least a grub command line. I’ve also read things that talk about possible version incompatibilities between the MBR (and a possible embedded stage 1.5) and what’s on disk in the filesystem. I don’t know whether that’s possible, but it’s another reason to want to do a complete grub install.

I can understand your concern with 10.3 and 11.1 not seeing the HDDs the same way and 10.3 not finding menu.lst, but I have no help to offer.

Thanks for the sympathy anyway!

Note for below: I have never used raid so advice may be bad. Can you disconnect the raid easily, if so might do so. ( I would not boot past menu.lst to be on the safe side )

I haven’t found an easy way. I can remove all the drives, which will guarantee I don’t corrupt them again, I suppose.

Cheers, Dave

This is sort of true but not the whole truth, AFAIK. When you invoke grub from a linux command-line, which is what happens in the rescue system and appears to be the case with Parted Magic, I believe grub does not access the disks via the BIOS but instead goes via the kernel, which exposes you to possible device numbering inconsistencies. That’s why I want to find a way to run a freestanding grub.
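
A hedged sketch of a non-destructive probe with the Linux-hosted grub shell, assuming GRUB Legacy 0.97, whose manual lists `--batch`, `--read-only`, `--no-floppy` and `--device-map=FILE` as grub shell options. `grub_probe_readonly` is a hypothetical wrapper name, and the device map path in the usage comment is only an example.

```shell
# Sketch: ask the Linux-hosted GRUB Legacy shell where its files live,
# with writing to any disk disabled (--read-only).
grub_probe_readonly() {
    # $1 (optional): device map file to use instead of GRUB's own probing
    if [ -n "$1" ]; then
        set -- --device-map="$1"
    else
        set --
    fi
    grub --batch --read-only --no-floppy "$@" <<'EOF'
find /boot/grub/menu.lst
find /boot/grub/stage1
quit
EOF
}

# Example use (path is an assumption):
#   grub_probe_readonly /tmp/device.map
```

The (hdX,Y) answers still come from the kernel's view of the disks, so this only shows what that particular grub shell would target. It does not prove what the BIOS will call the drive at boot time.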

Cheers, Dave

No.
However a Grub entry like the following one is possible :


root (hd0,5)
kernel /boot/vmlinuz-2.6.31.8-0.1-desktop root=/dev/sdb6

The first line defines the partition to boot from. It uses BIOS numbering. The second line boots the Linux kernel. It uses Linux device names. The Linux and BIOS numbering differ in this example. That’s not Grub’s fault. BTW, I’m not sure that Grub was originally developed on Linux. I guess it was a Hurd project, at a time when Linux was still using LILO.

Grub is freestanding. The fact that you install it from Linux doesn’t mean that you need Linux to run it. What you need is the partition where you install it and, of course, the Grub part in the MBR, not the Linux kernel. Try removing the Linux kernel and you will see that Grub continues to boot or chainload anything else, provided you tell it the right HD/partition number of the OS you want to boot.

I’m not sure what you meant by ‘No’ in your previous message but I suspect you meant that I was wrong to say ‘I believe grub does not access the disks via the BIOS but instead goes via the kernel’. If so, please read section 15.1 of the GNU GRUB Manual 0.97 - where it says:
‘You can use the command grub for installing GRUB under your operating systems and for a testbed when you add a new feature into GRUB or when fixing a bug. grub is almost the same as the Stage 2, and, in fact, it shares the source code with the Stage 2 and you can use the same commands (see Commands) in grub. It is emulated by replacing BIOS calls with UNIX system calls and libc functions.’

What you say is true about the version that is installed in the MBR and used to boot the system. It is not true about the version of grub that is run when you invoke it from the linux command-line. And so using the version run within linux to install the version in the MBR runs the risk of the drive numbering mismatch that can sometimes occur.

If you do an strace of a grub process run from the linux command-line, you can see it making lots of _llseek and read calls to the system when you ask it to find a file.
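
A sketch of how such a trace could be captured, assuming strace and the GRUB Legacy grub shell are both available in the running system. `trace_grub_find` and the default log path are hypothetical names. The `desc` class selects file-descriptor syscalls such as open, read and the seek calls (newer strace releases also spell it `%desc`), and `--read-only` is there so the traced grub cannot write to a disk.

```shell
# Sketch: trace a read-only grub shell while it searches for menu.lst,
# then list the device nodes it opened through the kernel.
trace_grub_find() {
    # $1 (optional): where to write the strace log (default is a hypothetical path)
    log="${1:-/tmp/grub-find.trace}"
    strace -f -e trace=desc -o "$log" \
        grub --batch --read-only --no-floppy <<'EOF'
find /boot/grub/menu.lst
quit
EOF
    # Keep only open()/openat() calls on /dev entries
    grep -E 'open(at)?\(' "$log" | grep '/dev/'
}

# Example use:
#   trace_grub_find /tmp/grub-find.trace
```

The device nodes in the output are the ones the kernel handed to grub, which is the mapping that can differ between the 10.3 and 11.1 rescue systems.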

Cheers, Dave

That’s what I did in the end. I pulled out all the drives, reinstalled grub from the 11.1 rescue system, edited the menu.lst to include four entries - two for the 10.3 system and two for the 11.1 system - one entry from hd0 and one from hd2. Hopefully that will protect me against any other weirdness later. And now I’ve managed to boot the 10.3 system! Hurrah!
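
For anyone following along, a sketch of what such a four-entry menu.lst could look like. This is not the actual file: the titles, the /boot/vmlinuz and /boot/initrd names, and the root= devices are all assumptions based on the partition layout described earlier (10.3 on partition 2, 11.1 on partition 3) and would need checking against the real system.

```
# 10.3, disk enumerated as the first drive
title openSUSE 10.3 (hd0)
    root (hd0,1)
    kernel /boot/vmlinuz root=/dev/sda2
    initrd /boot/initrd

# 10.3, same partition if the disk is enumerated third
title openSUSE 10.3 (hd2)
    root (hd2,1)
    kernel /boot/vmlinuz root=/dev/sda2
    initrd /boot/initrd

# 11.1, disk enumerated as the first drive
title openSUSE 11.1 (hd0)
    root (hd0,2)
    kernel /boot/vmlinuz root=/dev/sdc3
    initrd /boot/initrd

# 11.1, same partition if the disk is enumerated third
title openSUSE 11.1 (hd2)
    root (hd2,2)
    kernel /boot/vmlinuz root=/dev/sdc3
    initrd /boot/initrd
```

Only the root (hdX,Y) line changes between the two variants of each system. The root= argument is a Linux device name and follows the kernel being booted, not GRUB's numbering.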

Thanks everybody,
Dave

Removing the kernel would show you that Grub doesn’t use it to access the disks (of course, you won’t be able to boot that kernel anymore). It doesn’t mean that Grub doesn’t replace BIOS calls with its own (UNIX) system calls, as any operating system does. In fact, the Grub shell is a kind of mini operating system. My point is that once Grub is installed, it doesn’t use Linux to access the partition you choose to boot. If it doesn’t find a kernel, it will give you an error, but by that point it will already have accessed the disk.

What you say is true about the version that is installed in the MBR and used to boot the system. It is not true about the version of grub that is run when you invoke it from the linux command-line.

This is correct. But grub, update-grub, grub-install or whatever you invoke from the Linux command-line are just Linux utilities needed to install Grub under Linux. You have to install Grub somehow.

And so using the version run within linux to install the version in the MBR runs the risk of the drive numbering mismatch that can sometimes occur.

Maybe, if you use Linux device names instead of Grub syntax. You can use both.