My computer had a single IDE drive. I recently added a SATA drive. For a while everything was working as it should, and I was able to access the partitions on the SATA drive.
Then I did some kind of update – not sure what – and everything collapsed. I can get Grub to start and I can even boot into Windows, but when I try to boot into Linux the system can’t find the basic modules needed to do anything (the ones that are called at the very beginning of the boot process).
Part of the problem seems to be that after I added the SATA drive, the IDE drive became /dev/sdb and the new SATA drive became /dev/sda. The /etc/fstab file became totally confused, with the IDE partitions listed under the /dev/disk/by-id names of the SATA drive. And it appears that the GRUB menu file, menu.lst, was also affected.
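To see which physical drive each by-id name currently points at, the symlinks can be listed from any shell that has /dev populated (the Rescue System does). This is only a sketch: the function name list_by_id is made up for illustration, and it assumes the usual udev layout in which each by-id entry is a relative symlink such as ../../sdb1.

```shell
# Print "stable-name -> device" for each symlink in a by-id style directory.
# The directory argument defaults to /dev/disk/by-id.
# list_by_id is a hypothetical helper name, not a standard tool.
list_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        target=$(readlink "$link")          # typically something like ../../sdb1
        printf '%s -> /dev/%s\n' "${link##*/}" "${target##*/}"
    done
}
```

Running list_by_id with no argument and comparing the output against the first column of /etc/fstab shows whether an entry names the drive you think it does.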
The difficult thing about straightening this out is that I don’t have a working system, so the normal tools are inaccessible. I tried booting from the distribution DVD and attempting to repair the system by modifying /etc/fstab, but the entries seemed to be missing. I tried using the rescue system and editing /etc/fstab, but that didn’t help either.
So what tools should I be using to untangle this mess?
Why did you not use the Repair System on the DVD? It tries to figure out how to straighten out your GRUB and fstab rather than giving you a shell to edit them yourself.
BTW, at some point the repair may say that it failed. This appears to be a Repair System bug; just ignore it and continue, because the system really is repaired.
By using the Rescue System I determined that all the partitions are intact, so I certainly don’t want to reinstall and lose everything.
I tried to use the Repair System (from an 11.0 DVD rather than 11.1, but I assume that for this problem it doesn’t matter). The dialog for fstab changes showed an incorrect mount point (twice), but when I clicked on CHANGE, the repair program just went on to the next screen without giving me any way of modifying it. And after finding grub, the root device, and the boot device, the Repair System simply hung in a wait state. So the Repair System may not be usable for this problem.
A particular annoyance was that in order to operate on fstab or the boot loader, the Repair System insisted on checking all the filesystems, which took a very long time. I wish there was a way to tell it not to do that, particularly since it happens every time I try a different variant on the repair.
I’m beginning to conclude that the only way to fix this is to work through the Rescue System, mount the necessary partitions there, and edit the critical files /etc/fstab and /boot/grub/menu.lst.
But my attempts with that approach have so far not succeeded. I edited menu.lst to refer only to hd1 and /dev/sdb, but when I called grub-install, I got this (transcribed manually):
grub> setup --stage2=/boot/grub/stage2 --force-lba (hd0,2) (hd0,2)
Error 21: Selected disk does not exist
Recall that sda (=hd0) is now the SATA disk and sdb (=hd1) is now the IDE disk – which was sda until I installed the SATA disk.
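For anyone following along, GRUB legacy numbers both disks and partitions from zero, so a Linux name like /dev/sdb3 corresponds to (hd1,2) when device.map lists /dev/sdb as (hd1). The sketch below does that translation from a device.map file. The function name to_grub_name and the example partition /dev/sdb3 are made up for illustration, and it assumes device.map uses plain /dev/sdX names rather than by-id paths.

```shell
# Translate a Linux partition name into GRUB legacy notation using a device.map file.
# Usage: to_grub_name /dev/sdb3 /boot/grub/device.map
# to_grub_name is a hypothetical helper name; /dev/sdb3 is only an example.
to_grub_name() {
    part="$1"
    map="$2"
    disk=$(printf '%s\n' "$part" | sed 's/[0-9]*$//')     # /dev/sdb3 -> /dev/sdb
    num=$(printf '%s\n' "$part" | sed 's/^.*[^0-9]//')    # /dev/sdb3 -> 3
    # device.map lines look like: (hd1)   /dev/sdb
    hd=$(awk -v d="$disk" '$2 == d { gsub(/[()]/, "", $1); print $1 }' "$map")
    # Fail if the disk is not in the map or no partition number was given.
    [ -n "$hd" ] && [ -n "$num" ] || return 1
    # GRUB legacy counts partitions from 0, Linux from 1.
    printf '(%s,%s)\n' "$hd" "$((num - 1))"
}
```

If the function fails for the disk you care about, the disk is missing from device.map, which is one way to end up with "Error 21: Selected disk does not exist".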
When I boot up, the system never even gets to the point where it creates the /dev files – but it does seem to be booting from the right partition. Without the /dev files, I don’t think any mounting is possible.
I’m wondering if somewhere there’s an assumption built in that the boot device is /dev/sda rather than /dev/sdb. The fact that fstab shows the disk ID for the SATA disk as being ATA and the disk ID for the IDE disk as being SATA does seem to indicate a bug somewhere.
Perhaps you should fix it by changing the device names in /boot/grub/menu.lst and /etc/fstab to the normal pathnames (/dev/sdb1, etc.) and worry about getting the right by-id path after you have booted up.
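A minimal sketch of that substitution, meant to be run from the Rescue System against the mounted copies of the files. The function name use_plain_names, the ID ata-EXAMPLE_IDE_DISK and the /mnt mount point are all placeholders, and it assumes by-id partition names end in -partN. It prints the result rather than editing in place, so you can inspect it before overwriting anything.

```shell
# Replace /dev/disk/by-id/<id>-partN with <disk>N everywhere in a file,
# printing the rewritten text to stdout (the input file is left untouched).
# Usage: use_plain_names /mnt/etc/fstab ata-EXAMPLE_IDE_DISK /dev/sdb > /tmp/fstab.new
# use_plain_names and the ID shown are hypothetical; substitute your own disk ID.
# The ID is used inside a sed pattern, so it should not contain "|" characters.
use_plain_names() {
    file="$1"
    id="$2"
    disk="$3"
    sed "s|/dev/disk/by-id/${id}-part\([0-9][0-9]*\)|${disk}\1|g" "$file"
}
```

Because the pattern is not anchored to the start of the line, the same call also catches the root= and resume= arguments on the kernel line of menu.lst. Entries for other disks are left alone.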
The other thing is that your initrd may not contain the driver for the PATA interface, which is another reason to use the Repair System since the initrd is one of the things it fixes.
I wouldn’t assume that the 11.0 Repair System is valid for an 11.1 install.
I tried using the 11.1 Rescue System, which unlike 11.0 has a partitioning tool among the expert options. I can look at the partitions and edit their properties, supposedly. For each partition there are mounting options, and the mounting options include a choice to mount the partition and specify a mount point, which is just what I need.
But the tool won’t let me type anything into the mount point option. Whether I go to the appropriate text box with the keyboard or the mouse, there’s no typing cursor and no way I can see to enter any information. If I could get past that, the Rescue System might rescue me.
One clue to what is going on is that if I attempt a normal boot, I get a series of fatal errors for modules not found:
pata_via
sata_via
sata_inic162x
ata_generic
via_82cxxx
ide_pci
Without those modules, I assume, it’s not possible to access my disks. So there must be something in /boot/grub/menu.lst that causes the bootup process to be looking for them in the wrong places. What might that be?
Kernel modules are loaded from two places. Modules that have to be loaded before the filesystems are mounted come from the initrd; the disk controller drivers obviously fall into this category. After the root filesystem has been mounted, more modules can be loaded from /lib/modules/<kernel version>.
If you have a new interface, then the module for the driver has to be in the initrd. This is one of the things the Repair System fixes when disks get moved to different controllers, e.g. after a motherboard change.
If those modules cannot be found after the kernel and initrd have loaded, then that’s probably because /lib/modules is not accessible.
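One way to check the first case is to list the contents of the initrd and look for the driver files. The sketch below works on a saved listing; the function name missing_modules is made up, and it assumes the initrd is a gzip-compressed cpio archive (as far as I know that is the case for 2.6-series openSUSE kernels, but verify on your system).

```shell
# Report which of the given module names are absent from an initrd file listing.
# The listing is what `cpio -it` prints, one path per line, for example:
#   zcat /boot/initrd | cpio -it > /tmp/initrd.list
# Usage: missing_modules /tmp/initrd.list pata_via sata_via ata_generic
# missing_modules is a hypothetical helper name.
missing_modules() {
    listing="$1"
    shift
    for mod in "$@"; do
        # The kernel treats "-" and "_" in module names as equivalent.
        pat=$(printf '%s' "$mod" | sed 's/[_-]/[_-]/g')
        # Module files are named <module>.ko somewhere under lib/modules/.
        grep -q "/${pat}\.ko\$" "$listing" || echo "$mod"
    done
}
```

Any name it prints is not in the initrd, so the kernel cannot load that driver before the root filesystem is mounted.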
When I tried to set up the boot record using grub, I had some problems in specifying initrd. I got an error 16 (inconsistent file structure) even though initrd does exist in /boot. So it’s possible that initrd never did its work. But I still don’t know why I got the error 16 from grub, or how to determine if initrd ever did anything on bootup.
I mentioned earlier that if I call grub explicitly and give it the commands appearing in menu.lst, I get an Error 16 from the initrd command. What’s particularly odd is that if instead of typing initrd /boot/initrd I type initrd /boot/message.old, which makes no sense, I don’t get the Error 16! And I did check that /boot/initrd is symlinked to /boot/initrd-2.6.27.19-3.2-default (which is about 6MB long).
Typing initrd /boot/message.old isn’t going to do anything useful. GRUB will just note that it’s not a kosher initrd, spit it out, and boot the kernel without one. And all the modules that were supposed to be loaded from the initrd will be absent. So don’t go there.
Now info grub, under troubleshooting, has this to say about Error 16:
16 : Inconsistent filesystem structure
This error is returned by the filesystem code to denote an internal
error caused by the sanity checks of the filesystem structure on
disk not matching what it expects. This is usually caused by a
corrupt filesystem or bugs in the code handling it in GRUB.
Sounds like your filesystem is messed up or the partition is not the one you think it is.
I knew it didn’t do anything useful. I tried it just as an experiment because I wanted to determine if the problem was related to the filesystem (as the explanation of Error 16 indicates) or to the specific initrd file. I figured that if the problem was with the filesystem it would show up for any file in that filesystem, sensible or not. So I concluded that the problem had to do with the specific initrd file, not the filesystem.
“Sounds like your filesystem is messed up or the partition is not the one you think it is.”
If I interpret the results of my experiment with message.old correctly, then the filesystem is not messed up. And if the partition wasn’t the correct one, the file would not be found at all. (I did double-check the root specification, though.)
If initrd isn’t getting executed on bootup because it’s missing or defective, that would account for the missing critical files that prevent the devices from even being found.
I now think the problem has to do with the inputs to mkinitrd, since the initrd file was just recreated by the Repair System, presumably by mkinitrd. Alas, I don’t see how to run mkinitrd from the Rescue System shell. If I mount the actual root partition (to mm, say), and do it from a chroot jail on mm, the /dev files aren’t accessible. And if I don’t do it from a chroot jail, the paths of the utilities called by mkinitrd (such as perl) are all wrong and it won’t run.
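The usual way around the missing /dev files is to bind-mount the Rescue System's /dev (plus /proc and /sys) into the mounted root before entering the chroot. A sketch, with assumptions stated: the helper names run and rebuild_initrd are made up, /dev/sdb3 and /mnt in the usage line are placeholders for your root partition and mount point, and mkinitrd is assumed to be on the PATH inside the chroot. If /boot is a separate partition it would need mounting under the chroot's /boot as well.

```shell
# Prepare a chroot on the installed system and rebuild the initrd inside it.
# Set DRY_RUN=1 to print the commands instead of running them (needs root otherwise).
# Usage: DRY_RUN=1; rebuild_initrd /dev/sdb3 /mnt
# run and rebuild_initrd are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_initrd() {
    rootdev="$1"      # partition holding the installed root filesystem
    mnt="$2"          # empty directory to mount it on
    run mount "$rootdev" "$mnt" &&
    run mount --bind /dev "$mnt/dev" &&       # make the rescue /dev visible in the chroot
    run mount -t proc proc "$mnt/proc" &&
    run mount -t sysfs sysfs "$mnt/sys" &&
    run chroot "$mnt" mkinitrd                # runs with the installed system's own tools
}
```

Inside the chroot the paths to perl and the other utilities resolve against the installed system, so mkinitrd sees the layout it expects.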
I tried that, with an interesting result: the modules ata_generic, etc., were missing and could not be found. Just like when I was doing the actual bootup. I don’t know if they’re really missing or just in the wrong place. In any event, how can I make them available as they should be?
On another front: after my last try of the Repair System, accepting all the suggested defaults for the boot loader, I got into a state where a reboot merely gives me a grub prompt! I tried providing the same commands that were supposed to be in menu.lst and got a “file not found” for /boot/vmlinuz (even though it really is on the root partition).
So now I have two problems: getting Linux booted at all, and once it’s booted, enabling it to find ata_generic, etc.
After endless fiddling, and concluding that I had some nasty kernel configuration problems probably brought on by the Repair System’s actions, I used the 11.1 distribution DVD to do an update. The system seems to be running now, although I still have some work to do in restoring the previous configuration.