boot failure: mkinitrd does not understand that /dev/sda has become /dev/sde

On installation of OpenSuse 12.2 RC 2, the root filesystem was on /dev/sda1, which changed to /dev/sde1 after I connected all six of my harddisks.
(I had to do it this way because the installer hangs if all harddisks are connected; disconnecting every harddisk except the install target allowed me to install.) So far, so good.

However, attempting to install the fglrx blob resulted in a boot failure: I noticed that the installation failed because mkinitrd failed. On the next boot, grub failed to read the initrd file; as I found out, it does not exist.

I am able to chroot into my root filesystem and run mkinitrd, but this fails:

# rootdev=/dev/sde1 mkinitrd -d /dev/sde1

Kernel image:   /boot/vmlinuz-3.4.2-1-default'
Initrd image:   /boot/initrd-3.4.2-1-default
Root device:    /dev/sde1 (mounted on / as ext4)
Kernel Modules: thermal_sys thermal processor fan pata_sil1680 scsi_dh scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw usb-common usbcore pcmcia_core pcmcia mmc_core ssb ohci-hcd uhci-hcd ehci-hcd usbhid hid-logitech-dj
Features:       acpi block usb resume.userspace resume.kernel
Perl-Bootloader: 2012-07-17 02:37:21 ERROR: Command '/usr/sbin/grub2-install --force --skip-fs-probe /dev/sda1 >/var/log/YaST2/y2log_bootloader 2>&1' failed with code 256 and output: /usr/sbin/grub2-setup: warn: Attempt to install GRUB to a partitionless disk or to a partition. This is a BAD idea..

Since /dev/sda1 is on the wrong harddisk, I assume that that’s the problem.

  1. How to persuade mkinitrd to install grub2 on /dev/sde1 instead, so I can boot my computer? (it didn’t honor my environment override, nor the -d option)
  2. How to permanently correct that setting, so mkinitrd will work automatically? ("/dev/sda" is not mentioned in any file under /etc, and man 8 mkinitrd does not mention config files)

PS: I am thankful that grub2 failed to install to the wrong harddisk.

mkinitrd doesn’t install grub2. Simply never use kernel device names anywhere, neither in the GRUB menu nor in /etc/fstab, and check your device.map!
If the device names have changed, that’s because you have more than one SATA controller and the drivers were loaded in a different order. Thus udev gave different device names to your hard disks.
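
For reference, here is a minimal sketch of how one could list which kernel name each persistent name currently points to. It assumes udev maintains /dev/disk/by-id as a directory of symlinks; the function name show_disk_links is made up for illustration.

```shell
# Hypothetical helper: print "persistent-name -> current-device" for every
# symlink in a directory (default: /dev/disk/by-id, maintained by udev).
show_disk_links() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Prints nothing on systems without /dev/disk/by-id.
show_disk_links
```

Running this before and after replugging disks should show the same persistent names pointing at different /dev/sdX targets.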

Btw this thread should be moved to the Pre-Release/Beta subforum.

Please wait, everyone, until this thread is moved.

Moved to Pre-Release/Beta and open for discussion again.

From my error message, it looks like mkinitrd invoked /usr/sbin/grub2-install, and that the wrong harddisk was passed as argument to it.

I later found out that the -B option to mkinitrd skips that invocation and succeeds. Whatever I did, I now have an initrd (I also messed with yast2 bootloader, which complained that it couldn’t install the bootloader due to unsupported partitioning). Now, grub fails for a different reason: it shows something like “welcome to GRUB” without any menu entry. (I don’t know whether this is grub 1 or 2, because I desperately tried to install both with yast, so I don’t know which one stuck.)

I have now kexeced from Knoppix.

Having kexeced, I tried to install grub2 once again using yast2 bootloader. No error this time, but it hangs at the “Install boot loader” step with the progress bar at 100%. No error is written to stderr, and I find nothing relevant in dmesg. (Yast did wake all my harddisks, and dmesg is littered with IO-errors from some failed harddisks that I am rescuing data from. That’s not your business, yast.)

I’m not sure that is true.

I have grub2 installed in “/boot” (not in the MBR).

The last time that I ran “mkinitrd”, the boot sector content changed (where boot sector is first sector in the partition assigned to “/boot”). By contrast, running “grub2-mkconfig” does not change the boot sector.

Hi anordal,

in general it would be a good idea to use the YaST partitioner
to change the fstab options to mount by ‘device ID’
whenever possible.

Then the entries in /etc/fstab would look like


/dev/disk/by-id/ata-Hitachi_HDS723020BLA642_MN3220F33TJJDE-part7 /                    ext3       acl,user_xattr        1 1

and the like, avoiding the references to /dev/sda(X) …
which may change whenever drives are plugged / unplugged.
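
To check an existing fstab for such unstable names, something like the following might do. This is only a sketch: the function name list_unstable_fstab_entries is hypothetical, the sample entries are made up, and the pattern only covers /dev/sdX and /dev/hdX style names.

```shell
# Hypothetical helper: print fstab lines (with line numbers) whose first
# field is a kernel device name such as /dev/sda1 or /dev/hdb2.
list_unstable_fstab_entries() {
    # $1: path to an fstab-style file (defaults to /etc/fstab)
    grep -nE '^[[:space:]]*/dev/(sd|hd)[a-z]+[0-9]*[[:space:]]' "${1:-/etc/fstab}"
}

# Demonstration on a throwaway sample file rather than the live /etc/fstab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/disk/by-id/ata-EXAMPLE_DISK-part7 /     ext3 acl,user_xattr 1 1
/dev/sda5                              /home ext4 defaults       1 2
EOF
list_unstable_fstab_entries "$sample"
rm -f "$sample"
```

Only the /dev/sda5 line is reported; the by-id entry is left alone.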

I’m not experienced enough to say in advance whether you would have to install openSUSE again afterwards
in order to get it running.
Perhaps the boot loader settings in YaST could help as well ?

You can (and should) also choose this mounting by ‘device ID’ during the installation of openSUSE,
if you choose at least ‘edit partition setup’ (or a similar option; I didn’t run the installation in English)
when the installer asks you to confirm or change the partition setup.

But this at least would avoid the problem with the changing numbers in /dev/sda1, /dev/sda5 etc…

Good luck
Mike

P.S.: Plugging and unplugging external hard disks other than USB disks
under Linux in general seems to be difficult - which is different from e.g. older MacOS systems.
Especially if there is a Linux file system on the external disks !

If the disks are all SATA, you can change their order by changing ports, i.e. the SATA connectors on the motherboard. Most are numbered: the lower the connector number, the earlier the disk comes in the order. With a mix of IDE and SATA it gets complicated and unpredictable.

I am actually using /dev/disk/by-id names exclusively in fstab. Sorry for not telling you sooner. The installer lets you specify the fstab naming of every filesystem if you choose manual partitioning. I was careful not to mention anything like /dev/sda. Indeed, cd /etc && ack /dev/sda shows nothing.

That was a good tip. Unfortunately, I have mixed IDE SATA, 3 of each.
In my case, SATA precedes IDE, maybe because my IDE controller is on a PCI card.

Offtopic (SATA is freakin’ unreliable):
All SATA harddisks I’ve ever had (the 3 currently in my computer) are dying (including two 1-year-old WDC Green 2TB). Files do not consistently produce the same checksum, and I see IO-errors in dmesg. Changing the SATA cables and the motherboard didn’t help.
In comparison, I’ve never seen an IDE harddisk die. My backup harddisk from the previous millennium squeals like a wounded pig when it spins, but does it fail? No.
So I bought that IDE controller and blew the dust off my faithful old IDE disks. Going all-IDE now for the foreseeable future (I just have to rescue my SATA-data).

Urgh !

All 3 SATA disks, all 3 ?
Modern high capacity disks aren’t as reliable as older disks anymore,
and the HD manufacturers have reduced the guaranteed lifetime while increasing the expected error rate
for modern high capacity HDs.
However, that all 3 of your SATA disks should fail,
without exception, doesn’t sound very likely.

But wait, wasn’t there an early version of an Intel chipset
in recent years whose shipment was delayed
because of a bug in the SATA controller ?

Didn’t that bug cause a premature death of the SATA controller ??

I once read that in a computer journal, and I remember it because I then already
thought of putting together a computer myself from selected parts.

I think this was somewhere between 1 and 3 years ago
(I think it was in the early days/months of the Intel Core-i processors).

So perhaps it’s not your SATA disks that are failing, but the SATA controller
in the Intel chipset of the motherboard.

Are you sure you didn’t get a motherboard with such a chipset ?

You could perhaps check that with another computer that has a different motherboard,
or by purchasing an external SATA disk enclosure with a USB port (about 25 dollars or less),
so that the SATA disks under test would no longer be connected via the SATA controller of your
motherboard.

Mike

Yes, I remember. I doubt I am affected: my harddisks started failing with my old mobo (ASUS A7V600, bought in 2004). For all I know, they might just have had a high bitrot rate from the beginning.

Offtopic: I had >100 days of uptime without noticing anything, until I ran fsck. I learnt that fsck is the worst thing you can ever do to a bitrotten filesystem, SATA lacks error correction (SATA just counts errors), and RAID1 didn’t help me (I don’t understand either, how disk redundancy is supposed to help when disks pretend everything is fine). Luckily, of the files that were not truncated by fsck, some file formats contain checksums — reading a file repeatedly (while thrashing the page cache) until it comes out with the right checksum works way better than RAID1, I tell you.

Ontopic:
The original problem “mkinitrd does not understand that /dev/sda has become /dev/sde” can be solved using yast2 bootloader by reinstalling GRUB2, choosing “Boot from Root partition”.
After solving the original problem, mkinitrd no longer failed, but it didn’t succeed either: it just hung forever. So did zypper and yast when trying to install kernels or bootloaders. Suspiciously, these procedures spin up all my harddisks, only to litter dmesg with IO-errors from my dead harddisks. Indeed, mkinitrd succeeds if I disconnect my dead harddisks. No wonder the installer also hung.
After disconnecting my dead harddisks, GRUB2 successfully installs. Now, I am back where I started, where grub won’t load initrd:

Loading initial ramdisk ...
error: couldn't read file.

GRUB2 then presumably tries to start Linux without an initrd, which evidently results in a kernel panic. However, the initrd file used by grub (seen by pressing e) exists.

The version of GRUB2 installed with OpenSuse 12.2 RC 2 was 1.99. In an update, I received version 2.00, which gives a more verbose error message:

Loading initial ramdisk ...
error: attempt to read or write outside of disk `hd0'.

My installation is on one partition, so this is nonsense to me.
Googling the error gave 1 result: Gentoo Forums :: View topic - grub2 error: attempt to read or write outside of disk ‘hd0’

When installing GRUB2 from Yast, I see this in dmesg:

blockdev: sending ioctl 125d to a partition!
blockdev: sending ioctl 125d to a partition!
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
blockdev: sending ioctl 125d to a partition!
blockdev: sending ioctl 125d to a partition!

If it matters, sda is the wrong harddisk.

Legacy grub revealed the real problem:

Error 18: Selected cylinder exceeds maximum supported by BIOS

My root partition is only 16GiB, so this is pathetic. My motherboard is an ASUS F1A75-V PRO. BIOS info as shown by dmidecode:

Vendor: American Megatrends Inc.
Version: 2201
Release Date: 01/16/2012

For the record, all my attempts to boot it in EFI mode failed.

What I meant to say about EFI:
I really tried to boot this mobo in EFI mode, but it showed no interest in my EFI system partition.

The cure for grub error 18 was to create a /boot partition. I resized partition 1 (using gparted on knoppix) to make space for a partition 2 in front of it (reverse partition table order).
Reinstalling legacy grub with Yast and trying to boot gave grub error 15, but reinstalling grub2 with Yast booted!

I have one wish for OpenSuse 12.2 final (apart from fixing all the bugs):
Don’t scan partition tables during bootloader installation! (Why not look at /dev/sd* instead?)
Trying to read certain blocks from a failed harddisk might take forever, as it does for me. That is a likely reason the installer hung with all harddisks attached in the first place, which is why I had to juggle my harddisks around, which in turn confused mkinitrd and led to this whole bootloader nightmare.
Even if all harddisks are ok, I would appreciate it if the OS left them alone instead of spinning them up all the time.

Hi arnordal,

good to hear that your system is up again.

It was a bit hard to follow you, because of the many hard disks,
and because in your posts you wrote of two quite different motherboards,
ASUS A7V600, and ASUS F1A75-V PRO,
that would be in use.

However, both are non-Intel boards for AMD CPUs,
so the bug in the Intel chipset’s SATA controller
isn’t relevant to you anyway.

UEFI boot is different from older approaches,
and it requires an additional EFI boot partition; see e.g. the thread
http://forums.opensuse.org/english/get-technical-help-here/install-boot-login/475347-error-occurred-while-installing-grub-during-os-11-4-installation-error-25-asus-efi-bios.html
and the links therein.
This may explain why your attempts to boot in EFI mode failed.

Coming back to your SATA hard disks:

How many other tools than fsck are there on a Linux system,
to check the integrity of a filesystem ?

fsck may not be perfect, but the outcome will as well depend on the options you give,
which you didn’t report here.

Usually the capability of error correction doesn’t depend on the type of bus (IDE, SATA, SCSI, USB, …) used.

One criterion rather is whether the HD is a SMART (Self-Monitoring, Analysis and Reporting Technology) drive or not.

To find this out, run smartctl as root from the command line (for example, smartctl -i /dev/sdX).

And because you seem to be quite aware of what you are doing:
Do you know ‘badblocks’ ?
When I used it under an older version of openSUSE,
I had to become root even to read the man page.
There is a reason for this:
Be careful, it can overwrite everything on a hard disk (including partition tables)
without any specific warning.

I once experimented with badblocks using a 2nd hand IBM SCSI hard disk (with SMART)
which continuously produced new bad blocks/bad sectors.

Running badblocks on this failing disk several times, the reported number of bad blocks
gradually decreased from run to run, because these were mapped out by the SMART disk itself.

Feeding fsck with the numbers of these blocks (which in principle would be possible)
would be harmful in this situation,
because the ext file system would then map out the intact spare blocks with which the
SMART disk itself had already replaced the bad blocks !

Anyway, using smartctl and badblocks you could seriously check the state of your
SATA hard disks.

Take care
Mike

Be realistic! How/why would mkinitrd modify your boot sector?

Then you should be able to reproduce it, right ? Write your boot sector to a file with dd:

# dd if=/dev/sdXn of=somefile bs=512 count=1

run mkinitrd and compare the content of the boot sector with the content of the file… and tell us what has changed!

How? By reinstalling grub2.

Why? That, I do not know. I just use the software. I didn’t design it.

See comments #32 and #33 in the bugzilla report for Bug 765198 which confirm that grub2 is being reinstalled.

I have been doing that. However, I compared by checking md5sums of the two, which doesn’t tell me what changed.
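
To see which bytes differ instead of only whether the checksums match, cmp -l on the two dumps is one option. Below is a small sketch on stand-in files: the helper name show_sector_diff, the temporary files and the simulated change are all invented for illustration, and real dumps would come from the dd command above.

```shell
# Hypothetical helper: list differing bytes between two sector dumps.
# cmp -l prints the 1-based byte offset and both byte values in octal.
show_sector_diff() {
    cmp -l "$1" "$2" || true   # cmp exits non-zero when the files differ
}

# Stand-ins for dumps taken before and after running mkinitrd.
before=$(mktemp)
after=$(mktemp)
head -c 512 /dev/zero > "$before"
cp "$before" "$after"
# Simulate a change: overwrite the 4th byte (offset 3) with 'G'.
printf 'G' | dd of="$after" bs=1 seek=3 conv=notrunc 2>/dev/null

show_sector_diff "$before" "$after"
rm -f "$before" "$after"
```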

Everything in “/boot/grub2/i386-pc” has also been updated to the time that “mkinitrd” was last run.