openSUSE 11.4: Grub Error 16 after every kernel update

Hi everyone,

I’m running openSUSE 11.4 on several machines and whenever there is a kernel update, there is a slight chance that grub refuses to boot the kernel after the update is done. It happened again today.
Here’s what I do/observe:

  • “zypper update”, which (besides other stuff) listed kernel-default. I confirm.
  • After this is done, I do “shutdown -r now” to reboot.
  • Grub shows its menu (text mode only, I don’t have the graphical bootsplash installed). After 3 sec or so timeout, it tries to boot as always, but immediately says “Error 16: Inconsistent filesystem structure”.

Unfortunately, this problem occurs only randomly. I’m running two 64-bit machines and four 32-bit machines. So far it has happened only on the 32-bit machines, and at every kernel update it hits one or two of them.
To fix this, I have to boot from external media and recreate the broken boot partition (i.e. copy the files to some temporary place, do a “mkfs.ext2 /dev/sda1”, then copy them back onto the new filesystem). Then I mount everything, chroot into the system to be fixed and run “grub-install”. After that I can boot from the hard disk again.
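
For reference, the repair described above can be sketched roughly as the script below. It is only an illustration of the sequence, not the exact commands that were typed: the mapping name `cryptroot`, the logical volume path `/dev/system/root` and the scratch directory `/tmp/boot-backup` are hypothetical placeholders, and the bind mounts of /dev, /proc and /sys are an assumption about what “mount everything” involves. By default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the /boot repair described in the post.
# Prints the commands by default; set DRY_RUN=0 only after adapting
# every name below to the real system (mkfs destroys data).
DRY_RUN=${DRY_RUN:-1}

BOOT_DEV=/dev/sda1          # from the post: /boot, ext2
LUKS_DEV=/dev/sda2          # from the post: LUKS container holding LVM
LUKS_NAME=cryptroot         # hypothetical mapping name
ROOT_LV=/dev/system/root    # hypothetical volume group / logical volume
BACKUP=/tmp/boot-backup     # hypothetical scratch directory in the rescue system
TARGET=/mnt/root            # mount point for the installed system

# Either echo a command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_boot() {
    # 1. Save the contents of the old /boot filesystem.
    run mkdir -p "$BACKUP" /mnt/boot
    run mount "$BOOT_DEV" /mnt/boot
    run cp -a /mnt/boot/. "$BACKUP"/
    run umount /mnt/boot

    # 2. Recreate the filesystem and copy the files back.
    run mkfs.ext2 "$BOOT_DEV"
    run mount "$BOOT_DEV" /mnt/boot
    run cp -a "$BACKUP"/. /mnt/boot/
    run umount /mnt/boot

    # 3. Open the encrypted container and mount the installed system.
    run cryptsetup luksOpen "$LUKS_DEV" "$LUKS_NAME"
    run vgchange -ay
    run mkdir -p "$TARGET"
    run mount "$ROOT_LV" "$TARGET"
    run mount "$BOOT_DEV" "$TARGET/boot"
    run mount --bind /dev "$TARGET/dev"
    run mount -t proc proc "$TARGET/proc"
    run mount -t sysfs sysfs "$TARGET/sys"

    # 4. Reinstall GRUB from inside the installed system (no arguments,
    #    as in the post).
    run chroot "$TARGET" grub-install
}

repair_boot
```

Run as-is, it lists the steps in order so they can be reviewed before anything is touched.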
Some more information:

  • The filesystem which is inconsistent according to grub is OK according to fsck.ext2. I checked this before rewriting the filesystem with mkfs.ext2.
  • The system contains only one hard disk with two partitions: sda1 is /boot, ext2, 80 MiB; sda2 is a LUKS-encrypted LVM container. The partition table is GPT; grub is installed in the MBR (I think).
  • The way I installed these systems is a bit unorthodox: I basically set up the partitioning scheme by hand and mounted everything, then did a “zypper -R /mnt/root install …”. While this might contribute to triggering this behavior, I still believe there is an underlying bug that is causing this mess. I have no other explanation for why it happens only randomly and not every time.

Does anyone have an idea where I could look for the cause of this strange behavior? I searched this forum and Google, and many people are seeing this error message from grub, but most threads either (a) only discuss how to get the system booting normally again or (b) are about hitting the error while running the grub shell from within the working system. I couldn’t find a thread that addresses my problem (or I wasn’t able to recognize such a thread).
Over the past months I always just “fixed” this as fast as possible to move on. Today I made a copy of /dev/sda1 (“cat /dev/sda1 > file”) right after booting my rescue system. If anyone has the time and interest to track down the underlying bug, I’m happy to share this file, logfiles and other data; I just don’t know where to look for hints.
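
One possible way to start looking at the saved image without modifying it is sketched below. It is an assumption about what might be useful, not a known diagnosis: it confirms the file carries the ext2 superblock magic (bytes 53 ef at offset 1080), then, if the e2fsprogs tools are installed, runs a forced read-only check and prints the superblock summary. The function names are made up for this sketch; `file` is the image name from the `cat` command above.

```shell
#!/bin/sh
# Read-only inspection of a saved copy of the /boot partition.
IMG=${1:-file}   # "file" is the image name used in the post

# Return 0 if the image has the ext2/3/4 magic 0xEF53, stored
# little-endian at byte offset 1024 + 56 = 1080.
has_ext2_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}

inspect_image() {
    img=$1
    if ! has_ext2_magic "$img"; then
        echo "$img: no ext2 superblock magic found" >&2
        return 1
    fi
    echo "$img: ext2 superblock magic present"
    # -f forces a full check, -n answers "no" to every question,
    # so the image is never written to.
    if command -v e2fsck >/dev/null 2>&1; then
        e2fsck -f -n "$img"
    fi
    # -h prints only the superblock information (features, inode size, ...).
    if command -v dumpe2fs >/dev/null 2>&1; then
        dumpe2fs -h "$img"
    fi
}

# Only inspect when the image is actually there.
if [ -r "$IMG" ]; then
    inspect_image "$IMG"
fi
```

The superblock summary of a failing image could then be compared with that of a freshly recreated /boot to see whether anything differs.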

Yarny

On 2011-11-08 22:36, Yarny wrote:

> After 3 sec or so timeout, it tries to boot as
> always, but immediately says “Error 16: Inconsistent filesystem
> structure”.

16 : Inconsistent filesystem structure
This error is returned by the filesystem code to denote an internal
error caused by the sanity checks of the filesystem structure on
disk not matching what it expects. This is usually caused by a
corrupt filesystem or bugs in the code handling it in GRUB.

Well, since you ran fsck and the filesystem is not corrupt, it must be a bug in grub,
at least as I read the manual. You could try reporting in bugzilla.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Yarny, I have found that when a problem such as the one you report shows up on more than one computer, it is due to some action you have done over and over. We are all creatures of habit, likely to configure several PCs in a mostly identical way. Since I don’t see others complaining about an issue like yours, it has got to be some detail of your setup that is not generally used here. One such item that I see is the use of ext2 partitions. Is there some reason to stick with this older partition type? Did you know that EXT4 partitions are more reliable than EXT2 on most modern systems in use today? You have to ask yourself whether this might be the common thread in all of the problems that you report. Think about it.

Thank You,

On 2011-11-09 02:56, jdmcdaniel3 wrote:

> One such item that I see is the use of
> ext2 partitions. Is there some reason to stick with this older
> partition type?

Yes, there is. I also use ext2 for all my /boot partitions and I recommend
its use - without problems. Why? Because it is a partition of 100-200 MB,
very small. A journal there is a waste. Plus, if there is a journal it has
to be applied in memory by grub when booting for recovery after hibernation.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Hi everyone,

Re ext2, I no longer remember which arguments led me to this filesystem for /boot, but the arguments from robin_listas were probably among them. I especially remember that I wanted to make /boot as small as possible.

> You could try reporting in bugzilla.
So I did: https://bugzilla.novell.com/show_bug.cgi?id=729667

Thanks for all your thoughts so far.
Yarny