Isn’t the journal supposed to ensure that? Remember how people hated and dreaded fsck(8)? But accessing a journalled filesystem involves not only the structure on disk, but also the journal entries and “replaying” them.
The (subtle) problem was that GRUB’s 16-bit code had too little memory to handle that reliably; ReiserFS and XFS were especially affected.
Anyway, the reason past recommendations were for the boot partition to be ext2, on a part of the disk with reasonably low LBAs, was past “boot weirdnesses”, so I shouldn’t waste much more time worrying about this one. ext4 now has a “no journal” option, which can be used on /boot as part of the install process; and I noticed hibernate now “freezes” process I/O to try to leave the disk in a clean state and avoid such troubles. There’s also the installation warning about possible BIOS limitations affecting LBA block numbers.
The trouble with this is that there just won’t be a definite answer; we can only speculate about the possibilities based on the information now available.
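If you want to check whether an existing /boot actually carries a journal, something like the following rough sketch should do. It assumes the usual "Filesystem features:" line printed by dumpe2fs -h; the function name has_journal and the device name in the comments are made up for illustration.

```shell
#!/bin/sh
# has_journal: read "dumpe2fs -h" style output on stdin and report
# whether the has_journal feature is listed.
has_journal() {
    if grep '^Filesystem features:' | grep -qw 'has_journal'; then
        echo "journal"
    else
        echo "no journal"
    fi
}

# Typical use (as root; /dev/sdXN is a placeholder, not a real device):
#   dumpe2fs -h /dev/sdXN 2>/dev/null | has_journal
#
# Creating a journal-less ext4 for /boot would then be along the lines of
# (this destroys whatever is on the partition):
#   mke2fs -t ext4 -O ^has_journal /dev/sdXN
```

The installer may well offer the same choice through its own dialogs; the commands above are just the manual route.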
So IMO:
There are subtle bugs, but they’re uneconomical to fix.
Your configuration is not robust, so if such rare happenings bother you, do a bulletproof, conservative setup, like an experienced server sysadmin probably would.
Kernel and boot loader improvements will probably fix these bugs, but introduce new subtle ones in their place over the next few years.
I have had unclean shutdowns because video driver problems have caused freezes. Most of these have gone away with the 3.0.0 kernel, but I have had them in the past.
When “fsck” is run in that case, it does report “recovering from journal” before it reports “clean”. So the situations are distinguishable.
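To make that distinction mechanical, here is a small sketch that sorts saved fsck output into the two cases. The exact message wording is an assumption (it only looks for the words "recovering" and "clean"), and classify_fsck is a made-up name.

```shell
#!/bin/sh
# classify_fsck: read fsck output on stdin and say whether a journal
# replay was reported, or only a clean filesystem.
classify_fsck() {
    out=$(cat)
    case "$out" in
        *[Rr]ecovering*) echo "journal replayed" ;;  # unclean shutdown
        *clean*)         echo "clean" ;;             # nothing to recover
        *)               echo "unknown" ;;
    esac
}

# Typical use, with fsck output captured in a file beforehand:
#   classify_fsck < fsck-output.txt
```

The "recovering" test comes first on purpose, since after an unclean shutdown both messages show up in the same run.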
Yes, that’s likely true. And I probably bumped into one of them.
My main 11.4 setup is reasonably robust. The test partition setup for M3 isn’t. That’s why I experimented with copying the desktop kernel to the “/boot” from the 11.4 install, and booting it directly with the 11.4 grub. It still didn’t load that way, though, so it is pretty clear that the problem with the 12.1 M3 desktop kernel is not a subtle bug in how I am booting. It is probably a subtle bug in the kernel itself.
Apologies then; the drift of the previous discussion seems to have obscured that. I was trying to explain how moving a file like menu.lst by copying it can affect boot success.
If you suspect the kernel: firstly, kernels up to 3.0rc6 were crashers. Most blank screens have been caused by KMS, so booting with nomodeset resolves them by causing a “legacy” driver to be used in the modern driver’s place; this has been particularly true of Intel in the past, with some older hardware not booting with newer kernels.
Have you got a reproducible startup problem now? If so, finding out where it goes wrong would be the first step.
Installing a Kernel:HEAD kernel from the repo would show whether it’s already been found and fixed upstream (or by the SuSE kernel team, GKH et al).
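When testing with nomodeset, it is worth confirming the option really reached the kernel, since a typo in the boot line is easy to miss. A minimal sketch (uses_nomodeset is a made-up helper name; it just looks for the word in a command-line string):

```shell
#!/bin/sh
# uses_nomodeset: print "yes" if the given kernel command line contains
# nomodeset as a separate word, otherwise "no".
uses_nomodeset() {
    case " $1 " in
        *" nomodeset "*) echo "yes" ;;
        *)               echo "no" ;;
    esac
}

# Typical use on the running system:
#   uses_nomodeset "$(cat /proc/cmdline)"
```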
When doing a new install, grub is set up to default to booting the newly installed linux (to complete the installation). Typically this setting would be cleared on the first boot. However, booting via the “configfile” directive from a different instance of grub won’t clear the setting, since the wrong instance of grub is being used.
My current solution is to invoke the secondary grub with “other” (i.e. with chainloader). That way the newly installed grub is actually invoked and the default flag is cleared.
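For anyone comparing the two approaches, here is roughly what the two kinds of menu.lst entry look like in legacy grub. This is only a sketch: the partition (hd0,5), the titles, and the helper name print_entries are placeholders, and the chainloader entry assumes the second install put its grub in that partition's boot sector.

```shell
#!/bin/sh
# print_entries: print two example GRUB legacy menu.lst stanzas for the
# partition given as $1 (GRUB legacy notation, e.g. "(hd0,5)").
print_entries() {
    part="$1"
    cat <<EOF
title Test install (configfile, menu is read by the first grub)
    root $part
    configfile /boot/grub/menu.lst

title Test install (chainloader, the second grub itself runs)
    rootnoverify $part
    chainloader +1
EOF
}

# Typical use: review the output, then paste it into menu.lst by hand.
#   print_entries "(hd0,5)"
```

With the first entry the primary grub only borrows the other menu, so anything the secondary grub would do for itself at boot never happens. With the second, control is handed over entirely.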
Most likely, removing the file “default” from “/boot” (or was that “/boot/grub”?) would also work.
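Since I am not sure which of the two locations is right, a cautious sketch is to look in both and only report what is there. find_grub_default is a made-up name, and the optional argument is the mount point of the other install (empty for the running system).

```shell
#!/bin/sh
# find_grub_default: list which candidate "default" files exist under
# the given root prefix. Nothing is deleted.
find_grub_default() {
    prefix="${1:-}"
    for f in "$prefix/boot/default" "$prefix/boot/grub/default"; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

# Typical use, with the test install mounted at /mnt:
#   find_grub_default /mnt
# Then move the reported file aside rather than removing it outright,
# so it can be put back if the guess turns out wrong.
```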