GRUB Error 17 on single boot system OpenSuSE 11.4

I have a single-boot system that was working great until this morning, when it locked up. I was working as I always do when the system stopped responding completely.
CTRL+ALT+F1 did not work to switch me to a console so I could log in and see what was happening.

CTRL+SHIFT+SysRq did not recover the system or interrupt whatever was causing the lock-up.

SSH from another system failed, as the host did not respond.

So I had to do a hard boot and when I did I got this:

GRUB Loading stage1.5.

GRUB loading, please wait…
Error 17

Again, this is a single-boot system: no Windows other than in VMware inside the install. It is a multi-drive workstation with RAID that I have at work. The short of it is that I am the sole IT support, as our company’s ITS will not support Linux. In 5 years of operating Linux at work this is my first issue, and I can’t figure it out. I already tried the PartedMagic solution (ref = Re-Install Grub Quickly with Parted Magic) and that didn’t work.

I re-installed the system completely, even going to the extent of changing my partition scheme just to make sure that wasn’t the problem (which doesn’t make sense, as the system had been running non-stop for 46 days without a problem). Upon reboot, after performing a complete security update, I got the Error 17 message again.

Below is the output of fdisk -l

Welcome - Parted Magic (Linux 2.6.30.6-pmagic)
Most of the filesystem tools and partition programs featured by Parted Magic
include man pages.  To read a manual page,  simply type man and
the name of the tool. (Examples: 'man ntfsprogs' or 'man fdisk')

root@PartedMagic:~# fdisk -l

Disk /dev/sda: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000aeba5

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          85      673792   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              85       13139   104856576   83  Linux
/dev/sda3           13139       14181     8377344   82  Linux swap
/dev/sda4           14181       91202   618664960   83  Linux

Disk /dev/sdb: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000080

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       15013   120591891   fd  Linux raid autodetect
/dev/sdb2           15014       30394   123547882+  fd  Linux raid autodetect

Disk /dev/sdc: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000081

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       15013   120591891   fd  Linux raid autodetect
/dev/sdc2           15014       30394   123547882+  fd  Linux raid autodetect

Disk /dev/sdd: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000082

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       30394   244139773+  fd  Linux raid autodetect

Disk /dev/sde: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000083

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       30394   244139773+  fd  Linux raid autodetect

Disk /dev/sdf: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x2b49c53d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1         592     4746240   82  Linux swap
Partition 1 does not end on cylinder boundary.
/dev/sdf2             592        9730    73403392   83  Linux
root@PartedMagic:~# 

One thing I noted was that the boot flag is set on /dev/sdb1, which it shouldn’t be. I don’t know how to disable the flag.

/dev/sda1   *           1          85      673792   83  Linux

The above is /boot, which is the change I made to my system; it’s also set with a boot flag, which is correct.
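If it helps, the flagged partitions can be picked out of the fdisk output mechanically, and the flag itself can be cleared with parted. This is only a sketch from the Parted Magic shell; the device names are the ones in the listing above:

```shell
# List only the partition rows that carry the boot-flag marker "*"
# (the second column of `fdisk -l` output).
fdisk -l 2>/dev/null | awk '$1 ~ /^\/dev\// && $2 == "*" { print $1 }'

# Clearing the flag on partition 1 of /dev/sdb can then be done
# non-interactively (fdisk's interactive "a" command toggles it too):
# parted /dev/sdb set 1 boot off
```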

So… what is going on? How do I fix it? And how did this happen?

PS: No, I have not made any system changes or modifications, especially not to the boot files or the kernel. This is a production machine and I am in the middle of a billing cycle with deadlines, so I won’t even apply security updates during this time. As I said, I was doing my job when this all started.

Thank you

On 2011-06-29 23:06, MkIII Supra wrote:
> GRUB loading, please wait…
> Error 17

17 : Cannot mount selected partition
This error is returned if the partition requested exists, but the
filesystem type cannot be recognized by GRUB.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

First off, the boot flag only matters if you have generic boot code in the MBR, which is NOT the case here. Having the boot flag on the /boot partition certainly doesn’t hurt, but it is irrelevant to Grub in the MBR. I would say that you should check your first HD for bad sectors (stage 1.5 is written on the first track), or do a fresh install on another HD, as it seems that you have the system on the first HD, outside of your RAID.
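To check just the area mentioned above, a quick read test of the first track from the Parted Magic shell might look like this (a sketch; it assumes the boot disk really is /dev/sda, with badblocks as the slower full-surface follow-up):

```shell
# Stage 1.5 is embedded between the MBR and the first partition
# (the first track: 63 sectors of 512 bytes with this 255/63
# geometry). An I/O error here points at the disk, not at GRUB.
dd if=/dev/sda of=/dev/null bs=512 count=63 || echo "first-track read failed"

# Read-only scan of the whole surface (slow on a 750 GB disk):
# badblocks -sv /dev/sda
```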

Okay… so how do I fix that? Re-installing didn’t work. I even changed the filesystem on /boot from ext4 to ext3.

Also, just so I could get my job done, I removed the newly installed drive and put the older drive back in, and the same error occurs on the old drive as on the new one. Is it the ext4 filesystem? Is that the issue?

Which partition is giving me the problem? How do I find that out so I can correct it?
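One way to answer that, assuming a Legacy GRUB shell can be started from the Parted Magic CD, is GRUB’s own find command, which reports every partition on which it can read stage2; on the layout above, /boot ought to come back as (hd0,0) if GRUB can see it:

```
grub> find /boot/grub/stage2
 (hd0,0)
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
```

If find errors out instead, GRUB cannot recognize the filesystem on any candidate partition, which is exactly the Error 17 condition.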

The drive that has /boot and / on it is brand new (less than two weeks old) and I just tried a fresh install for the 3rd time and am still getting the Error 17 message. I also ran the disk utilities provided by the manufacturer and no errors were found with the disk. So on a whim I installed the old drive that only has / on it, and the same error comes up. The old drive was formatted with ext4 as well.

My question is, could ext4 be the issue? I have always used ReiserFS, as it has been stable and solid for me for years, but with the end of ReiserFS development and the advent of ext4 (which, from what I understand, is supposed to be better) I have started to move over from ReiserFS on all but my RAID drives.

I am getting more confused about the cause… the old drive did have bad sectors and would occasionally start “chattering”, which is why it was replaced. It booted fine and for the most part ran okay… but now it’s also giving the Error 17.

Not on modern Linuxes.

findgrub might provide some info, by reading the location and type of the partition where stage2 is located (but Grub itself has to do that too).
You’ll have to get the script and run it somehow.

Pls.help: How do I install and run “findgrub”? - Page 2

Here’s what it sees on my fileserver (LVM + RAID1) :

Find Grub Version 3.0.1 - Written for openSUSE Forums


 - reading MBR on disk /dev/sda                    ... --> Grub  found in sda MBR     => sda1   0xfd (openSUSE)
 - searching partition /dev/sda1   (LINUX RAID AUTO) ...
 - skipping partition  /dev/sda2   (swap)         
 - searching partition /dev/sda3   (LINUX RAID AUTO) ...
 - reading MBR on disk /dev/sdb                    ...
 - searching partition /dev/sdb1   (LINUX RAID AUTO) ...
 - skipping partition  /dev/sdb2   (swap)         
 - searching partition /dev/sdb3   (LINUX RAID AUTO) ...

If it cannot find the partition, it might indicate that the partition table or stage 1.5 is messed up (due to logical or physical errors).

I read that openSUSE’s Legacy Grub was patched (long ago) to allow booting from ext4. However, as I’m a pretty conservative guy, I’m still using ext3 for /boot and/or /. You certainly have nothing to lose by choosing ext3 for /boot.

Same here.

What about wrong BIOS settings for this drive (Legacy IDE/AHCI/SATA or whatever)? That could be the problem.

please_try_again… I am re-installing with all ext4 filesystems replaced by either ext3 or ReiserFS. If that doesn’t work, then I will try the findgrub solution and see where that leads me. Either way I will post as soon as something different happens. Looks like I will need to keep a spare system for times like this…

Re-install failed. So next is the findgrub solution, see if that resolves the issue.

Uh, I just went to read the solution, and from what I see I have to have a running version of OpenSuSE to get it to work… that’s not possible, as I am unable to get past the Error 17 issue.

You really don’t need to replace all ext4 with ext3. Just /boot (the partition where Grub looks for stage2) would be enough.
But think about BIOS settings too. If the Grub loader cannot find the device, it might be because it cannot read its geometry and is looking in the wrong place (the address stored in stage 1.5 doesn’t match the real start position of the partition). So if the drive is OK (which we don’t know), it looks to me like a HD geometry or partition alignment issue.

  • or it looks on the wrong drive, which is even more stupid (but possible). :frowning:
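To illustrate the geometry point with numbers from the fdisk listing above: stage 1.5 locates a partition through cylinder/head/sector arithmetic, so the same CHS address lands on a different absolute sector when the BIOS reports a different geometry. A small sketch of the arithmetic (240 heads is just a made-up wrong geometry):

```shell
# LBA = (C * heads + H) * sectors_per_track + (S - 1)
chs_to_lba() {
    c=$1; h=$2; s=$3; heads=$4; spt=$5
    echo $(( (c * heads + h) * spt + (s - 1) ))
}

# /dev/sda2 starts at cylinder 85 (head 0, sector 1):
chs_to_lba 85 0 1 255 63   # 1365525 with the real 255/63 geometry
chs_to_lba 85 0 1 240 63   # 1285200 if the BIOS claimed 240 heads
```

With the wrong geometry the loader would read sector 1285200 instead of 1365525 and find no recognizable filesystem there.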

Odd. So for shiggles I decided to get into the BIOS and see what was what… can someone please explain how a BIOS can be modified while the system is running? The boot devices were all jumbled up and /dev/sda was NOT in the boot options. A quick adjustment and it APPEARS that I am now back in the game; the system did in fact boot without an error… but I am skeptical at this point.

Would it be possible for you to physically disconnect all other HDs before installing? I don’t like this solution, but that’s what I would try next. You may also check that the install media you’re using is OK (actually, that’s what I would check first). Then look at BIOS settings, wipe out the partition table before installing (zero fill), and if nothing helps, try another HD.

I don’t see where the filesystem type has anything to do with it, since it was working and then quit. So there are three possibilities.

  1. A hardware problem.
  2. Something installed or updated just before the problem happened. Maybe an update?
  3. A corrupted filesystem. Note that sectors may check out fine at a low level while the filesystem is still corrupt. But this hangs on how grub was installed.
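For possibility 3, a read-only check with e2fsck is safe to run from the Parted Magic shell. A sketch, using the device names from the listing earlier in the thread (nothing is modified: -n answers “no” to every repair prompt, -f forces a check even if the filesystem is marked clean):

```shell
# On the real machine (commented out here; run from Parted Magic):
#   e2fsck -fn /dev/sda1   # /boot
#   e2fsck -fn /dev/sda2   # /

# The same invocation exercised on a scratch image, to show what a
# clean result looks like on a healthy filesystem:
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=4 2>/dev/null
mke2fs -Fq /tmp/scratch.img
e2fsck -fn /tmp/scratch.img
```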

I assume that your RAID is software-based?

It got changed when you replaced the drive. Didn’t you mention that you replaced the drive at some point? It looks like you have solved the problem now. :wink:

Yes, but when I changed the drive I also checked the BIOS to make sure it was pointing to the new drive. That’s what has me puzzled. This system doesn’t get shut down or rebooted once it’s been installed and set up. I don’t add software or change configurations once it’s set up. It’s a workstation that has to be operational at all times. Today has cost me close to $300,000.00 in revenue… and to make it up I get to work all night.

With that kind of money at stake, why not a mirrored system?

It is truly odd to have scrambled BIOS settings. It may indicate some failing hardware. I don’t believe I have ever seen that before unless the battery is failing.

That’s what I was about to say. Replace the battery!

On 2011-06-30 00:06, MkIII Supra wrote:
>
> Okay… so how do I fix that? Re-installing didn’t work. I even changed
> the filesystem on /boot from ext4 to ext3.

My guess is that it is reading a different partition, or different disk,
than the one you think it should read.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2011-06-30 00:06, please try again wrote:

> I read that openSUSE’s Legacy Grub was patched (long ago) to allow
> booting from ext4. However, as I’m a pretty conservative guy, I’m still
> using ext3 for /boot and/or /. You certainly have nothing to lose by
> choosing ext3 for /boot.

Actually, you should choose ext2 for /boot. That is, assuming it is a
different partition.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2011-06-30 00:06, MkIII Supra wrote:

> I have always used
> ReiserFS, as it has been stable and solid for me for years, but with the
> end of ReiserFS development and the advent of ext4 (which, from
> what I understand, is supposed to be better)

I very much doubt that. It can be better in some respects, but not in others
that are distinctive of reiserfs, like not using a whole cluster for a
one-byte file.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)