Can't find /dev/disk/by-id/ Suse 11.1 boot

About 70% of Suse 11.1 boots fail with “Can’t find /dev/disk/by-id/…”

The drives are fairly recent 40GB Seagate ATA. CPU is Intel Core Duo 1.6GHz.

We have three identical machines and they all exhibit this problem under Suse 11.1.

I’ve tried broken_modules=ata_piix and insmod=ide_generic kernel parameters to no effect.

We’ve also tried an 80-way cable instead of 40-way but this doesn’t help either.

Can anyone help please?

Stab in the dark: have you tried the kernel option ‘rootdelay=x’, where x is a number of seconds? Start high (say, 10); if it works, lower it until the percentage of failed boots is acceptable.
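For anyone trying this, here is a minimal sketch of building the new kernel line. It assumes GRUB legacy with the kernel line in /boot/grub/menu.lst, which is the usual place on openSUSE 11.1, but check your own setup. The helper name set_rootdelay is made up for illustration; it only rewrites a command-line string and does not touch any file.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel command line with rootdelay set to $2 seconds.
set_rootdelay() {
    cmdline=$1
    delay=$2
    # Drop any existing rootdelay=N token, then append the new value.
    stripped=$(printf '%s\n' "$cmdline" | sed -E 's/(^| )rootdelay=[0-9]+//g')
    printf '%s rootdelay=%s\n' "$stripped" "$delay"
}

# Example: show what the options on the running system would look like with a 10 s delay.
# set_rootdelay "$(cat /proc/cmdline)" 10
```

The result can be pasted onto the kernel line in menu.lst, or the option can simply be typed into the boot options field at the boot menu for a one-off test.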

are these ‘older’ machines with new drives installed…or were they
born with those Seagates and they worked okay initially…and, then
after some months you started seeing this problem?

these three identical machines…are they daily office use, home use
or 24/7, or what? do they get about the same use or does one get more
than the others? if that is the case did the most used one start
missing boots first?

and, can you give the Seagate model number please…

are the drives S.M.A.R.T. <http://en.wikipedia.org/wiki/S.M.A.R.T.>
and if so, have you received any warnings or errors…(are you running
smartd?)

and, if you can confirm that all three fstab are about the same, then
do this at a command line for any of them:

cat /etc/fstab

and, copy/paste the output into a reply

OH, are you using RAID…sometimes that gets a little outta whack and
comes up slower than it should and just reports the drive is not
there, when in fact it is there but showing up late for work…

if you have a failed boot, then does the next attempt (seconds later)
almost always boot ok?

OH, and just to confirm: you say you are using Suse 11.1, is that
openSUSE 11.1 or SUSE Linux Enterprise 11 update 1 (or whatever they
call interim releases)

were these machines born with 11.1?


platinum

The machines originally ran Suse 10.0 with an older, slower processor. We had no problems until we upgraded the processor and installed 11.1. The install couldn’t find the hard drives; we used ‘broken_modules=ata_piix’ to fix this. We have had problems booting Suse 11.1 since the install: sometimes it works, more often it doesn’t.

They are being developed for a process control application.

Seagate model ST340014A

Extract from boot.msg:

<6>ata1.00: ATA-6: ST340014A, 3.06, max UDMA/100
<6>ata1.00: 78165360 sectors, multi 0: LBA48
<6>ata1.01: ATA-6: ST340014A, 3.06, max UDMA/100
<6>ata1.01: 78165360 sectors, multi 0: LBA48
<4>ata1.00: limited to UDMA/33 due to 40-wire cable
<4>ata1.01: limited to UDMA/33 due to 40-wire cable
<6>ata1: clearing spurious IRQ
<6>ata1: clearing spurious IRQ
<6>ata1.00: configured for UDMA/33
<6>ata1: clearing spurious IRQ
<6>ata1: clearing spurious IRQ
<6>ata1.01: configured for UDMA/33
<5>scsi 0:0:0:0: Direct-Access ATA ST340014A 3.06 PQ: 0 ANSI: 5
<5>scsi 0:0:1:0: Direct-Access ATA ST340014A 3.06 PQ: 0 ANSI: 5

smartd is running, no errors.

/etc/fstab:

/dev/disk/by-id/ata-ST340014A_5JX3TSE9-part2 swap swap defaults 0 0
/dev/disk/by-id/ata-ST340014A_5JX3TSE9-part3 / reiserfs acl,user_xattr 1 1
/dev/disk/by-id/ata-ST340014A_5JX3TSE9-part1 /boot reiserfs acl,user_xattr 1 2
/dev/disk/by-id/ata-ST340014A_5JX3LWKG-part1 /sdb1 ext3 defaults 1 2
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
usbfs /proc/bus/usb usbfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0

Not using RAID.

If one attempt fails, then several successive ones fail.

We’re using openSUSE 11.1

The first thing to check when an error message says “Can’t find …” is whether the thing exists.

So what about

ls -l /dev/disk/by-id/

So we all can see what is there and what not.
Also an

ls -l /dev/sd*

might be interesting.
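To go one step further than eyeballing the listings, a small sketch like the one below can test every device path named in fstab. The function name check_fstab_devices is made up for illustration. It only tells you what exists on a system that has already booted; the failing lookup during boot happens earlier, so treat a clean result as a hint, not proof.

```shell
#!/bin/sh
# Hypothetical helper: report whether each /dev/... entry in an fstab file exists.
check_fstab_devices() {
    fstab=${1:-/etc/fstab}
    # Read the first (device) field of every line; the remaining fields are ignored.
    while read -r dev rest || [ -n "$dev" ]; do
        case $dev in
            /dev/*)
                if [ -e "$dev" ]; then
                    echo "ok      $dev"
                else
                    echo "MISSING $dev"
                fi
                ;;
        esac
    done < "$fstab"
}

# Example: check_fstab_devices /etc/fstab
```

Lines such as proc, sysfs and comments are skipped because their first field does not start with /dev/.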

angelambayley wrote:
> The install couldn’t find the hard drives, we used ‘broken_modules=ata_piix’
> to fix this.

i do not know why the drive was not seen (do you?), nor where you got
that solution/prescription, or what that does…so, you may want to
rethink the work around and try to discover WHY the installer
couldn’t see the drive in the first place…

as far as i know that is not a usual situation…

i mean, you put in a new cpu, and a new hard drive and the first thing
that happens is the installer can’t see the hard drive…hmmm, where
did you pick up that work around?

> We have had problems booting Suse 11.1 since the install,
> sometimes it works, more often it doesn’t.

this is very strange and is what leads me to suspect a hardware problem…

> They are being developed for a process control application.
>
> Seagate model ST340014A

ok…are the three machines with or without XWindows…you running
them in runlevel 3?

the install media, did you check the md5sum of the iso (if you
downloaded it)?

and, did you check the install media after burning and before starting
install?
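A minimal sketch of the md5sum check, assuming the md5sum tool from GNU coreutils is installed. The function name verify_md5 and the file name in the example are made up; the expected checksum has to come from the download page, not from this post.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the MD5 of file $1 equals the expected hex string $2.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Example (placeholder file name and checksum):
# verify_md5 install-dvd.iso "<md5 from the download page>" && echo "iso is good"
```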

(see, i’m GUESSING you used the same install media on three identical
machines and got the same BAD result, which might mean you have an
identical faulty software set caused by faulty media…)

but while it is close to bed time here i think you ought to try the
other poster’s idea…it could be that the drive just isn’t showing up
as quickly as it should…

i think those drives have a five year warranty…which is probably
about over…

hmmmm…just noticed you are using reiser…you know that has fallen
out of favor and well, do you have good reason to use it rather than
EXT3, or maybe even ext4…

you might consider doing a clean install with known 100% good
install media…OH, i see you upgraded from 10.0 to 11.1, right? has
to be because otherwise you would have ext3 and your home directory
would be on a different partition…

do you know that upgrading is officially NOT a supported way to get
from 10.0 to 11.1?? that could be your problem, see:
http://en.opensuse.org/Upgrade scroll down and you will see “This
method is unsupported.” which was at the TOP of the page the last time
i looked at it, last week…and following the UNSUPPORTED caveat is a
way to get from 10.3 to 11.1, but not from 10.0 to 11.1

it could be that you have a faulty install just from that!

> Extract from boot.msg:
>
> <6>ata1.00: ATA-6: ST340014A, 3.06, max UDMA/100
> <6>ata1.00: 78165360 sectors, multi 0: LBA48
> <6>ata1.01: ATA-6: ST340014A, 3.06, max UDMA/100
> <6>ata1.01: 78165360 sectors, multi 0: LBA48
> <4>ata1.00: limited to UDMA/33 due to 40-wire cable
> <4>ata1.01: limited to UDMA/33 due to 40-wire cable
> <6>ata1: clearing spurious IRQ
> <6>ata1: clearing spurious IRQ
> <6>ata1.00: configured for UDMA/33
> <6>ata1: clearing spurious IRQ
> <6>ata1: clearing spurious IRQ
> <6>ata1.01: configured for UDMA/33
> <5>scsi 0:0:0:0: Direct-Access ATA ST340014A 3.06 PQ: 0 ANSI: 5
> <5>scsi 0:0:1:0: Direct-Access ATA ST340014A 3.06 PQ: 0 ANSI: 5

i’m not smart enough to see anything in the above…
maybe someone else will…

>
> smartd is running, no errors.

do this in a terminal…

sudo smartctl -a /dev/sda

give root’s password when challenged, and see if you still see no
errors…as i scratch my head i’m leaning toward a hardware problem…

>
> /etc/fstab:
>
> /dev/disk/by-id/ata-ST340014A_5JX3TSE9-part2 swap swap defaults 0 0
> /dev/disk/by-id/ata-ST340014A_5JX3TSE9-part3 / reiserfs acl,user_xattr 1 1
> /dev/disk/by-id/ata-ST340014A_5JX3TSE9-part1 /boot reiserfs acl,user_xattr 1 2
> /dev/disk/by-id/ata-ST340014A_5JX3LWKG-part1 /sdb1 ext3 defaults 1 2
> proc /proc proc defaults 0 0
> sysfs /sys sysfs noauto 0 0
> debugfs /sys/kernel/debug debugfs noauto 0 0
> usbfs /proc/bus/usb usbfs noauto 0 0
> devpts /dev/pts devpts mode=0620,gid=5 0 0

wait!

i see two hard drives, the one numbered 5JX3TSE9 is sda and has three
partitions…the system, a swap and the boot partition…

and, another one numbered 5JX3LWKG with one partition is sdb, is that
a data drive?

WHICH drive is the 40GB seagate?

AND, from your first post you say: “Can’t find /dev/disk/by-id/…”
so, now we need to know WHICH drive is not showing up for work, sda or
sdb…wait, i can guess it is sdb because the boot is progressing on
sda and it tells us that sdb can’t be found…

ok, also do this to check the smart of sdb

sudo smartctl -a /dev/sdb
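The full smartctl -a report is long, so here is a rough sketch of a filter that keeps only the overall health line and a few attributes that tend to matter for failing drives or bad cables. It assumes the usual ATA attribute table layout with the raw value in the tenth column; the function name smart_summary and the choice of attributes are mine, not anything official.

```shell
#!/bin/sh
# Hypothetical helper: condense `smartctl -a` output read from stdin.
smart_summary() {
    awk '
        # Overall verdict line printed by smartctl for ATA drives.
        /overall-health/ { print }
        # Attribute rows: column 2 is the name, column 10 the raw value.
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
            print $2 " raw=" $10
        }
    '
}

# Example:
# sudo smartctl -a /dev/sda | smart_summary
# sudo smartctl -a /dev/sdb | smart_summary
```

A growing UDMA_CRC_Error_Count usually points at the cable or connectors rather than the disk itself.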

> Not using RAID.
>
> If one attempt fails, then several successive ones fail.
>
> We’re using openSUSE 11.1

i’m finished for the day…and headed for bed, maybe someone who knows
better will show up…

i think while i sleep i’ll leave you with this to ponder as a
potential way forward:

  1. check the smart readout from the terminal input above for both drives…

  2. seriously consider taking one of those machines and do a full,
    fresh, format and install of 11.1

but before you do read the good info at http://tinyurl.com/6jwtg9 on
where to get the install image, and how to check it is good…


platinum
Note: Accuracy, completeness, legality, or usefulness of this posting
may be illusive. and, i’m not a real guru, see caveat:
http://tinyurl.com/6aagco

Have you set the jumpers on your hard drives correctly (i.e. Master, Slave, Cable Select)? Try putting all drives on Cable Select. The other option is an incorrect setting in the BIOS. Check the BIOS and make sure it has the correct setting. The default should be Auto or something similar, and that should work for most drives.

If both drives are on one cable and both are claiming master or something, strange things can happen. Try and have a look. Hope it helps :)

Aha! Now here’s a strange thing…

I was using kernel 2.6.27.29-0.1-pae. Using this kernel I got the frequent “can’t find /dev/disk/by-id…” hang on boot.

I have now installed 2.6.27.29-0.1-default and this boots up correctly every time.

If anyone is unfortunate enough to suffer the same intermittent boot problem this may be the answer.
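To see which kernel flavour a machine is actually running, the suffix of uname -r is enough. A minimal sketch; the function name kernel_flavor is made up, and it assumes the release string ends in "-flavour" as in the versions quoted above:

```shell
#!/bin/sh
# Hypothetical helper: print the part of a kernel release string after the last "-".
kernel_flavor() {
    rel=${1:-$(uname -r)}
    printf '%s\n' "${rel##*-}"
}

# Example: kernel_flavor                     (uses the running kernel)
#          kernel_flavor 2.6.27.29-0.1-pae   (prints "pae")
```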

Thanks for all the help
Angela Bayley