Crazy start-up boot errors on Aspire TimelineU with openSUSE 12.2

I’ve just had to delete all partitions and re-install from scratch… again!

I don’t know why…

Every couple of boots I’ve been getting errors like this:

[69.703839] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[69.740538] ata1.00: irq-stat 0x40000008
[69.777082] ata1.00: failed command: READ FPDMA QUEUED
[69.813672] ata1.00: cmd 60/08:00:80:5f:38/00:00:3a:00:00/40 tag 0 ncq 4096 in
[69.813674]          res 41/40:00:80:5f:38/00:00:3a:00:00/40 Emask 0x409 (media error)
[69.887979] ata1.00: status: {UNC}
[69.890399] Buffer I/O error on device sda3, logical block 116327408

udevd (112): worker [117] terminated by signal 9 (killed)
could not find /dev/root

Want me to fall back to /dev/disk/by-id/ata-WDC-WDCS000LPPVT-22G33TO-WD-WX81C32W1161-Part 2? (Y/n)

Answering yes (y) or no (n) didn’t do much, because it couldn’t find Part 2 of whatever the hell this is.

Booting in safe mode didn’t solve the problem, and running the recovery option on the openSUSE DVD didn’t work either.

I don’t know what the error above is. What do you do when something like this happens?
I thought maybe it could be a physical hard disk problem, but my computer is working fine now that I have re-installed everything.
I was thinking it might be a virus, a bug that has corrupted the file system, some dodgy hard disk sectors, or a dodgy BIOS (although I don’t know much about that).

Something that may also help you diagnose this (or may be a separate issue) is that I ran a firmware test from the openSUSE DVD, and it produced a lot of failures:

F     No SMBIOS or DMI entry point found
[FAIL] 05/2 memory hole test
F     The memory map has a memory hole between 15mb and 16mb
[FAIL] MTRR validation
F     memory range 0xafa00000 to 0xfeafffff (PCI Bus 0000:00) has incorrect attribute write-back
[FAIL] HPET configuration test
F     Failed to locate HPET base
[FAIL] CPU frequency scaling tests (1-2 mins)
12 CPU frequency test supported.

What’s this all about? Do I need to update the BIOS or something, and could this be affecting the previous errors?

Like I said, my laptop is working fine now, but I want to prevent similar problems from recurring.

Thank you for your help,

Your hard drive is failing.
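The register dump in the boot log backs this up. Below is a small Python sketch that decodes the "cmd" and "res" lines, assuming libata's usual field order (opcode or status, then feature:nsect:lbal:lbam:lbah, then the five "hob" high-order registers, then device; on the "res" line the first two slots hold the status and error registers). Decoded this way, the failing request was an 8-sector (4096-byte) NCQ read at LBA 976772992, and the drive answered with the ERR status bit and the UNC (uncorrectable data) error bit set, i.e. the drive itself reported that spot as unreadable. The dump strings are copied from the log, with 5f in the LBA-mid slot as printed on the "res" line.

```python
def decode_taskfile(dump: str) -> dict:
    """Split a libata register dump such as '60/08:00:80:5f:38/00:00:3a:00:00/40'.

    Assumed field order: first/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    first, low, high, device = dump.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    return {
        # "cmd" line: command opcode; "res" line: status register.
        "first": int(first, 16),
        # "res" line: this slot holds the error register.
        "feature": feature,
        "hob_feature": hob_feature,
        "nsect": nsect,
        "hob_nsect": hob_nsect,
        # 48-bit LBA; the hob ("high order byte") registers supply bits 24-47.
        "lba": (
            (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal
        ),
        "device": int(device, 16),
    }


cmd = decode_taskfile("60/08:00:80:5f:38/00:00:3a:00:00/40")
res = decode_taskfile("41/40:00:80:5f:38/00:00:3a:00:00/40")

# 0x60 is READ FPDMA QUEUED (NCQ); NCQ commands carry the sector count in the
# feature registers, so 8 sectors x 512 bytes matches the "4096 in" of the log.
sectors = (cmd["hob_feature"] << 8) | cmd["feature"]
print(f"failing LBA {cmd['lba']}, {sectors} sectors ({sectors * 512} bytes)")

# Status 0x41 = DRDY (0x40) | ERR (0x01); error bit 0x40 = UNC (uncorrectable data).
print("ERR:", bool(res["first"] & 0x01), "UNC:", bool(res["feature"] & 0x40))
```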

Looks like it…

You could try reading the SMART data, the drive’s built-in self-diagnosis, from the live DVD by running this in a terminal:

smartctl -i /dev/sda

and, if SMART support is shown as enabled:

smartctl -H /dev/sda

However, that’s not 100% reliable.
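One reason is that the overall -H verdict typically only reports a failure once an attribute crosses the vendor’s threshold, so it can say PASSED while the drive is already remapping sectors or failing to read some. The raw counters from smartctl -A /dev/sda say more. A minimal Python sketch that pulls a few telling ones out of that output, assuming the usual ten-column ATA attribute table; the sample table is illustrative, not data from this drive.

```python
# Attributes that usually matter most for a drive with unreadable sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def raw_counts(attribute_table: str) -> dict:
    """Map watched attribute names to their RAW_VALUE from `smartctl -A` text."""
    counts = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])
    return counts


# Illustrative sample in the smartctl -A layout (made-up values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

counts = raw_counts(SAMPLE)
print(counts)
if counts.get("Current_Pending_Sector", 0) or counts.get("Reallocated_Sector_Ct", 0):
    print("Drive reports remapped or pending sectors despite an overall PASSED verdict")
```

Non-zero and growing pending or reallocated counts are the usual sign that a disk is on its way out, even when -H still passes.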
Then you could try badblocks, which scans the hard drive for bad blocks so that they can be marked as unusable. However, its destructive write test (-w) wipes all your data, so either have a backup available or first copy what you can to another HDD from a live CD.

see:
https://wiki.archlinux.org/index.php/Badblocks

Hi, thank you. It passed the smartctl tests.

I am trying to run:

badblocks -nsv /dev/sda2

for instance, and I get the following message: "No such file or directory while trying to determine device size". Any ideas?

I will probably look into ordering another hard drive, but I thought I would try the badblocks program first, to see whether that solves the problem by identifying the bad blocks and then excluding them from the file system with

fsck -vcck /dev/<device>

Did you run it on a mounted partition? Try it from a live CD with the partitions unmounted.
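That particular message also means badblocks could not open the path it was given at all, so it is worth confirming which device names the running kernel actually knows about. A minimal Python sketch that reads the kernel’s partition list in the /proc/partitions layout (major, minor, size in 1 KiB blocks, name); the script name in the usage comment is hypothetical.

```python
import sys
from pathlib import Path


def block_devices(partitions_text: str) -> dict:
    """Return {name: size in 1 KiB blocks} from /proc/partitions-style text."""
    devices = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows are "major minor #blocks name"; the header and blank line are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            devices[fields[3]] = int(fields[2])
    return devices


def check_target(name: str, partitions_text: str) -> str:
    devices = block_devices(partitions_text)
    if name not in devices:
        known = ", ".join(sorted(devices)) or "none"
        return f"/dev/{name} is not known to the kernel (known: {known})"
    return f"/dev/{name} exists, {devices[name]} KiB"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_dev.py sda2
    print(check_target(sys.argv[1], Path("/proc/partitions").read_text()))
```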

However, you should really use e2fsck, since after running badblocks you must tell the file system about the bad blocks that were found…
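One detail matters when doing this by hand: badblocks counts in 1024-byte blocks unless given -b, while e2fsck -l expects block numbers in the file system’s own block size, so a list produced with the defaults cannot be fed to e2fsck unchanged (the badblocks man page recommends letting e2fsck -c run badblocks for this reason). A minimal Python sketch of the conversion, assuming a 4096-byte file system block size; check the real value in the "Block size" line of tune2fs -l /dev/sda3. The sample numbers are made up.

```python
def parse_badblocks_list(text: str) -> list:
    # badblocks -o writes one decimal block number per line.
    return [int(token) for token in text.split()]


def to_fs_blocks(block_numbers, bb_block_size=1024, fs_block_size=4096):
    """Convert block numbers counted in bb_block_size units to fs_block_size units.

    Several 1 KiB blocks can land in the same 4 KiB file system block, so the
    result is de-duplicated and sorted.
    """
    return sorted({n * bb_block_size // fs_block_size for n in block_numbers})


if __name__ == "__main__":
    found = parse_badblocks_list("1000\n1001\n1002\n1003\n2050\n")  # made-up sample
    # One number per line, suitable for a file passed to: e2fsck -l <file> /dev/sda3
    print("\n".join(str(b) for b in to_fs_blocks(found)))
```

Running badblocks with -b 4096 from the start avoids the conversion; e2fsck -c does the equivalent by passing the file system’s block size to badblocks itself.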

I’m running one that had badblocks errors 2 years ago, and it still runs fine. That’s because bad blocks aren’t unusual: every HDD has some (there are even spare replacement blocks built in), and the firmware usually tracks and remaps them. However, some slip through.

Thank you, Brian.

I booted up my GParted CD, opened the command prompt, and was able to run a non-destructive read/write test:

badblocks -nsv /dev/sda3

I did this with all my block partitions, sda1-3. Towards the end, sda3 produced 200 bad blocks. It seems odd to have such a round, even number of bad blocks.
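The round number may be less odd than it looks. badblocks counts in 1024-byte blocks by default, so if the drive has 4096-byte physical sectors and the partition is aligned to them (an assumption, not something shown in this thread), each unreadable sector shows up as four consecutive bad blocks and the total comes out as a multiple of four. 200 fits that pattern, corresponding to 50 bad 4 KiB sectors:

```latex
200 \times 1024\,\text{B} = 204\,800\,\text{B} = 50 \times 4096\,\text{B}
```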

I noticed that e2fsck is basically the same as fsck, except that it is specific to Linux ext2/ext3/ext4 file systems. So I ran

e2fsck -vcck /dev/sda3

It started the whole process of looking for bad blocks again, so I hadn’t needed to run badblocks separately first. When I got in from work I noticed that it had made some modifications to the file system, something to do with 43 bad blocks, but I wasn’t able to scroll up to see the verbose output.

So hopefully this has fixed the problem, and I can run this procedure once a month to monitor the drive and see whether the number of bad blocks changes.
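For monthly monitoring, comparing the saved lists tends to say more than comparing totals, because new blocks can appear while others stop failing (for example once the drive has remapped them). A minimal Python sketch, assuming each scan is run with the same options (including -b) and saved with badblocks -o; the script and file names in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def load_badblocks(path) -> set:
    """Read a badblocks -o output file (one block number per line)."""
    return {int(token) for token in Path(path).read_text().split()}


def compare(previous: set, current: set) -> dict:
    """Summarise how the bad block list changed between two scans."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
        "total_before": len(previous),
        "total_now": len(current),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python compare_bb.py sda3-jan.txt sda3-feb.txt
    print(compare(load_badblocks(sys.argv[1]), load_badblocks(sys.argv[2])))
```

A "new" list that keeps growing from one month to the next is the warning sign that the drive is deteriorating.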

My final question: what about the firmware test? How does that affect things, or does it indicate another problem?