Is my disk dying? - exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 / irq_stat 0x40000001

Hello

My system is generating a lot of log output that may indicate disk errors, but I have been unable to find out what is wrong.
In dmesg I see this pattern over and over:


[396044.734788] ata7.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[396044.734798] ata7.00: irq_stat 0x40000001
[396046.782431] ata7.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[396046.782442] ata7.00: irq_stat 0x40000001
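
For reference, the hex fields in these lines can be decoded bit by bit. The sketch below is a minimal Python decoder. Its bit names are assumed to match the Linux kernel headers (`AC_ERR_*` in include/linux/libata.h, `PORT_IRQ_*` in drivers/ata/ahci.h), so check them against your kernel's source, and the irq_stat table only applies to ports driven by the ahci driver. The `decode` helper and the table names are hypothetical.

```python
# Minimal sketch: decode the hex fields of the libata messages above.
# ASSUMPTION: bit names mirror the kernel headers (AC_ERR_* in
# include/linux/libata.h, PORT_IRQ_* in drivers/ata/ahci.h); irq_stat is
# only meaningful for ports driven by the ahci driver.

# Error-mask bits reported as "Emask"
EMASK_BITS = {
    0: "AC_ERR_DEV",       # device reported an error
    1: "AC_ERR_HSM",       # host state machine violation
    2: "AC_ERR_TIMEOUT",   # command timed out
    3: "AC_ERR_MEDIA",     # media error
    4: "AC_ERR_ATA_BUS",   # ATA bus error
    5: "AC_ERR_HOST_BUS",  # host bus error
    6: "AC_ERR_SYSTEM",    # system error
    7: "AC_ERR_INVALID",   # invalid argument
    8: "AC_ERR_OTHER",     # unknown
}

# AHCI port interrupt-status bits reported as "irq_stat"
IRQ_STAT_BITS = {
    31: "PORT_IRQ_COLD_PRES",
    30: "PORT_IRQ_TF_ERR",         # task file error: device set ERR in its status
    29: "PORT_IRQ_HBUS_ERR",
    28: "PORT_IRQ_HBUS_DATA_ERR",
    27: "PORT_IRQ_IF_ERR",
    26: "PORT_IRQ_IF_NONFATAL",
    24: "PORT_IRQ_OVERFLOW",
    23: "PORT_IRQ_BAD_PMP",
    22: "PORT_IRQ_PHYRDY",
    7: "PORT_IRQ_DEV_ILCK",
    6: "PORT_IRQ_CONNECT",
    5: "PORT_IRQ_SG_DONE",
    4: "PORT_IRQ_UNK_FIS",
    3: "PORT_IRQ_SDB_FIS",
    2: "PORT_IRQ_DMAS_FIS",
    1: "PORT_IRQ_PIOS_FIS",
    0: "PORT_IRQ_D2H_REG_FIS",     # device-to-host register FIS received
}


def decode(value, table):
    """Return the names of all set bits in value, highest bit first."""
    return [table.get(bit, "bit%d" % bit)
            for bit in range(31, -1, -1)
            if (value >> bit) & 1]


print(decode(0x1, EMASK_BITS))            # ['AC_ERR_DEV']
print(decode(0x40000001, IRQ_STAT_BITS))  # ['PORT_IRQ_TF_ERR', 'PORT_IRQ_D2H_REG_FIS']
```

Read this way, Emask 0x1 is a device-reported error and irq_stat 0x40000001 is a task file error plus a received device-to-host register FIS. Together with SErr 0x0, that suggests an error status returned by the device rather than a link problem, but it does not say which device sits on ata7.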

The system is a server running 24/7/365 (and I want it to stay that way :slight_smile:), so I would like to know what may be causing the problem(s) and how to correct them.

Here's some more specific info:

  • O/S version is openSUSE 13.2 (Harlequin) (x86_64)

  • the messages come from dmesg but are also written to /var/log/messages

  • file system is ext4

  • I have 6 disks - 5 in RAID5 (SW RAID)

  • All disks are SMART enabled

  • After running smartctl -t long on all disks, there is no indication of any errors (in particular, Offline_Uncorrectable is 0, suggesting that no unrecoverable bad blocks were found)

  • the system is about 5 years old, and all 6 disks accordingly report Power_On_Hours close to 40000
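
To compare the SMART attributes mentioned above across all six disks, something like the following sketch could be used. It assumes the attribute-table layout that `smartctl -A` prints for ATA drives, with RAW_VALUE in the tenth column; `raw_values` and `WATCHED` are hypothetical names, not part of smartmontools.

```python
# Minimal sketch: extract selected raw SMART values from `smartctl -A` text.
# ASSUMPTION: the ATA attribute table printed by smartmontools, i.e.
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# with RAW_VALUE in the tenth column (possibly followed by extra text).

WATCHED = ("Power_On_Hours", "Offline_Uncorrectable")


def raw_values(smartctl_output, names=WATCHED):
    """Return {attribute_name: raw_value} for the attributes in names."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split(None, 9)  # keep RAW_VALUE and anything after it together
        # Attribute rows start with a numeric ID and fill all ten columns.
        if len(fields) == 10 and fields[0].isdigit() and fields[1] in names:
            values[fields[1]] = fields[9].split()[0]
    return values
```

To use it, pass in the text from `subprocess.run(["smartctl", "-A", "/dev/sdb"], stdout=subprocess.PIPE, universal_newlines=True).stdout` for each disk from /dev/sda to /dev/sdf (smartctl normally needs root).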

In case it matters, all disks are WDC (1 x 750 GB for the system and backup, and 5 x 640 GB for the RAID, which is actually used for /var because I run a number of virtual guests).
Disks:
/dev/sda WDC WD7500AADS-0
/dev/sdb WDC WD6401AALS-0
/dev/sdc WDC WD6401AALS-0
/dev/sdd WDC WD6401AALS-0
/dev/sde WDC WD6401AALS-0
/dev/sdf WDC WD6401AALS-0

Basically, I am stuck because I don't know how to trace the log output to one of my disks. My next step is probably to replace the RAID disks one at a time, hoping to eliminate the error, but for obvious reasons I would first like to make a good guess as to which disk is causing the log messages.
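
One way to make that guess less of a guess is to ask sysfs which block device sits behind each libata port. The sketch below assumes that, for devices handled by libata, the resolved /sys/block/<dev> symlink contains an `ataN` path component (the example path in the comment is illustrative, not taken from this machine); `ata_port_of` and `map_ata_ports` are hypothetical helper names.

```python
# Minimal sketch: map libata port names (ata1, ata2, ...) to block devices
# by resolving the /sys/block/<dev> symlinks.
# ASSUMPTION: libata devices have an "ataN" component in their resolved
# sysfs path, e.g. (illustrative)
#   /sys/devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sr0

import os
import re

ATA_COMPONENT = re.compile(r"/(ata\d+)/")


def ata_port_of(sysfs_path):
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = ATA_COMPONENT.search(sysfs_path)
    return match.group(1) if match else None


def map_ata_ports(sys_block="/sys/block"):
    """Return {'ataN': ['sdX', ...]} for every block device behind libata."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for dev in sorted(os.listdir(sys_block)):
        port = ata_port_of(os.path.realpath(os.path.join(sys_block, dev)))
        if port:
            mapping.setdefault(port, []).append(dev)
    return mapping


if __name__ == "__main__":
    # Print ports in numeric order: ata1, ata2, ..., ata10
    for port, devs in sorted(map_ata_ports().items(), key=lambda kv: int(kv[0][3:])):
        print(port, " ".join(devs))
```

Run as a script, it prints lines of the form `ataN sdX`, which can be matched directly against the ataN prefix in the dmesg messages.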

kind regards,
Jens

The best I can come up with is this thread:

https://bbs.archlinux.org/viewtopic.php?id=197205

Check the last post.

Also, a note: mirroring drives is not enough. If you are shooting for five nines (99.999%) uptime, you need mirrored systems.
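
For a sense of scale, the downtime budget behind "five nines" is simple arithmetic: the allowed downtime per year is (1 - availability) times the length of a year. A minimal sketch, assuming a 365.25-day year:

```python
# Downtime budget per year for a given availability target.
# ASSUMPTION: a year of 365.25 days, i.e. 525,960 minutes.

MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability):
    """Allowed downtime in minutes per year, availability given as a fraction."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for nines, target in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print("%d nines: %.1f min/year" % (nines, downtime_minutes_per_year(target)))
```

At 99.999% that comes to about 5.3 minutes per year.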

Thanks for the pointer - I guess I solved the problem :slight_smile:

As far as I can see, the smartctl output confirms that all disks are OK.

So I took a look at lsscsi and got

# lsscsi
[0:0:0:0] disk ATA WDC WD7500AADS-0 0A01 /dev/sda
[1:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdb
[2:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdc
[3:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdd
[4:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sde
[5:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdf
[6:0:0:0] cd/dvd ATAPI iHAS124 Y BL0W /dev/sr0

Clearly, I need a deeper understanding of the dmesg output. Specifically, does "ata7.00 …" in the log actually refer to the 7th SATA device? If so, I may have been chasing a problem with my DVD drive by looking at my hard disks.

As I was unable to find out more about the log message, I simply tried connecting my drives differently. To be precise, I noticed that my DVD drive was connected to a SATA 6 Gb/s port, and when I moved it to a SATA 3 Gb/s port, the log message disappeared.
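
Which speed each port actually negotiated can be read from the kernel log without opening the case. The sketch below assumes libata's messages of the form `ataN: SATA link up 6.0 Gbps (SStatus ... SControl ...)`; `link_speeds` is a hypothetical helper name, and only the last message per port is kept.

```python
# Minimal sketch: report the negotiated SATA link speed per port.
# ASSUMPTION: libata logs "ataN: SATA link up <speed> Gbps (...)" when a
# link comes up; if a port appears several times, the last line wins.

import re

LINK_UP = re.compile(r"\b(ata\d+): SATA link up ([\d.]+) Gbps")


def link_speeds(log_text):
    """Return {'ataN': speed_in_gbps} parsed from dmesg or syslog text."""
    speeds = {}
    for m in LINK_UP.finditer(log_text):
        speeds[m.group(1)] = float(m.group(2))
    return speeds
```

For example, passing it the output of `dmesg` (or the contents of /var/log/messages) would show whether ata7 came up at 6.0 or 3.0 Gbps before and after moving the cable.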

So, as usual, the solution was entirely different from what I expected :slight_smile:

On 2015-06-28 18:56, jens middelfart wrote:
>
> Thanks for the pointer - I guess I solved the problem :slight_smile:
>
> For all I can see, smartctl output confirms that all disks are OK.
>
> So I took a look at lsscsi and got
>
> # lsscsi
> [0:0:0:0] disk ATA WDC WD7500AADS-0 0A01 /dev/sda
> [1:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdb
> [2:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdc
> [3:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdd
> [4:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sde
> [5:0:0:0] disk ATA WDC WD6401AALS-0 3B01 /dev/sdf
> [6:0:0:0] cd/dvd ATAPI iHAS124 Y BL0W /dev/sr0
>
>
> Clearly, I need a deeper understanding of the output generated in dmesg
> - specifically, is “ata7.00 …” in the log actually referring to the
> 7th SATA device?

Probably to 6:0:0:0, if they start at zero.
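
That reading can be written down as a small heuristic: libata numbers its ports from 1 (ata1, ata2, ...), while the SCSI host numbers in the first field of lsscsi start at 0, so ataN usually corresponds to host N-1. This only holds if every SCSI host is a libata port registered in order (a USB stick or a second controller can shift the numbering), so treat it as an assumption; `guess_device` is a hypothetical helper.

```python
# Minimal sketch of the "ataN is probably SCSI host N-1" heuristic.
# ASSUMPTION: every SCSI host is a libata port registered in order, so
# ata1 -> host 0, ata2 -> host 1, ...; other HBAs or USB storage would
# break this, so confirm the answer via sysfs before pulling a disk.

import re

# lsscsi line: "[H:C:T:L] <type> <vendor/model/rev ...> /dev/<node>"
LSSCSI_LINE = re.compile(r"^\[(\d+):\d+:\d+:\d+\].*\s(/dev/\S+)$")


def guess_device(ata_name, lsscsi_output):
    """Guess the /dev node for a libata name such as 'ata7.00'."""
    m = re.match(r"ata(\d+)", ata_name)
    if m is None:
        raise ValueError("not a libata device name: %r" % ata_name)
    host = int(m.group(1)) - 1  # libata counts from 1, SCSI hosts from 0
    for line in lsscsi_output.splitlines():
        row = LSSCSI_LINE.match(line.strip())
        if row and int(row.group(1)) == host:
            return row.group(2)
    return None
```

Given the lsscsi listing quoted above, `guess_device("ata7.00", ...)` returns /dev/sr0, the DVD drive, which fits what Jens found by moving the cable.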

> As I have been unable to find out about the log message I simply tried
> to connect my disks differently - to be precise, I noticed that my DVD
> drive was connected to a SATA 6Gb/s port, and when I connected the DVD
> drive to a SATA 3Gb/s the log message disappeared
>
> So as usual - the solution entirely different to my expectations …
> :slight_smile:

Interesting!


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))