duplicate IRQs?

I am running openSUSE 11.4 (fully patched) and I am experiencing intermittent freeze-ups of two of my disks (ata1 and ata2, they are part of a software RAID5 managed by mdadm). Going over my dmesg output, it would appear that these two disks have the same IRQ (no. 21) as one of my ethernet controllers (eth2, which is not connected). These are the relevant parts of the dmesg output:

% dmesg | egrep -i 'ata|irq|forcedeth'
    0.000000] Command line: root=/dev/disk/by-id/ata-OCZ_CORE_SSD_MK02084906FA90021-part1 resume=/dev/disk/by-id/ata-ST3400620NS_9QH09P3H-part2 splash=verbose quiet nomodeset vga=0x346
    0.000000]  BIOS-e820: 00000000bfff0000 - 00000000bfffe000 (ACPI data)
    0.000000]   NODE_DATA [000000013ffec000 - 000000013fffffff]
    0.000000]   NODE_DATA [000000023ffec000 - 000000023fffffff]
    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
    0.000000] ACPI: IRQ0 used by override.
    0.000000] ACPI: IRQ2 used by override.
    0.000000] ACPI: IRQ9 used by override.
    0.000000] ACPI: IRQ14 used by override.
    0.000000] ACPI: IRQ15 used by override.
    0.000000] nr_irqs_gsi: 40
    0.000000] Kernel command line: root=/dev/disk/by-id/ata-OCZ_CORE_SSD_MK02084906FA90021-part1 resume=/dev/disk/by-id/ata-ST3400620NS_9QH09P3H-part2 splash=verbose quiet nomodeset vga=0x346
    0.000000] Memory: 8182804k/9437184k available (5307k kernel code, 1049092k absent, 205288k reserved, 6119k data, 936k init)
    0.000000] NR_IRQS:33024 nr_irqs:776 16
    0.000000] spurious 8259A interrupt: IRQ7.
    0.617016] ACPI: PCI Interrupt Link [LNKA] (IRQs 16 17 18 19) *11
    0.617189] ACPI: PCI Interrupt Link [LNKB] (IRQs 16 17 18 19) *0, disabled.
    0.617326] ACPI: PCI Interrupt Link [LNKC] (IRQs 16 17 18 19) *0, disabled.
    0.617460] ACPI: PCI Interrupt Link [LNKD] (IRQs 16 17 18 19) *5
    0.617603] ACPI: PCI Interrupt Link [LNEA] (IRQs 16 17 18 19) *0, disabled.
    0.617737] ACPI: PCI Interrupt Link [LNEB] (IRQs 16 17 18 19) *10
    0.617871] ACPI: PCI Interrupt Link [LNEC] (IRQs 16 17 18 19) *0, disabled.
    0.618010] ACPI: PCI Interrupt Link [LNED] (IRQs 16 17 18 19) *0, disabled.
    0.618177] ACPI: PCI Interrupt Link [LUB0] (IRQs 20 21 22 23) *11
    0.618312] ACPI: PCI Interrupt Link [LMAD] (IRQs 20 21 22 23) *10
    0.618447] ACPI: PCI Interrupt Link [LUB2] (IRQs 20 21 22 23) *5
    0.618590] ACPI: PCI Interrupt Link [LMAC] (IRQs 20 21 22 23) *5
    0.618730] ACPI: PCI Interrupt Link [LAZA] (IRQs 20 21 22 23) *0, disabled.
    0.618864] ACPI: PCI Interrupt Link [LSMB] (IRQs 20 21 22 23) *7
    0.619007] ACPI: PCI Interrupt Link [LPMU] (IRQs 20 21 22 23) *10
    0.619178] ACPI: PCI Interrupt Link [LSA0] (IRQs 20 21 22 23) *10
    0.619313] ACPI: PCI Interrupt Link [LSA1] (IRQs 20 21 22 23) *10
    0.619452] ACPI: PCI Interrupt Link [LATA] (IRQs 20 21 22 23) *0, disabled.
    0.619595] ACPI: PCI Interrupt Link [LSA2] (IRQs 20 21 22 23) *11
    0.619886] libata version 3.00 loaded.
    0.620070] PCI: Using ACPI for IRQ routing
    0.620219] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
    0.631849] pnp 00:02: [irq 8]
    0.631924] pnp 00:04: [irq 13]
    0.632351] pnp 00:05: [irq 4]
    0.632853] pnp 00:06: [irq 3]
    0.633406] pnp 00:07: [irq 6]
    0.634596] pnp 00:0b: [irq 1]
    0.634660] pnp 00:0c: [irq 12]
    0.841569] pcieport 0000:00:0a.0: irq 40 for MSI/MSI-X
    0.842119] pcieport 0000:00:0d.0: irq 41 for MSI/MSI-X
    0.842622] pcieport 0000:00:0f.0: irq 42 for MSI/MSI-X
    0.930039] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
    0.950547] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    0.977271] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    1.032372] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    1.065553] 00:06: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    1.081604] ACPI: PCI Interrupt Link [LUB2] enabled at IRQ 23
    1.081615] ehci_hcd 0000:00:02.1: PCI INT B -> Link[LUB2] -> GSI 23 (level, low) -> IRQ 23
    1.089075] ehci_hcd 0000:00:02.1: irq 23, io mem 0xfcefac00
    1.095542] ACPI: PCI Interrupt Link [LUB0] enabled at IRQ 22
    1.095549] ohci_hcd 0000:00:02.0: PCI INT A -> Link[LUB0] -> GSI 22 (level, low) -> IRQ 22
    1.103048] ohci_hcd 0000:00:02.0: irq 22, io mem 0xfcefb000
    1.156366] usbcore: registered new interface driver ums-datafab
    1.164064] PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
    1.166511] serio: i8042 KBD port at 0x60,0x64 irq 1
    1.166524] serio: i8042 AUX port at 0x60,0x64 irq 12
    1.172112] rtc0: alarms up to one year, y3k, 114 bytes nvram, hpet irqs
    1.173552] PM: Checking hibernation image partition /dev/disk/by-id/ata-ST3400620NS_9QH09P3H-part2
    1.221573] Write protecting the kernel read-only data: 10240k
    1.251509] sata_nv 0000:00:05.0: version 3.5
    1.251702] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 21
    1.251711] sata_nv 0000:00:05.0: PCI INT A -> Link[LSA0] -> GSI 21 (level, low) -> IRQ 21
    1.251714] sata_nv 0000:00:05.0: Using SWNCQ mode
    1.252103] sata_nv 0000:00:05.0: setting latency timer to 64
    1.252360] scsi0 : sata_nv
    1.252467] scsi1 : sata_nv
    1.252578] ata1: SATA max UDMA/133 cmd 0xc480 ctl 0xc400 bmdma 0xbc00 irq 21
    1.252581] ata2: SATA max UDMA/133 cmd 0xc080 ctl 0xc000 bmdma 0xbc08 irq 21
    1.252818] ACPI: PCI Interrupt Link [LSA1] enabled at IRQ 20
    1.252824] sata_nv 0000:00:05.1: PCI INT B -> Link[LSA1] -> GSI 20 (level, low) -> IRQ 20
    1.252826] sata_nv 0000:00:05.1: Using SWNCQ mode
    1.253177] sata_nv 0000:00:05.1: setting latency timer to 64
    1.253431] scsi2 : sata_nv
    1.253489] scsi3 : sata_nv
    1.253597] ata3: SATA max UDMA/133 cmd 0xb880 ctl 0xb800 bmdma 0xb080 irq 20
    1.253599] ata4: SATA max UDMA/133 cmd 0xb480 ctl 0xb400 bmdma 0xb088 irq 20
    1.253834] ACPI: PCI Interrupt Link [LSA2] enabled at IRQ 23
    1.253837] sata_nv 0000:00:05.2: PCI INT C -> Link[LSA2] -> GSI 23 (level, low) -> IRQ 23
    1.253839] sata_nv 0000:00:05.2: Using SWNCQ mode
    1.254190] sata_nv 0000:00:05.2: setting latency timer to 64
    1.254422] scsi4 : sata_nv
    1.254478] scsi5 : sata_nv
    1.254582] ata5: SATA max UDMA/133 cmd 0xb000 ctl 0xac00 bmdma 0xa480 irq 23
    1.254584] ata6: SATA max UDMA/133 cmd 0xa880 ctl 0xa800 bmdma 0xa488 irq 23
    1.557031] ata5: SATA link down (SStatus 0 SControl 300)
    1.706057] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    1.707095] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    1.710155] ata3.00: ATA-8: OCZ CORE_SSD, 02.10104, max UDMA/100
    1.710158] ata3.00: 62586880 sectors, multi 0: LBA 
    1.713154] ata3.00: configured for UDMA/100
    1.725233] ata1.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
    1.725236] ata1.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32)
    1.744229] ata1.00: configured for UDMA/133
    1.744351] scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD204UI  1AQ1 PQ: 0 ANSI: 5
    2.198049] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    2.217265] ata2.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
    2.217268] ata2.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32)
    2.236222] ata2.00: configured for UDMA/133
    2.236314] scsi 1:0:0:0: Direct-Access     ATA      SAMSUNG HD204UI  1AQ1 PQ: 0 ANSI: 5
    2.236636] scsi 2:0:0:0: Direct-Access     ATA      OCZ CORE_SSD     02.1 PQ: 0 ANSI: 5
    2.690049] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    2.709262] ata4.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
    2.709266] ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32)
    2.728223] ata4.00: configured for UDMA/133
    2.728326] scsi 3:0:0:0: Direct-Access     ATA      SAMSUNG HD204UI  1AQ1 PQ: 0 ANSI: 5
    3.182049] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    3.231753] ata6.00: ATA-7: ST3400620NS, 3.AEG, max UDMA/133
    3.231755] ata6.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 31/32)
    3.306705] ata6.00: configured for UDMA/133
    3.306797] scsi 5:0:0:0: Direct-Access     ATA      ST3400620NS      3.AE PQ: 0 ANSI: 5
    3.368176] pata_amd 0000:00:04.0: version 0.4.1
    3.368204] pata_amd 0000:00:04.0: setting latency timer to 64
    3.368477] scsi6 : pata_amd
    3.368536] scsi7 : pata_amd
    3.369501] ata7: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
    3.369503] ata8: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
    3.534291] ata7.01: ATAPI: PIONEER DVD-RW  DVR-107D, 1.10, max UDMA/33
    3.534299] ata7: nv_mode_filter: 0x739f&0x739f->0x739f, BIOS=0x7000 (0xc00000) ACPI=0x701f (900:60:0x14)
    3.537226] ata7.01: configured for UDMA/33
    3.544255] ata8: port disabled. ignoring.
    4.261979] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
    4.558867] preloadtrace: systemtap: 1.4/0.149, base: ffffffffa00e9000, memory: 43data/41text/106ctx/13net/495alloc kb, probes: 44
    4.727146] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
    4.727385] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
    4.727390] forcedeth 0000:00:08.0: PCI INT A -> Link[LMAC] -> GSI 22 (level, low) -> IRQ 22
    4.727395] forcedeth 0000:00:08.0: setting latency timer to 64
    4.732691] forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 2, addr 60:50:40:30:20:10
    4.732694] forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3
    4.732982] ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 21
    4.732986] forcedeth 0000:00:09.0: PCI INT A -> Link[LMAD] -> GSI 21 (level, low) -> IRQ 21
    4.732989] forcedeth 0000:00:09.0: setting latency timer to 64
    4.795685] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 19
    4.795696] e100 0000:01:07.0: PCI INT A -> Link[LNKA] -> GSI 19 (level, low) -> IRQ 19
    4.823498] e100 0000:01:07.0: eth0: addr 0xfcfff000, irq 19, MAC addr 00:02:b3:30:eb:09
    4.976155] ACPI: PCI Interrupt Link [LNEB] enabled at IRQ 18
    4.976168] HDA Intel 0000:04:00.1: PCI INT A -> Link[LNEB] -> GSI 18 (level, low) -> IRQ 18
    4.981946] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 17
    4.981953] ENS1371 0000:01:08.0: PCI INT A -> Link[LNKD] -> GSI 17 (level, low) -> IRQ 17
    5.250766] forcedeth 0000:00:09.0: ifname eth2, PHY OUI 0x5043 @ 3, addr 88:aa:99:bb:dd:ee
    5.250771] forcedeth 0000:00:09.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3
    5.461127] nvidia 0000:04:00.0: PCI INT A -> Link[LNEB] -> GSI 18 (level, low) -> IRQ 18
    7.544810] EXT4-fs (sde3): mounted filesystem with ordered data mode. Opts: (null)
    7.750792] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
   12.649667] forcedeth 0000:00:08.0: irq 43 for MSI/MSI-X

Unfortunately, the forum will not let me post the full dmesg output…

Could this be the cause of the freeze-ups I am seeing, and if so, what would be the appropriate way to fix this?

PCI IRQs can be shared. That is not to say sharing can never cause problems, but it is a normal state of affairs.
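
To see how common sharing is on a given machine, the "PCI INT ... -> IRQ n" lines from dmesg can be grouped by IRQ number. The following is only a rough sketch: the function names and the regular expression are my own, and the sample lines are copied from the dmesg output posted above (with the same truncated timestamps).

```python
import re
from collections import defaultdict

# Matches lines such as:
#   1.251711] sata_nv 0000:00:05.0: PCI INT A -> Link[LSA0] -> GSI 21 (level, low) -> IRQ 21
# Group 1 is "driver pci-address", group 2 is the IRQ number.
PCI_INT = re.compile(r"\]\s+(.+?): PCI INT \w -> .*-> IRQ (\d+)")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the list of devices routed to it."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = PCI_INT.search(line)
        if match:
            by_irq[int(match.group(2))].append(match.group(1))
    return dict(by_irq)


def shared_irqs(dmesg_text):
    """Keep only the IRQs that have more than one device attached."""
    return {irq: devs
            for irq, devs in devices_by_irq(dmesg_text).items()
            if len(devs) > 1}


# Three lines taken from the dmesg output in the first post.
SAMPLE_DMESG = """\
    1.251711] sata_nv 0000:00:05.0: PCI INT A -> Link[LSA0] -> GSI 21 (level, low) -> IRQ 21
    1.252824] sata_nv 0000:00:05.1: PCI INT B -> Link[LSA1] -> GSI 20 (level, low) -> IRQ 20
    4.732986] forcedeth 0000:00:09.0: PCI INT A -> Link[LMAD] -> GSI 21 (level, low) -> IRQ 21
"""

if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(SAMPLE_DMESG).items()):
        print(irq, devs)
```

Run against the full dmesg output, this kind of grouping would also list IRQs 18, 22 and 23 as shared, which fits the point that sharing by itself is not unusual.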

pvh513 wrote:
> I am running openSUSE 11.4 (fully patched) and I am experiencing
> intermittent freeze-ups of two of my disks (ata1 and ata2, they are part
> of a software RAID5 managed by mdadm). Going over my dmesg output, it
> would appear that these two disks have the same IRQ (no. 21) as one of
> my ethernet controllers (eth2, which is not connected). These are the
> relevant parts of the dmesg output:

> Unfortunately, the forum will not let me post the full dmesg output…

http://susepaste.org/

> Could this be the cause of the freeze-ups I am seeing, and if so, what
> would be the appropriate way to fix this?

What do you mean by ‘freeze’? What symptoms do you see? What is in
/var/log/messages, for example?

pvh513, I did not see what kind of computer this is, but some BIOSes have a setting for a PnP (Plug and Play) operating system. Set to Yes (almost any recent OS qualifies as PnP), it lets the OS manage IRQs; set to No, the IRQs are managed by the BIOS, not the OS. Changing this setting can sometimes change the IRQ assignments. Disabling an unused network port in the BIOS might also help. Keep in mind that there can be other causes for freeze-ups. For instance, last year I saw issues with 32-bit openSUSE 11.3 when using the KDE desktop and then loading the proprietary nVIDIA video driver. I am sure other hardware combinations exist that can cause a problem.

And, as you are relatively new to the openSUSE forums, may I be the first to welcome you!

Thank You,

Thanks all for the help! My BIOS supports PnP, but switching that on didn’t seem to make any difference to the IRQ assignments. I also disabled the second ethernet port and that removed the duplicate entry for IRQ 21 (obviously). Now we will have to wait and see if the disks are stable. In the past it could take a few hours before the error would occur, but also up to two months… So it will take a long time before I will be confident that the problem is solved.

This is the /var/log/messages output from the last time I had a problem:

Sep  5 06:37:01 dogbert kernel: [54535.827506] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
Sep  5 06:37:01 dogbert kernel: [54535.827510] ata1: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827511]   dhfis 0x1 dmafis 0x0 sdbfis 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827514] ata1: ATA_REG 0x40 ERR_REG 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827515] ata1: tag : dhfis dmafis sdbfis sacitve
Sep  5 06:37:01 dogbert kernel: [54535.827517] ata1: tag 0x0: 1 0 0 1  
Sep  5 06:37:01 dogbert kernel: [54535.827524] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x1910000 action 0xe frozen
Sep  5 06:37:01 dogbert kernel: [54535.827526] ata1.00: hot unplug
Sep  5 06:37:01 dogbert kernel: [54535.827529] ata1: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
Sep  5 06:37:01 dogbert kernel: [54535.827532] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 06:37:01 dogbert kernel: [54535.827536] ata1.00: cmd 60/08:00:3f:01:c0/00:00:e6:00:00/40 tag 0 ncq 4096 in
Sep  5 06:37:01 dogbert kernel: [54535.827537]          res 40/00:04:3f:01:c0/00:04:3f:01:c0/40 Emask 0x10 (ATA bus error)
Sep  5 06:37:01 dogbert kernel: [54535.827540] ata1.00: status: { DRDY }
Sep  5 06:37:01 dogbert kernel: [54535.827544] ata1: hard resetting link
Sep  5 06:37:01 dogbert kernel: [54535.827545] ata1: nv: skipping hardreset on occupied port
Sep  5 06:37:01 dogbert kernel: [54535.827553] ata2: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
Sep  5 06:37:01 dogbert kernel: [54535.827555] ata2: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827556]   dhfis 0x1 dmafis 0x0 sdbfis 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827558] ata2: ATA_REG 0x40 ERR_REG 0x0
Sep  5 06:37:01 dogbert kernel: [54535.827559] ata2: tag : dhfis dmafis sdbfis sacitve
Sep  5 06:37:01 dogbert kernel: [54535.827561] ata2: tag 0x0: 1 0 0 1  
Sep  5 06:37:01 dogbert kernel: [54535.827565] ata2.00: exception Emask 0x10 SAct 0x1 SErr 0x1810000 action 0xe frozen
Sep  5 06:37:01 dogbert kernel: [54535.827567] ata2.00: hot unplug
Sep  5 06:37:01 dogbert kernel: [54535.827568] ata2: SError: { PHYRdyChg LinkSeq TrStaTrns }
Sep  5 06:37:01 dogbert kernel: [54535.827570] ata2.00: failed command: READ FPDMA QUEUED
Sep  5 06:37:01 dogbert kernel: [54535.827574] ata2.00: cmd 60/00:00:3f:00:c0/01:00:e6:00:00/40 tag 0 ncq 131072 in
Sep  5 06:37:01 dogbert kernel: [54535.827575]          res 40/00:04:3f:00:c0/00:04:3f:00:c0/40 Emask 0x10 (ATA bus error)
Sep  5 06:37:01 dogbert kernel: [54535.827577] ata2.00: status: { DRDY }
Sep  5 06:37:01 dogbert kernel: [54535.827580] ata2: hard resetting link
Sep  5 06:37:01 dogbert kernel: [54535.827581] ata2: nv: skipping hardreset on occupied port
Sep  5 06:37:04 dogbert kernel: [54538.134052] ata1: failed to resume link (errno=-32)
Sep  5 06:37:04 dogbert kernel: [54538.134060] ata1: SATA link down (SStatus 0 SControl 300)
Sep  5 06:37:04 dogbert kernel: [54538.368860] ata1: hard resetting link
Sep  5 06:37:04 dogbert kernel: [54538.368863] ata1: nv: skipping hardreset on occupied port
Sep  5 06:37:09 dogbert kernel: [54543.872013] ata1: link is slow to respond, please be patient (ready=0)
Sep  5 06:37:11 dogbert kernel: [54545.863017] ata2: SRST failed (errno=-16)
Sep  5 06:37:11 dogbert kernel: [54545.863025] ata2: hard resetting link
Sep  5 06:37:11 dogbert kernel: [54545.863028] ata2: nv: skipping hardreset on occupied port
Sep  5 06:37:14 dogbert kernel: [54548.411037] ata1: SRST failed (errno=-16)
Sep  5 06:37:14 dogbert kernel: [54548.411044] ata1: hard resetting link
Sep  5 06:37:14 dogbert kernel: [54548.411048] ata1: nv: skipping hardreset on occupied port
Sep  5 06:37:19 dogbert kernel: [54553.915014] ata1: link is slow to respond, please be patient (ready=0)
Sep  5 06:37:21 dogbert kernel: [54555.897042] ata2: SRST failed (errno=-16)
Sep  5 06:37:21 dogbert kernel: [54555.897048] ata2: hard resetting link
Sep  5 06:37:21 dogbert kernel: [54555.897051] ata2: nv: skipping hardreset on occupied port
Sep  5 06:37:24 dogbert kernel: [54558.454051] ata1: SRST failed (errno=-16)
Sep  5 06:37:24 dogbert kernel: [54558.454057] ata1: hard resetting link
Sep  5 06:37:24 dogbert kernel: [54558.454060] ata1: nv: skipping hardreset on occupied port
Sep  5 06:37:28 dogbert kernel: [54562.078037] ata2: link is slow to respond, please be patient (ready=0)
Sep  5 06:37:29 dogbert kernel: [54563.957038] ata1: link is slow to respond, please be patient (ready=0)
Sep  5 06:37:56 dogbert kernel: [54590.945014] ata2: SRST failed (errno=-16)
Sep  5 06:37:56 dogbert kernel: [54590.945019] ata2: limiting SATA link speed to 1.5 Gbps
Sep  5 06:37:56 dogbert kernel: [54590.945022] ata2: hard resetting link
Sep  5 06:37:56 dogbert kernel: [54590.945025] ata2: nv: skipping hardreset on occupied port
Sep  5 06:37:59 dogbert kernel: [54593.489043] ata1: SRST failed (errno=-16)
Sep  5 06:37:59 dogbert kernel: [54593.489051] ata1: limiting SATA link speed to 1.5 Gbps
Sep  5 06:37:59 dogbert kernel: [54593.489056] ata1: hard resetting link
Sep  5 06:37:59 dogbert kernel: [54593.489060] ata1: nv: skipping hardreset on occupied port
Sep  5 06:37:59 dogbert kernel: [54593.590048] ata2: failed to resume link (errno=-32)
Sep  5 06:37:59 dogbert kernel: [54593.590058] ata2: SATA link down (SStatus 0 SControl 300)
Sep  5 06:38:00 dogbert kernel: [54594.027604] ata2: hard resetting link
Sep  5 06:38:00 dogbert kernel: [54594.027609] ata2: nv: skipping hardreset on occupied port
Sep  5 06:38:04 dogbert kernel: [54598.519038] ata1: SRST failed (errno=-16)
Sep  5 06:38:04 dogbert kernel: [54598.519042] ata1: reset failed, giving up
Sep  5 06:38:04 dogbert kernel: [54598.519045] ata1.00: disabled
Sep  5 06:38:04 dogbert kernel: [54598.519060] ata1: exception Emask 0x10 SAct 0x0 SErr 0x1950000 action 0xe frozen t4
Sep  5 06:38:04 dogbert kernel: [54598.519062] ata1: hot plug
Sep  5 06:38:04 dogbert kernel: [54598.519064] ata1: SError: { PHYRdyChg CommWake Dispar LinkSeq TrStaTrns }
Sep  5 06:38:04 dogbert kernel: [54598.519071] ata1: hard resetting link
Sep  5 06:38:05 dogbert kernel: [54599.530033] ata2: link is slow to respond, please be patient (ready=0)
Sep  5 06:38:06 dogbert kernel: [54600.750042] ata1: COMRESET failed (errno=-32)
Sep  5 06:38:06 dogbert kernel: [54600.750050] ata1: SATA link down (SStatus 0 SControl 300)
Sep  5 06:38:06 dogbert kernel: [54600.750070] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep  5 06:38:06 dogbert kernel: [54600.750074] sd 0:0:0:0: [sda]  Sense Key : Aborted Command [current] [descriptor]
Sep  5 06:38:06 dogbert kernel: [54600.750078] Descriptor sense data with sense descriptors (in hex):
Sep  5 06:38:06 dogbert kernel: [54600.750079]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 c0 01 
Sep  5 06:38:06 dogbert kernel: [54600.750085]         3f c0 01 3f 
Sep  5 06:38:06 dogbert kernel: [54600.750087] sd 0:0:0:0: [sda]  Add. Sense: No additional sense information
Sep  5 06:38:06 dogbert kernel: [54600.750090] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 e6 c0 01 3f 00 00 08 00
Sep  5 06:38:06 dogbert kernel: [54600.750096] end_request: I/O error, dev sda, sector 3871342911
Sep  5 06:38:06 dogbert kernel: [54600.750117] sd 0:0:0:0: rejecting I/O to offline device
Sep  5 06:38:06 dogbert kernel: [54600.750124] ata1: EH complete
Sep  5 06:38:06 dogbert kernel: [54600.750133] ata1.00: detaching (SCSI 0:0:0:0)
Sep  5 06:38:06 dogbert kernel: [54600.750223] end_request: I/O error, dev sda, sector 3907024047
Sep  5 06:38:06 dogbert kernel: [54600.750225] md: super_written gets error=-5, uptodate=0
Sep  5 06:38:06 dogbert kernel: [54600.750229] md/raid:md0: Disk failure on sda1, disabling device.
Sep  5 06:38:06 dogbert kernel: [54600.750229] <1>md/raid:md0: Operation continuing on 2 devices.
Sep  5 06:38:06 dogbert kernel: [54600.760226] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Sep  5 06:38:06 dogbert kernel: [54600.760421] sd 0:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep  5 06:38:06 dogbert kernel: [54600.760425] sd 0:0:0:0: [sda] Stopping disk
Sep  5 06:38:06 dogbert kernel: [54600.760431] sd 0:0:0:0: [sda] START_STOP FAILED
Sep  5 06:38:06 dogbert kernel: [54600.760432] sd 0:0:0:0: [sda]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep  5 06:38:07 dogbert kernel: [54600.995349] ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
Sep  5 06:38:07 dogbert kernel: [54600.995355] ata1: SError: { PHYRdyChg CommWake }
Sep  5 06:38:07 dogbert kernel: [54600.995363] ata1: hard resetting link
Sep  5 06:38:09 dogbert kernel: [54603.382039] ata1: COMRESET failed (errno=-32)
Sep  5 06:38:10 dogbert kernel: [54604.069038] ata2: SRST failed (errno=-16)
Sep  5 06:38:10 dogbert kernel: [54604.069043] ata2: hard resetting link
Sep  5 06:38:10 dogbert kernel: [54604.069046] ata2: nv: skipping hardreset on occupied port
Sep  5 06:38:15 dogbert kernel: [54609.572040] ata2: link is slow to respond, please be patient (ready=0)
Sep  5 06:38:17 dogbert kernel: [54611.030040] ata1: SRST failed (errno=-16)
Sep  5 06:38:17 dogbert kernel: [54611.030046] ata1: hard resetting link
Sep  5 06:38:19 dogbert kernel: [54613.077041] ata1: SRST failed (errno=-19)
Sep  5 06:38:19 dogbert kernel: [54613.077045] ata1: reset failed (errno=-19), retrying in 8 secs
Sep  5 06:38:20 dogbert kernel: [54614.111044] ata2: SRST failed (errno=-16)
Sep  5 06:38:20 dogbert kernel: [54614.111049] ata2: hard resetting link
Sep  5 06:38:20 dogbert kernel: [54614.111052] ata2: nv: skipping hardreset on occupied port
Sep  5 06:38:25 dogbert kernel: [54619.614044] ata2: link is slow to respond, please be patient (ready=0)
Sep  5 06:38:27 dogbert kernel: [54621.030041] ata1: hard resetting link
Sep  5 06:38:32 dogbert kernel: [54626.953041] ata1: link is slow to respond, please be patient (ready=0)
Sep  5 06:38:40 dogbert kernel: [54634.455023] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  5 06:38:40 dogbert kernel: [54634.493270] ata2.00: configured for UDMA/133
Sep  5 06:38:40 dogbert kernel: [54634.493292] ata2: EH complete
Sep  5 06:38:40 dogbert kernel: [54634.552067] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  5 06:38:40 dogbert kernel: [54634.571265] ata1.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
Sep  5 06:38:40 dogbert kernel: [54634.571269] ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Sep  5 06:38:40 dogbert kernel: [54634.590279] ata1.00: configured for UDMA/133
Sep  5 06:38:40 dogbert kernel: [54634.590287] ata1: EH complete
Sep  5 06:38:40 dogbert kernel: [54634.590392] scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD204UI  1AQ1 PQ: 0 ANSI: 5
Sep  5 06:38:40 dogbert kernel: [54634.590719] sd 0:0:0:0: Attached scsi generic sg0 type 0
Sep  5 06:38:40 dogbert kernel: [54634.590761] sd 0:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Sep  5 06:38:40 dogbert kernel: [54634.590804] sd 0:0:0:0: [sdf] Write Protect is off
Sep  5 06:38:40 dogbert kernel: [54634.590808] sd 0:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Sep  5 06:38:40 dogbert kernel: [54634.590824] sd 0:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep  5 06:38:40 dogbert kernel: [54634.594434]  sdf: sdf1
Sep  5 06:38:40 dogbert kernel: [54634.594673] sd 0:0:0:0: [sdf] Attached SCSI disk
Sep  5 06:38:40 dogbert kernel: [54634.672521] RAID conf printout:
Sep  5 06:38:40 dogbert kernel: [54634.672526]  --- level:5 rd:3 wd:2
Sep  5 06:38:40 dogbert kernel: [54634.672530]  disk 0, o:0, dev:sda1
Sep  5 06:38:40 dogbert kernel: [54634.672533]  disk 1, o:1, dev:sdd1
Sep  5 06:38:40 dogbert kernel: [54634.672534]  disk 2, o:1, dev:sdb1
Sep  5 06:38:40 dogbert kernel: [54634.676034] RAID conf printout:
Sep  5 06:38:40 dogbert kernel: [54634.676036]  --- level:5 rd:3 wd:2
Sep  5 06:38:40 dogbert kernel: [54634.676038]  disk 1, o:1, dev:sdd1
Sep  5 06:38:40 dogbert kernel: [54634.676039]  disk 2, o:1, dev:sdb1

Well, let us hope you have isolated the problem as a duplicate IRQ, then. I wish you good luck with this issue.

Thank You,

Last night I had another failure of my RAID5 array, so I am now pretty much convinced that the duplicate IRQ assignment was not the cause of the problems. In the meantime I have seen freezes on all four hard drives in the system (but not on the SSD, though that could be small-number statistics). So I am now also convinced that it is not a problem with the drives (including firmware, since two different makes of drive are affected) or the cabling. I am wondering now whether this could be a driver problem. The controller in my system is an nVidia Pro 3600, so I am using the sata_nv driver. The freezes occur very erratically: they can come two days in a row, or they can stay away for weeks or even 1-2 months…

I am completely out of ideas on how to proceed. Does anybody have any suggestions? The RAID5 is my only backup system, so I do want to get this stable…
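
Since the failures can be weeks apart, it may help to at least notice a degraded array quickly. Below is a small sketch that looks for the "[total/active]" status in text of the kind found in /proc/mdstat. The function name is my own and the sample text is a made-up illustration (the block count and chunk size are not from my system); a real check would read /proc/mdstat, for instance from cron.

```python
import re

# Status field such as "[3/2] [_UU]": 3 member slots, 2 currently active.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEADER = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array: (active, total)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            if active < total:
                degraded[current] = (active, total)
    return degraded


# Hypothetical mdstat text for a 3-disk RAID5 running on 2 disks.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[1] sdb1[2]
      3906763776 blocks level 5, 128k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

This only reports the state; it does nothing to re-add a dropped disk.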

Hi
What about the environment? Are the drives cool (hddtemp or smartctl), is
the chipset cool (sensors), any dust bunnies, are the fans all working, etc.?


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 11.4 (x86_64) Kernel 2.6.37.6-0.7-desktop
up 1 day 3:34, 4 users, load average: 0.03, 0.04, 0.05
GPU GeForce 8600 GTS Silent - Driver Version: 285.05.09

The case is well ventilated and the fans are all working. The HD temperatures are now between 41 and 44 °C; the highest ever recorded was 51 °C. There is a fan blowing over the disks. I clean the case once or twice a year, and there is no inordinate amount of dust in there. I do not think this is an overheating issue. I do not have a temperature reading for the chipset, but I do not believe that it is overheating. I would expect to see more diverse crashes if the chipset were overheating…
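
To rule temperature in or out more firmly, the readings could be logged over time and compared with the timestamps of the freezes. Here is a rough sketch that pulls the raw value of SMART attribute 194 (Temperature_Celsius) out of the attribute table printed by "smartctl -A". The function name is my own, the sample table is a made-up illustration rather than output from my drives, and it assumes the drive reports its temperature under ID 194, which not all drives do.

```python
def drive_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; RAW_VALUE is the 10th.
        if len(fields) >= 10 and fields[0] == "194":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Hypothetical excerpt of an attribute table as printed by "smartctl -A".
SAMPLE_SMART = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4521
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       44 (Min/Max 17/51)
"""

if __name__ == "__main__":
    print(drive_temperature(SAMPLE_SMART))
```

Written to a file every few minutes together with a timestamp, such readings could be lined up against the next "hard resetting link" message in /var/log/messages.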