Is this really a sign of a drive failure?

I have 4 disks in an MD RAID 5 array, and since I installed 11.1, 2 of them have had errors similar to the following:

Dec 23 22:16:42 loki kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Dec 23 22:16:42 loki kernel: ata4.00: irq_stat 0x00400040, connection status changed
Dec 23 22:16:42 loki kernel: ata4: SError: { PHYRdyChg DevExch }
Dec 23 22:16:42 loki kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 23 22:16:42 loki kernel:          res 40/00:0c:cd:9c:cb/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
Dec 23 22:16:42 loki kernel: ata4.00: status: { DRDY }
Dec 23 22:16:42 loki kernel: ata4: hard resetting link
Dec 23 22:16:46 loki kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 22:16:51 loki kernel: ata4.00: qc timeout (cmd 0xec)
Dec 23 22:16:51 loki kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Dec 23 22:16:51 loki kernel: ata4.00: revalidation failed (errno=-5)
Dec 23 22:16:51 loki kernel: ata4: hard resetting link
Dec 23 22:16:52 loki kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 22:16:52 loki kernel: ata4.00: configured for UDMA/133
Dec 23 22:16:52 loki kernel: ata4: EH complete
Dec 23 22:16:52 loki kernel: sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors: (160GB/149GiB)
Dec 23 22:16:52 loki kernel: sd 3:0:0:0: [sdd] Write Protect is off
Dec 23 22:16:52 loki kernel: sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Dec 23 22:16:52 loki kernel: sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 23 22:16:58 loki kernel: end_request: I/O error, dev sdd, sector 312576666
Dec 23 22:16:58 loki kernel: md: super_written gets error=-5, uptodate=0
Dec 23 22:16:58 loki kernel: raid5: Disk failure on sdd2, disabling device.

It’s definitely a problem, because MD marks the drive as failed and continues in degraded mode.
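
For reference, checking the array state after one of these events looks roughly like this (md0 is just a placeholder for my array device; substitute your own names):

cat /proc/mdstat                 # failed members show up with an (F) flag
mdadm --detail /dev/md0          # md0 is a placeholder; reports "clean, degraded" and lists the faulty disk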

I have about 50 GB of non-essential data that I’m trying to restore, and the restore process is what’s triggering these errors. If I remove and re-add the partition to the MD array (roughly the commands sketched below), it works fine, as long as I don’t retry the restore or otherwise write a lot of new data to the array.
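
In case the exact steps matter, this is the sequence I use to put the partition back (sdd2 matches the log above; md0 is again a placeholder for the array name):

mdadm /dev/md0 --remove /dev/sdd2    # drop the member MD already marked as failed
mdadm /dev/md0 --add /dev/sdd2       # re-add it, which kicks off a full resync
cat /proc/mdstat                     # watch the rebuild progress

Once the resync finishes, everything looks healthy again until the next heavy write.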

I replaced the first disk, and I’m waiting for the second replacement to arrive. The timing is just funny: no sooner had the new disk finished its sync and I’d started restoring data than the second disk (connected to a different SATA port) began showing the same errors.

Argh, just happened again with /dev/sdb.

It’s very hard to believe that 3 out of 4 identical drives would go bad all at the same time.

They re-sync fine.
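
Before I send another drive back, I’ll probably double-check them with SMART; something along these lines should do it (smartctl is from the smartmontools package, and sdb is just the most recent disk to drop out):

smartctl -H /dev/sdb       # overall health self-assessment
smartctl -a /dev/sdb       # full SMART attributes plus the drive's error log
smartctl -t long /dev/sdb  # start an extended self-test; check results later with smartctl -l selftest /dev/sdb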

Is MD RAID more sensitive in the new kernel?