Hi,
I’m running a straightforward configuration with three PVs in a single volume group and one RAID5 logical volume on top of it.
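For reference, the setup was created with something along these lines (device, VG, and LV names and the size below are placeholders, not the exact ones I used):

  # three whole-disk PVs in one VG (placeholder device names)
  pvcreate /dev/sdX /dev/sdY /dev/sdZ
  vgcreate vg0 /dev/sdX /dev/sdY /dev/sdZ
  # one RAID5 LV spanning the three PVs: 2 data stripes + 1 parity
  lvcreate --type raid5 -i 2 -L 500G -n data vg0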
The following error occurred:
[16538.008567] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[16538.008576] ata3.00: failed command: FLUSH CACHE EXT
[16538.008587] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[16538.008587] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[16538.008592] ata3.00: status: { DRDY }
[16538.008600] ata3: hard resetting link
[16543.366832] ata3: link is slow to respond, please be patient (ready=0)
[16548.062645] ata3: COMRESET failed (errno=-16)
[16548.062655] ata3: hard resetting link
[16553.420907] ata3: link is slow to respond, please be patient (ready=0)
[16558.116615] ata3: COMRESET failed (errno=-16)
[16558.116618] ata3: hard resetting link
[16563.475977] ata3: link is slow to respond, please be patient (ready=0)
[16593.182707] ata3: COMRESET failed (errno=-16)
[16593.182717] ata3: limiting SATA link speed to 3.0 Gbps
[16593.182721] ata3: hard resetting link
[16598.235748] ata3: COMRESET failed (errno=-16)
[16598.235757] ata3: reset failed, giving up
[16598.235762] ata3.00: disabled
[16598.235767] ata3.00: device reported invalid CHS sector 0
[16598.235795] ata3: EH complete
[16598.235847] sd 2:0:0:0: [sdc] Unhandled error code
[16598.235852] sd 2:0:0:0: [sdc]
[16598.235855] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[16598.235859] sd 2:0:0:0: [sdc] CDB:
[16598.235861] Read(10): 28 00 2e 97 e5 00 00 00 08 00
[16598.235878] end_request: I/O error, dev sdc, sector 781706496
[16598.235948] sd 2:0:0:0: [sdc] Unhandled error code
[16598.235951] sd 2:0:0:0: [sdc]
[16598.235954] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[16598.235957] sd 2:0:0:0: [sdc] CDB:
[16598.235959] Write(10): 2a 00 00 00 21 48 00 00 08 00
[16598.235972] end_request: I/O error, dev sdc, sector 8520
[16598.235976] end_request: I/O error, dev sdc, sector 8520
[16598.235982] md: super_written gets error=-5, uptodate=0
[16598.235987] md/raid:mdX: Disk failure on dm-12, disabling device.
[16598.235987] md/raid:mdX: Operation continuing on 2 devices.
[16598.355916] md: mdX: resync done.
[16598.400873] md: checkpointing resync of mdX.
I have yet to find the cause of the initial error; there are plenty of ideas on the internet, ranging from faulty cables to firmware bugs. The disk itself looks OK, so for now I’m mostly interested in getting the array back up.
After this, md hung, along with other processes that were accessing the disk at the time. A reboot led to the following:
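(By “looks OK” I mean roughly the following SMART checks came back clean, with no reallocated or pending sectors; happy to run something more thorough if it helps.)

  # sdc is the disk that dropped out, per the log above
  smartctl -H /dev/sdc          # overall health self-assessment
  smartctl -A /dev/sdc          # attributes: Reallocated_Sector_Ct, Current_Pending_Sector, ...
  smartctl -l error /dev/sdc    # drive's internal error log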
[   60.983184] device-mapper: raid: Loading target version 1.5.2
[   61.626560] md/raid:mdX: not clean -- starting background reconstruction
[   61.626577] md/raid:mdX: device dm-10 operational as raid disk 1
[   61.626578] md/raid:mdX: device dm-8 operational as raid disk 0
[   61.626756] md/raid:mdX: allocated 3282kB
[   61.626816] md/raid:mdX: cannot start dirty degraded array.
[   61.626858] RAID conf printout:
[   61.626858] --- level:5 rd:3 wd:2
[   61.626859] disk 0, o:1, dev:dm-8
[   61.626860] disk 1, o:1, dev:dm-10
[   61.626860] disk 2, o:1, dev:dm-12
[   61.626966] md/raid:mdX: failed to run raid set.
[   61.626987] md: pers->run() failed ...
[   61.627016] device-mapper: table: 253:13: raid: Fail to run raid array
[   61.627031] device-mapper: ioctl: error adding target to table
Trying to resync the LV manually via lvchange, or activating it in partial mode, yields the same error. The LVM tools don’t seem to offer many other options, which leaves me at a loss. Any ideas?
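For reference, these are roughly the commands I tried (vg0/data stand in for the real VG/LV names):

  # manual resync attempt on the RAID LV
  lvchange --resync vg0/data
  # activation attempts, including partial mode
  lvchange -ay vg0/data
  vgchange -ay --partial vg0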
Thanks!