LVM RAID5 broken after SATA link error

Hi,

I’m running a straightforward configuration with three PVs in a single volume group and one RAID5 logical volume on top of it.

The following error occurred:

[16538.008567] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[16538.008576] ata3.00: failed command: FLUSH CACHE EXT
[16538.008587] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[16538.008587]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[16538.008592] ata3.00: status: { DRDY }
[16538.008600] ata3: hard resetting link
[16543.366832] ata3: link is slow to respond, please be patient (ready=0)
[16548.062645] ata3: COMRESET failed (errno=-16)
[16548.062655] ata3: hard resetting link
[16553.420907] ata3: link is slow to respond, please be patient (ready=0)
[16558.116615] ata3: COMRESET failed (errno=-16)
[16558.116618] ata3: hard resetting link
[16563.475977] ata3: link is slow to respond, please be patient (ready=0)
[16593.182707] ata3: COMRESET failed (errno=-16)
[16593.182717] ata3: limiting SATA link speed to 3.0 Gbps
[16593.182721] ata3: hard resetting link
[16598.235748] ata3: COMRESET failed (errno=-16)
[16598.235757] ata3: reset failed, giving up
[16598.235762] ata3.00: disabled
[16598.235767] ata3.00: device reported invalid CHS sector 0
[16598.235795] ata3: EH complete
[16598.235847] sd 2:0:0:0: [sdc] Unhandled error code
[16598.235852] sd 2:0:0:0: [sdc]  
[16598.235855] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[16598.235859] sd 2:0:0:0: [sdc] CDB: 
[16598.235861] Read(10): 28 00 2e 97 e5 00 00 00 08 00
[16598.235878] end_request: I/O error, dev sdc, sector 781706496
[16598.235948] sd 2:0:0:0: [sdc] Unhandled error code
[16598.235951] sd 2:0:0:0: [sdc]  
[16598.235954] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[16598.235957] sd 2:0:0:0: [sdc] CDB: 
[16598.235959] Write(10): 2a 00 00 00 21 48 00 00 08 00
[16598.235972] end_request: I/O error, dev sdc, sector 8520
[16598.235976] end_request: I/O error, dev sdc, sector 8520
[16598.235982] md: super_written gets error=-5, uptodate=0
[16598.235987] md/raid:mdX: Disk failure on dm-12, disabling device.
[16598.235987] md/raid:mdX: Operation continuing on 2 devices.
[16598.355916] md: mdX: resync done.
[16598.400873] md: checkpointing resync of mdX.

I have yet to find the cause of the initial error, but there appear to be plenty of ideas on the internet, ranging from faulty cables to firmware bugs. The disk itself looks OK, so for now I’m mostly interested in getting the array back up.

After this, md hung up, along with other processes that were accessing the disk at the time. A reboot led to the following:

[   60.983184] device-mapper: raid: Loading target version 1.5.2
[   61.626560] md/raid:mdX: not clean -- starting background reconstruction
[   61.626577] md/raid:mdX: device dm-10 operational as raid disk 1
[   61.626578] md/raid:mdX: device dm-8 operational as raid disk 0
[   61.626756] md/raid:mdX: allocated 3282kB
[   61.626816] md/raid:mdX: cannot start dirty degraded array.
[   61.626858] RAID conf printout:
[   61.626858]  --- level:5 rd:3 wd:2
[   61.626859]  disk 0, o:1, dev:dm-8
[   61.626860]  disk 1, o:1, dev:dm-10
[   61.626860]  disk 2, o:1, dev:dm-12
[   61.626966] md/raid:mdX: failed to run raid set.
[   61.626987] md: pers->run() failed ...
[   61.627016] device-mapper: table: 253:13: raid: Fail to run raid array
[   61.627031] device-mapper: ioctl: error adding target to table

Trying to resync the LV manually via lvchange or bringing it up in partial mode yields the same error. The LVM tools don’t seem to offer many other options, which leaves me at a loss. Any ideas?
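For reference, what I tried was along these lines (the LV name here is just a placeholder):

  # ask LVM to resynchronise the RAID LV
  lvchange --resync myvg/myraid5lv
  # try to activate it even with a missing PV ("partial mode")
  lvchange -a y --partial myvg/myraid5lv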

Thanks!

I think I have the same problem. I have an LVM setup with three disks, and one of my LVs is a RAID5 set up within LVM (and not the apparently much more common LVM on top of an mdadm RAID). A disk started to give read errors, so I copied it to a new one with dd. Everything was working very well after that.

Then I thought I’d wipe the failing drive and run badblocks on it to see if I could use it for /tmp or something. Well, it crashed the kernel after about 15 hours of running badblocks. And now, after disconnecting the drive and rebooting, I get the same messages that you do.

Whenever I run lvchange -a y myvg/myraid5lv:

[  321.576063] device-mapper: raid: Loading target version 1.5.0
[  321.624854] md/raid:mdX: device dm-15 operational as raid disk 1
[  321.625473] md/raid:mdX: allocated 3282kB
[  321.625497] md/raid:mdX: not enough operational devices (2/3 failed)
[  321.625503] RAID conf printout:
[  321.625505]  --- level:5 rd:3 wd:1
[  321.625507]  disk 0, o:1, dev:dm-13
[  321.625509]  disk 1, o:1, dev:dm-15
[  321.625510]  disk 2, o:1, dev:dm-17
[  321.625723] md/raid:mdX: failed to run raid set.
[  321.625724] md: pers->run() failed ...
[  321.625750] device-mapper: table: 253:18: raid: Fail to run raid array
[  321.625753] device-mapper: ioctl: error adding target to table

Except dm-13 and dm-17 are perfectly alright. Other non-RAID LVs on the same PVs as dm-13 and dm-17 are working. I hexdumped the first few blocks of dm-13 and dm-17, and the data is still there.

mdadm doesn’t recognize my RAID LV, but I don’t know if it’s supposed to (I didn’t use mdadm to set it up, but googling these error messages turns up lots of results from people using mdadm to create several RAID arrays and then joining them in LVM). All the LVM tools are telling me everything is fine (lvdisplay, for example, tells me the LV is available).

I’m not much help to you, but for once the guy with the same problem as me (i.e. you!) posted just a few days ago and not 5 years ago. I’m hoping someone will help us :slight_smile:

Your timing is good: I got my RAID working again just a few hours ago.

First off, I know exactly what you mean. I too found mostly mdadm-related material. LVM and mdadm both use md, the kernel RAID driver, but they operate on different metadata, so mdadm doesn’t seem to be of any help in our case.

LVM actually uses dm-raid (not to be confused with dmraid). dm-raid is the device-mapper RAID target, which exposes an interface to md and can be configured through dmsetup. dmraid appears to be a tool for activating preexisting software RAIDs originating from other platforms (Intel RAID, for example). It probably does so by using dm-raid, but I don’t know. The thing to note here is that the naming scheme can be confusing.

“dmsetup table” and “dmsetup ls --tree” are two useful commands for examining your current device mappings; consult the dmsetup man page for details on the table layout. The most useful reference is this document: https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt. It describes the specifics of the dm-raid target and the commands you can send to md through “dmsetup message”. “dmsetup status” gives information on the health of your array (if it’s running, at least). All this is exactly what LVM does for you behind a somewhat higher-level interface, and you might not need it to get your array going again; I only mention it because I ended up setting my mappings by hand.
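For example, something along these lines (the VG/LV names are just my placeholders):

  # show how the raid LV is stacked on top of its sub-LV mappings
  dmsetup ls --tree
  # dump the mapping table of every active device
  dmsetup table
  # health summary of the raid set, but only if it is actually running
  dmsetup status myvg-myraid5lv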

If you activate your RAID volume, LVM should set up all the tables except the one for the RAID volume itself, <vgname>-<lvname>, which I’d expect to be empty.

And now for the relevant part: there are three metadata mappings, named something like <vgname>-<lvname>_rmeta_<n>. These should be available in /dev/mapper/ if your LV is activated. Make a backup of them with dd, then have a look at them in a hex editor. The first 512 bytes should be the dm-raid superblock as described here: http://lxr.free-electrons.com/source/drivers/md/dm-raid.c#L759
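Roughly like this, assuming the metadata mappings show up as /dev/mapper/<vg>-<lv>_rmeta_<n> (adjust the names and repeat for each of the three):

  # back up the start of the metadata sub-LV before touching anything
  dd if=/dev/mapper/myvg-myraid5lv_rmeta_0 of=rmeta_0.bak bs=4096 count=1
  # the dm-raid superblock lives in the first 512 bytes; dump the interesting part
  hexdump -C -n 64 /dev/mapper/myvg-myraid5lv_rmeta_0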

It starts with 44 6D 52 64 (DmRd), and lists the failed devices in your array at offset 0x18 as a bitfield.


#|   magic     |  features   | num_devices |  array_pos  |         events          |      failed_devices     |  disk_recovery_offset   |   array_resync_offset   |
1| 44 6D 52 64 | 00 00 00 00 | 03 00 00 00 | 00 00 00 00 | 42 4E 00 00 00 00 00 00 | 04 00 00 00 00 00 00 00 | FF FF FF FF FF FF FF FF | 08 A5 97 2E 00 00 00 00 | ...
2| 44 6D 52 64 | 00 00 00 00 | 03 00 00 00 | 01 00 00 00 | 42 4E 00 00 00 00 00 00 | 04 00 00 00 00 00 00 00 | FF FF FF FF FF FF FF FF | 08 A5 97 2E 00 00 00 00 | ...
3| 44 6D 52 64 | 00 00 00 00 | 03 00 00 00 | 02 00 00 00 | 3F 4E 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | FF FF FF FF FF FF FF FF | 08 84 94 2E 00 00 00 00 | ...

Here, bit number three is set on my two good disks, indicating a failure at array_pos 2. So the nth bit being set indicates a failure of the disk at position n-1.
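Resetting that field by hand could look roughly like this (entirely at your own risk, only after backing up the rmeta devices as above, with the device name adjusted, and repeated for each rmeta that carries a stale failure bit):

  # failed_devices is the 8-byte little-endian field at offset 0x18 of the superblock;
  # overwrite it with zeroes so that no device is marked as failed
  printf '\x00\x00\x00\x00\x00\x00\x00\x00' |
    dd of=/dev/mapper/myvg-myraid5lv_rmeta_0 bs=1 seek=24 count=8 conv=notrunc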

What I did was reset the fail bit, hoping this would allow me to bring the array up in degraded mode to rescue some data. Things went a bit differently than intended, though: I accidentally loaded a table that included the third disk, i.e. started the whole array (and lvchange -ay would do just that). This is what happened:


[57164.702478] md/raid:mdX: not clean -- starting background reconstruction
[57164.702491] md/raid:mdX: device dm-7 operational as raid disk 2
[57164.702493] md/raid:mdX: device dm-9 operational as raid disk 1
[57164.702494] md/raid:mdX: device dm-8 operational as raid disk 0
[57164.702772] md/raid:mdX: allocated 3282kB
[57164.702898] md/raid:mdX: raid level 5 active with 3 out of 3 devices, algorithm 2
[57164.702900] RAID conf printout:
[57164.702900]  --- level:5 rd:3 wd:3
[57164.702901]  disk 0, o:1, dev:dm-8
[57164.702902]  disk 1, o:1, dev:dm-9
[57164.702903]  disk 2, o:1, dev:dm-7
[57164.703116] mdX: bitmap file is out of date (20031 < 20034) -- forcing full recovery
[57164.703155] created bitmap (932 pages) for device mdX
[57164.714924] mdX: bitmap file is out of date, doing full recovery
[57166.827017] mdX: bitmap initialized from disk: read 59 pages, set 1907720 of 1907720 bits
[57166.877157] md: resync of RAID array mdX
[57166.877165] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[57166.877166] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[57166.877169] md: using 128k window, over a total of 1953505280k.
[57227.982563] EXT4-fs (dm-13): INFO: recovery required on readonly filesystem
[57227.982566] EXT4-fs (dm-13): write access unavailable, cannot proceed
[57854.670994] EXT4-fs (dm-13): INFO: recovery required on readonly filesystem
[57854.670998] EXT4-fs (dm-13): write access unavailable, cannot proceed
[84245.369084] e1000e: eno1 NIC Link is Down
[84620.398071] ata4.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[84620.398082] ata4.00: failed command: READ FPDMA QUEUED
[84620.398093] ata4.00: cmd 60/00:00:08:ef:8c/04:00:4e:00:00/40 tag 0 ncq 524288 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[84620.398098] ata4.00: status: { DRDY }
[84620.398103] ata4.00: failed command: READ FPDMA QUEUED
[84620.398111] ata4.00: cmd 60/b0:08:08:f3:8c/03:00:4e:00:00/40 tag 1 ncq 483328 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[84620.398116] ata4.00: status: { DRDY }
[84620.398123] ata4: hard resetting link
[84620.703312] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[84620.718320] ata4.00: configured for UDMA/133
[84620.718333] ata4.00: device reported invalid CHS sector 0
[84620.718340] ata4.00: device reported invalid CHS sector 0
[84620.718369] ata4: EH complete
[102238.456704] md: mdX: resync done.
[102238.710452] RAID conf printout:
[102238.710460]  --- level:5 rd:3 wd:3
[102238.710465]  disk 0, o:1, dev:dm-8
[102238.710468]  disk 1, o:1, dev:dm-9
[102238.710471]  disk 2, o:1, dev:dm-7

md immediately started a full recovery. I had specifically set my dm mappings read-only, and you can see EXT4-fs complaining. I’m not sure why md was able to resync anyway; I’m pretty certain the individual disks dm-7, dm-8 and dm-9 were read-only too, not only the topmost RAID device dm-13. Despite another ATA error, the resync completed successfully overnight and all my data appears to be there. Lucky me. I don’t know where to go from here (I don’t have much faith in the third disk, which is a Seagate, by the way), but maybe this gives you some pointers as to what you could do. Try this at your own risk, however, and be wary: I sure as hell didn’t know what I was doing at times.

Yes! It works! You have my eternal gratitude!

I had found out about dmsetup and tables. “dmsetup table” is empty for the RAID LV when it’s down, but I found you can view its RAID table by running lvchange -a y -vvvv myvg/myraid5lv. In the middle of the output, there is something like:


Adding target to (253:18): 0 10485760 raid raid5_ls 3 128 region_size 1024 3 253:12 253:13 253:14 253:15 253:16 253:17
dm table   (253:18) OF   [16384] (*1)
dm reload   (253:18) NF   [16384] (*1)
device-mapper: reload ioctl on  failed: Input/output error

(253:18 is dm-18, my RAID LV; 13, 15, 17 are the r_images and 12, 14, 16 the r_meta LVs)
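That table line is presumably also what you would hand to dmsetup if you wanted to set the mapping up by hand like you did; a rough sketch, read-only and entirely at your own risk (the numbers are the ones from my output above, substitute your own):

  # hypothetical: recreate the RAID LV mapping manually from the table shown above
  dmsetup create --readonly myvg-myraid5lv --table \
    '0 10485760 raid raid5_ls 3 128 region_size 1024 3 253:12 253:13 253:14 253:15 253:16 253:17'
  dmsetup status myvg-myraid5lv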

The failed-devices bits in the r_metas were set on two of them, like you said: r_meta_1 (the only one dm thought was working) had 5, r_meta_2 had 1, and r_meta_0 had 0. I reset them all to 0, and the LV immediately came up without any resync or anything. Of course, none of the drives had actually failed; I guess the SATA link itself failed during my badblocks run and dm recorded that. What’s unbelievable is that there doesn’t seem to be any tool to recover from this. We shouldn’t have to hex-edit internal on-disk data to tell dm-raid that the drive is really okay.