RAID 5 failure

Hi,

I’m not sure this is the right place to ask, because my problem has, AFAICS, nothing to do with the OS (an up-to-date version of TW). But I’d like to give it a try.
I have 4 disks in a RAID 5 array, created by mdadm, mounted on /home.
I noticed one of the disks had been removed, something that has happened once or twice before (in this array, but not with the same disk). In those cases I entered “mdadm /dev/md0 --add /dev/sdX” (X being replaced by the drive letter), and mdadm started rebuilding the disk and added it back to the array.
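
In case details matter, the full sequence in those earlier cases was roughly this (sdX again standing for the actual member):

    mdadm /dev/md0 --add /dev/sdX    # put the dropped disk back into the array
    cat /proc/mdstat                 # rebuild progress shows up here as a percentage
    mdadm --detail /dev/md0          # afterwards the disk was listed as active, not as a spare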

This time a recovery operation started and ended without errors or warnings, but the disk was added to the array as a spare instead of as a working member. So I decided to reboot, but mdadm still did not add the disk to the array.
Then I entered “mdadm /dev/md0 --re-add /dev/sdd”, and mdadm started recovering again. But after a while the process stopped; now a second disk of the four was removed, and this time mdadm declared that disk to be “faulty”.
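
For reference, this is roughly how I have been checking what mdadm thinks of the members (sdd being the disk I re-added):

    mdadm --detail /dev/md0    # per-device states: active sync / spare / faulty / removed
    cat /proc/mdstat           # a summary like [UU__] marks which slots are down
    dmesg | tail -n 50         # the kernel usually logs why a member was kicked out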

Of course the array is not usable in this state. And while I have reasonably recent backups, it would be very nice if I could bring the array back to life.

Any ideas here?

Thanks,

Did you use smartctl to check the condition of the drives?

Yes, I did. With smartctl -a, two of the 4 disks show “read failure”, which I had not seen before.
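
To be precise, these are the kinds of checks I ran (sdd is one of the two affected disks):

    smartctl -H /dev/sdd    # overall health self-assessment
    smartctl -a /dev/sdd | grep -i -E 'Reallocated|Pending|Uncorrectable'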

Funny thing is that now (after a reboot) mdadm --detail gives 4 of 4 disks working (so no disk “removed” and no disk “faulty”, which is what mdadm reported yesterday), but the whole array is in state “inactive”.
Funnier still, the RAID level is reported as raid0; it has to be raid5.

I have an offline test running now on one of the faulty disks. It should finish by tomorrow (Oct 31, around 12:15 PM WET).
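
For the record, I started the test with something like the first command below, and I will read the result with the second:

    smartctl -t long /dev/sdd      # start the extended offline self-test
    smartctl -l selftest /dev/sdd  # shows completion status and the LBA of the first error, if any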

Indeed, both disks have read errors. But would it be possible to activate the array despite those disks not being in good health?
Issuing “mdadm /dev/md0 --assemble” (without explicitly naming the components; they are already listed by “mdadm --detail /dev/md0”) doesn’t seem to do anything.
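
From what I have read, an inactive array may need to be stopped first and then assembled with --force, so that members with out-of-date event counters are accepted again. A sketch of what I have in mind (the member names are only my guess; the real ones should come from --detail/--examine):

    mdadm --stop /dev/md0
    mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd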

I am thinking of this option, shown in the man pages:


--assume-clean
       Tell mdadm that the array pre-existed and is known to be clean. It can be useful when trying to recover from a major failure as you can be sure that no data will be affected unless you actually write to the array. It can also be used when creating a RAID1 or RAID10 if you want to avoid the initial resync, however this practice (while normally safe) is not recommended. Use this only if you really know what you are doing.
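
If I understand the man page correctly, --assume-clean would be combined with --create, i.e. re-creating the array over the existing disks without resyncing. A sketch of what I think that would look like; the level, chunk size, metadata version and device order below are placeholders that would all have to match the original exactly (taken from mdadm --examine) before trying anything like this:

    mdadm --stop /dev/md0
    mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 \
          --chunk=512 --metadata=1.2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # DANGER: a wrong order, chunk size or metadata version silently corrupts the data.
    # Mount read-only first to verify the filesystem before writing anything.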

Another thing is that, as I mentioned in my previous post, mdadm gives raid0 as the RAID level for this array. That’s not correct; it should be raid5. How can I tell mdadm that this is a raid5 array? AFAICS the level is not in /etc/mdadm.conf.
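
In case it is relevant: as far as I understand, the raid0/inactive report just means the array never fully assembled, and the real level is stored in each member’s superblock, which should be readable with something like:

    mdadm --examine /dev/sdd | grep -E 'Raid Level|Array UUID|Events'
    # a line "Raid Level : raid5" would confirm the superblock is still intact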

I’d be grateful for your advice!