Drive read errors and boot failure after RAID rebuild

Hello,

My RAID6 server had a drive failure recently. I replaced the failed hard drive and rebuilt the array using the LSI hardware RAID interface. The first boot seemed to go OK, but after an hour or so I noticed slowdowns and the /home share was missing. I rebooted, and the RAID still showed as optimal in the RAID interface. However, on booting into openSUSE 12.2 (I know it’s old), I noticed the following errors:

ata4:00 status: { DRDY ERR}
ata4:00 status: { UNC }

See the two attached screenshots.
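To confirm which device the errors come from, the kernel log can be summarised per ATA device. Below is a minimal Python sketch. It assumes the log has been saved to a text file (for example with `dmesg > dmesg.txt`), and the function name `summarize_ata_errors` is hypothetical. In libata messages, `UNC` flags an uncorrectable read error and `DRDY ERR` means the drive was ready but reported an error. Repeated `UNC` hits on one `ataN.MM` device therefore usually point at that particular disk or its link.

```python
import re
from collections import Counter

# Matches libata lines such as "ata4.00: status: { DRDY ERR }" or
# "ata4.00: error: { UNC }". The kernel prints "ataN.MM:"; the regex also
# accepts the "ata4:00" form in case the log was copied by hand.
ATA_LINE = re.compile(
    r"\b(ata\d+)[.:](\d+):?\s+(status|error):\s*\{\s*([A-Z ]*?)\s*\}"
)


def summarize_ata_errors(log_text):
    """Count status/error flags per ATA device (e.g. 'ata4.00') in log text."""
    counts = {}
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if not match:
            continue  # not a libata status/error line
        device = f"{match.group(1)}.{match.group(2)}"
        flags = match.group(4).split()  # e.g. ["DRDY", "ERR"] or ["UNC"]
        counts.setdefault(device, Counter()).update(flags)
    return counts
```

Called on the two lines quoted above, it reports three flags (`DRDY`, `ERR`, `UNC`) against `ata4.00`.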

I have searched for information on these errors, and what I found suggests a hardware issue. I would like some guidance on how to check the drives. I have not tried to repair the filesystem with fsck, and I’m not sure whether that needs to be done from a live CD. I have also tried a second new hard drive, rebuilt the RAID, and got exactly the same results as above, which leads me to believe the cause is not a faulty hard drive. Because the LSI RAID interface shows no errors, reports the RAID as ‘optimal’, and rebuilt the drive(s) without flagging any error, I think the RAID card is OK. I have not tried switching the RAID leads yet.
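The member disks sit behind the LSI controller, so their SMART data is not visible through the array device directly. smartmontools can reach them with its `-d megaraid,N` device type. Below is a minimal Python sketch of that approach. It assumes a MegaRAID-family controller, `smartctl` installed, and the array exposed as `/dev/sda`. The disk IDs and helper names are hypothetical, so take the IDs from the controller's own device listing. The sketch pulls out three attributes that commonly grow on failing media.

```python
import re
import subprocess

# SMART attributes whose raw values commonly rise on failing media.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = re.match(r"\d+", fields[9])  # keep only the leading number
            if raw:
                found[fields[1]] = int(raw.group())
    return found


def check_megaraid_disk(device, disk_id):
    """Query one physical disk behind a MegaRAID controller (run as root)."""
    result = subprocess.run(
        ["smartctl", "-A", "-d", f"megaraid,{disk_id}", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl's exit code is a bit mask, often non-zero
    )
    return parse_smart_attributes(result.stdout)
```

`check=False` is deliberate. smartctl's exit status is a bit mask that can be non-zero even when it printed the attributes, so the output is parsed regardless. Run `check_megaraid_disk("/dev/sda", disk_id)` once per disk ID. Any non-zero value is worth a closer look.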

How should I proceed to resolve this? What advice can you offer?

Cheers

Nigel

So… today I’ve tried a few things. I swapped the RAID controller leads around, which made no difference. I booted a GParted live CD and ran TestDisk, which did not find the missing partitions. It printed a lot of information about ‘large sparse superblocks’ but offered no apparent recovery. Parted Magic could not find the missing partitions either, or any significant unallocated space. It would seem the partitions have just vanished!
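You can also look at the on-disk partition table directly, independent of TestDisk or Parted Magic, by decoding the first sector of the array by hand. Below is a minimal read-only Python sketch. It assumes a classic MBR layout. If the table is GPT (common on volumes larger than 2 TiB), it will only show the protective `0xEE` entry. The helper names are hypothetical.

```python
import struct

SECTOR = 512


def read_mbr_partitions(first_sector):
    """Decode the four primary entries of a classic MBR partition table.

    Returns (index, type_byte, start_lba, sector_count) for each non-empty
    entry. Raises ValueError if the 0x55AA boot signature is missing.
    """
    if len(first_sector) < SECTOR:
        raise ValueError("need at least 512 bytes")
    if first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) found")
    entries = []
    for i in range(4):
        # The table starts at byte 446; each entry is 16 bytes long.
        entry = first_sector[446 + 16 * i : 446 + 16 * (i + 1)]
        part_type = entry[4]
        # Starting LBA and sector count: little-endian 32-bit ints at offset 8.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type != 0:  # type 0x00 marks an unused slot
            entries.append((i + 1, part_type, start_lba, num_sectors))
    return entries


def describe(entries):
    """Print one line per partition entry."""
    for index, part_type, start, count in entries:
        note = " (protective MBR: real table is GPT)" if part_type == 0xEE else ""
        print(f"partition {index}: type 0x{part_type:02x}, start {start}, {count} sectors{note}")
```

Run it as root, for example `with open("/dev/sda", "rb") as disk: describe(read_mbr_partitions(disk.read(512)))`, where `/dev/sda` is an assumed name for the RAID volume. An empty result or a missing signature would suggest the partition table itself was overwritten, not just the filesystems on it.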

It seems that after the rebuild the system booted with all the partitions intact, but some corrupt data must then have damaged the filesystem. I am now considering a clean install, which I am not looking forward to. :’(

So much for RAID redundancy. >:( Luckily, I have all my data backed up to the cloud. :wink:

Cheers

Nigel