mdadm crash... HELP...

I have an mdadm RAID 5 with 5 drives that has been running for more than a year with no issues.
Something really, really bad happened, and now it’s saying I just lost 3 drives
and it won’t let me add the other 3 back, saying “mdadm: /dev/sdc does not appear to be an md device”.

Here is what mdadm reports for the array now:
mdadm: metadata format 00.90 unknown, ignored.
/dev/md0:
        Version : 00.90
  Creation Time : Sun Mar 15 03:49:03 2009
     Raid Level : raid5
  Used Dev Size : 976762368 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Nov 20 16:23:39 2010
          State : active, degraded, Not Started

 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : a8218870:64885bf0:e9110788:09b4438a
         Events : 0.1715296

Number   Major   Minor   RaidDevice State
   0       8        0        0      active sync   /dev/sda
   1       8       16        1      active sync   /dev/sdb
   2       0        0        2      removed
   3       0        0        3      removed
   4       0        0        4      removed

I’m really really desperate…
Thanks!

Maybe /dev/sdc failed? Does dmesg show that it was detected?
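Something like this would show it (just a sketch; substitute the right device name):

dmesg | grep -i sdc
smartctl -a /dev/sdc

The first shows whether the kernel detected the drive and logged any link or I/O errors; the second (from smartmontools, if you have it installed) gives the drive’s own health report.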

I cannot add any of the other drives…

Never mind about the other RAID arrays. I’m talking about physical drives here, not RAID arrays which are synthesised from multiple partitions. The physical drive sdc, is it present and working or not? If it has failed, then proceed with replacement of the drive and rebuilding of the arrays.

All the drives are present, with no issues. What I’m saying is that the other drives were part of the same RAID, and now 3 of them are in “removed” state and mdadm keeps saying “does not appear to be an md device”.

Can you see the partition types on sdc and are they RAID?

If they are, it may be just a glitch that caused the partitions to be kicked out of the arrays. (I had a RAID1 array desync on me once and it seems it was a SATA controller glitch.) Try to reassemble the arrays then. You’ll need to consult an mdadm manual for this. Maybe there’s a way to do it in YaST from a rescue disk, but I never looked into that.
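Roughly, it would be something like this (only a sketch, assuming the array is /dev/md0 and the members are the whole disks sda through sde; adapt it to your setup):

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

If mdadm refuses because the members are out of sync (different event counts), --assemble has a --force option, but read the man page before using it, since forcing can mask a real problem.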

The issue is not only with sdc; I have 3 drives in the same state.
I have tried to reassemble, and it keeps saying that the other 3 drives are not md devices.
Why am I seeing “removed” on the 3 devices?
Why can’t I put those back in the RAID 5?
Any idea?

Well, check sdd and sde also then, to see if they are detected, working, and have valid RAID partitions. Maybe you had a controller glitch or failure.
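For instance (just a sketch):

fdisk -l /dev/sdc /dev/sdd /dev/sde
mdadm -E /dev/sdc
mdadm -E /dev/sdd
mdadm -E /dev/sde
dmesg | grep -iE 'ata|sd[cde]'

fdisk confirms the disks are visible at all, mdadm -E shows whether they still carry an md superblock with your array UUID, and dmesg should reveal any controller or link errors.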

I’m seeing all the drives with no issues.
Any idea why only 2 of the 5 drives are being recognized by mdadm?
Can I force the others to be recognized?
Thanks a lot
root@nfs:~# mdadm -A --uuid=a8218870:64885bf0:e9110788:09b4438a /dev/md0
mdadm: metadata format 00.90 unknown, ignored.
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

My understanding is that once an array element is kicked out of the array, you have to intervene and repair the array; it doesn’t happen automatically.

Software RAID comes with more responsibilities for the maintainer. If a server is to be overseen by non-techie people and has to stay up, I normally recommend hardware RAID.

I have troubleshot mdadm in the past (assemble, grow, monitor, replace drives, and other general troubleshooting); actually, I migrated from Ubuntu to openSUSE with no issue.
I consider myself a techie guy, and this has been running with no issues for the last two and a half years.
What happened this time is very strange: I have 3 drives removed with no apparent hardware failure. Any advice?
Any general guidelines for deep troubleshooting or recovery?

You can’t generalise from one incident. Maybe you were unlucky today. It may never happen again. Just repair it and move on. On the other hand if it happens regularly…

How can I repair it? Any advice?
If I run mdadm -E /dev/sdX (where X is any of the drives that were removed from the RAID 5), I get this:

mdadm -E /dev/sdc
mdadm: metadata format 00.90 unknown, ignored.
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : a8218870:64885bf0:e9110788:09b4438a
  Creation Time : Sun Mar 15 03:49:03 2009
     Raid Level : raid5
  Used Dev Size : 976762368 (931.51 GiB 1000.20 GB)
     Array Size : 3907049472 (3726.05 GiB 4000.82 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Nov 20 12:22:15 2010
          State : clean

 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 79b3e1a6 - correct
         Events : 1715280

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed

How can I repair it, or put these drives back in the RAID?
Thanks

Well, you must have done something to put sdc and sdd back in the array, so do the same thing for sde.

sdc and sdd are not part of the array anymore; the last output was from mdadm --examine.
I’m looking for someone who knows mdadm… any help?

So this output is not the current situation?

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed

Did you try a re-add of sde from mdadm?
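That is, something like this (just my guess at the right incantation; adjust the device name):

mdadm /dev/md0 --re-add /dev/sde

and, if --re-add refuses, a plain --add.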

No, that is not the current situation.
Every time I try to add any of the 3 drives I get “does not appear to be an md device”.

This is my current situation:
Number   Major   Minor   RaidDevice State
   0       8        0        0      active sync   /dev/sda
   1       8       16        1      active sync   /dev/sdb
   2       0        0        2      removed
   3       0        0        3      removed
   4       0        0        4      removed

For more information, please read the complete thread
Thanks

Sorry, that’s too weird and something I haven’t encountered before. Try a web search on that message. Personally I still think you have some sort of filesystem damage on sdc which makes it impossible to proceed further.

OK… suppose we take sdc out of the equation; I can take it out of the server (I have tried that anyway).
What do you mean by proceed further? What is the next step for recovery?
I can start a degraded array with 4 disks, but the issue here is that I have only 2, and 3 are “removed”. How can I put a “removed” disk back?
I have tried mdadm --manage --add and it didn’t work.
Thanks

Perhaps you need to mark it failed first with mdadm before you can add it back? Just guessing.
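Something along these lines, maybe (untested, just a sketch of what I mean, with sde as the example):

mdadm /dev/md0 --fail /dev/sde
mdadm /dev/md0 --remove /dev/sde
mdadm /dev/md0 --add /dev/sde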