Failing to mount RAID 5 XFS partition

I have a raid 5 partition /dev/md0 that is no longer mounted after reboots.

The file system on that array is XFS.

I have done the following to debug this but do not understand the root cause. Can anyone help me understand what the issue is and how to fix it? (I am not an expert in this stuff.)

This has happened “out of the blue” after a couple of years of trouble free running.

fsck /dev/md0

fsck 1.40.2 (12-Jul-2007)
If you wish to check the consistency of an XFS filesystem or
repair a damaged filesystem, see xfs_check(8) and xfs_repair(8).

xfs_check /dev/md0

xfs_check: /dev/md0 is invalid (cannot read first 512 bytes)

mdadm --detail /dev/md0

mdadm: md device /dev/md0 does not appear to be active.

cat /proc/mdstat

Personalities :
md0 : inactive sda5[0] sdd5[3] sdc5[2] sdb5[1]
977522688 blocks

unused devices: <none>
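
For anyone reading along with the same symptom, the inactive state can also be picked out of /proc/mdstat with a short filter. This is only a sketch: the function name inactive_arrays is made up for illustration, and it assumes the usual "name : inactive members..." line layout shown above.

```shell
# Print the name and member list of every array reported as inactive.
# Reads mdstat-format text on stdin, so it also works on saved output.
# (inactive_arrays is a hypothetical helper name, not part of mdadm.)
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" {
        printf "%s:", $1
        for (i = 4; i <= NF; i++) printf " %s", $i
        printf "\n"
    }'
}

# Usage on a live system:
#   inactive_arrays < /proc/mdstat
```

An empty result would mean no array is currently listed as inactive.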

Tentative conclusions

So it appears that the raid5 is “inactive”.
How can I find out why?
How can I activate it?

System Info

uname -a:
Linux nas 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC i686 athlon i386 GNU/Linux

openSUSE 10.3

Any help/suggestions much appreciated!

A bit more info…

/var/log/messages

There is nothing reported in here that indicates the problem (to me).

disk failure

I'm pretty sure there is not an outright disk failure, because:
sda1 has /boot
sda2 has /
sdb2 has /var
sdc2 has /tmp
sdd2 has swap
I can see /tmp, /var and (obviously) /.
Though no swap is in use, “free” reports it correctly, so I assume it is OK.

Also try looking for hints in dmesg,

e.g.,

> dmesg | more

or redirect it to a file that you can browse or grep through, etc…

> dmesg > dmesg.txt

Also, boot-time messages are logged in /var/log/boot.msg (and those from the previous boot in /var/log/boot.omsg).

Good luck!

Thanks - more info:

dmesg | more

(relevant excerpt, which doesn’t actually tell me much more than that it is probably still a RAID problem)…

md: bind<sdb5>
md: bind<sdc5>
md: bind<sdd5>
md: bind<sda5>
ieee1394: Host added: ID:BUS[0-00:1023] GUID[0011d80000916254]
loop: module loaded
SGI XFS with ACLs, security attributes, realtime, large block numbers, no debug enabled
SGI XFS Quota Management subsystem
XFS: SB read failed

md: could not bd_claim sda5.
md: could not bd_claim sdb5.
md: could not bd_claim sdc5.
md: could not bd_claim sdd5.
md: autorun …
md: … autorun DONE.

Any ideas?

Thanks

More info that may be relevant…

mount /mnt/share

mount: /dev/md0: can’t read superblock

Solved:

I ran ‘mdadm -E /dev/X’ for each RAID member partition.

This showed one partition marked as failed.
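
To compare the members side by side, the examine step can be wrapped in a small loop. This is a rough sketch under assumptions: examine_members and the MDADM override variable are names invented here, and the grep pattern assumes the superblock dump has lines containing "State", "Events" and "Update Time", which may differ between metadata versions.

```shell
# Run "mdadm -E" on each given partition and keep only the lines that
# usually reveal a stale or failed member.
# MDADM is a hypothetical override so the loop can be tried with a stub.
examine_members() {
    for dev in "$@"; do
        echo "=== $dev ==="
        "${MDADM:-mdadm}" -E "$dev" | grep -E 'State|Events|Update Time'
    done
}

# Usage (as root):
#   examine_members /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5
```

A member whose event count or update time lags behind the others is a likely candidate for the one that dropped out.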

I think (since the other partitions on this disk are OK) there was some data corruption that was causing the whole array to fail.

Then I did
mdadm -A /dev/md0 -f -U summaries /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5
which seemed to get it going again, apparently ok.

After backing up the RAID mount, I then rebuilt it by simply adding the dodgy partition back in…

mdadm /dev/md0 --add /dev/sdb5

Doing ‘cat /proc/mdstat’ showed progress and ETA nicely.
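
If you only want the rebuild figures rather than the whole file, something like the following could pull them out. It is a sketch only: rebuild_progress is an invented name, and it assumes the progress line has the common "recovery = NN.N% (...) finish=...min" shape.

```shell
# Extract rebuild progress and ETA from mdstat-format text on stdin.
# Prints nothing when no recovery or resync percentage is shown.
# (rebuild_progress is a hypothetical helper name.)
rebuild_progress() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+%).*finish=([^ ]+).*/\1 \2, finish in \3/p'
}

# Usage on a live system:
#   rebuild_progress < /proc/mdstat
```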