Rebuild RAID

I have a Suse 12.1 system with a RAID 5 main disk.

The array state according to mdadm is: clean, resyncing (PENDING)

On booting, the disk is mounted read-only.

When trying to rebuild with mdadm --run /dev/md126, the result I get is:

mdadm: failed to run array /dev/md126: Read-only file system
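
For reference, the overall array state can also be checked at any time with:

cat /proc/mdstat
mdadm --detail /dev/md126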

Searching the internet I can find nothing that tells me how to sort this.

I have installed another hard disk with another copy of Suse, so I can read the drive, and I have backed it up. As a last resort I can delete the volume and reinstall, but that would defeat the point of having RAID 5.

Any ideas on how I can get this PENDING resync to actually happen?

Ian

Hello igodman,

We like to see the complete and unabridged computer text that goes with problem descriptions. Thus we do not like stories like

The array state according to mdadm is: clean, resyncing (PENDING)

but we like to see the statement you gave and the output that resulted copied/pasted from your terminal session directly into a post here. And please put those between CODE tags as explained here: http://forums.opensuse.org/english/information-new-users/advanced-how-faq-read-only/451526-posting-code-tags-guide.html

To give you an example:
I do not say: “I have an executable called mmcheck in the bin of my home directory”
but I post:

henk@boven:~> ls -l bin
totaal 24
-rwxr--r-- 1 henk wij   434 23 jul 18:43 fototime
-rwxr-xr-x 1 henk wij 19126 23 jan  2011 mmcheck
henk@boven:~>

On 2012-07-28 11:36, hcvv wrote:
>
> Hello igodman,
>
> We like to see the complete and unabridged computer text that goes with
> problem descriptions. Thus we do not like stories like
>> The array state according to mdadm is: clean, resyncing
>> (PENDING)
> but we like to see the statement you gave and the output that resulted
> copied/pasted from your terminal session directly into a post here. And
> please put those between CODE tags as explained here:
> http://tinyurl.com/2wwx7l9

I have a RAID 5 array for testing. Let’s see what happens.


Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sat Jul 28 13:26:27 2012
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

Name : Telcontar:0  (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 42

Number   Major   Minor   RaidDevice State
0       8       11        0      active sync   /dev/sda11
1       8       26        1      active sync   /dev/sdb10
3       8       45        2      active sync   /dev/sdc13
Telcontar:~ #

Now I remove a disk:


Telcontar:~ # mdadm --manage --set-faulty /dev/md0 /dev/sda11
mdadm: set /dev/sda11 faulty in /dev/md0
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sat Jul 28 13:41:25 2012
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

Name : Telcontar:0  (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 43

Number   Major   Minor   RaidDevice State
0       0        0        0      removed
1       8       26        1      active sync   /dev/sdb10
3       8       45        2      active sync   /dev/sdc13

0       8       11        -      faulty spare   /dev/sda11
Telcontar:~ #

And I try to add it again:


Telcontar:~ # mdadm /dev/md0 -a /dev/sda11
mdadm: Cannot open /dev/sda11: Device or resource busy


Busy? Doing what?

That’s the procedure I have in my notes; it worked once. Maybe it cannot be re-added because it is listed as a faulty spare :-?
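
One way to check that guess is /proc/mdstat; a failed member normally keeps showing up there with an (F) marker until it is actually removed:

cat /proc/mdstat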

So there must be something else I have to do, but I do not know what. Let’s have a look at the manual:


-a, --add
hot-add  listed  devices.   If a device appears to have recently
been part of the array (possibly it failed or was  removed)  the
device is re-added as describe in the next point.  If that fails
or the device was never part of the array, the device  is  added
as  a  hot-spare.  If the array is degraded, it will immediately
start to rebuild data onto that spare.

Note that this and the following options are only meaningful  on
array with redundancy.  They don't apply to RAID0 or Linear.

--re-add
re-add a device that was previous removed from an array.  If the
metadata on the device reports that it is a member of the array,
and  the slot that it used is still vacant, then the device will
be added back to the array in the same position.  This will nor-
mally  cause  the data for that device to be recovered.  However
based on the event count on the device, the  recovery  may  only
require  sections  that  are flagged a write-intent bitmap to be
recovered or may not require any recovery at all.

When used on an array that has no metadata (i.e.  it  was  built
with  --build)  it will be assumed that bitmap-based recovery is
enough to make the device fully consistent with the array.

If the device name given is missing then mdadm will try to  find
any  device  that  looks like it should be part of the array but
isn't and will try to re-add all such devices.

-r, --remove
remove listed devices.  They must  not  be  active.   i.e.  they
should  be  failed  or  spare devices.  As well as the name of a
device file (e.g.  /dev/sda1) the words failed and detached  can
be  given to --remove.  The first causes all failed device to be
removed.  The second causes any device which is no  longer  con-
nected  to  the  system  (i.e  an  'open'  returns  ENXIO) to be
removed.  This will only succeed for devices that are spares  or
have already been marked as failed.

--re-add does not work; same problem. But if I remove the device first and then re-add it, it works.


Telcontar:~ # mdadm /dev/md0 --remove /dev/sda11
mdadm: hot removed /dev/sda11 from /dev/md0
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sat Jul 28 13:58:59 2012
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

Name : Telcontar:0  (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 52

Number   Major   Minor   RaidDevice State
0       0        0        0      removed
1       8       26        1      active sync   /dev/sdb10
3       8       45        2      active sync   /dev/sdc13
Telcontar:~ # mdadm /dev/md0 --re-add /dev/sda11
mdadm: re-added /dev/sda11
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sat Jul 28 13:59:12 2012
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

Name : Telcontar:0  (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 58

Number   Major   Minor   RaidDevice State
0       8       11        0      active sync   /dev/sda11
1       8       26        1      active sync   /dev/sdb10
3       8       45        2      active sync   /dev/sdc13


I think there is a better way to re-add a faulty spare, but I don’t know it.
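
For the record, the sequence that did work, condensed (same commands as in the listings above):

mdadm /dev/md0 --remove /dev/sda11    # take the faulty spare out of the array first
mdadm /dev/md0 --re-add /dev/sda11    # then re-add it; the internal bitmap should keep the resync short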

So, igodman, the first thing you have to do is to show the status just as I did above. Not
descriptions, proofs :slight_smile:


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Here is the complete output:


mdadm --detail /dev/md126
/dev/md126:
      Container : /dev/md127, member 0
     Raid Level : raid5
     Array Size : 976768000 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488384128 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3

          State : clean, resyncing (PENDING) 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 64K

           UUID : 95c5ab6e:f552981b:604f027e:440a6614
    Number   Major   Minor   RaidDevice State
       2       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       0       8       32        2      active sync   /dev/sdc

As you can see, no errors are reported, just resyncing (PENDING); hence the question: how do I change from resyncing (PENDING) to actually doing a resync?
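
For reference, whether the array itself (rather than the filesystem on it) is read-only can be seen directly in sysfs, for example:

cat /proc/mdstat
cat /sys/block/md126/md/array_state    # "readonly" or "read-auto" here would explain a resync stuck at PENDING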

Further examining a drive gives:


mdadm --examine /dev/sda
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.2.02
    Orig Family : 1a34ee2f
         Family : 1a34ee2f
     Generation : 000a7b39
     Attributes : All supported
           UUID : 89fb7f48:ccfe7f7d:76d5b428:bb7d5fc3
       Checksum : 25fbeff9 correct
    MPB Sectors : 2
          Disks : 3
   RAID Devices : 1

  Disk00 Serial : 490831IS651689
          State : active
             Id : 00000000
    Usable Size : 976768264 (465.76 GiB 500.11 GB)

[main]:
           UUID : 95c5ab6e:f552981b:604f027e:440a6614
     RAID Level : 5 <-- 5
        Members : 3 <-- 3
          Slots : [UUU] <-- [UUU]
    Failed disk : none
      This Slot : 0
     Array Size : 1953536000 (931.52 GiB 1000.21 GB)
   Per Dev Size : 976768264 (465.76 GiB 500.11 GB)
  Sector Offset : 0
    Num Stripes : 7631000
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 0 (384)
    Dirty State : dirty

  Disk01 Serial : S20BJDWS945091
          State : active
             Id : 00010000
    Usable Size : 976768264 (465.76 GiB 500.11 GB)

  Disk02 Serial : S1VZJ90S722325
          State : active
             Id : 00020000
    Usable Size : 976768264 (465.76 GiB 500.11 GB)

I get the same result on all 3 drives.

I can't see any problem other than that resyncing (PENDING), and it has been like that for 3 days!

Having read the post by hcvv showing removing a disk, replacing it, and then assembling the RAID, I tried this myself:


mdadm --manage --set-faulty /dev/md126 /dev/sda1
mdadm: set device faulty failed for /dev/sda1:  Read-only file system

So far I have not been able to do anything with this RAID. I have had to install a new copy of the OS on a separate drive, as the original Suse Linux install gives lots of errors because the root file system is read-only!

The new copy of the OS also mounts the RAID array filesystem read-only; the fstab entry is:


/dev/disk/by-id/md-uuid-95c5ab6e:f552981b:604f027e:440a6614-part1 /raid_disk           ext4       defaults              1 2

On 2012-07-28 17:16, igodman wrote:
> State : clean, resyncing (PENDING)

Have a look here. Apparently it
starts working when you try writing to it (fsck).
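
I.e. force a filesystem check on the (unmounted) RAID partition, something along these lines (partition node assumed, check yours first):

umount /raid_disk
fsck.ext4 -f /dev/md126p1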


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-07-28 17:46, igodman wrote:
>
> Having read the post hcvv showing removing a disk and replacing it and
> then assembling the raid I tried this myself:

That was me, not hcvv :slight_smile:

> Code:
> --------------------
>
> mdadm --manage --set-faulty /dev/md126 /dev/sda1
> mdadm: set device faulty failed for /dev/sda1: Read-only file system
>
> --------------------

The man page says:


-o, --readonly
mark array as readonly.


-w, --readwrite
mark array as readwrite.
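
For the array here that would presumably be:

mdadm --readwrite /dev/md126
cat /proc/mdstat    # the resync should switch from PENDING to actually running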


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Sincere apologies for the incorrect attribution :shame:

However, your answer below fixed the problem.

Having run the command


mdadm -w /dev/md126

The state is now: clean, reshaping :slight_smile:
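
Progress can be watched with, for example:

watch cat /proc/mdstat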

On 2012-07-29 10:26, igodman wrote:
>
> Sincere apologies for the incorrect attribution :shame:

No problem.

>
> However your answer below fixed the problem.
>
> Having run the command
>
> Code:
> --------------------
>
> mdadm -w /dev/md126
>
> --------------------
>
>
> The state is now: clean, reshaping :slight_smile:

Good! :slight_smile:


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)