On 2012-07-28 11:36, hcvv wrote:
>
> Hello igodman,
>
> We like to see the complete and unabridged computer text that goes with
> problem descriptions. Thus we do not like stories like
>> The array state according to mdadm the sate is: clean, resyncing
>> (PENDING)
> but we like to see the statement you gave and the output that resulted
> copied/pasted from your terminal session directly into a post here. And
> please put those between CODE tags as explained here:
> http://tinyurl.com/2wwx7l9
I have a RAID 5 array for testing. Let’s see what happens.
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jul 28 13:26:27 2012
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : Telcontar:0 (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 42
Number Major Minor RaidDevice State
0 8 11 0 active sync /dev/sda11
1 8 26 1 active sync /dev/sdb10
3 8 45 2 active sync /dev/sdc13
Telcontar:~ #
Now I mark a disk as faulty:
Telcontar:~ # mdadm --manage --set-faulty /dev/md0 /dev/sda11
mdadm: set /dev/sda11 faulty in /dev/md0
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jul 28 13:41:25 2012
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : Telcontar:0 (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 43
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 26 1 active sync /dev/sdb10
3 8 45 2 active sync /dev/sdc13
0 8 11 - faulty spare /dev/sda11
Telcontar:~ #
And I try to add it again:
Telcontar:~ # mdadm /dev/md0 -a /dev/sda11
mdadm: Cannot open /dev/sda11: Device or resource busy
Busy? Doing what?
That’s the procedure I have in my notes, and it worked once. Maybe it cannot be re-added because
it is listed as faulty spare :-?
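To check how a member is currently listed without reading the whole table by eye, something like the following could be used. It is only a sketch: `member_state` is a made-up helper name, and it assumes the device table has the same columns as in the `mdadm --detail` output above (Number, Major, Minor, RaidDevice, then the state words, then the device name).

```shell
# Hypothetical helper, not part of mdadm: print the state words for one
# member device, reading `mdadm --detail` output on stdin.
member_state() {
    # $1 = device path, e.g. /dev/sda11
    awk -v dev="$1" '$NF == dev {
        # Fields 1-4 are Number, Major, Minor, RaidDevice; everything
        # between them and the device name is the state.
        out = ""
        for (i = 5; i < NF; i++) out = out (out ? " " : "") $i
        print out
    }'
}

# Usage (needs root for mdadm itself):
#   mdadm --detail /dev/md0 | member_state /dev/sda11
```

If the device is not in the table at all, the helper prints nothing, which matches the "removed" slot case.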
So there must be something else I have to do, but I do not know what. Let’s have a look at the
manual:
-a, --add
hot-add listed devices. If a device appears to have recently
been part of the array (possibly it failed or was removed) the
device is re-added as describe in the next point. If that fails
or the device was never part of the array, the device is added
as a hot-spare. If the array is degraded, it will immediately
start to rebuild data onto that spare.
Note that this and the following options are only meaningful on
array with redundancy. They don't apply to RAID0 or Linear.
--re-add
re-add a device that was previous removed from an array. If the
metadata on the device reports that it is a member of the array,
and the slot that it used is still vacant, then the device will
be added back to the array in the same position. This will nor-
mally cause the data for that device to be recovered. However
based on the event count on the device, the recovery may only
require sections that are flagged a write-intent bitmap to be
recovered or may not require any recovery at all.
When used on an array that has no metadata (i.e. it was built
with --build) it will be assumed that bitmap-based recovery is
enough to make the device fully consistent with the array.
If the device name given is missing then mdadm will try to find
any device that looks like it should be part of the array but
isn't and will try to re-add all such devices.
-r, --remove
remove listed devices. They must not be active. i.e. they
should be failed or spare devices. As well as the name of a
device file (e.g. /dev/sda1) the words failed and detached can
be given to --remove. The first causes all failed device to be
removed. The second causes any device which is no longer con-
nected to the system (i.e an 'open' returns ENXIO) to be
removed. This will only succeed for devices that are spares or
have already been marked as failed.
--re-add does not work either, same problem. But if I remove the device first and then re-add it, it works:
Telcontar:~ # mdadm /dev/md0 --remove /dev/sda11
mdadm: hot removed /dev/sda11 from /dev/md0
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jul 28 13:58:59 2012
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : Telcontar:0 (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 52
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 26 1 active sync /dev/sdb10
3 8 45 2 active sync /dev/sdc13
Telcontar:~ # mdadm /dev/md0 --re-add /dev/sda11
mdadm: re-added /dev/sda11
Telcontar:~ # mdadm --detail /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Sun Apr 29 14:08:34 2012
Raid Level : raid5
Array Size : 25173504 (24.01 GiB 25.78 GB)
Used Dev Size : 12586752 (12.00 GiB 12.89 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Jul 28 13:59:12 2012
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : Telcontar:0 (local to host Telcontar)
UUID : 825b22e8:af550e83:93727666:fb8987fd
Events : 58
Number Major Minor RaidDevice State
0 8 11 0 active sync /dev/sda11
1 8 26 1 active sync /dev/sdb10
3 8 45 2 active sync /dev/sdc13
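The two steps that worked above can be wrapped in a small function. This is just a sketch of what I typed, not an official procedure: `readd_member`, `run_cmd` and the `DRY_RUN` variable are names I made up, and it assumes the member is already marked faulty, as in my test.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: remove a faulty member and re-add it, stopping at
# the first command that fails.
readd_member() {
    array=$1    # e.g. /dev/md0
    member=$2   # e.g. /dev/sda11
    run_cmd mdadm "$array" --remove "$member" &&
    run_cmd mdadm "$array" --re-add "$member"
}

# Usage, printing the commands first and then running them as root:
#   DRY_RUN=1; readd_member /dev/md0 /dev/sda11
#   DRY_RUN=0; readd_member /dev/md0 /dev/sda11
```

Printing the commands first is deliberate: on an array with real data I would want to read them before anything touches the disks.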
I think there is a better way to re-add a faulty spare, but I don’t know it.
So, igodman, the first thing you have to do is to show the status just as I did above. Not
descriptions, proof.
--
Cheers / Saludos,
Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)