Hi,
I have just spent hours on Google trying to find a solution to this problem, but nothing helped.
I am using OpenSuSE 10.2. I set the system up with two disks in a software RAID 1, each with three partitions: swap (2 GB), / (20 GB) and user data (the rest).
/root>cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [linear]
md2 : active raid1 sda3[0] sdb3[1]
367631360 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
20972736 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
2104384 blocks [2/2] [UU]
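To see at a glance which partitions belong to which array, the mdstat output above can be condensed. This is only a minimal sketch: the function name list_md_members is made up for illustration, and it assumes member devices always appear as name[slot] on the "mdN :" line, as they do in the listing above.

```shell
# list_md_members (hypothetical helper): read mdstat-formatted text on stdin
# and print one line per array with its member partitions.
# Assumes members are written as "name[slot]", e.g. "sda3[0]" or "sdb1[1](F)".
list_md_members() {
    awk '/^md[0-9]+ :/ {
        members = ""
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                dev = $i
                sub(/\[.*/, "", dev)   # drop the "[0]" slot and any "(F)" flag
                members = (members == "" ? dev : members " " dev)
            }
        }
        print $1 ": " members
    }'
}

# Typical use on a live system:
#   list_md_members < /proc/mdstat
```

For the listing above this would print md2 with sda3 and sdb3, md1 with sda2 and sdb2, and md0 with sda1 and sdb1.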
Later, I decided for performance reasons to remove the md0 swap array and turn its two member partitions into regular swap space. However, I cannot get rid of the array: I always get a "device or resource busy" error.
What I did first was to fail and remove the sdb1 partition from the md0 device. This worked fine:
/root>mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
(... other success message).
Then I turned off the swap on /dev/md0 using "swapoff /dev/md0".
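One way to confirm that swapoff really released the array is to look for the device in the kernel's swap table. The following is a sketch only: swap_is_active is a hypothetical helper name, and the optional second argument exists purely so the check can be run against a saved copy of /proc/swaps.

```shell
# swap_is_active (hypothetical helper): succeed if DEVICE is listed as active
# swap space. The table defaults to /proc/swaps; the first line is a header.
swap_is_active() {
    dev=$1
    table=${2:-/proc/swaps}
    awk -v dev="$dev" 'NR > 1 && $1 == dev { found = 1 } END { exit !found }' "$table"
}

# Typical use:
#   if swap_is_active /dev/md0; then echo "md0 is still used as swap"; fi
```

If this still reports /dev/md0 as active, the "busy" error would be explained by the swap; if not, something else must be holding the device.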
This is where the trouble started. Removing the second device from the RAID did not work:
/root>mdadm /dev/md0 -f /dev/sda1 -r /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md0
mdadm: hot remove failed for /dev/sda1: Device or resource busy
Neither did stopping the RAID:
/root>mdadm -S /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
Rebooting did not change anything. Even after a reboot, the device is still busy.
After trying everything I could think of to get rid of the RAID, I resorted to force and used fdisk to delete the /dev/sda1 partition. This is where it got really weird: after the reboot, the /dev/sdb1 partition (the one I had previously failed and removed from the RAID) was back in the array:
md0 : active(auto-read-only) raid1 sdb1[1]
2104384 blocks [2/1] [_U]
Of course, the device was still busy and could not be removed.
I then deleted /dev/sdb1 using fdisk as well. Guess what: now the system no longer boots:
md: md0 stopped.
mdadm: no devices found for /dev/md0
Invoking userspace resume from /dev/md0
Invoking in-kernel resume from /dev/md0
Attempting manual resume
I/O error reading swsusp image.
This is insane. Is there any way to delete a RAID short of reinstalling the whole system from scratch? Who could help with this problem? Is there a mailing list for mdadm or something like that?
BTW, what the heck is swsusp? Does this mean that a reboot in OpenSuSE 10.2 is no longer a reboot, but a suspend to disk? If so, it is no wonder that rebooting does not fix anything. How do I get around this?
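The "resume from /dev/md0" lines in the boot log suggest that the boot process was told to look for a suspend image on the swap array. On many distributions that setting comes from a resume= parameter on the kernel command line; whether that is the case on this particular system is an assumption. A minimal sketch for checking it (resume_device is a hypothetical helper name):

```shell
# resume_device (hypothetical helper): print the value of the resume= parameter
# found in a kernel command line string, or nothing if there is none.
resume_device() {
    # Intentionally unquoted so the command line is split into words.
    for word in $1; do
        case $word in
            resume=*) printf '%s\n' "${word#resume=}" ;;
        esac
    done
}

# Typical use on a running system:
#   resume_device "$(cat /proc/cmdline)"
```

If this prints /dev/md0, the boot loader configuration would still be pointing at the deleted array and would need to name the new swap partition instead.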
Regards,
Arno


