So, yesterday I went to lunch, and when I got back and tried to unlock my PC, the X server welcomed me with a black screen and an “X” cursor. I was forced to press Ctrl+Alt+Backspace twice to kill it and try to log in again (thankfully, everything was saved before I left). I was surprised to see the same result after logging back in… I decided to reboot, and the system ended up in the rescue console because of a corrupted /home (/dev/sda7) partition. It is an up-to-date Leap installation with vanilla settings: Btrfs for the mission-critical parts of the system and XFS for data partitions like /home. I was a bit puzzled about what was going on (and I still am).

I tried to run xfs_repair, as kindly suggested by the rescue console, but it didn’t work. xfs_repair just hangs: no output at all, no progress, nothing… and I couldn’t even Ctrl+C out of it. I had to force-reboot the machine every time I tried to run xfs_repair. For some reason it just hangs there… Thankfully, I was able to mount another XFS data partition that lives on a separate 2TB drive, and then I **dd**’d the **** out of /dev/sda7 into an image file safely located on that drive:
```shell
dd if=/dev/sda7 of=sda7.img bs=512k
```
I was surprised to find that xfs_repair happily repaired the 140GB image (with -L, because of the dirty log) without any glitch or hesitation, in about 10 seconds total. I had a few second thoughts, but in the end I shoved the image back onto the partition (with dd, of course). Reboot. Yay. Everything worked as if it had never failed. Nothing seems to be missing; everything is fine and dandy…
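For anyone who wants to repeat the workaround above, here is a minimal POSIX sh sketch of the three steps (copy to image, repair the image, write it back). It is not a tested tool and rests on assumptions: the partition is unmounted, the image path (the hypothetical /mnt/spare/sda7.img in the usage below) is on a different, mounted drive with enough free space, and the helper names run and repair_via_image are made up for illustration. By default it only prints the commands; they run for real only with DRY_RUN=0. The -f flag is added because the image is a regular file, and note that the xfs_repair man page warns that -L (zeroing the log) can cause loss of user files or data.

```shell
#!/bin/sh
# Hypothetical helpers sketching the image-repair workaround.
# ASSUMPTION: the target partition is NOT mounted, and the image path
# lives on another drive with enough free space.

# Print the command when DRY_RUN is unset or 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repair_via_image DEVICE IMAGE
repair_via_image() {
    dev="$1"
    img="$2"
    # 1. Copy the raw partition into an image file on the other drive.
    run dd if="$dev" of="$img" bs=512k || return 1
    # 2. Repair the image: -f says it is a regular file, -L zeroes the dirty log.
    run xfs_repair -f -L "$img" || return 1
    # 3. Write the repaired image back; conv=fsync flushes data to disk before dd exits.
    run dd if="$img" of="$dev" bs=512k conv=fsync
}
```

Usage: source the file and run `repair_via_image /dev/sda7 /mnt/spare/sda7.img` to review the three commands first; only after double-checking the device and image paths, `export DRY_RUN=0` and run it again. The dry-run default is deliberate, since swapping `if=` and `of=` in the last step would overwrite the image instead of the partition, or vice versa.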
I am writing this post today for two reasons: first, to let others know about a safe way to work around this problem, and second, because it happened again this morning, this time on my home laptop (also an up-to-date Leap installation). Unfortunately, xfs_repair behaves the same way there, and, even worse, I have no spare space to dump the partition to a file and fix it. Sooo yeah, now I really need to find out why xfs_repair hangs and how to make it work on the physical drive directly. I tried to lie to it with the -f option, telling it the device is a regular file, and that at least returned some output: an error stating “Device or resource busy”. Hm… if this condition isn’t handled well without the -f option, it could explain the hanging. However, I have no idea what could be keeping the device “busy” in a way that blocks xfs_repair from doing its trick but doesn’t bother dd at all.
More info:
My office machine is pretty much a brand-new monster, and the problematic partition is on a 250GB Samsung 850 EVO SSD. By contrast, the poor laptop at home is an 8-year-old Dell Studio 1535 with a 320GB WD mechanical drive. Same OS, updated on a daily basis.
Any ideas?