The great Tumbleweed system rescue attempt
Introduction
Loaded my old DVD install disc (“Tumbleweed snapshot 20170414”) into my removable optical drive and re-booted
Interrupted the boot with F12 to access the drive selection screen. Selected my optical drive and hit “Enter” to boot the rescue disc.
Arrived at the install disc grub-screen (whatever). 4 options: Install, Upgrade, (Something else), and “More”. Using the arrow-keys and “Enter”, chose “More”, then, from the new screen, “System Rescue” - the top option.
The system rescue boot started. Chose “English UK” from the language list offered. Landed in a console (no graphical user-interface in the Rescue environment). Logged-in as “root” at the rescue-login prompt - no password required, just hit “Enter”. The system put up a message: “Have a lot of fun…” and the prompt changed to a root prompt “#”
Setting-up the chroot
Got a list of available partitions:
# lsblk
nvme0n1 931.5G
nvme0n1p1 542M
nvme0n1p2 204.8G
nvme0n1p3 12.3G
nvme0n1p4 122.9G
nvme0n1p5 204.8G
nvme0n1p6 200.0G
nvme0n1p7 186.3G
The TW system partition is nvme0n1p6
Mounted it:
# mount /dev/nvme0n1p6 /mnt
Checked that I had mounted the right partition (ahem, just to make sure)
# ls /mnt
“root”, “boot”, and “home” were all listed (among many other folders)
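A slightly firmer check than eyeballing the folder list is to read the distribution name the mounted system records about itself. A minimal sketch, assuming the installed system keeps the usual etc/os-release file; root_fs_name is a made-up helper name, not something the rescue disc provides:

```shell
#!/bin/sh
# root_fs_name DIR: print the PRETTY_NAME recorded under DIR/etc/os-release.
# Fails (returns 1) if DIR has no readable etc/os-release.
# Hypothetical helper, not part of any rescue toolset.
root_fs_name() {
    release_file="$1/etc/os-release"
    [ -r "$release_file" ] || return 1
    # Read the value as text rather than sourcing a file from a damaged system.
    sed -n 's/^PRETTY_NAME=//p' "$release_file" | tr -d '"'
}

# Usage once the partition is mounted on /mnt:
#   root_fs_name /mnt || echo "this does not look like a root filesystem"
```

If the function fails, the wrong partition (or a non-root subvolume) is probably mounted.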
Bind-mounted various necessary folders:
# for i in proc sys dev run; do mount --rbind /$i /mnt/$i ; done
I also mounted my EFI partition on /mnt/boot/efi:
# mount /dev/nvme0n1p1 /mnt/boot/efi
Probably not necessary, but it’s what I do when restoring grub and it couldn’t do any harm (er, could it?)
I then attempted to chroot into the TW system partition
# chroot /mnt
And that’s when everything fell apart. The command returned “Segmentation fault (core dumped)”
From Wikipedia:
In computing, a segmentation fault (often shortened to segfault) or access violation is a failure condition raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory (a memory access violation). On standard x86 computers, this is a form of general protection fault.
Wikipedia adds:
The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own,[1] but otherwise the OS default signal handler is used, generally causing abnormal termination of the process (a program crash), and sometimes a core dump.
In a nutshell, the kernel raises a segfault when a process tries to access memory it isn’t permitted to touch, and the offending process is normally killed. So, not only could I not log in to my system (even as root), I couldn’t chroot into it either.
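For what it’s worth, the “Segmentation fault” line is printed by the rescue shell when its child process dies from SIGSEGV. With no command given, chroot starts a shell ($SHELL, falling back to /bin/sh) from inside the new root, so the crash may well come from the installed system’s shell or a library it loads rather than from chroot itself. That is an assumption, not something I confirmed. A small sketch for decoding the exit status, using the common convention that a status above 128 means “killed by signal (status minus 128)”; describe_status is a hypothetical helper name:

```shell
#!/bin/sh
# describe_status N: explain a shell exit status.
# Values above 128 conventionally mean "terminated by signal N - 128".
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l translates a signal number into its name.
        printf 'killed by signal %d (%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'exited with status %d\n' "$status"
    fi
}

# Usage in the rescue shell (needs root, so left commented out here):
#   chroot /mnt /bin/sh -c 'exit 0'; describe_status $?

# 139 is the status a shell reports for a SIGSEGV (128 + 11).
describe_status 139
```

Running chroot with an explicit command, as in the commented usage line, at least separates “the default shell crashes” from “nothing in there runs at all”.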
Came out of the chroot
# exit
Dismantling the chroot
Tried dismantling what I’d built, but even that was difficult
# umount /mnt/proc
returned:
umount: /mnt/proc is busy
(In some cases useful information about the processes using the device is found by lsof(8) or fuser(1))
Ditto for sys, dev, and run
[Query: does this behaviour indicate that the chroot had some effect?]
The rescue system didn’t have access to lsof, but “fuser -vm /mnt/proc” returned things like:
USER PID ACCESS COMMAND
root kernel mount /proc
root 1 f.... systemd
root 1792 f.... systemd-journal
And so on
Tried killing these processes:
# kill -9 1792
but other processes took their place
In the end, I resorted to:
# umount -f /mnt/proc
which didn’t work, and then
# umount -l /mnt/proc
which did
Concluded with:
# umount /mnt/boot/efi
and:
# umount /mnt
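A possible explanation for the “busy” errors, offered as an assumption rather than a diagnosis: mount --rbind also replicates whatever is mounted below /proc, /sys, /dev and /run, so a plain umount of the top directory is refused while those nested mounts exist. Likewise, fuser -m lists every process with files open on that filesystem, including the rescue system’s own systemd, which would explain why killing them changed nothing. A sketch of a gentler teardown that only prints the commands so they can be checked first; teardown_cmds is a hypothetical helper name:

```shell
#!/bin/sh
# teardown_cmds MNT: print (not run) a recursive teardown for a chroot at MNT.
# Hypothetical helper; read the output, then pipe it to sh as root if it looks right.
teardown_cmds() {
    mnt=${1:-/mnt}
    for dir in run dev sys proc; do
        # Mark each bind mount as a slave so that unmounting it cannot
        # propagate back to the rescue system's own /proc, /sys and so on.
        printf 'mount --make-rslave %s/%s\n' "$mnt" "$dir"
    done
    # -R unmounts MNT together with everything mounted beneath it.
    printf 'umount -R %s\n' "$mnt"
}

teardown_cmds /mnt
```

To actually run it: teardown_cmds /mnt | sh. The --make-rslave step matters on systems where mounts are shared by default; without it a recursive unmount of a bind-mounted tree can reach the originals.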
I then repeated the chroot process, missing out the “mount /dev/nvme0n1p1 /mnt/boot/efi” step, but, of course, I got the same result.
Final Thoughts
Disappointed that I didn’t get beyond the chroot. I was planning to use snapper to revert the changes YaST had made to the system, starting with:
snapper list -t pre-post
to get a list of the pre/post snapshot pairs created by YaST and Zypper. See this previously-posted link (section 3.2.1 “Undoing YaST and Zypper changes”)
https://www.suse.com/support/kb/doc/?id=000018770
But, as knurpht foretold (above), it seems that my system is too damaged to permit this.
Not as bad as a drive-failure. I haven’t lost any work or data.
Still grieves me though (sniff) [Cue sound of “The Last Post” playing in the distance]. I installed openSUSE Tumbleweed on my custom Quiet PC desktop back in the Spring of 2017 (sniff), and it’s given excellent service for almost all of the last 8 years (sniff). Only last May, I transferred the system to a bigger drive using btrfs replace (what a hairy experience that was!).
Can I rebuild? Of course I can.
Full disclosure: The episode that precipitated the demise of my system began with me poking about in the authentication tab of YaST/Security and Users/User and Group Management. As soon as I opened the tab, the system started putting up warnings about “problems”, but I didn’t heed those warnings; I closed them out (this was very foolish of me). I then started messing with the authorisation settings for samba (I was trying, here, to deal, off the cuff, with a problem that had nothing to do with my initial reason for opening YaST). I can’t remember now what I did, but I hit “OK” (whatever) to implement my changes (this was even more foolish of me). Almost immediately (Elmore Leonard says you should never write this) “all Hell broke loose”. A flurry of increasingly desperate-sounding error messages appeared on my screen, concluding with one that read: “Abandoning all hope…” The rest you know.
Yes, I was stupid, but I still think this system-behaviour needs looking into: it shouldn’t be this easy to totally bork your system.
Thank you for indulging me by reading all this