Request help trouble-shooting boot fail on mature system

Last thing I did before boot fail was Ctrl-Alt-Del out of boot process because I missed my first chance to re-direct the motherboard to boot from USB. So I did that, then next time I tried to boot the main SSD I ended with “You are in emergency mode…” Fortunately, the USB stick in question is a backup system, so I’m not in too bad a situation, plus I have a separate home partition that’ll preserve everything important if I have to re-install. However, the btrfs root partition doesn’t seem corrupted, as I can explore it when I boot from USB. What I don’t know is what files to check for corruption.

I’ve used Linux for years but never had occasion to check logs; “journalctl -xb” from the emergency mode prompt yields nearly 3k lines that mostly look fine, ending with “The unit plymouth-start.service has successfully entered the dead state.”

About 30 lines prior, it says “Failed to get new runlevel, utmp update skipped.”

If anyone has experience fixing a problem of this nature, is it easier to just reinstall? And if not, where do I find the file that journalctl reads, and what should I ^F search for to explain where the hiccup is?

I normally tackle issues like this by searching for answers already given rather than asking for new ones, but I do that with an error message of some kind, and in this case I don’t have any unique terms to search for, so thanks for any pointers you can offer.

GEF

That should not cause any problems.

So I did that, then next time I tried to boot the main SSD I ended with “You are in emergency mode…”

The most common cause of this problem is that a file system failed to mount.

I suppose a disk failure (SSD failure) is a possibility. I suggest you check the output of “df” to see what has been successfully mounted. Or maybe the output of “mount” which will tell you whether they are mounted read-only or read-write.
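A minimal sketch of that check, in case it helps: it reads the kernel's mount table and prints only the mount points whose options include "ro". The function name list_ro_mounts is made up for illustration, and the default input /proc/mounts is an assumption about a typical Linux system.

```shell
# Sketch: print mount points that are currently mounted read-only.
# Input is a /proc/mounts-style table (device, mount point, fs type, options, ...)
# taken from the file named in $1, defaulting to /proc/mounts.
list_ro_mounts() {
    awk '{
        # Options are comma-separated; match "ro" exactly so that
        # something like "errors=remount-ro" is not counted.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2 " (" $3 ")"; break }
    }' "${1:-/proc/mounts}"
}
```

Calling list_ro_mounts with no argument from the emergency shell lists the read-only mounts; a file system that is missing from both this list and the “df” output never got mounted at all.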

However, the btrfs root partition doesn’t seem corrupted, as I can explore it when I boot from USB. What I don’t know is what files to check for corruption.

Are you able to mount the various subvolumes when booting from USB?

I’ve used Linux for years but never had occasion to check logs; “journalctl -xb” from the emergency mode prompt yields nearly 3k lines that mostly look fine, ending with “The unit plymouth-start.service has successfully entered the dead state.”

That plymouth line is unlikely to be a problem.

About 30 lines prior, it says “Failed to get new runlevel, utmp update skipped.”

That might be significant. Maybe check a few lines before that, if you can.
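One possible way to pull out those preceding lines without paging through the whole log is sketched below. The helper name journal_context is hypothetical; it only wraps grep's -B option and reads the journal text from standard input, so it works on live output or on a saved copy.

```shell
# Sketch: show the lines leading up to each journal line that matches a pattern.
# $1: pattern to look for; $2: number of preceding lines to show (default 10).
journal_context() {
    pattern=$1
    before=${2:-10}
    grep -B "$before" -- "$pattern"
}

# Examples (run as root; nothing below is executed by this snippet):
#   journalctl -b --no-pager | journal_context 'Failed to get new runlevel' 15
#   journalctl -b -p err --no-pager     # only messages of priority "err" or worse
# Persistent journals normally live under /var/log/journal. From a rescue
# system they can be read with -D, assuming the SSD's /var is mounted under
# the hypothetical path /mnt:
#   journalctl -D /mnt/var/log/journal --no-pager | journal_context 'Failed'
```

Filtering by priority first is often the quicker route, since a failed fsck or mount is usually logged at "err" level or worse.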

Thanks for the reply, nrickert. Sorry, I haven’t had the kind of day that allowed much time for troubleshooting, but I can say that, per the mount command, the only thing mounted read-only is tmpfs on /sys/fs/cgroup.

However, when I escape out of the Tumbleweed boot splash, I see the usual output full of green OKs going by too fast to read, until it gets to /dev/mapper/xxx, where xxx stands for text that changes too fast for me to read. This is not scrolling, just one line that keeps changing right before I hit emergency mode.

Looking through journalctl -xb some more, I see “Dependency failed for Local File Systems”, “Failed to start File System Check on /dev/system/home” and “systemd-fsck@dev-system-home.service: Failed with result exit-code.” If I do have a system directory that’s not mounting, does that mean it’s time for a new drive?

-GEF

OK, back in business. It looks like the corruption was on my home partition (ext4), not the root partition (btrfs). I ran fsck on /home and agreed to all its suggestions, so now I can boot. The damaged bits looked like Plasma config files such as kwinrc, so I may have lost some desktop settings. Is an occasional glitch like this nothing to worry about, or a sign that the cheapest SSD I could find was a little too cheap?

To decide this one, I recommend having smartd run in the background (maybe you have it installed and active already), or running »smartctl -a /dev/yourSSDdevice« as root.

I found it a good idea to redirect each smartctl output into a separate file (I do it monthly with my backups) and keep those logs around. It’s informative to compare changes over time:

~/work/smartctl # grep -i writ sda_log/* sda20*
sda_log/smart_a0:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       39186680
sda_log/smart_a1:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       200921496
sda_log/smart_a2:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       312409656
sda_log/smart_a3:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       499824296
. . .
sda_log/smart_aC:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4534603626
sda_log/smart_aD:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4662358298
sda_log/smart_aE:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       4765289426
sda_log/smart_aF:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       5363684346
sda20190804.text:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       5601105098
sda20191205.text:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6152969226
sda20200101.text:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6270854474
sda20200207.text:241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6421453330

~/work/smartctl # grep -i temp sda_log/* sda20*
sda_log/smart_a0:190 Airflow_Temperature_Cel 0x0032   071   066   000    Old_age   Always       -       29
sda_log/smart_a1:190 Airflow_Temperature_Cel 0x0032   065   055   000    Old_age   Always       -       35
sda_log/smart_a2:190 Airflow_Temperature_Cel 0x0032   066   053   000    Old_age   Always       -       34
. . .
sda_log/smart_aF:190 Airflow_Temperature_Cel 0x0032   066   047   000    Old_age   Always       -       34
sda20190804.text:190 Airflow_Temperature_Cel 0x0032   065   047   000    Old_age   Always       -       35
sda20191205.text:190 Airflow_Temperature_Cel 0x0032   077   047   000    Old_age   Always       -       23
sda20200101.text:190 Airflow_Temperature_Cel 0x0032   078   047   000    Old_age   Always       -       22
sda20200207.text:190 Airflow_Temperature_Cel 0x0032   064   047   000    Old_age   Always       -       36

~/work/smartctl # grep -i error.count sda_log/* sda20*
sda_log/smart_a0:199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
. . .
sda_log/smart_aF:199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
sda20190804.text:199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
sda20191205.text:199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
sda20200101.text:199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
sda20200207.text:199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0

~/work/smartctl # _

No errors and just mild temperatures in four years and counting. Pheww… :-)
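For anyone who wants to set up the same routine, here is a rough sketch of how the snapshots could be taken and compared. The function names are made up for illustration, the file naming simply copies the sdaYYYYMMDD.text pattern from the listing above, and save_smart_log assumes smartmontools is installed and that it is run as root.

```shell
# Sketch: build a dated log file name such as sda20200207.text.
# $1: device path (e.g. /dev/sda); $2: optional YYYYMMDD stamp, default today.
smart_log_name() {
    dev_name=$(basename "$1")
    stamp=${2:-$(date +%Y%m%d)}
    printf '%s%s.text\n' "$dev_name" "$stamp"
}

# Sketch: write one full smartctl report into the current directory.
# Needs root and smartmontools; not called anywhere in this snippet.
save_smart_log() {
    smartctl -a "$1" > "$(smart_log_name "$1")"
}

# Sketch: print how much a raw SMART value grew between consecutive snapshots.
# Expects grep output on stdin ("file:ID NAME ... RAW_VALUE"), oldest first.
smart_raw_deltas() {
    awk '{
        split($1, head, ":")     # head[1] is the snapshot file name
        raw = $NF                 # the raw value is the last column
        if (NR > 1) printf "%s +%.0f\n", head[1], raw - prev
        prev = raw
    }'
}
```

With the logs above, something like `grep -i writ sda20* | smart_raw_deltas` would show the growth in Total_LBAs_Written from one snapshot to the next. What one raw unit means is vendor-specific, so treat the numbers as a trend rather than an exact byte count.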

Hi
What mount options are used: any tweaks, or just the defaults?

Thanks for the tip, Eleventy-One.

Hi Malcom, yes, just the defaults, per the guided setup during installation.