Tumbleweed update - stalled boot

idee · October 19, 2020, 6:40pm

Performed a zypper dup Tumbleweed update.
My root partition needs more space.
I rebooted into a second installation (LXQt) on another partition and used the YaST partitioner to expand the root partition.
I then used blkid to update and verify the fstab in the primary partition.
Rebooting to that install, it stalls with the Cylon red eye “working” status looking for the root partition.

[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Found device SAMSUNG_HD161GJ 2.
[  OK  ] Reached target Initrd Root Device.
[  OK  ] Finished dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[ ***  ] A start job is running for /dev/disk/by-uuid/9555609… (10h 42min 52s / no limit)

The *** is the red Cylon scanning eye; it never stops, and the boot stalls there.

Looks like it is not finding the target.
I assume there might be something in Grub or Boot that needs to be updated.

Any help is appreciated.
Thanks

idee · October 20, 2020, 12:41am

Any thoughts for help? Tks

tsu2 · October 20, 2020, 5:22am

Looks to me that the filesystem is damaged and might be running a fsck.
What is the format? BTRFS?

You may need to just wait until the fsck completes, and presumably should repair any problems it encounters.
Yes… If your partition/filesystem is very big and your drive is slow, it can take a very, very long time.

Cross your fingers, or do anything else if you’re superstitious… and do something else while you’re waiting.
And hope that you won’t still have problems when the task completes.

TSU

idee · October 20, 2020, 6:09am

Thanks,
I set it up to run overnight. Hopefully it will clean itself out soon.
The partition is ext4
Kernel is 5.8.14-1-default

Is there anything I can do to clean off space on the root drive?

Below are the two fstabs on my system.

My primary, that was updated and has the problem:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>

UUID=6dae7c25-d985-44e6-b386-23c7d3a7af64 / ext4 noatime,discard 0 1
UUID=7b77b22b-a19f-4f0e-ab29-ae35449f5f39 swap swap defaults 0 0
UUID=74d45143-adf6-41b3-be64-dbf00eec672e swap swap defaults 0 0
UUID=9e4b2cbe-4227-49f6-bc01-c133785b592d /home ext4 noatime,discard 0 2
UUID=48ABE0A128C0D54D /backups ntfs-3g uid=dad,gid=users,umask=0022 0 2
UUID=420EFA6B0EFA56FF /windows-D ntfs-3g uid=dad,gid=users,umask=0022 0 2
UUID=F84ECD3B4ECCF402 /windows-C ntfs noatime 0 2
UUID=c89bdf78-0c7b-4cb2-94af-7505756efe72 /linux-back9-LXQT ext4 noatime 0 2
tmpfs /tmp tmpfs noatime,mode=1777 0 0

My backup that does boot and has not been updated recently:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>

UUID=F84ECD3B4ECCF402 /windows-C ntfs noatime 0 2
UUID=6dae7c25-d985-44e6-b386-23c7d3a7af64 /linux-front9-Cinnamon ext4 noatime 0 2
UUID=7b77b22b-a19f-4f0e-ab29-ae35449f5f39 swap swap defaults 0 0
UUID=74d45143-adf6-41b3-be64-dbf00eec672e swap swap defaults 0 0
UUID=9e4b2cbe-4227-49f6-bc01-c133785b592d /home ext4 noatime,discard 0 2
UUID=48ABE0A128C0D54D /backups ntfs-3g uid=dad,gid=users,umask=0022 0 2
UUID=420EFA6B0EFA56FF /windows-D ntfs-3g uid=dad,gid=users,umask=0022 0 2
UUID=c89bdf78-0c7b-4cb2-94af-7505756efe72 / ext4 noatime,discard 0 1
tmpfs /tmp tmpfs noatime,mode=1777 0 0
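
For anyone comparing an fstab against what blkid reports, here is a minimal sketch of that check. It is not from the original posts: the function names are made up for illustration, and the list of known UUIDs is assumed to come from something like `blkid -s UUID -o value` run as root.

```shell
#!/bin/sh
# List the UUIDs that an fstab file mounts by UUID= (comment lines are skipped
# because their first field starts with "#", not "UUID=").
fstab_uuids() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Print fstab UUIDs that do not appear in a file of known UUIDs (one per line).
# Hypothetical helper: the known list could be produced by
#   blkid -s UUID -o value > known-uuids.txt
stale_fstab_uuids() {
    fstab=$1
    known=$2
    fstab_uuids "$fstab" | grep -v -x -F -f "$known"
}
```

Something like `stale_fstab_uuids /mnt/etc/fstab known-uuids.txt` would then print any UUID the fstab references that no partition currently carries; no output means every entry matched.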

karlmistelberger · October 20, 2020, 6:51am

Boot into the other partition, mount the broken system and show used disk space:

3400G:~ # mount /dev/sdb2 /mnt/
3400G:~ # df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2        30G  5.8G   23G  21% /mnt
3400G:~ # du -hd1 -t1m /mnt/
45M     /mnt/boot
288M    /mnt/var
20M     /mnt/etc
709M    /mnt/lib
9.8M    /mnt/sbin
11M     /mnt/lib64
11M     /mnt/root
4.7G    /mnt/usr
5.8G    /mnt/
3400G:~ #

You may access the journal by running

journalctl --directory /mnt/var/log/journal/..../ -b

Go to the end of the journal and report.
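
A small sketch of the same disk-usage check, sorted so the biggest directories come first. The function name is hypothetical and the sizes are printed in KiB rather than human-readable units so that a plain numeric sort works.

```shell
#!/bin/sh
# Show the largest first-level directories under a mount point, biggest first.
# Sizes are in KiB; -x keeps du on one filesystem so other mounts are not counted.
# The first line of output is the total for the mount point itself.
largest_dirs() {
    dir=$1
    count=${2:-10}
    du -x -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Running `largest_dirs /mnt 5` against the mounted installation, and then again on whichever subdirectory tops the list, is one way to narrow down where the space went.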

mrmazda · October 20, 2020, 6:58am

If your journal is persistent, it could be gobbling a huge amount of space. If you are able to boot it, then use journalctl to manage it. If you have to boot your other installation, but it’s mountable, then you can do it manually by deleting old files from the subdirectory located in /var/log/journal/.
You can buy a little space until next dup by removing /var/log/updateTestcase*.
Check to see if purge-kernels service has been running. Two installed kernels for most people is plenty.
Inspect /tmp/ for old files, and delete if any are present.
Most people need only the latest version of libLLVM. If you have more than one installed, consider uninstalling the older.
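
As a rough illustration of two of the checks above, this sketch only lists candidates and deletes nothing. The function name and the 30-day default are assumptions for the example, not anything from the thread.

```shell
#!/bin/sh
# Dry run: list cleanup candidates under a mounted root without deleting anything.
# $1 is the mount point of the installation (for example /mnt), $2 an age in days.
cleanup_candidates() {
    root=$1
    days=${2:-30}
    # the /var/log/updateTestcase* entries mentioned above
    find "$root/var/log" -maxdepth 1 -name 'updateTestcase*' 2>/dev/null
    # regular files in /tmp not modified for more than $days days
    find "$root/tmp" -xdev -type f -mtime +"$days" 2>/dev/null
}
```

Reviewing the output of `cleanup_candidates /mnt` before removing anything by hand keeps the step reversible.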

karlmistelberger · October 20, 2020, 7:59am

Manual deletion is tedious. Use

journalctl --directory /mnt/var... --disk-usage

to display size and

journalctl --directory /mnt/var... --vacuum-size ...

for freeing disk space.

Inspect /tmp/ for old files, and delete if any are present.

Remove /tmp from /etc/fstab. This moves /tmp to RAM. You will no longer need to worry about its growth.
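
A concrete sketch of the two journalctl calls above, wrapped in a function so the directory and size are explicit. The function name and the 200M default are assumptions for illustration; the journal path still has to be filled in for the system at hand.

```shell
#!/bin/sh
# Report and then cap the size of an offline journal directory.
# $1: journal directory of the mounted system; $2: size to keep (default 200M).
vacuum_journal() {
    dir=$1
    limit=${2:-200M}
    if [ ! -d "$dir" ]; then
        echo "no journal directory at $dir (journal may not be persistent)" >&2
        return 1
    fi
    # show how much space the journal files use
    journalctl --directory "$dir" --disk-usage
    # remove the oldest archived journal files until usage drops below $limit
    journalctl --directory "$dir" --vacuum-size="$limit"
}
```

Note that vacuuming only removes archived journal files, so the active file is left alone.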

susejunky · October 20, 2020, 9:29am

When a partition gets resized the file system on that partition needs to be resized as well and that might take time.

I guess YaST will run resize2fs for you but probably in the background. So chances are that the resizing was not finished yet when you left your LXQT installation and you ended up with a (very) unclean file system and now an fsck is running.

Regards

susejunky

mrmazda · October 20, 2020, 10:28am

I can have it done faster with MC than it takes to find out how using the journalctl man page or a web search for the missing examples typical of man pages, or to learn vacuum means recover wasted space. A dozen or fewer keystrokes to get there and do it is far less tedium than man page or web searching.

karlmistelberger · October 20, 2020, 11:08am

I read about --vacuum in the man page once, in 2014, and have remembered it since. When unsure I hit tab to list the options (1 keystroke):

i3-4130:~ # journalctl --
--after-cursor    --disk-usage      --force           --list-boots      --no-tail         --rotate          --update-catalog  --verify-key
--all             --dmesg           --full            --list-catalog    --output          --setup-keys      --user            --version
--boot            --dump-catalog    --grep            --local           --output-fields   --show-cursor     --user-unit       
--case-sensitive  --field           --header          --machine         --pager-end       --since           --utc             
--catalog         --fields          --help            --merge           --priority        --sync            --vacuum-files    
--cursor          --file            --identifier      --no-full         --quiet           --system          --vacuum-size     
--cursor-file     --flush           --interval        --no-hostname     --reverse         --unit            --vacuum-time     
--directory       --follow          --lines           --no-pager        --root            --until           --verify          
i3-4130:~ # journalctl --

mrmazda · October 20, 2020, 1:03pm

As a general rule, no examples, no comprende. So, while the options listing is nice to know, by itself, a list of options generally falls short enough to be a why bother.

idee · October 20, 2020, 9:09pm

Quick update. It has now been running for 15+ hours.
I will start back into it after work and pull some of the other checks requested then.

idee · October 21, 2020, 7:15am

25 hours plus and it is still running the loop, but not booting.
I have rebooted into my backup distro and will start working on it tomorrow.

karlmistelberger · October 21, 2020, 7:32am

susejunky · October 21, 2020, 10:06am

On the rare occasions when I had to resize a partition with an ext4 filesystem on it, I always did it manually (i.e. using gdisk and resize2fs). I do not know how YaST handles this.

Two problem areas come to my mind:

man resize2fs says: “When recreating the partition, make sure you create it with the same starting disk cylinder as before! Otherwise, the resize operation will certainly not work, and you may lose your entire filesystem.”

An interrupted filesystem resize operation may result in an unusable (un-repairable?) filesystem.

Startup your LXQT-installation and try to run e2fsck. I don’t know if it is possible to restart resize2fs.

And keep in mind that depending on the size of the partition and the speed of the disk such operations can be quite lengthy.

Regards

susejunky
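
A minimal sketch of the manual sequence described above: a forced check first, then a resize to fill the partition. The function name is made up, and this assumes the filesystem is ext4 and not mounted. Whether an interrupted resize can be recovered this way is not something the sketch can promise.

```shell
#!/bin/sh
# Check an unmounted ext4 filesystem, then grow it to fill its partition.
# $1 is a block device (or an image file); it must NOT be mounted.
check_and_grow_ext4() {
    target=$1
    # -f forces a full check, -p repairs only what is safe to fix automatically
    e2fsck -f -p "$target"
    rc=$?
    # e2fsck exit status: 0 = clean, 1 = errors corrected, anything higher = trouble
    if [ "$rc" -gt 1 ]; then
        echo "e2fsck exited with status $rc; not resizing" >&2
        return "$rc"
    fi
    # with no size argument, resize2fs grows the filesystem to the partition size
    resize2fs "$target"
}
```

From the LXQT installation this would be called as root with the root partition of the broken system as the argument, after confirming with `mount` or `lsblk` that it is not mounted.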

idee · October 22, 2020, 2:04pm

Finally got time to dig in.

dad-pc:/home/dad # df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        40G   29G  9.3G  76% /mnt
 
dad-pc:/home/dad # du -hd1 -t1m /mnt/
2.9G     /mnt/var
11M      /mnt/lib64
2.0G     /mnt/lib
12G      /mnt/usr
313M    /mnt/tmp
1.3G     /mnt/opt
1.9M     /mnt/lost+found
406M    /mnt/boot
23M      /mnt/etc
8.1M     /mnt/sbin
11G      /mnt/root
29G      /mnt/

It didn’t find the journal

dad-pc:/home/dad # journalctl --directory /mnt/var/log/journal/..../ -b
Failed to open /mnt/var/log/journal/..../: No such file or directory
 
dad-pc:/home/dad # dir /mnt/var/log
alternatives.log    lightdm                             wpa_supplicant.log
apache2             ntp                                 wpslog
audit               nvidia-installer.log                wtmp
boot.log            nvidia-uninstall.log                wtmp-20200311.xz
btmp                pbl.log                             Xorg.0.log
chrony              private                             Xorg.0.log.old
cuda-installer.log  samba                               Xorg.1.log
cups                sendfax.log                         Xorg.1.log.old
fax                 snapper.log                         YaST2
firewalld           snapper.log-20201017.xz             zypp
hp                  tallylog                            zypper.log
krb5                updateTestcase-2020-10-17-21-22-31  zypper.log-20201018.xz
lastlog             updateTestcase-2020-10-18-02-55-34
 
dad-pc:/home/dad # journalctl --directory /mnt/var --disk-usage
No journal files were found.
Archived and active journals take up 0B in the file system.

Looking in grub.cfg I saw the following which didn’t look right? Why is it looking at two different partitions?

                    echo     'Loading Linux 5.6.14-1-default ...'
                    linux     /boot/vmlinuz-5.6.14-1-default root=UUID=6dae7c25-d985-44e6-b386-23c7d3a7af64  resume=/dev/disk/by-uuid/95560983-e4b1-42a7-87b3-b2e20c8feb77 quiet

What is the resume= line?
It is not in my fstab

# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=6dae7c25-d985-44e6-b386-23c7d3a7af64  /                  ext4     noatime,discard               0  1
UUID=7b77b22b-a19f-4f0e-ab29-ae35449f5f39  swap               swap     defaults                      0  0
UUID=74d45143-adf6-41b3-be64-dbf00eec672e  swap               swap     defaults                      0  0
UUID=9e4b2cbe-4227-49f6-bc01-c133785b592d  /home              ext4     noatime,discard               0  2
UUID=48ABE0A128C0D54D                      /backups           ntfs-3g  uid=dad,gid=users,umask=0022  0  2
UUID=420EFA6B0EFA56FF                      /windows-D         ntfs-3g  uid=dad,gid=users,umask=0022  0  2
UUID=F84ECD3B4ECCF402                      /windows-C         ntfs     noatime                       0  2
UUID=c89bdf78-0c7b-4cb2-94af-7505756efe72  /linux-back9-LXQT  ext4     noatime                       0  2
tmpfs                                      /tmp               tmpfs    noatime,mode=1777             0  0

mrmazda · October 22, 2020, 5:58pm

A persistent journal is optional. If not found, it isn’t configured. To configure it, use /etc/systemd/journald.conf, or simply create a /var/log/journal directory.

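What is the resume= line? It is not in my fstab

It’s an optional (unnecessary) override to the resume device built into each initrd. It names the swap partition to resume from after hibernation (suspend to disk), so it only matters to people who hibernate.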
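
One thing that can be checked here: the boot log earlier in the thread shows a start job waiting on a /dev/disk/by-uuid/9555609… device, and the resume= UUID in the grub.cfg line begins with 95560983 and is not in the fstab. Whether that is the same device is a guess, but the sketch below lists any root= or resume= UUID that has no matching entry under /dev/disk/by-uuid. The function names are made up for illustration.

```shell
#!/bin/sh
# Print every UUID referenced by root= or resume= on a kernel command line.
cmdline_uuids() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n -e 's|^root=UUID=||p' \
               -e 's|^resume=UUID=||p' \
               -e 's|^resume=/dev/disk/by-uuid/||p'
}

# Report UUIDs from the command line that have no entry in a by-uuid directory.
# The directory is a parameter so the check can also run against a test fixture.
missing_uuids() {
    cmdline=$1
    dir=${2:-/dev/disk/by-uuid}
    for u in $(cmdline_uuids "$cmdline"); do
        [ -e "$dir/$u" ] || printf '%s\n' "$u"
    done
}
```

Calling `missing_uuids` with the `linux` line copied from grub.cfg prints nothing when every referenced device exists. Any UUID it does print is one the boot could end up waiting for.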

idee · October 22, 2020, 6:35pm

I have just downloaded the tumbleweed repair disc if that tool might help with anything.

karlmistelberger · October 23, 2020, 9:04am

Looks like your system has no problem with disk space. When troubleshooting, the journal is a great place to start, so consider enabling journal storage on disk on all your systems. The default in /etc/systemd/journald.conf is #Storage=auto, so creating the directory /var/log/journal moves the journal from memory to hard disk.

With so little known about your system, giving advice on recovering from the mishap is challenging.
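
A tiny sketch of enabling on-disk journal storage as described, assuming Storage=auto is still in effect. The function name and the optional root prefix are illustrative additions, so the same step can be applied to an installation mounted from another system.

```shell
#!/bin/sh
# Create the directory that makes journald keep logs on disk under Storage=auto.
# $1 is an optional root prefix, e.g. /mnt for an installation mounted elsewhere;
# with no argument the running system's /var/log/journal is created.
enable_persistent_journal() {
    root=${1:-}
    mkdir -p "$root/var/log/journal"
}
```

Run as root, `enable_persistent_journal /mnt` prepares the mounted installation, and the journal should then be kept on disk from its next boot onward.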

idee · October 26, 2020, 3:53pm

I tried using the rescue disk, but the YaST bootloader tool errored saying it could not find the root partition.

Solution: I reinstalled from a current Tumbleweed iso.
Note the space used on that drive is now about 4 gig, vs the previous drive filled at 36 gig.
A clean install is the best answer. Running great.

Any idea why and what filled the drive?

And thank you to all for your help