After upgrading to 15.3, random lockups (usually at boot), most requiring a forced shutdown

A few months ago, I learned that 15.2 was no longer supported. I did an online upgrade to 15.3 - had some sort of “hiccup” where the install asked me the same question twice in a row, but it finished.

Immediately I had major issues with locking up, but because of the output during boot, I was able to find and fix that. (Updating a second time was part of the fix.)

However, since then I’ve had at least one lockup a day, sometimes several. Most happen when I first fire up in the morning, but some have happened at random times (even late in the evening). I’ve noticed that on bootup the hard drive seems to thrash a lot longer before it settles down - sometimes for a few minutes after the desktop appears. If I try to get on the internet for my daily news and to start my workday as soon as the desktop appears, that almost guarantees a lockup. If I wait until the hard drive stops, then I can access the internet with only occasional lockups (usually due to problems with a webpage) - and if I wait long enough, sometimes it clears itself. I usually play a quick card game while waiting after booting, and other “internal only” programs don’t seem to have an issue.

The startup lockups always require a hard power-down to ‘reset’. With the others, if I wait 30 minutes (or more), the system sometimes clears itself up and I can continue using it. I used to think it was purely a Firefox problem, but it has happened when Firefox was NOT running.

(Please note - I don’t have a lot of time to spend reading to fix this… my part-time job as a researcher and trying to finish my dissertation takes up most of my time.)

**Question:** Is it possible to use the online install to re-install Leap 15.3 over itself without having to start from scratch? I have numerous programs I use on a frequent basis (sometimes daily) and it would take me a couple of days to reinstall everything. I don’t have that time to spare.

**Question 2:** Where would I find the logs - to see if there is an error generated before lockup? I used to remember that, but with the pressure I’m under, I cannot remember.

**Question 3:** Sometimes the lockup feels like a timer has been set into an endless loop. Where might I find such timers (like for a web page to finish loading)? That was a big part of the initial headaches I ran into after upgrading.

BTW: I initially suspected heat or power problems, but that doesn’t seem to be the case. It seems connected to software, and it also seems to be in some way connected with ethernet/network.

I apologize for asking for ‘quick easy’ help with minimal reading, but I’m taking precious time out to send this request for help. I am not a newbie to Linux (over a year with 15.2, and then over a decade with Ubuntu), but am under real time constraints and pressure here, and the pressure is making it hard to remember stuff (like where the log files are).

Thanks!
Bob

I finished mine some time ago: “Der Triplettzustand im Pyren-Einkristall: optische Untersuchungen an der ...” (Karl Mistelberger, Google Books). I spend a lot of time helping users. Your machine has no hairs: see the No-hair theorem (Wikipedia).

It takes only a few seconds to paste this command into a console window and then paste its output into the reply form:

erlangen:~ # inxi -zFm
System:
  Kernel: 5.18.1-1-default arch: x86_64 bits: 64 Console: pty pts/1
    Distro: openSUSE Tumbleweed 20220606
Machine:
  Type: Desktop Mobo: ASUSTeK model: PRIME B450-PLUS v: Rev X.0x
    serial: <filter> UEFI: American Megatrends v: 2409 date: 12/02/2020
Memory:
  RAM: total: 29.27 GiB used: 5.18 GiB (17.7%)
  Array-1: capacity: 128 GiB slots: 4 EC: None
  Device-1: DIMM_A1 type: no module installed
  Device-2: DIMM_A2 type: DDR4 size: 16 GiB speed: 2133 MT/s
  Device-3: DIMM_B1 type: no module installed
  Device-4: DIMM_B2 type: DDR4 size: 16 GiB speed: 2133 MT/s
CPU:
  Info: quad core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64
    type: MT MCP cache: L2: 2 MiB
  Speed (MHz): avg: 1400 min/max: 1400/3700 cores: 1: 1400 2: 1400 3: 1400
    4: 1400 5: 1400 6: 1400 7: 1400 8: 1400
Graphics:
  Device-1: AMD Picasso/Raven 2 [Radeon Vega Series / Radeon Mobile Series]
    driver: amdgpu v: kernel
  Display: x11 server: X.Org v: 21.1.3 with: Xwayland v: 22.1.2 driver: X:
    loaded: amdgpu unloaded: fbdev,modesetting,vesa gpu: amdgpu
    resolution: 1920x1080~60Hz
  OpenGL: renderer: AMD Radeon Vega 11 Graphics (raven LLVM 14.0.4 DRM 3.46 5.18.1-1-default)
    v: 4.6 Mesa 22.1.1
Audio:
  Device-1: AMD Raven/Raven2/Fenghuang HDMI/DP Audio driver: snd_hda_intel
  Device-2: AMD Family 17h/19h HD Audio driver: snd_hda_intel
  Device-3: Tenx USB AUDIO type: USB
    driver: hid-generic,snd-usb-audio,usbhid
  Sound Server-1: ALSA v: k5.18.1-1-default running: yes
  Sound Server-2: PipeWire v: 0.3.51 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169
  IF: eth0 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 1.15 TiB used: 423.96 GiB (36.1%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 950 PRO 512GB
    size: 476.94 GiB
  ID-2: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB
  ID-3: /dev/sdb vendor: Samsung model: SSD 850 EVO 500GB size: 465.76 GiB
Partition:
  ID-1: / size: 476.84 GiB used: 150.84 GiB (31.6%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 99.8 MiB used: 2.1 MiB (2.1%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-3: /home size: 476.84 GiB used: 150.84 GiB (31.6%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-4: /opt size: 476.84 GiB used: 150.84 GiB (31.6%) fs: btrfs
    dev: /dev/nvme0n1p2
  ID-5: /var size: 476.84 GiB used: 150.84 GiB (31.6%) fs: btrfs
    dev: /dev/nvme0n1p2
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 41.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 430 Uptime: 12h 2m Shell: Bash inxi: 3.3.16
erlangen:~ #



Here is the output (THANKS!!! for the help).

System:
Kernel: 5.3.18-150300.59.68-default x86_64 bits: 64 Desktop: Gnome 3.34.5
Distro: openSUSE Leap 15.3
Machine:
Type: Desktop System: Dell product: OptiPlex 7010 v: 01 serial: <filter>
Mobo: Dell model: 0GY6Y8 v: A02 serial: <filter> BIOS: Dell v: A16
date: 09/09/2013
Memory:
RAM: total: 15.55 GiB used: 2.93 GiB (18.8%)
RAM Report:
permissions: Unable to run dmidecode. Root privileges required.
CPU:
Topology: Quad Core model: Intel Core i7-3770 bits: 64 type: MT MCP
L2 cache: 8192 KiB
Speed: 2833 MHz min/max: 1600/3900 MHz Core speeds (MHz): 1: 3480 2: 3623
3: 3701 4: 3762 5: 3643 6: 3297 7: 3746 8: 3613
Graphics:
Device-1: Intel IvyBridge GT2 [HD Graphics 4000] driver: i915 v: kernel
Display: x11 server: X.Org 1.20.3 driver: modesetting unloaded: fbdev,vesa
resolution: 1: 1280x1024~60Hz 2: 1920x1080~60Hz 3: 1600x900~60Hz
OpenGL: renderer: Mesa DRI Intel HD Graphics 4000 (IVB GT2)
v: 4.2 Mesa 20.2.4
Audio:
Device-1: Intel 7 Series/C216 Family High Definition Audio
driver: snd_hda_intel
Device-2: Creative Labs Sound Core3D [Sound Blaster Recon3D / Z-Series]
driver: snd_hda_intel
Sound Server: ALSA v: k5.3.18-150300.59.68-default
Network:
Device-1: Intel 82579LM Gigabit Network driver: e1000e
IF: em1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
Local Storage: total: 1.82 TiB used: 571.33 GiB (30.7%)
ID-1: /dev/sda vendor: Western Digital model: WD2003FZEX-00SRLA0
size: 1.82 TiB
Partition:
ID-1: / size: 1024.00 GiB used: 571.33 GiB (55.8%) fs: btrfs
dev: /dev/dm-2
ID-2: /home size: 1024.00 GiB used: 571.33 GiB (55.8%) fs: btrfs
dev: /dev/dm-2
ID-3: /opt size: 1024.00 GiB used: 571.33 GiB (55.8%) fs: btrfs
dev: /dev/dm-2
ID-4: /tmp size: 1024.00 GiB used: 571.33 GiB (55.8%) fs: btrfs
dev: /dev/dm-2
ID-5: /var size: 1024.00 GiB used: 571.33 GiB (55.8%) fs: btrfs
dev: /dev/dm-2
Swap:
ID-1: swap-1 type: partition size: 2.00 GiB used: 0 KiB (0.0%)
dev: /dev/dm-3
Sensors:
System Temperatures: cpu: 40.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info:
Processes: 288 Uptime: 2h 46m Shell: bash inxi: 3.1.00

Check your drive as described in https://forums.opensuse.org/showthread.php/570572-Hard-drive-error and run all of the commands there. Also use “btrfs check”. Root permissions are required.

Use code tags: https://forums.opensuse.org/showthread.php/536143-Using-Code-Tags-Around-Your-Paste
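
For anyone who cannot open the linked thread, here is a minimal sketch of a comparable drive check. It is not the command list from that thread, and it assumes smartmontools and btrfsprogs are installed. The function name `check_drive` is made up for illustration. Note that `btrfs check` wants the filesystem unmounted, so the sketch swaps in `btrfs scrub`, which verifies checksums on a mounted filesystem instead.

```shell
# Hypothetical helper, not taken from the linked thread.
# Usage: check_drive /dev/sda /
check_drive() {
    dev="$1"          # whole disk, e.g. /dev/sda from the inxi output above
    mnt="${2:-/}"     # mount point of the btrfs filesystem to verify

    # SMART health verdict plus the attribute table
    # (reallocated and pending sector counts are the ones to look at).
    sudo smartctl -H -A "$dev"

    # Error counters btrfs has recorded for the devices backing this filesystem.
    sudo btrfs device stats "$mnt"

    # Re-read all data and metadata and verify checksums; -B stays in the
    # foreground and prints a summary when the scrub is done.
    sudo btrfs scrub start -B "$mnt"
}
```

Running `check_drive /dev/sda /` on a half-full 2 TB hard disk will probably take a good while, because the scrub reads everything that is stored on it.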

It does seem so to me. SFF Dells run pretty hot. If too much dust has accumulated it can produce instability as well as too much heat. One thing that’s easy to try is a new HD control cable. These can go bad for various reasons, though it’s not as common as it used to be when a certain shade of red dye was used that caused conductor corrosion.

The randomness makes me suspect a power supply instability issue. It could be caused by a failed fan or accumulated dust, but in power supplies, failed electrolytic capacitors are an all too common problem. Here it looks like SFF 7010 power supplies are not among the best.

If you have a mind to run inxi again for any reason, run it first this way to upgrade to a much improved version: sudo inxi -U. Also run it with root privileges, installing dmidecode first if it is missing; as you can see, your RAM report is incomplete because dmidecode could not be run.

It is possible to borrow a PS from some other PC, without installing it, by simply placing it close enough for the required cables to reach. Yours is rated at 275W, and is operationally standard.

It may be possible that caps are going bad, but I’ve been around PCs since they first came out and know quite a bit about thermal problems. This computer gets cleaned out thoroughly on a regular basis - compressed air and a long-bristle soft brush (and the P/S gets cleaned out as well). As soon as I started developing problems - first thing I did was clean. I’ve helped many people who develop problems - by cleaning as much as an inch of dust off of their motherboards. (Talk about raising a cloud of dust!)

I’ve got several control cables out in my shop… if that’s it, this would be a new issue for me (and I appreciate the suggestion/knowledge). I’ve encountered bad USB cables several times (have a couple that haven’t been trashed yet), but not problems with drive control cables (since the mid 90s).

I’ve had more lockups on boot today… but I think I may have a good clue.

This morning I booted in the repair/recovery mode and it locked up when the network manager ran for the second time (second showing on the screen). It was counting up the seconds and then froze at 32 seconds. The report indicated that it was allowed unlimited wait time on finishing, and I will check and see if I can get Network Manager set so that the loop has a limit (like a minute). It shouldn’t take that long!

I also re-installed Network Manager and most of the supporting files, as well as Firefox. (That seems to help when it becomes a headache.)

Next - I’ll check the hard drive more thoroughly.

I greatly appreciate all the help, and I apologize about not putting the code in a window.

Bob
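
On the earlier question about where the logs are: on a systemd-based install such as Leap 15.3, the journal is the first place to look. Below is a minimal sketch, assuming the journal is kept across reboots (a /var/log/journal directory exists). If it is not, `-b -1` has nothing to show and only the current boot is available. The function name `boot_stall_report` is made up for illustration, and NetworkManager-wait-online.service is only a guess at the unit that was counting seconds on screen.

```shell
# Hypothetical helper: collect what the previous boot logged.
boot_stall_report() {
    # Boots the journal knows about; index -1 is the previous boot.
    sudo journalctl --list-boots --no-pager

    # Messages of priority "err" or worse from the previous boot.
    sudo journalctl -b -1 -p err --no-pager

    # Everything NetworkManager logged during that boot.
    sudo journalctl -b -1 -u NetworkManager.service --no-pager

    # How long systemd waits for this unit to start ("infinity" means no limit).
    systemctl show NetworkManager-wait-online.service -p TimeoutStartUSec
}
```

Calling `boot_stall_report > report.txt` right after a forced power-down keeps the output in a file that can be pasted here inside code tags. A hard freeze may leave nothing in the journal at all, so an empty result does not rule anything out.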

Given this is a desktop PC, it shouldn’t need any network “management”. To rule out NetworkMangler as a possible issue, you can switch to Wicked or systemd-networkd. All my TW and recent Leap desktop installations are running on systemd-networkd. YaST will not make a switch to systemd-networkd. You need systemctl to disable network-manager and enable systemd-networkd, and you’ll need to configure /etc/systemd/network/NICname.network similar to the following template:

[Match]
Name=eth0

[Network]
Address=192.168.###.xxy/24
DNS=192.168.###.xxx 1.1.1.1 1.0.0.1
Gateway=192.168.###.xxx
IPv6AcceptRA=no
LinkLocalAddressing=no

Thus you may find it simpler to change to Wicked using YaST2. Making this change might result in /etc/resolv.conf becoming an empty file. You may need or want to disable systemd-resolved.service and populate that file.
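
The systemctl part of the switch to systemd-networkd could look roughly like this. It is a sketch rather than a tested recipe: it assumes systemd-networkd is installed, that NetworkManager runs as NetworkManager.service, and that the .network file from the template has already been written. The function name `switch_to_networkd` is made up for illustration.

```shell
# Hypothetical helper. Usage: switch_to_networkd em1
switch_to_networkd() {
    nic="$1"   # interface name as reported by inxi, e.g. em1 or eth0

    # Refuse to switch unless the .network file exists, otherwise the
    # machine would come up without a configured interface.
    if [ ! -f "/etc/systemd/network/$nic.network" ]; then
        echo "missing /etc/systemd/network/$nic.network" >&2
        return 1
    fi

    # Stop NetworkManager now and keep it from starting at boot.
    sudo systemctl disable --now NetworkManager.service

    # Start systemd-networkd now and at every boot.
    sudo systemctl enable --now systemd-networkd.service

    # Show what networkd made of the interface.
    networkctl status "$nic" --no-pager
}
```

Doing this from a local console rather than over the network is the safer choice, since the connection drops between the two systemctl calls.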

I’ve noticed some big changes after forcing a re-update of Network Manager (and Firefox) - the processor seems to be running a lot cooler (down from over 40 °C to 28 °C), maybe the motherboard as well - now I get a temperature reading for it. YaST and other internet-related programs also seem to be communicating better and faster. So far, no lockups. I’ve gone through and checked the hard drives, and saw nothing indicating a problem.

We may have it fixed. I think I’ll wait before posting anything more - and will change this to solved if the problem is really gone. BTW, thanks for the tip about systemd-networkd. I do have systemctl so I can make those changes if necessary. (I’ve already fallen behind in my work trying to fix this!)

Thanks, everyone - I’ll update this thread, either to ‘solved’ or with further information if it locks up again.

Bob

The problem vanished for a couple of days, then returned. It’s somehow connected to the network, as a forced reinstall of Network Manager and Firefox fixed it. The exception was a little while ago (less than an hour): I had to force the computer to shut down (power button held in) and reboot in recovery mode before a full reinstall of the two programs fixed it.

It seems to be connected to a glitch on a page I visited - on pages like weather.gov. I should add that, when doing the forced reinstall, I sometimes get a message in YaST that PackageKit is running, asking whether YaST should ask it to stop. I’d just rebooted the computer, so PackageKit must have been started at boot.

(I checked the hard drive, and that seems to be fine, but I can post the results of the tests in case I missed something. The drive is less than a year old, BTW.)

I also looked at the boot log but don’t see anything that stands out as a possible problem.

It appears to be a software-based problem. That the computer ran cooler after reinstalling Network Manager and Firefox is, IMO, revealing: the problem is somehow connected to them.

NOTE: I’m going to post a separate thread about blocking hackers, which I’m beginning to suspect (it’s happened at least four times before in the last few years).

I appreciate the help!
Bob

Much worse - computer locked up in the middle of an update - while downloading the necessary files. I’ve had to reboot several times, and finally reached the point where I can get on here after I started with a read-only file. I got the white screen of death four times.

I don’t know what is going on - no error codes, nothing like that. Nothing indicating a hardware issue either.

Random lockups suggest hardware issues more often than software. What if you try to use live media of some other distro, TW, 15.4, Fedora, Kubuntu, etc.?

Normally I’d agree, but it all started when I upgraded from 15.2 to 15.3. It wouldn’t be the first time I’ve seen an install or upgrade go goofy like this (sometimes a glitch in a single module can create all sorts of weird problems).

I’m back up to limping along. In the meantime, do you know if there is a way to force a re-install (online) without erasing everything (a reinstall instead of upgrade from 15.2)?

That will be getting to the drastic measures point - after a few weeks I will be in a position to spend more time with this system (in repairing/reloading it). Right now I’ve got deadlines to meet.

Thanks!

Troubleshooting random lockups can’t be done efficiently by random hypothesizing. Set up a pristine install from ISO first and see whether the lockups occur again.

A pesky case: https://forums.opensuse.org/showthread.php/561241-Segfault-Trouble-Shooting In the end it was resolved after replacing 2x4GB RAM with 2x8GB: the swap caused a printed circuit that was massively damaged by corrosion to fail terminally. The damage had caused problems for six years. I detected it when removing the mainboard and inspecting its back side.

There is NO way I’m going to do a reinstall anytime soon (unless forced). I CAN’T - I don’t have the time to do it and then fix and reinstall all of the software I have that I use regularly. I just cannot shut down this computer to do that sort of thing… just not possible. I’m very fortunate in that I’ve been able to get it working again - and of course I back up my work (but loss of the system for more than a day would be disastrous).

Maybe in a month or more, I’d have the luxury of free time and won’t be under such pressure - if I’m VERY lucky.

The system seems to be running differently (a bit better) after using the read only boot to fix it, but I’d want to see it stable for at least a week, based on what has happened so far.

You may want to consider what I do. I keep two root partitions, one for the current version and one for the previous version, plus a separate home and several other partitions. I do a fresh install into the older root and do not mount my home initially. I install the programs I use, then test the new version until I’m happy with it. Then I mount the home partition there and use the new version. I still have the old version, using the same home, if I really need it. You never lose the computer; you just log into the working version. Yes, it takes a bit of setup and thought, but you are never down as long as you have working hardware.

That is an EXCELLENT idea! One thing I learned from my research is the importance of redundancy… I was hacked a couple of years ago and the hacker managed to wreck part of my research data (the one file I didn’t have multiple backups of outside of my computer).

That’s something I can do easily, as I have plenty of room on my HD to put a full second partition!

I was reluctant to respond to post #15. You got it anyway. I go with a single btrfs partition nvme0n1p2 on host erlangen. When upgrading its hardware I squeezed nvme0n1p2 and did a pristine install of Tumbleweed on nvme0n1p3 for reference. This took a few minutes altogether. When benchmarking btrfs/ext4 io-speed I again squeezed nvme0n1p2 and added ext4 on nvme0n1p4 in a few seconds:

erlangen:~ # lsblk -f /dev/nvme0n1
NAME        FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS 
nvme0n1                                                                             
├─nvme0n1p1 vfat   FAT32       19CF-0B54                             510.4M     0% /boot/efi 
├─nvme0n1p2 btrfs              0e58bbe5-eff7-4884-bb5d-a0aac3d8a344    1.3T    25% /var 
│                                                                                  /usr/local 
│                                                                                  /srv 
│                                                                                  /root 
│                                                                                  /opt 
│                                                                                  /home 
│                                                                                  /boot/grub2/x86_64-efi 
│                                                                                  /boot/grub2/i386-pc 
│                                                                                  /.snapshots 
│                                                                                  / 
├─nvme0n1p3 btrfs              ce197f2b-0c86-4b76-8688-5e8d597fc5bc   36.6G    25% /TW-test 
└─nvme0n1p4 ext4   1.0         5f0d61a4-9fef-4366-8be8-a08231661e85                 
erlangen:~ #

Removing nvme0n1p3 and nvme0n1p4 and expanding nvme0n1p2 to its original size can be done in a few seconds too. Stay flexible!

System is now fixed. I got a break from the stress and some free time, so I changed from NetworkManager to Wicked as you suggested, and the problem went away (over a week now without problems). Because of the symptoms, I got the feeling it had to do with ethernet. Anyway, thanks for the suggestion and help (everyone)!

Bob