Log in and suspend slowness issues in the system

mrmazda · October 2, 2023, 6:10pm

Karl’s post 130 showed clearly that the device whose UUID ends in 0fbe24e7 is somehow causing your problem. What filesystem is that? Is it in your fstab? Does some Plasma file in the ~/.config/ or ~/.local/ tree contain it? If so, try removing it.
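
One way to run those checks from a shell, as a minimal sketch. The function name find_uuid_refs is made up for illustration, and the paths in the usage line are just the ones named above.

```shell
# find_uuid_refs FRAGMENT [PATH...]
# Hypothetical helper: print every file under the given paths whose
# contents mention FRAGMENT. Paths that do not exist are skipped.
find_uuid_refs() {
    local fragment=$1
    shift
    local path
    for path in "$@"; do
        [ -e "$path" ] || continue
        # -r recurse, -l file names only, -s hide unreadable-file errors,
        # -F treat the fragment as a fixed string rather than a regex
        grep -rlsF -- "$fragment" "$path"
    done
    return 0
}

# Typical use for the UUID fragment discussed above:
# find_uuid_refs 0fbe24e7 /etc/fstab "$HOME/.config" "$HOME/.local"
```

An empty result only means the fragment does not appear as plain text in those places; it says nothing about binary caches or other locations.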

non_space · October 2, 2023, 6:30pm

?? Malcolm’s post #129??? did show something with that UUID . . . which is the partition that contains the TW /home directory . . . .

I’ll check your suggestions out; today’s time to mess with it is now used up, so I’ll have to check later.

No “games” have been played by me in messing with the system, the /home directory . . . other than whatever zypper decided to do. The problem may indeed now reside there . . . the question is why??

One potential “answer” to the “why” is, as posted several times recently, that TW was the grub/os-prober controller for the 6 other systems. Several recent runs of #update-bootloader in TW resulted in a disorganized grub menu . . . the other systems were not located in their proper “sdx” locations . . . so possibly somewhere in there TW messed with the sdb10 partition where /home is located, along with moving the other players around to other places . . . ???

The problem with that theory is that after running #update-bootloader several times the other systems came back online. Leap 15.6 is now the grub handler; we’ll see if that adjustment maintains grub listing in proper order.

Meanwhile, back on the ranch . . . .

karlmistelberger · October 3, 2023, 6:44am

Stuff happens. When in trouble I always detach external disks. As the front panel of host erlangen has no USB-C header I plugged the drive into the header of the rear panel and forgot about it.
The scsi driver readily works during boot with the same drive attached to a USB-A header.
The scsi driver readily works after boot when using a USB-C header.

erlangen:~ # journalctl -b  _KERNEL_SUBSYSTEM=scsi -g 12:
Oct 03 08:20:10 erlangen kernel: scsi host12: uas
Oct 03 08:20:10 erlangen kernel: scsi 12:0:0:0: Direct-Access     CT4000P3 SSD8             3108 PQ: 0 ANSI: 6
Oct 03 08:20:10 erlangen kernel: sd 12:0:0:0: Attached scsi generic sg2 type 0
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] 4096-byte physical blocks
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Write Protect is off
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Mode Sense: 5f 00 00 08
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Preferred minimum I/O size 4096 bytes
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Optimal transfer size 33553920 bytes not a multiple of preferred minimum block size (4096 bytes)
Oct 03 08:20:12 erlangen kernel: sd 12:0:0:0: [sdb] Attached SCSI disk
erlangen:~ #

The scsi driver fails during boot when using a USB-C header.

erlangen:~ # journalctl -b  _KERNEL_SUBSYSTEM=scsi -g 12:
Oct 03 08:18:43 erlangen kernel: scsi host12: uas
Oct 03 08:19:04 erlangen kernel: scsi 12:0:0:0: tag#12 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
Oct 03 08:19:04 erlangen kernel: scsi 12:0:0:0: tag#12 CDB: Inquiry 12 00 00 00 24 00
Oct 03 08:19:04 erlangen kernel: scsi host12: uas_eh_device_reset_handler start
Oct 03 08:19:04 erlangen kernel: scsi host12: uas_eh_device_reset_handler success
Oct 03 08:19:04 erlangen kernel: scsi 12:0:0:0: tag#12 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
Oct 03 08:19:04 erlangen kernel: scsi 12:0:0:0: tag#12 CDB: Test Unit Ready 00 00 00 00 00 00
Oct 03 08:19:04 erlangen kernel: scsi host12: uas_eh_device_reset_handler start
Oct 03 08:19:04 erlangen kernel: scsi host12: uas_eh_device_reset_handler success
Oct 03 08:19:04 erlangen kernel: scsi 12:0:0:0: Device offlined - not ready after error recovery
erlangen:~ #

susejunky · October 3, 2023, 1:31pm

Does that mean that you have a directory e.g. /home/userA which will be shared between a userA from Tumbleweed and a userA from Leap (or Arch, or Ubuntu, …)?

non_space · October 3, 2023, 2:24pm

No. The partition likely contains different /home directories, but the /home/user is not shared across the other operating systems.

For the most part historically the / and /home system directories are on the same drive; but a few years back now I added the SSD and something in TW blew up, so I had to run a fresh install . . . I put the / in the new drive and left the TW /home extant in the other internal drive in the machine. And all has been “well” more or less, particularly in regards to fast boot and log in, until this recent thread started up.

non_space · October 3, 2023, 3:21pm

/dev/sdb10: UUID="64c5dba2-bacd-4459-9904-6bda0fbe24e7" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="16858b01-4a16-4123-936d-73b985d1cb09"

non_space · October 3, 2023, 3:24pm

From within the Sid OS I ran:

$ sudo fsck -f /dev/sdb10
fsck from util-linux 2.39.2
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Inode 2101065 extent tree (at level 1) could be shorter.  Optimize<y>? yes
Inode 2232428 extent tree (at level 1) could be shorter.  Optimize<y>? yes
Inode 3280986 extent tree (at level 1) could be shorter.  Optimize<y>? yes
Inode 3670547 extent tree (at level 1) could be shorter.  Optimize<y>? yes
Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb10: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb10: 180488/5906432 files (3.7% non-contiguous), 6203711/23603200 blocks

malcolmlewis · October 3, 2023, 4:59pm

@non_space did that make a difference to the critical-chain output? What do the smartctl stats look like for sdb?

non_space · October 3, 2023, 5:08pm

Possibly. I ran the command as root in the TW system, no time to be ssh-ing in from another machine . . . . It seems the UUID entry is no longer there; however, the slow log-in and the slow-to-boot console app both continue as before. : - 0

# systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @7.457s
└─multi-user.target @7.457s
  └─cron.service @7.456s
    └─postfix.service @6.647s +781ms
      └─time-sync.target @6.610s
        └─chronyd.service @6.384s +208ms
          └─network.target @6.381s
            └─NetworkManager.service @4.169s +2.210s
              └─network-pre.target @4.152s
                └─wpa_supplicant.service @6.380s +80ms
                  └─dbus.service @3.367s +66ms
                    └─basic.target @3.328s
                      └─sockets.target @3.328s
                        └─pcscd.socket @3.326s
                          └─sysinit.target @3.294s
                            └─systemd-update-utmp.service @3.255s +37ms
                              └─auditd.service @3.203s +48ms
                                └─systemd-tmpfiles-setup.service @3.097s +86ms
                                  └─local-fs.target @3.091s
                                    └─home.mount @3.049s +40ms

malcolmlewis · October 3, 2023, 5:14pm

@non_space ok, so at least that is eliminated…

non_space · October 3, 2023, 5:21pm

I don’t know exactly what “smartctl stats for sdb” means, if that is now relevant. Not sure what my instructions are right now . . . .

The GUI updater says there are 25 packages to upgrade . . . since I’m here in TW I’ll run them; not holding my breath on the “silver bullet” . . . .

non_space · October 3, 2023, 5:36pm

Nope. There was a package “xdg-utils” that I hoped might make some difference, but it does not seem to have done anything relevant.

malcolmlewis · October 3, 2023, 8:34pm

@non_space Run (as root user) smartctl -a /dev/sdb and look at the statistics…
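
If the full report is too much to read through, the attribute rows can be filtered down to a handful of IDs. A rough sketch follows: smart_pick is a made-up helper name, and the ID list in the usage line is only an example selection, not an official set of "important" attributes.

```shell
# smart_pick IDS: read `smartctl -a` (or `smartctl -A`) text on stdin and
# print ID, name, VALUE, WORST, THRESH and the raw value for the attribute
# IDs given as a comma-separated list.
smart_pick() {
    awk -v ids="$1" '
        BEGIN { n = split(ids, a, ","); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        # Attribute rows start with a wanted ID and carry a 0x.... flag in column 3.
        ($1 in want) && $3 ~ /^0x[0-9a-f]+$/ {
            raw = $10                                # raw value may span several fields
            for (i = 11; i <= NF; i++) raw = raw " " $i
            print $1, $2, $4, $5, $6, raw
        }
    '
}

# Typical use (as root):
# smartctl -a /dev/sdb | smart_pick 1,5,7,187,188,195,197,198,199
```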

non_space · October 4, 2023, 2:33pm

OK, have run these tests before, the usual “pre-fail” and “old age” line items, but nothing has yet “failed” in the drive . . . showing approx 6K+ hours of run time??

Again, I have at least one or two other /home folders in that same partition on the drive, TW is the only OS that has this lag time to deal with.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

malcolmlewis · October 4, 2023, 3:16pm

@non_space ok, you need to provide the details asked for, not a subset of what you think was needed…
Look at the attribute output, e.g.:

smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.5.4-1-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS500G2B0A-00SM50
Serial Number:    212613A0055D
LU WWN Device Id: 5 001b44 8ba2e9d3e
Firmware Version: 415020WD
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct  4 10:15:04 2023 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       16723
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       254
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       14614686
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       77
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       3
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       346
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       24
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   057   049   ---    Old_age   Always       -       43 (Min/Max 17/49)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age   Always       -       0x001f000a001f
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       917
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       1175
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       1095
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       448
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

Overall health may be fine, but if it is getting head errors etc., that needs looking at…

non_space · October 4, 2023, 3:53pm

OK . . . I understand that you want to chase whatever seems to get up and run . . . . But how do you account for the fact that only a few apps/functions seem to be affected?? The browser launches quickly enough, and I can get around on the web quickly??? Once booted up, the system can “log in” quickly, whereas on cold boot it does not? And then the console boots with the “crash” of the applet, but GParted boots quickly?? It’s the “intermittent” nature of the problem and its being exclusive to TW that, for me, point away from the drive . . . that drive is not OEM '12 spec, but is more recent.

I’m wondering if there are some “MATE” issues, because when I try to suspend or log out, the “starting Mate-session” applet shows, and then disappears . . . .

But, here you go, full output:

# smartctl -a /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.5.4-1-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (CMR)
Device Model:     ST1000DM010-2EP102
Serial Number:    Z9AJC0T0
LU WWN Device Id: 5 000c50 0a4937b5e
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 08:45:30 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 104) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x1085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   075   063   006    Pre-fail  Always       -       39595072
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   090   090   020    Old_age   Always       -       10656
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   045    Pre-fail  Always       -       138571340
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6325
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   090   090   020    Old_age   Always       -       10347
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   074   059   040    Old_age   Always       -       26 (Min/Max 25/26)
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       10655
194 Temperature_Celsius     0x0022   026   014   000    Old_age   Always       -       26 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   002   001   000    Old_age   Always       -       39595072
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       6199h+42m+54.358s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       24224741970
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       12678380951

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6325         -
# 2  Extended offline    Interrupted (host reset)      00%      6324         -
# 3  Extended offline    Interrupted (host reset)      00%      6320         -
# 4  Short offline       Completed without error       00%      6315         -
# 5  Short offline       Completed without error       00%      6308         -
# 6  Short offline       Completed without error       00%      6303         -
# 7  Short offline       Completed without error       00%      6299         -
# 8  Short offline       Completed without error       00%      6293         -
# 9  Short offline       Completed without error       00%      6291         -
#10  Short offline       Completed without error       00%      6290         -
#11  Short offline       Completed without error       00%      6289         -
#12  Short offline       Completed without error       00%      6286         -
#13  Short offline       Completed without error       00%      6282         -
#14  Short offline       Completed without error       00%      6277         -
#15  Short offline       Completed without error       00%      6272         -
#16  Short offline       Completed without error       00%      6267         -
#17  Extended offline    Interrupted (host reset)      00%      6259         -
#18  Extended offline    Interrupted (host reset)      00%      6257         -
#19  Short offline       Completed without error       00%      6240         -
#20  Short offline       Completed without error       00%      6232         -
#21  Short offline       Completed without error       00%      6224         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

One item of note, after running yesterday’s zypper, now it seems like the “suspend” function is happening quickly, as it was before . . . but still the 3 - 4 minutes plus to get to the GUI remains.

malcolmlewis · October 4, 2023, 4:11pm

@non_space so you need to look at attributes #1, #7 and #195, and look up the manufacturer’s info on each attribute and what the values mean…
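
As a starting point for that lookup: a commonly repeated community reading, which is an assumption here and not something the manufacturer has confirmed in this thread, is that on Seagate drives the 48-bit raw value of attributes 1 and 7 packs an error count into the upper 16 bits and a running operation count into the lower 32 bits. The arithmetic for splitting a raw value that way can be sketched as follows; the function name split_raw48 is made up.

```shell
# split_raw48 RAW: split a 48-bit SMART raw value into its upper 16 bits
# and lower 32 bits. Reading the two parts as "errors" and "operations"
# is an ASSUMPTION about Seagate firmware, not documented behaviour.
split_raw48() {
    local raw=$1
    printf 'upper16=%d lower32=%d\n' "$(( raw >> 32 ))" "$(( raw & 0xFFFFFFFF ))"
}

# Typical use with the Seek_Error_Rate raw value posted above:
# split_raw48 138571340
```

Both raw values posted for #1 and #7 (39595072 and 138571340) are below 2^32, so the upper field comes out as 0 for each. Whether that really means "no errors" still depends on the manufacturer's definition.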

This system is a desktop?

non_space · October 4, 2023, 4:44pm

Alrighty . . . well . . . “attributes” . . . . Got it. Still, same questions as before as far as the exclusive to TW aspect . . . ??? I still have old machines that have really old HDD drives that boot with massive “I/O” errors that show in dmesg . . . and they run . . . “OK” . . . .

One test as far as using resources goes is that I have some OSX systems installed on that same drive . . . don’t use them much, but as far as being a resource hog, that would be a test to see how long it takes to get into the OSX GUI on the same drive . . . .

Yes. Desktop system.

malcolmlewis · October 4, 2023, 4:54pm

@non_space it’s just an observation, especially with the filesystem check time and just something to be aware of…

Further to the above, if Tumbleweed is installed in a location on the drive that is prone to seek/read errors, one could expect performance problems from errors that go undetected (they won’t be logged anywhere except SMART)… This could all be a red herring, but it is an observation, and it would need clarification from the manufacturer on those raw values and on what VALUE/WORST/THRESH mean for that device…

That drive would have been in the junk box if it were mine…

non_space · October 4, 2023, 5:01pm

LOL . . . I run them until they drop, totally dead . . . even if it started as “not high end” stuff, stuff gets flogged mercilessly. Having actually done that smartctl thing on my Macs over the years, it is “amazing” to me how quickly that “old age” stuff shows up . . . like in 6 months of operation.

I’m now over in OSX on that same “old” drive . . . all booted up fine, less than a minute to get to the GUI . . . loaded fine. OSX runs system checks before booting so it isn’t super zippy, but to me it points back to TW as the problem child . . . . OR, perhaps the canary in the mine shaft, finding problems with certain apps, but not others???

TW / system is installed in “new” SSD, so that part should be “fast”???