Plasma crashes - cannot connect to X

Hello there,

I still continue to experience major problems here in Leap 15.
On battery (so far this has happened only while on battery), while I was working in Firefox, Plasma crashed all of a sudden. At least, that’s what I assume, as I noticed that the taskbar was gone.
Upon opening a tty to try restarting it, I get a

Could not connect to any X display

message.

Here is the corresponding log file: Microsoft OneDrive
Just download and open it in kate. It’s a simple text file.

Thanks.

It’s preferable to upload such files to https://susepaste.org/ or https://pastebin.com/ rather than making users download files for inspection. The susepaste site has an associated utility that makes uploading simple, e.g.

dean@linux-kgxs:~> susepaste -e 60  /var/log/Xorg.0.log
Pasted as:
   http://susepaste.org/98393396
   http://paste.opensuse.org/98393396
Link is also in your clipboard.
dean@linux-kgxs:~> 

The -e option specifies how long the paste will be stored on the server (in minutes). Normally a few months or perhaps a year (604800) might be appropriate.

I tried susepaste but it gave me a 404 error.
However, the command line works:

http://susepaste.org/15100717

EDIT: Maybe not, still 404.

See pastebin:

https://pastebin.com/JDhC0t8J

It happened again. First Firefox crashed, then, at basically the same time, Plasma. All I could do was issue a reboot command.

See log: http://susepaste.org/34197639

It happened at 15:10.
What’s strange is that it now also throws XFS error messages.

Help would be much appreciated.

15:10:01.302955+09:00 computer kernel: [64498.762037] XFS (dm-3): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0xe45e820
15:10:01.302976+09:00 computer kernel: [64498.762038] XFS (dm-3): Unmount and run xfs_repair
15:10:01.302979+09:00 computer kernel: [64498.762039] XFS (dm-3): First 64 bytes of corrupted metadata buffer:
15:10:01.302981+09:00 computer kernel: [64498.762041] ffff8801b9d73000: 49 4e 00 00 03 02 00 00 00 00 03 e8 00 00 00 64  IN.............d
15:10:01.302983+09:00 computer kernel: [64498.762042] ffff8801b9d73010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
15:10:01.302984+09:00 computer kernel: [64498.762044] ffff8801b9d73020: 5b c0 23 3d 05 02 65 44 5b c0 23 3d 15 ef e4 4e  .#=..eD.#=...N
15:10:01.302985+09:00 computer kernel: [64498.762045] ffff8801b9d73030: 5b c0 2a ab 19 0a 6e 62 00 00 00 00 00 00 00 00  .*...nb........
15:10:01.302986+09:00 computer kernel: [64498.762076] XFS (dm-3): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0xe45e820
15:10:01.302988+09:00 computer kernel: [64498.762076] XFS (dm-3): Unmount and run xfs_repair
15:10:01.302989+09:00 computer kernel: [64498.762077] XFS (dm-3): First 64 bytes of corrupted metadata buffer:

Your system looks to be suffering XFS filesystem corruption. Make sure any important data is backed up before attempting to repair it.

http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/xfs-repair.html
https://www.thegeekdiary.com/running-repairs-on-xfs-filesystems/
https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/xfsrepair.html
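
A cautious way to follow the advice above might look like the sketch below. It assumes the affected filesystem is the /home volume at /dev/system/home (the LVM path that shows up later in this thread) mounted at /home via /etc/fstab, and that it can be unmounted, for example from a root login on a tty with no user sessions open. Both names are assumptions to adjust. By default the helper only prints what it would do.

```shell
#!/bin/bash
# Sketch only: DEV and MNT are assumptions, adjust them before use.
DEV="${DEV:-/dev/system/home}"
MNT="${MNT:-/home}"

# Print each step instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 1: unmount and inspect. xfs_repair will not work on a mounted
# filesystem, and -n (no-modify mode) only reports what it finds.
check_fs() {
    run umount "$MNT" && run xfs_repair -n "$DEV"
}

# Step 2: the actual repair, to be run only once backups exist.
# "mount $MNT" relies on an /etc/fstab entry for the mount point.
repair_fs() {
    run xfs_repair "$DEV" && run mount "$MNT"
}
```

Calling check_fs and then repair_fs prints the plan; running the same as root with RUN=1 would execute it.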

I don’t quite get this.
Let me show you why:

  1. The SSD is only half a year old (MX500, latest firmware).
  2. I installed Leap 15 just three weeks ago using its default settings (Btrfs for /, XFS for /home, however with full disk encryption/LVM),
  3. Plasma crashes - cannot connect to X -> so I reboot by issuing a reboot command (and for some reason this has so far only happened on battery)

And now the file system got corrupted? Why?
The same crash happened before with the exact same symptoms, except the log didn’t have any XFS errors:

  1. Crash
    http://susepaste.org/36282386
  2. Crash
    http://susepaste.org/26418765

Try the following…create a new user (via YaST) and then log in to the new user account, and see how that goes.

This might also be worth a shot (for existing user)…

kbuildsycoca5 --noincremental 2> /dev/null
rm -fr ~/.cache/

Restart the X-server with CTRL+ALT+Backspace.

Hi,

thanks for your help. I will try that.

For now, it has happened a third time. This time, just a couple of minutes after I had booted into the system on battery.
Right now I’m using the machine without any issue connected to AC.

I observed Plasma gone at around 14:49~14:50. Here’s the log.

  1. Crash
    http://susepaste.org/32740458

That log has more than a dozen instances of "failed", including one Btrfs-related entry with a segfault. Maybe this has a hardware root cause. What happens if you open an IceWM session instead of Plasma?

So I tried one of your suggestions and did

kbuildsycoca5 --noincremental 2> /dev/null

and deleted the cache directory

rm -fr ~/.cache/

However, it happened again just recently:

  1. Crash at around 17:12~17:13.
    http://susepaste.org/17369202

I will now create a new user and see if it happens again.

This time I created a new user “test” and it happened again:

  1. Crash at around 18:08~18:11
    http://susepaste.org/7673741

:frowning:

Once I plug in AC again, I can happily work indefinitely without encountering any crash.

A quick look shows there is some corruption on the XFS partition, for one thing. Have you run smartctl to see if the drive is failing?

Here are the results of smartctl

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.12.14-lp150.12.19-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     CT500MX500SSD1
Serial Number:    1814E13539C9
LU WWN Device Id: 5 00a075 1e13539c9
Firmware Version: M3CR022
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 15 11:43:08 2018 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       956
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       440
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   099   099   000    Old_age   Always       -       25
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       56
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       43
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   061   033   000    Old_age   Always       -       39 (Min/Max 0/67)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   099   099   001    Old_age   Offline      -       1
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       6168532852
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       104286581
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       171746061

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 1

ATA Error Count: 0
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 0 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 ec 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  c8 00 00 00 00 00 00 00      00:00:00.000  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       956         -
# 2  Short offline       Completed without error       00%       956         -
# 3  Short offline       Completed without error       00%       950         -
# 4  Short offline       Completed without error       00%       946         -
# 5  Short offline       Completed without error       00%       942         -
# 6  Short offline       Completed without error       00%       932         -
# 7  Short offline       Completed without error       00%       927         -
# 8  Short offline       Completed without error       00%       925         -
# 9  Short offline       Completed without error       00%       918         -
#10  Extended offline    Completed without error       00%       910         -
#11  Short offline       Completed without error       00%       906         -
#12  Short offline       Completed without error       00%       896         -
#13  Short offline       Completed without error       00%       891         -
#14  Short offline       Completed without error       00%       887         -
#15  Short offline       Completed without error       00%       883         -
#16  Short offline       Completed without error       00%       881         -
#17  Short offline       Completed without error       00%       876         -
#18  Short offline       Completed without error       00%       872         -
#19  Short offline       Completed without error       00%       866         -
#20  Short offline       Completed without error       00%       859         -
#21  Short offline       Completed without error       00%       853         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



I don’t quite believe that an SSD only a couple of months old is failing, especially when everything works normally in AC mode and problems occur only when running on battery.

Did you run smartctl on AC or on battery? Maybe there is an intermittent power connection when running on battery?
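
To make it obvious which power source was active when a smartctl report was taken, something like the sketch below could be run first. It assumes the kernel exposes the AC adapter under /sys/class/power_supply with a type of "Mains" and an "online" file, which is common on laptops but not guaranteed. The function name and the optional directory argument are made up for this example.

```shell
#!/bin/bash
# Print "AC" if any mains adapter under the given sysfs directory is online.
# Otherwise print "battery", which here simply means that no online mains
# adapter was found. The directory argument exists so the function can be
# tried against a copy of the sysfs tree.
power_source() {
    local base="${1:-/sys/class/power_supply}" dev
    for dev in "$base"/*; do
        [ -r "$dev/type" ] || continue
        if [ "$(cat "$dev/type")" = "Mains" ] &&
           [ "$(cat "$dev/online" 2>/dev/null)" = "1" ]; then
            echo "AC"
            return 0
        fi
    done
    echo "battery"
}

# Example (smartctl needs root; /dev/sda is an assumption):
#   echo "power source: $(power_source)"; sudo smartctl -a /dev/sda
```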

Ran it while on AC.
How should I go about this? Could it be something to do with power management failing at some stage?

So I ran xfs_repair. Here is the result:

neon@neon:~$ xfs_repair system/home
system/home: No such file or directory
system/home: No such file or directory

fatal error -- couldn't initialize XFS library
neon@neon:~$ xfs_repair /dev/system/home
xfs_repair: cannot open /dev/system/home: Permission denied
neon@neon:~$ sudo xfs_repair /dev/system/home
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 325494347
bad CRC for inode 325494347, will rewrite
cleared inode 325494347
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

After rebooting I ran my usual rsync backup again to see if any file has been affected by xfs_repair.
Here is the result:


2018/10/18 15:01:42 [3109] >f..t...... user/.Xauthority
2018/10/18 15:01:42 [3109] >f.st...... user/.bash_history

So it seems that these files were corrupted. Any interpretation?
Oh boy, my bad experience with XFS continues. Half a year ago I had to run xfs_repair on an external SSD (of course, that was totally unrelated to the problem I’m facing right now, because I still don’t know what causes these crashes and the XFS corruption).
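
On interpreting the two rsync lines above: the leading string is rsync's itemized-changes field (format YXcstpoguax in the rsync man page). A small decoder, written here only as a sketch that covers the first five positions, could look like this. The function name is made up.

```shell
#!/bin/bash
# Decode the first five characters of an rsync itemized-changes string
# (YXcstpoguax). Later positions (permissions, owner, ...) are ignored here.
decode_itemize() {
    local s="$1" out
    case "${s:0:1}" in
        '<') out="sent" ;;
        '>') out="received" ;;
        'c') out="created/changed locally" ;;
        '.') out="not transferred" ;;
        *)   out="other" ;;
    esac
    [ "${s:1:1}" = "f" ] && out="$out, regular file"
    [ "${s:2:1}" = "c" ] && out="$out, checksum differs"
    [ "${s:3:1}" = "s" ] && out="$out, size differs"
    [ "${s:4:1}" = "t" ] && out="$out, mtime differs"
    echo "$out"
}

decode_itemize '>f..t......'
decode_itemize '>f.st......'
```

So rsync copied .Xauthority because its modification time differed, and .bash_history because both its size and modification time differed. Neither flag indicates corruption by itself; both files are typically rewritten during normal logins and shell sessions.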

The last days my PC has been working flawlessly.
Except when I just now unplugged power for maybe 6 min, it crashed again with a bunch of XFS errors (that were supposedly corrected already?)

See:

http://susepaste.org/89583557

I’m not quite sure when the system crashed exactly. It must have been around 20:26 plus/minus 3 min.

These lines I find suspicious:

T20:20:28.871481+09:00 computer org_kde_powerdevil[2372]: powerdevil:
T20:20:28.871826+09:00 computer org_kde_powerdevil[2372]: powerdevil: Can't contact ck
T20:20:28.872100+09:00 computer org_kde_powerdevil[2372]: powerdevil: We are now into activity "DEVICE2"
T20:20:28.872295+09:00 computer org_kde_powerdevil[2372]: powerdevil: () ()
T20:20:28.872464+09:00 computer org_kde_powerdevil[2372]: powerdevil: () ()
T20:20:28.872624+09:00 computer org_kde_powerdevil[2372]: powerdevil: Loading profile for unplugged AC
T20:20:28.872818+09:00 computer org_kde_powerdevil[2372]: powerdevil: Activity is not forcing a profile
T20:20:28.872994+09:00 computer org_kde_powerdevil[2372]: powerdevil:
T20:20:28.873191+09:00 computer org_kde_powerdevil[2372]: powerdevil: Loading timeouts with 120000
T20:20:28.894852+09:00 computer plasmashell[2328]: plasma-pk-updates: Is on battery: true
T20:20:29.028129+09:00 computer plasmashell[2328]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:334: Unable to assign [undefined] to int
T20:20:29.062345+09:00 computer plasmashell[2328]: message repeated 13 times: [ file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:334: Unable to assign [undefined] to int]
T20:21:29.685429+09:00 computer org_kde_powerdevil[2372]: powerdevil: Screen brightness value: 0
T20:21:59.685347+09:00 computer org_kde_powerdevil[2372]: powerdevil: Screen brightness value: 0
T20:22:24.452574+09:00 computer smartd[1531]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 58 to 62
T20:22:29.685504+09:00 computer org_kde_powerdevil[2372]: powerdevil: Screen brightness value: 0
T20:23:07.107141+09:00 computer plasmashell[2328]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:334: Unable to assign [undefined] to int
T20:24:37.418692+09:00 computer kernel: [166624.336792] XFS (dm-3): Metadata corruption detected at xfs_inode_buf_verify+0x72/0xf0 [xfs], xfs_inode block 0x1a1e9980

From powerdevil detecting battery mode to XFS corruption within 4 min?
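
For anyone comparing several of these crashes, the two timestamps can be pulled out of a saved log with a short awk filter. This is only a sketch: it assumes each log line starts with its timestamp as the first whitespace-separated field (as in the excerpts here) and that the message texts match the ones quoted in this thread. The function name is made up.

```shell
#!/bin/bash
# Print the timestamp of the most recent "unplugged AC" profile switch seen
# before the first XFS metadata corruption report, then that report's own
# timestamp. Takes the log file to scan as its only argument.
power_to_xfs() {
    awk '
        /powerdevil: Loading profile for unplugged AC/ { unplug = $1 }
        /XFS .*Metadata corruption/ {
            print "unplugged: " (unplug == "" ? "not seen" : unplug)
            print "first XFS error: " $1
            exit
        }
    ' "$1"
}

# Example (the log path is an assumption; reading it may need root):
#   power_to_xfs /var/log/messages
```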

I couldn’t find the cause of this issue, let alone solve it.
So I reinstalled Linux, but this time tried out KDE Neon using ext4. And it has been working flawlessly for the last 2 weeks, on battery and on AC.