SSD went read-only once a day, but works fine after reboot...

Hi again!

I have a Tumbleweed (TW) install on an Intel SSD:

sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.1.7-1-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel 53x and Pro 2500 Series SSDs
Device Model:     INTEL SSDSC2BW120H6
Serial Number:    aaaaaaaaaaaaaaaaaaaaa
LU WWN Device Id: 5 5cd2e4 14cabbee6
Firmware Version: RG21
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jun 14 07:45:27 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                ( 2930) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  58) minutes.
Conveyance self-test routine
recommended polling time:        (   4) minutes.
SCT capabilities:              (0x0025) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       14697h+00m+00.000s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       350
170 Available_Reservd_Space 0x0033   081   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       162
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       11
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   026   100   000    Old_age   Always       -       26 (Min/Max 13/35)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       162
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       47016
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       65535
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       40
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   081   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   083   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       47016
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       30561
249 NAND_Writes_1GiB        0x0032   100   100   000    Old_age   Always       -       32383


…which is not too old. But for the last 3-4 days, the file system has gone read-only once a day and the machine has become unresponsive.
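For anyone who wants the raw counters above in familiar units, here is a minimal Python sketch. It assumes the attribute names mean what they say (225/241/242 count 32 MiB units, 249 counts 1 GiB units). The function names are made up for illustration, and the NAND-to-host ratio is only a rough indicator, not a vendor-defined metric.

```python
# Raw values copied from the smartctl output above.
HOST_WRITES_32MIB = 47016   # attributes 225 and 241
HOST_READS_32MIB = 30561    # attribute 242
NAND_WRITES_1GIB = 32383    # attribute 249
POWER_ON_HOURS = 14697      # attribute 9


def units_32mib_to_gib(raw: int) -> float:
    """Convert a counter expressed in 32 MiB units to GiB."""
    return raw * 32 / 1024


def write_ratio(nand_gib: float, host_gib: float) -> float:
    """Rough ratio of flash writes to host writes (illustrative only)."""
    return nand_gib / host_gib


if __name__ == "__main__":
    host_w = units_32mib_to_gib(HOST_WRITES_32MIB)
    host_r = units_32mib_to_gib(HOST_READS_32MIB)
    print(f"Host writes: {host_w:.2f} GiB")
    print(f"Host reads:  {host_r:.2f} GiB")
    print(f"NAND writes: {NAND_WRITES_1GIB} GiB")
    print(f"NAND/host write ratio: {write_ratio(NAND_WRITES_1GIB, host_w):.1f}")
    print(f"Power-on time: {POWER_ON_HOURS / 24:.0f} days")
```

By this arithmetic the host has written roughly 1.4 TiB. The NAND-to-host ratio comes out high, but whether that matters for this model is not something the SMART data alone settles.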

When the system is in trouble I see in the console something like


systemd-journal ... : Failed to write entry (24 items...), ignoring: Read-only file system
print_req_error: I/O error, dev sda, sector ... flags 01
Buffer I/O error on dev sda3, logical block 0, lost sync page write
EXT4-fs (sda3): I/O error while writing super block
EXT4-fs error (device sda3): ext4_journal_check_start:61: Detected aborted journal
EXT4-fs (sda3): Remounting filesystem read-only
.....

After rebooting the system is fine for some time. I see nothing in dmesg after the reboot.
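Since dmesg only covers the current boot, the errors have to be captured while they happen (for example by copying the console text to another machine). Below is a small Python sketch that summarises such a capture. The regular expressions are assumptions based only on the message shapes quoted above; real kernel messages differ between versions, so treat it as a starting point.

```python
import re
from collections import Counter

# Patterns modelled on the console lines quoted in this post (assumption).
IO_ERROR = re.compile(r"I/O error\b.*?\bdev (\w+)")
REMOUNT_RO = re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only")


def summarize(lines):
    """Count block-layer I/O errors per device and collect the
    filesystems that ext4 remounted read-only."""
    io_errors = Counter()
    remounted = set()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
        match = REMOUNT_RO.search(line)
        if match:
            remounted.add(match.group(1))
    return io_errors, remounted


if __name__ == "__main__":
    sample = [
        "print_req_error: I/O error, dev sda, sector ... flags 01",
        "Buffer I/O error on dev sda3, logical block 0, lost sync page write",
        "EXT4-fs (sda3): I/O error while writing super block",
        "EXT4-fs (sda3): Remounting filesystem read-only",
    ]
    errors, remounted = summarize(sample)
    print(dict(errors), sorted(remounted))
```

The point of the ordering is that ext4 remounts read-only in reaction to the failed writes, so the block-layer I/O errors are the lines worth chasing first.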

Is the SSD at end of life? Or is it another problem with the ext4 file system I use on / and /home (sda3)? (The third partition is a 2.01 GB swap partition.)

Any help is highly appreciated. In the meantime I will prepare for a fresh install on a new SSD.

Has fstrim ever been configured or applied to that SSD? How full are its partitions? How much unpartitioned space does it have? Is the SATA cable connected to it old and red?

fstrim: no


df -i
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
devtmpfs        1016099    815  1015284    1% /dev
tmpfs           1018567      1  1018566    1% /dev/shm
tmpfs           1018567   1269  1017298    1% /run
tmpfs           1018567     18  1018549    1% /sys/fs/cgroup
/dev/sda2       1313280 532950   780330   41% /
/dev/sda3       5890048   8328  5881720    1% /home
tmpfs           1018567     34  1018533    1% /run/user/1000
tmpfs           1018567      5  1018562    1% /run/user/0


df -H
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.2G     0  4.2G   0% /dev
tmpfs           4.2G     0  4.2G   0% /dev/shm
tmpfs           4.2G  2.1M  4.2G   1% /run
tmpfs           4.2G     0  4.2G   0% /sys/fs/cgroup
/dev/sda2        22G   18G  2.8G  87% /
/dev/sda3        95G   13G   81G  14% /home
tmpfs           835M   25k  835M   1% /run/user/1000
tmpfs           835M     0  835M   0% /run/user/0

Unpartitioned space: zero. I could erase the Swap (never needed, have 8 GB RAM, could add 8 GB more, if needed…)

The SATA cable is yellow and relatively new (acceptable quality, with metal clips that hold the connectors in place…)

PS: I did now:


~ # fstrim -v /
/: 3.6 GiB (3840299008 bytes) trimmed
~ # fstrim -v /home
/home: 76.1 GiB (81732378624 bytes) trimmed

Which tells me what? :slight_smile:

Just out of curiosity, what’s your configuration in /etc/fstab?
As per some of the SSD optimization “tips and tricks” floating online I added noatime and discard options but I forget what they meant.

UUID=f27a3295-305c-491a-a210-eb3c73b1639d  /          ext4  noatime,acl,user_xattr,discard               0  1
UUID=a5e8ed32-bf39-4ac1-8fb9-a86a8ff0714e  /home      ext4  data=ordered,acl,user_xattr  0  2
UUID=17A6-7549                             /boot/efi  vfat  defaults                     0  0

noatime - don’t update inode access times

discard - whether ext4 should issue discard/TRIM commands to the underlying block device when blocks are freed
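To see at a glance which mounts carry which option, here is a short Python sketch that parses the fstab lines quoted above. The helper names are hypothetical; the only assumption about the format is the usual whitespace-separated fields with a comma-separated options column.

```python
FSTAB = """\
UUID=f27a3295-305c-491a-a210-eb3c73b1639d  /          ext4  noatime,acl,user_xattr,discard               0  1
UUID=a5e8ed32-bf39-4ac1-8fb9-a86a8ff0714e  /home      ext4  data=ordered,acl,user_xattr  0  2
UUID=17A6-7549                             /boot/efi  vfat  defaults                     0  0
"""


def parse_fstab(text):
    """Map each mount point to its list of mount options."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        entries[fields[1]] = fields[3].split(",")
    return entries


def mounts_with_option(entries, option):
    """Return the mount points whose option list contains `option`."""
    return sorted(mp for mp, opts in entries.items() if option in opts)


if __name__ == "__main__":
    entries = parse_fstab(FSTAB)
    print("discard:", mounts_with_option(entries, "discard"))
    print("noatime:", mounts_with_option(entries, "noatime"))
```

In this fstab only / is mounted with discard and noatime; /home has neither, so on /home any trimming would have to come from running fstrim.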

The machine froze twice yesterday evening while watching DVB-S2 with kaffeine (1.2.2, 4.14.38).

Afterwards I tried a zypper dup. Now it wants to install 3513 updates, and the download of each update takes an eternity; I see something like “1kb/s”. We now have 10 updates downloaded in about 10 min, so it will take some days to download them all.

What’s going on here? The machine was doing fine some days ago and I haven’t changed a thing in the meantime (I’m the only root on this one)…

Mirror trouble coupled with another complete rebuild of TW packages.

Many thanks for replying! Will try updating again later… :slight_smile:

Anything else I can do to increase the wellness of the SSD? I did (just for fun) another fstrim yesterday evening on /home and again it was something like 72 GB coming back as “trimmed”… Is that normal?

Most likely, yes.

Just as an example, I run fstrim weekly on / and /home (two separate SSDs) and typically see:

*** Fri, 07 Jun 2019 17:45:03 +0100 ***
/: 41.5 GiB (44572663808 bytes) trimmed
/home: 91.4 GiB (98101776384 bytes) trimmed
*** Fri, 14 Jun 2019 20:15:01 +0100 ***
/: 41.5 GiB (44572049408 bytes) trimmed
/home: 91.3 GiB (98016595968 bytes) trimmed
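One way to read these numbers: fstrim reports how much unused space it asked the drive to discard, not how much data went away, so the figure tends to sit close to the free space on the filesystem. A minimal Python sketch using the /home line from earlier in the thread follows. The regular expression is an assumption based on the `fstrim -v` lines quoted here. That ext4 does not remember trimmed regions across a reboot or remount is my understanding, not something the thread confirms.

```python
import re

# Shape of an `fstrim -v` line as quoted in this thread (assumption).
FSTRIM_LINE = re.compile(r"^(?P<mount>\S+): .*\((?P<bytes>\d+) bytes\) trimmed$")


def parse_fstrim(line):
    """Return (mount point, trimmed bytes) from one `fstrim -v` line."""
    match = FSTRIM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an fstrim -v line: {line!r}")
    return match.group("mount"), int(match.group("bytes"))


if __name__ == "__main__":
    mount, trimmed = parse_fstrim("/home: 76.1 GiB (81732378624 bytes) trimmed")
    print(f"{mount}: {trimmed / 2**30:.1f} GiB = {trimmed / 1e9:.1f} GB")
```

The roughly 81.7 GB trimmed on /home lines up with the 81G shown as available by df -H, which is why seeing a similar figure again after a reboot is not alarming in itself.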

Sorry, I’m back; the nightmare is not over. I managed to install the 3000-odd updates in the meantime, but while watching a DVB-S2 channel in kaffeine the system froze again, and on console 2 I saw that the file system was read-only again:

https://paste.opensuse.org/95262371

Any ideas what this can mean? Is the good Intel SSD damaged?

PS: the / partition is 81% full and ran out of disk space when the installation of updates started. I removed some locales and the installation went smoothly afterwards. I have now started an fstrim for /:

fstrim -v /
/: 4.5 GiB (4863619072 bytes) trimmed

So this can’t be the cause of the read-only filesystem, can it?

Why is there this upper limit of 20 GB for /? I tried in the past to get 25 GB during install, but iirc this never worked…

It appears your repos may be configured with keeppackages enabled, running you out of space with these repeated zypper dups of all installed packages. If keeppackages is enabled, try ‘zypper clean’, then checking freespace. If freespace grew significantly, try disabling keeppackages to avoid this problem repeating.

If the space is available, there is no limit on the size of the root partition. Note that it must be contiguous unused space.

fdisk -l will show a break down of the partition usage and where the partitions start and end.

20 gig is pretty thin for TW on BTRFS with snapper. Min recommended is 40 gig and more would be good.

…used space was reduced from 82% to 81% by “zypper clean”. Not really significant I guess…

Another complete system freeze with the file system read-only (after 1-2 h of kaffeine DVB-S2 streaming). After rebooting, I twice could not log in to the GUI as the user. After the third reboot (and a successful login as the user on the CLI) I managed to get the GUI back. But this is not sustainable; I will have to replace the SSD and see if that helps, unless there are other ideas.

PS: @gogal all EXT4 here… :wink:

Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: INTEL SSDSC2BW12
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x90909090

Device     Boot    Start       End   Sectors  Size Id Type
/dev/sda1           2048   4208639   4206592    2G 82 Linux swap / Solaris
/dev/sda2  *     4208640  46153727  41945088   20G 83 Linux
/dev/sda3       46153728 234440703 188286976 89.8G 83 Linux
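To check the "contiguous unused space" point against this layout, here is a small Python sketch that recomputes the partition sizes and looks for unallocated gaps. The numbers are copied from the fdisk output above; the function names are made up for illustration.

```python
SECTOR_BYTES = 512
DISK_SECTORS = 234441648

# (name, start sector, end sector), copied from the fdisk output above.
PARTITIONS = [
    ("/dev/sda1", 2048, 4208639),
    ("/dev/sda2", 4208640, 46153727),
    ("/dev/sda3", 46153728, 234440703),
]


def size_gib(start, end):
    """Size in GiB of a partition spanning sectors start..end inclusive."""
    return (end - start + 1) * SECTOR_BYTES / 2**30


def free_gaps(partitions, disk_sectors):
    """Return (first free sector, sector count) for every unallocated gap."""
    gaps = []
    cursor = 0
    for _name, start, end in sorted(partitions, key=lambda p: p[1]):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = end + 1
    if cursor < disk_sectors:
        gaps.append((cursor, disk_sectors - cursor))
    return gaps


if __name__ == "__main__":
    for name, start, end in PARTITIONS:
        print(f"{name}: {size_gib(start, end):.1f} GiB")
    # Sector 0 holds the MBR, so the leading gap is alignment padding,
    # not space a partition could grow into.
    print("gaps:", free_gaps(PARTITIONS, DISK_SECTORS))
```

The three partitions sit back to back, and the only gaps are the alignment padding at the start and a few hundred sectors at the end. So / could only grow if sda3 (or the swap partition) were shrunk or moved first.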


As root or using sudo, take a walk through the / filesystem with ncdu to see if you can find a location inappropriately gobbling space, such as /var/cache, /tmp or /var/tmp.
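If ncdu is not installed, a rough equivalent can be sketched in Python. This is only an illustration, not a replacement for ncdu: it sums apparent file sizes rather than allocated blocks, and the function names are hypothetical. It skips directories on other filesystems, so pointing it at / does not wander into /home or /proc.

```python
import os


def _device_of(path):
    """Return the device number of path, or None if it cannot be read."""
    try:
        return os.lstat(path).st_dev
    except OSError:
        return None


def tree_size(path, root_dev):
    """Sum apparent sizes of files below path, staying on one filesystem."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        # Prune subdirectories that live on another filesystem.
        dirnames[:] = [
            d for d in dirnames
            if _device_of(os.path.join(dirpath, d)) == root_dev
        ]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable
    return total


def top_consumers(root, limit=10):
    """Return [(bytes, path)] for the largest immediate subdirectories."""
    root_dev = os.lstat(root).st_dev
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False) and _device_of(entry.path) == root_dev:
            sizes.append((tree_size(entry.path, root_dev), entry.path))
    return sorted(sizes, reverse=True)[:limit]

# Example (run as root so that unreadable directories are not skipped):
#   for size, path in top_consumers("/var"):
#       print(f"{size / 2**20:10.1f} MiB  {path}")
```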

I use filelight for something like this, but nothing looks strange in / … :-/

The SSD went read-only overnight while just sitting around doing nothing, with all users logged out. What’s going on here? :frowning:

Probably worth a check of the SATA and power cables to the drive to ensure they are fully seated.

A recent post by @montana_suse_user (https://forums.opensuse.org/showthread.php/536271-Loading-meta-information-failed) also had an SSD going read only. In that instance it was a faulty drive with the fault occurring as the drive warmed.

I think you may be looking at a new drive :frowning:

What I tried in the meantime:

  1. Took out the SSD and ran an extended SMART test. Came back without errors. Temp according to SMART was never above 39°C, which is plausible in the case/location of this Dell Precision T7500.

  2. I changed the whole RAM.

  3. I downloaded the current TW netinstaller 3 times and wrote it to two different USB sticks. But when trying to install to a new SSD (Intel 535 120 GB, same as the old one) with /home copied over from the old SSD, I never got beyond the first few steps before the system froze.

  4. I ran an extended built-in hardware test on the workstation, without finding any errors.

  5. On the very same machine I booted my TW installed on a USB stick (updated this morning), which is doing fine, as I am writing this post on that machine now.

I’m somewhat lost at this point…

Hi
Have you seen this?
SSD is in Locked Read-Only Mode