BtrFS has gone ReadOnly... again.

Hello

For direct background… https://forums.opensuse.org/showthread.php/527372-20170924-to-20170925-dup-failed

I thought everything was hunky-dory again now [ie, ok]. Since this afternoon’s Snapper Rollback, my TW Tower had been running well again. It had only been idling this evening, whilst i was off doing other stuff away from it. I’d not installed or removed any software, nor attempted any Repo-associated activity, since the successful rollback.

It’s now my 23:45, & ~30’ ago i just returned to it “briefly” to do something quick, when to my shock i realised it was not working properly, again… eg:

  1. Launch YaST:
     Configuration file "/root/.config/y2controlcenterrc" not writable.
     Please contact your system administrator.
  2. Then launch YaST2:
     Configuration file "/root/.config/rubyrc" not writable.
     Please contact your system administrator.
  3. Attempt to refresh repos in Konsole:
     gooeygirl@linux-Tower:~> sudo zypper refresh
     sudo: unable to open /var/lib/sudo/ts/gooeygirl: Read-only file system

I examined KSystemLog, & found that tonight [when i was not at pc]:


Friday, 29 September 2017 22:30:30 AEST    kernel    BTRFS info (device sda2): forced readonly

Oh noooo, that’s also what happened just so very recently, per the linked thread above. Here’s every log entry from ~an hour before it occurred tonight: https://pastebin.com/cZcaS4pN
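To pull only this kind of event out of a long log dump like the one on pastebin, a small filter can help. This is a minimal sketch, assuming bash and grep. The function name find_btrfs_trouble is hypothetical and not part of any openSUSE tool. It only matches lines that mention btrfs together with "forced readonly", "error" or "corrupt", so it may miss related messages worded differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an openSUSE tool): keep only kernel log lines
# read on stdin that mention btrfs plus a forced read-only switch, an
# error, or corruption. Case-insensitive, so "BTRFS" and "btrfs" both match.
find_btrfs_trouble() {
  grep -iE 'btrfs.*(forced readonly|error|corrupt)'
}

# Usage: kernel messages from the current boot (journalctl -k implies -b);
# use "journalctl -k -b -1" instead to read the previous boot after a reboot.
#   sudo journalctl -k | find_btrfs_trouble
```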

Sigh, i went to bed last night with a broken system, fixed it today, now going to bed again with it broken once more. Thanks universe. All comments & advice would be welcome, pls.

PS: Now 30/9 00:15 as i Send this post.

Hi
Any coredumps? Maybe start running things as real root (su - ) rather than sudo?


coredumpctl list

Thanks Malcolm

I was off to bed, but thought of one more thing i wanted to check, so wandered back in & saw your post. I don’t know if i did this correctly, & if so, whether anything herein is significant:


gooeygirl@linux-Tower:~> coredumpctl list
Hint: You are currently not seeing messages from other users and the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
No coredumps found.


gooeygirl@linux-Tower:~> sudo coredumpctl list
sudo: unable to open /var/lib/sudo/ts/gooeygirl: Read-only file system
[sudo] password for root: 
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2017-08-23 00:56:45 AEST  30142  1000   100  31 none      /usr/bin/pulseaudio
Wed 2017-08-23 01:02:11 AEST   2109  1000   100  11 none      /usr/bin/plasmashell
Wed 2017-08-23 17:28:59 AEST  14492  1000   100  31 none      /usr/bin/pulseaudio
Wed 2017-08-23 17:57:34 AEST  23655  1000   100  31 none      /usr/bin/pulseaudio
Wed 2017-08-23 19:57:20 AEST   2890  1000   100   6 none      /usr/bin/kactivitymanagerd
Wed 2017-08-23 19:57:20 AEST  30954     0     0   6 none      /usr/bin/kactivitymanagerd
Wed 2017-08-23 19:57:20 AEST   2799  1000   100  11 none      /usr/bin/ksmserver
Wed 2017-08-23 19:57:30 AEST   3658  1000   100   6 none      /opt/teamviewer/tv_bin/TVGuiSlave.64
Wed 2017-08-23 20:02:00 AEST   4909  1000   100   5 none      /usr/bin/clementine
Wed 2017-08-23 20:06:57 AEST   8422     0     0  11 none      /usr/bin/ruby
Thu 2017-08-24 00:07:03 AEST   3747  1000   100   5 none      /usr/bin/python2.7
Thu 2017-08-24 08:27:58 AEST   6129  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Thu 2017-08-24 12:37:42 AEST  21314  1000   100  11 none      /opt/kingsoft/wps-office/office6/et
Thu 2017-08-24 23:35:32 AEST   2786  1000   100  11 none      /usr/bin/plasmashell
Fri 2017-08-25 12:34:53 AEST  15867  1000   100  31 none      /usr/bin/pulseaudio
Fri 2017-08-25 15:41:41 AEST   6677  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-08-28 09:29:07 AEST   8761  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-08-28 09:29:11 AEST   2720  1000   100  11 none      /usr/bin/plasmashell
Mon 2017-08-28 20:28:44 AEST   4583  1000   100  31 none      /usr/bin/pulseaudio
Mon 2017-08-28 20:31:40 AEST   4792  1000   100  31 none      /usr/lib64/qt5/libexec/QtWebEngineProcess
Mon 2017-08-28 20:31:40 AEST   4786  1000   100   6 none      /usr/bin/akregator
Mon 2017-08-28 20:42:36 AEST   5415  1000   100  31 none      /usr/bin/pulseaudio
Tue 2017-08-29 16:53:46 AEST  10636  1000   100   6 none      /usr/bin/firejail
Thu 2017-08-31 13:23:14 AEST   3469  1000   100  31 none      /usr/bin/dosemu.bin
Thu 2017-08-31 17:23:12 AEST  10202  1001   100  11 missing   /usr/bin/plasmashell
Thu 2017-08-31 17:25:03 AEST  10268  1001   100   6 missing   /usr/bin/kactivitymanagerd
Fri 2017-09-01 09:50:15 AEST   8762  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-02 21:42:38 AEST   4577  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-02 23:56:04 AEST   3363  1000   100   6 none      /usr/bin/cairo-dock
Sun 2017-09-03 09:18:23 AEST   4709  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-03 11:18:48 AEST   4698  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-03 12:12:26 AEST   4695  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin                                                                  
Sun 2017-09-03 17:10:34 AEST  12594  1000   100  31 none      /usr/bin/pulseaudio                                                                                
Mon 2017-09-04 13:07:23 AEST    925  1000   100  31 none      /usr/bin/pulseaudio                                                                                
Mon 2017-09-04 13:48:59 AEST   2411  1000   100  31 none      /Seagate/4. Software/Security/KeePassX/KeePassXC-2.2.0-x86_64.AppImage                             
Mon 2017-09-04 13:50:06 AEST   2459  1000   100  31 none      /Seagate/4. Software/Security/KeePassX/KeePassXC-2.2.0-x86_64.AppImage                             
Mon 2017-09-04 14:00:29 AEST   2887  1000   100  31 none      /home/gooeygirl/Downloads/AppImages active/KeePassXC-2.2.0-x86_64.AppImage                           
Mon 2017-09-04 17:11:22 AEST   8378  1000   100  31 none      /home/gooeygirl/Downloads/KeePassXC-2.2.0-x86_64.AppImage                                            
Mon 2017-09-04 18:05:32 AEST  10075  1000   100  31 none      /home/gooeygirl/Downloads/KeePassXC-2.2.0-x86_64.AppImage                                            
Mon 2017-09-04 18:42:29 AEST  11530  1000   100   5 none      /opt/vivaldi/vivaldi-bin                                                                           
Tue 2017-09-05 10:00:56 AEST   4710  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin                                                                  
Wed 2017-09-06 07:38:36 AEST   4167  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Thu 2017-09-07 12:07:36 AEST  26023  1000   100  31 none      /usr/bin/pulseaudio
Thu 2017-09-07 12:15:04 AEST  26477  1000   100  31 none      /usr/bin/pulseaudio
Thu 2017-09-07 12:53:52 AEST  30188  1000   100  31 none      /usr/bin/pulseaudio
Fri 2017-09-08 12:35:07 AEST   2759  1000   100  11 none      /usr/bin/plasmashell
Fri 2017-09-08 13:21:05 AEST   6265  1001   100   6 missing   /usr/bin/kactivitymanagerd
Fri 2017-09-08 14:32:21 AEST   2735  1000   100  11 none      /usr/bin/ksmserver
Fri 2017-09-08 14:32:21 AEST   2825  1000   100   6 none      /usr/bin/kactivitymanagerd
Sat 2017-09-09 09:01:19 AEST  11393  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-09-11 23:04:05 AEST   3865  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-09-11 23:54:26 AEST   3868  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Tue 2017-09-12 11:22:47 AEST   3779  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Wed 2017-09-13 08:48:37 AEST   4346  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Wed 2017-09-13 09:01:59 AEST   4347  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Thu 2017-09-14 12:38:00 AEST  23307  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Thu 2017-09-14 18:59:59 AEST  14936  1000   100  31 none      /usr/bin/pulseaudio
Fri 2017-09-15 14:19:07 AEST   4165  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-15 14:19:13 AEST   3883  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-16 08:09:09 AEST   4493  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-16 19:43:09 AEST   4018  1001   100  11 missing   /usr/bin/plasmashell
Sat 2017-09-16 20:49:46 AEST  20975  1001   100   5 missing   /usr/bin/python2.7
Sat 2017-09-16 20:49:46 AEST   4075  1001   100   6 missing   /usr/bin/kactivitymanagerd
Sun 2017-09-17 06:53:17 AEST  24691  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-17 07:01:34 AEST  24696  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-17 07:18:05 AEST  24654  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-17 14:25:22 AEST  31118  1000   100  31 none      /usr/bin/pulseaudio
Mon 2017-09-18 09:04:40 AEST  24651  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-09-18 16:32:09 AEST   2745  1000   100  11 none      /usr/bin/plasmashell
Tue 2017-09-19 07:20:05 AEST   5871  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Tue 2017-09-19 18:53:21 AEST  16877  1000   100  31 none      /usr/bin/pulseaudio
Wed 2017-09-20 08:39:39 AEST  28532  1000   100   5 none      /opt/vivaldi-snapshot/vivaldi-bin
Wed 2017-09-20 14:33:00 AEST  15211  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Wed 2017-09-20 16:42:50 AEST   3125  1000   100   5 none      /usr/bin/clementine
Fri 2017-09-22 09:36:10 AEST    980  1000   100  31 none      /usr/lib64/thunderbird/thunderbird-bin
Fri 2017-09-22 09:43:43 AEST  31728  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 13:42:28 AEST  31482  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 15:41:29 AEST  14633  1000   100   6 none      /usr/bin/firejail
Fri 2017-09-22 19:01:15 AEST  21000  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:01:16 AEST  21011  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:01:33 AEST  21026  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:01:34 AEST  21035  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:02:59 AEST  21934  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:02:59 AEST  21945  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:03:23 AEST  21981  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:03:23 AEST  21987  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:06:09 AEST  22173  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:06:09 AEST  22179  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:06:32 AEST  22205  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:32:20 AEST  25887  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:32:21 AEST  25895  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:32:38 AEST  25932  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Fri 2017-09-22 19:32:39 AEST  25941  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-23 07:53:20 AEST   6215  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sat 2017-09-23 11:14:38 AEST  16373  1000   100  31 none      /usr/bin/pulseaudio
Sat 2017-09-23 13:07:01 AEST   8205  1000   100  11 none      /usr/bin/perl
Sat 2017-09-23 13:46:57 AEST  23382  1000   100   6 none      /usr/bin/firejail
Sun 2017-09-24 08:42:31 AEST   5755  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Sun 2017-09-24 15:06:43 AEST  13254     0     0   6 none      /usr/bin/ruby
Sun 2017-09-24 16:48:02 AEST  17226     4     7  11 missing   /usr/local/Brother/cupswrapper/brcupsconfig3
Sun 2017-09-24 17:23:54 AEST  20427     0     0   6 none      /usr/bin/ruby
Sun 2017-09-24 17:28:58 AEST  23762  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 17:30:58 AEST  23846  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 20:01:49 AEST   4045  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 20:02:55 AEST   4074  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 20:07:53 AEST   4093  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 20:08:05 AEST   4138  1000   100  31 none      /usr/bin/pulseaudio
Sun 2017-09-24 20:22:58 AEST   5938  1000   100  31 none      /usr/bin/pulseaudio
Mon 2017-09-25 10:52:06 AEST   4620  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-09-25 12:41:47 AEST   4629  1000   100  11 none      /opt/vivaldi-snapshot/vivaldi-bin
Mon 2017-09-25 19:34:07 AEST  18031  1000   100  31 none      /usr/bin/pulseaudio
Mon 2017-09-25 19:35:23 AEST  18083  1000   100  31 none      /usr/bin/pulseaudio
Tue 2017-09-26 12:06:19 AEST   3797  1000   100  31 none      /usr/bin/pulseaudio
Tue 2017-09-26 13:04:45 AEST  11002  1000   100  31 none      /usr/bin/pulseaudio
Tue 2017-09-26 13:44:10 AEST   3532  1001   100  31 missing   /usr/bin/pulseaudio
Tue 2017-09-26 13:44:22 AEST   3578  1001   100  31 missing   /usr/bin/pulseaudio
Tue 2017-09-26 19:47:05 AEST   3760  1000   100  11 none      /usr/bin/plasmashell
Tue 2017-09-26 19:54:30 AEST  18830  1000   100  11 none      /usr/bin/plasmashell
Thu 2017-09-28 16:20:19 AEST  15649  1000   100  31 none      /usr/bin/pulseaudio
Thu 2017-09-28 16:22:11 AEST  15794  1000   100  31 none      /usr/bin/pulseaudio
Thu 2017-09-28 17:05:16 AEST  16740  1000   100  31 none      /usr/bin/pulseaudio
Fri 2017-09-29 12:09:09 AEST   3965  1000   100   5 none      /usr/bin/python2.7



PS: I thought it was regarded as Sehr Verboten [strictly forbidden] to run stuff as real root…?

This is that other thing i wanted to check. Way back when i first installed TW, with the Ruby Installer i set the No Access Time flag on my partitions [it was a legacy habit of mine from pre-oS, supposedly to enhance the SSD life]. If i recall [& i’m just about falling down tired, so this might be wrong], way back you [or maybe Henk?] queried me on why i’d done that, & mentioned that with modern SSDs it was redundant. Back then i decided that “redundant” was not the same as “bad”, so i left it as-is, then forgot about it. Some time later [but still a fair while back now], i think i read somewhere that using noatime with BtrFS might actually make BtrFS less reliable, & possibly result in data corruption/loss [again, i’m tired, that might be hogwash].

So the question is, wrt this now twice-occurred BtrFS resetting to ReadOnly… might noatime be a factor, & should i now remove all of those entries from my fstab?


UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c / btrfs noatime 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /boot/grub2/i386-pc btrfs noatime,subvol=@/boot/grub2/i386-pc 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /boot/grub2/x86_64-efi btrfs noatime,subvol=@/boot/grub2/x86_64-efi 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /opt btrfs noatime,subvol=@/opt 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /srv btrfs noatime,subvol=@/srv 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /usr/local btrfs noatime,subvol=@/usr/local 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/cache btrfs noatime,subvol=@/var/cache 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/crash btrfs noatime,subvol=@/var/crash 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/libvirt/images btrfs noatime,subvol=@/var/lib/libvirt/images 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/machines btrfs noatime,subvol=@/var/lib/machines 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/mailman btrfs noatime,subvol=@/var/lib/mailman 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/mariadb btrfs noatime,subvol=@/var/lib/mariadb 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/mysql btrfs noatime,subvol=@/var/lib/mysql 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/named btrfs noatime,subvol=@/var/lib/named 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/lib/pgsql btrfs noatime,subvol=@/var/lib/pgsql 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/log btrfs noatime,subvol=@/var/log 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/opt btrfs noatime,subvol=@/var/opt 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/spool btrfs noatime,subvol=@/var/spool 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /var/tmp btrfs noatime,subvol=@/var/tmp 0 0
/dev/mapper/cr_ata-ST2000DM001-1ER164_Z8E001EQ-part1 swap swap defaults 0 0
UUID=f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c /.snapshots btrfs noatime,subvol=@/.snapshots 0 0
UUID=23927deb-03f4-4b7b-9599-9e44c9f86919 /Seagate             ext4       noatime,acl           1 2
UUID=f4ae6a8c-a541-4bbd-b086-6358872a3962 /SeagateSpare        xfs        noatime               1 2
UUID=DFF3-1AD7       /boot/efi            vfat       umask=0002,utf8=true  0 0
/dev/mapper/cr_ata-Samsung_SSD_850_EVO_250GB_S21MNSAG105383J-part3 /home                xfs        noatime,nofail        0 2
tmpfs                /tmp                 tmpfs      size=2G               0 0



Finally, is it safe to reboot now in this ReadOnly status? Will that reset with the reboot, or is it now “stuck” like that?

I’ll read your anticipated [thanks] response after i wake up again.

Hi
Quite a few coredumps…

You need to inspect with coredumpctl gdb <PID>

Could be noatime-related. I’ve used the defaults for a while now, and also run the bfq scheduler on Tumbleweed (with an SSD); the system is rock solid.
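Before running coredumpctl gdb <PID> on dozens of entries, it may help to see which programs crash most often. This is a minimal sketch, assuming bash, awk and sort, and assuming the column layout of the list above (day, date, time, zone, PID, UID, GID, SIG, COREFILE, EXE). The function name is hypothetical. Note that COREFILE "none" generally means no core file was kept, so gdb may have nothing to load for those entries; coredumpctl info <PID> still shows what was logged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count crashes per executable from `coredumpctl list`
# output read on stdin, most frequent first.
# Assumed columns: DAY DATE TIME TZ PID UID GID SIG COREFILE EXE...
summarize_coredumps() {
  awk 'NF >= 10 && $5 ~ /^[0-9]+$/ {        # skip header and "Hint:" lines
         exe = $10
         for (i = 11; i <= NF; i++) exe = exe " " $i   # EXE paths may contain spaces
         count[exe]++
       }
       END { for (e in count) printf "%d\t%s\n", count[e], e }' |
    sort -rn
}

# Usage (as root, so other users' and system dumps are visible):
#   coredumpctl list --no-pager | summarize_coredumps
# then inspect one PID from the busiest program with:  coredumpctl gdb <PID>
```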

Hi
So if you run the mount command, what options are shown? I guess it would reboot, but maybe back to RO… I would be tempted to remove your noatime from btrfs mounts.
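If you do decide to drop noatime from the btrfs lines only, generating a candidate fstab and diffing it is safer than hand-editing 20 lines. This is a minimal sketch, assuming bash and awk; the function name is hypothetical. Two caveats: /etc/fstab lives on the read-only root, so this can only be applied after a reboot or from a rescue system, and without noatime the kernel falls back to its default (relatime). Edited btrfs lines are re-joined with single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an fstab read on stdin with "noatime" removed
# from btrfs entries only; comments and other filesystems pass through as-is.
# If noatime was the only option, "defaults" is written instead.
strip_btrfs_noatime() {
  awk '$1 !~ /^#/ && $3 == "btrfs" {
         n = split($4, opts, ",")
         keep = ""
         for (i = 1; i <= n; i++)
           if (opts[i] != "noatime") keep = (keep == "" ? opts[i] : keep "," opts[i])
         $4 = (keep == "" ? "defaults" : keep)
       }
       { print }'
}

# Usage: write a candidate file, compare it, and only then install it by hand.
#   strip_btrfs_noatime < /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```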

Things may well have changed since these two articles, which are approx 5 years old, but they advocate the use of noatime as the default on btrfs.

https://lwn.net/Articles/499293/
https://lwn.net/Articles/499294/

And if one scrolls down to almost the last paragraph on this much more recent (2017) wiki on the kernel.org site it also suggests using noatime.

https://btrfs.wiki.kernel.org/index.php/Mount_options

So perhaps the OP’s problem isn’t noatime related and lies elsewhere.

Let’s face it - your filesystem is corrupted. It may work for some time until it hits a bad spot again and screams. You can try a btrfs repair, but honestly, if your goal is to get a working system back - recreate the filesystem and restore from backup.
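Before any btrfs repair, the generally recommended first step is a read-only check, run from a live or rescue system so the filesystem is not mounted; btrfs check --repair writes to the disk and is usually treated as a last resort after a backup. This is a minimal sketch, assuming bash and btrfs-progs on that rescue system. The wrapper name and its MOUNTS variable are hypothetical, and the device would be the one named in the kernel log line above (sda2).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a read-only btrfs check, but refuse if the device
# is listed in the mount table (checking a mounted filesystem is unsafe).
# MOUNTS defaults to /proc/mounts; it is overridable only so this can be tested.
safe_btrfs_check() {
  local dev="$1" mounts="${MOUNTS:-/proc/mounts}"
  if [ -z "$dev" ]; then
    echo "usage: safe_btrfs_check /dev/sdXN" >&2
    return 2
  fi
  # awk exits 0 if the device appears in the first column of the mount table.
  if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
    echo "refusing: $dev is mounted; boot a live/rescue system first" >&2
    return 1
  fi
  # --readonly reports problems without writing anything to the device.
  btrfs check --readonly "$dev"
}

# Usage (from a live/rescue system, as root):
#   safe_btrfs_check /dev/sda2
```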

Your system is corrupted and now unreliable. You need to start over, it’s too far gone.

I would suggest reinstalling a temporary Linux or Windows, and doing some stress testing first for a couple of days with Prime95, IBT, compile loops, AIDA64, MemTest, etc. See if you can catch any cpu/mem hardware glitches.

Also, try to verify that your ssd / hdd is not on the way out.

Thanks for all replies, rather depressing though the tone is.

Qualifier before i ask my undoubtedly naive question below… I am not disputing anyone’s suggestion that my system is apparently beyond repair [though i still can’t grasp how such a calamity could have arisen], but it seems i still haven’t understood something basic to the whole Tumbleweed zypper dup concept.

Let’s face it - your filesystem is corrupted. It may work for some time until it hits bad spot again when it screams

Your system is corrupted and now unreliable. You need to start over, it’s too far gone

I had thought that the whole point of TW being a rolling release, & the approved way to “update” it being actually to “upgrade” it via zypper dup, was that each time a virtually brand new OS is installed, replacing the previous snapshot. Obviously that is patently untrue for those occasional quite small dups that only contain a few packages, but conversely my experience frequently has been that many hundreds & indeed more than one thousand packages are involved, often comprising one or a few GB.

Thus, even if my Tower incomprehensibly somehow managed to irreparably screw up its current snapshot, wouldn’t / shouldn’t / doesn’t the next “big” dup automagically take care of that problem by overwriting all the previous system files, thus making my TW “good” again?

Furthermore, given i use BtrFS, & the other day [per my linked thread] i did an apparently successful Snapper Rollback to a snapshot prior to “all hell breaking loose”, shouldn’t that have ipso facto given me a “good” system once again?

As you can see, i’m clearly badly misunderstanding the situation.

Now, moving onto the specifics of what i need to do, despite me still not understanding the why,

recreate filesystem and restore from backup

Sorry, but what does this actually mean?

  1. How do i recreate my filesystem; is that just some shorthand way of saying i need to do a fresh installation of TW from media?
  2. What do i restore from backup? If this refers to all my docs & data, then of course that’s fine & easy. But is it referring to “my” stuff, or to “system” stuff?

Regarding:

a temporary Linux …, and doing some stress testing first for a couple days with Prime95, IBT, compile loops, AIDA64, MemTest, etc. See if you can catch any cpu/mem hardware glitches

Thank you. Later i shall DDG those things to try to decide what to do.

With respect to my earlier question about noatime, in addition to Paul’s interesting advice, i’m thinking for now that my query on that might be a red herring [at least in terms of my current problem], because i took the same decision on my Lappy [this thread is for my Tower], on which i installed TW a few weeks before my Tower a few months ago, yet Lappy continues to be fine [touch wood].

As i write this [on Tower], Lappy is also on my desk near me, & is doing its weekly zypper dup… from 20170913 to 20170928. It has just frozen at this stage [it will not respond to the “r” option, it will not launch Dolphin or TeamViewer (hence this photo rather than copied text in a codebox) – note that Lappy was working just fine prior to initiating this dup]:

https://paste.opensuse.org/images/70594744.jpg

I do not know if it was the same error, but seeing this now on Lappy has reminded me that, during a dup within the timeframe of my recent Tower dramas, it also threw a similar-looking error [ie, red font]. I can’t recall what option i chose then, but i’m now wondering if:

  1. That might have led to my Tower’s TW pain?
  2. Is my Lappy TW now about to be broken as well?

If the upshot of all this is that yes i have to do a clean install of an OS into my root partition, then… if i stay with oS… I need to decide [again] on TW or Leap, AND [for root] on BtrFS or Ext4. I have no idea if BtrFS is causally implicated in my troubles. If i decide to reinstall TW, & retain BtrFS, & later suffer the same problems, i would then feel foolish that i used BtrFS again. Conversely, if i reinstall TW and then use ext4 for root, i am throwing away the substantial safety-net that is Snapper Rollbacks. Using a cutting-edge rolling release like TW without Snapper… would that be brave or stupid of me? Or, if i decide that despite my great liking for TW i can’t tolerate this ongoing stress & time-stealing it’s causing me, & so install Leap instead, then i am accepting an old Plasma version, old pgms & an old kernel [though maybe that’s least important to me?]. I really strongly like Plasma 5.10.x over 5.8.x, so that would be a very sad outcome for me.

This is all really hard.

Hi
It could be so many things… Considering the SUSE developers are the top committers to btrfs, one would think they know their mount settings… Hardware, SSD, RAM, Southern Hemisphere, dream time, who knows? Considering you have one system that works fine, it sort of leans toward your hardware setup…?

You need to take small steps, stick with whichever filesystem, install the default stuff from the default repos only, no additional repositories and run that through a few update cycles over a week and see how it goes.

Footnote re Lappy [reminder, other than this temporary detour, this thread pertains to Tower, not Lappy]:

Indeed, it was broken, many normal functions stopped working, even trying to log out or reboot were ignored [from the desktop]. Toggling to TTY2 failed, but TTY3 worked, from where i was able to REISUB. Upon boot, as i suspected, could not log back into my desktop. I spat the dummy, rebooted, opted into Snapper in grub menu, booted into my erstwhile TW snapshot of 20170913 as per immediately before this morn’s self-aborted dup, verified it still worked ok, completed the rollback, did normal reboot, successfully logged back into my good desktop… & because i am apparently an idiot or masochist, am now part way through retrying the zypper dup. :stuck_out_tongue:

So:

  1. Bad
    = TW is tormenting me not only on Tower, but today [for first time] also on Lappy. Not confidence-inspiring! 1. Good
    = BtrFS + Snapper Rollbacks are simply magnificent [but is BtrFS [i]causing me these problems, or purely rescuing me from them?] !!

Wrt my ongoing Tower problem, on a purely probabilistic basis i am now disinclined to suspect my HW as root-cause, given i think the probability of me simultaneously having Lappy HW faults is too implausible. I thus suspect either BtrFS, or recent TW snapshots, as potential causes. The least-worst next step for me, re Tower, is still causing me to scratch my head.

Good call - of all the things i’d been suspecting, i forgot to include that sneaky Coriolis Force. Physics; gets ya every time!!

Not necessarily, per my post made during your post…

Hmmm, ok, but what should i do NOW? Is a new installation still unavoidable, & strongly advised?

IMO

The file system exists as a layer above your hardware (ie disk blocks) and below your files.
Since a “zypper dup” only replaces files, if you have file system corruption, you’re not addressing it.

I didn’t notice whether you tried a “btrfs repair” as was suggested.
But, before you do <anything> you should make sure you copy all your important files to external storage (or make a backup).
And, immediately after that, you should install smartmontools and get a health check on your disk. Failing disks are a common cause of file system failure.
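A quick first pass with smartmontools is to look only at each drive's overall verdict, e.g. for /dev/sda and /dev/sdb, the two disks in the mount output later in the thread. This is a minimal sketch, assuming bash; the helper name is hypothetical, and the two strings it looks for are the usual ATA and SCSI wording of smartctl -H, which may differ between versions. A PASSED verdict does not prove a disk is healthy, so the full attribute table from smartctl -a is still worth reading.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -H` output on stdin and report whether
# the drive's own overall-health self-assessment passed.
smart_verdict() {
  if grep -qE 'self-assessment test result: PASSED|SMART Health Status: OK'; then
    echo "healthy (by SMART's own verdict)"
  else
    echo "check further: SMART did not report PASSED/OK"
    return 1
  fi
}

# Usage:
#   sudo smartctl -H /dev/sda | smart_verdict
#   sudo smartctl -a /dev/sda      # full attribute table and error log
```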

After the above,
Then sit down, have a drink and think about the info you’ve collected.
If the disk is going bad, then it’s clear you need to replace.
If the disk is in good health, then you may need to backup or at least detail your system, then maybe re-partition but definitely reformat, then re-install or restore from backup.
Since you’re working on a SSD, don’t forget to manually clear your traps.

HTH,
TSU

Guys, signal 9 is the kill signal (SIGKILL)… either OP is messing with us, or they are running out of memory.

OP, start fresh with ext4.
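The SIG column in the Tower's coredumpctl list earlier in this thread holds numbers, and 9 does not appear there (the values are 5, 6, 11 and 31). Bash's builtin kill -l turns a signal number into its name for the machine it runs on. This is a minimal sketch, assuming bash and the same column layout as that list; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally the SIG column (field 8) of `coredumpctl list`
# output read on stdin and name each signal with bash's builtin `kill -l`.
name_signals() {
  awk 'NF >= 10 && $5 ~ /^[0-9]+$/ { print $8 }' |   # data rows only
    sort -n | uniq -c |
    while read -r count sig; do
      printf '%s\tSIG%s (%s)\n' "$count" "$(kill -l "$sig")" "$sig"
    done
}

# Usage:
#   sudo coredumpctl list --no-pager | name_signals
```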

I’ve still not tried a reboot yet, as i’m still trying to understand all the replies received here, & doing lots of research to try to understand associated concepts & methods. Am now working my way back down this thread to provide replies i’ve not earlier done.

Re mount:


gooeygirl@linux-Tower:~> mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=16165732k,nr_inodes=4041433,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
/dev/sda2 on / type btrfs (ro,noatime,ssd,space_cache,subvolid=642,subvol=/@/.snapshots/318/snapshot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1739)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
tmpfs on /tmp type tmpfs (rw,relatime,size=2097152k)
/var/lib/snapd/snaps/keepassxc_23.snap on /snap/keepassxc/23 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/core_2898.snap on /snap/core/2898 type squashfs (ro,nodev,relatime)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro)
/dev/sda2 on /opt type btrfs (ro,noatime,ssd,space_cache,subvolid=262,subvol=/@/opt)
/dev/sda2 on /.snapshots type btrfs (ro,noatime,ssd,space_cache,subvolid=258,subvol=/@/.snapshots)
/dev/sda2 on /var/cache type btrfs (ro,noatime,ssd,space_cache,subvolid=265,subvol=/@/var/cache)
/dev/sda2 on /var/lib/machines type btrfs (ro,noatime,ssd,space_cache,subvolid=268,subvol=/@/var/lib/machines)
/dev/sda2 on /srv type btrfs (ro,noatime,ssd,space_cache,subvolid=263,subvol=/@/srv)
/dev/sda2 on /var/log type btrfs (ro,noatime,ssd,space_cache,subvolid=274,subvol=/@/var/log)
/dev/sda2 on /var/lib/pgsql type btrfs (ro,noatime,ssd,space_cache,subvolid=273,subvol=/@/var/lib/pgsql)
/dev/sdb3 on /Seagate type ext4 (rw,noatime,data=ordered)
/dev/sda2 on /boot/grub2/i386-pc type btrfs (ro,noatime,ssd,space_cache,subvolid=260,subvol=/@/boot/grub2/i386-pc)
/dev/sda2 on /var/lib/libvirt/images type btrfs (ro,noatime,ssd,space_cache,subvolid=267,subvol=/@/var/lib/libvirt/images)
/dev/sda2 on /var/lib/named type btrfs (ro,noatime,ssd,space_cache,subvolid=272,subvol=/@/var/lib/named)
/dev/sda2 on /var/crash type btrfs (ro,noatime,ssd,space_cache,subvolid=266,subvol=/@/var/crash)
/dev/sda2 on /var/spool type btrfs (ro,noatime,ssd,space_cache,subvolid=276,subvol=/@/var/spool)
/dev/sdb2 on /SeagateSpare type xfs (rw,noatime,attr2,inode64,noquota)
/dev/sda2 on /var/tmp type btrfs (ro,noatime,ssd,space_cache,subvolid=277,subvol=/@/var/tmp)
/dev/sda2 on /var/lib/mailman type btrfs (ro,noatime,ssd,space_cache,subvolid=269,subvol=/@/var/lib/mailman)
/dev/sda2 on /var/lib/mariadb type btrfs (ro,noatime,ssd,space_cache,subvolid=270,subvol=/@/var/lib/mariadb)
/dev/sda2 on /var/opt type btrfs (ro,noatime,ssd,space_cache,subvolid=275,subvol=/@/var/opt)
/dev/sda2 on /var/lib/mysql type btrfs (ro,noatime,ssd,space_cache,subvolid=271,subvol=/@/var/lib/mysql)
/dev/sda2 on /usr/local type btrfs (ro,noatime,ssd,space_cache,subvolid=264,subvol=/@/usr/local)
/dev/sda2 on /boot/grub2/x86_64-efi type btrfs (ro,noatime,ssd,space_cache,subvolid=261,subvol=/@/boot/grub2/x86_64-efi)
/dev/mapper/cr_ata-Samsung_SSD_850_EVO_250GB_S21MNSAG105383J-part3 on /home type xfs (rw,noatime,attr2,inode64,noquota)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=3234608k,mode=700,uid=1000,gid=100)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
gooeygirl@linux-Tower:~> 
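In a listing like the one above, the quickest tell is the first mount option: every `/dev/sda2` subvolume shows `ro`, while `/home`, `/Seagate` and `/SeagateSpare` are still `rw`. A minimal sketch of a filter that pulls the read-only entries out of `mount`-style output is below; `list_ro_mounts` is a hypothetical helper name, and it assumes mount points contain no spaces.

```bash
#!/usr/bin/env bash
# Sketch: print "DEVICE MOUNTPOINT FSTYPE" for every read-only entry in
# `mount`-style output read from stdin.
# Assumes lines of the form: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
# and mount points without spaces. list_ro_mounts is a hypothetical name.
list_ro_mounts() {
  awk '{
    opts = $NF                  # last field, e.g. "(ro,noatime,ssd,...)"
    gsub(/[()]/, "", opts)      # strip the surrounding parentheses
    n = split(opts, o, ",")
    for (i = 1; i <= n; i++) {
      # Match the bare "ro" option only, so "errors=remount-ro" is ignored.
      if (o[i] == "ro") { print $1, $3, $5; break }
    }
  }'
}
```

Usage would be `mount | list_ro_mounts`. util-linux's `findmnt -O ro` should give a similar answer without any awk.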



I assume you mean well, but pls note these:
[ol]
[li]I explicitly stated that the photo containing that “signal 9” was from my Lappy, not my Tower, & i have several times emphasised that this thread pertains to my Tower. I only mentioned the Lappy dup failure at all because it potentially cast doubt on the TW snapshot, not on the HW running it. [BTW, the end of that story was that the repeated Lappy dup, after a successful Snapper Rollback, did complete successfully to give me (on Lappy) 20170928].[/li][li]I resent the implication that i am insincere & am doing all this stuff here in this thread just for kicks, ie, to mess with you. I have better ways to get my jollies than that. What have i written, anywhere here, to make you doubt my bona fides?[/li][li]Urging me to adopt ext4 instead of btrfs implies that we know the fault is btrfs… do we?[/li][/ol]

We see only the tip of the iceberg. The messages you provided mean that the filesystem metadata is corrupted. We do not know when the corruption happened or for what reason. But with any filesystem, when you see such corruption, almost the only way out is to recreate the filesystem and restore the data. You can try asking the developers whether there are known bugs that could lead to such corruption, but that’s probably all.

You could try “btrfs check --repair”, and if it succeeds (or at least does not obviously fail), then try the rollback again. This could be faster than rebuilding your system from scratch.

See also https://btrfs.wiki.kernel.org/index.php/Btrfsck (do not take it as final wisdom; the btrfs wiki is known to be outdated).
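Following that advice, a cautious order of operations from live media would be: make sure the btrfs filesystem is not mounted, run a read-only `btrfs check` first (it reports problems without writing anything), and only reach for `--repair` if that pass reports errors and a backup exists. A minimal sketch under those assumptions is below. `btrfs_offline_check` and the `MOUNTS_FILE` override are hypothetical names of mine; `btrfs check --readonly` and `btrfs check --repair` are real btrfs-progs options.

```bash
#!/usr/bin/env bash
# Sketch only: meant to be run as root from live media.
# btrfs_offline_check and MOUNTS_FILE are hypothetical names, not btrfs-progs features.
# Usage: btrfs_offline_check DEVICE [repair]
btrfs_offline_check() {
  local dev=$1 mode=${2:-readonly}
  local mounts=${MOUNTS_FILE:-/proc/mounts}   # overridable for testing
  local rc=0

  # Refuse to run against a device that appears in the mount table.
  # (Only matches the exact device path, e.g. /dev/sda2.)
  if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
    echo "refusing: $dev is mounted" >&2
    return 1
  fi

  # Read-only pass first: reports problems, writes nothing.
  btrfs check --readonly "$dev" || rc=$?

  # Only attempt a repair if errors were reported AND the caller asked for it.
  if [ "$rc" -ne 0 ] && [ "$mode" = "repair" ]; then
    echo "read-only check reported errors; attempting --repair on $dev" >&2
    rc=0
    btrfs check --repair "$dev" || rc=$?
  fi
  return "$rc"
}
```

Leaving `repair` off gives a report only, which is a reasonable first step before deciding between repair and reinstall.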

Oh that’s a very nice explanation – many thanks!

You are correct, i have not [yet?] tried this, because:

  1. The full context of that advice from arvidjaar was:
    > You can try btrfs repair, but honestly, if your goal is to get working system back - recreate filesystem and restore from backup.

    which i interpreted as meaning that he had a low probability estimation of it helping, & that i probably should instead just grasp the nettle & reinstall… at least i think that’s what he meant… hence i asked for clarification per my earlier question:

    > How do i *recreate my filesystem*; is that just some shorthand way of saying i need to do a fresh installation of TW from media?

  2. I have DDG’d btrfs repair & have found lots of old &/or contradictory info that has further confused me. The best i’ve been able to interpret is from https://btrfs.wiki.kernel.org/index.php/Btrfsck & it is that i should [?] do this:
  • Boot into live media
  • Run `btrfs check --repair <device>`
  • Given that, from
gooeygirl@linux-Tower:~> **sudo blkid**
sudo: unable to open /var/lib/sudo/ts/gooeygirl: Read-only file system
[sudo] password for root: 
/dev/sda1: SEC_TYPE="msdos" UUID="DFF3-1AD7" TYPE="vfat" PARTLABEL="primary" PARTUUID="bde1c356-f603-4a7e-ad42-c399c35f9750"
**/dev/sda2: LABEL="root"** UUID="f3e11c85-7f1a-4e9e-8585-5b6a61e4ea8c" UUID_SUB="65ee96ef-42d9-4fe4-b96c-c57e868cc214" TYPE="btrfs" PARTLABEL="primary" PARTUUID="b56c2d68-cc5f-48e7-bbd0-2b981a6c602a"
/dev/sda3: UUID="a3869823-41e7-498d-abe7-84438db8f4af" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="d2fc8cea-af0e-446f-8a59-264cc59a1451"
/dev/sdb1: UUID="1be44f1e-8593-460e-91ab-c6132245f640" TYPE="crypto_LUKS" PARTUUID="00060756-01"
/dev/sdb2: LABEL="SeagateSpare" UUID="f4ae6a8c-a541-4bbd-b086-6358872a3962" TYPE="xfs" PARTUUID="00060756-02"
/dev/sdb3: LABEL="Seagate" UUID="23927deb-03f4-4b7b-9599-9e44c9f86919" TYPE="ext4" PARTUUID="00060756-03"
/dev/mapper/cr_ata-ST2000DM001-1ER164_Z8E001EQ-part1: UUID="18f78b3e-eb9c-442f-b25a-8975c4e927b1" TYPE="swap"
/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/mapper/cr_ata-Samsung_SSD_850_EVO_250GB_S21MNSAG105383J-part3: UUID="13867e5b-203b-4cce-ad30-a23e826a77cb" TYPE="xfs"
gooeygirl@linux-Tower:~> 
  • …my root partition is /dev/sda, i assume then that i should run `btrfs check --repair /dev/sda` from the live media???
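For reference, in the `blkid` listing above the entry carrying `TYPE="btrfs"` is the partition `/dev/sda2` (the one labelled `root`), whereas `/dev/sda` is the whole SSD, which in the same listing also holds `/dev/sda1` (vfat) and `/dev/sda3` (crypto_LUKS). A small sketch that pulls the btrfs device names out of `blkid`-style output is below; `btrfs_devices` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print the device of every blkid line whose TYPE is btrfs.
# btrfs_devices is a hypothetical helper; it reads `blkid` output on stdin.
btrfs_devices() {
  # The leading space in the pattern keeps SEC_TYPE=... from matching.
  awk '/ TYPE="btrfs"/ { dev = $1; sub(/:$/, "", dev); print dev }'
}
```

Usage would be `blkid | btrfs_devices` (as root). On a live system, `blkid -t TYPE=btrfs -o device` should print the same list directly.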




[quote="tsu2,post:14,topic:128025"]
IMO
But, before you do <anything> you should make sure you copy all your important files to external storage (or make a backup).

And, immediately after that, you should install smartmontools and get a health check on your disk. Failing disks are a common cause of file system failure.

TSU
[/quote]

Yes, all my data is backed up, other than the browser i'm using right now for this thread. Once i have finished replying & researching, i shall close it, back it up too, then proceed... albeit it's my evening now, so i might hold off on proceeding till tomorrow morning, by which time all you kind folk might have replied if anything i've written is wrong or bad.

Wrt *smartmontools*, it is already installed in TW on both PCs. Eg, here from Tower:

gooeygirl@linux-Tower:~> sudo smartctl --all /dev/sda
sudo: unable to open /var/lib/sudo/ts/gooeygirl: Read-only file system
[sudo] password for root:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.13.3-1-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO 250GB
Serial Number: S21MNSAG105383J
LU WWN Device Id: 5 002538 da00d3127
Firmware Version: EMT01B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Sep 30 18:30:30 2017 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 133) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       12309
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       888
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       90
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   073   057   000    Old_age   Always       -       27
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       25
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       45232701090

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

gooeygirl@linux-Tower:~>
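Reading the attribute table above, the counters commonly watched for media or link trouble (reallocated sectors, used reserve blocks, program/erase failures, runtime bad blocks, uncorrectable errors, CRC errors) all have a raw value of 0, and the overall self-assessment is PASSED. A small sketch that automates that eyeballing is below; `smart_watch` is a hypothetical helper, and the attribute list is my own pick, not an official or vendor-defined threshold set. A clean result only says the drive has not counted such errors; it says nothing about whether the btrfs metadata is intact.

```bash
#!/usr/bin/env bash
# Sketch: read `smartctl -A` / `smartctl --all` output on stdin and flag any
# non-zero RAW_VALUE among a hand-picked set of error counters.
# smart_watch and the attribute list are assumptions, not smartctl features.
# Exit status: 0 = all watched counters are 0, 1 = something non-zero,
# 2 = none of the watched attributes were found.
smart_watch() {
  awk '
    BEGIN {
      n = split("Reallocated_Sector_Ct,Used_Rsvd_Blk_Cnt_Tot,Program_Fail_Cnt_Total,Erase_Fail_Count_Total,Runtime_Bad_Block,Uncorrectable_Error_Cnt,CRC_Error_Count", names, ",")
      for (i = 1; i <= n; i++) watch[names[i]] = 1
    }
    # Attribute rows start with a numeric ID; the raw value is the last field.
    $1 ~ /^[0-9]+$/ && ($2 in watch) {
      seen++
      if ($NF + 0 != 0) { print "WARN " $2 "=" $NF; bad++ }
    }
    END {
      if (seen == 0) { print "no watched attributes found"; exit 2 }
      if (bad == 0) print "OK: all " seen " watched attributes are 0"
      exit (bad > 0)
    }
  '
}
```

Usage would be `smartctl -A /dev/sda | smart_watch` (as root).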


Maybe overnight i shall run a self-test.
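For an overnight run, the usual smartctl sequence is to start the extended (long) test, let the drive run it internally, and read the self-test log afterwards; the output above estimates 133 minutes for this SSD. A minimal sketch is below. `-c`, `-t long` and `-l selftest` are real smartctl options; `long_test_minutes` and `start_overnight_selftest` are hypothetical helper names, and the commands need root.

```bash
#!/usr/bin/env bash
# Sketch: start an extended (long) SMART self-test and say roughly when to
# check back. Run as root. long_test_minutes and start_overnight_selftest are
# hypothetical helpers; -c, -t long and -l selftest are real smartctl options.

# Read `smartctl -c` output on stdin; print the extended-test polling time.
long_test_minutes() {
  awk '/Extended self-test routine/ { want = 1; next }
       want && /recommended polling time/ { gsub(/[^0-9]+/, " "); print $1; exit }'
}

start_overnight_selftest() {
  local dev=$1 mins
  mins=$(smartctl -c "$dev" | long_test_minutes)
  # The drive runs the test internally; smartctl returns straight away.
  smartctl -t long "$dev"
  echo "Extended self-test should take about ${mins:-?} minutes; afterwards run: smartctl -l selftest $dev"
}
```

So `start_overnight_selftest /dev/sda` before bed, then `smartctl -l selftest /dev/sda` in the morning.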


[quote="tsu2,post:14,topic:128025"]
IMO
If the disk is in good health, then you may need to back up or at least detail your system, then maybe re-partition but definitely reformat, then re-install or restore from backup.
Since you're working on an SSD, **don't forget to manually clear your traps**.

HTH,
TSU
[/quote]

Could you pls explain that bolded part?

Thank you *tsu2*.

No, it is not.