Why did the root filesystem turn read-only during zypper dup?

I turned on my desktop and booted into the primary install … let it sit for 10 minutes, then opened a command line and executed the usual “zypper dup” (about 130-something updates).

I sat back on the couch, and a few minutes later I saw red text on the console and thought “download error” (not unusual).

Actually, all files for the update had downloaded, and about 3/4 of the way through installation the error appeared:
“unable to install … read only filesystem (root)”

I rebooted, then ran “zypper dup” again; it picked up where it left off and installed the remaining packages with no problem.

Any suggestions on next steps ?
What should I investigate ?

Take a look at the output from dmesg - I would guess that there was an I/O error with your storage device, and the system switched it to read-only as a result, but the dmesg output will give you more info if that’s the case.
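For example, something along these lines should surface any storage errors (adjust the boot offset to whichever boot the failure happened in; these invocations are just illustrative):

journalctl -k -b -1 -p warning    # kernel messages from the previous boot, warnings and up
dmesg -T --level=err,warn         # live kernel ring buffer, only useful before you reboot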

Thanks for the reply @hendersj

I don’t see anything “significant” in the logs.
The first thing I did after booting was scan the zypper.log file, so I could establish the timeframe of the “zypper dup” that ran into the install issue.

2023-05-28 21:51:08 <1> daffy(3177) [zypper] main.cc(main):125 ===== Hi, me zypper 1.14.60
2023-05-28 21:51:08 <1> daffy(3177) [zypper] main.cc(main):126 ===== 'zypper' '-vvv' 'dup' =====
...
2023-05-28 22:05:37 <1> daffy(2944) [zypper] main.cc(~Bye):98 ===== Exiting main(0) =====

So, somewhere in that 14-15 minute timeframe I should find something in the journalctl output.
Nothing jumps out at me … I searched for “read-only” and found only one instance

(journalctl -o short-precise -k -b -2)

May 28 21:49:51.353461 daffy kernel: Freeing unused decrypted memory: 2036K
May 28 21:49:51.353467 daffy kernel: Freeing unused kernel image (initmem) memory: 4084K
May 28 21:49:51.353472 daffy kernel: Write protecting the kernel read-only data: 30720k
May 28 21:49:51.353478 daffy kernel: Freeing unused kernel image (rodata/data gap) memory: 1840K
May 28 21:49:51.353483 daffy kernel: Run /init as init process
May 28 21:49:51.353488 daffy kernel:   with arguments:
May 28 21:49:51.353493 daffy kernel:     /init
May 28 21:49:51.353499 daffy kernel:   with environment:
May 28 21:49:51.353504 daffy kernel:     HOME=/
May 28 21:49:51.353511 daffy kernel:     TERM=linux
May 28 21:49:51.353516 daffy kernel:     BOOT_IMAGE=/boot/vmlinuz-6.3.2-1-default
May 28 21:49:51.353521 daffy kernel:     splash=silent

Then I did a search for “error” - again, nothing significant for that timeframe.

So back in the journalctl output, I looked at the timeframe of the zypper dup, and here it is - as you can see, there are no entries around the 21:51 timeframe - as a matter of fact, it jumps from 21:50:32 to 21:53:36.

May 28 21:50:22.163704 daffy kernel: Bluetooth: RFCOMM TTY layer initialized
May 28 21:50:22.163725 daffy kernel: Bluetooth: RFCOMM socket layer initialized
May 28 21:50:22.163737 daffy kernel: Bluetooth: RFCOMM ver 1.11
May 28 21:50:32.815753 daffy kernel: logitech-hidpp-device 0003:046D:1025.0008: HID++ 1.0 device connected.

May 28 21:53:36.451714 daffy kernel: BTRFS info (device nvme0n1p2): using crc32c (crc32c-intel) checksum algorithm
May 28 21:53:36.451789 daffy kernel: BTRFS info (device nvme0n1p2): disk space caching is enabled
May 28 21:53:36.451812 daffy kernel: BTRFS info (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 670, gen 0
May 28 21:53:36.455701 daffy kernel: BTRFS info (device nvme0n1p2): enabling ssd optimizations
May 28 21:53:36.455733 daffy kernel: BTRFS info (device nvme0n1p2): auto enabling async discard

That’s really strange. While I’d have looked at dmesg, -k on journalctl should show the same messages, so that’s fine.

I’ve never seen this kind of behavior before - maybe someone else will have an idea since the logs aren’t showing anything. If the device is SMART-capable, you might run a diagnostic on the drive that holds the data and see if it’s reporting any errors. In my experience, remounting the filesystem only happens when the kernel needs to protect the data, and that’s usually when there’s a hardware issue going on.
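For an NVMe drive, something like the following should show the drive’s self-reported health (the device names are a guess based on your log output, so adjust as needed):

smartctl -a /dev/nvme0       # smartmontools: full SMART/health report
nvme smart-log /dev/nvme0    # nvme-cli alternative, if installed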

With the root filesystem turned read-only, writing the journal to disk stops.
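If it happens again, you can confirm the mount state before rebooting, e.g.:

findmnt -o TARGET,SOURCE,OPTIONS /    # OPTIONS shows ro once the kernel has flipped the mount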

  1. You may want to check for UNALLOCATED space as follows:
erlangen:~ # btrfs filesystem usage -T /
Overall:
    Device size:                   1.77TiB
    Device allocated:            538.07GiB
    Device unallocated:            1.25TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        523.22GiB
    Free (estimated):              1.26TiB      (min: 649.68GiB)
    Free (statfs, df):             1.26TiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

                  Data      Metadata System                            
Id Path           single    DUP      DUP      Unallocated Total   Slack
-- -------------- --------- -------- -------- ----------- ------- -----
 1 /dev/nvme0n1p2 530.01GiB  8.00GiB 64.00MiB     1.25TiB 1.77TiB     -
-- -------------- --------- -------- -------- ----------- ------- -----
   Total          530.01GiB  4.00GiB 32.00MiB     1.25TiB 1.77TiB 0.00B
   Used      

With some 1.25 TiB of unallocated space, maintenance of the infamous host erlangen is virtually hassle-free.

The backup system is fine too:

erlangen:~ # btrfs filesystem usage -T /mnt
Overall:
    Device size:                  48.83GiB
    Device allocated:             37.07GiB
    Device unallocated:           11.76GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         27.96GiB
    Free (estimated):             20.05GiB      (min: 14.17GiB)
    Free (statfs, df):            20.05GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               74.75MiB      (used: 0.00B)
    Multiple profiles:                  no

                  Data     Metadata System                             
Id Path           single   DUP      DUP      Unallocated Total    Slack
-- -------------- -------- -------- -------- ----------- -------- -----
 1 /dev/nvme0n1p3 34.01GiB  3.00GiB 64.00MiB    11.76GiB 48.83GiB     -
-- -------------- -------- -------- -------- ----------- -------- -----
   Total          34.01GiB  1.50GiB 32.00MiB    11.76GiB 48.83GiB 0.00B
   Used           25.72GiB  1.12GiB 16.00KiB                           
erlangen:~ # 

The Tumbleweed backup system sits on a 50 GB partition. Some 38 GB of allocated space is much higher than expected for a default Tumbleweed installation. When in trouble, always check unallocated space first.

  2. Do some stress testing. While typing this post on host erlangen, the command stress-ng --hdd 2 --iomix 4 --vm 6 --cpu 8 runs with foreground priority, causing these load factors:
erlangen:~ # w
 07:03:36 up 27 min,  4 users,  load average: 55.88, 54.27, 43.39
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
karl     tty7     :0               06:36   26:55  26.86s  0.09s /usr/bin/startplasma-x11
karl     pts/0    :0               06:36   26:40   0.00s  1.07s /usr/bin/kded5
karl     pts/1    :0               06:36   25:52   6:26m  0.15s /bin/bash
karl     pts/2    :0               06:37    0.00s  0.47s  0.14s /bin/bash
erlangen:~ # 

Host erlangen stays fully responsive. Inexperienced users won’t even notice the high system load.

What are the load factors when running the above command on your system? Does it stay responsive?
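You can watch the numbers from a second terminal while the test runs, for example:

vmstat 5    # prints CPU, memory and I/O-wait figures every 5 seconds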

What do you mean by “primary install”?
If you mean that you booted the installation snapshot, i.e. the first one, it is completely normal for it to be read-only; you have to roll back to get full functionality.

That machine has 2 separate NVMe drives, and each drive has its own dedicated TW install.
The primary is the main/default one I use. The redundant one gets updated (zypper dup) frequently, but its main purpose is to let me boot and work from it in case of a catastrophic failure of the primary drive (while casually recovering the primary).

@aggie:

Did you accidentally enable some openSUSE MicroOS repositories?

Well, when I look at the “Enabled” repos view, they don’t show.
I do not see any patterns selected for MicroOS …
HOWEVER, if I click on a couple of different MicroOS patterns (openSUSE MicroOS and MicroOS KDE Plasma Desktop), it shows packages installed.
Is that incorrect? I don’t ever recall selecting packages from MicroOS.

(screenshots attached: repos, micro-os1, micro-os2)

Hi @karlmistelberger … thanks for the details … I guess I am not understanding “Unallocated”: I show the output here, and my Conky and “df” show about 10 GB of space available, but “Unallocated” shows 1 MiB ???

Am I in trouble here with my root partition? (I have a separate /home)
I have been doing some quick research, and the articles I read suggest using “btrfs balance” ??

====== primary drive ==
:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  30.00GiB
    Device allocated:             30.00GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         18.78GiB
    Free (estimated):             10.39GiB      (min: 10.39GiB)
    Free (statfs, df):            10.39GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               55.44MiB      (used: 0.00B)
    Multiple profiles:                  no

                  Data     Metadata  System                             
Id Path           single   single    single   Unallocated Total    Slack
-- -------------- -------- --------- -------- ----------- -------- -----
 1 /dev/nvme1n1p3 28.46GiB   1.51GiB 32.00MiB     1.00MiB 30.00GiB     -
-- -------------- -------- --------- -------- ----------- -------- -----
   Total          28.46GiB   1.51GiB 32.00MiB     1.00MiB 30.00GiB 0.00B
   Used           18.07GiB 727.78MiB 16.00KiB                           
:~ #

@karlmistelberger … well, I went ahead and ran “balance” … no real difference.
(based on what I read here: Balancing a Btrfs filesystem | Forza’s Ramblings)

:~ # btrfs balance start -dusage=6 /
Done, had to relocate 0 out of 43 chunks

:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  30.00GiB
    Device allocated:             30.00GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         18.63GiB
    Free (estimated):             10.51GiB      (min: 10.51GiB)
    Free (statfs, df):            10.51GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               55.44MiB      (used: 0.00B)
    Multiple profiles:                  no

                  Data     Metadata  System                            
Id Path           single   single    single   Unallocated Total    Slack
-- -------------- -------- --------- -------- ----------- -------- -----
 1 /dev/nvme1n1p3 28.46GiB   1.51GiB 32.00MiB     1.00MiB 30.00GiB     -
-- -------------- -------- --------- -------- ----------- -------- -----
   Total          28.46GiB   1.51GiB 32.00MiB     1.00MiB 30.00GiB 0.00B
   Used           17.95GiB 693.14MiB 16.00KiB                          
:~ #

The device size is small and your system is out of unallocated space. You may regain some allocated but unused space by running a more aggressive btrfs balance: BTRFS and free space - emergency response • Oh The Huge Manatee!

6700k:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  59.57GiB
    Device allocated:             37.05GiB
    Device unallocated:           22.52GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         29.95GiB
    Free (estimated):             28.82GiB      (min: 28.82GiB)
    Free (statfs, df):            28.82GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               85.67MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data     Metadata System                             
Id Path      single   single   single   Unallocated Total    Slack
-- --------- -------- -------- -------- ----------- -------- -----
 1 /dev/sda8 35.01GiB  2.01GiB 32.00MiB    22.52GiB 59.57GiB     -
-- --------- -------- -------- -------- ----------- -------- -----
   Total     35.01GiB  2.01GiB 32.00MiB    22.52GiB 59.57GiB 0.00B
   Used      28.71GiB  1.24GiB 16.00KiB                           
6700k:~ # btrfs balance start -dusage=66 /
Done, had to relocate 5 out of 40 chunks
6700k:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  59.57GiB
    Device allocated:             34.05GiB
    Device unallocated:           25.52GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         29.95GiB
    Free (estimated):             28.83GiB      (min: 28.83GiB)
    Free (statfs, df):            28.82GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               84.77MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data     Metadata System                             
Id Path      single   single   single   Unallocated Total    Slack
-- --------- -------- -------- -------- ----------- -------- -----
 1 /dev/sda8 32.01GiB  2.01GiB 32.00MiB    25.52GiB 59.57GiB     -
-- --------- -------- -------- -------- ----------- -------- -----
   Total     32.01GiB  2.01GiB 32.00MiB    25.52GiB 59.57GiB 0.00B
   Used      28.71GiB  1.24GiB 16.00KiB                           
6700k:~ # 

The above run freed 3 GiB of allocated space.
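If I remember the linked article correctly, it ramps the usage filter up in steps rather than jumping straight to a high value; roughly like this (the step values are only illustrative):

for u in 10 25 50 75 99; do
    btrfs balance start -dusage=$u /    # only relocates data chunks that are at most $u% full
done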

If Btrfs, then –

  • Regularly execute the Btrfs “Balance” and “Scrub” routines.
    But, you should already have the systemd Btrfs “Balance” and “Scrub” Timers enabled:
 > systemctl list-unit-files | grep -iE 'UNIT FILE|btrfs'
UNIT FILE                                                                 STATE           VENDOR PRESET
btrfsmaintenance-refresh.path                                             enabled         enabled
btrfs-balance.service                                                     static          -
btrfs-defrag.service                                                      static          -
btrfs-scrub.service                                                       static          -
btrfs-trim.service                                                        static          -
btrfsmaintenance-refresh.service                                          disabled        disabled
btrfs-balance.timer                                                       enabled         enabled
btrfs-defrag.timer                                                        enabled         enabled
btrfs-scrub.timer                                                         enabled         enabled
btrfs-trim.timer                                                          enabled         enabled
474 unit files listed.
 >
  • If you have at least a Btrfs system partition, the following Timers must be enabled:
    btrfs-balance.timer
    btrfs-defrag.timer
    btrfs-scrub.timer
    btrfs-trim.timer

Assuming that you haven’t changed the systemd Timer “OnCalendar” settings, the Btrfs housekeeping should have been executing.

  • You can check the time when the systemd Timer will expire with, for example, the following command:

# systemctl status btrfs-balance.timer
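  • To list all of the Btrfs Timers and their next trigger times in one go, and, should one of them turn out to be disabled, to enable and start it, something like the following should do (the Timer name in the second command is only an example):

# systemctl list-timers 'btrfs-*'
# systemctl enable --now btrfs-trim.timer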

@aggie:

Please be aware that openSUSE Tumbleweed and openSUSE MicroOS are currently quite close to one another – on openSUSE Leap I don’t see anything related to openSUSE MicroOS in the YaST Software Management module …

  • A major difference between Tumbleweed and MicroOS is the “Transactional (Atomic) Updates” on a read-only Btrfs root filesystem …
    Plus, containers …

Therefore, if for whatever reason you accidentally pulled anything related to MicroOS into your Tumbleweed system when you executed “zypper dist-upgrade”, then yes, a consequence may well be that you ended up with a read-only filesystem on the system partition.

@karlmistelberger has a point: you need to make your system fit that small partition. Balance won’t help much, as what matters is the free space. Balance just moves data around.

Do you have snapshots enabled?

$ sudo snapper list

If you do, you need to limit them, as they take some space.

$ grep NUMBER_LIMIT /etc/snapper/configs/root
NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="10"

I’d set these to 4 at most given your partition size.
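If I recall correctly, snapper can change these without editing the file by hand, and can then prune to the new limits, roughly:

$ sudo snapper -c root set-config NUMBER_LIMIT=4 NUMBER_LIMIT_IMPORTANT=4
$ sudo snapper -c root cleanup number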

Hi @dcurtisfra and @awerlang … thanks for the replies.

Here’s the list-unit-files output … looks like btrfs-defrag.timer and btrfs-trim.timer are disabled.

:~ # systemctl list-unit-files | grep -iE 'UNIT FILE|btrfs'
UNIT FILE                                                                 STATE           PRESET
btrfsmaintenance-refresh.path                                             enabled         enabled
btrfs-balance.service                                                     static          -
btrfs-defrag.service                                                      static          -
btrfs-scrub.service                                                       static          -
btrfs-trim.service                                                        static          -
btrfsmaintenance-refresh.service                                          static          -
btrfs-balance.timer                                                       enabled         enabled
btrfs-defrag.timer                                                        disabled        enabled
btrfs-scrub.timer                                                         enabled         enabled
btrfs-trim.timer                                                          disabled        enabled
478 unit files listed.

====== for @awerlang’s questions:
Here’s the NUMBER_LIMIT:

:~ # grep NUMBER_LIMIT /etc/snapper/configs/root
NUMBER_LIMIT="2"
NUMBER_LIMIT_IMPORTANT="2-4"
:~ #

And yes, snapshots are enabled

:~ # snapper list
    # | Type   | Pre # | Date                            | User | Used Space | Cleanup | Description           | Userdata     
------+--------+-------+---------------------------------+------+------------+---------+-----------------------+--------------
   0  | single |       |                                 | root |            |         | current               |              
   1* | single |       | Thu 20 Aug 2020 02:00:43 PM CDT | root |   4.86 MiB |         | first root filesystem |              
1845  | pre    |       | Wed 31 May 2023 10:43:20 AM CDT | root | 290.89 MiB | number  | zypp(zypper)          | important=yes
1846  | post   |  1845 | Wed 31 May 2023 10:43:49 AM CDT | root | 832.00 KiB | number  |                       | important=yes
1847  | pre    |       | Wed 31 May 2023 11:46:39 AM CDT | root |   1.11 MiB | number  | zypp(zypper)          | important=yes
1848  | post   |  1847 | Wed 31 May 2023 11:46:59 AM CDT | root | 336.00 KiB | number  |                       | important=yes
1849  | pre    |       | Wed 31 May 2023 12:03:22 PM CDT | root | 448.00 KiB | number  | yast sw_single        |              
1850  | post   |  1849 | Wed 31 May 2023 12:06:50 PM CDT | root | 592.00 KiB | number  |                       |              
1851  | pre    |       | Wed 31 May 2023 01:09:51 PM CDT | root | 112.00 KiB | number  | yast sw_single        |              
1852  | pre    |       | Wed 31 May 2023 01:10:43 PM CDT | root |  80.00 KiB | number  | zypp(ruby.ruby3.2)    | important=no 
1853  | post   |  1852 | Wed 31 May 2023 01:10:45 PM CDT | root |  64.12 MiB | number  |                       | important=no 
1854  | post   |  1851 | Wed 31 May 2023 01:11:26 PM CDT | root | 112.00 KiB | number  |                       |              

Nope!

btrfs balance packs data and releases allocated but unused 1 GiB chunks of disk space. Running btrfs balance start -dusage=99 / on host 6700k resulted in:

6700k:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  59.57GiB
    Device allocated:             33.05GiB
    Device unallocated:           26.52GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         30.66GiB
    Free (estimated):             28.18GiB      (min: 28.18GiB)
    Free (statfs, df):            28.18GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               82.25MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data     Metadata System                             
Id Path      single   single   single   Unallocated Total    Slack
-- --------- -------- -------- -------- ----------- -------- -----
 1 /dev/sda8 31.01GiB  2.01GiB 32.00MiB    26.52GiB 59.57GiB     -
-- --------- -------- -------- -------- ----------- -------- -----
   Total     31.01GiB  2.01GiB 32.00MiB    26.52GiB 59.57GiB 0.00B
   Used      29.35GiB  1.31GiB 16.00KiB                           
6700k:~ # 

Used data space is 29.35 GiB. After balancing, 31.01 GiB - 29.35 GiB = 1.66 GiB of allocated space remains unused.

@aggie reports 28.46 GiB - 18.07 GiB = 10.39 GiB of allocated but unused space. This can be reduced by running the appropriate btrfs balance start -dusage=99 /.

Do you mean it produced 1.66 GiB unallocated space? Used space didn’t change.

Actually, it grew compared to the output shown previously 🙂 Both data and metadata.

No, that is really unused allocated space. I am rather puzzled about what exactly it proves, though.