Tumbleweed 20190428 stuck during boot at Assuming write cache: write through

averyfreeman · May 2, 2019, 11:41am

Hello,

I was reading about storage-ng and thought it was kind of neat sounding, so I decided to download the latest snapshot of Tumbleweed and install it in an ESXi VM.

The starting ISO was x86_64 net install.

A couple notable points during installation:

I opted for encryption and entered a passphrase
The partition is EXT4 on LVM with snapshots, no separate home, swap sized to RAM
Scheduler is NOOP

That’s about it - the rest I tried to leave pretty default as I was just demoing it

The installation type was KDE Desktop, I turned on SSH and opened the port in firewall

Encryption key prompt comes up and moves past once I have entered the passphrase

I can choose either to boot or advanced boot options

If I choose regular boot, it gets stuck after this one line of text:
“[sda2:0:0] assuming write cache: write through…”

If I choose rescue mode boot I appear to be able to boot normally

This VM is not particularly important to me, but I wanted to make sure to report the issue so that devs could explore it. If it’s related to ESXi, that’s probably not exactly a tiny share of the user population. If it’s related to the new storage-ng encryption, I’m sure it’s something people want to know about.

Thanks!

gogalthorp · May 2, 2019, 3:35pm

Do snapshots work on ext4 now? AFAIK snapper has not worked on ext4 in the past, only on BTRFS.

knurpht · May 2, 2019, 4:46pm

No, snapshots do not work on ext4.

averyfreeman · May 2, 2019, 8:50pm

https://doc.opensuse.org/documentation/leap/reference/html/book.opensuse.reference/cha.snapper.html

Says BTRFS or thin-provisioned LVM. It was an option in the installer, not sure why it’d be there if it’s inoperable.

Does anyone have any idea why the system won’t boot without entering recovery mode?

Edit:

I got the idea from this video of an OpenSUSE dev presentation - here’s a screenshot of how they configured their system for the demonstration:

https://averyfreeman.com/images/opensuse-lvm-ext4-snapshots.JPG

https://www.youtube.com/watch?v=_0VKUjFAIwo

nrickert · May 3, 2019, 4:21am

I’m just commenting on the quoted part.

I do sometimes see that “assuming write cache: write through”, but I think I see [sd 2:0:0:0] or similar.

I only see that when I have a USB drive plugged in.

As best I recall, the order is:

Prompt for encryption password
“assuming write cache” message – this comes before I have time to enter the password.

I then enter the password anyway (after the “assuming write cache” message), and it is usually accepted and the system boots.

averyfreeman · May 3, 2019, 11:46am

Thanks for the idea. I had already entered my password for Grub so this did not occur to me as a possibility.

I tried entering my password a couple times at the point where the system hung, but I couldn’t get any response.

I don’t particularly need encryption, I am just testing the newest snapshot, but I thought it would be nice of me to give any bug reports I could.

Is there any sort of log that might be generated at this point? Can dmesg go back to previous boots? If I could get a dmesg, or any other log, from this portion of the boot process, I’d be glad to upload it to bugzilla et al.

The problem is, it appears to be before any sort of logs are being generated. Please correct me if that is erroneous.

Thank you
Avery

nrickert · May 3, 2019, 2:13pm

That’s a problem for issues early in boot when no disk is yet available to write logs.

Can dmesg go back to previous boots?

No, it is only for the current boot. But it is stored in memory, so it can contain information about problems that occur before disks are available. However, if the boot is not successful, it is lost.

If you are using a virtual machine, then there might be an option to take screenshots of early boot problems. At least I can do that with KVM virtual machines.
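
As a side note for later readers: dmesg itself only covers the current boot, but where the systemd journal is stored persistently, kernel messages from an earlier boot can sometimes be read back with journalctl. The sketch below assumes the usual persistent location /var/log/journal; the helper names are made up for illustration. It would probably not help with a hang this early, since nothing reaches the disk before the root filesystem is mounted.

```shell
#!/bin/sh
# Sketch: show kernel messages from the previous boot, if the journal kept them.
# Assumption: journald stores persistent logs under /var/log/journal.

has_persistent_journal() {
    # True when the given directory exists and is not empty.
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

previous_boot_kernel_log() {
    dir=${1:-/var/log/journal}
    if has_persistent_journal "$dir"; then
        # -k: kernel messages only; -b -1: the boot before the current one.
        journalctl -k -b -1 --no-pager
    else
        echo "No persistent journal in $dir; earlier boots are not available." >&2
        return 1
    fi
}
```

Calling previous_boot_kernel_log with no argument checks the default directory first, so it fails with a clear message instead of an empty listing when only the in-memory journal exists.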

averyfreeman · May 3, 2019, 2:13pm

Oh!

I left the VM on for a few hours, and when I came back it had gotten two steps further. Now it shows the following kernel messages:

[sda] assuming drive cache: write through
Out of memory: Kill process 403 (plymouthd) score 907 or sacrifice child
Killed process 403 (plymouthd) total-vm:1848256kb, anon-rss:1843676kb, file-rss:2500k, shmem-rss:0kb

Does anyone think if I give the VM more memory it might load? Looks like plymouthd is running out of memory during boot, but could be indicative of another issue, I suppose…

Edit: I threw 4096MB at it and got the same result…

nrickert · May 3, 2019, 5:23pm

Try “plymouth.enable=0” as a kernel parameter on the boot line.

averyfreeman · May 4, 2019, 5:37am

plymouth.enable=0 on the boot line worked.

Thanks, that was most helpful. KDE also loaded as normal, AFAIK. I will edit my grub.cfg to reflect the change.

Do you think I should submit a bug to bugzilla with my dmesg?

nrickert · May 4, 2019, 5:42am

No, don’t edit “grub.cfg”. Edit “/etc/default/grub” or – better – use Yast bootloader to make that change.

If you edit “grub.cfg” the change will go away on the next kernel update (or sooner).

Do you think I should submit a bug to bugzilla with my dmesg?

That’s probably a good idea.
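
For anyone who prefers the command line to Yast, the /etc/default/grub change recommended above might look roughly like the sketch below. The function name is hypothetical, and it assumes the file has a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and that GNU sed is available. It only edits the file it is given, so it can be tried on a copy first.

```shell
#!/bin/sh
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# Usage: add_kernel_param FILE PARAM
add_kernel_param() {
    file=$1
    param=$2
    # Leave the file alone if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Intended use, as root (not executed here):
#   add_kernel_param /etc/default/grub plymouth.enable=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Because grub.cfg is regenerated from /etc/default/grub, a change made this way should persist across kernel updates, which a hand edit of grub.cfg would not.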

averyfreeman · May 4, 2019, 6:08am

@Knurpht said “No, snapshots do not work on ext4”

I tried taking a snapshot with snapper and indeed I get:

Creating config failed (invalid system type).

Is there any way to set LVM on the installer to be thin-provisioned?

nrickert · May 4, 2019, 6:41am

I think so. If I can find time tomorrow, I will experiment with that.

nrickert · May 5, 2019, 5:26am

I experimented with that, today. This was with Tumbleweed 20190502.

Yes, you can set up thin provisioning during install. But you will need to use the expert partitioner. And you will need to have some prior knowledge about this (which I didn’t have).

My install was to a KVM virtual machine with a 40G hard drive. I configured the VM for UEFI booting.

I will describe what I did. And, I’ll note that my first attempt failed. It looks as if “/boot” cannot be part of a thin provisioned volume. So, for my second attempt, I used a separate “/boot” partition.

I first created the EFI partition (256M). Then I created a partition for “/boot” (also 256M, formatted with “ext2”).
I then created a third partition with the remainder of the disk, which I set to be unformatted raw data.

I then went to the Logical Volume setup of the partitioner.

The first step was to create a new volume group. I gave it the name “system”, and I assigned that third partition to the volume group.

Then I started creating logical volumes within that group. I first created “swap” at 4G size. I thought it best to keep swap separate from the thin provisioning.

Next I created a logical volume, which I called “thinpool”. I checked the box for a thin pool. And I gave it all remaining space in the volume group.

Next, I created volumes “root” and “home”. Since there was no free space remaining, it assigned those to use thin provisioning from “thinpool”. I made each 20G. I have no idea what would be a good choice of sizes if the whole purpose is to allow snapshots.

In any case, that all worked and the system booted.
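
For reference, the same layout could probably be built outside the installer with the LVM command-line tools. The sketch below is an approximation, not what the partitioner actually runs: the device name /dev/vda3 is an assumption for the third partition, and the run helper and DRY_RUN switch are made up so that the script only prints the commands by default. The volume names and sizes are the ones from the steps above.

```shell
#!/bin/sh
# Sketch: command-line equivalent of the thin-provisioned layout described above.
# DRY_RUN=1 (the default) only prints the commands; nothing is changed on disk.
DRY_RUN=${DRY_RUN:-1}
PV=${PV:-/dev/vda3}   # assumption: the unformatted third partition

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

thin_layout() {
    # Volume group "system" on the raw partition.
    run vgcreate system "$PV"
    # Ordinary (thick) 4G volume for swap, kept outside the pool.
    run lvcreate -L 4G -n swap system
    # Thin pool taking all remaining space in the group.
    run lvcreate -l 100%FREE -T system/thinpool
    # Two 20G thin volumes backed by the pool.
    run lvcreate -V 20G -T system/thinpool -n root
    run lvcreate -V 20G -T system/thinpool -n home
}

thin_layout
```

With a 40G disk, the two 20G thin volumes add up to more than the pool can hold, so this layout is over-provisioned in the same way as the installer result.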

Now I’m off to report the bug about needing a separate “/boot”.

averyfreeman · May 6, 2019, 7:37pm

Oh sorry, I meant /etc/default/grub - I kind of meant it as a shorthand for the update from # grub2-mkconfig -o /boot/grub2/grub.cfg

But thanks for making that explicit, which is what I should have done - it might help someone else.

averyfreeman · May 6, 2019, 8:01pm

Thin-provisioning swap is interesting from a space usage standpoint, since it could expand and contract as needed, leaving more room for over-provisioning the other volumes. Obviously, using a separate swap partition with Linux swap formatting is preferable from a performance standpoint, though.

Here’s the interesting thing about snapshot sizes: if I read this correctly, they have to be pre-configured. I am not sure how that would interact with the ability to over-provision thin volumes. From Red Hat (“2.3. LVM Logical Volumes”, Red Hat Enterprise Linux 7, Red Hat Customer Portal):

The size of the snapshot governs the amount of space set aside for storing the changes to the origin volume. For example, if you made a snapshot and then completely overwrote the origin the snapshot would have to be at least as big as the origin volume to hold the changes. You need to dimension a snapshot according to the expected level of change. So for example a short-lived snapshot of a read-mostly volume, such as /usr, would need less space than a long-lived snapshot of a volume that sees a greater number of writes, such as /home.
If a snapshot runs full, the snapshot becomes invalid, since it can no longer track changes on the origin volume. You should regularly monitor the size of the snapshot. Snapshots are fully resizable, however, so if you have the storage capacity you can increase the size of the snapshot volume to prevent it from getting dropped. Conversely, if you find that the snapshot volume is larger than you need, you can reduce the size of the volume to free up space that is needed by other logical volumes.

It seems kind of clunky to me compared to a CoW FS (I’m still unclear whether thin-lvm is considered a type of CoW FS), but it is definitely a fascinating option. For people who are EXT4 or XFS purists (like Red Hat), it offers more modern CoW FS features such as snapshots and data discard, although BTRFS seems superior on the face of it. Over-provisioning seems to be one truly unique feature; I’m not sure if that’s available any other way. In any event, no FS is truly “superior”. They’re all just different: right tool for the job and all that.

Thanks for trying that and explaining how you did it, I really appreciate it!

tsu2 · May 6, 2019, 9:06pm

If you want to create EXT snapshots and have the ability to roll back similar to what you can have on BTRFS,
I highly recommend the following

TSU

nrickert · May 6, 2019, 10:15pm

It probably has a different intended use case than “btrfs” snapshots.

Perhaps the idea is that you make a snapshot. And then you backup the snapshot. This avoids backing up actively changing files. Once you have made the backup, you can release the snapshot.

Thanks for trying that and explaining how you did it, I really appreciate it!

I did learn something about it. And I reported bug 1134130 (that “/boot” cannot be part of a thinly provisioned volume). I’m not sure what they will do about that. My recommendation was to just document the problem and mark as WONTFIX.

averyfreeman · May 7, 2019, 12:44am

That’s not snapshot-related, it’s file recovery software. It’s useful, but not in the same ballpark.

Snapper can take snapshots of thin-provisioned LVM volumes similar to BTRFS (although it is a bit different, not quite as elegant or refined).
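
As far as I understand the snapper documentation, the “invalid system type” error goes away once the volume really is thin-provisioned and snapper is told the filesystem type explicitly. The sketch below only prints the command instead of running it; the helper name is made up, and the config name, filesystem, and mount point are example values.

```shell
#!/bin/sh
# Sketch: print (not run) a snapper create-config command for a thin LVM volume.
# Arguments: config name, filesystem on the thin volume, mount point.
snapper_lvm_config_cmd() {
    printf 'snapper -c %s create-config --fstype="lvm(%s)" %s\n' "$1" "$2" "$3"
}

# Example: a config named "root" for an ext4 thin volume mounted at /.
snapper_lvm_config_cmd root ext4 /
```

The printed line can be reviewed and then run as root; it will presumably still fail on a regular (non-thin) logical volume.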

I am just experimenting, I usually use BTRFS or ZFS, but I wanted to see the new storage-ng+yast development and thought it’d be fun to try LVM on OpenSUSE for a change.

I am disappointed that thin-provisioned LVM is not available through the guided partition setup, but @nrickert was nice enough to walk me through how to do it with the manual partitioner.

Thanks for the EXT4 file recovery tip, though.