I’ve been testing Leap Micro 5.2, 5.3 (and the 5.4 alpha) as a base OS for container and VM workloads, and it is working really well.
I am ready to commit to rolling it out on production machines (3 independent machines), however there is one system design issue that is unsolved, at least for me:
- I get that the aim of Leap Micro is a disposable, container / VM host runtime OS, therefore the default install goes onto one single disk partition, without any redundancy.
- For the last 14-15 years I’ve run SUSE/openSUSE with great satisfaction as a server OS, with ext3/4 on top of (LVM and) mdraid1, on remote servers: when one of the HDDs inevitably failed, the system continued to operate without interruption, even when rebooted, until I replaced the failed HDD.
- Is this possible with Leap Micro?
- A btrfs root is mandatory for snapshots and rollback => if I add a second device to the btrfs filesystem during install and convert it to the raid1 profile, everything works as expected during normal operation (the setup is sketched just below). However, after a simulated HDD failure (removing either the 1st or the 2nd device…) the system can no longer boot. I’ve searched the SUSE and openSUSE documentation and found nothing on how to handle this case. I’ve tried various combinations of rootflags=degraded, rd.break=pre-mount etc.: I could not start the system (it just waits for the missing drive to appear…)
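For reference, the conversion I’m doing looks roughly like this (device names are illustrative; /dev/vdb stands for the second disk, added after install):

```
# add the second disk to the existing root filesystem
btrfs device add /dev/vdb /
# convert data and metadata to the raid1 profile
btrfs balance start -dconvert=raid1 -mconvert=raid1 /
# verify: both devices should now show the raid1 profile
btrfs filesystem usage /
```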
Sure, booting the machine off a USB stick and mounting with -o degraded seems to work, until discovering that @/etc is a read-only snapshot… so no dice for a simple recovery.
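Roughly what that rescue attempt looks like (again with illustrative device names; /dev/vda3 is the surviving root device):

```
# from the live system: the surviving disk is visible, one device is missing
btrfs filesystem show
# mounting degraded works, but the default subvolume is the current
# (read-only) snapshot, so nothing under /etc can be fixed in place
mount -o degraded /dev/vda3 /mnt
touch /mnt/etc/test    # -> Read-only file system
```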
What is the suggested approach with this distribution in this case:
- root btrfs on top of md-raid does not seem optimal, but maybe it is the only way? (I know that I’d lose btrfs-based self-healing…) See the first sketch after this list.
- root btrfs on top of LVM raid1: seems to be strongly discouraged, but at least it is very flexible… (second sketch below)
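If md-raid turns out to be the answer, this is the kind of layout I mean (a minimal sketch with illustrative partition names, not an installer recipe):

```
# mirror two partitions at the block level; md handles the redundancy
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# single-device btrfs on top of the mirror: snapshots/rollback still work,
# but btrfs can no longer self-heal from a second copy
mkfs.btrfs /dev/md0
```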
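The LVM raid1 variant would look something like this (volume group name and size are made up for illustration):

```
pvcreate /dev/sda2 /dev/sdb2
vgcreate vg_sys /dev/sda2 /dev/sdb2
# mirrored logical volume for the root filesystem
lvcreate --type raid1 -m 1 -L 40G -n root vg_sys
mkfs.btrfs /dev/vg_sys/root
```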
So my question is: I want some level of higher availability for my servers with Leap Micro; can it be done?
If so, what is the recommended way of achieving it on a single-node system?