Problem with disk order after snapshot 20230921

I have an HP server with 3 logical disks, initially defined as:
# lsblk -f
NAME   FSTYPE FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1 swap   1                ab7656ab-41e6-41e8-aa16-bbb4ca3c643e
├─sda2 ext4   1.0   Tumbleweed 78cf2dab-782f-467c-b749-1bc362844d10     96G    13% /
└─sda4 ext4   1.0   home       dc7675a1-24a2-4ec2-a3a1-458af1b9d1e0   34.8G    74% /home
sdb
├─sdb1 ext4   1.0   srv        bf2e6374-cee0-4c18-96b5-851f821ac807     78G    16% /srv
├─sdb2 ext4   1.0   var        4136696f-efbb-4c11-94f4-99aaf7ec32a6   91.9G     1% /var
├─sdb3 ext4   1.0   local      89036c77-2a68-4ad2-9dfd-f950288600fa   18.7G    57% /local
└─sdb4 ext4   1.0   opt        55cafd49-1996-4599-8af6-46fbaf61b5eb   46.5G     0% /opt
sdc
└─sdc1

Partition sdc1 (238 GB) is not mounted; it is used by a Xen VM.
All filesystems are ext4.
This morning I installed snapshot 20230921.
After rebooting with the new kernel, the VM would not start.
Looking in fdisk, I saw that the disks had been swapped:
sda was the old sdc (Xen partition)
sdb was the old sda
sdc was the old sdb

The swap partition was also not reachable.
I ran "shutdown -H now", restarted the system, and the disk order changed again :rage:

hpprol2:~ # lsblk -f
NAME   FSTYPE FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                
├─sda1 ext4   1.0   srv        bf2e6374-cee0-4c18-96b5-851f821ac807   72.2G    21% /srv
├─sda2 ext4   1.0   var        4136696f-efbb-4c11-94f4-99aaf7ec32a6   90.9G     2% /var
├─sda3 ext4   1.0   local      89036c77-2a68-4ad2-9dfd-f950288600fa   22.8G    48% /local
└─sda4 ext4   1.0   opt        55cafd49-1996-4599-8af6-46fbaf61b5eb   46.4G     0% /opt
sdb                                                                                
├─sdb1 swap   1                cfee5866-6fbc-4b4f-bc7a-f0a5312ccc25                
├─sdb2 ext4   1.0   Tumbleweed 78cf2dab-782f-467c-b749-1bc362844d10     86G    22% /
└─sdb4 ext4   1.0   home       dc7675a1-24a2-4ec2-a3a1-458af1b9d1e0   57.7G    60% /home
sdc                                                                                
└─sdc1  

But now I could start the VM, which uses sdc1 as storage space.

I have also seen that in YaST the bootloader now works with sdb:

Boot Loader Settings
  Tabs: Boot Code Options | Kernel Parameters | Bootloader options
  Boot Loader: GRUB2
  Boot Code Location:
    [ ] Write to Partition (/dev/sdb2)
    [x] Write to Master Boot Record (/dev/sdb)
    [ ] Custom Boot Partition
  [x] Set active Flag in Partition Table for Boot Partition
  [ ] Write generic Boot Code to MBR
  [Edit Disk Boot Order]

and I’m sure that before, it worked with sda.
Boot command line:

# cat /proc/cmdline
root=UUID=78cf2dab-782f-467c-b749-1bc362844d10 splash=verbose showopts vga=838 mitigations=auto security=apparmor

Any idea what triggered these changes?

For a long time disk drivers were built-in and now they are built as modules and loaded on demand. So apparently USB drivers are loaded before HDD drivers.

The order of disk enumeration was never guaranteed to be stable across reboots, and you should never rely on it. Why do you think the /dev/disk/by-* links exist?

/dev/sd* and /dev/nvme* names are dynamic; you cannot guarantee the order in which they will appear.

Use the UUID; it is the only thing that will not change.

To see which UUID is attached to each partition, use the blkid command.
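blkid answers "which UUID does this partition have?". Scripts often need the reverse: which /dev/sdX name a given UUID points to on this boot. A minimal sketch, assuming udev's standard /dev/disk/by-uuid layout, that reads the symlink directly; the function name uuid_to_dev is hypothetical, and the directory is a parameter only so it can be tried against a test tree.

```shell
#!/bin/bash
# Hypothetical helper: print the kernel name (sda2, sdb2, ...) that a
# filesystem UUID resolves to on this boot. Assumes udev's /dev/disk/by-uuid
# layout; the optional second argument overrides that directory.
uuid_to_dev() {
    local uuid="$1" dir="${2:-/dev/disk/by-uuid}"
    local link="$dir/$uuid"
    # udev creates one symlink per UUID it has seen; no link means no
    # filesystem with that UUID is visible right now.
    if [ ! -L "$link" ]; then
        echo "no filesystem with UUID $uuid under $dir" >&2
        return 1
    fi
    # The link target is relative (../../sdb2); readlink -f makes it
    # absolute and basename keeps only the kernel name.
    basename "$(readlink -f "$link")"
}
```

For example, `uuid_to_dev 78cf2dab-782f-467c-b749-1bc362844d10` prints whichever sdX2 currently holds the Tumbleweed root filesystem from the listings above.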


There are 4 identical physical disks in an array (RAID 5), split into 3 logical disks, so they are all of the same type.

# blkid
/dev/sdb4: LABEL="home" UUID="dc7675a1-24a2-4ec2-a3a1-458af1b9d1e0" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="000aab75-04"
/dev/sdb2: LABEL="Tumbleweed" UUID="78cf2dab-782f-467c-b749-1bc362844d10" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="000aab75-02"
/dev/sdb1: UUID="cfee5866-6fbc-4b4f-bc7a-f0a5312ccc25" TYPE="swap" PARTUUID="000aab75-01"
/dev/sdc1: PTUUID="2a446799-6ed3-4719-82cc-c54d7e10c01c" PTTYPE="gpt" PARTUUID="0001a1e7-01"
/dev/sda4: LABEL="opt" UUID="55cafd49-1996-4599-8af6-46fbaf61b5eb" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="0005f844-04"
/dev/sda2: LABEL="var" UUID="4136696f-efbb-4c11-94f4-99aaf7ec32a6" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="0005f844-02"
/dev/sda3: LABEL="local" UUID="89036c77-2a68-4ad2-9dfd-f950288600fa" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="0005f844-03"
/dev/sda1: LABEL="srv" UUID="bf2e6374-cee0-4c18-96b5-851f821ac807" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="0005f844-01"

I don’t see how to use the UUID in the Xen VM storage definition:

hpprol2:/etc/libvirt/storage # cat diskc.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh pool-edit diskc
or other application using the libvirt API.
-->

<pool type='disk'>
  <name>diskc</name>
  <uuid>480e9674-a615-460f-9410-062907fbc0a8</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <device path='/dev/sdc'/>
    <format type='unknown'/>
  </source>
  <target>
    <path>/dev</path>
  </target>
</pool>
hpprol2:/etc/libvirt/storage # 

The disk for the VM (sdc) is a raw disk (formatted during installation of the Xen VM) and does not appear in /dev/disk/by-uuid (because it is not mounted?).

hpprol2:/dev/disk/by-uuid # ls -l
total 0
lrwxrwxrwx 1 root root 10 Sep 23 14:44 4136696f-efbb-4c11-94f4-99aaf7ec32a6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 55cafd49-1996-4599-8af6-46fbaf61b5eb -> ../../sda4
lrwxrwxrwx 1 root root 10 Sep 23 14:44 78cf2dab-782f-467c-b749-1bc362844d10 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 89036c77-2a68-4ad2-9dfd-f950288600fa -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 23 14:44 bf2e6374-cee0-4c18-96b5-851f821ac807 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 cfee5866-6fbc-4b4f-bc7a-f0a5312ccc25 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 dc7675a1-24a2-4ec2-a3a1-458af1b9d1e0 -> ../../sdb4
hpprol2:/dev/disk/by-uuid # 

This setup was done 5 years ago and worked without problems until today.

@phil524 I wonder if it’s related to this announcement: "SCSI device identification and SCSI symlink generation in sg3_utils 1.48" (that’s why following the Factory mailing list is a good idea when using Tumbleweed).


path=/dev/disk/by-uuid/xxxx
Anyway, any further questions about VM configuration belong in the Virtualization section.


I have trouble imagining UUID being any more resistant to change than LABEL.

Or lsblk -f, when filesystems matter more than partitions or disks.

Here is my fstab, and the script I run to mount my USB drive at the correct mount points.
I always use the UUID; it is foolproof, since /dev/sd? names are dynamically assigned.

LLR1:~ # cat /etc/fstab
UUID=b8b9c9a0-e9dc-4a23-bcf4-39f72538c6be  swap       swap  defaults  0  0
UUID=868bf3ab-5299-495e-8104-fee60868253f  /          ext4  defaults  0  1
UUID=4A62-631A                             /boot/efi  vfat  defaults  0  2
LLR1:~ # cat bin/mountit
mount -t ext4 -U 7b809416-35cb-4560-88e0-187849a25d43 /a/o
mount -t ext4 -U 50641663-0199-491e-9114-82e1eb9d4a70 /a/p
mount -t ext4 -U 730b7e52-5edd-4f2b-8e5e-933c16192a21 /a/q
mount -t ext4 -U 4af0f788-8703-44ec-be7c-88b92e733c7d /a/u
LLR1:~ #

Then use any other stable link under /dev/disk, such as /dev/disk/by-id, /dev/disk/by-path or /dev/disk/by-partuuid, whichever is available.
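For the diskc pool shown earlier, that means pointing the source device at one of sdc's by-id links instead of /dev/sdc. The file's own header says to make such changes with `virsh pool-edit diskc`; the sketch below only prints an equivalent definition so the change is easy to see. Assumptions: the function name pool_xml is hypothetical; the link used in the example is the wwn- entry that pointed at sdc in Philippe's /dev/disk/by-id listing in this thread; the `<uuid>` and capacity elements are omitted (keep the existing uuid if you edit the pool in place).

```shell
#!/bin/bash
# Hypothetical helper: print a libvirt "disk" pool definition whose source
# is a stable /dev/disk/by-id link rather than a /dev/sdX name.
# Arguments: pool name, stable device path.
pool_xml() {
    local name="$1" device="$2"
    # Unquoted EOF so $name and $device expand; single quotes stay literal.
    cat <<EOF
<pool type='disk'>
  <name>$name</name>
  <source>
    <device path='$device'/>
    <format type='unknown'/>
  </source>
  <target>
    <path>/dev</path>
  </target>
</pool>
EOF
}
```

Example: `pool_xml diskc /dev/disk/by-id/wwn-0x600508b1001c21d488c454a46bc39f22 > diskc-new.xml`, then compare it with `virsh pool-dumpxml diskc` before making the same device change through `virsh pool-edit diskc`.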

Device paths change randomly on infamous host erlangen:

erlangen:~ # inxi -D
Drives:    Local Storage: total: 5.48 TiB used: 1.74 TiB (31.8%)
           ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 2TB size: 1.82 TiB
           ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO Plus 2TB size: 1.82 TiB
           ID-3: /dev/sda vendor: Crucial model: CT2000BX500SSD1 size: 1.82 TiB
           ID-4: /dev/sdb vendor: Garmin model: Garmin Flash size: 14.58 GiB type: USB
           ID-5: /dev/sdc vendor: Garmin model: Garmin SD Card size: 7.32 GiB type: USB
           ID-6: /dev/sdd vendor: Garmin model: FR735XT size: 10.1 MiB type: USB
erlangen:~ # 

Mounting relies entirely on UUIDs, which have never changed so far. Even when replacing hardware, I reused the existing UUIDs. :slight_smile:


I don’t have time to troubleshoot now, but will jump in long enough to note that I also had the problem reported by Phil524 with snapshot 20230921, and had to restore an earlier snapshot to keep working. On reboot, TW sometimes mixed up the disk order with 20230921, or didn’t see some disks at all. I’ve never experienced anything similar on my system with TW or other distros.


I’ll note that I have had this “problem” with Leap 15.4. But I don’t consider it a real problem. I did have to modify one script to work around it. It isn’t just a Tumbleweed problem.


Thanks,
I had already read this email, but it seemed irrelevant to me because of this remark:

If you don’t use dm-multipath, and don’t use hardware IDs to refer to
any disks or partitions anywhere, you can stop reading here.

Also, I can’t find the file he mentions (00-scsi-sg3_config.rules). I only have:

hpprol2:/usr/lib/udev/rules.d # ls -l *sg3*.rules
-rw-r--r-- 1 root root 2340 Aug 12 17:21 54-before-scsi-sg3_id.rules
-rw-r--r-- 1 root root 6359 Aug 12 17:21 55-scsi-sg3_id.rules
-rw-r--r-- 1 root root 2864 Aug 12 17:21 58-scsi-sg3_symlink.rules
hpprol2:/usr/lib/udev/rules.d # 

If I look in /dev/disk/by-id, I have:

hpprol2:/dev/disk/by-id # ls -l
total 0
lrwxrwxrwx 1 root root  9 Sep 23 14:44 ata-hp_DVD-RAM_GHA3N_KEHE2EG3554 -> ../../sr1
lrwxrwxrwx 1 root root  9 Sep 23 14:44 ata-hp_DVD-ROM_SH-116AB_R8VT68BDA000CR -> ../../sr0
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_00000000-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_01000000-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_01000000-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_01000000-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_01000000-part4 -> ../../sda4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-0HP_LOGICAL_VOLUME_02000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-3600508b1001c21d488c454a46bc39f22 -> ../../sdc
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001c21d488c454a46bc39f22-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-3600508b1001c99233458581ffb65cc88 -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001c99233458581ffb65cc88-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001c99233458581ffb65cc88-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001c99233458581ffb65cc88-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-3600508b1001cd3e7527deb3b931a6639 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001cd3e7527deb3b931a6639-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001cd3e7527deb3b931a6639-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001cd3e7527deb3b931a6639-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-3600508b1001cd3e7527deb3b931a6639-part4 -> ../../sda4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 23 14:44 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0-part4 -> ../../sda4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 wwn-0x600508b1001c21d488c454a46bc39f22 -> ../../sdc
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001c21d488c454a46bc39f22-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Sep 23 14:44 wwn-0x600508b1001c99233458581ffb65cc88 -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001c99233458581ffb65cc88-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001c99233458581ffb65cc88-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001c99233458581ffb65cc88-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Sep 23 14:44 wwn-0x600508b1001cd3e7527deb3b931a6639 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001cd3e7527deb3b931a6639-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001cd3e7527deb3b931a6639-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001cd3e7527deb3b931a6639-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 23 14:44 wwn-0x600508b1001cd3e7527deb3b931a6639-part4 -> ../../sda4
hpprol2:/dev/disk/by-id # 

Here sdc is present, but it is not in /dev/disk/by-uuid.

In fact, all these entries in /dev/disk/by-* are just symbolic links to /dev/sd[a-c].
The common advice seems to be that only by-uuid links are reliable, but I have a problem with sdc1, which is a raw device (not formatted and not present in by-uuid).
Can I be sure that links other than those in /dev/disk/by-uuid do not change?

Many thanks in advance
Philippe

My fstab is a bit different because I use labels:

 # cat /etc/fstab
LABEL=Tumbleweed  /       ext4  defaults      0  1
LABEL=var         /var    ext4  data=ordered  0  2
LABEL=srv         /srv    ext4  data=ordered  0  2
LABEL=opt         /opt    ext4  data=ordered  0  2
LABEL=local       /local  ext4  data=ordered  0  2
LABEL=home        /home   ext4  data=ordered  0  2
/dev/sda1       swap    swap    defaults        0  0

I have now changed the swap definition to use the UUID.

Regards
Philippe
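In that fstab, the /dev/sda1 swap line was the only entry still using a kernel device name. A quick way to catch any others is sketched below, assuming a standard whitespace-separated fstab; the function name is hypothetical and the list of name prefixes is an assumption, not exhaustive. UUID=, LABEL=, PARTUUID= and /dev/disk/by-* entries are not reported.

```shell
#!/bin/bash
# Hypothetical helper: report fstab entries that still use kernel device
# names (/dev/sda1, /dev/nvme0n1p2, ...), which can change between boots.
# The prefix list below is an assumption and not exhaustive.
fragile_fstab_entries() {
    local fstab="${1:-/etc/fstab}"
    awk '
        # skip blank lines and comments
        NF == 0 || $1 ~ /^#/ { next }
        # UUID=, LABEL=, PARTUUID= and /dev/disk/by-* never match this
        $1 ~ /^\/dev\/(sd|hd|vd|xvd|nvme|mmcblk)/ { print NR ": " $0 }
    ' "$fstab"
}
```

Run without an argument it checks /etc/fstab; each hit is printed with its line number so it can be switched to UUID= or LABEL=.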


It looks like the problem arrived for me with the 6.5.4 kernel. I managed a temporary fix by adding locks for:

kernel-default kernel-default-base kernel-default-devel kernel-devel kernel-macros kernel-source kernel-syms

to keep the 6.5.3-1 kernel and dup’ing again to the 20230921 snapshot. All is well again.

Thanks. I switched to UUID or by-id in all scripts, config files and the VM definition, and everything seems to be working without problems.
Let’s hope this continues :slightly_smiling_face:

Many thanks
Philippe

(With more troubleshooting time) I think I figured it out, thanks to posts by arvidjaar and others in this thread, and want to share what I learned with other less technically astute TW users:

Until yesterday, I used this command in a script file to open an encrypted disk:

sudo cryptsetup luksOpen /dev/sdb2 encrypteddisk

That suddenly stopped working with the dup to snapshot 20230921.

I used lsblk -f to get the UUID of /dev/sdb2:

sdb2 crypto_LUKS 1 48ea2z09-1357-84v5-r1lm-695bc31d35b7

and edited the script file command to read:

sudo cryptsetup luksOpen /dev/disk/by-uuid/48ea2z09-1357-84v5-r1lm-695bc31d35b7 encrypteddisk

The script has worked on repeated reboots, with all the above-mentioned kernel locks removed and the current TW kernel 6.5.4-1 running. I’d guess that the post malcolmlewis linked explains why the long-trusted script suddenly stopped working with the arrival of snapshot 20230921.
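The same by-uuid path can be wrapped so the script stops with a clear message when the container is not present, instead of silently depending on what /dev/sdXN happens to be. A minimal sketch: open_luks_by_uuid and the CRYPTSETUP override (set it to echo for a dry run) are hypothetical additions; cryptsetup luksOpen is the same command used in the script above. Run it as root or via sudo, as before.

```shell
#!/bin/bash
# Hypothetical wrapper: open a LUKS container by UUID, refusing to run when
# the UUID link is missing. CRYPTSETUP=echo turns it into a dry run.
open_luks_by_uuid() {
    local uuid="$1" name="$2" dir="${3:-/dev/disk/by-uuid}"
    local dev="$dir/$uuid"
    # -e follows the symlink, so a missing or dangling link both fail here.
    if [ ! -e "$dev" ]; then
        echo "no LUKS device with UUID $uuid under $dir" >&2
        return 1
    fi
    ${CRYPTSETUP:-cryptsetup} luksOpen "$dev" "$name"
}
```

With the UUID from the post, the call would be `open_luks_by_uuid 48ea2z09-1357-84v5-r1lm-695bc31d35b7 encrypteddisk`.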

That is perfectly logical. UUIDs are part of file systems (and swap spaces). So, no file system: no UUID. But when there is no file system, nothing can be mounted, so there is no need for a UUID.

And when you are fed up with UUIDs, you can always add a LABEL to file systems (again: file systems) and refer to them by LABEL. And then (I hope) those LABELs will be your own, easy-to-understand invention.

I agree: filesystem labels seem as reliable as UUIDs and much easier to use.
What confuses me is that in /dev/disk/by-id there are several entries for each partition.
For example, for /dev/sdb1:

hpprol2:~/Script # ls -l /dev/disk/by-id | grep sdb1
lrwxrwxrwx 1 root root 10 Sep 24 11:53 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 24 11:53 scsi-3600508b1001c99233458581ffb65cc88-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 24 11:53 scsi-SHP_LOGICAL_VOLUME_0014380280B60D0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 24 11:53 wwn-0x600508b1001c99233458581ffb65cc88-part1 -> ../../sdb1
hpprol2:~/Script # 

Regards
Philippe

Because there are several kinds of “id”, and different tools generate different ones. As long as they remain stable across reboots, it does not matter which one you use.
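To list every link of one kind that points at a given device, a small sketch follows. It also avoids a trap in `ls -l | grep sdb1`, which would match sdb10 or sdb11 too if such partitions existed, because it compares the resolved target exactly. links_for_dev is a hypothetical name; it defaults to /dev/disk/by-id but should work on any /dev/disk/by-* directory.

```shell
#!/bin/bash
# Hypothetical helper: print every link name in a /dev/disk/by-* directory
# that resolves to the given kernel device (e.g. sdb1). The comparison is
# exact, so sdb1 does not also match sdb10.
links_for_dev() {
    local dev="$1" dir="${2:-/dev/disk/by-id}" link
    for link in "$dir"/*; do
        # skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        if [ "$(basename "$(readlink -f "$link")")" = "$dev" ]; then
            basename "$link"
        fi
    done
}
```

For example, `links_for_dev sdb1` would list the four sdb1 entries shown above; `links_for_dev sdc /dev/disk/by-id` shows the candidates for the VM disk.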