Grub disordering in Leap 15.6 keeps recurring?

non_space · October 14, 2023, 3:58pm

Unfortunately, or “luck” . . . running those commands did not rally the grub ordering back into cosmic alignment.

What I did:
> 
>    mv /etc/modprobe.d/10-ahci-scsi.conf ~/  # disable previous fix
>    echo sd_mod > /etc/modules-load.d/sd_mod.conf
>    dracut -f --regenerate-all
>    lsinitrd | less   # check for /etc/modules-load.d/sd_mod.conf

The “lsinitrd | less” command showed over a thousand lines of data . . . I could not see or find the “sd_mod.conf” data in there, although there were some “Oct 14” lines.

This suggested data seems to bring “file or directory not found” ?? So, either it’s not there, or there is some data corruption in the posted information??

And, just for good measure, now Leap is not reviving from suspend . . . the gift that keeps on giving.

hui · October 14, 2023, 4:00pm

The correct path is simply /dev/disk/by-uuid/ as already mentioned by hcvv in his last comment…

non_space · October 14, 2023, 4:25pm

OK . . . seems like there are UUIDs listed.

ls /dev/disk/by-uuid/
01988475-fd31-486a-a258-ae01a153a885  67E3-17ED
11b8a168-b278-3d38-8df7-daee4fec3d35  8037621e-c63c-3884-bfb4-729282cf76fd
15f8192a-a9d0-3409-a5b9-084fc913ca7e  907551ab-87d2-41c0-9b56-6e68d9947d7e
17bcb813-d05e-4cc3-bcdf-16d283156780  929e9949-2d18-424c-be4a-80166827335b
2779207b-54a2-4b94-a7df-75edd1de9831  93f0ad8c-baed-3a31-8811-56b4a5cc6765
3011fcb2-1618-4d9a-975a-6426eb7e0f8e  9f39fc44-d590-4a43-9544-56ad458a9e6e
3a958b85-2be4-36cb-a8bc-6d470790d9ff  a2775dc9-eb23-316b-a4d8-7d4361ba2dad
4815c1b0-16a2-3b33-b224-1be9a3e9d181  ab58c2d1-47b2-4b63-9b1c-b86c4f182c8e
4b916772-18f6-4b7c-b47c-eb0fa3796eff  adb56592-c260-4620-b298-be3e012b5e57
50f7f65f-5963-49a9-b091-995afbf66e26  d3f92ffb-ed1e-4f91-a030-97d5b608e2fa
598d1ebe-6ad9-4491-b95a-7540208c28e3  d4853378-2816-4678-8308-1f245d9123ac
6073161c-00c1-3827-9f81-000472e23582  E592-ECD3
632a31b3-7430-3c28-a992-c92271adfa58  ee616cc5-97f5-3568-8841-0fdd4585518e
6416-071B                             ee6d03c4-27ea-4218-8b30-36329d26cdd0
64c5dba2-bacd-4459-9904-6bda0fbe24e7

My problem is that even running the “grub2-mkconfig xxxx” command is showing OSs that are in the same disk, as being in another disk . . . for the most part, loss of order.

mchnz · October 14, 2023, 6:18pm

That does not appear to be the case. If I boot TW with sd_mod loaded early, the /dev/sd* assignment is completely deterministic, and always remains the same for all PCs in the house. If I leave it to the new default, the ordering is effectively random, where /dev/sda can be mapped to a different physical drive at each boot. For example, with the new default, sometimes /dev/sda is mapped to my home drive, sometimes it’s mapped to my OS drive, sometimes it’s mapped to my online-backup drive, and so forth.

So, while the BIOS may decide an initial ordering, loading sd_mod late will shuffle whatever the BIOS has decided (this could perhaps be the result of interacting with the BIOS happening later than before). This behaviour is true for every desktop PC I have (three PCs, with MB vintages ranging from 10 years old to last year), but non-desktops may well behave in their own way.

karlmistelberger · October 14, 2023, 6:52pm

From: Device Path Protocol

“This section contains the definition of the device path protocol and the information needed to construct and manage device paths in the UEFI environment. A device path is constructed and used by the firmware to convey the location of important devices, such as the boot device and console, consistent with the software-visible topology of the system.”

The kernel reads the device paths and processes them asynchronously: SCSI Interfaces Guide — The Linux Kernel documentation

erlangen:~ # lsscsi -v
[0:0:0:0]    disk    ATA      CT2000BX500SSD1  030   /dev/sda 
  dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/pci0000:00/0000:00:02.1/0000:01:00.1/ata1/host0/target0:0:0/0:0:0:0]
[1:0:0:0]    cd/dvd  PIONEER  DVD-RW  DVR-221  1.00  /dev/sr0 
  dir: /sys/bus/scsi/devices/1:0:0:0  [/sys/devices/pci0000:00/0000:00:02.1/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0]
[12:0:0:0]   disk    ST8000VN 004-2M2101       0     /dev/sdb 
  dir: /sys/bus/scsi/devices/12:0:0:0  [/sys/devices/pci0000:00/0000:00:02.1/0000:01:00.0/usb2/2-2/2-2.1/2-2.1:1.0/host12/target12:0:0/12:0:0:0]
[N:0:4:1]    dsk/nvm Samsung SSD 970 EVO Plus 2TB__1            /dev/nvme0n1
  dir: /sys/class/nvme/nvme0/nvme0n1  [/sys/devices/pci0000:00/0000:00:02.1/0000:01:00.2/0000:02:08.0/0000:09:00.0/nvme/nvme0/nvme0n1]
[N:1:4:1]    dsk/nvm Samsung SSD 970 EVO Plus 2TB__1            /dev/nvme1n1
  dir: /sys/class/nvme/nvme1/nvme1n1  [/sys/devices/pci0000:00/0000:00:02.2/0000:0a:00.0/nvme/nvme1/nvme1n1]
erlangen:~ #

mchnz · October 14, 2023, 7:07pm

non_space:

> Unfortunately, or “luck” . . . running those commands did not rally the grub ordering back into cosmic alignment.
>
> What I did:
>
>    mv /etc/modprobe.d/10-ahci-scsi.conf ~/  # disable previous fix
>    echo sd_mod > /etc/modules-load.d/sd_mod.conf
>    dracut -f --regenerate-all
>    lsinitrd | less   # check for /etc/modules-load.d/sd_mod.conf
>
> The “lsinitrd | less” command showed over a thousand lines of data . . . I could not see or find the “sd_mod.conf” data in there, although there were some “Oct 14” lines.

Less has a search command, but you could also just do (as root or by using sudo):

sudo lsinitrd | grep sd_mod
-rw-r--r--   1 root     root            7 Oct  9 21:49 etc/modules-load.d/sd_mod.conf
-rw-r--r--   1 root     root        55047 Oct  9 00:00 usr/lib/modules/6.5.6-1-default/kernel/drivers/scsi/sd_mod.ko.zst
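If you want a yes/no answer rather than eyeballing grep output, something along these lines might do. It is only a sketch: the function name `check_sd_mod` is made up for illustration, and it assumes the listing has the same layout as the lsinitrd output shown above.

```shell
# Report whether an lsinitrd-style listing (read from stdin) contains both
# the modules-load.d entry and the sd_mod kernel module itself.
# check_sd_mod is a hypothetical helper name, not part of dracut.
check_sd_mod() {
    listing=$(cat)
    rc=0
    if printf '%s\n' "$listing" | grep -q 'etc/modules-load\.d/sd_mod\.conf'; then
        echo "config: present"
    else
        echo "config: MISSING"
        rc=1
    fi
    # The module may be compressed (.ko.zst, .ko.xz) or plain (.ko).
    if printf '%s\n' "$listing" | grep -Eq '/sd_mod\.ko(\.[a-z0-9]+)?$'; then
        echo "module: present"
    else
        echo "module: MISSING"
        rc=1
    fi
    return $rc
}

# Usage (as root): lsinitrd | check_sd_mod
```

If the config line is reported missing, the initrd was probably not regenerated after the file was created.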

Perhaps it would help to get a better picture of what’s going on. Maybe there’s more than one thing broken and that’s confusing the issue. The output from the following commands would show your disk layout and might help people here figure it out:

% ls -l /dev/disk/by-uuid/
% cat /etc/fstab
% dmesg |grep 'Command line'
% lsblk

As others have said, the best approach is to use UUIDs in /etc/fstab and in /boot/grub2/grub.cfg and never use /dev/sd* as a permanent reference. That would use the OS in the way SUSE intended. The output from the above should help sort that out.
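A quick way to spot leftover raw device names is to grep for them. The sketch below is an assumption-laden example, not an official tool: `find_raw_device_refs` is a name invented here, and it only looks at the first field of fstab-style lines.

```shell
# Print, with line numbers, the non-comment lines of an fstab-style file
# whose first field is a raw /dev/sd* name. Defaults to /etc/fstab.
# find_raw_device_refs is a hypothetical helper name.
find_raw_device_refs() {
    grep -nE '^[[:space:]]*/dev/sd[a-z]+' "${1:-/etc/fstab}"
}

# Usage:
#   find_raw_device_refs /etc/fstab || echo "no raw /dev/sd* entries"
# A similar check for the boot menu:
#   grep -n 'root=/dev/sd' /boot/grub2/grub.cfg
```

No output (and a non-zero exit status) means every entry already uses UUID=, LABEL= or some other persistent name.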

You could also use labels, which is also an intended solution, but you’d need to learn how to do that, and perhaps that’s best done after things are working (if at all).

On the other hand, if you’re impatient, and think your TW/Leap switch has potentially messed things up, perhaps you could just proceed with a fresh install and start with a clean slate.

mchnz · October 14, 2023, 7:37pm

If, by loss of order, you mean /dev/sdX has become /dev/sdY, that’s totally normal. Now that sd_mod is a module, the assignment of /dev/sd* to any physical disk will vary from one boot to the next because the mapping happens asynchronously; it just depends on who wins the race on this particular boot.

So all permanent references to disk partitions must now use something other than /dev/sd*. Each UUID is a unique, permanent ID for the filesystem on a partition. The directory /dev/disk/by-uuid/ contains symbolic links from those permanent UUIDs to the /dev/sd* names that have been assigned at this boot (and might be different from last boot), for example:

kosmos1:~ # ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 Oct 15 07:00 22a1937f-9205-44c5-bcd2-24ae9ed4a774 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Oct 15 07:00 453caaf5-5fed-43d8-b823-58f4ded3ab63 -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 15 07:00 478d16c2-5627-4899-af6d-911b6ec687c8 -> ../../sdd2
lrwxrwxrwx 1 root root 15 Oct 15 07:00 5bc9d2d6-2212-4838-aa9e-a999efda1461 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 10 Oct 15 07:00 8B2D-89F3 -> ../../sda1
lrwxrwxrwx 1 root root 15 Oct 15 07:00 A0B7-6A4E -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 10 Oct 15 07:00 a83d704e-ff90-45f2-903e-63b68a24eabc -> ../../sdb1
lrwxrwxrwx 1 root root 15 Oct 15 07:00 c94208de-d084-470a-bd4a-0bf8a8518154 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 10 Oct 15 07:00 df26a260-6198-44f8-b9ea-94968bf7ead5 -> ../../sda3
lrwxrwxrwx 1 root root 10 Oct 15 07:00 e4123e0c-f1a4-4b97-a7f4-a306c595c7c9 -> ../../sdc1

All permanent references in places such as /etc/fstab and /boot/grub2/grub.cfg must now use indirect references to physical partitions and disks, such as UUIDs (or labels).
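To see the current UUID-to-device mapping in a compact form, a loop over the symlinks with readlink works. This is just a sketch; `show_links` is a made-up name, and it assumes a readlink that supports -f (GNU coreutils does).

```shell
# Print "name -> resolved device" for every symlink in a directory such as
# /dev/disk/by-uuid (the default). show_links is a hypothetical helper name.
show_links() {
    dir=${1:-/dev/disk/by-uuid}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Usage:
#   show_links                      # UUIDs
#   show_links /dev/disk/by-label   # labels
```

Running it on two different boots and comparing the output shows which /dev/sd* names moved while the UUIDs stayed put.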

Additionally, entire disks don’t have UUIDs, so they must be identified by other means via the directories under /dev/disk, for example:

kosmos1:~ # ls -l /dev/disk/
total 0
drwxr-xr-x 2 root root  400 Oct 15 07:00 by-diskseq
drwxr-xr-x 2 root root 1000 Oct 15 07:00 by-id
drwxr-xr-x 2 root root  160 Oct 15 07:00 by-label
drwxr-xr-x 2 root root   80 Oct 15 07:00 by-partlabel
drwxr-xr-x 2 root root  240 Oct 15 07:00 by-partuuid
drwxr-xr-x 2 root root  680 Oct 15 07:00 by-path
drwxr-xr-x 2 root root  240 Oct 15 07:00 by-uuid

For example, I have a script that idles a couple of drives at boot. I can’t use /dev/sd* because it’s not constant and will change from one boot to the next, so I have to refer to the drives indirectly:

backup_drive=/dev/disk/by-id/ata-ST3000DM008-2DM166_Z5051V50
spare_drive=/dev/disk/by-id/ata-ST4000DM004-2CV104_ZTT0XJAW
ls -l $spare_drive $backup_drive
lrwxrwxrwx 1 root root 9 Oct 15 07:00 /dev/disk/by-id/ata-ST3000DM008-2DM166_Z5051V50 -> ../../sdd
lrwxrwxrwx 1 root root 9 Oct 15 07:00 /dev/disk/by-id/ata-ST4000DM004-2CV104_ZTT0XJAW -> ../../sdc
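In a script like that it can be worth checking that the drive is actually present before acting on it, since a by-id link simply does not exist when the drive is pulled. A minimal sketch, with `resolve_drive` being a name invented for this example:

```shell
# Resolve a persistent /dev/disk/by-id path to whatever device node it points
# at on this boot, and fail loudly if the drive is absent.
# resolve_drive is a hypothetical helper name.
resolve_drive() {
    if [ ! -e "$1" ]; then
        echo "drive not present: $1" >&2
        return 1
    fi
    readlink -f "$1"
}

# Usage:
#   backup_drive=/dev/disk/by-id/ata-ST3000DM008-2DM166_Z5051V50
#   dev=$(resolve_drive "$backup_drive") && echo "backup drive is currently $dev"
```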

In other posts I have described work-arounds that allow my desktops to retain a deterministic and fixed /dev/sd*. Apparently they might not work for all systems, but they do work on all my home PCs. Presumably building a custom kernel with a built-in driver would also revert to the previous, more stable situation.

mrmazda · October 14, 2023, 11:00pm

deleted dup

non_space · October 14, 2023, 11:37pm

Tried to post this several hours back and it apparently didn’t post?

Appreciate the reply on the thread . . . there has been a “development” in that I pulled one of the four drives, the one that only has OSX 10.15 with their “special” APFS formatting . . . and then running “grub2-mkconfigxxxxx” and that showed the 7 linux options in their proper “sdx” locations and I was able to boot several of them to the correct system.

However, within each of the three I booted GParted is still showing that for some the reversal of sdb and sdc partitions is still happening.

I’ll have to fiddle with it a bit more to see if it “matters” to the operation of each of the systems.

At this point, yes, I am “impatient” . . . I have prepped a Slow Roll installer usb if I completely blow my top on it . . . .

But, at the very least, grub does seem to be showing the / in the right places . . . I’ll have to monitor the situation to see if each of the systems now boots. I would like to be able to have the option to boot the 4th

non_space · October 14, 2023, 11:44pm

OK . . . so what does do? No worries about bringing in unwanted upgrades?

I guess that would be one question, is “dup-ping” something that we really need to do to have a working system??? And would being a “dup resistor” be the best strategy to the path to joyful and happy living???

mchnz · October 15, 2023, 12:18am

Provided security is of little concern, a working TW system can often be left for months without an update. There are definitely some TW users that take this approach. Others don’t leave it that long, but only dup when they have the time to deal with the fallout (which is often none, but you never know).

There are others that lock down the kernel from updates and otherwise dup everything else quite frequently. This is an especially good approach if an Nvidia driver is involved.
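For anyone who has not locked a package before, zypper has addlock and removelock for this. The sketch below only prints the commands unless APPLY=1 is set; the `run` wrapper and the function names are made up here, and kernel-default is an assumption about which kernel package is installed (check with rpm -qa 'kernel-*').

```shell
# Dry-run by default: print each command; set APPLY=1 (as root) to execute it.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf 'would run:'
        printf ' %s' "$@"
        printf '\n'
    fi
}

# Assumed package name; adjust to match the installed kernel flavour.
KERNEL_PKG=${KERNEL_PKG:-kernel-default}

lock_kernel()   { run zypper addlock "$KERNEL_PKG"; }     # hold the kernel back
unlock_kernel() { run zypper removelock "$KERNEL_PKG"; }  # allow updates again

# Usage:
#   lock_kernel            # preview
#   APPLY=1 lock_kernel    # really add the lock, then dup as usual
#   zypper locks           # list current locks
```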

If your system’s current problem was caused by the TW kernel being changed to be more like Leap/SUSE-Enterprise, you were going to trip over this one day, but locking down the kernel would have delayed it until you were prepared to spend some time on dealing with the fallout.

non_space · October 15, 2023, 12:31am

Well, the whole point of TW is that it is a “tumbling tumbleweed” . . . and up until a few years back, the “rolling” of TW was smooth and largely tidy. It seems like in the recent years there was some kind of “change” that warranted large “balloon” package upgrades, and included much more frequent breakages . . . .

I’ve tried to just “flow along” with TW without getting under the hood OR spending too much time making adjustments to “cope” with the volatility . . . but the last month of just trying to get TW to something that I could log into and use AND then the grub disordering problem . . . which transferred over to Leap . . . have made me consider not keeping up with the constant “total repackaging” philosophy that seems to be associated with using it.

I did finally get back into my Gecko Rolling edition, it doesn’t have quite the same huge number of tsunami upgrades . . . I got into it by selecting “advanced options” in grub and scrolling down to the lowest/oldest kernel option listed. So there is something that has changed behaviors in the newer kernels . . . that has been “time-consuming” to deal with.

I’ll try to look at your suggestions to post data back about the system, when I get back over to it. The fact that I can now at least get grub to recognize the right system in the right box . . . basic grub function again . . . is/was the main concern. I can run the machine with three drives . . . that just leaves a drive bay “empty.” sad.

mrmazda · October 15, 2023, 12:58am

Not keeping a rarely used HDD/SSD powered up is “green”, saving dead dinosaurs, nothing to be sad about.

non_space · October 15, 2023, 1:12am

LOL . . . that is very funny . . . good use of humor. Yeah, that’s what it is, it’s “green,” very very “green.”

karlmistelberger · October 15, 2023, 6:03am

Tumbleweed stopped tumbling years ago. It now rolls forward and improves every day. “Breakages” occur mostly for users trying hard to freeze their idiosyncratic system.

Stuff does happen: “Disturbing messages from infamous host erlangen - they call it a BUG” (#7 by karlmistelberger). However, they fix these annoyances readily.

There is no volatility. The Linux Kernel Monkey got it right: MagicPoint presentation foils. Configuration is no longer static. udev watches for events and readily adds devices.

Infamous host erlangen now has a Mobile rack for a 3.5" HDD:

erlangen:~ # grep /HDD /etc/fstab 
UUID=2260f160-cc05-47cc-9893-cc32c050177d  /HDD                    btrfs  user,noauto                   0  0
erlangen:~ #

Automount is straightforward:

erlangen:~ # systemctl cat HDD.automount 
# /etc/systemd/system/HDD.automount
[Automount] 
Where=/HDD
TimeoutIdleSec=1min

[Install]
RequiredBy=local-fs.target
erlangen:~ #

Configuring a Backup of the Forerunner® 735XT activities upon attaching the running watch is a few lines only:

erlangen:~ # grep /FR735 /etc/fstab 
LABEL=FR735                                /FR735                  vfat   user,noauto                   0  0
erlangen:~ #

systemd-service running upon attaching the watch:

erlangen:~ # systemctl cat FR735.service 
# /etc/systemd/system/FR735.service
[Unit]
Description=Get FR735 Activities
Requires=FR735.mount
After=FR735.mount

[Service]
ExecStart=/usr/bin/rsync -av /FR735/ /home/karl/Forerunner/

[Install]
WantedBy=FR735.mount

erlangen:~ #

Infamous host erlangen gets a daily zypper dist-upgrade. None of these ever caused a “breakage”.

Once the transition to UUIDs (or labels) is complete, you will never spend more time on configuration.

You can fix this problem.

By the way: Configuring grub with grub2-install --removable and moving the install drive between the different boxes is a great sanity check for both the configuration of the system and your understanding.

gogalthorp · October 15, 2023, 2:05pm

Right-click and select Properties to see which sdX is associated with a UUID.

non_space · October 15, 2023, 3:14pm

?? My recent thread “log in slowness” was something where the log in manager “broke” and the thread went on for weeks trying to figure out the problems on it. Life shattering? No. Just time consuming.

No time yet to mess with it to post the requested outputs, but my /etc/fstab’s all seem to have UUIDs. I still don’t think I’ve seen the definitive instruction on how to check or troubleshoot the changeover, which may already be “as it is” . . . could that be done??

OK, I’m all for achieving “sanity” in this world.

Thanks for this hint . . . where am I right-clicking to do this check?

gogalthorp · October 15, 2023, 5:38pm

In a file browser go to /dev/disk/by-uuid. You will see a list of all known UUIDs. Right-click on one and select Properties from the menu; the “points to” entry shows which sdX that UUID points to. Note that UUIDs are fixed to each partition, but the sdX names are assigned by the kernel in the order the drives are discovered. That order is not fixed; it depends on many things, such as whether an external drive is plugged in.

mrmazda · October 15, 2023, 6:28pm

I don’t like that external storage might be assigned ahead of internal, so I inhibit that possibility with a file in /etc/dracut.conf.d/ containing an omit_drivers+= line that includes usb_storage.
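For illustration, a drop-in along those lines could be created like this. The file name 99-omit-usb-storage.conf is made up, and the script writes to a scratch directory by default so that nothing changes by accident. Note that this is only sensible when the root filesystem is not on a USB drive.

```shell
# Write a dracut drop-in that keeps usb_storage out of the initrd.
# CONF_DIR defaults to a scratch directory; set CONF_DIR=/etc/dracut.conf.d
# (as root) to install it for real. The file name is a hypothetical choice.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
mkdir -p "$CONF_DIR"

# The spaces inside the quotes matter: dracut appends these lists together.
printf 'omit_drivers+=" usb_storage "\n' > "$CONF_DIR/99-omit-usb-storage.conf"

echo "wrote $CONF_DIR/99-omit-usb-storage.conf:"
cat "$CONF_DIR/99-omit-usb-storage.conf"

# After installing the real file, rebuild the initrds:
#   dracut -f --regenerate-all
```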

non_space · October 15, 2023, 10:20pm

et al:

I checked through most of the requested data, seems like right now in Leap that is checking out OK. Still have yet to go through and boot every system now in the three drives, but seems like grub2 can handle three drives, but seemed to fumble on the 4th drive formatted as Apple’s “APFS” method?? Or, the new “sd_mod” doesn’t recognize it?? As the 4 drives were allowing grub to properly boot the linux systems for months, without “disordering” itself . . . until recently.

At some point the “test” would be to add the 4th internal drive back in, or what, plug in an external usb drive and see if grub falls to pieces again??
The /dev/disk/by-uuid data is quite lengthy, but shows both the UUID AND the sdx of associated partition . . . not posting it to save on cyber space . . . “green savings.”

sudo lsinitrd | grep sd_mod
[sudo] password for root: 
-rw-r--r--   1 root     root            7 Oct 14 07:55 etc/modules-load.d/sd_mod.conf
-rw-r--r--   1 root     root        54588 Oct 13 03:28 lib/modules/6.5.7-lp155.2.ge060757-default/kernel/drivers/scsi/sd_mod.ko.zst

[quote="mchnz, post:26, topic:169714"]
`dmesg |grep 'Command line'`
[/quote]

dmesg |grep 'Command line'
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.7-lp155.2.ge060757-default root=UUID=3011fcb2-1618-4d9a-975a-6426eb7e0f8e mitigations=auto quiet
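The part of that line that matters for the ordering question is root=. A small sketch for pulling it out and checking whether it is a persistent identifier; the function names `root_arg` and `classify_root` are invented for this example, and the classification covers only the common forms.

```shell
# Print the root= argument from a kernel command line.
# Reads /proc/cmdline by default, or a string passed as $1.
root_arg() {
    cmdline=${1:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            root=*) printf '%s\n' "${word#root=}" ;;
        esac
    done
}

# Say whether a root= value survives /dev/sd* reordering.
classify_root() {
    case $1 in
        UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=*|/dev/disk/*|/dev/mapper/*) echo persistent ;;
        /dev/sd*) echo unstable ;;
        *) echo unknown ;;
    esac
}

# Usage:
#   classify_root "$(root_arg)"
```

With the command line above, root_arg yields the UUID=... value and classify_root reports it as persistent, so this particular boot entry does not depend on /dev/sd* ordering.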

[quote="mchnz, post:27, topic:169714"]
`kosmos1:~ # ls -l /dev/disk/`
[/quote]

# ls -l /dev/disk/
total 0
drwxr-xr-x 2 root root 4020 Oct 15 14:41 by-id
drwxr-xr-x 2 root root  240 Oct 15 14:41 by-label
drwxr-xr-x 2 root root  180 Oct 15 14:41 by-partlabel
drwxr-xr-x 2 root root  640 Oct 15 14:41 by-partuuid
drwxr-xr-x 2 root root 1400 Oct 15 14:41 by-path
drwxr-xr-x 2 root root  600 Oct 15 14:41 by-uuid