Dual boot installation with Windows 10 repeatedly fails after approximately one week

Hello, I am dual booting Leap 15 with Windows 10 Pro. I have a Ryzen 3 2200G CPU and an Asus Prime A320M-K motherboard (with the latest BIOS). Both operating systems (and the EFI partition) are on an M.2 drive. Because I’m using the integrated CPU graphics, I had to upgrade the kernel to 4.18. Secure Boot is enabled in the BIOS (the default), and I am booting through Grub2 in EFI mode. The Windows 10 installation has ‘Fast Startup’ disabled.

The first installation about a month ago went fine. I booted into Suse a few times but mostly used Windows. After about a week, I tried to boot into Suse again. The Grub menu came up OK; I selected Suse, about 4-5 messages appeared (as normal), but then the machine hung. I let it run, and after approximately 2-3 minutes a huge number of messages scrolled past. I am pretty certain that the first one referred to the ‘nvme’ module, and the implication of the rest was that the M.2 SSD was now unreadable. Once the messages settled down and it was clear the machine wasn’t going to boot, I rebooted (with a hardware reset). This time Grub didn’t load and, scarily, I went straight into the BIOS screen with no boot options available at all (apart from the CD drive). I later found out (by luck) that switching the mains power off and then on again seemed to reset the motherboard, and the BIOS was able to detect the SSD again. However, the ‘opensuse’ EFI boot option in the BIOS did not bring up the Grub menu but went straight into Windows (as did the Windows Boot Manager option). The Windows 10 installation seemed to be fine.

I then tried booting the Suse installation through the installation media, but got the message “Unable to boot from this partition” (or something like that). I eventually decided to reinstall Leap (which is actually pretty quick on an M.2 drive). I did that and everything was fine - for about another week, and then the same thing happened. And then again a third time, which brings me up to date.

The boot log messages don’t seem to be stored in /var/boot/log - which is consistent with the M.2 drive becoming (temporarily) unusable during boot.

If you’ve read this far, thanks! Two questions:

  • What diagnostics could I use to try to trace this problem? I have no idea where to even start. It goes without saying that this might not be an openSUSE problem but something to do with Windows, the motherboard or the BIOS.
  • More specifically, could Secure Boot be the problem? Is there a periodic (weekly?) scan by Windows 10 which might be messing with either the BIOS or the EFI partition? Obviously I can test by disabling Secure Boot, reinstalling Leap and then waiting a week.
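On the Secure Boot question, it may help to confirm what the firmware actually reports before and after changing the BIOS setting. Below is a minimal Python sketch, assuming Linux was booted in EFI mode with efivarfs mounted at its usual location, /sys/firmware/efi/efivars; it reads the standard `SecureBoot` UEFI variable. The function names are made up for illustration. If `mokutil` is installed, `mokutil --sb-state` gives the same answer.

```python
from pathlib import Path
from typing import Optional

# SecureBoot is a standard UEFI variable under the EFI global-variable GUID.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_efivar_bool(raw: bytes) -> bool:
    """Decode a one-byte boolean efivar as exposed by efivarfs.

    efivarfs prefixes the variable data with a 4-byte attributes field,
    so the SecureBoot value (1 = enabled, 0 = disabled) is at offset 4.
    """
    if len(raw) < 5:
        raise ValueError(f"unexpected efivar size: {len(raw)} bytes")
    return raw[4] == 1


def secure_boot_enabled(path: Path = SECURE_BOOT_VAR) -> Optional[bool]:
    """Return True/False, or None if the variable does not exist
    (for example when the system was not booted in EFI mode)."""
    try:
        return parse_efivar_bool(path.read_bytes())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    try:
        state = secure_boot_enabled()
    except PermissionError:
        print("Permission denied reading efivars; try again as root")
    else:
        if state is None:
            print("No SecureBoot variable found (not booted in EFI mode?)")
        else:
            print("Secure Boot is", "enabled" if state else "disabled")
```

Run it from Leap (or the rescue system) after each BIOS change; a `None` result means the variable wasn’t found at all.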

Any clues or pointers would be appreciated.

Do you get a GRUB menu?

We need more detail than “something like that”.

What is the M.2 drive?

M.2 is the connector, like SATA or IDE.

You could boot from the install medium, and check whether the openSUSE partitions are OK.
Also, did you by any chance install Windows updates? Some of these are known to bork GRUB.

I know what M.2 is; I meant the model/brand.

Maybe try EXT4 rather than BTRFS.

Yes. The problem starts at the next stage, after selecting the Leap boot option.

It’s a 256 GB Western Digital Black NVMe PCIe SSD.

As I mentioned, booting from the install medium just doesn’t seem to work. I think the message is actually “Sorry, unable to boot” (not what I put above).

No Windows updates installed during the week before it happened the latest time. But Windows Defender will have run scans.
The Grub menu is OK. The problem starts immediately after Grub.

Since M.2-style drives are very fast, maybe a timing issue is causing file system corruption; that’s why I suggested trying ext4 instead of BTRFS. Boot from an external Linux drive or rescue media and see if you can mount the bad partition. Let us know the errors.

OK - I’ve been able to mount the Leap partition using the rescue option on the installation media, reinstalled Grub, and now it’s working again. I had tried to do this the first time I had a problem, but failed - however I may have missed out one of the steps back then (or something else might have been in play).
I’ve also disabled Secure Boot in the BIOS. In the meantime I will try to reproduce the problem by running Windows scans or anything else I can think of, and/or just wait a week…

Thanks. I will keep this up my sleeve to try if (when) it happens again. Can I ask though, why would ext4 be less likely to cause a timing problem?

Different code, different timings. Also, some configuration may need tweaking, since things are normally still set up assuming spinning rust.

BTW, if you managed to mount the partition, then it has not been corrupted. It has been reported that some hardware seems to like only Windows and may change the UEFI boot entries stored in flash, so check that too if it happens again.
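To check whether the firmware has changed its boot entries, `efibootmgr` (run from Leap or from the rescue system) lists the entries stored in the firmware together with the BootOrder. Below is a rough Python sketch, assuming efibootmgr’s usual plain-text output (`BootOrder: ...` and `BootXXXX* label` lines). The helper names and the "opensuse" label check are hypothetical, not part of any openSUSE tool. It only works when booted in EFI mode and may need root.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List, Tuple

# Entry lines look like "Boot0001* opensuse-secureboot" ('*' = active);
# some efibootmgr versions append the device path after a tab.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


@dataclass
class BootEntry:
    number: str   # four hex digits, e.g. "0001"
    active: bool  # True if efibootmgr printed '*' after the number
    label: str    # description only; any tab-separated device path is dropped


def parse_efibootmgr(text: str) -> Tuple[List[str], List[BootEntry]]:
    """Return (BootOrder, entries) from plain efibootmgr output."""
    order: List[str] = []
    entries: List[BootEntry] = []
    for line in text.splitlines():
        if line.startswith("BootOrder:"):
            order = [n.strip() for n in line.split(":", 1)[1].split(",") if n.strip()]
            continue
        match = ENTRY_RE.match(line)
        if match:
            label = match.group(3).split("\t", 1)[0].strip()
            entries.append(BootEntry(match.group(1).upper(), match.group(2) == "*", label))
    return order, entries


def main() -> None:
    """Print the firmware's boot entries and warn if no openSUSE entry is left."""
    try:
        out = subprocess.run(
            ["efibootmgr"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not run efibootmgr: {exc}")
        return
    order, entries = parse_efibootmgr(out)
    print("BootOrder:", ",".join(order))
    for entry in entries:
        flag = "*" if entry.active else " "
        print(f"Boot{entry.number}{flag} {entry.label}")
    # Hypothetical check: assumes the openSUSE entry's label contains "opensuse".
    if not any("opensuse" in entry.label.lower() for entry in entries):
        print("No openSUSE entry found; the firmware may have removed it.")


if __name__ == "__main__":
    main()
```

Running it once while things work and again after the next failure would show whether the entries or their order changed. If the openSUSE entry is still listed but boots straight into Windows, as happened here, `efibootmgr -v` shows the full device path each entry points to.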