A series of unfortunate & unwise events (data recovery & fixing boot)

Hello openSUSE forums!
I will apologize in advance for this being so long-winded but I want to be as thorough as possible for the sake of clarity & being guided to the correct place.

Hardware overview:
Ryzen 7 1700
32GB Corsair DDR4
EVGA GTX 980Ti
2x Corsair 500GB SATA SSD (Linux on one, the other as a data drive)
1x 1TB Crucial nVME (Windows 10 for games)
EDUP AX3000 PCIe WiFi/Bluetooth (Intel Wi-Fi 6E AX210 chipset)

For the majority of 2020 I distro hopped & tried to find a distro that worked for me. But I constantly ran into an ungainly problem: my computer would lock up, be completely unresponsive, and I would be forced to reboot. Would happen 3-5 times a week. My system is dual-booted between Linux & Windows 10 (separate physical drives) and would never happen on Windows; never had power issues of any kind. After hopping between various Pop!OS, Ubuntu, and Debian I began to think it was something with Debian-based distros. So I switched to Fedora at the advice of a friend (and Fedora advocate) only to find the issue was far worse (6-10 times/week). After swapping everything except the PSU, motherboard, and CPU, I began to suspect maybe it wasn’t hardware & decided to try openSUSE (I learned Linux using SLE in college, seemed like a good idea). My system was far, far more stable. Since making the switch in September it’s only locked up twice that I recall until this past week. One time it did display a curious error, as instead of locking up it went into an unprompted shutdown. It dropped X11 entirely, had a terminal-like view, and showed “Central Processor Hardware Error…” very briefly (with some other text I wasn’t able to read) before turning off. Turning it back on seemed like everything was fine again & I went on with my life. That was a couple weeks ago, no issues since.

And this is where my unwise decision making comes in.

I was attempting to run updates via zypper, but was getting a lot of errors. In doing some research I found a post on Stack Exchange that suggested clearing the /tmp and /var/tmp directories. I ran ls -l to see what was in there & sure enough there were lots of various things that looked pretty safe to delete. So I ran rm -rf * … except I was in my home directory (/home/jim), I hadn’t cd’d into /var/tmp - so I blew away my home folder.
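(In hindsight, passing rm an absolute path, or at least verifying the working directory before using a bare glob, avoids exactly this footgun. A minimal sketch, using a throwaway directory in place of /var/tmp:)

```shell
# Sketch: clear a temp directory without the wrong-cwd footgun.
# A throwaway directory stands in for /var/tmp here, purely for demonstration.
tmpdir=$(mktemp -d)
touch "$tmpdir/a" "$tmpdir/b"

# Option 1: pass the absolute path, so the current directory is irrelevant.
rm -rf "$tmpdir"/*

# Option 2: when using a bare glob, verify the working directory first.
touch "$tmpdir/c"
cd "$tmpdir" && [ "$PWD" = "$tmpdir" ] && rm -rf ./*

ls -A "$tmpdir"    # no output: the directory is empty again
```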

In getting some advice from my Discord Linux group I found TestDisk, installed it, and attempted to recover my deleted directories to a .dd file on my primary openSUSE drive. Well, my system lock-up issue reappeared during that time. While copying the image.dd file to my data drive the system locked up. I thought that maybe it was just the GUI that had locked up & I could let it sit for a few hours to let TestDisk finish. Eventually I didn’t really have a choice except to reboot the thing, and when I did, openSUSE predictably couldn’t load the GUI: it dropped me to a prompt for the root password & showed the root terminal prompt. I can also still boot into Windows 10 fine, so GRUB seems to be intact.

If I can fix the openSUSE boot somehow that would be nice, but not critical.

What I am concerned about is that my data drive, which I had formatted to exFAT so I could read it everywhere without a 4GB file limit, is now unreadable on anything. In fact, it shows up as a BTRFS drive. I tried running a couple BTRFS and exFAT utils to see if I could fsck it or anything but no good. exFAT utils doesn’t recognize it as an exFAT drive and BTRFS utils states it can’t find a BTRFS partition. I also tried fooling around with it in Windows 10 and macOS, and on Manjaro via my Pinebook Pro & a USB adapter: no luck anywhere.

TLDR: I broke my openSUSE install & corrupted my data drive. Would anyone be willing to provide some advice, or even point me to another post here on the forums, about how to tackle this? I would’ve looked already but I feel my situation is somewhat unique in that the exFAT drive now thinks it’s BTRFS. Also since it’s a SATA SSD I’ve no idea if I really can do anything at all.

And yes, I’ve already contacted AMD about seeing if I can have my CPU warrantied (I think in my 10+ years of IT work I’ve only ever seen two legitimate CPU failures, so I know it’s terribly uncommon).

Thank you for your time in reading this, and on any advice you can offer!

I think the first question I would ask is if you have access to a reliable working computer where you can plug in your data SATA SSD into and what kind of OS that computer has.

Also, can you elaborate on what you mean by
“While copying the image.dd file to my data drive the system locked up.”
?

I’m quite confused because no one I know would partition the /home drive with exFAT, and the default openSUSE partition table will set /home to be BTRFS, so it may even be possible that the data drive was already partitioned as BTRFS when you installed openSUSE in the first place, depending on what you did.

It is a long story (and I do not understand all of it), and a lot of comments on a lot of aspects could be given.

E.g., I understand you destroyed the contents of the home directory of jim (/home/jim). Well, then restore that from the latest backup. Maybe that backup wasn’t up-to-date to the last moment, but talking to jim might help him re-do the things he did after the backup was made.

Like @SJLPHI, I find your talk about the disks very confusing. Call those disks by their proper names (like sda, sdb) instead of vague indications like “data drive”, to give us an exact idea about your setup. Best is of course to post what the system has, e.g.

lsblk -f

which can be done from a rescue system.

Just an idea:

2x Corsair 500GB SATA SSD (Linux on one, the other as a data drive)

You said you were distro hopping and whatever Linux you tried you had the same lock-up issues. Is it possible you have a hardware problem with that SSD running the Linux distro? It seems even more likely as this recurred while seriously working on it with the TestDisk tool. That’s assuming your home folder is on the “linux” drive rather than on the data drive. A rescue system might be useful indeed.

@SJLPHI

the default openSUSE partition table will set /home to be BTRFS

Is that true? IIRC with 15.1 it was XFS? Did that change or is my memory wrong? (The latter wouldn’t surprise me…)

Yes, by default anything past Leap 42.0~15.0 will try to install BTRFS for root and home, but this is very easy to change during the installation process. I personally used EXT4 typically for both root and home.

Thank you both. I’ll start with the lsblk -f output (I’m abbreviating UUIDs, unless those are really needed; some forums remove posts for UUIDs, so just playing safe).

Drive -> Format -> UUID

sda
-sda1
-sda2 NTFS 442EF…92E2
sdb
-sdb1 vfat 49…DF
-sdb2 LVM2_member Tmq3…MBGf
-system-swap swap e9e0…5e96
-system-root BTRFS aa97…5216
sdc
-sdc1 BTRFS aa97…5216

An interesting point to note is that sdc1 has the same UUID as sdb2-system-root.
Also sdb2, which I think is my boot drive where openSUSE is installed, is only showing as 1G in lsblk.

@SJLPHI
My apologies on not being clear. My /home is not on an exFAT drive, it’s on the sdb drive, which is BTRFS. sdc was my “data drive” - I know this was my exFAT drive because I intentionally added exFAT support to openSUSE, I pulled it from here: https://software.opensuse.org/package/fuse-exfat
I used exFAT because it can be read/written by Windows 10, for the instances I needed to move files between the OSes.

I made an assumption about general knowledge of how TestDisk works. When TestDisk is run to do a file recovery it creates a .dd file named “image.dd” - when I initially ran it, it created that file on sdb (I don’t know which partition on sdb, I assume “sdb2-system-root” since it’s the only one showing a file system). I thought perhaps I shouldn’t have that on my primary drive, and decided to move it to sdc1 - which is when my system locked up (it didn’t lock up immediately, it ran for a few minutes first).

Yes, I do have another stable system I’ve already moved the drives to. It boots the same manner as my prior: GRUB with options to load openSUSE or Windows. I have an openSUSE live USB I can boot from, as well as a GPARTED live USB. I also have access to a Mac that I can connect the SATA drive to using a USB adapter.

@hcvv
I have made quite a few comments to myself in regards to the aspects of my spectacular failure. I will endeavor to get a proper backup strategy going in the near future.

@kasi042
You know, I can’t believe this, but I never considered a failing SSD. I’ve never personally had one fail on me before & the few I’ve dealt with at my job are usually part of some RAID setup, so it’s not like they cause system lock-ups. I was under the impression that if an SSD fails the whole thing just dies. I don’t really have any tools to test SSDs with; just things for HDDs like SeaTools, WD Data Lifeguard, and Drive Fitness Test. Do you have any recommendations?

At this point I’m concerned about recovering the data from sdc1, if at all possible. If I can fix the boot of openSUSE that’d be nice but that’s more of a secondary concern.

Looks like you cloned the system (root) partition to sdc. Depending on the software used, on a full clone the UUID is also copied.

Note that you should never put home on a FAT partition.

In addition you can’t boot with two partitions with the same UUID. It confuses the system.
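(If the clone on sdc1 were ever to be kept alongside the original, one way to resolve the collision - assuming a reasonably recent btrfs-progs and that the filesystem stays unmounted - is to regenerate its UUID from a rescue/live system. A sketch, using the device names from this thread:)

```shell
# Sketch: give the cloned btrfs on sdc1 a fresh UUID so it no longer
# collides with system-root. Run from a rescue system, filesystem unmounted.
lsblk -f /dev/sdb /dev/sdc   # confirm the two identical btrfs UUIDs first
btrfstune -u /dev/sdc1       # -u: assign a new random filesystem UUID
                             # (may ask for confirmation; do not interrupt it)
lsblk -f /dev/sdc            # verify the UUID changed
```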

I don’t think that’s what I was doing; at least that’s not what I was trying to do.

I know. And I didn’t, it was on my primary partition which was BTRFS. I was trying to copy the “image.dd” to my other drive.

But you saying that gets me thinking: what if I had run the wrong command in TestDisk & was restoring image.dd to my exFAT drive, not copying it? That would indeed explain why it’s got BTRFS instead of exFAT and why the partitions have the same UUID. It’s not hard to imagine that my inexperience with TestDisk & being in a hurry caused me to compound my errors.

I think I am starting to lean towards accepting the futility of trying to recover my data from a drive that’s possibly failing (as kasi042 suggested). Looking at another computer, it seems I have almost all my financial data, save a few things I can manually re-enter, so it’s not like I’m dead in the water. Perhaps accepting that & a hard lesson learned is the best takeaway I can get from this.

That is not what we want. To be able to help, we want to see exactly what you see. Thus copy/paste the complete action, including the first line with the prompt and the command, up to and including the last line with the new prompt. And that pasting must be done between CODE tags in the post. You get the CODE tags by clicking on the # button at the top of the post editor. See Using CODE tags Around your paste.

As it is now, it is unreadable by me and I will not even try to interpret it.

Sorry, no. I was just guessing, as that SSD seemed to be the only constant in your description. At least BTRFS offers some tools for repairing and restoring:

https://btrfs.wiki.kernel.org/index.php/Main_Page#Manual_pages

But I’m not sure if that is helpful for you.
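(For a basic health check of a SATA SSD, smartmontools is the usual starting point; a sketch, using the /dev/sdb name from your lsblk output - these commands need root, and which attributes exist depends on the vendor:)

```shell
# Sketch: basic SMART health check for a SATA SSD with smartmontools.
# /dev/sdb is the suspect Linux drive from this thread; run as root.
smartctl -H /dev/sdb          # overall health self-assessment (PASSED/FAILED)
smartctl -A /dev/sdb          # vendor attributes: look at reallocated-sector
                              # counts, wear indicators, and error counters
smartctl -t short /dev/sdb    # kick off a short self-test (about 2 minutes)
smartctl -l selftest /dev/sdb # read the self-test log afterwards
```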

@kasi042
Thanks for the advice. I might try the tools from the “Restore” wiki page (https://btrfs.wiki.kernel.org/index.php/Restore). I appreciate you for the idea & effort.
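(For the record, a typical btrfs restore invocation reads the damaged filesystem without writing to it and copies recovered files elsewhere; /mnt/rescue below is a hypothetical mount point on a separate, healthy disk with enough free space:)

```shell
# Sketch: pull files off an unmountable btrfs without modifying it.
# /dev/sdc1 is the damaged filesystem from this thread; /mnt/rescue is an
# assumed destination on another, healthy disk.
btrfs restore -D /dev/sdc1 /mnt/rescue   # -D: dry run, list what would be recovered
btrfs restore -v /dev/sdc1 /mnt/rescue   # actually copy the files out, verbosely
```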

@hcvv
I believe this is what you’re looking for.


NAME            FSTYPE      FSVER LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda                                                                                           
├─sda1                                                                                        
└─sda2          ntfs                    442EF6A92EF692E2                                      
sdb                                                                                           
├─sdb1          vfat                    497C-0EDF                               466.4M     7% /boot/efi
└─sdb2          LVM2_member             Tmq3w4-HivN-Ppll-bimt-3ldb-Ln1d-ccMBGf                
  ├─system-swap swap                    e9e0d047-3b10-4c6d-a041-98c77e125e96                  [SWAP]
  └─system-root btrfs                   aa97081b-0a4f-4e11-9125-2a6a940c5216        1G   100% /
sdc                                                                                           
└─sdc1          btrfs                   aa97081b-0a4f-4e11-9125-2a6a940c5216                  


Let me know if there is something I have failed to include.

Looks like you should be able to tell the BIOS to boot from sda1 (Windows boot loader) and get Windows back.
I suspect that GRUB points to sdb1 and that may be corrupted.

To begin with, I asked " including the first line with the prompt and the command". That is missing!

I assume you can interpret a lot of what we see here yourself.

  • sda is partitioned in sda1 and sda2. It is unclear what sda1 is. sda2 has an NTFS file system on it.
  • sdb is partitioned in sdb1 and sdb2. sdb1 is an EFI partition, thus it looks as if this is a boot disk. sdb2 has two logical volumes, a SWAP space and a Btrfs file system, thus this seems to contain an installed Linux (probably openSUSE).
  • sdc is also partitioned and has one partition. That partition has a copy of the Btrfs file system on the Logical Volume system-root. Very strange. I do not know what (or why) you did that, but it looks as if you made a “clone” of your openSUSE system partition here.

What all should be on those file systems is of course only something you can know.
In any case, I hope that from now on you can talk about e.g. sdc1 instead of a vague “data drive”.

Oh, and maybe an

fdisk -l

helps also.

Sorry, I was in a hurry (my wife invited me for dinner), the command should be

fdisk -l

I will see if I can avoid repeating a mistake this time.


localhost:~ # fdisk -l

Disk /dev/sda: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000MX500SSD4 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 74A5CCBC-A2E4-4081-978C-FAAB031DF442

Device     Start        End    Sectors   Size Type
/dev/sda1   2048      34815      32768    16M Microsoft reserved
/dev/sda2  34816 1953523711 1953488896 931.5G Microsoft basic data


Disk /dev/sdb: 447.13 GiB, 480103981056 bytes, 937703088 sectors
Disk model: Corsair Force LE
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7C3CD627-EEDA-4926-847D-E843C2B22744

Device       Start       End   Sectors   Size Type
/dev/sdb1     2048   1026047   1024000   500M EFI System
/dev/sdb2  1026048 937703054 936677007 446.7G Linux LVM


Disk /dev/mapper/system-swap: 31.29 GiB, 33587986432 bytes, 65601536 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/system-root: 415.37 GiB, 445988732928 bytes, 871071744 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdc: 447.13 GiB, 480103981056 bytes, 937703088 sectors
Disk model: Corsair Force LE        
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 33553920 bytes
Disklabel type: gpt
Disk identifier: E2308DF7-ABBC-43C8-A924-039B2F560B00

Device     Start       End   Sectors   Size Type
/dev/sdc1   2048 937701375 937699328 447.1G Microsoft basic data


I think this confirms what I said earlier.

  • sda has a small “Microsoft reserved” partition and the rest is occupied with an NTFS file system. This might be a MS Windows installation (but only you can confirm this).
  • sdb has a 500 MB EFI partition and the rest of the disk is used by LVM. As we have seen above, the volume group has a Swap and a Btrfs LV, thus a Linux system.
  • sdc has one partition. The type is “Microsoft basic data”, but the file system UUID is the same as that of the Btrfs file system on the system-root LV. Thus this is probably a byte-by-byte copy (many people call that a “clone”) of the Btrfs file system. As the Btrfs is 415.37 GiB and sdc1 is 447.1G, there is some space left inside the partition.

Now we have some insight into what you have. And I hope you also have some insight now.
Can you please describe now what the problem is?

I tried to re-read your OP. If I understand correctly that sdc1 had an exFAT file system with data, then it is now overwritten with a clone of your Btrfs file system. I have no idea how you did that. But if there were files on that exFAT file system, they are gone now. Only your backups will have them.
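(One long shot, in case anything survives in the roughly 30 GiB of sdc1 that lies past the 415.37 GiB clone: PhotoRec, TestDisk’s companion carver, scans raw sectors and ignores the filesystem entirely. It runs interactively; the destination below is a hypothetical directory on another, healthy disk.)

```shell
# Sketch: carve raw file remnants from sdc1, ignoring the overwritten
# filesystem. PhotoRec is interactive; /d presets the output directory,
# which must live on a different, healthy disk.
photorec /log /d /mnt/rescue/recovered /dev/sdc1
```

Recovered files come back with generated names and no directory structure, so this only helps for content that is recognizable on its own.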