windows xp partition totally destroyed by xen/virtualization

topak123 · August 9, 2008, 9:25pm

I had a dual-boot pc with the following installed OSes:

linux (opensuse 11.0) installed on drive /dev/sda (external USB 2.0 drive)
winxp installed on drive /dev/sdb (internal SATA drive).

I had installed linux from the opensuse-11.0 KDE live-CD. On linux, my winxp NTFS partition is mounted on /mnt/windows.

I followed the tutorial in
How to install & configure Xen Virtualization in openSUSE 11.0 | SUSE & openSUSE
to configure a guest OS using the xen-based virtualization feature available in YAST2. Basically, my intention was to boot the winxp installation (mounted on /mnt/windows) as a guest OS from within the linux/xen host OS.

I booted my pc from the xen kernel and launched yast2. Then, in the yast2 virtualization section, I went about configuring a guest OS. I selected the option “I have a disk or disk image with an installed OS” and selected the appropriate OS in subsequent steps. Next, in the Virtual Disk option, I specified “file:/dev/sdb” and clicked ok. After this, whatever configuration steps YAST2 ran totally destroyed the winxp partition. YAST2 reported an error about failing to configure xen, and at the same time I noticed that the /mnt/windows mount had become invalid (/dev/sdb was now totally unmountable).

Now my pc just does not boot from winxp. On top of that the winxp drive cannot even be mounted as a data drive because the filesystem itself is damaged. On linux,

fdisk -l

still reports /dev/sdb as an “NTFS/HPFS” drive but the command

mount /dev/sdb

reports errors about failing to find ntfs signature and so on. The filesystem on the winxp drive is now invisible from any OS (linux or windows).

I have tried all means to recover the data from the winxp drive by using all of these methods:

Ran

fdisk /mbr

after booting from freedos 1.0 cd

Tried to reconstruct the boot sector by copying boot sector from working winxp drives, based on instructions in Understanding and Working with the MBR

but all have failed.

It appears that the YAST2 virtualization tool has severe bug(s) that can destroy the user’s windows OS. I have not yet filed a formal bug report at https://bugzilla.novell.com/enter_bug.cgi?product=openSUSE+11.0 since I wanted some useful information/feedback first. What I would like to know is what YAST2 did (and how) to damage the winxp partition.

topak123 · August 9, 2008, 9:43pm

By comparing the damaged winxp drive’s boot sector (first 512 bytes) with that of a good winxp drive, I verified that the partition table was still intact on the damaged winxp drive.
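The comparison described above can be made region by region instead of by eye. The following is a minimal sketch, assuming the classic 512-byte MBR layout (bootstrap code, a 4-byte disk signature at offset 440, four 16-byte partition entries from offset 446, and the 55aa boot signature at the end). The names `compare_mbr` and `read_sector0` are illustrative, not part of any existing tool.

```python
# Regions of a classic 512-byte MBR: (name, start offset, end offset).
MBR_REGIONS = [
    ("bootstrap code", 0, 440),
    ("disk signature", 440, 444),
    ("reserved", 444, 446),
    ("partition table", 446, 510),
    ("boot signature", 510, 512),
]


def compare_mbr(a: bytes, b: bytes) -> dict:
    """Return {region name: True if the region is identical in both sectors}."""
    if len(a) != 512 or len(b) != 512:
        raise ValueError("expected two 512-byte sectors")
    return {name: a[start:end] == b[start:end] for name, start, end in MBR_REGIONS}


def read_sector0(path: str) -> bytes:
    """Read the first 512 bytes of a device or image file (opened read-only)."""
    with open(path, "rb") as f:
        return f.read(512)
```

For example, `compare_mbr(read_sector0("/dev/sdb"), read_sector0("good-mbr.bin"))` would report which regions differ, where `good-mbr.bin` is a hypothetical dump of the working drive's first sector. Reading a raw device normally requires root.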

topak123 · August 10, 2008, 1:10am

In /var/log/xen/xend.log these messages can be found


VmError: Disk image does not exist: /dev/sdb1
[2008-08-06 12:03:43 3293] ERROR (XendDomainInfo:440) VM start failed

mingus725 · August 10, 2008, 1:28am

I’m sure you are extremely frustrated with this situation. My sympathies to you. However . . .

Your post seems (at least to me) to confuse somewhat the MBR with the partition table with the file system. The tests you ran are not conclusive. It is more likely that the problem is with the partition table (the ntfs signature error is an error from the table, not the file system). Everything may be recoverable. I strongly suggest you take a look at the tool TestDisk (google it; there are several supplementary howto’s on the web, and it’s included in the openSUSE photorec package) or possibly gpart (also in the repository). These tools are designed to recover lost partitions and to rebuild the partition table.

Unfortunately, you used very old ref material (2001) and DOS fdisk to try to repair the IPL (the MBR bootstrap code). That reference describes the IPL as being 446 bytes, which it was then. And DOS fdisk will indeed write 446 bytes. However, since FAT32 the IPL code has been 440 bytes; the remaining 6 bytes hold the disk signature, which is used for example by XP to store drive letters. Under some circumstances, the old fdisk will zero out the entire partition table; this is guaranteed to happen if the last 2 bytes of the MBR (the boot record signature, which validates the volume as bootable) have been somehow disturbed. So while DOS fdisk may not cause a problem recreating the IPL, there are definitely instances where it can, and the results can be disastrous.
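To illustrate the 440-versus-446-byte point, here is a minimal sketch of replacing only the bootstrap code in a copy of an MBR while leaving everything from offset 440 onward (disk signature, partition table, boot signature) untouched. It works on in-memory bytes only and writes nothing to disk. The function name `patch_bootstrap` is hypothetical.

```python
BOOTSTRAP_LEN = 440  # bytes 0..439 hold code; the disk signature starts at offset 440


def patch_bootstrap(target: bytes, donor: bytes) -> bytes:
    """Return a copy of `target` whose bootstrap code comes from `donor`.

    Bytes 440..511 of `target` are preserved unchanged.
    """
    if len(target) != 512 or len(donor) != 512:
        raise ValueError("expected two 512-byte sectors")
    return donor[:BOOTSTRAP_LEN] + target[BOOTSTRAP_LEN:]
```

A tool that copies 446 bytes instead would also overwrite the six bytes at offsets 440 to 445, which is the risk described above.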

The (maybe) good news is that everything else is probably on the disk exactly where it was. I assume your Windows was set up with the system and boot volumes being the same partition; it is not terribly difficult to determine the starting and ending sectors of that partition. And, unless something physically altered the disk, the filesystem table should be sitting where it was before. Even if it is damaged, once the system can boot into Windows, it may be recoverable.

Give those tools a try. No doubt others will have additional suggestions. All may not be lost.

mingus725 · August 10, 2008, 1:33am

When Xen (or anything else) consults the table to locate the partition and there is no record of it, indeed it “officially” does not exist. And it may not exist physically. On the other hand, it may. Xen cannot know more than what the table says. But, again, that doesn’t mean that it is not physically exactly where it was before. The key is to reconstruct the table.

topak123 · August 10, 2008, 2:14am

Thanks …

> Your post seems (at least to me) to confuse somewhat the MBR with the partition table with the file system.

Sorry if I gave that impression, but I do know that the partition table and the MBR are distinct entities. Thanks for the clarification anyway.

> The tests you ran are not conclusive. It is more likely that the problem is with the partition table (the ntfs signature error is an error from the table, not the file system). Everything may be recoverable. I strongly suggest you take a look at the tool TestDisk (google it; there are several supplementary howto’s on the web, and it’s included in the openSUSE photorec package) or possibly gpart (also in the repository). These tools are designed to recover lost partitions and to rebuild the partition table.

I will try those tools if possible.

> Unfortunately, you used very old ref material (2001) and DOS fdisk to try to repair the IPL (the MBR bootstrap code).

Actually I used freedos 1.0 (which is quite recent), and I even tried copying over the 512 bytes of the boot sector from a working identical drive, but that did not help either.

> Under some circumstances, the old fdisk will zero out the entire partition table;

That’s not applicable in my case, since I verified that the partition table of the damaged drive has exactly the same bytes as the one on a working identical winxp drive.

> this is guaranteed to happen if the last 2 bytes of the MBR (the boot record signature, which validates the volume as bootable) have been somehow disturbed.

These two bytes are also intact and have the value 55aa, which is why this whole scenario is so bizarre!

> So while DOS fdisk may not cause a problem recreating the IPL, there are definitely instances where it can, and the results can be disastrous.

As I have mentioned, I also overwrote the boot sector bytes directly (using the dd command) by copying the boot sector of a working identical winxp drive, but that did not help.

> The (maybe) good news is that everything else is probably on the disk exactly where it was.

Which is the assumption I made as well. Otherwise, if the data itself has changed, then it’s too late.

> I assume your Windows was set up with the system and boot volumes being the same partition;

Right, 1 single partition (simplest scenario).

> not terribly difficult to determine the starting and ending sectors of that partition. And, unless something physically altered the disk, the filesystem table should be sitting where it was before.

Yes, it still seems to be sitting where it is.

> Even if it is damaged, once the system can boot into Windows, it may be recoverable.

I won’t even worry about booting now. I would be happy if it can simply be mounted as a data drive.

> Give those tools a try. No doubt others will have additional suggestions. All may not be lost.

Thanks, I will try.

mingus725 · August 10, 2008, 2:26am

Identical geometries and table? And disk signature? - isn’t that unique to a disk?

topak123 · August 10, 2008, 3:12am

Yes, identical size (80GB) and the partition tables were identical as well. As for disk signature, should that prevent simple mounting of the disk?

topak123 · August 10, 2008, 5:29am

I used testdisk and was able to rebuild the boot sector. There was some progress since now when trying to mount the NTFS partition, the error message is


Record 0 has no FILE magic (0x0)
Failed to load $MFT: Input/output error
Failed to mount '/dev/sda1': Input/output error
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.
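The “Record 0 has no FILE magic (0x0)” message can be checked by hand. The sketch below assumes the standard NTFS boot sector layout (OEM ID “NTFS    ” at offset 3, bytes per sector at 0x0B, sectors per cluster at 0x0D, $MFT starting cluster at 0x30) and simply tests whether the first MFT record starts with the ASCII bytes FILE. The function names are illustrative, and `volume` is any seekable binary file object opened on the partition or on an image of it.

```python
import struct


def mft_offset(boot: bytes) -> int:
    """Byte offset of $MFT record 0, measured from the start of the NTFS volume."""
    if boot[3:11] != b"NTFS    ":
        raise ValueError("no NTFS OEM ID in boot sector")
    (bytes_per_sector,) = struct.unpack_from("<H", boot, 0x0B)
    sectors_per_cluster = boot[0x0D]  # assumed to be a plain count (1, 2, 4, 8, ...)
    (mft_lcn,) = struct.unpack_from("<Q", boot, 0x30)  # first cluster of $MFT
    return mft_lcn * sectors_per_cluster * bytes_per_sector


def record0_has_file_magic(volume, boot: bytes) -> bool:
    """True if the first MFT record begins with the 'FILE' signature."""
    volume.seek(mft_offset(boot))
    return volume.read(4) == b"FILE"
```

A result of False with all-zero bytes at that offset would be consistent with the start of the MFT having been overwritten, which is what the mount error suggests.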

lornix · August 10, 2008, 8:51am

topak wrote:

> Then, in yast2 virtualization section, I went about configuring a guest
> OS. I selected the option I have a disk or disk image with an installed
> OS and selected the appropriate OS in subsequent steps. Next, in the
> Virtual Disk option, I specified “file:/dev/sdb” and clicked ok.

Based on the above, this isn’t a bug… it did EXACTLY what you told it to
do.

/dev/sdb is a DRIVE
/dev/sdb1 is a PARTITION, containing an operating system.

You quite nicely told the Virtual Disk creation application that you wanted
to create a virtual disk named ‘/dev/sdb’.

Which it did.

Which ate your XP partition contents.

EVERYTHING in linux is a file. /dev/sdb is a file, /dev/sdb1 is a file
(it just starts further in on the sdb drive), hello.c is a file. And if
you have root permissions, you may WRITE to ANY file which has appropriate
permissions.

gpart may be able to recover your partition enough to get your data out.
It really depends on how much of the beginning of /dev/sdb1 was
overwritten.

--
L R Nix
lornix@lornix.com
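The drive-versus-partition distinction above, together with the `file:` prefix in the quoted disk specification, suggests a simple pre-flight check. Xen disk specifications distinguish `file:` (an image file) from `phy:` (a physical block device). The sketch below is a hypothetical check, not something YaST or Xen is known to perform; `check_disk_spec` and `is_block_device` are made-up names.

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """True if `path` exists and is a block device node such as /dev/sdb."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


def check_disk_spec(spec: str, is_block=is_block_device) -> str:
    """Return 'ok', or a warning/error string for a '<scheme>:<path>' disk spec."""
    scheme, sep, path = spec.partition(":")
    if not sep or not path:
        return "error: expected '<scheme>:<path>'"
    if scheme == "file" and is_block(path):
        return (f"warning: {path} is a block device; 'file:' is meant for "
                f"image files, consider 'phy:{path}'")
    if scheme == "phy" and not is_block(path):
        return f"warning: {path} is not a block device; 'phy:' expects one"
    return "ok"
```

On a machine where /dev/sdb is a real disk, `check_disk_spec("file:/dev/sdb")` would return the warning instead of passing the specification along silently.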

mingus725 · August 10, 2008, 5:34pm

@lornix -

Good catch. I missed that.

@topak -

With that MFT (Master File Table) error, it’s not looking good. But all is not lost just yet. First, just for clarification, I notice you’re using the term “boot sector” for the MBR. Technically speaking, that is correct; the MBR is the disk boot sector. To avoid confusion though, generally “boot sector” is used to refer to the PBR (the Partition Boot Sector) and MBR is used to refer to the disk boot sector. Anyway, from MS TechNet:

> The first record of the MFT describes the MFT itself, followed by the MFT mirror record. If the first MFT record is corrupted, the NTFS file system reads the second record to find the MFT mirror file, which is a copy of the first three records of the MFT. The locations of the data segments for both the MFT and the MFT mirror file are recorded in the Partition Boot Sector. A duplicate of the Partition Boot Sector is located at the end of the volume.

So, NTFS keeps a duplicate of the MFT header, within the PBR. And, Windows writes a duplicate of the PBR to the end of the volume when it is created (or changed). So the first recovery step to attempt would be to run chkdsk; I’m assuming you have tried that? Failing that, you could attempt to recover the PBR from its backup, which will restore the MFT header. If memory serves (it’s been years since I’ve dealt with this stuff), diskprobe is MS’s tool to do this. AFAIK, this is all MS offers for MFT repair; the next step would be reformat and recovery from backup. But there are other tools which will do this, and even attempt to repair the MFT itself. TestDisk is one of them. See here: Advanced NTFS Boot and MFT Repair. That page cites several alternatives which may work if TestDisk can’t do the job.
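Whether the in-place PBR and its backup agree can be checked read-only before attempting any repair. This is a minimal sketch, assuming 512-byte sectors and assuming the backup boot sector sits in the last sector of the partition (the usual arrangement on XP-era NTFS, matching the TechNet description of a duplicate at the end of the volume). The function name is illustrative, and `partition` is any seekable binary file object opened on the partition or an image of it.

```python
SECTOR = 512  # assumed sector size


def boot_sectors_match(partition, partition_sectors: int) -> bool:
    """Compare the first sector of a partition with its last sector."""
    partition.seek(0)
    primary = partition.read(SECTOR)
    # Assumption: the backup boot sector occupies the final sector.
    partition.seek((partition_sectors - 1) * SECTOR)
    backup = partition.read(SECTOR)
    return len(primary) == SECTOR and primary == backup
```

If the two sectors match, restoring the PBR from its backup changes nothing, and the damage lies further in (for example in the MFT itself).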

By the way, why aren’t you just recovering the disk from your backup?

Ref your earlier question re the disk signature: It depends on the OS and/or specific functions enabled by the OS that utilize the signature. AFAIK, Linux doesn’t use it. But different versions of Windows do for different purposes. With XP, for sure drive letters are stored there as an index; XP branches from the registry to the partition table disk signature to correlate which physical partition matches the drive (i.e., partition) letter (pretty hokey, huh?). It also appears that XP stores the disk signature in the registry; it may be using it to correlate the matching physical disk, analogous to what it does with partitions. What is known is that users have experienced failures switching drives, and it has been very difficult to pin down exactly why, even with controlled tests. There are indications that MS - and other vendors! - use this space for unique proprietary reasons. In Vista, the space is used in lieu of the partition table to find the boot loader, and it is used for BitLocker. So, in short, that may or may not be a problem. I should add that the same size (i.e., “80GB”) does not at all necessarily translate into “identical”; it’s a matter of the geometries.

topak123 · August 10, 2008, 6:24pm

Thanks for the clarification. It’s just that I don’t quite remember if I chose /dev/sdb or /dev/sdb1. In any case, my main concern is that it overwrote some data on the winxp partition without so much as giving a warning! What kind of brain-dead tool does that? If the tool YAST2 had even bothered to issue a warning like


Data and/or files on /dev/sdb1 (or /dev/sdb) will be overwritten. Press ok to proceed or cancel to stop.

then I would have immediately stopped. Where’s the user-friendliness in an open-source tool that simply overwrites data without bothering to inform the user?

I know all that. And yes the NTFS partition was mounted by the ntfs-3g driver (which has write support enabled by default these days). But the main issue is that YAST2 changed the contents of the virtual disk without informing/asking. I never expected YAST2 to be such a deadly tool.
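The kind of warning asked for here needs one piece of information first: whether the target device, or any partition on it, is currently mounted. A minimal sketch of that lookup follows, assuming the usual `/proc/mounts` line format on Linux (source device, mount point, filesystem type, options). The function name `mounts_on` is hypothetical, and this is not how YaST is known to work.

```python
def mounts_on(device: str, mounts_text: str) -> list:
    """List (source, mount point) pairs for `device` or any numbered partition of it."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mount_point = fields[0], fields[1]
        suffix = source[len(device):]
        # Match the device itself, or device + partition number (e.g. /dev/sdb1).
        if source == device or (source.startswith(device) and suffix.isdigit()):
            hits.append((source, mount_point))
    return hits
```

A tool could call `mounts_on("/dev/sdb", open("/proc/mounts").read())` and refuse to proceed, or at least ask for confirmation, when the list is not empty.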

I have already checked out
Ubuntu Rescue Remix | The Rescue Remix provides a Free-Libre Open-Source data recovery software toolkit based on Ubuntu
https://help.ubuntu.com/community/DataRecovery
and will try some of the methods described there.

topak123 · August 10, 2008, 7:11pm

You guessed right, I was using boot sector to mean the MBR (and partition table).

My pc is a laptop, so I would need to boot from a working XP drive, then plug in the damaged drive via a SATA-to-USB adapter (or vice-versa). I don’t have the SATA-to-USB adapter right now. Any other solution you have in mind?

I am also looking into these:
https://help.ubuntu.com/community/DataRecovery
Ubuntu Rescue Remix | The Rescue Remix provides a Free-Libre Open-Source data recovery software toolkit based on Ubuntu

Never had a backup.

topak123 · August 10, 2008, 8:53pm

After running testdisk (v6.9), it reports:


TestDisk 6.9, Data Recovery Utility, February 2008
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Disk /dev/sdb - 80 GB / 74 GiB - CHS 9729 255 63
     Partition                  Start        End    Size in sectors
 1 * HPFS - NTFS              0   1  1  9728 254 63  156296322

Boot sector
Status: OK

Backup boot sector
Status: OK

Sectors are identical.

A valid NTFS Boot sector must be present in order to access
any data; even if the partition is not bootable.

Here are additional results from testdisk:

It reports that the partition structure is ok.
It reports that the MFT and MFT mirror match perfectly.
But the filesystem is problematic: it reports “Can’t open filesystem. Filesystem seems damaged.”
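The CHS figures in the TestDisk output can be cross-checked against the reported size. This is a short worked example using the standard CHS-to-LBA formula with the geometry TestDisk printed (255 heads, 63 sectors per track); the helper name `chs_to_lba` is illustrative.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int = 255, sectors_per_track: int = 63) -> int:
    """Convert a cylinder/head/sector address to a 0-based sector number."""
    return (c * heads + h) * sectors_per_track + (s - 1)


start = chs_to_lba(0, 1, 1)        # first sector of the partition: 63
end = chs_to_lba(9728, 254, 63)    # last sector of the partition: 156296384
size = end - start + 1             # 156296322, matching "Size in sectors"
offset_bytes = start * 512         # 32256 bytes into the disk, assuming 512-byte sectors
```

So the NTFS partition begins 63 sectors into /dev/sdb, which is why anything written to the very start of the whole drive and to the start of the partition lands in different places.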

mingus725 · August 10, 2008, 9:57pm

> . . .Any other solution you have in mind?

Chkdsk is usually run from the Windows Recovery Console (for W2K and XP). It is on the bootable installation CD. Unfortunately, most PC’s only come with a Recovery CD, which is just an image copy of the factory installed version of the OS. But you might check with friends; someone might have one (or a bootlegged copy, there are plenty of those torrent’ing about). It is also on the XP 6-set floppy install setup, and the 4-set W2K setup; but I’m guessing your laptop doesn’t have a floppy.

> I am also looking into these:
> DataRecovery - Community Help Wiki
> Ubuntu Rescue Remix | The Rescue Remix provides a Free-Libre Open-Source data recovery software toolkit based on Ubuntu

I just saw your other post re TestDisk. Dig into the docs to verify that testdisk copied the PBR from its backup at the end of the partition, as opposed to using the MFT Mirror that is in the in-place PBR. In other words, there are 2 MFT backups, the one in the PBR that’s there, the one in the backup PBR which can only be gotten to by recovering it. Having said that, it appears the MFT is there but something in it is broken - that’s where chkdsk comes in; it’s designed to fix the directory index and broken chains. Low-level tools don’t apply here.

> Never had a backup.

I feared as much. A first-class bummer, to be sure. This kind of relates back to your very unhappy complaint about not being warned of what you were about to do (and btw, I suspect it wasn’t YaST but Xen that caused the problem; YaST is simply building the parms and feeding those to Xen - but that is neither here nor there at this point). Arguably, you have a fair point re user-friendliness. Thing is, when you get into tools of this sophistication and complexity - there are a gazillion similar examples with sysadmin tools in Windows land, too - assumptions are made about the experience level of the user. I’m not defending that, just stating that it’s a fact. openSUSE is positioned as “cutting edge”, but actually it’s a heck of a lot easier to get into deep poop with Fedora, Slax, or Gentoo. I appreciate that none of this is any consolation. The takeaway is that whenever one starts any work that involves the partition table, extreme caution and preparation are critical - and that includes having a backup. After several decades of systems engineering, I’ve come to believe that Murphy’s Law is the most important axiom in IT. Just a friendly fwiw.
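As one small, concrete form of the preparation mentioned above, the first sectors of a drive (the MBR plus the rest of the first track) can be saved to a file before any experiment. The sketch below is only an illustration with a made-up function name; it is not a substitute for a real backup of the data.

```python
def backup_sectors(src_path: str, dst_path: str, count: int, sector_size: int = 512) -> int:
    """Copy the first `count` sectors of `src_path` into `dst_path`.

    Returns the number of bytes actually copied. The source is opened read-only.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        data = src.read(count * sector_size)
        dst.write(data)
    return len(data)
```

For instance, `backup_sectors("/dev/sdb", "sdb-first-track.bin", 63)` would keep a copy of the MBR and the 62 sectors that follow it, where the output file name is hypothetical and should live on a different drive.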

topak123 · August 10, 2008, 11:10pm

I plan to plug the damaged drive via a SATA-USB adapter to a pc running windows and then run disk check with error correction (same as chkdsk /f). My laptop does not have a floppy drive (as you guessed).

> I just saw your other post re TestDisk. Dig into the docs to verify that testdisk copied the PBR from its backup at the end of the partition, as opposed to using the MFT Mirror that is in the in-place PBR. In other words, there are 2 MFT backups, the one in the PBR that’s there, the one in the backup PBR which can only be gotten to by recovering it. Having said that, it appears the MFT is there but something in it is broken - that’s where chkdsk comes in; it’s designed to fix the directory index and broken chains. Low-level tools don’t apply here.

Thanks for the advice.

> This kind of relates back to your very unhappy complaint about not being warned of what you were about to do (and btw, I suspect it wasn’t YaST but Xen that caused the problem; YaST is simply building the parms and feeding those to Xen - but that is neither here nor there at this point). Arguably, you have a fair point re user-friendliness.

My point is that whether it’s YAST2 or xen or any other tool writing onto a hard drive, YAST2 should know about it ahead of time and warn the user. Think of formatting a partition: popular linux installers/tools warn the user that data on the partition will be lost even though the actual formatting is done by calling some other low-level (file-system specific) utility.

> Thing is, when you get into tools of this sophistication and complexity - there are a gazillion similar examples with sysadmin tools in Windows land, too - assumptions are made about the experience level of the user. I’m not defending that, just stating that it’s a fact.

Which is where the real problem is: developers not bothering to think much and instead putting out some crap as part of a stable release and touting it as an easy-to-use feature.

mingus725 · August 11, 2008, 12:56am

@topak -

As I wrote previously, you make a fair point. I do appreciate your anger - you won’t get an argument from me; I’ve seen this kind of thing happen many a time. But from the developer’s point of view, when there is an “experts only” warning like there is when going into the Partitioner, that is sufficient. Or consider all the root commands that from a terminal can destroy the system - and there is no warning whatsoever; that’s why Ubuntu has forcibly implemented sudo. Or consider this: In Vista, the User Account Control feature was added to provide a level of security similar to sudo in Linux; MS has been beat over the head for this as creating a big nuisance and now a sizable pct of users have simply disabled the feature - so those dev’s feel like, “can’t win for losing.” In your situation, a user could argue that not only should there have been a warning, but had you still proceeded, there should have been yet another “are you absolutely sure” warning. It’s very subjective, it’s relative, it’s often a grey area. Engineers will entertain requests for user messages and warnings under the heading of “ease of use”, but you won’t get much traction with “the lack of a warning is a bug”.

At the end of the day, however much we think it ought not to be, it is what it is and will remain so. That’s why the first rule in IT is to protect oneself so that, whatever happens for whatever reason, you can recover. That is the key takeaway.

Now, it’s time for a cocktail.

topak123 · August 11, 2008, 2:02am

mingus725 wrote:

> @topak -
>
> As I wrote previously, you make a fair point. I do appreciate your anger - you won’t get an argument from me; I’ve seen this kind of thing happen many a time. But from the developer’s point of view, when there is an “experts only” warning like there is when going into the Partitioner, that is sufficient. Or consider all the root commands that from a terminal can destroy the system - and there is no warning whatsoever; that’s why Ubuntu has forcibly implemented sudo. Or consider this: In Vista, the User Account Control feature was added to provide a level of security similar to sudo in Linux; MS has been beat over the head for this as creating a big nuisance and now a sizable pct of users have simply disabled the feature - so those dev’s feel like, “can’t win for losing.” In your situation, a user could argue that not only should there have been a warning, but had you still proceeded, there should have been yet another “are you absolutely sure” warning. It’s very subjective, it’s relative, it’s often a grey area. Engineers will entertain requests for user messages and warnings under the heading of “ease of use”, but you won’t get much traction with “the lack of a warning is a bug”.

That’s not what I meant. In the end, the real bug lies in the fact that it did destroy data. Another person pointed out that specifying /dev/sdb1 would have been more correct than /dev/sdb, but that logic holds no water. Remember that the option I chose was “I have a disk or disk image with an installed OS” (it does not get any clearer than that). To put it in simple terms, if the YAST2/xen config tool tried to detect a winxp install and could not find one (since the correct partition should be /dev/sdb1), it should simply report an error about not finding an OS and stop, not corrupt the drive. Does that sound logical?

Also, I expected the YAST2 virtualization configuration process to read from the windows drive, never write to it. Nowhere did it say that it will modify the installed OS! Besides, when it does write to a drive, it’s not expected to damage the MBR by any means.

In all, it’s a poorly designed tool with catastrophic bugs.

mingus725 · August 11, 2008, 4:32am

Then create a use case and demonstrate the error. So far, you’ve re-stated and posted elsewhere what you claim happened. Fine. But there is also such a thing as pilot error, too. Or maybe there is additional relevant data. What is clear is that for a bug report to be accepted and acted upon, it must be confirmed and validated. Go ahead.

Regardless, next time, make sure the basics are covered first. Like having a backup.

Now . . . back to that cocktail.

topak123 · August 11, 2008, 5:57pm

I am quite sure it’s a bug and easy to reproduce. In fact, if I had spare winxp installs (!!) I would reproduce it myself. BTW, there is no error on the part of the “pilot” in this case, since all the user is doing is following the basic steps of a GUI-based config tool. In any case, I think I have already explained why there is no excuse for destroying a partition’s data and ability to boot. It’s odd that some users are so quick to blame it on user error without thinking twice about what the tool should have done. Anyway, it seems reporting a bug is the final step.
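A reproduction does not need a spare winxp install or a real drive: a disposable image file with a partition table is enough to see whether a tool writes to its target. The following is a minimal sketch that builds such an image with a single NTFS-type (0x07) MBR entry starting at sector 63. The function name, default size, and layout values are assumptions chosen for illustration, and the image contains no filesystem until one is created inside it.

```python
import struct


def make_test_image(path: str, total_sectors: int = 204800, start_lba: int = 63) -> str:
    """Create a sparse disk image with one bootable NTFS-type MBR partition entry."""
    mbr = bytearray(512)
    entry = struct.pack(
        "<B3sB3sII",
        0x80,                       # status: bootable
        b"\x01\x01\x00",            # CHS start: head 1, sector 1, cylinder 0
        0x07,                       # partition type: HPFS/NTFS
        b"\xfe\xff\xff",            # CHS end: placeholder, the LBA fields are what matter
        start_lba,                  # first sector of the partition
        total_sectors - start_lba,  # number of sectors in the partition
    )
    mbr[446:462] = entry            # first slot of the partition table
    mbr[510:512] = b"\x55\xaa"      # boot signature
    with open(path, "wb") as f:
        f.write(mbr)
        f.truncate(total_sectors * 512)  # extend with zeros to the full size
    return path
```

Taking a checksum of the image before and after running the configuration steps against it would show whether, and where, the tool wrote anything.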