Mounting or checking a partial ext4 file system

If you have a contiguous partial piece of an ext4 file system, starting from the beginning of the partition (and assuming it’s perfectly clean), is there any way to check it, or to mount it so I can retrieve the files whose parent directories, inodes, and data are all completely contained inside it?

Background:

I have (or maybe had) a very large 11TB RAID 6 array, filled with a single large ext4 partition. Something strange happened when a single drive failed, and the array ended up failing 11 out of the 13 drives. I had trouble getting the array restarted, and eventually exhausted all of the options I considered completely safe. I considered a few things that might have worked, but mdadm doesn’t seem to have a definite “do not change anything” option. So I decided the only way to be absolutely safe would be to clone the disks before proceeding - then I realized how much time that would take, and sent the drives off to a recovery service so they could image them and investigate.

Before doing so, I copied the first 2GB from each disk. I XORd the images from the working drives to reconstruct the data chunks that were on the failed disk, manually assembled the chunks, and am very confident that I have 22GB of “correct” data in a single file. The parity and Q syndromes all matched (with RAID 6 you can still verify with only one missing device). I’ve learned the fine details of ext4 from https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout, and have looked at lots of raw data from the reconstructed partition, and it all looks good. The recovery company says that they’re not finding many inodes, but I found a lot of them, exactly where they’re supposed to be. I tried to mount the image and run e2fsck on it, but both tools seem to be extremely unhappy that the device size doesn’t match the size implied by the file system geometry.
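For anyone wanting to reproduce the XOR step: with RAID 6 and exactly one device missing, the P parity alone is enough, because the missing chunk in each stripe is simply the XOR of all the surviving chunks (data plus P); the Q syndrome and its Galois-field math are only needed when two devices are gone. A minimal sketch in Python, with hypothetical image file names:

```python
# Sketch: recover the chunk of the one missing RAID-6 device by XORing the
# corresponding chunks of every surviving device in the stripe.
def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# Hypothetical usage: read the same byte range from each surviving image.
# surviving = [open(p, "rb").read(65536) for p in ("sda.img", "sdb.img")]
# missing = xor_chunks(surviving)   # the failed disk's chunk for that stripe
```

The same function also checks the parity: XORing *all* chunks of a complete stripe (data and P together) must yield zeros.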

I considered hacking the superblock to manually reduce the size, but I figure that wouldn’t work, because there would then be more group descriptor blocks following the superblocks than the smaller size implies. I might try doing that anyway, compensating by incrementing the “reserved block count”. Alternatively, if there is some way to make the file appear to be the expected size, with nothing but zeroes after the end of the actual data, maybe I could mount it and not get any errors until I cause the kernel to read past the true end of the data.

You could try extending the image to the declared size by seeking to the end and writing one byte there. The blocks in between will not be allocated on the filesystem until they are written, because it will be a Unix sparse file (search the web for the term). I have no idea whether you can successfully fool the tools this way. Good luck.

As usual, ken_yap, you’re awesome! It worked - thanks for that. I hadn’t seen anything about “sparse files” in ext4, but I guess sparse files are an implicit possibility when extents are used.

Here’s what I did and what happened:

I put the file containing the first 22GB of the 11TB ext4 file system inside another, mounted ext4 partition. Then I calculated the size implied by its geometry by multiplying the total number of blocks indicated in the superblock by the block size. That number minus one is the offset of the last byte in the file system, so I ran:

dd if=/dev/zero of=<<file containing piece of ext4 partition>> bs=1 count=1 seek=<<last byte offset>>
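For reference, the geometry calculation can be done straight from the primary superblock, which starts at byte 1024 of the image. The field offsets below are taken from the ext4 disk-layout document linked above; the image path is a placeholder. A sketch, assuming the superblock itself is intact:

```python
import struct

def last_byte_offset(image_path):
    """Compute (total blocks * block size) - 1 from the primary superblock."""
    with open(image_path, "rb") as f:
        f.seek(1024)                 # primary superblock lives at byte 1024
        sb = f.read(1024)
    blocks_lo, = struct.unpack_from("<I", sb, 0x4)        # s_blocks_count_lo
    log_block_size, = struct.unpack_from("<I", sb, 0x18)  # s_log_block_size
    incompat, = struct.unpack_from("<I", sb, 0x60)        # s_feature_incompat
    blocks_hi = 0
    if incompat & 0x80:              # INCOMPAT_64BIT: count has high 32 bits too
        blocks_hi, = struct.unpack_from("<I", sb, 0x150)  # s_blocks_count_hi
    block_size = 1024 << log_block_size
    total_blocks = (blocks_hi << 32) | blocks_lo
    return total_blocks * block_size - 1
```

Running `dumpe2fs -h` against the image should report the same “Block count” and “Block size” values, which is a handy cross-check before trusting the computed seek offset.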

The file grew to 11TB, and I was able to mount it without error. I was then able to ls the root directory without error. Past that point, any directory I could get into would ls its entire contents, but ls reported “ls: cannot access <<dir>>: Input/output error” for some or all of the directories/files it contained. These must be files and directories whose inodes lie past the end of the actual data and therefore read back as all zeroes. Some of the files read okay, and some read as all zeroes, presumably because their inodes are inside the boundary of actual data but their data blocks are in the nebulous unassigned space. The error handling was about as graceful as possible - I half expected the whole mount to come crashing down once I ventured outside the actual data area.

Sparse files have been in Unix/Linux pretty much from the beginning. The implementation is elegant: if a block pointer in the inode is null but the byte range is within the file, that is a sparse block (a “hole”) and it reads as all zeros. When a write lands in the hole, a real block is allocated. With ext4’s extent-mapped files, the same effect is achieved by simply leaving a gap in the mapped extent ranges.
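The behaviour is easy to demonstrate on any sparse-capable filesystem (ext4 included): seeking past end-of-file and writing one byte yields a file whose logical size far exceeds its on-disk allocation, and the hole reads back as zeros. A quick Python demonstration:

```python
import os, tempfile

SIZE = 1024 * 1024  # 1 MiB logical size

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(SIZE - 1)   # seek far past EOF...
    f.write(b"\0")     # ...and write one byte: everything before it is a hole
    path = f.name

st = os.stat(path)
print(st.st_size)          # logical size: 1048576
print(st.st_blocks * 512)  # actual allocation, typically far less than 1 MiB
with open(path, "rb") as f:
    hole = f.read(16)
print(hole == b"\x00" * 16)  # the hole reads back as zeros: True
os.remove(path)
```

This is exactly what the dd command above does at 11TB scale: the one-byte write at the last offset sets the file’s logical size, and everything between the real 22GB of data and that byte is a hole.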