Bad superblock preventing system from booting!

Please, can anyone help me!
My setup in question is the mighty “bgrsvrx”; it has 8 disks combined into a single LVM volume group.
I suspected one of my disks was bad, as it started clicking. To prevent data loss, I figured I would remove it from the LVM and then from the system as a whole.

So I went into the YaST LVM module, but instead of being shown my LVM, I was presented with the “first run” screen; i.e. my existing LVM had vanished!
Panicking (as this is a 78% full 3TB group!), I came out and went into the Partitioner instead, to make sure the disks were at least being recognised.
I was presented with a warning that one of the disks (/dev/sde in this case) could not be read, and I couldn't do anything with it on that screen.

Not too worried at this stage, I quit the Partitioner and tried the LVM module again - still nothing shown.
I also noticed that some of the folders were now giving me an I/O error when I tried to ls them.

“Hmm, perhaps a reboot might kick it over,” I thought, and being awesome, typed the reboot command and hit Enter.
Now, this is a slow machine (P3 800MHz), so I left it for 10 minutes to come back up, but after that time I still had nothing. I tried to ping the machine and it didn't respond, so I climbed into the loft, where I was prompted to “enter root password” (which I did).

The prompt then said "(Recovery) root # "
I noticed further up the screen that it was reporting a bad superblock on volume /dev/system/home (the LVM).

NOW WHAT?! PLEASE HELP! I have about 2.5TB of data which, although nothing life-threatening, would be a disaster nevertheless if lost.

Hi,

What kind of file system does the affected LV have?

Depending on the type of fs, you will have to use different tools when trying to recover a corrupt filesystem.

For example, for Reiser file systems you have debugreiserfs, and for ext2/ext3 you can use dumpe2fs to get information about alternative superblocks that may help you fix a corrupt fs.

Once you know where the superblocks are, you can try to fix the problem using the corresponding repair tool:

reiserfsck for reiser file systems
e2fsck -b for ext2/ext3 file systems
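
For instance, on an ext2/ext3 volume the sequence would look roughly like this (only a sketch; the device path is the LV from your first post, and 32768 is merely a common backup location for a 4k-block fs, so use a number dumpe2fs actually reports):

dumpe2fs /dev/system/home | grep -i superblock    # list the primary and backup superblocks
e2fsck -b 32768 /dev/system/home                  # re-run the check from a backup superblock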

Before running these commands, read the man pages and double-check that you understand the purpose of every flag you want to add to the base command.

Hope this helps.

Regards.

IIRC the FS is ext3.
I tried e2fsck, but it whined at me saying the SB was bad (no, really?!), so I tried reiserfsck instead. That seems to be doing something now, but it says it has about 17 HOURS left (yikes!)

For info, I ran these three commands:-

reiserfsck --rebuild-sb /dev/system/home
reiserfsck --check /dev/system/home
reiserfsck --rebuild-tree /dev/system/home

Should I Ctrl-C it or leave it running, do you think?

Hmm, don’t run reiserfsck if it’s not a reiserfs. You would corrupt the filesystem even more.

Normally e2fsck, specifying a backup superblock (see man e2fsck), is used to rescue ext2/3 filesystems with a bad superblock, but I don’t know what you do when it’s an LVM member; I haven’t had an LVM to deal with myself before.
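
If I had to guess (and this is only a guess, untested on LVM), you would point e2fsck at the LV’s device node rather than at the member disks, once the volume group is active:

vgchange -ay system                  # activate all LVs in the VG named "system"
e2fsck -b 32768 /dev/system/home     # check the logical volume, not /dev/sdX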

Maybe first try unplugging the disk you think is bad and then boot with a LiveCD… you might be able to do some better troubleshooting from there.

What’s the output of pvdisplay and lvdisplay?

Wishing you luck,
Wj

Uh oh! Well, it is ext3, but I’ve left the reiserfsck running. I tried “e2fsck -b nnnn /dev/system/home”, but every block I specified was reported as either not found or corrupt.

I also tried unplugging the disk I suspected was faulty, and the system threw a load of errors (my LiveCD images are stored on that LVM!), but they were expected and non-fatal (i.e. “Missing Disk from LVM”).

It was about 60% through when I checked this morning, and there were lots of “Repaired” messages, so fingers crossed it will be OK.

If not, it’s a harsh reminder for me to BACK UP my data frequently!

I guess only time will tell (i.e. when I get home tonight); I suspect it will either be fixed or trashed.

Well… crossing my fingers for you too! Hope you get lucky!

PHEW!
So I returned home from work and there was a message “No reiser partitions found” (or something like that), so PHEW again!

I re-tried the

e2fsck -b nnnn /dev/system/home
nnnn = 8193, 32768, 98304, 163840, 229376 and 294912

but each returned something about a “bad magic number”.
Any ideas? I promise not to run off and perform openSUSE surgery without hearing from you guys first!

Check where the superblocks are stored:

dumpe2fs /dev/sda2 | grep superblock

Replace sda2 with the necessary drive, then use

fsck -b 32768 /dev/sda2

changing the necessary parts to what you have/need. Also, make sure the partitions/drives are unmounted beforehand.
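
To verify nothing is still mounted first (paths here are examples; adjust to yours):

mount | grep home             # the LV may show up as /dev/mapper/system-home
umount /dev/system/home       # unmount it if anything is listed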

Andy

Thanks for the tip, df. The individual disks are sda to sdh; the volume group is mounted (or should be mounted) at /dev/system/home.

Which should I use:
dumpe2fs /dev/sdxn | grep superblock
or
dumpe2fs /dev/system/home | grep superblock

?

Try /dev/system/home, as it’s LVM (I may be wrong).

Andy

Sorry for the camera screenshot; no internet on the faulty PC!
Output of “dumpe2fs /dev/system/home | grep superblock”:

http://img218.imageshack.us/my.php?image=spa1107iy1.jpg

/awaits someone to hold my hand hahaha

Don’t go blindly on my advice here, as I don’t have enough experience digging into LVM issues to guide you remotely… this is one of those things that might be better done with someone who knows what he’s doing sitting next to you!

If it were me, I’d try to see which configuration details are still intact.

Could you post the output of:
fdisk -l
lvs -a -o +devices
pvdisplay
lvdisplay

Then we’ll have an idea of your layout and what the system is seeing.

Edit: sorry for the late reply after Andy’s posts… my internet connection was just down.

I sure can, watch this space (it may have to be camera-phone shots, as before!)

brb …

OK, try these for size:-

http://img261.imageshack.us/my.php?image=spa1108xy4.jpg

http://img201.imageshack.us/my.php?image=spa1110zw5.jpg

http://img367.imageshack.us/my.php?image=spa1111an7.jpg

http://img91.imageshack.us/my.php?image=spa1112sx6.jpg

Anyone able to help here?
I’ve given myself until Monday, at which point I’m calling it a day (well, three!) and chalking it up to a bad experience and a severe reminder to BACK UP MY FILES!!!

Interestingly enough, I called the Geek Squad; they couldn’t help, as they only support Ubuntu at the moment (booo!)

In the meantime, I’ve connected one of the HDDs to a Windows machine and installed “Recover My Files”. It seems to be picking SOME files up (it’s the JPEG photos I’m really concerned about), so I don’t think ALL the data is lost, but it would be so much better if I could restore the entire LVM so I can split up and BACK UP the data!

But what you are facing is not distro-specific!? They should be able to help/guide you, as we are talking basic Linux functionality!
You might want to ring them again and explain this.
They should even be able to diagnose and troubleshoot using an Ubuntu LiveCD, AFAIK…

I’m afraid (as typed before) I can’t offer any solid advice here other than: get someone to help who can sit in front of your system.
Also, take care not to swap and move the disks around too much, as that will also reduce the chances of a successful restore.

The locking errors you are getting can be bypassed by running the lvdisplay commands with the ‘--ignorelockingfailure’ switch…
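
i.e. something along the lines of:

lvdisplay --ignorelockingfailure
pvdisplay --ignorelockingfailure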

Can only wish you luck!
-Wj

Earlier today I removed all the disks (physically) from the machine, purchased a new 80GB drive and installed SUSE 11 on it, ready to re-create the environment from scratch.

I then figured I would give it one last go before crying myself to sleep… This time the system still reported errors, but allowed / to be mounted read-write (previously it had only let me read /).

Anyway, I changed my /etc/fstab, commented out the line that mounts the LVM, and rebooted.
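
For reference, the commented-out fstab line looked something like this (from memory; the exact mount options may have differed):

#/dev/system/home   /home   ext3   acl,user_xattr   1 2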

The system started up as expected.
Now it allows me to SSH into it from a remote PC (i.e. I can copy/paste outputs) and go into YaST > LVM and the Partitioner.

I first went into the Partitioner; it saw all the drives and the FS type (i.e. “Linux LVM”). I then went into YaST > LVM and it no longer tried to run the “first run” wizard; instead, there were all my disks, allocated to the volume group “system”.

I clicked Apply and rebooted; unfortunately, it made no difference.

Here are some other things I tried:-


dumpe2fs /dev/system/home | grep superblock



  Primary superblock at 0, Group descriptors at 1-186
  Backup superblock at 32768, Group descriptors at 32769-32954
  Backup superblock at 98304, Group descriptors at 98305-98490
  Backup superblock at 163840, Group descriptors at 163841-164026
  Backup superblock at 229376, Group descriptors at 229377-229562
  Backup superblock at 294912, Group descriptors at 294913-295098
  Backup superblock at 819200, Group descriptors at 819201-819386
  Backup superblock at 884736, Group descriptors at 884737-884922
  Backup superblock at 1605632, Group descriptors at 1605633-1605818
  Backup superblock at 2654208, Group descriptors at 2654209-2654394
  Backup superblock at 4096000, Group descriptors at 4096001-4096186
  Backup superblock at 7962624, Group descriptors at 7962625-7962810
  Backup superblock at 11239424, Group descriptors at 11239425-11239610
  Backup superblock at 20480000, Group descriptors at 20480001-20480186
  Backup superblock at 23887872, Group descriptors at 23887873-23888058
  Backup superblock at 71663616, Group descriptors at 71663617-71663802
  Backup superblock at 78675968, Group descriptors at 78675969-78676154
  Backup superblock at 102400000, Group descriptors at 102400001-102400186
  Backup superblock at 214990848, Group descriptors at 214990849-214991034
  Backup superblock at 512000000, Group descriptors at 512000001-512000186
  Backup superblock at 550731776, Group descriptors at 550731777-550731962
  Backup superblock at 644972544, Group descriptors at 644972545-644972730

For each entry I tried “e2fsck -b nnnnnn /dev/system/home”,
but each one failed with the same error:-


bgrsvrx:/ # e2fsck -b 2654208 /dev/system/home
e2fsck 1.40.2 (12-Jul-2007)
e2fsck: Bad magic number in super-block while trying to open /dev/system/home

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

bgrsvrx:/ #

However, I noticed that the higher numbers returned “Invalid argument”:-


bgrsvrx:/ # e2fsck -b 644972544 /dev/system/home
e2fsck 1.40.2 (12-Jul-2007)
e2fsck: Invalid argument while trying to open /dev/system/home

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

bgrsvrx:/ #   

I browsed the web and asked Google, and some sites suggested reading the superblock using ‘dumpe2fs’:-


bgrsvrx:/ # dumpe2fs -h /dev/system/home
dumpe2fs 1.40.2 (12-Jul-2007)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          d5146357-8880-48b2-88d4-c31c349a9f78
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              389234688
Block count:              778463232
Reserved block count:     38913232
Free blocks:              221451596
Free inodes:              387041694
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      838
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Sat Mar 29 21:18:44 2008
Last mount time:          Sun Nov 30 12:06:18 2008
Last write time:          Sat Dec  6 20:50:04 2008
Mount count:              9
Maximum mount count:      500
Last checked:             Fri Oct 31 09:37:54 2008
Check interval:           5184000 (2 months)
Next check after:         Tue Dec 30 09:37:54 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      f294e4c9-0bf4-490f-9251-17790af199c5
Journal backup:           inode blocks
Journal size:             128M

I was surprised it could read it, to be honest!
Anyway, I also managed to get decent output from some of the commands suggested earlier:-


bgrsvrx:/ # lvdisplay
  --- Logical volume ---
  LV Name                /dev/system/home
  VG Name                system
  LV UUID                tkTEnV-0PE6-AZuY-dSWy-2izA-2cDd-2vzaRw
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2.90 TB
  Current LE             760218
  Segments               8
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

Oh, I also tried to mount an individual partition to see if that would work (it didn’t!):-


bgrsvrx:/ # mount /dev/sdc1 /bgr
mount: unknown filesystem type 'LVM2_member'

For information (if it helps), I ran ‘dumpe2fs /dev/system/home’. It produced a LOT of information; I won’t c/p it all here, but here is a snippet:-


Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-186
  Reserved GDT blocks at 187-1024
  Block bitmap at 1025 (+1025), Inode bitmap at 1026 (+1026)
  Inode table at 1027-1538 (+1027)
  0 free blocks, 16373 free inodes, 2 directories
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32954
  Reserved GDT blocks at 32955-33792
  Block bitmap at 33793 (+1025), Inode bitmap at 33794 (+1026)
  Inode table at 33795-34306 (+1027)
  0 free blocks, 16384 free inodes, 0 directories
Group 2: (Blocks 65536-98303)
  Block bitmap at 65536 (+0), Inode bitmap at 65537 (+1)
  Inode table at 65538-66049 (+2)
  0 free blocks, 16384 free inodes, 0 directories


Group 23755: (Blocks 778403840-778436607)
  Block bitmap at 778403840 (+0), Inode bitmap at 778403841 (+1)
  Inode table at 778403842-778404353 (+2)
  32254 free blocks, 16384 free inodes, 0 directories
Group 23756: (Blocks 778436608-778463231)
  Block bitmap at 778436608 (+0), Inode bitmap at 778436609 (+1)
  Inode table at 778436610-778437121 (+2)
  26110 free blocks, 16384 free inodes, 0 directories

dumpe2fs: /dev/system/home: error reading bitmaps: Can't read an block bitmap

Anyway, I doubt anyone will be able to help even with all that info, but I’m still clinging on to a little bit of hope that, even though ALL my superblocks appear to be dead, someone can help me rebuild just one!!

I was also wondering: if I removed a disk from the LVM but didn’t format it, since the filesystem is ext3, would I then be able to mount that disk “as normal” and read the contents? Or would removing it from the LVM move the data off it too?

Thanks to all who read this by the way, especially those who are trying to help, I really do appreciate it!!!

There’s a bit of inconsistency in the info you posted. dumpe2fs apparently can read the superblock, but e2fsck cannot. But did you try e2fsck without the -b argument this time around? You seem to have tried all values of -b X, but did you try omitting -b and the superblock number altogether and using the primary superblock? Seems to me that ought to work…

The way LVM works is that each member volume is a piece of a larger virtual volume, so an individual member doesn’t have a normal filesystem header. However, once they are fused by LVM into one large virtual volume, any filesystem can be put on that volume. So it is useless to try to fsck each member volume individually; as you’ve seen, the type will be LVM2_member.

Seems to me that now that you have LVM building the virtual volume, it ought to be possible to repair it, and then mount it. If the member volume is totally lost, then you would have lost any files that had any blocks on it. But it ought to be possible to recover files that reside on other volumes.
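
You can see that mapping from the LVM tools themselves, e.g. (using the names from your earlier output):

pvs -o pv_name,vg_name,pv_size    # each member disk and the VG it belongs to
lvs -a -o +devices                # which PVs back each segment of the LV
blkid /dev/sdc1                   # reports TYPE="LVM2_member" rather than a filesystem

So fsck only ever makes sense against /dev/system/home, the assembled volume.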

Ken,
I completely agree; it’s very strange how one tool reads it (seemingly OK) and the other refuses totally!!

With regard to trying without -b nnnnn: I did, and I get this:-


bgrsvrx:~ # e2fsck /dev/system/home
e2fsck 1.40.2 (12-Jul-2007)
Group descriptors look bad... trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/system/home

The superblock could not be read or does not describe a correct ext2 filesystem.  If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

I also tried:-


bgrsvrx:~ # fsck -y /dev/system/home
fsck 1.40.2 (12-Jul-2007)
e2fsck 1.40.2 (12-Jul-2007)
Group descriptors look bad... trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/system/home

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

fsck.ext3 /dev/system/home failed (status 0x8). Run manually!

:(

> Seems to me that now that you have LVM building the virtual volume, it ought to be possible to repair it, and then mount it. If the member volume is totally lost, then you would have lost any files that had any blocks on it. But it ought to be possible to recover files that reside on other volumes.

Either it’s too early in the morning for that kind of talk or I really have no clue what you mean!

I tried to remove various partitions from the LVM (“system”) but it gave me an error:-


 Error
If you remove the selected device from the volume group, there will not be enough space for all logical volumes in this group. To delete the volume group completely, remove its logical volumes first.    

I figured it was because I have the LV set to MAX in the Partitioner, so I went in there, selected the LV and reduced its size to 2.6TB.

The Partitioner does seem to let me reduce the size of the LV, but I was worried that if I reduced it, it would lose data, so I didn’t actually APPLY any changes there; I just Aborted back to YaST.
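
(For the record, from what I’ve read the safe order, if the fs were healthy, would be to shrink the filesystem BEFORE the LV, something like:

resize2fs /dev/system/home 2600G     # shrink the fs first (needs a clean fs, so no good here)
lvreduce -L 2.6T /dev/system/home    # then shrink the LV down to match

so just reducing the LV in the Partitioner would indeed lose data.)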

Oh, I also noticed that since I physically removed and re-installed all the disks, when I go into LVM or the Partitioner I’m no longer getting the warnings about “bad /dev/sde1” that I was previously. It actually seems to let me in OK.

I have been searching the internet for “LVM Recovery” and there seem to be a number of Windows-based tools which will do this; the biggest problem I have is that, with the number of physical disks (8) in the LVM, the only machine I have with enough IDE and SATA ports is the server itself, and only because I have a PCI SATA & IDE card!

A quick overview:-
Disk 1 - IDE 0, Channel 0
Disk 2 - IDE 0, Channel 1
Disk 3 - IDE 1, Channel 0
Disk 4 - IDE 1, Channel 1
Disk 5 - SATA (PCI Card)
Disk 6 - SATA (PCI Card)
Disk 7 - IDE 2 (PCI Card), Channel 0
Disk 8 - IDE 2 (PCI Card), Channel 1

I could buy another SATA & IDE PCI card and add a CD-ROM to the new card, which would let me use a LiveCD of some distro. But even then, I think that with the bad superblock I still wouldn’t be able to mount the LVM.

Does anyone know if it’s possible to rebuild a superblock from scratch?
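
One idea I turned up while searching, which I haven’t dared run yet: apparently mke2fs with -n is a dry run that only prints where the superblocks would be, without writing anything:

mke2fs -n -b 4096 /dev/system/home    # -n = don't actually create a filesystem, just report

(using the 4096 block size from the dumpe2fs -h output above). As promised, though, I’ll wait to hear from you guys before doing any surgery!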