Bad blocks during startup

Hi all,
I have a problem starting my openSUSE 10.1 machine. Yesterday morning I started my computer, but it was very slow opening programs, so I shut it down using the reboot button. When it started again it threw a kernel panic because there are some damaged blocks on /dev/hda2, and it said that I have to run the reiserfs command with the -B option. So I launched the rescue session from the installation DVD and ran fsck.reiserfs --fix-fixable /dev/hda2, but it did not solve the problem.

Does anybody know how to use the -B option?

How about fsck.reiserfs --rebuild-tree /dev/hda2? Is it safe, or could I lose all my data? Do I have to make a backup before using it?

Somebody suggested that I use dd_rescue onto a new installation, but I'm a newbie and I don't know how to use it.

Please, I need help.

Thanks in advance

Ana

Hi Ana,

You’ll want to use the rescue session from the installation DVD again.

The following commands will check and (probably) fix the errors you’re
getting, but they will take some time, most likely several hours, depending
on the size of your /dev/hda2 partition.

(Please read all the way through before trying this; I’ve had another idea
that I’ll add at the end.)

Scan the partition for bad blocks:

badblocks -o badlist -n /dev/hda2

This saves the list of bad blocks in the file ‘badlist’ and does a
non-destructive read-write test of every block in /dev/hda2. We’ll need the
‘badlist’ file for the next command:

reiserfsck --fix-fixable --rebuild-tree --badblocks badlist /dev/hda2

This should mark the bad blocks listed in ‘badlist’ as unusable and repair the
ReiserFS filesystem. Again, this will take some time. You could add ‘-v’ at
the end if you wish a more verbose output. (Normal is generally plenty.)
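
A minimal sketch of the two steps above as one script, in case it helps. It
assumes a 4096-byte filesystem block size (common for ReiserFS, but please
check yours), because the block numbers written by badblocks only make sense
to reiserfsck if both tools count in the same block size. It also runs
--rebuild-tree on its own, since the reiserfsck man page lists --fix-fixable
and --rebuild-tree as alternative modes. The function names and the DRY_RUN
switch are made up for illustration; by default the script only prints the
commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# DEVICE and BLOCK_SIZE are assumptions; adjust them to your system.
DEVICE=${DEVICE:-/dev/hda2}
BLOCK_SIZE=${BLOCK_SIZE:-4096}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

scan_and_rebuild() {
    # 1. Non-destructive read-write scan; bad block numbers go to 'badlist'.
    run badblocks -b "$BLOCK_SIZE" -o badlist -n "$DEVICE"
    # 2. Rebuild the tree, telling reiserfsck which blocks to avoid.
    run reiserfsck --rebuild-tree --badblocks badlist "$DEVICE"
}

scan_and_rebuild
```

The partition must not be mounted while either command runs, which is the
case when working from the rescue session.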

=====================

OK, the other idea… You can use smartctl to ask the hard drive to run a
full hardware scan and repair test. It takes about an hour on my 300GB
drives.

Again, using the rescue session from DVD:

smartctl -t long /dev/hda

This will start a ‘long’ drive check sequence, which will read and remap any
bad sectors if possible. It should report how long the test will take. Let
it run for at least that long, and try not to access the drive in the
meantime, since that could abort the test. (Best to just leave it alone
while it runs.)
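
For reference, a rough sketch of how the result of the long test can be read
back once it has finished. The smartctl options shown (-t long, -l selftest)
are standard smartmontools ones. The helper name first_error_lba is invented
here, and it assumes the usual ATA self-test log layout, where each entry
starts with ‘#’ and the last column is LBA_of_first_error.

```shell
#!/bin/sh
# Commands to run by hand from the rescue system (kept as comments so
# that nothing touches a drive just by loading this file):
#   smartctl -t long /dev/hda        # start the extended self-test
#   smartctl -l selftest /dev/hda    # afterwards: show the self-test log

# Hypothetical helper: read a self-test log on stdin and print the LBA of
# the first logged read error, if any. A clean run has '-' in the last
# column instead of a number, so nothing is printed for it.
first_error_lba() {
    awk '/^#/ && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# Usage: smartctl -l selftest /dev/hda | first_error_lba
```

If this prints a number, the test stopped at an unreadable sector and the
drive should be treated as failing.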

Once the test completes, it would be prudent to run fsck.reiserfs on
/dev/hda2:

fsck.reiserfs /dev/hda2

As a note, ‘fsck.reiserfs’ and ‘reiserfsck’ are actually the same
command/program. Usage is the same, regardless of the name used to
start it.

I would definitely recommend that you invest in a new hard drive as soon as
possible if ANY bad blocks are found and reported. Current drive
technology can recover/repair many errors behind the scenes without you
ever knowing about them… but once errors become non-fixable and begin to
show up as errors, it’s time to replace the drive.

Hope this helps

Loni


L R Nix
lornix@lornix.com

L R Nix wrote:
> smartctl -t long /dev/hda
>
> Will start a ‘long’ drive check sequence, which will read and remap any bad
> sectors if possible. It should report how long the test should take…,
> let it run for at least that long, and do try to not access that drive
> which could abort the test. (best to just leave it alone while it runs)

Since when is smartctl able to remap blocks? It just does a read-only
scan of the drive and fails at the first read error found. You have to
manually write zeros to the block to tell the drive to remap it. I would
use the drive test application from your hard disk vendor to check and
remap blocks. If it is able to remap all blocks, you do not need to use
the bad block list with reiserfsck. Nevertheless, the number of bad blocks
on the drive will likely increase, so you should replace it. It is also
very likely that lots of your data on the reiserfs partition is damaged
without you knowing it, because reiserfs does not deal well with drive
errors.

So the best option would be a reinstall if it is the root partition, or a
restore from backup.
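
As a hedged illustration of what "manually write zeros" means in practice:
the sketch below only builds the dd command for one sector and prints it, it
does not run it. It assumes 512-byte sectors and a sector number counted
from the start of the whole disk (/dev/hda), not the partition. The function
name and the sector number are made up. Running the printed command destroys
whatever was stored in that sector, and whether the drive then remaps it is
up to the drive's firmware.

```shell
#!/bin/sh
# Sketch: build (but do not run) the dd command that overwrites a single
# sector with zeros. Assumes 512-byte sectors and a whole-disk LBA.
zero_sector_cmd() {
    disk=$1
    lba=$2
    # Refuse anything that is not a plain non-negative integer.
    case $lba in
        ''|*[!0-9]*) echo "LBA must be a number" >&2; return 1 ;;
    esac
    echo "dd if=/dev/zero of=$disk bs=512 count=1 seek=$lba"
}

# Example with a made-up sector number:
zero_sector_cmd /dev/hda 12345678
```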

Marco Munderloh wrote:

>> smartctl -t long /dev/hda
>
> Since when is smartctl able to remap blocks? It just does a read-only
> scan of the drive and fails at the first read error found. You have to
> manually write zeros to the block to tell the drive to remap it. I would
> use the drive test application from your hard disk vendor to check and
> remap blocks. If it is able to remap all blocks, you do not need to use
> the bad block list with reiserfsck. Nevertheless, the number of bad blocks
> on the drive will likely increase, so you should replace it. It is also
> very likely that lots of your data on the reiserfs partition is damaged
> without you knowing it, because reiserfs does not deal well with drive
> errors.
>
> So the best option would be a reinstall if it is the root partition, or a
> restore from backup.

smartctl enables features found in all newer drives that support S.M.A.R.T.,
one being a full hardware/media test.

I’ve recovered several drives this way, but I do agree and stress that if
ANY errors are noticed on a drive, it should be immediately slated for
replacement, as it is now suspect in its operation.

SMART was designed to help systems predict failure and prevent loss of data.
Drives nowadays will silently remap a bad block if one is found during a
SMART-invoked test. Usually the tests are performed automatically every so
often, but they are usually ‘short’ tests that do not cover the entire
drive, so they may miss errors. Invoking a ‘long’ test often allows the
drive logic to remap the bad sectors to good ones again.

Again, I consider this a stopgap measure to allow retrieval of your data…
replacement as soon as possible is always recommended.
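
On the question of making a backup first: one way to retrieve the data is to
copy the whole partition into an image file on a different, healthy disk
before running any repair. The sketch below uses plain dd rather than
dd_rescue (which was suggested earlier in the thread and is built for this
job), only because dd is available on practically every rescue system. The
function name and the /mnt/backup path are placeholders.

```shell
#!/bin/sh
# Sketch: copy a partition (or any file) block by block into an image.
# conv=noerror keeps going after read errors; conv=sync pads each short
# or unreadable block with zeros so later data stays at the right offset.
image_partition() {
    src=$1
    dest=$2
    dd if="$src" of="$dest" bs=4096 conv=noerror,sync
}

# Usage from the rescue system, with the target on another disk
# (placeholder path):
#   image_partition /dev/hda2 /mnt/backup/hda2.img
```

Any repair attempt can then be tried on the original while the image stays
untouched as a fallback.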

Loni


L R Nix
lornix@lornix.com