SUSE Linux server crash

Hi,

I have a big folder containing more than 500,000 files, about 30 GB in total. If I make a backup, or just copy those files into a newly created folder, the SUSE system crashes and drops me to the tty1 console login prompt.
When I try to log in, it gives this:
“INIT: id ‘1’: Respawning too fast: Disabled for 5 minutes”

and I get the same message for every tty, with the id changing, of course.

If I press Ctrl+Alt+F10, the kernel is logging all sorts of messages like these:

reiserfs: sda2: warning zam-7001: i/o error in reiserfs_find_entry.
scsi 0(0:0) rejecting i/o to offline device

The first message changes every time (sometimes it is about smbd, nmbd, etc.), but the second one always stays the same.
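To see how often these two messages occur, a saved log can be scanned for them. This is a minimal Python sketch, assuming the kernel messages reach a plain-text syslog file such as /var/log/messages (after a crash like this they may never be written, since the failing disk holds the root file system). The patterns are built only from the two lines quoted above: one matches any ReiserFS warning, the other the "offline device" rejection.

```python
import re
from collections import Counter

# Patterns derived from the two messages quoted in this thread;
# adjust them if your kernel words the messages differently.
PATTERNS = {
    "reiserfs_warning": re.compile(r"reiserfs: \S+: warning"),
    "offline_device": re.compile(r"rejecting i/o to offline device", re.IGNORECASE),
}


def count_disk_errors(lines):
    """Count how many log lines match each known error pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def report(path="/var/log/messages"):
    """Print the counts for a log file (the default path is an assumption)."""
    with open(path, errors="replace") as log:
        for name, n in sorted(count_disk_errors(log).items()):
            print(f"{name}: {n}")
```

Calling `report()` as root, or `report("/path/to/saved/log")` for a copied log, prints one count per pattern. A rising "offline_device" count points at the controller or disk dropping out rather than at smbd or nmbd, whose messages only vary with whatever happened to be reading the disk at the time.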

All I can do afterwards is cut the power and restart the system. Everything then starts working again, but I am worried this will not last.

I have tried booting the rescue system from a bootable CD and then running
fsck.reiserfs --check /dev/sda2 (the root partition)
but it reports that everything is OK.

I have also installed Raidman for IBM servers, since I have hardware RAID running, but it says there is no problem with the disks.

What could the problem be? I can't seem to figure it out. Any help would be greatly appreciated.

Best regards,
RUI

Because ReiserFS hasn't been changed in a long time, there may still be issues left in it. Your disk or partition might also be getting full. Try ext3, which is much safer than ReiserFS. ext3 has one disadvantage: at least every 2 months the root (/) partition gets a thorough disk check. As long as there are no strange errors, your disk is okay.

You can disable periodic file system checks on ext2/3/4 with tune2fs (or at least make them happen far less often).
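Before changing anything, it helps to see what is currently configured; `tune2fs -l /dev/sdXN` prints it, and `tune2fs -c 0 -i 0 /dev/sdXN` turns off both the mount-count and the time-based check (sdXN is a placeholder for your ext2/3/4 partition). The Python sketch below parses the listing, assuming the usual "Maximum mount count" and "Check interval" field names in that output.

```python
import subprocess


def periodic_check_settings(tune2fs_output):
    """Return (max mount count, check interval in seconds) from `tune2fs -l` text."""
    max_mounts = interval = None
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Maximum mount count":
            max_mounts = int(value)
        elif key == "Check interval":
            # Typical values: "15552000 (6 months)" or "0 (<none>)"
            interval = int(value.split()[0])
    return max_mounts, interval


def periodic_checks_disabled(max_mounts, interval):
    """True when both checks are off (a mount count of 0 or -1 disables it)."""
    return max_mounts is not None and max_mounts <= 0 and interval == 0


def read_settings(device):
    """Run `tune2fs -l` on a device (needs root) and parse its output."""
    result = subprocess.run(["tune2fs", "-l", device],
                            capture_output=True, text=True, check=True)
    return periodic_check_settings(result.stdout)
```

Note that this only applies to ext2/3/4; tune2fs does not manage ReiserFS partitions.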

As for ReiserFS, either there are errors on the disk itself (bad blocks?) or there's a bug in ReiserFS itself.

I faced problems with ReiserFS a long time ago. Before ext3 became prominent, SUSE used to suggest ReiserFS during installation.

I don't remember the exact problems I had, but I no longer use it.

That's because SUSE was a sponsor and had developers working on ReiserFS. Even today, Jeff Mahoney from SUSE Labs is one of the most active developers working on it. He just pushed a huge set of patches for kernel 2.6.30 that cleans up a lot of code in ReiserFS and improves security attributes and SELinux support.

But I have to admit, ReiserFS is a terrible file system. It is not only slow; it also ages much faster than other file systems, so it gets slower over time. Read a bit on the wiki about its problems. Its tail packing makes it even slower.

These are very important facts, but what should I do next? That's what is bothering me.

Should I check for bad sectors? I don't think there are any; if there were, wouldn't fsck have reported them? How can I recover my system from this? By the way, there is still a lot of space left on the root ReiserFS partition, so space is not the issue.

The first thing I would do is try a different kernel, as this may very well be a bug in ReiserFS itself. You can get a new kernel from the link below. If it still gives you problems, that could mean one of two things: either there really is a bug in ReiserFS that hasn't been fixed even in the new kernel, or something really is wrong with the hardware. In that case the first thing I'd do is check for bad blocks with: badblocks -b 4096 /dev/sdxx (replace sdxx with your actual partition, e.g. sdb1, sdb2, etc.)

http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory

I would get the 2.6.29 kernel, as .30 is still in RC.
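The badblocks check suggested above can be run with a few extra options so the result is easier to review afterwards. This is a hedged Python sketch that only assembles and runs the command: `-b` sets the block size (4096 is assumed to match the file system, so reported numbers line up with file system blocks), `-s` shows progress, `-v` prints a summary, and `-o` writes the bad block numbers to a file. Without `-n` or `-w`, badblocks performs its default read-only test.

```python
import subprocess


def badblocks_command(device, output_file, block_size=4096):
    """Build a read-only badblocks scan (default mode: no -n or -w)."""
    return [
        "badblocks",
        "-b", str(block_size),  # assumed to match the file system block size
        "-s",                   # show scan progress
        "-v",                   # verbose: summary of errors at the end
        "-o", output_file,      # write bad block numbers to this file
        device,
    ]


def run_scan(device, output_file, block_size=4096):
    """Run the scan and return badblocks' exit status.

    Needs root; ideally the partition is unmounted, e.g. when booted
    from a rescue CD.
    """
    cmd = badblocks_command(device, output_file, block_size)
    return subprocess.run(cmd).returncode
```

For example, `run_scan("/dev/sdb1", "/root/badblocks.txt")` scans sdb1; both the device and the output path here are only illustrations.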

Hi,

Can't I compile the ReiserFS module for my kernel, or get an updated patch for it (maybe already compiled and ready to use)? lsmod shows the reiserfs module as loaded. Changing the kernel is a big step, I think.

I am also running memtest86 and badblocks at the moment.

Regards,
RUI

Changing the kernel is not really that big a step; I do it very often. I guess it's best to check for hardware problems first, as you're doing, and then, if nothing comes up, start considering that it may be a bug in ReiserFS itself. I'm not sure compiling a patched module for your kernel will be easier for you. You would also have to find out whether this really is a ReiserFS problem and, if so, whether a patch has been released. That's a lot of trouble and time just to compile a patched version of ReiserFS for your kernel. It would be easier to just upgrade the kernel, IMHO.

When compiling a new reiserfs module:
I am writing this without thinking much about it, so someone else may want to comment.
After the new reiserfs module is installed, your initrd will still contain the old module! So you may have to run mkinitrd. (I faintly remember a problem when I mixed up two versions like that.)

Hi,

Here is the situation right now:

I have just tried the approach from Brian's Blog: Reiserfs - Dealing with Bad Blocks

and badblocks is reporting a lot of bad blocks, so many that my floppy filled up.

From 14402624 to 14564585 (in sequence), roughly 161961 blocks are reported as bad in the file, and the number might have gone even higher if the floppy hadn't filled up completely.
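A list with one line per bad block grows quickly, which is why it overflowed a floppy. Collapsing it into runs of consecutive blocks makes the damage much easier to read. The Python sketch below does that and converts each run to a size, assuming the scan used a 4096-byte block size as suggested earlier in the thread. With that assumption, the range 14402624 through 14564585 counted inclusively is 161962 blocks, or about 633 MiB.

```python
def read_block_list(path):
    """Read a badblocks output file: one decimal block number per line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def bad_block_runs(block_numbers):
    """Collapse bad block numbers into (first, last) runs of consecutive blocks."""
    runs = []
    for block in sorted(set(block_numbers)):
        if runs and block == runs[-1][1] + 1:
            runs[-1][1] = block  # extends the current run
        else:
            runs.append([block, block])  # starts a new run
    return [tuple(run) for run in runs]


def describe_runs(runs, block_size=4096):
    """One line per run: block range, block count, size in MiB, byte offset."""
    lines = []
    for first, last in runs:
        count = last - first + 1
        mib = count * block_size / 2**20
        lines.append(f"{first}-{last}: {count} blocks, {mib:.1f} MiB, "
                     f"starting at byte offset {first * block_size}")
    return lines
```

For example, `print("\n".join(describe_runs(bad_block_runs(read_block_list("badblocks.txt")))))` summarizes a saved list; the file name is only a placeholder.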

What should I do when so many blocks are reported as bad? Is the file system corrupt?

By the way, there are not supposed to be any bad blocks on the hard disk, as it is a 2-disk hardware RAID.

Best regards,
RUI