kernel 2.6.31.12 flooding /var to death

I tried to install the 2.6.31.12 kernel (as suggested for security reasons) twice, and each time it made a BIG mess of the installation.
Here is the story (I’ll try to keep it short):

The installation of the kernel was not a problem; it went fine both times (I just had to reorganize the Grub entries a bit). At first, everything was running normally.

Suddenly, after having left the computer idle for a moment, I was faced with a black screen (blanker probably) and I could not wake up the system: no mouse, no keyboard, all seemed to be frozen and I had to press the reset button.

After the reboot, and as soon as the desktop was loaded, I got an error message saying that /var has 0 bytes left !
The nightmare began…

The first time that happened, I tried to find what was wrong in /var and /var/log, but found nothing abnormal. The information given by the Diskinfo tool was strange: sometimes it showed /var as 100% full, sometimes /home, and at other times /.
I tried to fall back to kernel 2.6.31.8 using Yast, but it failed in the middle of the process.
At this point I realized that the system was very unstable.
I then booted the Install HD and ran the repair mode. Some errors were found and corrected, but in the end the system was not repaired; worse, Grub lost all entries except the floppy one. (Fortunately, I knew how to boot the Install HD from Grub’s command line.)
I tweaked again and again, and finally got my system up again with kernel 2.6.31.8 (I don’t remember all the steps).

Recently, as the kernel update was still suggested, as there had been other updates in the meantime (like mkinitrd), and as I hadn’t read anyone else complaining about 2.6.31.12, I decided to give it another try.

And again, a couple of days later, same symptoms: black frozen screen, no space left in /var.
As I knew the system was unstable, I immediately booted the Install HD. I tried the repair mode and it failed (it just messed up Grub again). I tried to install kernel 2.6.31.8 as an update from there, and that also failed.
I then started the rescue mode and tried to delete some big files in /var (again, I did not find anything monstrous in there), then tried the repair and update process again, with no better results.

From rescue mode, I finally deleted everything in /var (that was not a good idea as I also deleted some NEEDED files)…

Got the system to boot, but only to the text login: nscd failed because of missing files in /var, startx failed because no X server was found, …

Fed up, I finally decided to “update” all the base system RPMs and the Gnome and KDE desktop environments from the 11.2 install HD, then update everything again with Yast Online Update. This took HOURS!

My 11.2 installation is now up again.

Questions:

  • How can I, from command line, find the biggest files nested somewhere in /var ? (Is it possible that the kernel is writing a log that cannot be seen ?)

  • From Yast Online Update, I now have two packages waiting: the kernel (2.6.31.12), which I have banned, and gnome_2_28_2, which fails to install (the progress only shows “Suppressing libsoup-2_4-1”).
    From Yast Install/suppress Software, when I select the installed libsoup, I immediately receive the following error:
    y2base: /usr/include/boost/smart_ptr/intrusive_ptr.hpp :149 : T* boost::intrusive_ptr<T>::operator->() const [with T = const zypp::target::rpm::RpmHeader]: Assertion 'px != 0' failed.
    YaST got signal 6 at YCP file PackagesUI.ycp:280
    /sbin/yast2: line 454: 23597 Aborted $ybindir/y2base $module "$@" "$SELECTED_GUI" $Y2_GEOMETRY $Y2UI_ARGS

Any idea how to solve this ?

I will not answer all your questions, but I think I can answer one or two.

To find out which directory or file has an excessive size, use

du -sk *

within e.g. /var. Then, when some directory looks suspect, cd down into it and repeat. It is a bit of trial and error, but it often works. (Of course, read man du for further info.)
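
The descend-and-repeat search can be shortened by letting find, du and sort do the ranking. This is only a minimal sketch, assuming a POSIX shell; the function name biggest_files is made up for illustration, and it should be run as root so that nothing under /var is skipped for lack of permissions.

```shell
# biggest_files DIR [N]: list the N largest regular files below DIR
# (sizes in KB, largest first). -xdev keeps find on DIR's own filesystem.
biggest_files() {
    dir=${1:-.}
    n=${2:-10}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$n"
}
```

For example, biggest_files /var 20 prints the twenty largest files. For a per-directory view, du -xk /var | sort -rn | head -n 20 gives the same kind of ranking.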

When there is a message that /var is full, I suppose you have a separate /var partition.

You do not tell which tool you use to see if some file system is really full, but I would use

df
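
Comparing df with du also speaks to the question of a log that cannot be seen: a file that was deleted while a process still holds it open keeps its blocks until that process exits, so df counts it but du and ls do not. A rough sketch, assuming a POSIX shell and root privileges; hidden_kb is a name invented for this example, and the default of /var is just the case discussed here.

```shell
# hidden_kb MOUNTPOINT: KB that df counts as used minus KB that du can see.
# A large positive gap hints at deleted-but-still-open files
# (a small gap is normal: filesystem metadata, journal).
hidden_kb() {
    mp=${1:-/var}
    used=$(df -Pk "$mp" | awk 'NR == 2 { print $3 }')
    seen=$(du -xsk "$mp" 2>/dev/null | awk '{ print $1 }')
    echo $((used - seen))
}
```

If hidden_kb /var reports a gap of hundreds of MB, and lsof is installed, lsof +L1 lists open files that have already been unlinked, together with the process holding each one.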

Thanks for your help.

Some more info:
/var is on a separate partition of 1.9Gb capacity. Right now, only 351Mb are used.

In the first crash case, I used what I think is originally named “Disk usage analyser”, and Nautilus.
In the second crash case (command line from Rescue mode), I used “ls -l” and “cd” to browse the files (now I think that “ls -lhSr” would have been more handy).

Thanks. That’s indeed a faster way.
Note: when I tried this on my repaired system, I got a few lines like:

du: cannot access `lib/ntp/proc/25471/fd/4': No such file or directory

Should I worry ?

Indeed. :wink:

I won’t try the 2.6.31.12 kernel again to find which file (if it really is a file) grows to over 1 GB in a few days (filling up the whole partition). So I will never know what the kernel is trying to do (or to tell), or why. I quickly browsed bugzilla but saw nothing related to this “no space left in /var” problem.
I hope the next kernel update will be more stable…

I found the solution for the last problem (libsoup RPM crashing Yast):

# rpmdb --rebuilddb

:slight_smile:

Congratulations. Indeed, a rebuilddb is something one should think of when really strange things happen, like yours.

Why did you wait until so many problems had arisen? What did you throw away from /var? MySQL?
Great you got it running again, but next time come here for help before you start wrecking your system again.

I hadn’t noticed that anything was going wrong until I got the “black frozen screen”. Before that, everything seemed right: no error message during boot, no error from the desktop… It was all sudden.
But then, because of the “0 bytes left in /var”, the system became so unstable that opening an application could add more instability. So it was not possible (or at least very dangerous) to open Firefox and post to the forum until I got at least the /var problem solved.

MySQL is not installed.

At first (from the rescue mode), I deleted some of the .bz2 files in /var/log and some other log files I thought were safe to remove. But that didn’t change anything (I still could not repair or re-install packages because of /var).
Then I deleted everything in /var with “rm -r *”, and THAT did make a difference, but it also created other problems because some needed files were now missing.
“rm -r *” was a stupid thing to do (now I know :wink: ), but I could not find WHAT was causing such a mess with /var and I started to get >:(

This thread would have been more useful if I could have provided more details, but the fact is I still don’t know. I’m not even sure that the kernel writing logs is the cause of the /var overflow.
But in both crash cases, the last system changes were the installation of kernel 2.6.31.12.

At least this warns other users that a kernel update might cause hours (I worked on this for 2 days) of :confused::eek:

BTW, is there a tool to monitor a directory and warn if something is growing monstrously (or the space left is going down to a given limit) ?
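
One simple way to get such a warning is a small check run regularly from cron. This is only a sketch, assuming a POSIX shell: check_free and the threshold are made up for illustration, and the function reads nothing but the "Available" column of POSIX-format df output.

```shell
# check_free PATH MIN_KB: warn on stderr and return 1 when the filesystem
# holding PATH has fewer than MIN_KB kilobytes available.
check_free() {
    path=$1
    min_kb=$2
    avail=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    if [ "$avail" -lt "$min_kb" ]; then
        echo "WARNING: only ${avail} KB left on $path (limit ${min_kb} KB)" >&2
        return 1
    fi
    return 0
}
```

Put in a script and called every few minutes from cron, for instance as check_free /var 204800, it stays silent while at least 200 MB are free. Cron usually mails any output to the owner of the job, if local mail is set up.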

Some logs go to /var.

Not sure why you have /var on a short leash in its own partition. :sarcastic:

That’s how I understand it :slight_smile:
But again, I’m not really sure a log file was the cause (I haven’t seen a really big file in there, or I have missed it twice.)

It was a long time ago, when I first installed Suse 8.x
I followed the guideline that recommended to use separate partitions for some mount points if the disk size allowed it. /var was one of them.
Since then, I have kept the same disk structure to avoid resizing and reformatting partitions.
Is 2.9Gb for /var too small for today’s usage? (openSUSE is installed here on two SCSI HDs of 18Gb)

I have just been looking at my 11.1 system and with the system more or less idle my /var is a little over 5GB with 4.9GB in /var/lib.

It looks as if how much space you need depends on what you install and how you use the system. It would appear that all sorts of things write to /var. On my system it contains the spool; a few big print jobs could easily eat 1 GB.

You probably need to look at the transient use of /var such as spool etc.

I’ve just had the “0 bytes left in /var” problem again, and this time with kernel 2.6.31.8 (so the kernel update to 2.6.31.12 was NOT the cause).

This time, I had the chance to spot that /var/log/kdm.log was very big, and I deleted it. After a reboot, the /var error message was gone.

I don’t know for sure that kdm.log is THE (only) culprit in filling up /var, but at least its size should be controlled somehow.
There is no file like “kdm” in /etc/logrotate.d (?)
What name should I use, and what should the file look like (when installed by openSUSE)?

There are a couple of caches in /var. Do you do a lot of printing?

The default size of /var, if it is on its own partition, is 1.5GB, and it has been this low for over 8 years. Having said this, 2.6.x kernels recommend a minimum of 3GB but a nominal 5GB, only because log rotation is done less often. I wonder what happened to /var being restricted to a maximum of 10% of the free space of /, but then of course that wouldn’t work if it is on its own partition.

In Linux there is great freedom to configure the system in many different ways. The distros tend to use this freedom to excess these days, as do too many programmers of apps like MySQL! The purpose of /var is temporary storage for running processes, system logs, and some persistent storage between boots. Each log should have a maximum size set so that none can grow too large. MySQL should never store any of its database stuff in /var. Print spooling is another matter, as a job may persist over many reboots until it can be run.

No.
( /var/cache/cups is empty.)

The other thing that is (stupidly) flooding my HD is the orbit thing; but that’s in /tmp on another partition. (see here).

Yes, and I’m wondering why the openSUSE installation didn’t add a config file for kdm in /etc/logrotate.d/
Do you have one you can share with me (or advice on the parameters to use)?

IMHO you should solve the problems that cause the extreme logging. Something should be throwing errors on the real cause.

The number of lines added per session is not that huge (right now the file is about 7k), but over time the file can grow to over 1 GB.

I’m trying to solve the errors reported (see Help to fix errors in kdm.log - openSUSE Forums ), but I fear that even without any errors or warnings to report, the kdm.log file will still grow forever.

Edit: BTW, I’ve added a kdm file in /etc/logrotate.d with these settings (Hope that’s enough and will do the trick.) :


/var/log/kdm.log {
weekly
size=+1024k
notifempty
missingok
}
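
One thing this config may be missing: kdm (and the X server writing through it) normally keeps kdm.log open, so after logrotate renames the file the process can go on writing into the rotated copy. The copytruncate directive is meant for that case. Below is a sketch that writes an alternative config to a scratch path so it can be tried in a dry run first; write_kdm_logrotate and the chosen values (1024k, 4 rotations) are assumptions for illustration, not what openSUSE ships.

```shell
# write_kdm_logrotate DEST: write a candidate logrotate config to DEST.
# copytruncate copies the log and then empties the original in place,
# so the writing process keeps a valid file handle.
write_kdm_logrotate() {
    cat > "$1" <<'EOF'
/var/log/kdm.log {
    size 1024k
    rotate 4
    compress
    copytruncate
    missingok
    notifempty
}
EOF
}
```

After write_kdm_logrotate /tmp/kdm.logrotate, running logrotate -d /tmp/kdm.logrotate prints what would be done without touching any log. Once the output looks right, the file can be copied to /etc/logrotate.d/kdm.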

Sorry, this is something for my todo list. I planned to make all sorts of maintenance scripts, which somehow just disappeared over the years. One was fixlog %1 %2, from the Linux base CLI of 1999, which let you specify the log to fix in %1 and, in %2, the first date to keep or the size in Kbytes to keep. I remember giving it ‘fixlog * today’ to clear all old entries from old logs before upgrading a kernel, so as to have a clean test of how the new kernel worked.

How true! Can you answer how you would solve this log entry?
[28345] unknown signal raised during ep4x process at start

It appears sometimes during kdm init, sometimes during a hot mount of USB stick, sometimes during xterm shutdown and maybe other times too. It doesn’t seem to be a show stopper but may show up 3 to 300 times during a session. My solution was to switch from 11.2 64bit to 11.2 32bit with all the same stuff in use and no thrown errors.

Once again, my /var partition has been killed. It suddenly grew from 16 KB to several hundred MB in a few days.
This time, I can see what’s flooding it: kdm.log has A LOT of this:


Mesa 7.6 implementation error: Invalid datatype in _mesa_convert_colors
Please report at bugzilla.freedesktop.org
Mesa 7.6 implementation error: bad datatype in interpolate_int_colors
Please report at bugzilla.freedesktop.org

The trick with logrotate didn’t prevent the overflow. What can I do ??
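
Part of the reason may be timing: logrotate is normally started once a day from cron, so a log that gains hundreds of MB between two runs is never checked in time. Until the Mesa errors themselves are fixed, one stopgap is to cap the file more often. A sketch, assuming a POSIX shell; cap_log and the limit are hypothetical, and truncating throws the log contents away.

```shell
# cap_log FILE MAX_KB: empty FILE in place when it is larger than MAX_KB.
# Truncating (rather than deleting) keeps the same inode, so the blocks
# are released even while a process still has the file open.
cap_log() {
    file=$1
    max_kb=$2
    [ -f "$file" ] || return 0
    size_kb=$(( $(wc -c < "$file") / 1024 ))
    if [ "$size_kb" -gt "$max_kb" ]; then
        : > "$file"
    fi
}
```

Run as root from an hourly cron job, for instance as cap_log /var/log/kdm.log 51200, this keeps kdm.log below roughly 50 MB at each check. It treats the symptom only; the repeated Mesa messages are still the thing to fix or report.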