Prevent memory leaks from crashing the system

The problem has reached a point where I’m no longer willing to let it slide, so I want to find a solution ASAP. What happens is this: occasionally, programs have memory leaks. Usually they are harmless, because they grow slowly and/or never consume much RAM. In a few cases, however, a process eats up gigabytes of RAM in a matter of seconds, until every last bit of it is taken. When this happens, the system becomes unusable (nothing can be clicked, the mouse pointer freezes, etc.) and the computer needs a hard restart. This is both a major problem and a security risk, because it can cause data loss or put the system in danger (imagine it happening in the middle of a kernel update).

I was hoping that by now the Linux kernel would have minimal protection against this sort of thing, and would reserve a pocket of RAM that user applications simply cannot take up, in order to keep the system alive when memory is filled abusively. Since it apparently doesn’t, I need to add such a safeguard manually. The problem is that I don’t know exactly how, and I’m hoping someone can explain it to me.

What is the best way to limit memory usage for normal processes, so that memory leaks cannot bring the system down by consuming all the RAM? I’m thinking of something that restricts non-root processes to only part of the total memory. For example, I have 9 GB of RAM. If it solves those crashes, I am okay with reserving 1 GB for root and system processes only, while the normal programs I run may only have access to the other 8 GB.

The right path seems to be the ulimit command and the /etc/security/limits.conf file. But ulimit has a lot of parameters and addresses several kinds of memory (which aren’t explained very clearly either), and I’m not sure exactly what to set for this scenario. Basically I’m looking for the ulimit settings that make me give up as little RAM as possible, in exchange for guaranteeing a space that memory leaks cannot touch, so the system stays safe. I’d also prefer percentages over fixed values, so I don’t have to reconfigure everything if I gain or lose RAM… for example “90%” instead of “8000 MB”.

One clarification: I believe I’ve heard people say in the past that if a low-priority process has a runaway memory leak, it shouldn’t actually take down the system because the kernel knows how to handle it, so maybe something else is happening. I’ve hit this problem numerous times and can confirm that is false! If a badly written program fills up all the memory in a few seconds (which I get to watch in KSysGuard before the system dies), it renders the system unusable and the user has to unplug the computer and start it up again. Also, I do have a swap partition, and a large one at that (8 GB). Even so, such leaks bring down the system.

A few ideas, but first some questions:

  1. Which programs are doing this? What are the ulimit settings for those
    programs while they are running? You can get a current process’s limits
    by looking at the proc filesystem entry for that particular pid, for example:

cat /proc/12345/limits

for process with PID# 12345.
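
If you don’t know the PID off-hand, something like this should work (using firefox purely as an example process name):

pgrep firefox
cat /proc/$(pgrep -n firefox)/limits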

Also, the kernel does eventually engage the out-of-memory (OOM) killer, but it doesn’t do so the first time some process asks for more memory than the system has available, since that would let any process cause the kernel to kill other processes.

It would also be useful to know whether this happens on other systems, and whether all of your RAM and swap is already used up when you experience the slowness, or whether swap is still being consumed. As far as I know, the system will not try to kill memory hogs until it is truly out of memory. That said, the system can slow down considerably depending on how much swapping and other disk access is happening, even with swap space (or even RAM) still left, since disk I/O can cause system-wide slowness in other areas.

How long are you waiting when things get slow before you reboot the box? Does the system stop responding to other types of access besides the mouse, such as SSH, or even just ping requests from another box?


Good luck.


I have no idea what programs you are running that are so leaky; I never run out of memory, and my systems rarely use swap except for the ones with only 1 or 2 GB of RAM. Some of my systems run for months between reboots.

The first step is to identify the programs at fault. Open a terminal and run ‘top’ in it (press Shift+M to sort by memory usage). Keep an eye on that screen as the system runs and see if any process keeps increasing its virtual memory. As for ulimit, the “-v” option is likely the only one you need to set.

Once you identify a program that you think is leaking, you can use the program valgrind to find those leaks. If you have the source, you can then fix them yourself.
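
For example, a typical leak check (the program name is just a placeholder) looks like this, and prints a summary of lost memory blocks when the program exits:

valgrind --leak-check=full ./myprogram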

Finally, make sure that you are not trying to put 10 Kg of stuff in a 5 Kg bag.
Not all mixes of programs are possible.

I don’t remember all the programs off the top of my head, since this has happened many times (even if rarely) over the last few years. Today, however, an add-on appears to have caused Firefox to go from 350 MB of RAM to 6.3 GB in only a few seconds, which froze the system. I couldn’t see how much swap was used, but I managed to switch to KSysGuard and that was the last value I saw. For Firefox, the limits you asked about are:

mircea@linux-qz0r:~> cat /proc/5149/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             71847                71847                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       71847                71847                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
mircea@linux-qz0r:~> 

I’m not sure about other systems… at the moment I only remember it on my desktop computer. I used Windows up until two years ago, and now I run the latest openSUSE on all my machines. I can’t recall this ever happening on Windows.

As for how long I waited, I think I spent almost 15 minutes waiting for the system to recover from the crash. During this time nothing could be clicked, and only occasionally would the mouse pointer come back from being stuck so I could move it around a bit. Eventually I pressed Ctrl+Alt+F1 to switch to a text console, and the pointer then disappeared as the image froze permanently. Also, KSysGuard showed a load of processes in “disk sleep” during this period, which I commonly see when a memory leak suffocates the system.

On 07/11/2014 02:16 PM, MirceaKitsune wrote:
>
> I don’t remember all programs by memory, since this has happened many
> times (even if rarely) over the last few years. Today however, an addon
> appears to have caused Firefox to go from 350 MB of RAM to 6.3 GB in

Lose the add-on. Also, if you can share which one it is, that may be useful, perhaps along with which FF version you’re on. I use FF 24x7 as my primary browser and have never seen anything quite like that, but I only use a few add-ons and they have worked well for a long time. Still, if you have things like the web developer tools, those can use a ton of memory, especially if you leave them tracing things when you do not actually intend to do any troubleshooting/developing.

> only a few seconds, which caused a system block. I couldn’t see how much
> SWAP was used, but I managed to switch to KSysGuard and see this value
> last. For Firefox the limits you asked about are:
> [limits output snipped; same as quoted above]

Perhaps set your max address space limit before loading Firefox. The easiest way is probably to create a script under ~/bin (/home/mircea/bin perhaps) that calls ulimit to limit the address space (and possibly the stack size) before starting firefox; see the sketch below. Once that’s done, just be sure to always start Firefox through that script and see if it helps at all. Using that much RAM is not necessarily a bad thing, but if it isn’t buying you something you really want out of the browser, then I’d start there. This, though, only potentially fixes one program. You can limit all programs started by your user via the limits.conf file you mentioned before, but even then that won’t save a system where other users are chewing up memory. If it’s just your box, that may help. Try limiting your user to 1 GiB as you mentioned, and then add exceptions as needed. If you suddenly find yourself unable to log in because KDE/GNOME needs more than that limit, you’ll need to get around that first.
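
A rough sketch of such a wrapper, saved as ~/bin/firefox (the 2 GiB value is only a guess; tune it to whatever Firefox realistically needs on your box):

#!/bin/bash
# Cap the virtual address space of this shell and everything it starts.
# ulimit -v takes a value in KiB; 2097152 KiB = 2 GiB.
ulimit -v 2097152
# Replace the shell with the real firefox binary, passing arguments through.
exec /usr/bin/firefox "$@"

Make it executable (chmod +x ~/bin/firefox) and make sure ~/bin comes before /usr/bin in your PATH so the wrapper gets picked up instead of the real binary.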

How about this as well for you:

https://www.kernel.org/doc/Documentation/sysctl/vm.txt


oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill.  This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition.  This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.

The default on my older openSUSE box is still zero for oom_kill_allocating_task, so perhaps change that to 1 and see if things improve. If you change it on the fly with ‘sysctl’, be sure to also make it persistent, or it may be undone the next time you reboot or change firewall settings. Many sysctl options can be set via YaST, but it may be easiest to just add the desired line to /etc/sysctl.conf instead.
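
For example, as root (adjust the path if your system reads sysctl settings from somewhere other than /etc/sysctl.conf):

sysctl -w vm.oom_kill_allocating_task=1                       # apply immediately
echo 'vm.oom_kill_allocating_task = 1' >> /etc/sysctl.conf    # keep it after reboot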


Good luck.


Oh… if anyone is interested, the add-on at fault is Shumway (Firefox’s upcoming built-in Flash player). When opening even some simple Flash files with it, FF goes from about 400 MB to over 2 GB in a second. That needs to be discussed with its developers. But more than the add-on, my problem is that any process with such a memory leak can bring down the whole system, which IMO is as bad as a computer virus.

I also thought about using “ulimit -x y;firefox” in a desktop shortcut, and might consider that as a permanent solution too. I actually tried this in a console already: ulimit -v 1024000 was surprisingly not enough for Firefox to even finish starting up! ulimit -v 2048000 allowed it to start, though, and seemed to restrict it in a pretty reasonable way.
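
Since a desktop shortcut doesn’t run its command through a shell, I’d probably have to wrap it, something like:

bash -c 'ulimit -v 2048000; exec firefox'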

On 2014-07-12 00:16, MirceaKitsune wrote:
>
> Oh… if anyone is interested in that, the addon at fault is ‘Shumway’
> (http://mozilla.github.io/shumway/) (Firefox’s upcoming builtin flash
> player). When opening even some simple flash files with it, FF goes from
> about 400 MB to +2 GB in a second. Needs to be discussed with its
> developers. But more than the addon, my problem is that any process with
> such a memory leak can bring down the whole system, which IMO is as bad
> as a computer virus.

openSUSE is configured assuming that users and applications are polite to one another, so it does not limit the amount of memory they can request, even if they ask for so much that it brings the machine down. But you can change those defaults and limit how much memory a single process can request. That’s up to you to decide.

In fact, there are several ways in which a plain user can kill a Linux
machine. I had a one-liner somewhere… try this:


:(){ :|:& };:

It will kill your machine. Or google it first >;-)

> And I thought about using “ulimit -x y;firefox” in a desktop shortcut,
> and might consider that as a permanent solution too. I actually tried
> this in a console already; ulimit -v 1024000 was surprisingly not enough
> for firefox to even start up entirely! ulimit -v 2048000 allowed it to
> start however, and seemed to restrict it in a pretty reasonable way.

I know of some apps that I learned to always start with ulimit. They are
buggy, so I take precautions.

You can also limit users via PAM (pam_limits).
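
The /etc/security/limits.conf file mentioned earlier is what pam_limits reads. As a rough illustration only (the value is in KiB, so this caps the address space of each of that user’s processes at 8 GiB):

mircea    hard    as    8388608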


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

See http://en.wikipedia.org/wiki/Fork_bomb

Alright everyone, I have (hopefully) found the solution to my problem at last. Someone explained what it is I’m looking for: memory overcommit settings. From what I’ve read, the kernel by default allows applications to overcommit memory in some cases… meaning they can allocate more than is actually available. Although the defaults should officially prevent system crashes like mine, they clearly don’t all the time. So after digging a bit through the kernel settings involved, I found the parameters that will hopefully fix this problem for good:

vm.overcommit_memory = 2
vm.overcommit_ratio = 90

I added the above to sysctl.conf and applied them with the sysctl command for now, those values being what felt best to me. Everything seems to be working well so far. What that basically says is that, in total, processes can only commit 90% of the physical RAM plus the swap partition size, never more. Since I have 9 GB of RAM (triple-channel, before anyone asks why it’s an odd number) and 8 GB of swap, that means 8.1 GB + 8 GB, which should be more than enough! I’ll see if such leaks grieve me again now that these kernel settings are in place.
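
To apply and double-check the values without rebooting, something like this should do (CommitLimit in /proc/meminfo shows the resulting ceiling):

sysctl -p /etc/sysctl.conf
grep -i commit /proc/meminfo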

A warning to everyone who considers playing with these: never set vm.overcommit_ratio to 0! I read in several posts that people’s machines would no longer boot after doing that. People assumed they could limit the system to 0% of the RAM because they have swap, but the swap partition isn’t activated yet when these settings take effect, so a 0 there basically tells the system there is no memory available at startup. I like to keep people out of trouble, so I thought I’d include this note as well.

On 2014-07-12 21:56, MirceaKitsune wrote:

> Code:
> --------------------
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 90
> --------------------
>
>
> I added the above to sysctl.conf and applied them via sysctl command for
> now, those values being what I felt is best. Everything seems to be
> working well so far. What that basically says is that an application can
> only allocate 90% of the physical RAM + SWAP partition size, never more.

Interesting… What’s the default value? I don’t have those settings in
my file.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

The defaults are vm.overcommit_memory = 0 and vm.overcommit_ratio = 50. 50% felt like too little, while 90% felt ideal.
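
They don’t show up in sysctl.conf unless you add them yourself; the values currently in effect can be read straight from /proc:

cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio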

Ok… apparently vm.overcommit_memory isn’t all that safe to use. After roughly two days of uptime, the KDE screen locker eventually breaks, with an error along the lines of “unable to connect to authentication system” whenever I type my password and press Enter to unlock. And yes, I checked several times: this only happens with vm.overcommit_memory = 2 (I didn’t try 1). I’m probably not going to bother reporting it as a bug, since it’s likely one of those things that are hard to test and only happen to me and a few other people out there. I wish Linux (especially KDE) weren’t so buggy all the time though…

I wouldn’t mess with the memory management too much. Not that I’m an expert, but I found a lot of really unexpected behavior when I altered the overcommit behavior. Right now the only bit I micromanage is vm.oom_kill_allocating_task, which makes sure a program allocating more than its share gets killed pretty quickly.