[TTM] kernel out of memory, chapter 2

i have an hp z600, 2 x X5650, 12gb memory and opensuse 12.3. generic install.
Linux acan.site 3.7.10-1.1-desktop #1 SMP PREEMPT Thu Feb 28 15:06:29 UTC 2013 (82d3f21) x86_64 x86_64 x86_64 GNU/Linux

on any given day, the system is running firefox with a few tabs, chromium,
thunderbird, three terminal sessions and at least one vmware guest. periodically,
i run some scripts that make use of ghostscript. this same scenario, without
chromium, has worked well for some time on an hp xw8400, 11.3 and only 8gb memory.
the major application difference is the version of firefox. the 11.3 box still
has firefox 3.something. vmware changed from v7 to v9.

this was tested under 12.2 and after about three days of up time, i would get
[TTM] kernel out of memory faults and applications would crash. the same thing
is occurring under 12.3.

i have uncovered two details. when firefox has been up for a couple of days,
the page cache becomes quite large. 6.5GB. killing firefox will return a big
chunk to the free pool.

the other detail is that when ghostscript begins to do its heavy lifting, the page
cache jumps up quickly to over 7GB (without firefox running.) if firefox, vmware
and the gs job are running at the same time, the page cache grows to over 9.5GB,
free mem drops to about 100MB and the kernel runs out of memory.

one change i’ve made is to set vm.swappiness to 80. this has allowed the system
to stay up slightly longer without the kernel memory faults. the downside to this
change is the system response time becomes horrible. the cursor freezes, copying
files takes much longer, etc.
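
For reference, a minimal sketch of checking the setting before and after a change. The helper name show_swappiness is made up for this example, and the optional file argument exists only so the function can be tried against a test file. 60 is the usual kernel default.

```shell
#!/bin/sh
# show_swappiness: hypothetical helper; prints the current vm.swappiness.
# With no argument it reads the live kernel value.
show_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Usage (the write needs root and lasts until the next reboot):
#   show_swappiness
#   sysctl -w vm.swappiness=60
```

To keep a value across reboots, a line such as vm.swappiness = 60 in /etc/sysctl.conf is the usual route.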

now for the questions.
there is a good deal written by red hat about adjusting /proc/sys/vm/pagecache. there
are three values to adjust. however, opensuse does not have the pagecache file. is
this due to different kernel versions? is there a different parameter used by
opensuse?
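
As far as I can tell, the pagecache file came from Red Hat's own kernel patches and was never part of the mainline kernel, which would explain why it is missing here. The sketch below lists a few related knobs that mainline kernels of this era do expose under /proc/sys/vm. The helper name list_vm_tunables and the choice of knobs are assumptions made for illustration, not a recommendation of values.

```shell
#!/bin/sh
# list_vm_tunables: hypothetical helper; prints name=value for a few
# page-cache-related knobs found under a sysctl directory.
list_vm_tunables() {
    dir="${1:-/proc/sys/vm}"
    for name in swappiness vfs_cache_pressure dirty_ratio \
                dirty_background_ratio min_free_kbytes; do
        # Skip knobs this kernel does not provide.
        [ -r "$dir/$name" ] && printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    done
    return 0
}

# Usage:
#   list_vm_tunables
```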

is adding more physical memory likely to produce any significant difference? other
than deplete my wallet. have the parts and pieces that make up 12.3 become so memory
intensive that 12GB is really not enough? due to the design of this system, i would
have to replace all six 2gb sticks with 4gb sticks.

should i investigate compiling a custom kernel? i used to do that all the time when
slackware was my system of choice. i’ve never attempted it with opensuse.

any other suggestions? any specific system details you need to see?

So, you could open up a terminal session, type in free and post the results here. Here is what I get on my PC.

free
             total       used       free     shared    buffers     cached
Mem:      16419556    4446920   11972636          0    1364992    2000016
-/+ buffers/cache:    1081912   15337644
Swap:     16769016          0   16769016

You can always add more swap space; I went with a swap equal in size to main memory, for instance. You should not run with no swap space.

I have a blog on the subject of SWAP you can read here: Setting up the Proper Size SWAP File in openSUSE - Blogs - openSUSE Forums

It is also a good idea to try a newer kernel. You can get 3.8.10 at kernel.org today. Have a look at my blog on the subject here: openSUSE and Installing New Linux Kernel Versions - Blogs - openSUSE Forums

Thank You,

18gb swap. i use the 1.5 x physical memory rule.

i had to reboot to be able to post this.


             total       used       free     shared    buffers     cached
Mem:      12309196    3226864    9082332          0      66288    2305096
-/+ buffers/cache:     855480   11453716
Swap:     18876412          0   18876412
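
The "-/+ buffers/cache" line is the figure to watch when judging real application usage: its used value is simply used minus buffers minus cached from the Mem: line. A small sketch of that arithmetic follows, assuming the older free layout shown in this thread (newer procps versions print different columns). The helper name used_minus_cache is made up.

```shell
#!/bin/sh
# used_minus_cache: hypothetical helper; reads `free` output (old layout
# with separate buffers and cached columns) on stdin and prints
# used - buffers - cached in kB.
used_minus_cache() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}

# Usage:
#   free | used_minus_cache
```

Fed the Mem: line above, this gives 855480, which matches the used column of the "-/+ buffers/cache" line.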

when the meltdown occurs, i’ve been watching vmstat and /proc/meminfo. i can see it swapping, a little. the last time, i was just sitting there watching. the page cache just kept climbing till it got to 10-something GB. then X fritzed out and i was dumped to a login prompt.

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 9082096  66292 2305176    0    0   491    11   63  227  1  0 94  5  0

Every 5.0s: head -n 18 /proc/meminfo                                            Fri Apr 26 18:02:45 2013

MemTotal:       12309196 kB
MemFree:         9008872 kB
Buffers:           69100 kB
Cached:          2258144 kB
SwapCached:            0 kB
Active:          1030616 kB
Inactive:        2011044 kB
Active(anon):     714132 kB
Inactive(anon):    25868 kB
Active(file):     316484 kB
Inactive(file):  1985176 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      18876412 kB
SwapFree:       18876412 kB
Dirty:               120 kB
Writeback:             0 kB
AnonPages:        714628 kB


what i failed to catch was the size of the dirty and writeback buffers. the system is on a sata drive. maybe if i put it on a sas drive…
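
One way to avoid missing those numbers next time is to log them to a file instead of watching the screen. This is only a sketch: the helper name meminfo_line, the log path and the 5 second interval are all assumptions, and the sync call is there on the theory that it gives the last samples a better chance of reaching the disk before X dies.

```shell
#!/bin/sh
# meminfo_line: hypothetical helper; reads /proc/meminfo-style text on
# stdin and prints the fields of interest on one line (values in kB).
meminfo_line() {
    awk '$1 ~ /^(MemFree|Cached|Dirty|Writeback):$/ {
             printf "%s%s", sep, $1 $2; sep = " "
         }
         END { print "" }'
}

# Usage: append a timestamped sample every 5 seconds.
#   while sleep 5; do
#       echo "$(date '+%F %T') $(meminfo_line < /proc/meminfo)" >> "$HOME/meminfo.log"
#       sync
#   done
```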

i’ll look into the newer kernel. but i moved kernels when i moved from 12.2 to 12.3 with no change.

So, like most problems, is it the hardware, the OS or an application that is at fault? You did change from kernel 3.2 to 3.7, but it is so easy to upgrade to kernel 3.8.10 that, in my opinion, I would just do it to help decide the next move.

Thank You,

On 04/26/2013 07:46 PM, jdmcdaniel3 wrote:
>
> So like most problems, is it hardware, OS or Application that is at
> fault? You did change from kernel 3.2 to 3.7, but it is so easy to
> upgrade to kernel 3.8.10, I would just do it to help decide the next
> move in my opinion.

There definitely seems to be a memory leak in something. I would use top and
monitor the VIRT column. That will show whether it is an application. By the time
you get to the point where you are swapping, any offending process should be
very visible.
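
If watching top interactively is awkward, the same ranking can be pulled from ps. A minimal sketch, assuming procps ps; the helper name top_by_vsz is made up. Note that page cache is not charged to any process, so it will not appear in either column.

```shell
#!/bin/sh
# top_by_vsz: hypothetical helper; reads "pid vsz rss comm" rows on stdin
# and prints the N rows with the largest virtual size (default 5).
top_by_vsz() {
    sort -k2,2nr | head -n "${1:-5}"
}

# Usage (the trailing '=' on each column suppresses the header):
#   ps -eo pid=,vsz=,rss=,comm= | top_by_vsz 10
```

Running it every few hours and comparing the lists should show whether one process keeps growing.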

Leaks from kernel components will not show up in top, but there is a diagnostic
facility that kernel developers use to look for leaks. It is not 100% foolproof,
but my system generally shows no leaks in the kernel no matter how long it has
been running. Yours has to be a major leak.

from a strictly application perspective, the top two memory users are firefox and vmware-vmx. they are running at 3 to 4.5%. the vmware-vmx is a 1gb winxp. top virtual (VIRT) user is almost always systemd.

Tasks: 364 total,   1 running, 363 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.2 sy,  0.0 ni, 99.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  12309196 total,  7512812 used,  4796384 free,   165416 buffers
KiB Swap: 18876412 total,        0 used, 18876412 free,  5160596 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND                                   
 2780 ewhite    20   0 1513m 337m  52m S  10.6  2.8  16:49.76 firefox                                   
 1598 ewhite    20   0 2006m 508m 486m S   3.3  4.2   0:30.53 vmware-vmx                                
 1280 root      20   0  245m 112m  34m S   3.0  0.9   3:52.67 Xorg                                      
  165 root      20   0     0    0    0 S   0.3  0.0   0:03.70 kworker/4:1                               
 2507 ewhite    20   0  537m  72m  33m S   0.3  0.6   0:19.32 kwin                                      
 2578 ewhite    20   0  505m  34m  21m S   0.3  0.3   0:22.18 konsole                                   
 3278 ewhite    20   0 12592 1604 1116 S   0.3  0.0   0:14.62 watch                                     
    1 root      20   0 45944 4704 2144 S   0.0  0.0   0:01.65 systemd                                   
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kthreadd                                  

the other possibility is ghostscript. i use it to convert ps to pdf. gs will use 100% cpu and as much memory as it can get. gs is a mature product, but it could still have some leaks.

how does one monitor the kernel for memory leaks?

On 04/27/2013 11:26 AM, ewhite20 wrote:
> from a strictly application perspective, the top two memory users are
> firefox and vmware-vmx. they are running at 3 to 4.5%. the vmware-vmx
> is a 1gb winxp. top virtual (VIRT) user is almost always systemd.

Your systemd memory usage is less than on my system, and I only have 3 GiB. BTW,
systemd is only using 45944 KB. I am running Firefox with 5 tabs open, and it is
using 938 MB.

> the other possibility is ghostscript. i use it to convert ps to pdf.
> gs will use 100% cpu and as much memory as it can get. gs is a mature
> product, but it could still have some leaks.

It should only use available RAM, and not use virtual memory above that. Of
course, I have never tested with more than 4 GiB RAM. There may be a bug that
only shows when there is more than
>
> how does one monitor the kernel for memory leaks?

You need to build your own kernel with the kmemleak option
(CONFIG_DEBUG_KMEMLEAK) turned on. It adds some overhead in both memory usage
and CPU, so it is not turned on in a standard kernel.
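
Once such a kernel is booted, the detector is driven through a file in debugfs. A rough sketch of a manual scan, assuming debugfs is mounted at /sys/kernel/debug; the helper name kmemleak_report is made up, and its path argument is there only so it can be tried against an ordinary file.

```shell
#!/bin/sh
# kmemleak_report: hypothetical helper; triggers a scan and prints the
# suspected leaks. Needs root, debugfs mounted, and a kernel built with
# CONFIG_DEBUG_KMEMLEAK=y.
kmemleak_report() {
    f="${1:-/sys/kernel/debug/kmemleak}"
    if [ ! -w "$f" ]; then
        echo "kmemleak not available at $f" >&2
        return 1
    fi
    echo scan > "$f"   # ask the kernel to scan memory now
    cat "$f"           # list unreferenced objects with their backtraces
}

# Usage (as root):
#   mount -t debugfs nodev /sys/kernel/debug   # if not already mounted
#   kmemleak_report
```

Reports can include false positives, so an entry is a lead to follow rather than proof of a leak.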

I have occasional kernel memory allocation failures due to fragmentation and
badly written drivers, but that is rare. My current free output and uptime (~23
hours; the first field of /proc/uptime is seconds since boot) are as follows:


finger@larrylap:~> free
             total       used       free     shared    buffers     cached
Mem:       2880172    2742232     137940          0     164564     719084
-/+ buffers/cache:    1858584    1021588
Swap:      4194300      36892    4157408
finger@larrylap:~> cat /proc/uptime
84259.70 148818.23
finger@larrylap:~>
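
When reading /proc/uptime, keep in mind that the first field is seconds since boot and the second is idle time summed over all CPUs, so on a multi-core machine the second can be larger than the first. A small sketch for converting the first field to hours; the helper name uptime_hours is made up.

```shell
#!/bin/sh
# uptime_hours: hypothetical helper; reads /proc/uptime-style text on
# stdin and prints hours since boot, taken from the first field.
uptime_hours() {
    awk '{ printf "%.1f\n", $1 / 3600 }'
}

# Usage:
#   uptime_hours < /proc/uptime
```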