monitoring memory usage

I’m having trouble with memory.
Some process >:( is wasting so much memory that my system hangs.

The system may run with no problems for more than a month, but at other times it hangs twice a day.
When the hang happens, I can see in the logs (which I can only read after restarting with the reset button, because once it hangs I can’t log in) that some processes (not always the same ones) are invoking the oom-killer.
So what I want is to monitor, let’s say, the top 10 memory-consuming processes at any moment.
One option would be to run top from a cron script.

Is there any better option?
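A minimal sketch of the cron idea, in Python rather than parsing top’s output, assuming a Linux /proc filesystem: it reads the resident set size (VmRSS) of every process from /proc/<pid>/status and prints the ten largest, each line prefixed with a timestamp so that repeated runs build up a log.

```python
#!/usr/bin/env python3
# Print the N processes with the largest resident memory (VmRSS), one line
# each, prefixed with a timestamp so that repeated runs from cron form a log.
import os
import time


def read_rss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status.

    Kernel threads have no VmRSS line, so they count as 0."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:   123456 kB"
    return 0


def top_memory(n=10, proc="/proc"):
    """Return [(rss_kib, pid, name), ...] for the n largest processes."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():          # skip /proc/meminfo, /proc/self, ...
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                text = f.read()
        except OSError:                  # process exited between listdir and open
            continue
        name = "?"
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        rows.append((read_rss_kib(text), int(entry), name))
    rows.sort(reverse=True)              # largest RSS first
    return rows[:n]


if __name__ == "__main__":
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for rss, pid, name in top_memory(10):
        print(f"{stamp} {pid:>7} {rss:>10} KiB  {name}")
```

For example, a crontab entry like `*/5 * * * * /usr/local/bin/topmem.py >> /var/log/topmem.log` (script name and paths are hypothetical) would record a snapshot every five minutes; after a hang, the last entries before the gap show which processes were growing. RSS does not include pages already pushed out to swap, so a process that is being swapped out can look smaller than it really is.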

regards

On 05/16/2013 02:26 PM, fperal wrote:
> Is there any better option?

have a look at atop…

it includes an (easy) way to take a ‘snapshot’ every x or xx
minutes…plus it looks at the networking, so if the problem
comes from there, you will have a chance to see it…


dd

If you’re running KDE, you can launch the KDE system monitor with Ctrl+Esc.

From within a console, I’ve installed (if necessary) and run:
top
htop

If running KDE, you can deploy any of a number of system monitoring widgets to your Desktop; I run one called “Memory Status”, which doesn’t list the individual processes but displays overall usage for RAM and Swap.

HTH,
TSU

I think it may work.

While searching for information about atop, I found this: “linux - Unpredictable memory explosions - Unix & Linux Stack Exchange”,
which describes the same problem I’m having. I’ve installed atop and atop-daemon and activated the daemon as a system service.
On my system the log isn’t at /var/log/atop.log but at /var/log/atop/atop_20130516.

I’ll let you know what I find.

regards

On 05/16/2013 07:36 PM, fperal wrote:
> I’ll let you know what I find.

i do not want to raise your paranoia to the level of mine, but you do
know, don’t you, that there are things called “rootkits” which (if you
let them in) can hide in memory and do stuff you probably wish they
didn’t do…

there has been a lot of chatter lately about some pretty nasty
exploits…are you running Apache?

anyway, unless you are certain your system is clean, it is a
possibility on both yours AND the one in the page you cited…

btw, i had never heard of “Unpredictable memory explosions” before…how
good is your security?

and by the way, restarting with the reset button is not a good move.
can you not go to Ctrl+Alt+F1 and log in as root?

if not, can you not use the magic SysRq key trick?


dd
http://tinyurl.com/DD-Caveat

Not good enough, for sure.

I have rkhunter and chkrootkit installed, but I guess that if some rootkit has entered my system, those programs may be the first to be attacked. I guess the way to check the system would be to boot from a live CD and then check the filesystem. Do you recommend any tools?

On 05/19/2013 11:36 AM, fperal wrote:
>
> I have rkhunter and chkrootkit installed, but I guess that if some rootkit
> has entered my system, those programs may be the first to be attacked.

i believe those tools are most useful if they are installed and first run
on a fresh, known-secure system, with all false positives
identified/dealt with, and the initial results saved off the machine for
later comparison when there is reason to suspect compromise…

but, i’m no security expert

> I
> guess the way to check the system would be to boot from a live CD and
> then check the filesystem. Do you recommend any tools?

booting from a clean live CD and using it to check the md5sum (or
similar) of the critical files on the suspect system is said
to be useful…
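A sketch of that idea, with SHA-256 from Python’s hashlib standing in for md5sum: record hashes of a directory tree while the system is known to be clean and keep the JSON file off the machine, then later boot a clean live CD, mount the suspect system (e.g. at /mnt, an assumed mount point) and compare. Paths are stored relative to the scanned root so that /usr/bin and /mnt/usr/bin line up. The script name hashcheck.py is hypothetical.

```python
#!/usr/bin/env python3
# Record SHA-256 hashes of every regular file under a directory, or compare
# a directory against a previously recorded baseline (stored as JSON).
import hashlib
import json
import os
import sys


def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large binaries don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def scan(root):
    """Map each regular file under root (path relative to root) to its hash."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = hash_file(full)
    return result


def compare(baseline, current):
    """Return (changed, missing, added), each a sorted list of relative paths."""
    changed = sorted(p for p in baseline if p in current and baseline[p] != current[p])
    missing = sorted(p for p in baseline if p not in current)
    added = sorted(p for p in current if p not in baseline)
    return changed, missing, added


def main(argv):
    if len(argv) != 3 or argv[0] not in ("record", "check"):
        print("usage: hashcheck.py record|check ROOT BASELINE.json")
        return
    mode, root, db = argv
    if mode == "record":
        with open(db, "w") as f:
            json.dump(scan(root), f, indent=1, sort_keys=True)
        return
    with open(db) as f:
        baseline = json.load(f)
    changed, missing, added = compare(baseline, scan(root))
    for label, paths in (("CHANGED", changed), ("MISSING", missing), ("ADDED", added)):
        for p in paths:
            print(label, p)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, on the freshly installed system run `python3 hashcheck.py record /usr/bin usr-bin.json` and copy the JSON off the machine; later, from the live CD with the suspect root mounted at /mnt, run `python3 hashcheck.py check /mnt/usr/bin usr-bin.json`. Run it as root so every file is readable, and use the Python from the live CD, not from the suspect disk. Legitimate package updates also change hashes, so the baseline has to be refreshed after updating.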

but, i am NOT the best one to ask!

and your question, buried deep in a thread named “monitoring memory
usage”, is not likely to attract the most qualified folks around
here…but you can go with it a while and see…or you could ask
that question in a different thread…

also, you might do some reading in the openSUSE world and find a
multitude of really good answers, from a wide variety of folks:

https://www.google.com/search?q=site%3Aopensuse.org+rootkit+detect+OR+find+OR+remove

https://www.google.com/search?q=site%3Aopensuse.org+rootkit+detect+find+remove+clean

or you may want to construct a similar search string of your choice…

or do a search outside openSUSE, like here:

https://www.google.com/search?q=Linux+rootkit+detect+find+remove+clean

there are many thousands of opinions on the “best way” to detect/deal
with rootkits once they are in the system…personally i try to concentrate
on keeping them out

[so far, knock on wood, i’ve not had to remove one…]


dd

After some time with the atop daemon running, we’ve had another crash. Afterwards I ran:


root@tutatis:/etc/init.d> atop -r /var/log/atop/atop_20130523 -b 23:05 -cC

ATOP - tutatis            2013/05/23  23:09:57              600 seconds elapsed
PRC | sys  10m11s | user   7.10s | #proc    241 | #zombie    0 | #exit    164 |
CPU | sys    102% | user      1% | irq       0% | idle    295% | wait      2% |
cpu | sys    100% | user      0% | irq       0% | idle      0% | cpu002 w  0% |
cpu | sys      1% | user      1% | irq       0% | idle     96% | cpu001 w  2% |
cpu | sys      0% | user      0% | irq       0% | idle     99% | cpu003 w  0% |
cpu | sys      0% | user      0% | irq       0% | idle    100% | cpu000 w  0% |
CPL | avg1   3.01 | avg5    3.04 | avg15   3.05 | csw  1282422 | intr 1205684 |
MEM | tot    3.4G | free    2.4G | cache 161.0M | buff    2.3M | slab  515.3M |
SWP | tot   20.0G | free   20.0G |              | vmcom   1.0G | vmlim  21.7G |
PAG | scan   7994 | stall      0 |              | swin     126 | swout    592 |
DSK |         sda | busy      2% | read    1754 | write   1280 | avio    4 ms |
DSK |         sdb | busy      2% | read     587 | write   1326 | avio    5 ms |
DSK |         sdc | busy      2% | read    1683 | write    697 | avio    4 ms |
DSK |         sdd | busy      1% | read     234 | write    750 | avio    8 ms |
NET | transport   | tcpi    1411 | tcpo    1441 | udpi     777 | udpo     783 |
NET | network     | ipi     2200 | ipo     2461 | ipfrw      0 | deliv   2200 |
NET | eth0     0% | pcki    1513 | pcko    2072 | si    1 Kbps | so   27 Kbps |
NET | lo     ---- | pcki     240 | pcko     240 | si    0 Kbps | so    0 Kbps |
NET | vmnet1 ---- | pcki       0 | pcko      79 | si    0 Kbps | so    0 Kbps |
NET | vmnet8 ---- | pcki       0 | pcko      79 | si    0 Kbps | so    0 Kbps |

  PID  CPU COMMAND-LINE                                                  1/21  
 2528 100% /usr/sbin/smbd -D -s /etc/samba/smb.conf
23366   2% /usr/bin/knotify4
12063   0% <hwclock>
12028   0% <python>
12035   0% <wget>
12027   0% <python>
 4562   0% /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysq
23041   0% beagled /usr/lib/beagle/BeagleDaemon.exe --replace --bg
12151   0% <python>
12154   0% <python>
23051   0% /home/fperal/.dropbox-dist/dropbox
   34   0% kswapd0
 4780   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=OutgoingRunner
 4781   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=VirginRunner:0
 4776   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=BounceRunner:0
 4145   0% /usr/sbin/irqbalance
 4657   0% /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid
 4775   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=ArchRunner:0:1
 4779   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=NewsRunner:0:1
 4777   0% /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=CommandRunner:

So it seems the samba daemon is swallowing all the CPU… near midnight, when nobody is working.

On 05/24/2013 10:26 AM, fperal wrote:
> it seems samba daemon is swallowing all the CPU

well, hmmmmm…it sure doesn’t look to be a memory usage problem…at
least not for this freeze (tell me, was this the last of the atop
log? that is, did it freeze the whole machine solid and no more logs
were collected?)

samba, that is something to do with windows networking, right?

suggest you start a new thread with “samba hogs cpu” (or something)
in the subject line, and:

  • describe the symptoms of the problem

  • show the full atop output (between code tags as before)

  • describe your network (how many Windows machines, of what kind, are in use)

  • what might those Win machines be doing? (are any allowed to
    automatically phone home and update? are any doing that at 11 PM?)

  • do you have any scripts/cron running at 11 at night (maybe a backup
    or AV scan is scheduled on the Win machines, or on the openSUSE box??
    or??)

  • give the oS and DE versions

  • reference this thread by this pointer:
    http://forums.opensuse.org/showthread.php?t=487006

WAIT! now i reread the first post and see “some processes (not always
the same) are calling oom-killer” and now think maybe you need to let
it run some more, and see if samba kills it over and over, OR is it
some other process the next time, and the next…and, tell us if you
find evidence of oom-killer associated with this (or any future freeze)

strange!!


dd

On 2013-05-24 11:22, dd wrote:
> On 05/24/2013 10:26 AM, fperal wrote:
>> it seems samba daemon is swallowing all the CPU
>
> well, hmmmmm…it sure doesn’t look to be a memory usage problem…at
> least not for this freeze (tell me, was this the last of the atop log?
> that is, did it freeze the whole machine solid and no more logs were
> collected?)

Well, samba does not appear in the other table. I’m not familiar with
atop myself; I don’t know how to tell it to log the top memory users. It
could be just chance that samba was at 100% when the “photo” was taken.

Maybe I should take a look at atop myself.

> samba, that is something to do with windows networking right?
>
> suggest you start a new thread with “samba hogs cpu” (or something) in
> the subject line, and:

Not yet…

> WAIT! now i reread the first post and see “some processes (not always
> the same) are calling oom-killer” and now think maybe you need to let it
> run some more, and see if samba kills it over and over, OR is it some
> other process the next time, and the next…and, tell us if you find
> evidence of oom-killer associated with this (or any future freeze)

No, no. When the kernel finds the system is out of memory, it starts
killing processes like mad - and not necessarily the culprit. It may
kill many other processes instead, so the runaway one gets even more
memory, till finally this one is killed, or the system crashes. And even
if it doesn’t crash immediately, so many needed processes have been
killed that it either crashes or you have to reboot yourself.
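The process the OOM killer picks is not necessarily the one that caused the shortage. The kernel rates every process with a badness score, exposed as /proc/<pid>/oom_score (higher means more likely to be killed; it is driven mostly by memory use), so logging the top few scores periodically shows which processes were candidates just before a hang. A minimal sketch, assuming a Linux /proc filesystem:

```python
#!/usr/bin/env python3
# List the processes the kernel's OOM killer currently rates as the most
# likely victims, highest /proc/<pid>/oom_score first.
import os


def read_int(path):
    """Read a single integer from a /proc file; None if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def oom_candidates(n=10, proc="/proc"):
    """Return [(oom_score, pid, name), ...] for the n highest-scoring processes."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        score = read_int(os.path.join(proc, entry, "oom_score"))
        if score is None:                # process exited meanwhile
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            name = "?"
        rows.append((score, int(entry), name))
    rows.sort(reverse=True)
    return rows[:n]


if __name__ == "__main__":
    for score, pid, name in oom_candidates():
        print(f"{score:>6} {pid:>7}  {name}")
```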


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

That may be what is happening. When the system fails I can’t do a remote login (I use ssh),
but sometimes a console login works. However, when I try to kill some processes (kill -9), they don’t die. I’ve even tried halt or reboot, and the system seems to try (it says “sending all processes the TERM signal”), but in fact it does not halt, so I have to turn the power off. Last time I tried to use SysRq, but it didn’t work, and after rebooting (reset button) I saw that /proc/sys/kernel/sysrq was 0 (I’ve fixed that for next time).
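The value in /proc/sys/kernel/sysrq is not just on/off: per the kernel’s SysRq documentation, 0 disables SysRq, 1 enables every function, and any other value is a bitmask of allowed function groups. A small decoder as a sketch (the bit meanings are transcribed from that documentation; check it for your kernel version):

```python
#!/usr/bin/env python3
# Decode /proc/sys/kernel/sysrq into the SysRq function groups it allows.
SYSRQ_BITS = {
    2: "console log level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signal processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nice all RT tasks",
}


def decode_sysrq(value):
    """Return a list of descriptions of what the given sysrq value enables."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as f:
            value = int(f.read().strip())
    except (OSError, ValueError) as err:
        print("cannot read /proc/sys/kernel/sysrq:", err)
    else:
        print(value, "->", ", ".join(decode_sysrq(value)))
```

As root, `echo 1 > /proc/sys/kernel/sysrq` enables all functions until the next reboot (how to make that persistent varies between distributions). With it enabled, Alt+SysRq+F asks the OOM killer to kill a memory hog by hand, and the sequence Alt+SysRq+R, E, I, S, U, B (unraw, terminate, kill, sync, remount read-only, reboot) is much gentler on the filesystems than the reset button.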

Yes, samba is the daemon that acts as a Windows server. In my case it’s acting as the Windows primary domain controller for the network.

I think I made a mistake.


ATOP - tutatis            2013/05/23  21:39:58              600 seconds elapsed
PRC | sys  10m11s | user   8.27s | #proc    241 | #zombie    0 | #exit     97 |
CPU | sys    102% | user      1% | irq       0% | idle    295% | wait      1% |
cpu | sys    100% | user      0% | irq       0% | idle      0% | cpu002 w  0% |
cpu | sys      1% | user      0% | irq       0% | idle     99% | cpu000 w  0% |
cpu | sys      1% | user      1% | irq       0% | idle     99% | cpu003 w  0% |
cpu | sys      0% | user      0% | irq       0% | idle     98% | cpu001 w  1% |

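The system has four CPUs, and atop’s aggregate CPU line is scaled to 4 × 100% (sys 102% + user 1% + idle 295% + wait 1% ≈ 400%), so ONE CPU is at 100% while the other three are almost idle.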
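Reading the per-CPU rows by eye works, but when stepping through many atop samples it may help to automate it. A small sketch that takes an atop screen saved to a text file by hand and reports which CPUs are at least 90% busy (busy = 100 - idle - wait; the threshold is arbitrary, and the parser assumes exactly the row layout shown in this thread):

```python
#!/usr/bin/env python3
# Extract the per-CPU "cpu |" lines from a saved atop screen and report which
# CPUs are (nearly) saturated.  Assumes rows like
# "cpu | sys 100% | user 0% | irq 0% | idle 0% | cpu002 w 0% |".
import sys


def parse_cpu_lines(text):
    """Return {cpu_name: {"sys": .., "user": .., "irq": .., "idle": .., "wait": ..}}."""
    cpus = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cpu "):   # lower case only: skip the "CPU |" total
            continue
        fields, name = {}, None
        for cell in line.split("|")[1:]:
            tokens = cell.split()
            if len(tokens) == 2:                          # e.g. "sys 100%"
                fields[tokens[0]] = int(tokens[1].rstrip("%"))
            elif len(tokens) == 3 and tokens[1] == "w":  # e.g. "cpu002 w 0%"
                name = tokens[0]
                fields["wait"] = int(tokens[2].rstrip("%"))
        if name:
            cpus[name] = fields
    return cpus


def busy_cpus(cpus, threshold=90):
    """Names of CPUs whose busy share (100 - idle - wait) is >= threshold percent."""
    return sorted(
        name for name, f in cpus.items()
        if 100 - f.get("idle", 0) - f.get("wait", 0) >= threshold
    )


if __name__ == "__main__":
    for path in sys.argv[1:]:            # text files holding a copied atop screen
        with open(path) as f:
            print(path, busy_cpus(parse_cpu_lines(f.read())))
```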

But I’m looking at atop.
I know the system stopped working some time between 23:05 and 00:05 (I have a script which connects to my home computer and writes a log entry every hour; the last one was at 23:05).
I’ve tried:


root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 22:30 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 21:30 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 15:30 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 15:10 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 15:00 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 10:04 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 10:10 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 09:00 -cC
root@tutatis:/var/log/atop> atop -r /var/log/atop/atop_20130523 -b 08:00 -cC

and some more test points. The result is that from 14:00 until the moment of the crash, smbd is using 100% of one CPU.
I’ve checked earlier that day and at different hours the day before, and smbd isn’t using that much CPU then.
I would like to get a graph of CPU usage vs. time for the smbd process, but I haven’t found a way to do it yet (still researching).
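One way to get CPU usage vs. time for smbd without atop, as a sketch assuming Linux /proc: sample utime + stime from /proc/<pid>/stat (fields 14 and 15, in clock ticks) at a fixed interval and print the difference as a percentage of one CPU, the same scale top uses. The script name cpuwatch.py is hypothetical; PID 2528 is the smbd from the atop output above.

```python
#!/usr/bin/env python3
# Log the CPU usage of one process at a fixed interval, using the cumulative
# user + system time the kernel keeps in /proc/<pid>/stat.
import os
import sys
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second, usually 100


def cpu_ticks(stat_text):
    """utime + stime (fields 14 and 15, in clock ticks) from /proc/<pid>/stat.

    Field 2 is the command name in parentheses and may contain spaces or ')',
    so split only after the LAST ')'; then field 3 is rest[0], field k is rest[k-3]."""
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, seconds, clk_tck=CLK_TCK):
    """Average CPU use over the interval, as a percentage of ONE CPU (top's scale)."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / seconds


def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def monitor(pid, interval=60.0):
    try:
        before, t0 = read_ticks(pid), time.monotonic()
        while True:
            time.sleep(interval)
            after, t1 = read_ticks(pid), time.monotonic()
            pct = cpu_percent(before, after, t1 - t0)
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} pid={pid} cpu={pct:6.1f}%",
                  flush=True)
            before, t0 = after, t1
    except FileNotFoundError:           # /proc/<pid> disappears when it exits
        print(f"pid {pid} no longer exists")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: cpuwatch.py PID [INTERVAL_SECONDS]")
    else:
        monitor(int(sys.argv[1]), float(sys.argv[2]) if len(sys.argv) > 2 else 60.0)
```

For example, `python3 cpuwatch.py 2528 60 >> /tmp/smbd-cpu.log` would give one line per minute that can be plotted later. This follows a single PID; smbd typically forks a child per client connection, so the busy process may be a child rather than the parent.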

On 2013-05-24 15:56, fperal wrote:

> and some more test points. The result is that from 14:00 until the moment
> of the crash, smbd is using 100% of one CPU.

That’s a lot of time.

You might use strace or ltrace to find out what samba is doing exactly.
You attach it to the PID of the process, and set to log to a file for a
minute or so. Then read the log…

Test the procedure when samba is working normally. Check their man pages
to find the exact command line to use.

ltrace - A library call tracer
strace - trace system calls and signals

I’m not sure which one is best in this case. Perhaps both.
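Attaching for a minute can be done with e.g. `timeout 60 strace -f -tt -p 2528 -o /tmp/smbd.strace` (`-f` follows child processes and threads, `-tt` adds microsecond timestamps, `-o` writes to a file; the output path is only an example), or `strace -c -p 2528` for a ready-made per-syscall summary when strace is stopped. If you keep the raw log, a sketch like this counts which system calls dominate it; the line format it expects is the usual strace one and may need adjusting:

```python
#!/usr/bin/env python3
# Count how often each system call appears in an strace output file, to see
# what a busy process spends its time doing.
import re
import sys
from collections import Counter

# Optional "[pid N] " or "N " prefix (added with -f), optional "HH:MM:SS[.us] "
# timestamp (added with -t/-tt), then the syscall name followed by "(".
# "<... xxx resumed>" continuation lines and "+++"/"---" event lines don't match.
SYSCALL_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:\d\d:\d\d:\d\d(?:\.\d+)?\s+)?(\w+)\("
)


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of calls seen."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: stracecount.py STRACE_LOG...")
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for name, n in count_syscalls(f).most_common(15):
                print(f"{n:>9}  {name}")
```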

You could leave two text terminals open in advance, in the text
consoles, running “top” (as user, not root). In one of them, type “M”
to sort by memory usage. If you can get a look when it crashes, it might
show which is the runaway process. Even if the system crashed, sometimes
it is still possible to switch from one console to the other. Going from
graphical mode to text consoles (ctrl-alt-f1) might not work at that
moment, or take too long.

I said running as user, for safety in case somebody comes by the
machine. But running it as root would allow “top” to be used to kill any
process directly.

> I’ve checked earlier that day and at different hours the day before,
> and smbd isn’t using that much CPU then.
> I would like to get a graph of CPU usage vs. time for the smbd process, but I haven’t
> found a way to do it yet (still researching)

Have a look at the acct package; I think it is meant for exactly that. No graphs,
it is a classic CLI daemon and app.

If you install it, run “info accounting”, and then look at the preface,
it is even funny. Ah, if you don’t like the “info” interface (I don’t),
install “pinfo”.


Cheers / Saludos,

Carlos E. R.