openSUSE 12.3 Memory Issue

Hi All,

I have a server with 4 GB of RAM that sits on a VMware server. I ran free and got the following:

             total       used       free     shared    buffers     cached
Mem:       4057460    3793888     263572          0     148332    3228416
-/+ buffers/cache:     417140    3640320
Swap:      2103292     137072    1966220

However, if I run top I see the following:

Tasks: 150 total,   1 running, 148 sleeping,   0 stopped,   1 zombie
%Cpu(s):  6.9 us,  3.0 sy,  0.0 ni, 89.9 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem:   4057460 total,  3800184 used,   257276 free,   148348 buffers
KiB Swap:  2103292 total,   137072 used,  1966220 free,  3233844 cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1640 mysql     20   0 1452m  85m 4460 S  2.7  2.2 101:18.58 mysqld
 2040 splicec+  20   0  932m  50m  12m S  0.0  1.3   0:27.68 plasma-desktop
 2084 splicec+  39  19  324m  37m 2224 S  0.0  0.9   1:02.57 virtuoso-t
 3365 splicec+  20   0  523m  29m 4492 S  0.0  0.7  13:43.03 kscreenlocker_g
 1285 root      20   0 92964  26m 2128 S  0.7  0.7  71:29.22 visiond
  265 root      20   0  221m  25m  25m S  0.0  0.6   0:03.84 systemd-journal
 1677 root      20   0  172m  22m 5792 S  0.3  0.6  12:35.46 Xorg
 1459 root      20   0  132m  14m 8696 S  0.0  0.4   0:20.79 httpd2-prefork
 2034 splicec+  20   0  877m  11m 3204 S  0.0  0.3   0:00.76 knotify4
 1096 root      20   0 12628 9480 1500 S  0.3  0.2  19:03.38 sqllog
 2068 splicec+  39  19  678m 9472 2236 S  0.0  0.2   0:00.86 nepomukservices
 2076 splicec+  20   0  563m 9328 2408 S  0.0  0.2   0:00.35 kmix
 1929 splicec+  20   0  828m 8692 4424 S  0.0  0.2   0:20.44 kded4
 1948 polkitd   20   0  361m 8656 2536 S  0.0  0.2   0:02.90 polkitd
 2070 splicec+  20   0  670m 8280 2180 S  0.0  0.2   0:00.70 krunner
 2094 splicec+  39  19  400m 8056 2876 S  0.0  0.2   0:00.17 nepomukservices
23700 wwwrun    20   0  133m 8052 1332 S  0.3  0.2   0:12.49 httpd2-prefork
32557 wwwrun    20   0  133m 8052 1320 S  0.0  0.2   0:07.05 httpd2-prefork
25543 wwwrun    20   0  133m 8048 1328 S  0.0  0.2   0:07.16 httpd2-prefork
28698 wwwrun    20   0  133m 8032 1328 S  0.0  0.2   0:10.13 httpd2-prefork
31889 wwwrun    20   0  133m 8024 1320 S  0.0  0.2   0:05.46 httpd2-prefork
11018 wwwrun    20   0  133m 8012 1312 S  0.0  0.2   0:02.68 httpd2-prefork
 2603 wwwrun    20   0  133m 7892 1252 S  0.0  0.2   0:08.48 httpd2-prefork
17294 wwwrun    20   0  133m 7836 1260 S  0.0  0.2   0:00.50 httpd2-prefork
17640 wwwrun    20   0  133m 7828 1256 S  0.0  0.2   0:03.80 httpd2-prefork
22900 wwwrun    20   0  133m 7828 1256 S  0.3  0.2   0:03.79 httpd2-prefork
24125 wwwrun    20   0  133m 7828 1256 S  0.0  0.2   0:02.72 httpd2-prefork
 4156 wwwrun    20   0  133m 7824 1260 S  0.0  0.2   0:07.70 httpd2-prefork
 4752 wwwrun    20   0  133m 7500 1100 S  0.3  0.2   0:02.58 httpd2-prefork
 9917 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:03.86 httpd2-prefork
 9919 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:03.08 httpd2-prefork
10725 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:04.05 httpd2-prefork
15816 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:00.53 httpd2-prefork
26821 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:05.04 httpd2-prefork
26823 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:05.13 httpd2-prefork
29270 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:03.33 httpd2-prefork
32499 wwwrun    20   0  133m 7496 1096 S  0.0  0.2   0:04.58 httpd2-prefork

Can anybody explain where all my RAM is going? If I restart the server, it runs at just over a GB of RAM in use. I had this with a larger setup as well, one with 8 GB of RAM, and over time it did the same thing. I can’t see any obvious memory leaks, or am I missing something?

Thanks in advance

The first thing to know is that although everyone who uses the free tool focuses on the first line, it’s not really the line with the most relevant information; you need to look at the second line.

To understand how to use the free tool, and, if necessary, how to run the command that purges your memory cache and buffers (generally only if you find yourself switching from one set of extremely heavy-load work to a completely different load), I wrote the following wiki article:
https://en.opensuse.org/User:Tsu2/free_tool

You will see that you have plenty of unused memory available.
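
To make that concrete with the numbers from your own output: 3793888 used - 148332 buffers - 3228416 cached = 417140 KiB (roughly 400 MB) actually held by applications, which is exactly what the -/+ buffers/cache line reports. If you just want to watch that one figure, something like this works (a rough sketch, assuming the classic procps free output that 12.3 prints, where the -/+ line is the third row):

free -k | awk 'NR==3 {print "used by programs:", $3, "KiB   available:", $4, "KiB"}'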

That being said, I would guess that your VMware is not reserving a sufficient amount of memory.
You didn’t mention which VMware product you’re using, but in Workstation you would go to the following and set it appropriately:
Edit > Preferences > Memory

Also, if you have more than one Guest running, be sure to check whether you’ve over-provisioned your Guests without providing sufficient resources.

TSU

Thanks for the quick reply.

It is VMware ESXi 5.5, and I have ensured that the memory is configured correctly and that the server does not over-provision anything, including the CPU.

I have read your document and decided I might need to run the command from it, and I also ran the following:

sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
sudo swapoff -a
sudo swapon -a

top then showed only 700 MB of RAM in use and no swap usage. I checked, and all the processes that were running before were still running correctly. The only thing is that this server always performs the same workload, so surely your command shouldn’t apply here, given that you describe it as a way to:

manually clear the buffers and cache because of course the existing cache is only relevant to tasks which will no longer be performed.

The process it performs is to collect files via FTP from multiple servers, extract the data from those files, and put it into an SQL database; a user can then access the SQL database via a web browser to see all the data and run reports on it. That is all it does.

Surely, if it is always doing the same work, it should not increase its RAM usage from 700 MB to 4 GB and then start using the swap space? I have an idea to automate the commands above, with some logic in place to ensure it does not break the server, but I am curious as to what is actually causing the issue. Do you have any suggestions?

It has now increased to 1.3 GB in use when left alone throughout the day, and it is all appearing under cached. Am I confused about what cached memory is?

If your machine maintains a fairly large number of simultaneous FTP connections, then you will likely need to re-tune your system resources to support those connections. By default, like all distros, openSUSE installs with settings that support a variety of hardware, including small machines with minimal resources. If your workload is more server-oriented, then resources need to be freed up for that workload.

I wrote the following article a long time ago, but it is still applicable to all current openSUSE releases. It describes re-sizing the TCP/IP buffers and changing your TCP/IP congestion control algorithm if your network connections are something other than Fast Ethernet (wired 10/100). After you modify your network settings, be sure to verify that your server applications (like your database) still have sufficient resources.
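
If you want a feel for the knobs involved before reading it, they are the standard kernel sysctls, roughly along these lines (the values here are purely illustrative, not a recommendation; pick sizes that match your links and make them permanent in /etc/sysctl.conf):

# what is in effect now, and which algorithms the running kernel already offers
sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_available_congestion_control
sysctl net.core.rmem_max net.core.wmem_max

# example: enlarge the TCP buffers and switch the algorithm
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
sysctl -w net.ipv4.tcp_congestion_control=htcp    # only if the htcp module is available on your kernel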

TSU

Note that memory used as cache is freeable when needed. Linux does not like unused memory and will use whatever is available for temporary storage or cache to speed up access. If a program requests memory, cache blocks will be dropped in favour of program memory. So cache is not used memory; it is cache memory, and for the purpose of judging what programs can use it can be added to free memory, i.e. usable memory is free + cached.
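
A quick way to see the raw numbers the kernel works from:

grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo

Adding those three together gives a fair estimate of what programs can actually claim. (Newer kernels expose a MemAvailable line that estimates this for you, but the kernel in 12.3 predates it.)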

Totally agree, and that should work fine under normal circumstances.

The command I provide in my article is to be used only if you’re not doing a normal workload but are instead changing very suddenly from one extremely heavy workload to a completely different one.

In that situation, you can help your system by manually executing what would normally be done gradually by the kernel’s own algorithms.

TSU

I have it set to CUBIC, which to my understanding is already a high-speed algorithm. However, I did run the following command for exactly 60 seconds:

tcpdump -n -i eth0 > /home/user/Downloads/test.pcap

This came back with a 1.3 MB file, so converting that to kilobits it appears to be only about 177 Kb/s, which to me is not a lot of data. The files it collects are usually measured in bytes rather than KB, so I don’t think this is the issue, but I did take note of it.
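
(For what it is worth, next time I will probably time-box the capture and write a proper binary capture file instead of redirecting the text output, roughly like this, assuming the coreutils timeout command is present:

timeout 60 tcpdump -n -i eth0 -w /tmp/sample.pcap
ls -lh /tmp/sample.pcap

That way the file can also be opened in Wireshark later.)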

I am aware that this is what should happen; however, it appears not to, and instead the system starts eating into the swap, which I have always been told is bad. I imagine that if I turned the swap off, the server would just crash instead of freeing this cached memory, which does not seem to be “free”. Is there a way to stop it from caching memory so aggressively? Currently the swap is not being used any further and seems to have stayed where it was, so that is good. I am wondering whether the LAMP side has some sort of memory leak that keeps increasing its usage and not releasing it correctly. Looking online isn’t turning up anything obvious, but I did have an issue in the past with a memory leak in the KDE desktop on this server, which I believe I have resolved (I could see it in top using over 50% of the RAM).

  1. CUBIC is the default TCP/IP congestion control algorithm, and it is only a general-purpose compromise for all scenarios. If you have gigabit, wireless, or transfer very large files (e.g. FTPing large files; web files served over HTTP normally aren’t large), then there are better choices to optimize for that connection. But, as its name suggests, this setting won’t make much of a difference unless congestion, lost packets or delayed packets are actually an issue.

You don’t mention whether you enlarged your TCP/IP buffers, which I also describe in my reference. If you are running your machine as a server with simultaneous network connections under load, you’ll almost certainly benefit from shifting resources from internal systems to networking.

  2. Your tcpdump test is only as useful as the way you run it. You can run it when there is no competing traffic and interpret its results one way (e.g. to set a baseline), but if you’re trying to analyze a real workload, then you have to run the test while your server is under its normal workload. Also, be aware that real throughput is typically no better than about 70% of theoretical throughput (the number your ISP quotes). But don’t take anything for granted: verify that full duplex is enabled, and turn off auto-negotiation if you know what you’re connecting to (and set the speed/duplex accordingly). For a server, use NICs that offload work from the CPU to the NIC.

  3. Memory cache is not a bad thing. If you have enough RAM, then yes, you can disable swap altogether (you don’t have to remove it), but only if you really do have enough. Otherwise, cached memory objects nowadays are supposed to be mostly things that are rarely accessed, so they should not affect your performance too much.

  4. Depending on what webserver you’re using, you may need to tune it. Apache in particular is extremely large and complex, so it needs tuning (see the sketch after this list). If you’re running something like nginx, it is a lot smaller and simpler and, unlike Apache, is not multi-threaded; instead it operates on an event-driven, “single-threaded” architecture, which can be much faster. The idea is that managing all those threads uses system resources, while a single-threaded model can deliver the same content without that management overhead.

  5. When I recently reviewed the openSUSE documentation for tuning, I found it wanting in the area of network tuning, but I did find a number of useful pieces of information on tuning the machine itself:
    https://doc.opensuse.org/
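
On the Apache point: your top output is full of httpd2-prefork workers, so the prefork limits are the first place to look. On openSUSE they normally live in /etc/apache2/server-tuning.conf; a quick way to see what is currently set (a sketch, with the path assumed to be the stock openSUSE layout):

grep -E 'StartServers|SpareServers|MaxClients|MaxRequestsPerChild' /etc/apache2/server-tuning.conf

The two levers that matter most for memory are MaxClients (each prefork worker carries its own copy of mod_php and whatever else is loaded) and MaxRequestsPerChild (a non-zero value recycles workers periodically, so any slow per-process growth is handed back to the system). Restart Apache after changing them.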

HTH,
TSU

Apologies for the delay.

  1. There should be no issue with congestion or delays; the servers are connected via an MPLS framework, which has been heavily tested to ensure it is not causing any issues (one of the reasons for my delayed reply was testing this). I have now increased the TCP/IP buffers and am still having the same problem, although it does seem to take longer to appear, so that is a step in the right direction.

  2. The tcpdump test was run during normal day-to-day operation to get an accurate picture of what is running. I have since run one for 30 minutes during the day under normal use, and it averaged 182 Kb/s, so the test seems to suggest low traffic.

  3. I tried upgrading the RAM to 6 GB and disabling swap, which gave the same high cached figure and made the machine much more sluggish; after around an hour it froze and had to be rebooted. I have now enabled swap again, so disabling it is not an option, and swap seems to be the only thing keeping the machine going.

  4. I am now confident that this issue has something to do with mysqld. After a reboot I have seen the RAM usage go from 700 MB all the way up to 5.8 GB in the space of 60 seconds. While it is doing this I can see the mysqld process in top go to high CPU, from around 3-4% all the way up to over 120% (there is still plenty of CPU headroom, though). While this is happening I have logged into the SQL server and run:

MariaDB [(none)]> SHOW FULL PROCESSLIST;

This shows a query whose Time climbs to around 65 and whose State is “Sending data”, and it appears to do this every time I reboot. I have run a program I found called MySQLTuner, which gave me a very long, detailed output, but I noticed the following:


-------- MyISAM Metrics ------------------------------------------------------------------------------------
[OK] Key buffer used: 100.0% (16M used / 16M cached)
[!!] Key buffer size / total MyISAM indexes: 16.0M/3.3G
[!!] Read buffer hit rate: 86.3% (1M cached / 176K reads)
[!!] Write Key buffer hit rate: 49.5% (1K cached / 975 writes)

To me this suggests it isn’t using much, but I am not too sure. If it is the SQL server that is eating up this much RAM, would a different version be better?

Most SQL backends eat memory like candy. Maybe try MariaDB, since that would need the fewest modifications. It also depends on the type of queries, the indexes defined, the size of the tables, etc. It can get quite complex.
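
If you want to compare what MySQLTuner flagged against what the server is actually configured with, something along these lines shows the key values (a sketch only; it assumes the mysql command-line client and an account allowed to read the server variables):

mysql -u root -p -e "SHOW VARIABLES LIKE 'key_buffer_size'; SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"

Given the [!!] line above (a 16M key buffer against 3.3G of MyISAM indexes), key_buffer_size in the [mysqld] section of the config (usually /etc/my.cnf on openSUSE) is the sort of setting worth raising, within what the machine can spare.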

Just chiming in with hopefully something helpful… Here is a decent resource on the matter: http://www.linuxatemyram.com/

:slight_smile: This article points out the same thing I tried to describe in my “Free” wiki article: that the second line is the most relevant for properly assessing RAM availability, not the first line most people will naturally look at.

I haven’t managed MySQL for a while, but it’s common for all relational databases to execute “housekeeping” tasks on a daily basis, and it may be important that these tasks run during off hours. They are normally also run during the initial startup of the service, so it may not be accurate to measure resource usage immediately after startup or for a while afterwards. See the MySQL/MariaDB tuning guides for more details.

Dedicated physical connections can greatly enhance performance, but the TCP/IP congestion control algorithm can also improve performance by enlarging the Layer 4 TCP windows, as I describe on that page. You may benefit from choosing an algorithm that specifically enlarges the TCP window, and you may also benefit from enabling jumbo frames at Layer 2.
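
For the jumbo frames part, a quick test is simply raising the MTU on the interface (a sketch, assuming the NIC is eth0 and that the VMware vSwitch and everything else on the path also supports 9000-byte frames; otherwise leave this alone):

ip link set dev eth0 mtu 9000
ip link show eth0 | grep mtu

To make it permanent on openSUSE, it should be possible to set MTU='9000' in /etc/sysconfig/network/ifcfg-eth0.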

In my article I included general descriptions of the various algorithms so you don’t have to go elsewhere to look up what each one does, although there may be others besides the ones I described. I also describe how to list the pre-compiled algorithms included with the kernel you’re currently running; anything I don’t describe can be researched.

TSU