RAM is being used for something undetectable, leading to system crash.

I have an unanswered Stack Exchange question about this: https://unix.stackexchange.com/questions/635411/cant-find-out-what-hogs-memory-how-to-troubleshoot-free-top-htop-ps
I’ve been living with this issue for several months already. If I’m actively using the computer, I’m forced to restart at least once a day.
Here are the details, copied from the Stack Exchange question:

**Summary**

When I start my Linux machine, it uses about 3.7 GiB of memory.
By the end of the day, after I close everything I used for work and then everything else, including things that were running in the morning when usage was about 3.7 GiB, I have about 9 GiB used, excluding buffers and cache.

Looking at processes doesn’t show anything suspicious. The memory is not visibly used by any of them.


-> free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       9.1Gi       5.4Gi        78Mi        16Gi        21Gi
Swap:            0B          0B          0B


-> sudo ps --no-headers ax -o rss | awk '{rss += $1} END {print rss}'
3900356

Sum of all processes shows 3.7 GiB - exactly what I would expect, given how it started in the morning.
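
As a rough cross-check of that figure, the kB total printed by ps can be converted to GiB. This is only a sketch, and the helper name rss_sum_gib is made up for illustration. Keep in mind that RSS counts shared pages once per process, so the sum tends to overstate what the processes really hold.

```shell
# Sum resident set sizes (in kB, one value per line on stdin) and print GiB.
# rss_sum_gib is a hypothetical helper name, not an existing tool.
rss_sum_gib() {
    LC_ALL=C awk '{ kb += $1 } END { printf "%.2f\n", kb / 1048576 }'
}

# Usage on a live system:
#   ps --no-headers ax -o rss | rss_sum_gib
```

Fed the 3900356 kB total shown above, this prints 3.72.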

The memory really is “used” by something: I tried to claim the remaining 21 GiB, and as soon as I filled it up, the system became unresponsive and I had to restart.

The longer the computer is on, the more memory “disappears”.
And (subjectively, I haven’t verified this bit) the more active I am on the computer, the faster it happens.

How can I find the missing memory?

**Additional info**


-> uname -r
5.10.16-1-default

OS: openSUSE Tumbleweed 20210215
DE: KDE Plasma 5.21

Hi
Best to move to the latest snapshot (20210307), especially for the glibc upgrade. Does the issue reproduce there? Something is leaking memory for sure…

Keep an eye on the output of cat /proc/meminfo to see what’s actually being used.
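
One way to follow that advice is to add up the /proc/meminfo fields that the kernel does account for and see what is left over. The sketch below is a rough heuristic, not an exact accounting: the choice of fields is an assumption, some of them can overlap, and the function name meminfo_unaccounted_kb is hypothetical. A large and growing remainder would hint at memory held inside the kernel (for example by a driver) rather than by any process.

```shell
# Read /proc/meminfo-style text on stdin and print, in kB, the part of
# MemTotal that is not covered by the listed counters.
# Assumption: this field list is a heuristic, not an official formula.
meminfo_unaccounted_kb() {
    LC_ALL=C awk '
        { sub(":", "", $1); v[$1] = $2 }    # "MemFree:" -> v["MemFree"]
        END {
            known = v["MemFree"] + v["Buffers"] + v["Cached"] + v["SwapCached"] \
                  + v["AnonPages"] + v["Slab"] + v["KernelStack"] \
                  + v["PageTables"] + v["Percpu"]
            printf "%d\n", v["MemTotal"] - known
        }'
}

# Usage on a live system:
#   meminfo_unaccounted_kb < /proc/meminfo
```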

I have updated to 20210307, similar results.
Today I turned on my PC at 10 in the morning, and now at 20:00 I have:


-> sudo ps --no-headers ax -o rss | awk '{rss += $1} END {print rss}'
3766288


-> free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       9.2Gi       8.9Gi        81Mi        13Gi        21Gi
Swap:             0B          0B          0B

ps reports 3.6 GiB, but 9.2 GiB is effectively used and blocked, so something dark and invisible is stealing almost 6 GiB.
Today was a calm day and I didn’t “torture” the system (I usually do heavy development work with multiple JVMs, compilation, tests, docker, etc., but not today).
Here’s a screenshot of my htop sorted by memory%:

https://i.imgur.com/JicOOmb.png

Hi
Disable clamd (why run a virus scanner?) and see if it changes…

On GNOME I see:


ps --no-headers ax -o rss | awk '{rss += $1} END {print rss}'
6072760

free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       3.2Gi       8.5Gi       1.0Gi        19Gi        26Gi
Swap:          1.9Gi          0B       1.9Gi

With clamd disabled and not running:


sudo ps --no-headers ax -o rss | awk '{rss += $1} END {print rss}'
3730604

free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       8.2Gi        10Gi       145Mi        12Gi        22Gi
Swap:             0B          0B          0B

So in your case: 26 GiB (available) + 5.8 GiB that ps reports = 31.8 GiB
In my case: 22 GiB (available) + 3.55 GiB that ps reports = 25.55 GiB, about 6 GiB missing
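
The comparison above boils down to total minus available minus the RSS sum. A small sketch of that arithmetic, using the rounded GiB values from the two free outputs (the helper name unexplained_gib is made up here):

```shell
# Memory that is neither available nor attributed to any process, in GiB.
# Arguments: total, available, summed RSS (all in GiB).
# unexplained_gib is a hypothetical helper name.
unexplained_gib() {
    LC_ALL=C awk -v total="$1" -v avail="$2" -v rss="$3" \
        'BEGIN { printf "%.2f\n", total - avail - rss }'
}

unexplained_gib 31 26 5.8    # the GNOME machine
unexplained_gib 31 22 3.55   # the affected machine
```

The first call prints -0.80, which is plausible because the RSS sum counts shared pages more than once. The second prints 5.45, the gap that no process accounts for.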

Hi
So check the output of cat /proc/meminfo to see what is consuming RAM.


 ps --no-headers ax -o rss | awk '{rss += $1} END {print rss}'
3919600

free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       1.8Gi        26Gi       664Mi       2.7Gi        28Gi
Swap:          1.9Gi          0B       1.9Gi

 cat /proc/meminfo 
MemTotal:       32683080 kB
MemFree:        27987344 kB
MemAvailable:   29727272 kB
Buffers:            7460 kB
Cached:          2717476 kB
SwapCached:            0 kB
Active:           697584 kB
Inactive:        2898808 kB
Active(anon):       3420 kB
Inactive(anon):  1549788 kB
Active(file):     694164 kB
Inactive(file):  1349020 kB
Unevictable:      634004 kB
Mlocked:              64 kB
SwapTotal:       1972384 kB
SwapFree:        1972384 kB
Dirty:               196 kB
Writeback:             0 kB
AnonPages:       1478436 kB
Mapped:           457448 kB
Shmem:            681752 kB
KReclaimable:     127984 kB
Slab:             237264 kB
SReclaimable:     127984 kB
SUnreclaim:       109280 kB
KernelStack:       15936 kB
PageTables:        22676 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18313924 kB
Committed_AS:    6372336 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       79380 kB
VmallocChunk:          0 kB
Percpu:             3136 kB
HardwareCorrupted:     0 kB
AnonHugePages:    413696 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      335228 kB
DirectMap2M:    33032192 kB

You appear to be asking why about 9 GB of RAM is “in use” after the system has been up for 10 hours.
The simple answer is that it’s data used by apps during the 10 hours your machine has been running, which includes any apps you know you ran, plus normal OS background apps and automated maintenance.

Your analysis should start with understanding how practically all modern OSes run apps nowadays.
There is still a misconception, dating from more than a decade ago, that when you stop running an app, the data it used is automatically removed from memory.
Old OSes worked that way, but nowadays OSes keep the data in memory on the chance that, having run the app once, you might run it again and need the same data. Leaving it in memory saves the reload and gives a quicker response if you repeat the operation.

Also,
openSUSE has been on the leading edge of Linux distros in deploying various mount points in RAM that used to be mounted on disk, such as temporary storage for files.
Again, especially if an app did something that required temporarily computing or downloading a file, that file is stored in RAM for quick retrieval instead of on disk, which could require spinning up metal to access.

The bottom line is that no one should jump to the conclusion that something is amiss on seeing RAM being consumed; that happens by design (although there is always the possibility of poorly written code causing a memory leak).
Instead, you should pay attention to your system’s performance and responsiveness.
Should your system use so much RAM that unallocated RAM is squeezed, it should automatically age and purge used RAM using its own algorithm, freeing memory blocks and clusters as needed.

Additionally,
if you want to use the free tool to check memory usage, I posted to my wiki how to use it. It’s very easy to misinterpret what you see.
https://en.opensuse.org/User:Tsu2/free_tool

HTH,
TSU

Thanks, I’ve read the wiki page you linked. It seems to me I already understand the output the way you describe there.
The drop caches part is interesting, even though in my case the “missing” memory isn’t visibly in cache. I’ll do some testing later.
Intriguingly, while you say “Shared” should not contain a value, on my system it does.

As mentioned in the original post, this is not what happens on my system: the “missing” RAM never becomes available. Instead of reclaiming the memory, the system just runs out of RAM and crashes at some point.

I finally remembered to try that free caches command. Unfortunately it only freed the cache (as was more or less expected); the used, “missing” memory remained used and is still missing.
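
To see which counters actually move when the caches are dropped (or over a working day), two snapshots of /proc/meminfo can be compared. A minimal sketch follows. meminfo_diff and the snapshot file names are hypothetical, and the drop_caches line in the usage comment is the standard kernel interface, assumed here to be what the wiki page describes.

```shell
# Print every /proc/meminfo field whose value differs between two snapshots,
# with the signed change in kB (after minus before).
# meminfo_diff is a hypothetical helper name.
meminfo_diff() {
    LC_ALL=C awk '
        { sub(":", "", $1) }                      # strip the trailing colon
        NR == FNR { before[$1] = $2; next }       # first file: remember values
        ($1 in before) && (($2 - before[$1]) != 0) {
            printf "%s %+d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage on a live system:
#   cat /proc/meminfo > before.txt
#   sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
#   cat /proc/meminfo > after.txt
#   meminfo_diff before.txt after.txt
```

If only MemFree, Cached and the reclaimable slab counters change, the "missing" memory is not sitting in any cache the kernel can drop.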

https://en.wikipedia.org/wiki/DTrace