Hello
I have a problem with the disk load since I upgraded from openSUSE Leap 42.3 to 15.0.
As with every upgrade, I basically stopped the main services, changed the repos and ran zypper dup. After a restart, everything seemed to work again. But then I noticed in the monitoring an increase in the I/O write statistics on the system disk (btrfs file system). This also leads to a noticeable decrease in system performance, with the system spending a significant portion of its time in iowait.
This is what I found out so far:
- The source of the problem seems to be the Nextcloud instance running in docker. It consists of a nextcloud 15 container, a mariadb 10.3 container and a redis 5.0 container. When I stop these, everything goes back to normal.
- These are exactly the same containers as before the upgrade. Of course, during the analysis I recreated the containers, but this didn’t change anything.
- Docker version is the same in openSUSE Leap 42.3 and 15.0 according to the package list.
- When I make sure that no request can reach the Nextcloud and deactivate the cron job that calls it every 15 min, there is NO increased writing to the disk.
- Because the effect is intermittent, it is hard to tell what exactly is causing it, but watching iotop for a while, it seems to be the mysql database doing the increased writing. But what should have changed here compared to before the upgrade?
- I cannot find anything unusual in the logs of Nextcloud or mysql; no problems or errors are reported.
I am running out of ideas, so any hint would be highly appreciated. What would be further directions to investigate?
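Since iotop only shows the intermittent effect while you happen to be watching, it may help to rank processes by their cumulative write counters from /proc instead. A minimal sketch (my own suggestion, not from your setup) that reads write_bytes from /proc/&lt;pid&gt;/io; run it as root to see all processes:

```shell
# top_writers: rank processes by accumulated write_bytes
# from /proc/<pid>/io (counters are cumulative since process start).
top_writers() {
    for pid in /proc/[0-9]*; do
        [ -r "$pid/io" ] || continue
        wb=$(awk '/^write_bytes:/ {print $2}' "$pid/io" 2>/dev/null)
        [ -n "$wb" ] || continue
        printf '%15d %6s %s\n' "$wb" "${pid#/proc/}" "$(cat "$pid/comm")"
    done | sort -rn | head
}
top_writers
```

Running it twice with a few minutes in between and comparing the deltas shows which process actually wrote in that interval, even if it was idle at the moment you looked.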
Best
Size of memory and swap usage, please?
I think some screenshots from the monitoring say it best. The gap is the time of the upgrade. Before the upgrade I ran some backups.
iostat:
http://susepaste.org/images/91283756.png
cpu:
http://susepaste.org/images/83928036.png
memory:
http://susepaste.org/images/7536393.png
# btrfs fi usage /
Overall:
Device size: 266.09GiB
Device allocated: 89.07GiB
Device unallocated: 177.02GiB
Device missing: 0.00B
Used: 80.01GiB
Free (estimated): 184.31GiB (min: 95.80GiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 280.69MiB (used: 0.00B)
Data,single: Size:83.01GiB, Used:75.71GiB
/dev/sda2 83.01GiB
Metadata,DUP: Size:3.00GiB, Used:2.15GiB
/dev/sda2 6.00GiB
System,DUP: Size:32.00MiB, Used:16.00KiB
/dev/sda2 64.00MiB
Unallocated:
/dev/sda2 177.02GiB
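One thing worth ruling out, given that iotop points at mysql on a btrfs root: copy-on-write amplifies the small random writes of a database considerably. You can check whether the MariaDB data directory is CoW-exempt (the No_COW attribute). The path below is only an example, adjust it to where your docker volume actually lives; note that chattr +C only takes effect for files created after the attribute is set:

```shell
# Check whether the directory carries the 'C' (No_COW) attribute
# (example path -- adjust to your actual docker volume location):
lsattr -d /var/lib/docker/volumes/nextcloud_db/_data

# Disable CoW for files created in it from now on; existing files
# must be copied in anew to pick up the attribute:
chattr +C /var/lib/docker/volumes/nextcloud_db/_data
```

This wouldn't explain by itself why the behavior changed with the upgrade, but it would reduce the write amplification either way.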
Hello all,
after looking at the problem for a while, here are two more things I noticed:
- The used disk space does not change significantly, so I assume the writing is repeatedly overwriting the same data.
- I can now see in the memory usage that the caching behavior seems to have changed:
http://susepaste.org/images/18334418.png
Before, the cache used more or less the full amount of otherwise unused memory; now it is decreasing over time. The jump every day at 3 am is when I run some automatic “backup”.
“Active” and “inactive” are also much less constant. The only changes I can explain are in the “committed” memory, caused by starting and stopping docker containers for testing.
So my theory is that the data which is now causing the disk I/O was cached in RAM before the upgrade.
If this is correct, the question is: what caused this change?
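To put numbers behind the monitoring graphs, the page cache and dirty-page sizes can be sampled directly from /proc/meminfo, e.g. logged every minute from cron. A minimal sketch:

```shell
# Print the current page cache, dirty and writeback sizes (in kB)
# as reported by the kernel in /proc/meminfo.
awk '/^(Cached|Dirty|Writeback):/ {printf "%s %s kB\n", $1, $2}' /proc/meminfo
```

Comparing such samples before/after stopping the containers would confirm whether the cache really shrinks because of them.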
My search so far has turned up that the disk cache behavior can be configured with these settings:
sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
But these seem to be perfectly common default values, and I guess they didn’t change?
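For what it's worth, the two ratios are percentages of memory, so what they mean in absolute terms on a given machine can be computed from /proc/meminfo. A rough sketch assuming the default 10/20 values (the kernel actually measures against reclaimable memory, not MemTotal, so this is only an approximation):

```shell
# Approximate the vm.dirty_background_ratio (10%) and vm.dirty_ratio (20%)
# thresholds as absolute sizes, based on total memory.
awk '/^MemTotal:/ {
    total_kb = $2
    printf "background flush starts at ~%.1f GiB dirty\n", total_kb * 0.10 / 1048576
    printf "writers blocked at       ~%.1f GiB dirty\n", total_kb * 0.20 / 1048576
}' /proc/meminfo
```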
Do you know of anything that changed from 42.3 to 15.0 that could explain this change in behavior?
Best