For about a week now, programs have frequently been going into disk sleep during the most trivial tasks (scrolling text, opening a different tab in the web browser, minimising and restoring applications, etc.). While that happens, even the mouse cursor gets stuck. It doesn’t last long, but it gets annoying very fast. It never happened before. The log doesn’t seem to contain anything relevant to the issue.
Has anyone else experienced anything similar recently? Any ideas on what I could try to solve the problem?
I’m using openSUSE 13.1, KDE, Btrfs. One thing I could try is going back to an older snapshot of the system with snapper, but I wonder if there might be something else I could try before that…
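For anyone who wants to catch the affected processes while a stall is in progress: the “disk sleep” label corresponds to the uninterruptible sleep state, which shows up as the letter D in /proc/<pid>/stat. Below is a minimal Python sketch that lists such processes. The function names are made up for illustration, and it assumes a Linux /proc filesystem.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state

def processes_in_disk_sleep(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux system, nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in processes_in_disk_sleep():
        print(pid, comm)
```

Running it in a loop from a text console during a stutter should show which programs are blocked, although it says nothing about why they are blocked.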
I’m running btrfs on 13.1 (upgrade from Tumbleweed). Yesterday evening I had the problem as described. Because I was running Amarok with tracks playing, a browser open, and another app running, it was difficult to intervene since the cursor kept freezing. Eventually I managed to display System Activity, which showed “disk sleep” against several processes. Eventually it cleared, but the disk LED on the notebook had been almost constantly “on” throughout the debacle.
I thought it might have been hourly snapshot activity, but not so according to “# snapper list”. Nor was the cleanup cron job due to run until much later. I don’t know the cause and it hasn’t happened since.
When my HDD began to fail earlier this year (btrfs not involved then), the symptoms were as you described, occurring from time to time with increasing frequency. Error counts and messages could be seen in the “smartctl --all /dev/sda” report. So I ran smartctl this time (new HDD), but thankfully it was clean as new.
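As a rough aid for reading that report, here is a small Python sketch that pulls a few commonly watched raw counters out of saved “smartctl --all” output. It assumes the usual ten-column ATA attribute table; the attribute names in WATCHED are ones many drives report, but they vary by vendor, so treat the list as an assumption and adjust it to what your drive shows.

```python
# Attribute names often associated with a failing disk (assumed list,
# not every drive reports all of them).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

def watched_attributes(report_text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows look like:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            try:
                found[parts[1]] = int(parts[9])
            except ValueError:
                pass  # raw value is not a plain integer on this drive
    return found
```

To use it, save the report to a file (for example with “smartctl --all /dev/sda > report.txt” as root), read the file, and pass its text to watched_attributes(). Non-zero values are worth a closer look; an empty result only means the drive does not list those attributes under those names.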
I’ve occasionally seen that when using “dd_rescue” to write an iso to a USB flash drive. I don’t think I have seen it with my current computer, but I have with the previous one.
I sometimes write an iso to an external hard drive, and I have never seen the problem there.
Yeah, I’m running a smartctl test as we speak (the short one didn’t report anything out of the ordinary, so let’s see if the long one will). The odd thing, though, is that I get these disk sleep stutters when doing things that should not touch the HDD at all, like scrolling this very webpage.
Looking at free, it reports that I have 200 MB of RAM swapped to disk. Though why that would be is beyond me, because it also reports that I have 6 GB of completely free space (this is with buffers already excluded). I have 8 GiB of RAM in total. Another interesting thing that I’m seeing right now in free is that during the stutters, the “cached” size increases a lot – from 1 GB to 4 GB, and sometimes that is followed by an increase in swap usage by around 70 MB. After that the cache size decreases rapidly back to around 1 GB. So it feels like something is causing spikes of buffered memory, which grows so large that it causes swapping to disk… Maybe I should check how an older kernel performs, and/or test my RAM as well.
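To record those spikes rather than eyeball them in free, something like the following Python sketch could log the relevant /proc/meminfo fields once per second. The function names are my own. I have included Shmem because, as far as I know, shared memory is counted inside the “cached” figure, so a jump there would point at shared memory rather than at files being read.

```python
import time

FIELDS = ("MemFree", "Cached", "Shmem", "SwapTotal", "SwapFree")

def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}, sizes in KiB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def sample(path="/proc/meminfo"):
    """Read the fields of interest; missing fields are reported as 0."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {key: info.get(key, 0) for key in FIELDS}

def watch(interval=1.0, count=60):
    """Print one line per sample so spikes can be matched to stutters."""
    for _ in range(count):
        s = sample()
        swap_used = s["SwapTotal"] - s["SwapFree"]
        print("%s cached=%d MiB shmem=%d MiB free=%d MiB swap_used=%d MiB" % (
            time.strftime("%H:%M:%S"),
            s["Cached"] // 1024,
            s["Shmem"] // 1024,
            s["MemFree"] // 1024,
            swap_used // 1024,
        ))
        time.sleep(interval)
```

Calling watch() in a terminal while reproducing the stutter gives a timestamped trace of cache and swap usage that can be compared against the moments the cursor freezes.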
nrickert, yes, the symptoms are typical of high disk I/O (the Linux kernel by default doesn't deal well with high disk I/O like that). But in my case, no disk I/O at all is supposed to be happening...
> Looking at free, it reports that I have 200 MB of RAM swapped to disk.
> Though why that would be is beyond me, because it also reports that I
> have 6 GB of completely free space (this is with buffers already
> excluded). I have 8 GiB of RAM in total.
This is normal and good. If you hibernate it is typical.
> Another interesting thing that
> I’m seeing right now in free is that during the stutters, the “cached”
> size increases a lot – from 1 GB to 4 GB, and sometimes that is
> followed by an increase in swap usage by around 70 MB. After that the
> cache size decreases rapidly back to around 1 GB.
It is reading big files. Like for indexing.
–
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)
I never hibernate (and sleeping doesn’t even work, but that’s another issue entirely). And I turned off Nepomuk to be sure, and it’s not indexing anything. KSysGuard is only showing kwin, pidgin, kdeinit as the programs having I/O, and also not as much as that. Plus the cache size suddenly growing huge and then quickly going back down can’t be normal…
Another interesting observation: if I turn off swap, then use my scrollwheel a lot in a webpage, Firefox gets killed. Now that’s not normal in the slightest.
On 2013-12-01 22:16, GreatEmerald wrote:
>
> I never hibernate (and sleeping doesn’t even work, but that’s another
> issue entirely). And I turned off Nepomuk to be sure, and it’s not
> indexing anything. KSysGuard is only showing kwin, pidgin, kdeinit as
> the programs having I/O, and also not as much as that. Plus the cache
> size suddenly growing huge and then quickly going back down can’t be
> normal…
Cache growing 4 GB means that something is reading files, more than 4 GB
in size. How to find out what, I don’t know.
–
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)
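On the question of finding out what is reading: one possible approach is to compare the per-process read counters in /proc/<pid>/io before and after a stutter. This is a minimal Python sketch with made-up function names. It assumes the read_bytes field, which counts bytes the process caused to be fetched from storage, and it can only see processes you have permission to inspect (your own, or all of them when run as root).

```python
import os
import time

def parse_io(text):
    """Parse /proc/<pid>/io text into {counter_name: int}."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters

def snapshot(proc_root="/proc"):
    """Return {pid: (name, read_bytes)} for all readable processes."""
    snap = {}
    if not os.path.isdir(proc_root):
        return snap
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io = parse_io(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # no permission, or the process has exited
        snap[int(entry)] = (name, io.get("read_bytes", 0))
    return snap

def top_readers(before, after, limit=5):
    """Return [(bytes_read, pid, name)] sorted by bytes read, descending."""
    deltas = []
    for pid, (name, read_after) in after.items():
        if pid in before:
            delta = read_after - before[pid][1]
            if delta > 0:
                deltas.append((delta, pid, name))
    deltas.sort(reverse=True)
    return deltas[:limit]

def watch_once(seconds=5):
    """Take two snapshots some seconds apart and print the top readers."""
    before = snapshot()
    time.sleep(seconds)
    for delta, pid, name in top_readers(before, snapshot()):
        print("%10d bytes  pid %d  %s" % (delta, pid, name))
```

If the cache grows by gigabytes while watch_once() reports almost nothing read, that would suggest the growth is not coming from ordinary file reads.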
On 2013-12-01 22:36, GreatEmerald wrote:
>
> Another interesting observation: if I turn off swap, then use my
> scrollwheel a lot in a webpage, Firefox gets killed. Now that’s not
> normal in the slightest.
That would mean that you need more memory than the amount physically
available.
Mmm, I forgot. Cache growing could also mean, perhaps, that some process
is telling the kernel that it is going to memory-map a large file, so the
space is allocated. The file is not actually read, or just a little,
then the process ends and the space is deallocated. This is a hunch; you
need a programmer to put this in the right wording.
–
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)
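On the Firefox kill with swap off: if the kernel’s out-of-memory killer was responsible, it normally leaves a “Killed process” line in the kernel log. The Python sketch below searches log text for such lines. The exact message wording differs between kernel versions, so the pattern is an assumption based on the common “Killed process <pid> (<name>)” form, and reading the log with dmesg may require root on some systems.

```python
import re
import subprocess

# Assumed message form: "... Killed process 1234 (firefox) ..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")

def oom_victims(log_text):
    """Return [(pid, name)] for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(log_text)]

def kernel_log():
    """Return the kernel ring buffer as text, or '' if dmesg is unavailable."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout
```

Calling oom_victims(kernel_log()) right after Firefox disappears would confirm whether it was killed for lack of memory or died for some other reason.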
I now tested an older kernel, no change. Tested the RAM to make sure, and it’s good. So neither is the problem. I guess I should test how a clean user works in this regard.
robin_listas, that still doesn’t make much sense. Why would scrolling a webpage in Firefox cause memory to be allocated, especially just a bit more than the amount of memory that is unused? And it certainly should not cause any big files to be read (Firefox doesn’t deal with big files like that at all). It seems to me that something eats all the memory you have, as if to use it for caching (no matter if you have 1 GiB or 128 GiB), until the demand is so high that swapping occurs. Once that happens, programs go to disk sleep and no longer try to cache anything, which in turn makes whatever program is doing that snap out of it for a while.
Thank you for that. Wow, I’m only running a 35GB root partition for 13.1/Tumbleweed, and it has allocated 28GB for the btrfs storage pool so far, of which about 18GB is actually used now for Data + Metadata (it was more). That is rather small compared to yours.
Are you still running with Snapper’s defaults? They are not aggressive enough at cleaning up for a desktop system. I recently halved Snapper’s default limit (100) on snapshots retained for YaST/zypper activity and also halved the hourly snapshot retention, which pulled usage back to 18GB, but I kept the 10-day retention of each day’s first hourly snapshot.
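To see which retention values are in effect, a small Python sketch like this could read the Snapper configuration. Both the path /etc/snapper/configs/root and the key names (NUMBER_LIMIT, TIMELINE_LIMIT_HOURLY, TIMELINE_LIMIT_DAILY) are written from memory of how Snapper lays out its config, so please check them against your own file before relying on the output.

```python
# Keys assumed to control retention in a Snapper config file.
KEYS = ("NUMBER_LIMIT", "TIMELINE_LIMIT_HOURLY", "TIMELINE_LIMIT_DAILY")

def parse_config(text):
    """Parse shell-style KEY="value" lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings

def retention(path="/etc/snapper/configs/root"):
    """Return the retention settings, or {} if the file cannot be read."""
    try:
        with open(path) as f:
            settings = parse_config(f.read())
    except OSError:
        return {}
    return {key: settings.get(key) for key in KEYS}

if __name__ == "__main__":
    for key, value in retention().items():
        print(key, "=", value)
```

With the halving described above, the first two values would come out at half of whatever your defaults are, and the daily one would stay at 10.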
I do not keep my data in /home (integral and excluded by sub-volume from snapshots). It’s on a separate partition (ext4, btw). I think you might need to do something to tune down your btrfs usage, as it could just be high I/O load giving you the problems, but I have no actual experience at your size of storage used!
Eh, I’ve been using this setup for around two openSUSE releases so far without issues, and nothing changed lately to impact that. As I mentioned, it doesn’t seem to have much to do with Btrfs, but rather with the memory and swapping (swap has no relation to Btrfs, of course). As for Snapper, I have it on defaults, but it’s not that bad. Quite useful for such time trips as I want to take to see what exactly broke the system like that. And btrfs itself is nice for that, since I can take a snapshot of the disk, save it on an external drive, and even if something goes horribly wrong, all my data is still intact.
I’m not sure where you mentioned previous releases; I don’t recall it. I used btrfs on 12.3 and never experienced the problem there either. I have on 13.1, only once, but the effect on my system was extremely frustrating nonetheless. So far it hasn’t returned.
> As for Snapper, I have it on defaults, but it’s not that bad. Quite useful for such time trips as I want to take to see what exactly broke the system like that. And btrfs itself is nice for that, since I can take a snapshot of the disk, save it on an external drive, and even if something goes horribly wrong, all my data is still intact.
Nobody said it was bad, but the defaults are not so good for average desktop users, who typically run openSUSE with a 15-20GB root partition. There have already been complaints to the devs about Snapper’s defaults being too aggressive with hourly snapshots but not aggressive enough on cleanup, given that the openSUSE installer defaults to using it when btrfs is selected for the root partition. I agree it is useful.
Just a thought, do you host any virtual machines on your 13.1 system?
Tried a clean user, and it has the same problem. It seems to trigger it less often, but the issue is definitely there. Looks like I should check out my old snapshots to see if I can bisect the problem.
That was a reply to your “I think you might need to do something to tune down your btrfs usage as it could just be high I/O loading giving you the problems, but I have no actual experience at your size of storage used!” – nothing changed that would cause high I/O compared to 12.3 or even earlier 13.1.
Yes, but I had the same ones before that, and I rarely start them.
I found the trigger for the issue: it’s KWin’s OpenGL rendering. With XRender, there are no issues. So this must be some issue with the graphics stack somewhere, maybe Mesa, maybe KWin itself, maybe the Intel driver (which I’m using right now, Ivy Bridge).
BTW, I’m also using KDE with desktop effects enabled, the intel driver (not Ivy Bridge), and 13.1’s pre-release version of X server (1.14.4 RC 1). The problem still hasn’t returned here.
Hmm, after some more testing, it seems turning the effects to XRender only solves the particular test case I used (scrolling the screen very quickly). It still fails another test case (minimising and restoring several different windows at once).
All my previous comments applied to XRender (system default here). So, I selected OpenGL 3.1 and KWin immediately crashed. I clicked on restart and got a warning that two effects (btw both are selected by default) required OpenGL, so the restart obviously used XRender.
Tried OpenGL 2.0 with no crash, but it was jerky in operation with some intermittent rendering issues. Then tried OpenGL 3.1 again with no crash this time, but it was also not as smooth as XRender in operation here.
I couldn’t reproduce results re sticking cursor etc, in any tests. However my 13.1 recently returned to Tumbleweed having installed newer 3.12.3 kernel. Will repeat tests with a reboot of retained 3.11.6 kernel later today.