btrfs process uses 50% of the processor and lags everything

Hello!

I’m happy with openSUSE 42.2, but there is a problem. btrfs sometimes starts running on its own and uses 50% of the processor, which causes lag: I cannot work, I can’t watch YouTube, KDE freezes… Even listening to music is very hard while it is working in the background. It runs without any prompt, and it seems my processor can’t handle my applications and this at the same time. I searched Google for a solution, but I didn’t find one. I don’t even know what it is doing… It’s balancing, it’s the file system, but I’m on an SSD. I tried decreasing its priority, but that didn’t work.

Happy to see help. Cheers!

Take a look at my reply here: https://forums.opensuse.org/showthread.php/523354-High-CPU-load-related-to-btrfs-causes-lock-up

In essence, disable Btrfs quotas with: sudo btrfs quota disable /
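A minimal sketch of disabling and then sanity-checking quotas (assuming your root filesystem is mounted at /):

# Disable quota groups (qgroups) on the root filesystem
sudo btrfs quota disable /
# With quotas off, listing qgroups should now fail with a
# "quotas not enabled" style error instead of printing a table
sudo btrfs qgroup show /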

Before disabling quotas (to help determine the root cause):
1 - How often does this occur, and for how long?
2 - Do you have /home on Btrfs?
3 - Do you have any other kind of disk-thrashing activity - do you use the machine for builds, have VMs, databases, etc.?

  1. Every one or two days, for 10-15 minutes. It’s frustrating, as it slows down everything; sometimes applications crash or behave unexpectedly (as I click on everything: “why don’t you respond? why… oh, btrfs”).

  2. No, /home is on XFS.

  3. I have a very small MySQL database for educational purposes. Apache is also installed. My VMs are on an external USB hard drive, which is NTFS-formatted.

I installed Leap 42.2 about 2 weeks ago, with Btrfs on the root partition, and noticed this behavior for the first time a couple of days ago. The system was unresponsive for about 30 minutes. “top” said it was a btrfs process (I’ve forgotten which one). I didn’t know what it was, and put it on my list of things to watch for…

Having read the links above, it was likely the same issue others are having with quotas. So far, I haven’t formed an opinion as to whether I want to disable quotas. It might be enough to just exercise some control over when this happens. Does anyone know how to find/edit the cron job that is doing this? I never play with cron, so I could use some pointers.

So this is interesting:
(/etc/cron.weekly/) btrfs balance runs only weekly (so an unlikely cause).
(/etc/cron.daily/) we have 1) suse.de-backup-rpmdb (this has caused freezing on my system, but not for 15 minutes) and 2) suse.de-snapper.
If we look at systemd timers (systemctl list-timers --all), we have systemd-tmpfiles-clean.timer, which runs every 48 hrs.

To manually run a systemd timer’s service you can do, e.g., sudo systemctl start systemd-tmpfiles-clean, and for cron, sudo /etc/cron.daily/suse.de-snapper
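A sketch of how you might test each suspect in turn (the unit and script names are the ones mentioned above; run them at a quiet time and watch whether the stall reproduces):

# See every timer, active or not, and when it last/next fires
systemctl list-timers --all
# Trigger the suspects by hand, one at a time
sudo systemctl start systemd-tmpfiles-clean.service
sudo /etc/cron.daily/suse.de-backup-rpmdb
sudo /etc/cron.daily/suse.de-snapper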

Could you confirm it was btrfs_transacti (or something similar), and that you did not see a baloo process near the top?

For alvinek:
Please post the output of ‘sudo btrfs filesystem usage -h /’, and tell us how many snapshots you have (not counting 0 and 1) and how old the oldest one is.
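A sketch of the commands that would answer this, assuming snapper manages the snapshots, as it does on a default openSUSE install (0 and 1 are snapper’s current/first entries):

# Space usage and allocation on the root filesystem
sudo btrfs filesystem usage -h /
# List snapshots with their numbers and creation dates
sudo snapper list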

If you take a look in the “/etc/” directory you’ll find the following:

  • A file named “crontab”.
  • A collection of directories named “cron.daily/”, “cron.hourly/”, “cron.monthly/” and “cron.weekly/”.

“crontab” is the base; the directories “hourly”, “daily”, “weekly” and “monthly” contain cron jobs which are executed hourly, daily, weekly and monthly, respectively.
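A quick way to see this layout for yourself (plain shell, nothing beyond the standard cron packaging assumed):

# The base table: note how it dispatches the periodic directories
cat /etc/crontab
# The periodic job directories; each executable here runs on that schedule
ls -l /etc/cron.hourly/ /etc/cron.daily/ /etc/cron.weekly/ /etc/cron.monthly/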

Yes, there are Btrfs jobs which, by openSUSE default, are run weekly (balance, and trim – if you have an SSD, nice to have run automatically for you) and monthly (scrub – which reads the data back and verifies checksums).

The Btrfs quota is not related to any cron job – it’s an internal Btrfs feature which is (I’m just quoting a source) “not stable yet”.
Btrfs - ArchWiki
Quota support - btrfs Wiki

There is an openSUSE discussion related to the Btrfs Quota issue – an openSUSE decision was made to enable Btrfs Quota by default in the Kernel.

  • The reasoning was, “If Btrfs Quota is not enabled, we will never know if it is reliable or not.”

I was able to recreate what I experienced by running the script /etc/cron.weekly/btrfs-balance as root. Two processes hit the top of “top” at different times: “btrfs” and “btrfs_transacti”. In this most recent instance, it ran for about 5 minutes, and while it sucked up 100% of a core, it was merely inconvenient. The desktop remained responsive. Is there a way to make it run at a specific time and/or day?

(So what you’re experiencing from balance is more or less expected; there have been reports, however, of more pathological behaviour.)
By default you can only change the frequency and level of the balance:
YaST > /etc/sysconfig Editor > System > File systems > btrfs, then BTRFS_BALANCE_PERIOD and BTRFS_BALANCE_DUSAGE.
Since it’s run through cron, perhaps you can schedule it more precisely without breaking things, but I don’t know how.

If you change it to daily and reduce the level to, say, 20 or 30, it will run more often but finish much quicker. It won’t break anything, but it may not be optimal; I’m not sure.
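A sketch of making the same change without YaST, assuming the variables live in /etc/sysconfig/btrfsmaintenance as shipped by the btrfsmaintenance package (the usage thresholds shown match the defaults visible in the balance output later in this thread):

# Inspect the current balance settings
grep '^BTRFS_BALANCE' /etc/sysconfig/btrfsmaintenance
# e.g. BTRFS_BALANCE_PERIOD="weekly"
#      BTRFS_BALANCE_DUSAGE="1 5 10 20 30 40 50"
# Run daily, but only rewrite chunks that are at most 20% used
sudo sed -i -e 's/^BTRFS_BALANCE_PERIOD=.*/BTRFS_BALANCE_PERIOD="daily"/' \
            -e 's/^BTRFS_BALANCE_DUSAGE=.*/BTRFS_BALANCE_DUSAGE="1 5 10 20"/' \
            /etc/sysconfig/btrfsmaintenance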

PS: do you have a very small root partition? sudo btrfs filesystem usage -h / might be useful. Are you on an SSD?

Possibly the first step to understanding exactly how your system is affected is to determine whether your problem is disk I/O or something else, since much of the time a busy disk subsystem will cause far greater problems than anything else.

Use the iotop utility to watch disk I/O, so you can be sure it’s something like Btrfs snapshotting and not something else.
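A quick sketch of that (iotop needs root; the flags below are standard iotop options):

# Interactive view, showing only processes currently doing I/O
sudo iotop -o
# Or batch mode with timestamps: 10 samples, 5 seconds apart, suitable for logging
sudo iotop -obt -n 10 -d 5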

TSU

OK, good to know. Daily might not be a bad plan, as I already have a mental delay in the morning to accommodate Dropbox indexing. But it really would be better to choose a time for it (the weekend, the middle of the night) so that it doesn’t hit when I am actually supposed to be working.

I think of my root as small, but these things are subjective:

linux-5vqd:/etc/cron.weekly # btrfs filesystem usage -h /
Overall:
    Device size:                  60.00GiB
    Device allocated:             16.07GiB
    Device unallocated:           43.93GiB
    Device missing:                  0.00B
    Used:                         14.42GiB
    Free (estimated):             44.81GiB      (min: 22.84GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              224.00MiB      (used: 0.00B)

Data,single: Size:14.01GiB, Used:13.13GiB
   /dev/sda6      14.01GiB

Metadata,DUP: Size:1.00GiB, Used:661.89MiB
   /dev/sda6       2.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6      64.00MiB

Unallocated:
   /dev/sda6      43.93GiB

It’s probably worth noting that the recent run of the script only moved one chunk, which is presumably why it didn’t run for very long:

linux-5vqd:/etc/cron.weekly # ./btrfs-balance
Before balance of /
Data, single: total=15.01GiB, used=13.13GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.50GiB, used=661.02MiB
GlobalReserve, single: total=224.00MiB, used=0.00B
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6        65G   16G   48G  26% /
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=10
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=20
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=30
Done, had to relocate 1 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=40
Done, had to relocate 2 out of 19 chunks
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=50
Done, had to relocate 1 out of 18 chunks
Done, had to relocate 0 out of 18 chunks
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=1
  SYSTEM (flags 0x2): balancing, usage=1
Done, had to relocate 2 out of 18 chunks
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=5
  SYSTEM (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 17 chunks
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
Done, had to relocate 1 out of 17 chunks
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=20
  SYSTEM (flags 0x2): balancing, usage=20
Done, had to relocate 1 out of 17 chunks
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=30
  SYSTEM (flags 0x2): balancing, usage=30
Done, had to relocate 1 out of 17 chunks
After balance of /
Data, single: total=14.01GiB, used=13.13GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=660.80MiB
GlobalReserve, single: total=224.00MiB, used=0.00B
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6        65G   16G   49G  25% /

Finally, when I ran the btrfs-trim script, I got this:

linux-5vqd:/etc/cron.weekly # ./btrfs-trim
Running fstrim on /
fstrim: /: the discard operation is not supported

Is that also expected behavior?

Trim is only for SSDs, I believe (and not supported by all of them; check with sudo hdparm -I /dev/sda | grep "TRIM supported"), so all good.

With Leap 42.2 – current as of today – and a 500 GB HDD (not an SSD, and not an SSHD (an HDD with an SSD cache)) with an 80 GB Btrfs partition, and with the Btrfs quota disabled, the weekly Btrfs balance and trim and the monthly Btrfs scrub gave the following results (I prefixed the “sh” which called the scripts with “time”):

 # time sh /usr/share/btrfsmaintenance/btrfs-balance.sh     
Before balance of /
Data, single: total=10.01GiB, used=9.45GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=634.25MiB
GlobalReserve, single: total=224.00MiB, used=0.00B
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        86G   12G   74G  14% /
Done, had to relocate 0 out of 14 chunks
.
.
.
Done, had to relocate 1 out of 14 chunks
After balance of /
Data, single: total=10.01GiB, used=9.45GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=634.58MiB
GlobalReserve, single: total=224.00MiB, used=0.00B
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        86G   12G   74G  14% /
0.03user 0.35system 0:09.97elapsed 3%CPU (0avgtext+0avgdata 2996maxresident)k
1632inputs+123936outputs (0major+3274minor)pagefaults 0swaps
 # 
 # time sh /usr/share/btrfsmaintenance/btrfs-trim.sh            
Running fstrim on /
fstrim: /: the discard operation is not supported
0.00user 0.01system 0:00.01elapsed 125%CPU (0avgtext+0avgdata 3064maxresident)k
0inputs+0outputs (0major+853minor)pagefaults 0swaps
 # 
 # time sh /usr/share/btrfsmaintenance/btrfs-scrub.sh
Running scrub on /
scrub device /dev/sda3 (id 1) done
        scrub started at Mon Mar  6 10:20:43 2017 and finished after 00:01:54
        total bytes scrubbed: 10.69GiB with 0 errors
0.00user 8.94system 1:54.30elapsed 7%CPU (0avgtext+0avgdata 3072maxresident)k
22416528inputs+5600outputs (0major+866minor)pagefaults 0swaps
 # 

None of the Btrfs shell script executions provoked excessive CPU usage (the 125% CPU usage by trim was the effect of an extremely short execution period on a 4-core CPU…).
IMHO, provided the Btrfs quota is disabled, the housekeeping performed by Btrfs seems to have minimal system impact.

  • Given a CPU with at least 4 cores . . .

I suspect that older systems with single-core or, at most, dual-core CPUs may experience an impact on system performance when the Btrfs housekeeping routines execute.

OK, it’s known that disabling quotas fixes the CPU spikes; the more interesting question is what causes quota problems only on some systems. People who immediately disable quotas will not find the cause. I’m curious to eliminate basics like SSD vs. HDD (HDDs also use DUP for metadata), the number of snapshots, and especially fragmentation.
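A sketch of the data points that would help compare affected systems (snapper is assumed as the snapshot manager, and sda as the disk; adjust to your setup):

# Size, allocation and DUP/single profiles
sudo btrfs filesystem usage -h /
# Rough snapshot count
sudo snapper list | wc -l
# 0 = SSD (non-rotational), 1 = HDD
cat /sys/block/sda/queue/rotational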

PS: you only moved 1 chunk, but if you say there was no freeze, then that’s good. Trim doesn’t work on HDDs.

For those disabling quotas: be careful to change the range definitions to fixed values in the snapper config.
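A minimal sketch of what that means, assuming the default openSUSE snapper config at /etc/snapper/configs/root (range values like "2-10" only make sense with qgroup-based cleanup, so pin them once quotas are off):

# See the current cleanup limits
sudo grep NUMBER_LIMIT /etc/snapper/configs/root
# Replace range values with fixed ones, e.g.
#   NUMBER_LIMIT="2-10"           -> NUMBER_LIMIT="10"
#   NUMBER_LIMIT_IMPORTANT="4-10" -> NUMBER_LIMIT_IMPORTANT="10"
sudo sed -i -e 's/^NUMBER_LIMIT=.*/NUMBER_LIMIT="10"/' \
            -e 's/^NUMBER_LIMIT_IMPORTANT=.*/NUMBER_LIMIT_IMPORTANT="10"/' \
            /etc/snapper/configs/root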

I had a similar thought. Given that I’ve only had one surprise, and given my own unfamiliarity with btrfs, I have chosen to essentially do nothing and have left quotas enabled. I’ll report back should I have further issues. I’ll be watching this thread to see how the OP makes out.

I tell people to disable quotas because there are a lot of people who need their computers for real work, not for debugging issues that should not be there because the developers treat you as guinea pigs for untested features.

Quotas were marked unstable until last year and they’re still clearly fubar.

I use openSUSE Tumbleweed and also got the lag problem. Using the proposed solution,

“In essence, disable Btrfs quotas with: sudo btrfs quota disable /”

worked for me. Thanks.

I always choose ext4 for each partition.
I get no advantages… but no problems, either.