High CPU load related to btrfs causes lock up

Linux linux.suse 4.4.49-16-default #1 SMP Sun Feb 19 17:40:35 UTC 2017 (70e9954) x86_64 x86_64 x86_64 GNU/Linux

This is on a Dell E6400 notebook with 8 GB of RAM and a 512 GB Samsung 850 Pro SSD. It ran fine under openSUSE 13.2. In fact, I still have an Intel SSD in the notebook with 13.2 installed on it for emergency use. Since I put in a second drive, I figured I’d leave the SSD with Win7 and openSUSE 13.2 alone. The only connections between the 42.2 OS on the Samsung SSD and the Intel drive are that I can read the Intel drive and that 2 GB of swap on that drive got lumped in with the 8 GB of swap I put on the Samsung SSD. (I didn’t spot this during the install.)

My problem is that when snapper or some btrfs program does its thing, the notebook is completely unresponsive. When I can manage to get a command into a terminal, I see 100% CPU usage by snapper or btrfs. Can the btrfs system programs be “niced”? Should I try:


btrfs filesystem defragment [btrfs filesystem path] 
or
sudo btrfs scrub start /path/to/filesystem/mount
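
On the “niced” question, here is a minimal sketch of how a btrfs maintenance command could be started at low CPU and I/O priority. run_gently is a hypothetical helper name, not part of btrfs-progs or snapper, and the path is the same placeholder as above. This may not cure the lock ups: much of the btrfs work runs in kernel threads that nice and ionice do not touch, and the idle I/O class is only honored by some I/O schedulers.

```shell
# Hypothetical helper: run any command at the lowest CPU priority and,
# where ionice is available and permitted, in the idle I/O class.
run_gently() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        nice -n 19 ionice -c 3 "$@"
    else
        # Fall back to CPU niceness only.
        nice -n 19 "$@"
    fi
}

# Example (run from a root shell; the path is a placeholder):
#   run_gently btrfs scrub start /path/to/filesystem/mount
```

The helper only changes the priority of the userspace process it starts, so it is worth watching top during the next stall to see whether the load is in that process or in a kernel thread.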

Reading what I could find on the forum with similar problems, here are some diagnostics.

             total       used       free     shared    buffers     cached
Mem:       8126284    3638988    4487296     159268       5204    2697124
-/+ buffers/cache:     936660    7189624
Swap:     10500088          0   10500088

As you can see, I’m using no swap. (The 10G of swap comprises 8G on the Samsung SSD and 2G on the Intel. I never hibernate, but figured on the new installation I could afford to “waste” some drive space.)

systemd-analyze
Startup finished in 3.158s (kernel) + 3.878s (initrd) + 3.827s (userspace) = 10.864s

Boot time is fine, and it doesn’t lock right after boot. No, that isn’t annoying enough. It locks when I have started to do something. :wink:

systemctl list-timers --all
NEXT                         LEFT        LAST                         PASSED     UNIT                         ACTIVATES
Sat 2017-02-25 00:29:32 PST  23h left    Fri 2017-02-24 00:29:32 PST  56min ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Mon 2017-02-27 00:00:00 PST  2 days left Mon 2017-02-20 00:00:01 PST  4 days ago fstrim.timer                 fstrim.service

I saw a post related to some collision between trim programs on the SSDs, but I don’t think that problem is relevant here.

btrfs device stats /dev/sdb2
[/dev/sdb2].write_io_errs   0
[/dev/sdb2].read_io_errs    0
[/dev/sdb2].flush_io_errs   0
[/dev/sdb2].corruption_errs 0
[/dev/sdb2].generation_errs 0

In theory, the SSD is clean based on above.

btrfs filesystem df /
Data, single: total=30.01GiB, used=20.92GiB
System, single: total=64.00MiB, used=16.00KiB
Metadata, single: total=2.00GiB, used=703.69MiB
GlobalReserve, single: total=240.00MiB, used=0.00B



df
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs         4055520        4   4055516   1% /dev
tmpfs            4063140    42200   4020940   2% /dev/shm
tmpfs            4063140     2388   4060752   1% /run
tmpfs            4063140        0   4063140   0% /sys/fs/cgroup
/dev/sdb2       71680000 22903788  47580532  33% /
/dev/sdb2       71680000 22903788  47580532  33% /tmp
/dev/sdb2       71680000 22903788  47580532  33% /var/spool
/dev/sdb2       71680000 22903788  47580532  33% /usr/local
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/mailman
/dev/sdb2       71680000 22903788  47580532  33% /var/cache
/dev/sdb3      419824348 67444600 352379748  17% /home
/dev/sdb2       71680000 22903788  47580532  33% /var/tmp
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/mysql
/dev/sdb2       71680000 22903788  47580532  33% /boot/grub2/x86_64-efi
/dev/sdb2       71680000 22903788  47580532  33% /boot/grub2/i386-pc
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/pgsql
/dev/sdb2       71680000 22903788  47580532  33% /.snapshots
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/named
/dev/sdb2       71680000 22903788  47580532  33% /srv
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/mariadb
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/libvirt/images
/dev/sdb2       71680000 22903788  47580532  33% /var/lib/machines
/dev/sdb2       71680000 22903788  47580532  33% /opt
/dev/sdb2       71680000 22903788  47580532  33% /var/log
/dev/sdb2       71680000 22903788  47580532  33% /var/crash
/dev/sdb2       71680000 22903788  47580532  33% /var/opt
tmpfs             812632       12    812620   1% /run/user/1000

We’ve run into a few cases of this on IRC, and by the sound of it you might have run into the same issue.

Maybe you should try disabling the btrfs quota for / with: sudo btrfs quota disable /


sudo btrfs quota disable /

Well I did this and we will see what happens. I’m curious how I would know when this issue is patched. Is there a relevant page I can occasionally check?

Be careful here: if you disable quotas, you also need to change the snapper configs (ranges are no longer valid). Read here: https://bugzilla.opensuse.org/show_bug.cgi?id=1017461
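
For anyone following along, here is a sketch of what that snapper change might look like. It assumes the root config lives in /etc/snapper/configs/root and currently uses min-max ranges such as NUMBER_LIMIT="2-10". The fixed value 10 below is an illustrative assumption, not something taken from the bug report, so pick numbers that suit your disk.

```shell
# Fragment of /etc/snapper/configs/root (assumed location of the root config).
# With quota disabled, use plain numbers instead of min-max ranges.
# The value 10 is an illustrative assumption.
NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="10"
```

The same change can be made without editing the file by running, as root: snapper -c root set-config NUMBER_LIMIT=10 NUMBER_LIMIT_IMPORTANT=10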

sudo btrfs filesystem usage -h / will give better info than btrfs filesystem df / (you can see unallocated space, for example, to check whether a balance has run).

This is too much in the weeds for me. Or if you prefer, too much “inside baseball” for me. Translation: I’m clueless!

btrfs qgroup show /
ERROR: can't perform the search - No such file or directory
ERROR: can't list qgroups: No such file or directory

I got this diagnostic from the bugzilla thread. Am I OK here, or do I need to do something else? BTW, no lock ups in a few days. I didn’t want to get back to the forum until I was sure the lock up problem was gone.
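
For reference, as far as I know those two errors are what btrfs-progs of this era prints when no quota groups exist, which is the expected state after btrfs quota disable. A small sketch of that check follows. qgroup_status is a hypothetical helper that only pattern-matches the message shown above, so treat its answer as a hint rather than proof.

```shell
# Hypothetical helper: reads the output of `btrfs qgroup show <path>`
# on stdin and reports what the message suggests about quota state.
qgroup_status() {
    if grep -q "can't list qgroups"; then
        echo "no qgroups found (quota appears to be disabled)"
    else
        echo "qgroups listed (quota appears to be enabled)"
    fi
}

# Usage (needs root):
#   sudo btrfs qgroup show / 2>&1 | qgroup_status
```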

As an FYI, I had to post above the quote because all of my spaces were being trimmed!

Same issue under 42.3: all the CPU is used and all the RAM gets used up.

The OOM killer starts killing random processes until the system freezes completely.

  • From the little info you give, your problem sounds more like OOM and swap than btrfs.
  • Please don’t dump on old threads (start a new one).
  • You have not given enough information for diagnostics.
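
To help tell an OOM problem from a btrfs one, here is a minimal sketch for pulling OOM killer messages out of the kernel log. oom_lines is a hypothetical helper, and the patterns assume the usual kernel wording (“invoked oom-killer”, “Out of memory”), which can vary between kernel versions.

```shell
# Hypothetical helper: keep only kernel log lines that look like
# OOM killer activity (case-insensitive match).
oom_lines() {
    grep -iE 'invoked oom-killer|out of memory'
}

# Usage:
#   dmesg | oom_lines                  # current boot
#   journalctl -k -b -1 | oom_lines    # previous boot, if the journal is persistent
```

Posting that output together with free and top output from just before a freeze would give people a lot more to go on.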

I filed a report on 2017-04-02:
https://bugzilla.opensuse.org/show_bug.cgi?id=1032027