Thread: intensive HDD read freezes OS at random time

  1. #1
    Join Date
    Aug 2017
    Posts
    14

    Intensive HDD read freezes OS at random time

    Hi,

    I have a serious problem on my new Leap 15 installation that I never had on previous openSUSE versions and that I can't solve.

    Once or twice a day, seemingly at random, the HDD starts working very hard, and maybe 20 seconds later the OS becomes unresponsive because of it. The mouse pointer still moves and I can switch to a console with Ctrl+Alt+F1, but I can't log in: the login prompt times out because the whole system is so slow.

    I managed to run iotop as root on a console before the problem started and noticed that almost all applications were hitting the HDD: firefox, dolphin, soffice, akonadi, etc. It is as if, all at once, every application needed to read an extraordinary amount of data from the HDD! iotop shows that the disk activity is essentially reads, with very few writes.

    I tried letting it run for half an hour, but the problem persists. After a hard reboot, everything works fine until the next time...

    I made the journal persistent, but "journalctl --boot=-1" shows nothing in particular at the date and time of the freeze.
    /var/log/messages does not give any clue either...
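
    When the approximate time of a freeze is known, it can help to look only at the log lines around it. journalctl accepts --since and --until for that. For /var/log/messages, a minimal sketch (messages_between is a hypothetical helper name, and it assumes the ISO timestamps that rsyslog writes on this system, all in one time zone) could look like this:

```shell
# Print syslog lines whose ISO 8601 timestamp falls inside [start, end].
# Timestamps are compared as text on their first 19 characters
# (YYYY-MM-DDTHH:MM:SS), so every line must use the same time zone.
# messages_between is an illustrative helper, not an existing tool.
messages_between() {
    start=$1
    end=$2
    awk -v s="$start" -v e="$end" '
        { ts = substr($1, 1, 19) }
        ts >= s && ts <= e
    '
}

# Usage (the time window is a made-up example):
#   messages_between 2018-08-16T16:30:00 2018-08-16T16:40:00 < /var/log/messages
```

    Whatever started shortly before the disk activity (a timer, a cron job, a self-test) should then stand out more easily than in the full log.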

    I disabled the desktop search tool (baloo), but the problem still happens.

    I'm not that familiar with digging through log files, and I'm out of ideas for finding out what happens and why...

    If anyone could give me some magic shell commands or some log files to explore, that would be precious !

    Thanks for any help,
    Marc
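
    One possible starting point for the shell commands asked for above: a minimal sketch that ranks processes by the number of bytes they have read from storage, using the per-process counters in /proc/<pid>/io. The function name top_readers and its two optional arguments are assumptions made for this example, and it must run as root to see other users' processes.

```shell
# List the processes that have read the most bytes from storage, based on
# the "read_bytes" counter in /proc/<pid>/io (Linux only).
# top_readers and its arguments are illustrative names.
# Usage: top_readers [proc_dir] [count]
top_readers() {
    proc_dir=${1:-/proc}
    count=${2:-10}
    for io_file in "$proc_dir"/[0-9]*/io; do
        # Skip processes that vanished or that we may not inspect
        [ -r "$io_file" ] || continue
        pid_dir=${io_file%/io}
        # read_bytes counts bytes actually fetched from the storage layer
        bytes=$(awk '$1 == "read_bytes:" { print $2 }' "$io_file" 2>/dev/null)
        [ -n "$bytes" ] || continue
        name=$(cat "$pid_dir/comm" 2>/dev/null)
        # Output format: <bytes read> <pid> <command name>
        printf '%s %s %s\n' "$bytes" "${pid_dir##*/}" "${name:-?}"
    done | sort -rn | head -n "$count"
}
```

    The counters are cumulative since each process started, so running top_readers twice, about a minute apart, and comparing the two listings shows who is reading right now. Appending the output to a file from a root console would also leave a trace to inspect after the hard reboot.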

  2. #2
    Join Date
    Aug 2017
    Posts
    14

    Re: intensive HDD read freezes OS at random time

    PS: the system is not swapping and very few apps are running, so it is not a RAM problem.

  3. #3
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    2,559

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by mder
    If anyone could give me some magic shell commands or some log files to explore, that would be precious !
    Please help us with a little bit more information about the disk partitions:
    • With the user "root", please supply the output of "fdisk --list"
    • If you're using a Btrfs system partition, please also supply the output of "btrfs filesystem usage /" -- also with the user "root" …
    • Please supply the output of " systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs' " -- please note the single quotes: " ' " …

  4. #4
    Join Date
    Aug 2017
    Posts
    14

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by dcurtisfra
    Please help us with a little bit more information about the disk partitions:
    • With the user "root", please supply the output of "fdisk --list"
    • If you're using a Btrfs system partition, please also supply the output of "btrfs filesystem usage /" -- also with the user "root" …
    • Please supply the output of " systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs' " -- please note the single quotes: " ' " …
    Hi, thanks for your reply !
    Here is the fdisk result :

    linux-derumaux:/var/log # fdisk --list
    Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: dos
    Disk identifier: 0x000c038d

    Device     Boot     Start       End   Sectors   Size Id Type
    /dev/sda1              63  48234495  48234433    23G 83 Linux
    /dev/sda3  *     51458046 976752639 925294594 441.2G  f W95 Ext'd (LBA)
    /dev/sda5        51458048  59842559   8384512     4G 82 Linux swap / Solaris
    /dev/sda6        59844608 122759167  62914560    30G 83 Linux
    /dev/sda7       122761216 123772927   1011712   494M 83 Linux
    /dev/sda8       123774976 185679871  61904896  29.5G 83 Linux
    /dev/sda9       185681920 976752639 791070720 377.2G 83 Linux

    Partition 1 does not start on physical sector boundary.
    Partition 3 does not start on physical sector boundary.


    All partitions are ext4. When I installed 42.3 last year with Btrfs, I first had trouble with snapshots that completely filled my / partition, then crashes, and the last crash left me with a computer I couldn't boot anymore! Even GRUB was broken, and I was quite worried about getting it set up again... Since then, I have chosen good old ext4...

    linux-derumaux:/var/log # cat /etc/fstab
    UUID=ce1f7326-550f-4532-a3eb-acb80595522f / ext4 acl,user_xattr 0 1
    UUID=61824984-91ed-413f-9e83-b6630e245b12 /fichiers ext4 defaults 0 2
    UUID=07525971-1554-49ea-ad97-a04a15aef20a /linux42.3 ext4 defaults 0 2

    For the last command :

    linux-derumaux:/var/log # systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs'
    btrfsmaintenance-refresh.path enabled
    btrfs-balance.service static
    btrfs-defrag.service static
    btrfs-scrub.service static
    btrfs-trim.service static
    btrfsmaintenance-refresh.service enabled
    fstrim.service static
    systemd-fsck-root.service enabled-runtime
    systemd-fsck@.service static
    btrfs-balance.timer enabled
    btrfs-defrag.timer disabled
    btrfs-scrub.timer enabled
    btrfs-trim.timer disabled
    fstrim.timer enabled

    It looks strange that Btrfs services are enabled, since the system is installed on ext4...

    Thanks for your help,
    Marc
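
    To pull just the enabled maintenance timers out of a listing like the one above, a small filter can help. This is a sketch only; enabled_fs_timers is a hypothetical name, and it assumes the two-column "unit state" layout shown in this post.

```shell
# Read "systemctl list-unit-files" output on stdin and print the Btrfs and
# fstrim timer units whose state is "enabled".
# enabled_fs_timers is an illustrative helper, not part of systemd.
enabled_fs_timers() {
    awk '$1 ~ /^(btrfs|fstrim).*\.timer$/ && $2 == "enabled" { print $1 }'
}

# Usage: systemctl list-unit-files | enabled_fs_timers
```

    For the output pasted in this post, it prints btrfs-balance.timer, btrfs-scrub.timer and fstrim.timer. "systemctl list-timers --all" additionally shows when each timer last ran and when it will fire next, which can be compared with the times of the freezes.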

  5. #5
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    15,748

    Re: intensive HDD read freezes OS at random time

    Looks like somehow the btrfs utility cron jobs are running. Since you have ext4, that may explain the lockups when those crons trigger.

  6. #6
    Join Date
    Aug 2017
    Posts
    14

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by gogalthorp
    Looks like somehow the btrfs utility cron jobs are running. Since you have ext4, that may explain the lockups when those crons trigger.
    That may be the point...

    But I can't find those triggers in cron:
    linux-derumaux:/home/marc # crontab -l
    no crontab for root
    linux-derumaux:/home/marc # crontab -u marc -l
    no crontab for marc

    The /etc/crontab file has only one line, which I hardly understand:
    linux-derumaux:/home/marc # cat /etc/crontab
    SHELL=/bin/sh
    PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin
    MAILTO=root
    #
    # check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
    #
    -*/15 * * * * root test -x /usr/lib/cron/run-crons && /usr/lib/cron/run-crons >/dev/null 2>&1


    Does -*/15 mean that a check runs every 15 minutes?
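
    On the reading of that field: the first crontab field is the minute, and */15 is a step value, so run-crons would be started at minutes 0, 15, 30 and 45 of every hour. As far as I know, in cronie (the cron daemon openSUSE ships) a leading "-" on a root entry only suppresses the syslog message for that job, which would also explain why these runs leave no trace in the logs; treat that part as an assumption to check against crontab(5). A minimal sketch of the step expansion, with expand_minute_step as a made-up helper name:

```shell
# Expand a cron step expression for the minute field, e.g. "*/15" or
# "-*/15", into the minutes of the hour at which the job would start.
# Only the "*/N" form is handled; expand_minute_step is illustrative.
expand_minute_step() {
    expr_in=${1#-}          # drop the optional leading "-"
    step=${expr_in#\*/}     # keep only N from "*/N"
    # Reject anything that is not a positive integer
    case $step in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$step" -gt 0 ] || return 1
    minute=0
    out=
    while [ "$minute" -le 59 ]; do
        out="${out:+$out }$minute"
        minute=$((minute + step))
    done
    printf '%s\n' "$out"
}

# Usage: expand_minute_step '-*/15'   # prints: 0 15 30 45
```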

    The daily cron jobs are listed below, but I have no idea how to check at what time they are launched: I have neither /var/log/cron.log nor /var/log/syslog...


    linux-derumaux:/home/marc # ls /etc/cron.daily/
    google-chrome mdadm mlocate.cron packagekit-background.cron suse-clean_catman suse-do_mandb suse-texlive

    /var/log/messages shows little about cron events :
    linux-derumaux:/home/marc # cat /var/log/messages | grep cron
    2018-08-16T01:21:04.352354+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1185]: Refresh timer btrfs-scrub for monthly
    2018-08-16T01:21:04.353028+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1185]: Refresh timer btrfs-defrag for none
    2018-08-16T01:21:04.353206+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1185]: Refresh timer btrfs-balance for weekly
    2018-08-16T01:21:04.353363+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1185]: Refresh timer btrfs-trim for none
    2018-08-16T01:21:04.523670+02:00 linux-derumaux systemd[1]: Started Update cron periods from /etc/sysconfig/btrfsmaintenance.
    2018-08-16T01:21:14.373683+02:00 linux-derumaux cron[1744]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 37% if used.)
    2018-08-16T01:21:14.396986+02:00 linux-derumaux cron[1744]: (CRON) INFO (running with inotify support)
    2018-08-16T16:34:29.693391+02:00 linux-derumaux systemd[1]: Starting Update cron periods from /etc/sysconfig/btrfsmaintenance...
    2018-08-16T16:34:29.693444+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh script btrfs-scrub.sh for uninstall
    2018-08-16T16:34:29.693495+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh script btrfs-defrag.sh for uninstall
    2018-08-16T16:34:29.693506+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh script btrfs-balance.sh for uninstall
    2018-08-16T16:34:29.693517+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh script btrfs-trim.sh for uninstall
    2018-08-16T16:34:29.693534+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh timer btrfs-scrub for monthly
    2018-08-16T16:34:29.694542+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh timer btrfs-defrag for none
    2018-08-16T16:34:29.694700+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh timer btrfs-balance for weekly
    2018-08-16T16:34:29.694737+02:00 linux-derumaux btrfsmaintenance-refresh-cron.sh[1203]: Refresh timer btrfs-trim for none
    2018-08-16T16:34:29.694837+02:00 linux-derumaux systemd[1]: Started Update cron periods from /etc/sysconfig/btrfsmaintenance.
    2018-08-16T16:34:37.965677+02:00 linux-derumaux cron[1730]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 46% if used.)
    2018-08-16T16:34:37.990399+02:00 linux-derumaux cron[1730]: (CRON) INFO (running with inotify support)
    2018-08-17T11:41:57.846686+02:00 linux-derumaux systemd[1]: is_symlink_with_known_name(cron.service, cron.service) → 1
    2018-08-17T15:39:25.398001+02:00 linux-derumaux crontab[11845]: (root) LIST (root)
    2018-08-17T15:39:41.010978+02:00 linux-derumaux crontab[11848]: (root) LIST (marc)

    Thanks for the advice!
    Marc

  7. #7
    Join Date
    Aug 2010
    Location
    Chicago suburbs
    Posts
    12,631
    Blog Entries
    3

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by mder
    Once or twice a day, seemingly at random, the HDD starts working very hard, and maybe 20 seconds later the OS becomes unresponsive because of it.
    If I happen to be using the computer at 3am, I notice that, though I don't know about the "unresponsive" part since it didn't affect what I was doing:

    Code:
    2018-08-17T03:11:26.399734-05:00 nwr2 smartd[1433]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.
    2018-08-17T03:11:26.453041-05:00 nwr2 smartd[1433]: Device: /dev/sdb [SAT], starting scheduled Short Self-Test.
    2018-08-17T03:11:26.515086-05:00 nwr2 smartd[1433]: Device: /dev/sdc [SAT], starting scheduled Short Self-Test.
    I have no idea whether that's what you are seeing.
    openSUSE Leap 15.1; KDE Plasma 5;
    testing Leap 15.2Alpha

  8. #8
    Join Date
    Aug 2017
    Posts
    14

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by nrickert
    If I happen to be using the computer at 3am, I notice that, though I don't know about the "unresponsive" part since it didn't affect what I was doing:

    Code:
    2018-08-17T03:11:26.399734-05:00 nwr2 smartd[1433]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.
    2018-08-17T03:11:26.453041-05:00 nwr2 smartd[1433]: Device: /dev/sdb [SAT], starting scheduled Short Self-Test.
    2018-08-17T03:11:26.515086-05:00 nwr2 smartd[1433]: Device: /dev/sdc [SAT], starting scheduled Short Self-Test.
    I have no idea whether that's what you are seeing.
    Thanks for the idea. But in my case, this event occurs at around 10 AM and does not seem to be correlated with the HDD problem...

    2018-08-15T10:00:51.060152+02:00 linux-derumaux smartd[1155]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.
    2018-08-16T10:47:44.831624+02:00 linux-derumaux smartd[1184]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.
    2018-08-17T10:05:05.119682+02:00 linux-derumaux smartd[1208]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.

    Marc

  9. #9
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    2,559

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by mder
    linux-derumaux:/var/log # systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs'
    btrfsmaintenance-refresh.path enabled
    btrfs-balance.service static
    btrfs-defrag.service static
    btrfs-scrub.service static
    btrfs-trim.service static
    btrfsmaintenance-refresh.service enabled
    fstrim.service static
    systemd-fsck-root.service enabled-runtime
    systemd-fsck@.service static
    btrfs-balance.timer enabled
    btrfs-defrag.timer disabled
    btrfs-scrub.timer enabled
    btrfs-trim.timer disabled
    fstrim.timer enabled
    You are well advised to apply "systemctl disable" and then "systemctl mask" to all the systemd Btrfs '.timer' services.

    If the disk is an HDD and not an SSD, then "disable" and "mask" the systemd "fstrim.timer" unit as well -- if it's an SSHD, 'fstrim' usually doesn't work either -- "disable" and "mask" the systemd unit for this case as well …
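
    Whether the kernel treats a disk as a spinning HDD can be read from sysfs. A minimal sketch, assuming the device is sda as in the fdisk output earlier in the thread; disk_kind is a hypothetical helper, and its second argument exists only so the function can be tried against a fake directory tree:

```shell
# Report whether a block device is a spinning disk, using the kernel's
# "rotational" flag (1 = rotational HDD, 0 = non-rotational such as an SSD).
# disk_kind is an illustrative helper; the optional second argument
# (sysfs root, default /sys) is only there to ease testing.
disk_kind() {
    dev=$1
    sys_root=${2:-/sys}
    flag_file=$sys_root/block/$dev/queue/rotational
    [ -r "$flag_file" ] || { echo unknown; return 1; }
    case $(cat "$flag_file") in
        1) echo rotational ;;
        0) echo non-rotational ;;
        *) echo unknown ;;
    esac
}

# Usage: disk_kind sda
```

    An SSHD may still report itself as rotational, so the flag is a hint about the HDD/SSD question rather than proof that 'fstrim' is useless on the device.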

    Check for any Btrfs-related entries in '/etc/cron.d/', '/etc/cron.hourly/', '/etc/cron.daily/', '/etc/cron.weekly/' and '/etc/cron.monthly/' -- remove them if there's anything there -- with systemd there's no need for cron jobs to be executing in parallel to what systemd is doing …

    With no Btrfs present, you should also, at least, 'disable' all the systemd Btrfs related services which are not 'static' -- 'masked' is better …
    Code:
     > systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs'
    btrfsmaintenance-refresh.path                                    masked
    btrfs-balance.service                                            static
    btrfs-defrag.service                                             static
    btrfs-scrub.service                                              static
    btrfs-trim.service                                               static
    btrfsmaintenance-refresh.service                                 masked
    fstrim.service                                                   static
    systemd-fsck-root.service                                        enabled-runtime
    systemd-fsck@.service                                            static
    btrfs-balance.timer                                              masked
    btrfs-defrag.timer                                               masked
    btrfs-scrub.timer                                                masked
    btrfs-trim.timer                                                 masked
    fstrim.timer                                                     enabled
     >
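
    The "disable" then "mask" steps described in this post can be sketched as a small generator that only prints the commands, so they can be reviewed before anything is changed. print_mask_commands is a made-up name; "systemctl disable --now" and "systemctl mask" are the standard systemctl verbs.

```shell
# Read unit names on stdin (one per line) and print, without running them,
# the systemctl commands that stop, disable and then mask each unit.
# print_mask_commands is an illustrative helper, not part of systemd.
print_mask_commands() {
    while read -r unit; do
        [ -n "$unit" ] || continue
        printf 'systemctl disable --now %s\n' "$unit"
        printf 'systemctl mask %s\n' "$unit"
    done
}

# Usage (review the output first, then run the commands as root):
#   printf '%s\n' btrfs-balance.timer btrfs-scrub.timer | print_mask_commands
```

    Masking links the unit to /dev/null, so it cannot be started by hand or re-enabled by another unit until "systemctl unmask" is run on it.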

  10. #10
    Join Date
    Aug 2017
    Posts
    14

    Re: intensive HDD read freezes OS at random time

    Quote: Originally posted by dcurtisfra
    You are well advised to apply "systemctl disable" and then "systemctl mask" to all the systemd Btrfs '.timer' services.

    If the disk is an HDD and not an SSD, then "disable" and "mask" the systemd "fstrim.timer" unit as well -- if it's an SSHD, 'fstrim' usually doesn't work either -- "disable" and "mask" the systemd unit for this case as well …

    Check for any Btrfs-related entries in '/etc/cron.d/', '/etc/cron.hourly/', '/etc/cron.daily/', '/etc/cron.weekly/' and '/etc/cron.monthly/' -- remove them if there's anything there -- with systemd there's no need for cron jobs to be executing in parallel to what systemd is doing …

    With no Btrfs present, you should also, at least, 'disable' all the systemd Btrfs related services which are not 'static' -- 'masked' is better …
    Code:
     > systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs'
    btrfsmaintenance-refresh.path                                    masked
    btrfs-balance.service                                            static
    btrfs-defrag.service                                             static
    btrfs-scrub.service                                              static
    btrfs-trim.service                                               static
    btrfsmaintenance-refresh.service                                 masked
    fstrim.service                                                   static
    systemd-fsck-root.service                                        enabled-runtime
    systemd-fsck@.service                                            static
    btrfs-balance.timer                                              masked
    btrfs-defrag.timer                                               masked
    btrfs-scrub.timer                                                masked
    btrfs-trim.timer                                                 masked
    fstrim.timer                                                     enabled
     >
    Thanks for the advice. I've disabled and masked them all:
    Code:
     
    linux-derumaux:/home/marc # systemctl list-unit-files | grep -iE 'fsck|fstrim|btrfs' 
    btrfsmaintenance-refresh.path                masked         
    btrfs-balance.service                        static         
    btrfs-defrag.service                         static         
    btrfs-scrub.service                          static         
    btrfs-trim.service                           static         
    btrfsmaintenance-refresh.service             masked         
    fstrim.service                               static         
    systemd-fsck-root.service                    enabled-runtime
    systemd-fsck@.service                        static         
    btrfs-balance.timer                          masked         
    btrfs-defrag.timer                           masked         
    btrfs-scrub.timer                            masked         
    btrfs-trim.timer                             masked         
    fstrim.timer                                 masked
    I'll let you know how it goes from here!
    Thanks, Marc
