Intense disk IO (jbd2 and flush) blocks user interface

Hi!

Several times a day my new PC, which is supposed to be very fast, grinds to a halt from a UI perspective when the hard drive starts working very intensively. This lasts 20-30 seconds, and during this time the UI is usually very unresponsive. It sounds like the drive is frequently accessing two different parts of the disk at the same time. I can switch to another desktop to look at a console started earlier, with “iotop -oPa” running. It shows jbd2, flush, or both at the top.

This usually happens shortly after I have done something that required a fair amount of disk access, such as starting an application.
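One way to put numbers on these bursts alongside iotop is to sample /proc/diskstats, where the kernel keeps cumulative per-device counters. The Python sketch below is an illustration, not something from this thread. It assumes the usual /proc/diskstats layout (device name in field 3, sectors read in field 6, sectors written in field 10, counted in 512-byte units) and a disk named sda. The function names are hypothetical.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or short lines
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput(before, after, device, interval):
    """Return (read MB/s, write MB/s) for one device between two samples."""
    r0, w0 = before[device]
    r1, w1 = after[device]
    mb_per_sector = SECTOR_BYTES / 1e6
    return ((r1 - r0) * mb_per_sector / interval,
            (w1 - w0) * mb_per_sector / interval)


def watch(device="sda", interval=1.0, samples=60):
    """Print one line per (nominal) interval so a 20-30 second burst stands out."""
    with open("/proc/diskstats") as f:
        prev = parse_diskstats(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open("/proc/diskstats") as f:
            cur = parse_diskstats(f.read())
        read_mb, write_mb = throughput(prev, cur, device, interval)
        print(f"{time.strftime('%H:%M:%S')}  read {read_mb:7.2f} MB/s  "
              f"write {write_mb:7.2f} MB/s")
        prev = cur
```

Running watch() while reproducing a stall would show whether it is a sustained write burst, which is what jbd2 and flush suggest, or mostly reads.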

I am not running tracker, nepomuk or any other service (afaik) that could explain this. Even if I were, I would hope they wouldn’t take over my PC completely.

The hardware is a Dell Precision T1600 with 4 core CPU and a 1TB Seagate drive formatted with ext4. Hardware tests and SMART don’t show any problem such as bad sectors.

I have changed the data mount option to writeback on my home partition, but I have not been able to change it on the root partition, so it is still ordered there.
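ext4 refuses to change the data= mode on a remount, which is likely why the change did not take effect on the root partition. For the root filesystem, the mode generally has to be given at mount time, for example through the rootflags= kernel parameter. Because /proc/mounts uses the same column layout as /etc/fstab, a small parser can show which data= mode each ext3/ext4 entry requests. The helper below is hypothetical and not from the thread. It assumes that an entry without a data= option falls back to ordered, the ext3/ext4 default (unless tune2fs has set a different default mount option).

```python
def data_modes(mount_table):
    """Return {mount_point: data mode} for ext3/ext4 entries in fstab or /proc/mounts text."""
    modes = {}
    for line in mount_table.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("ext3", "ext4"):
            continue
        mode = "ordered"  # assumed default when no data= option is given
        for opt in options.split(","):
            if opt.startswith("data="):
                mode = opt.split("=", 1)[1]
        modes[mount_point] = mode
    return modes
```

Passing it the contents of /proc/mounts instead of /etc/fstab shows what the kernel actually mounted rather than what fstab asked for.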

What else could I try?

TIA,
Gunnar

I think we need to know your openSUSE version and desktop version. I would open a terminal session and run the following commands:

su -
password:
cat /etc/fstab
fdisk -l
/usr/sbin/smartctl -a /dev/sda

In place of /dev/sda, put in the actual drive name, but do not add the partition number, as it's not required. Post the results and let's see what we get.

Thank You,

Thank you for your interest, here is the output you requested:

fstab:

/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part5 /                    ext4       noatime,acl,user_xattr,barrier=0                1 1
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part7 /home                ext4       noatime,data=writeback,acl,user_xattr,barrier=0 1 2
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part3 /windows             ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part6 swap                 swap       defaults              0 0
/dev/sda8            /vmware              ext3       data=writeback,acl,user_xattr         1 2

fdisk -l:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x985a7f38

Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63       80324       40131    6  FAT16
/dev/sda2           81920    25567231    12742656    7  HPFS/NTFS/exFAT
/dev/sda3        25567232   151404375    62918572    7  HPFS/NTFS/exFAT
/dev/sda4   *   151404544  1953523711   901059584    f  W95 Ext'd (LBA)
/dev/sda5       151406592   277233663    62913536   83  Linux
/dev/sda6       277235712   294006782     8385535+  82  Linux swap / Solaris
/dev/sda7       294006784  1762009087   734001152   83  Linux
/dev/sda8      1762011136  1929775103    83881984   83  Linux

smartctl -a /dev/sda

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.1.9-1.4-desktop] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000524AS
Serial Number:    5VP87DK2
LU WWN Device Id: 5 000c50 037f86448
Firmware Version: JC47
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Mar 12 20:18:51 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  609) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 178) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       10886816
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       479
  5 Reallocated_Sector_Ct   0x0033   054   054   036    Pre-fail  Always       -       1903
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       17313134
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1755
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       478
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   090   000    Old_age   Always       -       124555952163
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   063   045    Old_age   Always       -       31 (Min/Max 21/33)
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 8 0 0 0)
195 Hardware_ECC_Recovered  0x001a   037   028   000    Old_age   Always       -       10886816
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       164132175219731
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1409183177
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3969213868

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1593         -
# 2  Extended offline    Interrupted (host reset)      70%      1592         -
# 3  Short offline       Completed without error       00%      1591         -
# 4  Short offline       Completed without error       00%       620         -
# 5  Short offline       Completed without error       00%       439         -
# 6  Short offline       Completed without error       00%       406         -
# 7  Short offline       Completed without error       00%       399         -
# 8  Short offline       Completed without error       00%       398         -
# 9  Short offline       Completed without error       00%       393         -
#10  Extended offline    Interrupted (host reset)      80%       391         -
#11  Extended offline    Completed without error       00%       390         -
#12  Extended offline    Interrupted (host reset)      30%       388         -
#13  Short offline       Completed without error       00%       385         -
#14  Short offline       Completed without error       00%       384         -
#15  Short offline       Completed without error       00%       384         -
#16  Short offline       Completed without error       00%       376         -
#17  Short offline       Completed without error       00%       376         -
#18  Short offline       Completed without error       00%       376         -
#19  Short offline       Completed without error       00%       376         -
#20  Short offline       Completed without error       00%       375         -
#21  Extended offline    Completed: read failure       50%       375         1024110386
1 of 1 failed self-tests are outdated by newer successful extended offline self-test #11

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. 
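The overall PASSED verdict roughly means that no normalized VALUE has dropped to its THRESH, so the raw counts are worth reading separately. Below is a minimal Python sketch. It assumes the smartctl attribute layout (ten columns, with the raw value last and possibly followed by extra text in parentheses). The WATCH list of sector-health attributes is a hypothetical choice. Run against this drive's attribute table, it would report Reallocated_Sector_Ct with a raw count of 1903.

```python
# Hypothetical choice of attributes whose raw values count bad or suspect sectors.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Parse smartctl attribute rows (ID, name, flag, value, worst, thresh, ..., raw)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)  # the raw value may itself contain spaces
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank or non-attribute line
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": parts[9].strip(),
        })
    return rows


def smart_warnings(rows):
    """List attributes at or below threshold, plus nonzero raw counts for WATCH."""
    found = []
    for row in rows:
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            found.append(f"{row['name']}: value {row['value']} <= threshold {row['thresh']}")
        if row["name"] in WATCH:
            raw_count = int(row["raw"].split()[0])
            if raw_count > 0:
                found.append(f"{row['name']}: raw count {raw_count}")
    return found
```

Typical use would be smart_warnings(parse_attributes(text)), where text holds the output of smartctl -A for the drive.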

I suggest you save your old fstab file and then as root, modify it to say this:

/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part3 /windows             ntfs-3g    defaults              0 0
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part5 /                    ext4       acl,user_xattr        1 1
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part6 swap                 swap       defaults              0 0
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part7 /home                ext4       acl,user_xattr        1 2
/dev/disk/by-id/ata-ST31000524AS_5VP87DK2-part8 /vmware              ext3       acl,user_xattr        1 2
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0

Thank You,

These changes would in effect remove the options noatime, data=writeback and barrier=0. These are all options that I have added because of my problems in the hope that they would increase performance. Originally I didn’t have any of them, only acl and user_xattr, so removing them would not help.

Thanks anyway.
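The option changes under discussion can be listed mechanically by comparing the two fstab versions mount point by mount point. Below is a small hypothetical Python helper, not taken from the thread. It treats each options column as an unordered set and ignores the device column, so a switch from /dev/sda8 to a by-id name is not reported.

```python
def options_by_mount(fstab_text):
    """Map mount point -> set of mount options for each non-comment fstab line."""
    table = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and not fields[0].startswith("#"):
            table[fields[1]] = set(fields[3].split(","))
    return table


def diff_options(old_text, new_text):
    """Return {mount_point: {"removed": [...], "added": [...]}} for changed entries."""
    old, new = options_by_mount(old_text), options_by_mount(new_text)
    changes = {}
    for mount_point in sorted(set(old) | set(new)):
        before = old.get(mount_point, set())
        after = new.get(mount_point, set())
        if before != after:
            changes[mount_point] = {
                "removed": sorted(before - after),
                "added": sorted(after - before),
            }
    return changes
```

For / and /home, it reports only removals: noatime and barrier=0 on both, plus data=writeback on /home. That matches gugrim's reading.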

Well, those options will not help, and there is more to the modifications I show than removing unneeded options. Take another look.

Thank You,

I could see two other changes: the Windows mount using defaults instead of all the options that the openSUSE installer added, and the vmware mount by id instead of by device name. I’ve now changed the vmware mount and removed the Windows mount, since I don’t need it. I’ll post the results.
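Names under /dev/disk/by-id are udev-managed symlinks to the kernel's sdX nodes. Those sdX names can change if disks are added or reordered, which is the reason to mount by id. Below is a short Python sketch that lists where each by-id link currently points. The directory is a parameter so the function can be tried on any folder of symlinks, and the function name is hypothetical.

```python
import os


def resolve_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each by-id symlink name to the device node it currently resolves to."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping
```

On this machine, the ata-ST31000524AS_5VP87DK2-part8 entry would be expected to resolve to /dev/sda8, the partition previously mounted by device name.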

Now, I know you have already posted lots of stuff, but can I have another listing from your system? I have a script that loads another app, and it is helpful to know your total system setup:

H.I. Hardware Information - A Bash script to install and run inxi with default options! - Blogs - openSUSE Forums

One thing is for sure: I do not dismiss your problem. It is real, and I very much want your system to work properly. So let's continue the dialog if we can. Post the output from HI and let's see what you're made of.

Thank You,

On 2012-03-11 09:26, gugrim wrote:
> Several times a day my new PC, which is supposed to be very fast,
> grinds to a halt from a UI perspective when the hard drive starts to
> work very intensively. This lasts 20-30 seconds and during this time the
> UI is usually very unresponsive.

Have a look here


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-03-11 09:26, gugrim wrote:
> It sounds like it is frequently
> accessing two different parts of the drive at the same time.

How busy is the virtual machine inside? Is it running at that time?


Cheers / Saludos,

Carlos E. R.

OK, here is the HI output:

System:    Host: gunnar1 Kernel: 3.1.9-1.4-desktop x86_64 (64 bit)
           Desktop KDE 4.7.2 Distro: openSUSE 12.1 (x86_64) VERSION = 12.1 CODENAME = Asparagus
Machine:   System: Dell product: Precision T1600 version: 01
           Mobo: Dell model: 06NWYK version: A00 Bios: Dell version: A02 date: 04/11/2011
CPU:       Quad core Intel Xeon CPU E31245 (-HT-MCP-) cache: 8192 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx)
           Clock Speeds: 1: 1600.00 MHz 2: 1600.00 MHz 3: 1600.00 MHz 4: 1600.00 MHz 5: 1600.00 MHz 6: 1600.00 MHz 7: 1600.00 MHz 8: 1600.00 MHz
Graphics:  Card: nVidia GF108 [Quadro 600]
           X.Org: 1.10.4 drivers: nouveau (unloaded: fbdev,nv,vesa) Resolution: 1920x1200@60.0hz
           GLX Renderer: Rasterizer GLX Version: 2.1 Mesa 7.11
Audio:     Card-1: nVidia GF108 High Definition Audio Controller driver: snd_hda_intel Sound: ALSA ver: 1.0.24
           Card-2: Intel 6 Series/C200 Series Chipset Family High Definition Audio Controller driver: snd_hda_intel
           Card-3: Logitech QuickCam Pro 9000 driver: USB Audio
Network:   Card: Intel 82579LM Gigabit Network Connection driver: e1000e
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 78:2b:cb:a6:30:69
Drives:    HDD Total Size: 1000.2GB (23.9% used) 1: /dev/sda ST31000524AS 1000.2GB
Partition: ID: / size: 60G used: 7.7G (14%) fs: rootfs ID: / size: 60G used: 7.7G (14%) fs: ext4
           ID: /home size: 690G used: 176G (27%) fs: ext4 ID: swap-1 size: 8.59GB used: 0.00GB (0%) fs: swap
Sensors:   Error: You do not have the sensors app installed.
Info:      Processes: 209 Uptime: 1:41 Memory: 1380.1/7955.9MB Client: Shell inxi: 1.7.24

/Gunnar

Just a few comments here. Based on this, which CPU frequency governor are you using?

CPU:       Quad core Intel Xeon CPU E31245 (-HT-MCP-) cache: 8192 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx)
           Clock Speeds: 1: 1600.00 MHz 2: 1600.00 MHz 3: 1600.00 MHz 4: 1600.00 MHz 5: 1600.00 MHz 6: 1600.00 MHz 7: 1600.00 MHz 8: 1600.00 MHz

I have two suggestions for maximum speed:

YaST Power Management - Control Your CPU Energy Usage How To & FAQ - Blogs - openSUSE Forums

C.F.U. - CPU Frequency Utilitiy - Version 1.10 - For use with the cpufrequtils package - Blogs - openSUSE Forums

I would set the governor to Performance, or make sure it already is. There is no need for anything else unless you have a laptop, a bad cooling system, or the A/C is out at home.
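To check which governor is actually active without opening YaST, note that the kernel's cpufreq interface exposes it per CPU in sysfs, as cpuN/cpufreq/scaling_governor under /sys/devices/system/cpu. Below is a read-only Python sketch. The root path is a parameter so it can be tested elsewhere, and the function name is hypothetical. Changing the governor still needs root and the tools from the blogs above.

```python
import glob
import os


def governors(cpu_root="/sys/devices/system/cpu"):
    """Return {"cpuN": governor} read from each CPU's cpufreq scaling_governor file."""
    result = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            result[cpu] = f.read().strip()
    return result
```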

Next, I see your video system says this …

Graphics:  Card: nVidia GF108 [Quadro 600]
           X.Org: 1.10.4 drivers: nouveau (unloaded: fbdev,nv,vesa) Resolution: 1920x1200@60.0hz
           GLX Renderer: Rasterizer GLX Version: 2.1 Mesa 7.11

You could do much better loading the nVIDIA driver in my opinion. Here are three blogs I would read:

Installing the nVIDIA Video Driver the Hard Way - Blogs - openSUSE Forums

<AND>

LNVHW - Load NVIDIA (driver the) Hard Way from runlevel 3 - Version 1.40 - Blogs - openSUSE Forums

<AND>

S.A.N.D.I. - SuSE Automated NVIDIA Driver Installer - Version 1.40 - Blogs - openSUSE Forums

So, basically, I would look at speeding up your CPU and consider loading the proprietary nVIDIA video driver, then see how your system works, along with the cleaned-up fstab file listed before. If you are serious about speeding up your system, these can help.

Of course, hard drive issues can slow you down, and it's hard to know whether that is a problem here. When a system works OK for a long time and then delays set in, the hard drive could be the cause. Also, older systems need to be cleaned of all dust. Often all plugs, adapter cards and memory modules should be removed and reseated, with power removed of course. Sometimes removing the CPU heat sink and reapplying the thermal grease can help reduce CPU heating, which can grow over time. These are just some things I can think of that might be helpful to you.

Thank You,

No, VMware is almost never running.

I don’t have any CPU performance problems. The CPU steps up and down as it should. I am also quite satisfied with the Nouveau driver; no problem there. My problem is with how Suse handles my hard drive. Every other second I can clearly hear it being accessed, and iotop shows that jbd2 is the culprit. I have read on several forums that I am not the only one with that problem. But the big issue here is that several times a day Suse goes completely crazy with it for 20-30 seconds, blocking everything else.

Is ext4 the problem? Maybe not quite ready for production? Should I reformat with ext3? Would that help?

On 2012-03-18 08:56, gugrim wrote:
>
> I don’t have any CPU performance problems. The CPU steps up and down as
> they should. I am also quite satisfied with the Nouvau driver. No
> problem there. My problem is with how Suse handles my hard drive. Every
> other seconds I can clearly hear that it accesses it and iotop shows
> that jbd2 is the culprit. I have read on several forums that I am not
> the only one with that problem. But the big issue here is that several
> times a day Suse goes completely crazy with it for 20-30 seconds,
> blocking everything else.

I posted a link with a mail from one of the devs that points to a possible
cause. He needs people with the problem to test.

> Is ext4 the problem? Maybe not quite ready for production? Should I
> reformat with ext3? Would that help?

I don’t think so, but you can try and find out if that is so.


Cheers / Saludos,

Carlos E. R.

Since it is easy to miss here, some additional explanation of the fix might be helpful. I see a suggestion to add the following kernel boot option to your openSUSE entry in the GRUB menu.lst file to see if it helps:

transparent_hugepage=never

When the boot menu is presented, you can type this into the boot options field and press Enter to see whether it works before adding it to the menu.lst file for use every time.
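After booting with the parameter, it is worth confirming that it took effect. Mainline kernels with transparent hugepage support report the setting in /sys/kernel/mm/transparent_hugepage/enabled, with the active value in square brackets (for example always madvise [never]). Some vendor kernels use a different path, so treat the path as an assumption. As root, writing never to that file also changes the setting at runtime, which is a quick test before editing menu.lst. Below is a minimal Python sketch that reads the active value. The function names are hypothetical.

```python
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"  # assumed mainline path


def active_choice(sysfs_text):
    """Return the bracketed (active) entry, e.g. 'never' for 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_mode(path=THP_ENABLED):
    """Read the active transparent hugepage mode from sysfs."""
    with open(path) as f:
        return active_choice(f.read())
```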

Thank You,

The “intense disk IO” will also happen if you get an error in the disk
controller or hard drive. Have you checked the logs to rule these out?
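The kernel log is the usual place to look for these. Below is a rough Python filter. The regular expressions are illustrative guesses at typical libata and block-layer wording, not an exhaustive list. Feed it the output of dmesg or the contents of /var/log/messages (assumed here to be the syslog file on openSUSE 12.1).

```python
import re

# Illustrative patterns for common libata and block-layer error messages; not exhaustive.
DISK_ERROR_PATTERNS = [
    re.compile(r"\bata\d+(\.\d+)?: .*(exception|error|failed|reset)", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bsd[a-z]+\b.*(Medium Error|Sense Key)", re.IGNORECASE),
]


def disk_error_lines(log_text):
    """Return the log lines that look like disk or controller errors."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in DISK_ERROR_PATTERNS)]
```

An empty result does not prove the disk is healthy. It only means none of these particular phrasings appeared.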

jdmcdaniel3 wrote:

> I see a suggestion to
> add the following kernel load option to your openSUSE startup in the
> grub menu.lst file to see if it helps:
>
>
> Code:
> --------------------
> transparent_hugepage=never

I just tried this and my system remains snappy even when I copy 40GB to
another drive while running a VM and watching an HD Flash movie fullscreen
:)

But there must be a reason for this setting not being the default, so I’m
waiting for the downside of this.

So far so good anyway,
thanks!

Chris Maaskant

Happy to hear of your success and you can thank Carlos E. R. for finding this gem for us today.

Thank You,