High CPU Wait States on Intel Core 2 Quad Machine

All

I have a machine that is randomly pausing. In top, the CPU stats show that the I/O wait state is high, often 100% for several seconds. Splitting the display to show individual CPUs, it seems that one or more CPUs at random sit at or near 100% I/O wait for seconds at a time.

top is NOT reporting any application with high utilisation when this happens.

I have determined via iotop that the disks are NOT under any load when this happens.

Last night I ran memtest off the install CD; it completed 8 passes without error in 11 1/2 hours.

My question is: does the CPU also have to wait for network I/O, or is there some sort of “buffer” between the two? This machine is being accessed via ssh to run X applications on other machines, so network traffic would be constantly high.

Specifications.

openSUSE 11.1 64 bit
Kernel: (from uname) 2.6.27-29-0.1-default #1 SMP x86-64

CPU: (from /proc/cpuinfo) Intel Core 2 Quad CPU Q8400 @ 2.66 GHz
It sees 4 CPUs :slight_smile:

Memory: 8 GB

Motherboard: Intel P5QL Pro Chipset P43.

Disk controller: ICH10 southbridge.

Disks: 4 SATA 1TB disks. The / partition is on a md raid mirrored over 2 disks. The other 2 disks are mounted under /mnt
All partitions formatted ext3

Swap: is on one of the mirrored disks.

/boot: is on the other disk of the mirror set.

Network: is a PCIe Gigabit LAN Controller connected to a 100 Mbit switch.

Cheers
Jim

That’s a relatively high-powered system, so you’d think it would fly. Having 8 Gig of RAM makes a big difference, too.

Try latencytop; it’s in the Build repositories. I’ve never tried posting a “ymp” link here; see if this works: http://software.opensuse.org/ymp/devel:tools/openSUSE_11.1/latencytop.ymp. If not, go to “software.opensuse.org,” click the “search” item on the left, and enter “latencytop” in the search box.

See if latencytop will give you some idea of what’s happening and post back here. I’m intrigued.

That is why it is annoying me… I have a Core 2 Duo laptop, admittedly running 11.0, without issues.

Yep the ymp link worked… Thanks for that.

Well, firing it up for the first time and not really knowing what I am looking at, the first thing I see is pdflush 3524.8, md0_raid1 3524.7, kjournald 3523.4.

I assume this is milliseconds, as that is the unit in the next window when I click on each of these entries. 3.5 seconds is a long time in computer terms.

Whoops, this machine just paused again, and this time md0_raid1 (specifically “Raid resync kernel thread”) was up at 17700 ms.

I wonder if it is indeed a disk issue. Might check the smart stats and see if one of them is having issues.

Thanks again for the heads up on the latencytop tool.

Cheers
Jim

Jim,

Might not hurt to poke around in the BIOS settings, too. Wait states are a necessary evil, given the difference in speed between the CPU(s) and the memory bus. But it’s possible that with a little judicious tinkering, you could speed that up.

Are you running software or hardware raid? The ICH10R supports hardware; is that what you have?

I’m headed to bed for the night, but I’m still intrigued. When one shells out the bucks for a loaded system like yours, one expects a load of results. :slight_smile:

Well this has me puzzled.

I ran some tests on the disks using the smartctl utility.

All disks pass and no errors being reported.
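For anyone who would rather check the attribute table by script than by eye, here is a small Python sketch that pulls the RAW_VALUE column out of `smartctl -a` output and flags the counters most often associated with failing disks or cabling. The choice of attributes in `WATCHED` is my own assumption, not an official list, and the function names are made up for illustration.

```python
def parse_smart_attributes(smartctl_output):
    """Map attribute name -> raw value (int) from the `smartctl -a` attribute table."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass  # some drives print raw values that are not plain integers
    return attrs

# Assumed set of counters worth a second look when they are non-zero
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

def suspicious(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}
```

An empty result from `suspicious()` agrees with the "PASSED" verdict; it does not rule out a controller or driver problem.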

I then used hdparm -T to test the cached read performance. This also seems fine, with each disk reporting throughput of approx 1.6 GB/s.

Even /dev/md0 reports 1.5 GB/s, and this was during a period of high wait states. I am guessing, as its output paused, that hdparm was also frozen and so did not count the stall in its timing. I had top running and the wait states went to around 80% or so on 3 of the 4 processors.

When I get a chance later tonight I am going to have a closer look at the BIOS settings. Reading the manual, I notice that there are a series of overclocking settings; I wonder if the machine builder got a bit too keen on some of them?

Jim

Test results to follow (too big to fit in this post)

Output from tests (Part 1):

hdparm -T /dev/sda

/dev/sda:
Timing cached reads: 3202 MB in 2.00 seconds = 1600.75 MB/sec

/dev/sda:
Timing cached reads: 3238 MB in 2.00 seconds = 1619.27 MB/sec

/dev/sda:
Timing cached reads: 3286 MB in 2.00 seconds = 1642.61 MB/sec

/dev/sda:
Timing cached reads: 3162 MB in 2.00 seconds = 1581.46 MB/sec

/dev/sda:
Timing cached reads: 3118 MB in 2.00 seconds = 1558.72 MB/sec

/dev/sda:
Timing cached reads: 3234 MB in 2.00 seconds = 1617.50 MB/sec

hdparm -T /dev/sdb

/dev/sdb:
Timing cached reads: 3248 MB in 2.00 seconds = 1624.35 MB/sec

/dev/sdb:
Timing cached reads: 3250 MB in 2.00 seconds = 1624.96 MB/sec

/dev/sdb:
Timing cached reads: 3240 MB in 2.00 seconds = 1619.67 MB/sec

hdparm -T /dev/sdc

/dev/sdc:
Timing cached reads: 3238 MB in 2.00 seconds = 1618.84 MB/sec

/dev/sdc:
Timing cached reads: 3268 MB in 2.00 seconds = 1633.55 MB/sec

/dev/sdc:
Timing cached reads: 3276 MB in 2.00 seconds = 1638.22 MB/sec

/dev/sdc:
Timing cached reads: 3258 MB in 2.00 seconds = 1629.39 MB/sec

hdparm -T /dev/sdd

/dev/sdd:
Timing cached reads: 3224 MB in 2.00 seconds = 1611.91 MB/sec

/dev/sdd:
Timing cached reads: 3222 MB in 2.00 seconds = 1610.95 MB/sec

/dev/sdd:
Timing cached reads: 3226 MB in 2.00 seconds = 1612.82 MB/sec

/dev/sdd:
Timing cached reads: 3258 MB in 2.00 seconds = 1629.37 MB/sec

hdparm -T /dev/md0

/dev/md0:
Timing cached reads: 3200 MB in 2.00 seconds = 1599.65 MB/sec

/dev/md0:
Timing cached reads: 3264 MB in 2.00 seconds = 1631.64 MB/sec

/dev/md0:
Timing cached reads: 3162 MB in 2.00 seconds = 1580.80 MB/sec
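To compare the runs above at a glance, the figures can be averaged per device with a few lines of Python. This is only a sketch with made-up function names. Also worth keeping in mind: per the hdparm man page, -T times reads from the Linux buffer cache without touching the disk, so these numbers mostly reflect CPU, cache and memory throughput; -t is the option that times reads from the device itself.

```python
import re
from statistics import mean

def cached_read_rates(hdparm_output):
    """Group the MB/sec figures of `hdparm -T` runs by device."""
    rates = {}
    device = None
    for line in hdparm_output.splitlines():
        line = line.strip()
        # Each run starts with the device name followed by a colon
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]
        match = re.search(r"Timing cached reads:.*=\s*([\d.]+)\s*MB/sec", line)
        if match and device:
            rates.setdefault(device, []).append(float(match.group(1)))
    return rates

def average_rates(rates):
    """Average the collected MB/sec figures for each device."""
    return {device: mean(values) for device, values in rates.items()}
```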

Output of tests (Part 2):

# smartctl -a /dev/sda
smartctl 5.39 2008-10-24 22:33 [x86_64-suse-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00M2B0
Serial Number:    WD-WCAV51010226
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Oct  4 11:50:50 2009 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (20400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 235) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   115   112   021    Pre-fail  Always       -       7241
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       903
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       7205
194 Temperature_Celsius     0x0022   118   114   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


# smartctl -a /dev/sdb
smartctl 5.39 2008-10-24 22:33 [x86_64-suse-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00M2B0
Serial Number:    WD-WCAV51028064
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Oct  4 11:52:54 2009 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 248) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   109   109   021    Pre-fail  Always       -       7508
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       903
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       7091
194 Temperature_Celsius     0x0022   121   112   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Output of tests (Part 3)

# smartctl -a /dev/sdc
smartctl 5.39 2008-10-24 22:33 [x86_64-suse-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00M2B0
Serial Number:    WD-WCAV51010266
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Oct  4 11:52:59 2009 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 248) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   117   117   021    Pre-fail  Always       -       7133
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       902
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       14632
194 Temperature_Celsius     0x0022   121   116   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


# smartctl -a /dev/sdd
smartctl 5.39 2008-10-24 22:33 [x86_64-suse-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EADS-00M2B0
Serial Number:    WD-WCAV51025602
Firmware Version: 01.00A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Oct  4 11:53:04 2009 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (19200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 221) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   107   107   021    Pre-fail  Always       -       7608
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       897
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2208
194 Temperature_Celsius     0x0022   121   116   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Checked the BIOS settings and all the Overclocking stuff was set to defaults of Auto. So nothing suspicious there.

One thing the BIOS does have is a Hardware Monitor screen showing things like Fan Speeds and CPU Temps etc. CPU is running at approx 36 C, so not too hot.

I would not be worrying too much about the CPU wait states being too high except for the fact that the whole machine freezes for seconds at a time. Which makes it almost unusable when trying to do real work. :frowning:

Jim

After the following changes I can still not find what is going on.

Running latencytop over several days and watching it during hangs, I noticed that it was showing high latency during disk “stuff”: fsync(), Writing page to disk, etc. I also noticed it mentions Page Fault, which I know as a memory operation to do with swap, but the machine has 8 GB of memory and the swap is empty, so I might be wrong there.

Anyway I have tried the following - with results included:

Found a mailing list post at:

Linux-Kernel Archive: Re: Finding what is stuck…

which mentions using:

echo noop > /sys/block/sda/queue/scheduler

which changes the scheduler to use noop instead of CFQ. I did this on all disks.

No difference.

changed back to CFQ
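When switching schedulers back and forth on four disks it is easy to lose track of which one is active where. The kernel marks the active scheduler with square brackets in each disk's queue/scheduler file, so a short Python sketch can report it. The function names and the `sd*` glob pattern are my own assumptions for illustration.

```python
import glob
import re

def active_scheduler(scheduler_text):
    """Return the bracketed (active) entry of a queue/scheduler file, or None."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else None

def available_schedulers(scheduler_text):
    """Return every scheduler listed in the file, active or not."""
    return [name.strip("[]") for name in scheduler_text.split()]

def report(pattern="/sys/block/sd*/queue/scheduler"):
    """Print the active scheduler for every disk matching the pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            print(path, "->", active_scheduler(handle.read()))
```

Calling `report()` after each change confirms that the setting really took effect on every disk before drawing a "no difference" conclusion.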

=//=

Mounted the disks as ext2

No difference

Mounted back as ext3

=//=

Reran the memory test: 7 passes in 9 hours or so.

No Errors

=//=

Started the machine using the Fail Safe option

No Difference

Rebooted back to Normal

=//=

Updated BIOS from v1001 to v1004

No difference

=//=

Updated the CPU microcode downloaded from Intel website

No difference

=//=

Now I am running out of ideas…

Jim

Ah ha - I might - I say might - have found the issue. Now to hunt down a solution.

Over the last few days, as I got more familiar with latencytop, I noticed that the following were popping up a lot.

  • fsync() on a file

  • Writing a page to disk

  • Writing buffer to disk (synchronous)

  • EXT3: Waiting for journal access

Top and latencytop kept reporting in most cases, but occasionally they would stop too.

Usually up to 3 CPUs would have high wait states of 90% or more. The machine would not unfreeze until all wait states were back to 0.

I had a brainstorm… As I use the Evolution email client, and it is the most painful when this happens, I put a search together and found:

More on the EeePC hangs | Community Matters

Which seems to be the same issue on an EeePC. There seems to be an issue with fsync() and the ext3 filesystem. It seems that Evolution and Firefox use sqlite, which calls fsync() heavily, and those calls block, causing the freeze.
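One way to test the fsync() theory without involving Evolution or Firefox is to time fsync() directly on the filesystem in question. The following is a minimal Python sketch, not a proper benchmark; the function name and the default sizes are arbitrary choices of mine.

```python
import os
import tempfile
import time

def time_fsyncs(directory, count=5, size=4096):
    """Write `size` bytes then fsync, `count` times; return each fsync time in ms."""
    durations = []
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        payload = b"x" * size
        for _ in range(count):
            tmp.write(payload)
            tmp.flush()              # push Python's buffer to the kernel
            start = time.perf_counter()
            os.fsync(tmp.fileno())   # block until the kernel reports the data on disk
            durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

Running something like `time_fsyncs("/home/jim")` (the path is only an example) while another process is writing heavily to the same filesystem should show whether individual fsync() calls stall for seconds, which would match the freezes described above.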

Now to solve the issue…

Will report back.

Cheers
Jim

Please do.

Sorry, I was away over the weekend taking a little vacation with my sweetie. But I’m still very intrigued by this. Wish I had more suggestions to offer, but in this case, I may be learning from you. :slight_smile:

FYI,
Found very high “wait load” on CPU, often over 90% when editing a large document in OOo 3. After letting latencytop run overnight, found this high reading on file lock contention. Does this make sense to anyone?
http://www.viewsletter.com/img/100CPU-Load-3.jpg
http://www.viewsletter.com/img/LatencyTOPlocks.jpg

Generic notebook (eRacks), openSuse 11.1, OOo 3.1.1

konsultor, smpoole7

I have seen latencytop report the minus numbers when my machine has done a “hard pause” (all is frozen, no mouse movement etc.) and I think its internal timers lose the plot too.

As for the large document issue, I am not sure if OOo uses fsync or not, but for me this is where I think the issue is.

My findings since last post:

After some reorganisation of my file systems, I am now running all partitions except /boot on reiserfs. As I moved my home partition to reiser (on its own disk), and the /root and /mnt/disk1 were still ext3, there did seem to be some improvement. The number of freezes seemed to decrease a bit. Evolution was almost usable. However, now that I have moved all partitions to reiser, the issue is still there.

The one big thing I notice is that when the freeze is about to happen or is happening, the HD light on the front panel of the machine is on solid and the wait states increase across multiple CPUs. When it is working OK the HD light does its usual flickering as expected. So there is something not right in that area.

The really annoying part of all this is that the machine is quick, really quick, when it is working, but it is taking me longer to do things as I have to wait around for up to minutes at a time before it gets past a “hang”. Interestingly enough, if I continue to click and type while it is “hung”, these are buffered somewhere and are acted upon when the machine returns. So the jam is not on inputs.

As 11.2 is on its way I might wait for it to be released and give it a go. I will have to check which version of the kernel it ships with. Maybe check Bugzilla again and post a bug if it does not fix the issue.

Cheers
Jim

Latest update…

I think I have found the issue…

Had a bit of a think about this and wondered what would happen if I fail one of the disks in the MD RAID…?

YESSSSSSSSssssss!!!

It has stopped the issue. The machine is now whizzing along. Launching apps, evolution, firefox all the ones that were causing issues are now working. The Wait States are jumping up and down, but only in single digits. The HD light is acting normal again.

So having MD RAID in a Raid 1 (Mirroring) on the root partition was not a good idea. Or it needs some fine tuning for 1 TB disks and/or the ICH10 SATA controller.

More research needed…

Jim

An interesting thread, but it seems hard for anyone to comment much. A few comments strike me.

An interesting problem but not a very clear one. There’s nothing intrinsically wrong with having / on a mirrored md(4) device, and I have used such setups on SMP systems in the past. I saw something similar in effect when there was a driver issue, but then there were error messages logged in files (/var/log), which you have not mentioned.

Are you sure there’s nothing reported about DMA errors for example?
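A quick way to answer that is to scan the kernel log for libata complaints. The Python sketch below filters log text for lines that look like ATA port errors. The pattern is an assumption based on typical libata messages ("exception", "failed", "timeout", "resetting link") and may need widening for other drivers; the names are hypothetical.

```python
import re

# Assumed shape of libata trouble reports, e.g. "ata1.00: exception Emask ..."
ATA_PROBLEM = re.compile(
    r"\bata\d+(\.\d+)?: .*(exception|error|failed|timeout|resetting link)",
    re.IGNORECASE,
)

def ata_problem_lines(log_text):
    """Return the log lines that look like ATA/SATA error reports."""
    return [line for line in log_text.splitlines() if ATA_PROBLEM.search(line)]
```

It could be fed the contents of /var/log/messages or the output of dmesg; an empty list would suggest the controller is not reporting errors, at least not in this form.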

CPU memory wait states are a very low-level hardware issue: the CPU stalls waiting to load from RAM, hence pre-fetching, L1, L2 & L3 caches, and SMT/HT to utilise the core logic better whilst the CPU is (invisibly to the OS) cycling on a memory access.

The I/O wait state reported by top(1) is something different: not memory, but a way of separating idle time “waiting for I/O” from idle time with “nothing to do”. 10 years ago you did not see that information. My I/O wait goes up when I have I/O work to do.

The netbook report you found was the famous SSD “stutter” problem: basically, small writes would cause a slow read-erase-write cycle on random I/O, which was very inefficient. SSDs have since had improved firmware and larger RAM buffers to improve predictability on non-sequential writes.

Konsultor’s latency on fsync(2) of 60 msec seems to be expected; after all, that call returns only when ALL outstanding write I/O has reached the hard disk.
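To make the distinction concrete: top derives its per-CPU "wa" figure from the counters in /proc/stat, where iowait is the fifth value on each cpuN line (after user, nice, system and idle). A rough Python sketch of that calculation follows; the function names are made up, and it only sums the first eight counters on the assumption that later fields (guest time) are already included in user time.

```python
import time

def read_cpu_times(stat_text):
    """Return {cpu_name: [int counters]} for the per-CPU lines of /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 user nice system idle iowait irq softirq ..."
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(value) for value in fields[1:]]
    return times

def iowait_percent(before, after):
    """Percentage of elapsed ticks each CPU spent in iowait between two samples."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta[:8])  # user .. steal; guest is assumed to be part of user
        # Index 4 is iowait: user, nice, system, idle, iowait
        result[cpu] = 100.0 * delta[4] / total if total else 0.0
    return result

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return per-CPU iowait %."""
    with open(path) as handle:
        first = read_cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = read_cpu_times(handle.read())
    return iowait_percent(first, second)
```

A CPU showing 100% here is idle with at least one task blocked on I/O, which is why high iowait can coexist with no process showing high utilisation in top.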

> There seems to be an issue with fsync() and the ext3
> filesystem. It seems that Evolution, and Firefox, use
> sqlite and this uses fsync() heavily which blocks
> causing the freeze.

It was very regular fsync(2) which was writing ALL outstanding cached blocks, which killed interactive performance. Hence the usual advice: split /home, /var, /tmp and / onto separate filesystems, mount with data=writeback, and use performance improvers like relatime (or noatime) so inodes aren’t forced to be written after file reads.

Given that you mention several cores showing similar behaviour, you might have stumbled on something like a lock contention issue with multi-threading in md(4).
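To see which of those options are already in effect, the mount options can be read from /proc/mounts. Here is a small Python sketch with made-up function names that lists the ext3 mounts and reports which of the suggested options are absent. Bear in mind that data=writeback relaxes ext3's ordering guarantees, so files written shortly before a crash may contain stale data; it is a trade-off rather than a free speed-up.

```python
def mount_options(mounts_text, fstypes=("ext3",)):
    """Map mount point -> set of options for the given fs types in /proc/mounts text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in fstypes:
            result[fields[1]] = set(fields[3].split(","))
    return result

def missing_tuning(options):
    """Report which of the suggested options are absent from one mount's options."""
    missing = []
    if "data=writeback" not in options:
        missing.append("data=writeback")
    if not ({"relatime", "noatime"} & options):
        missing.append("relatime or noatime")
    return missing
```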

But more likely it’s related to the Intel ICH10 SATA controller. Digging around I have come across some mention of issues with some Intel SATA controllers :

[Bug 187383] Re: System monitor causes Xorg to consume 100% CPU: http://www.mail-archive.com/desktop-bugs@lists.ubuntu.com/msg344014.html
Nabble - freebsd-current - SATA DMA errors on second ICH10 bus

Perhaps there’s a quirk, showing itself up during RAID 1 writes?

Are you using a “tainted” kernel with binary graphics driver?

robopensuse

Thanks for the information. Some valid points.

I am sorry for the delay in replying but I got carried away with my new machine and then the holiday break was on us.

The interesting thing is that now that I have broken the mirror, and therefore not using the RAID1, I am successfully running 5 VM Sessions, on top of normal desktop activity; email, browsing etc. without the pauses I was seeing, even though top is still reporting high wait states across multiple cores. They are just not staying high for as long as they had been and the machine is not freezing.

I do not remember seeing any kernel tainted messages, and searching the messages log from that period did not find any.
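Rather than relying on log messages, the taint state can be read directly from /proc/sys/kernel/tainted, which holds an integer bitmask (0 means not tainted). The Python sketch below decodes only the two lowest bits, which is all that matters for the binary-driver question; the function name is made up for illustration.

```python
def describe_taint(tainted_text):
    """Decode the lowest bits of the /proc/sys/kernel/tainted bitmask."""
    value = int(tainted_text.strip())
    return {
        "tainted": value != 0,
        "proprietary_module": bool(value & 1),  # bit 0: a proprietary module was loaded
        "forced_module_load": bool(value & 2),  # bit 1: a module was force-loaded
    }
```

Passing it `open("/proc/sys/kernel/tainted").read()` on the running system gives a definite answer for the current boot.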

The graphics driver I am running is the standard Vesa Framebuffer. This machine has a nvidia graphics card, but as I am using the machine as a “server” I rarely use the local display and therefore have not chased down a fancier graphics driver that may add tainting :wink:

Jim

Software RAID1 works fine for / on my machine. As robopensuse said there is nothing inherently wrong with RAID1 for /. Maybe you had a fakeraid configuration that uses a poor proprietary driver?