Nearly 100% CPU usage when writing to disk

Hi,

I have experienced an issue when running applications that write huge amounts of data to the hard disk, e.g. extracting files with Ark, downloading with KTorrent at ~10 MiB/s, or even copying large files. Running such an operation drives the CPU to nearly 100%, and the system becomes unusable, because everything starts to lag (even the mouse cursor).

I am using openSUSE 13.1 with KDE 4.12.2 and the 3.11.10-7-desktop kernel.

Is anyone else experiencing this problem?

I also have the system freezing completely at times, but not when writing large files; so far it is unexplainable on my side. My main suspect is the GPU driver.

Is the file on an NTFS file system? In my experience, the CPU cost for NTFS activity is higher than with native Linux file systems, though I have not seen it reach 100%.

No, it is on ext4 filesystem.

On 2014-02-22 09:06, helmet91 wrote:
>
> No, it is on ext4 filesystem.

There are some people reporting similar problems, but still nothing
definite has been found. No known culprit yet.

If this happened a decade ago, I would immediately suspect that the
system is not using DMA to access the disks…

Try hdparm -i and -I on that disk. They should show whether DMA is in use,
and which mode.
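
Something like this should do it (assuming the disk is /dev/sda; substitute
your actual device, and run as root):


hdparm -i /dev/sda | grep -i udma
hdparm -I /dev/sda | grep -i udma


The currently active mode is the one marked with an asterisk, e.g. “*udma6”.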

Oh, I forgot. One culprit was found in one or two cases: the graphical
desktop environment or file browser. Try copying the files with ‘mc’
instead, in an xterm or console. You will probably have to install it
first; it is a very fast and powerful text-mode file browser.
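
If it is not installed yet, something like this should work (the package
name assumed from the standard openSUSE repositories):


sudo zypper install mc
mc


Inside mc, F5 copies the selection from one panel to the other.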


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Thank you for your answer.

Regarding hdparm, the hard disk in question uses udma6 (here is the output of hdparm).

I have tried copying with mc, but the issue is still there. :frowning:

On 2014-02-22 14:06, helmet91 wrote:

> robin_listas;2626471 Wrote:
>>
>> Try to copy the files using ‘mc’ instead in an xterm or console.
>>
>
>
> I have tried copying with mc, but the issue is still there. :frowning:

Oh.
Then I’m out of ideas…

Hum. Start a terminal and run ‘top’ in it. Which process is eating the
CPU time during copies?
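
For example (the keys are standard top defaults):


top


Inside top, press ‘P’ to sort by CPU usage and ‘1’ to see per-core figures,
then start the copy and watch which process climbs to the top.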


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Always the one that performs the disk-write action. E.g. when I copy with mc, then mc. When I extract a RAR archive with Ark, then unrar. When I split a huge file into separate parts, then split.

On 02/22/2014 07:46 AM, helmet91 wrote:
>
> Always the one that performs the disk-write action. E.g. when I copy
> with mc, then mc. When I extract a RAR archive with Ark, then unrar.
> When I split a huge file into separate parts, then split.

Under top, what percentage of the time is consumed by the user (labeled us), the
system (sy), or waiting (wa)? Copying a 4 GB file using cp in a console occupies
about 20% sy and 33% wa on my system.
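
If you want a second opinion on the wait figure, vmstat (shipped in the same
procps package as top) prints the equivalent numbers over time:


vmstat 1 10


The ‘wa’ column is the percentage of time the CPU sat idle waiting for I/O.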

us: 1.9%, sy: 3.1%, wa: 65%

So, something may be wrong.

On 2014-02-22 18:46, helmet91 wrote:

> us: 1.9%, sy: 3.1%, wa: 65%

AHHH! It is waiting for disk. It is not really busy.

> So, something may be wrong.

Indeed.

I’m suddenly remembering a recent thread, someone that had a very slow
“/home” partition. He had “/home” on raid 1, while the system was on an
SSD. One of the two sides of the raid was developing bad sectors, and it
was slowing down the system hugely.

So, tests to do:


hdparm -tT  /dev/whateverdiskyouhave
smartctl -a /dev/whateverdiskyouhave
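

In the smartctl output, the attributes that usually betray a disk developing
bad sectors are worth checking explicitly. A sketch, with /dev/sda standing
in for your device:


smartctl -a /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'


Non-zero raw values there on a spinning disk are a bad sign.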


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Result of hdparm: http://pastebin.com/b8mkdG18
Result of smartctl: http://pastebin.com/cD4Zp601

These test results do not look bad to me. I think the write speeds are acceptable, and the SMART data does not show any errors (I also ran a short smartctl self-test).

> I’m suddenly remembering a recent thread, someone that had a very slow
> “/home” partition. He had “/home” on raid 1, while the system was on an
> SSD.

Well, my setup is similar to that. The system is on an SSD, and my /home partition takes up the whole disk that is now causing the problems. Before asking, I tried to find a similar issue in this forum, but did not find anything. Sorry if I am unnecessarily taking up your time.

Did you do smartctl on both RAID drives? You only showed 1

On 2014-02-23 22:06, gogalthorp wrote:
>
> Did you do smartctl on both RAID drives? You only showed 1

We don’t know yet if he has raid. :-??
I assumed he might, comparing his problem with somebody else that had a
similar problem and was indeed using raid.

So, we need to know whether he is using RAID, and if so, we need results
for both disks, of course.

The smartctl output appears correct, but I would like you (helmet91) to
run the SMART long test on all disks, then post the results.
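
Something like this, with /dev/sda again as a placeholder; smartctl prints
an estimated duration when the test starts, and the log can be read once it
finishes:


smartctl -t long /dev/sda
smartctl -l selftest /dev/sda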


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

I do not use RAID. Is the long test still needed on all disks?

Not on SSD

Your HD looks OK, so it is not that, i.e. no bad sectors.

It really looks like an HD problem, but the HD looks OK. I’m stumped :stuck_out_tongue:

Yeah, my hard disk seems to be OK. I did the long SMART test, and it completed without error.

On 02/24/2014 12:56 PM, gogalthorp wrote:
>
> Not on SSD
>
> Your HD looks OK, so it is not that, i.e. no bad sectors.
>
> It really looks like an HD problem, but the HD looks OK. I’m stumped :stuck_out_tongue:

This one has me stumped too.

Even if we cannot figure out what is causing the long wait times, we should be
able to cut the total time.

The standard tools such as cp, mc, etc. use a relatively small block size when
reading/writing files. When the source and destination are on the same HD, this
causes a lot of overhead from head repositioning and waiting for the correct
sector to arrive at the heads. The page cache helps, but it is overwhelmed when
large files are involved. Increasing the block size to 1 MiB or more greatly
reduces the number of operations. When you copy a file, try the following:


time dd if=<input_file_path> of=<output_file_name> bs=1M

The “time” at the beginning is optional, but it will list the elapsed, user, and
system time for the operation. That will show you the difference that changing
“bs” makes. I would try 1M, 2M, and 4M.

For a 400 MB file, I get the following:


finger@larrylap:~> sudo time dd if=junk.iso of=/root/junk.iso bs=1M
391+1 records in
391+1 records out
410421248 bytes (410 MB) copied, 12.4206 s, 33.0 MB/s
0.00user 4.38system 0:12.43elapsed 35%CPU (0avgtext+0avgdata 1860maxresident)k
801648inputs+801608outputs (1major+511minor)pagefaults 0swaps

When you try the differing block sizes, you do have to be careful that neither
the input nor the output file is cached. Caching will show up as differing
input and output counts. Try different files, on different days, or after a
reboot. The page-fault and swap counts will also be interesting.
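
If waiting for a reboot is inconvenient, the kernel’s drop_caches knob can
flush the page cache between runs. A sketch, with a hypothetical file name,
run as root:


sync
echo 3 > /proc/sys/vm/drop_caches
time dd if=bigfile.iso of=/tmp/copy.iso bs=1M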

Finally, be very careful with the dd command. It is very easy to destroy a file
system with it.

Here are the results (I tested with different files so that they would not be cached) for bs=1M:


3041+1 records in
3041+1 records out
3189170176 bytes (3,2 GB) copied, 78,3076 s, 40,7 MB/s
0.01user 9.89system 1:18.32elapsed 12%CPU (0avgtext+0avgdata 1916maxresident)k
6229400inputs+6228856outputs (3major+531minor)pagefaults 0swaps

Results for bs=4M:


283+1 records in
283+1 records out
1187855383 bytes (1,2 GB) copied, 18,7771 s, 63,3 MB/s
0.00user 3.27system 0:18.77elapsed 17%CPU (0avgtext+0avgdata 5000maxresident)k
2320464inputs+2320032outputs (3major+1294minor)pagefaults 0swaps

The low CPU percentages are interesting, because system-monitor tools like top show high CPU usage.

On 02/25/2014 11:56 AM, helmet91 wrote:
>
> 3189170176 bytes (3,2 GB) copied, 78,3076 s, 40,7 MB/s
>
> The low CPU percentages are interesting, because system-monitor tools
> like top show high CPU usage.

“time” does not show the wait time. In any case, your disk system is performing
better than mine. At 1M bs, you transferred at least a 20% higher rate than my
system did.
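
If you want to see the wait directly while a copy runs, iostat from the
sysstat package is one option (assuming it is installed):


iostat -x 1


Watch the disk’s %util and await columns during the copy; a device pinned
near 100% util with a high await is saturated by I/O, not CPU-bound.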