7zip doesn’t use the full power of the CPU.

Hi all, I’m currently compressing a big 100 GB file with 7zip on a system with
an Intel 2.66 GHz quad-core processor.
Although 7zip uses 4 cores to compress the file, it still uses only about
40% of the CPU’s performance.
The CPU isn’t doing anything else, nothing heavy anyway.

My question is why?
I started compressing 8 hours ago and the compression is currently 47% done.
Reading and writing the files isn’t very fast, so my hard drive isn’t slowing
the process down.

So what’s the holdup?
There is still 53% of work to be done.
Is it for some reason impossible to do this faster?

Chris Maaskant

Hi,

You will probably have a high %iowait, since your hard drive is not delivering the data fast enough. That is not a bug; it’s simply a fact that your CPU is much faster than your hard drive. In that case the CPU will go to sleep, or work on other processes, until the hard drive has delivered new data.
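If you want to see this for yourself, the “wa” (I/O wait) column in vmstat or top shows how much time the CPU spends waiting on the disk; a quick sketch (vmstat is part of procps, so it should already be there):

  # watch the "wa" column; a consistently high value means the CPU is waiting for the disk
  vmstat 2
  # or look at the "%wa" figure in the CPU summary line of an interactive tool
  top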

You can just play the waiting game with the remaining 60% of your CPU power :slight_smile:

On 10/24/2010 09:21 AM, Chris Maaskant wrote:
> Hi all, I’m currently compressing a big 100 GB file with 7zip on a system with
> an Intel 2.66 GHz quad-core processor.
> Although 7zip uses 4 cores to compress the file, it still uses only about
> 40% of the CPU’s performance.
> The CPU isn’t doing anything else, nothing heavy anyway.
>
> My question is why?
> I started compressing 8 hours ago and the compression is currently 47% done.
> Reading and writing the files isn’t very fast, so my hard drive isn’t slowing
> the process down.
>
> So what’s the holdup?
> There is still 53% of work to be done.
> Is it for some reason impossible to do this faster?

There are lots of things that could be in play here. Are you swapping?
I’d think there would be a lot of that for a file that big.

Are you writing to the same disk you’re reading from? That will
definitely be a bottleneck too.

You say “Reading and writing the files isn’t very fast”. I’m not sure what
you mean by that, but if file I/O is slow, then it’s going to slow the
process down.

Also, keep in mind that the system may be throttling the process
somewhat to allow other things to get some CPU time. You can probably
‘un-nice’ the process so it gets more priority. I’ve never tried it, but
read the nice and renice man pages and see what options they offer.
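Something along these lines should do it (untested here, and the pgrep lookup assumes the running binary is simply called 7z; adjust to taste):

  # raise the priority of the running compression (negative nice values need root)
  sudo renice -n -10 -p "$(pgrep -x 7z)"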

Good luck…

…Kevin

Kevin Miller - http://www.alaska.net/~atftb
Juneau, Alaska
In a recent survey, 7 out of 10 hard drives preferred Linux
Registered Linux User No: 307357, http://counter.li.org

Kevin Miller wrote:

> There are lots of things that could be in play here. Are you swapping?
> I’d think there would be a lot of that for a file that big.

No, swap isn’t used at all; I have 6 GB of RAM.

> Are you writing to the same disk you’re reading from? That will
> definitely be a bottleneck too.
>
> You say “Reading and writing the files isn’t very fast”. I’m not sure what
> you mean by that, but if file I/O is slow, then it’s going to slow the
> process down.

Sorry, I didn’t make that clear.
I meant that if it takes more than 8 hours just to read and write 47% of the
file, there isn’t much reading and writing going on, so the drive isn’t the
bottleneck.

> Also, keep in mind that the system may be throttling the process
> somewhat to allow other things to get some CPU time. You can probably
> ‘un-nice’ the process so it gets more priority.

When I convert a DVD to an MKV file with HandBrake, the CPU runs at 100% and
full speed, so the system can and does let processes use the CPU fully.
The system may throttle a process down if there is something else for the
CPU to do, but at the moment there isn’t.

That’s why I was wondering why 7zip doesn’t use more of the CPU’s power.
I did give the process a higher priority, but it didn’t make a difference,
because no other process got in the way; it already had the CPU all to
itself.


Chris Maaskant

brian j wrote:

> You will probably have a high %iowait, since your hard drive is not
> delivering the data fast enough. That is not a bug; it’s simply a fact
> that your CPU is much faster than your hard drive.

Sorry I wasn’t clearer on that.
I explained a bit more in my reply to Kevin.
The process is so slow that the hard drive isn’t being used that much.
But I am reading and writing on the same drive.

If I were decompressing the file I would write to another drive, because
decompressing is much faster.


Chris Maaskant

Compression is a complex problem: it uses lots of temporary storage and may need to revise things mid-stream because of a change in the frequency of the tokens it uncovers. There are lots of lookups to see whether a token already exists or needs to be added to the list, and 100 GB is a very large file. Setting the niceness lower may help it use more CPU, but even at 100% it would still take a very long time, and then your machine would do little else. Generally things are scheduled so that other programs get time slices too.
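If you want to experiment, p7zip takes its settings on the command line; something like the following is a plausible starting point (the file names are placeholders, and how much -mmt helps depends on the p7zip version and compression method):

  # maximum compression, ask for 4 threads, and start at a lower niceness
  # (negative nice values need root)
  sudo nice -n -5 7z a -mx=9 -mmt=4 archive.7z bigfile.img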

On 2010-10-25 00:19, Chris Maaskant wrote:

>> You say “Reading and writing the files isn’t very fast”. Not sure what
>> you mean by that, but if file i/o is slow, then it’s going to slow the
>> process down.
>
> Sorry, I didn’t make that clear.
> I meant that if it takes more than 8 hours just to read and write 47% of the
> file, there isn’t much reading and writing going on, so the drive isn’t the
> bottleneck.

I/O wait is not only raw reading. A disk can be pretty fast if it simply needs to copy one large
file from source to destination; if it has to find a thousand files scattered all over the disk, it
takes much longer. I’ll try to draw an inaccurate picture.

  • Get a filename from the list.
  • Locate and read the directory entry. If it is several levels deep, seek several sectors.
  • Locate and read the inode entry.
  • Locate and read the data sectors.

To do this, the head has to move around at least four times per file. It is this part that is slow.

Plus the compression itself: locating tokens in the large dictionary (on disk? in memory?), and
writing the compressed archive (and dictionary?).

You can find out whether this is the problem with an applet GNOME has - I don’t know if there is an
equivalent in KDE, because even gkrellm doesn’t have this feature. The GNOME CPU applet can show I/O wait, if the
colour for it is changed from the default black to something else (I chose violet).

I often see, when some task is unusually slow, that there is a largish violet zone, which means that
the CPU is waiting for the disk.
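If you prefer a terminal tool over an applet, iostat from the sysstat package shows roughly the same
information (just an alternative; it is not what the applet itself uses):

  # refresh every 2 seconds; watch the %iowait figure and the per-device utilisation
  iostat -x 2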


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Chris Maaskant wrote:
> Hi all, i’m curently compressing a big 100GB file with 7zip

It is very possible I do not understand, but why use a compression
program written for MS Windows (with its different hooks, memory
model, etc.)?

How does the time to compress compare to a native Linux app?
This link may be helpful:
http://www.linuxjournal.com/article/8051


DenverD
When it comes to chocolate, resistance is futile.
CAVEAT: http://is.gd/bpoMD [posted via NNTP w/openSUSE 10.3]

DenverD, 7z / LZMA (the actual algorithm) is not at all unusual on a Linux system, and it exists in a native, open-source version (openSUSE .rpms are compressed via LZMA, and some other distributions, such as Slackware, Gentoo or Arch, use LZMA as well). I have not read the article you provided (I am a bit in a hurry), but as far as I know the compression ratio of 7z / LZMA is far superior to that of common archivers (rar, zip, gzip etc.), and it is very fast when decompressing an archive. So there are good reasons to use 7z on Linux, the only disadvantage being that it cannot take unixoid file attributes into account, so for system backups or the like one should run another archiver first and then 7z on the result.
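The usual way around that limitation is to let tar record the attributes and pipe its output into 7z; a minimal sketch (the paths are only placeholders):

  # tar preserves ownership and permissions, 7z only compresses the stream
  tar cf - /path/to/backup | 7z a -si backup.tar.7z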

I suppose this is a misunderstanding; Chris Maaskant says he is using “7zip”, which is indeed the name of the Windows version. I am quite sure he is referring to the native version ‘7z’ / ‘p7zip’.

Chris Maaskant wrote:
> with 7zip

@Chris Maaskant, did you intend to type 7z or p7zip rather than 7zip?

Did you download the source of 7zip (and compile it) from here:
http://www.7-zip.org/download.html, or what?


DenverD
When it comes to chocolate, resistance is futile.
CAVEAT: http://is.gd/bpoMD [posted via NNTP w/openSUSE 10.3]

Howdy,
I would recommend trying to increase your application’s running priority:

Setting Application Priority in OpenSuSE

HTH,
Tony

If nothing else of importance is running on your box, set it to -20.

Tony

gropiuskalle wrote:

> I suppose this is a misunderstanding; Chris Maaskant says he is using
> “7zip”, which is the name of the Windows version indeed. I am quite sure
> he is referring to the native version ‘7z’ / ‘p7zip’.

You got that right, thnx :slight_smile:

Chris Maaskant

DenverD wrote:

> Chris Maaskant wrote:
>> Hi all, i’m curently compressing a big 100GB file with 7zip
>
> It is very possible I do not understand, but why use a compression
> program written for MS Windows (with its different hooks, memory
> model, etc.)?

I said 7zip but meant 7z; I didn’t know there was a difference.

> how does the time to compress compare to a native Linux app?
> this link may be helpful:
> http://www.linuxjournal.com/article/8051
>
They should test on different kinds of files, not just some log file.
For example:
I made a backup of my Windows boot partition of 100 MB with GParted.
Compressing this file with rar at maximum compression resulted in a
56.8 MB file in 52 seconds.
With 7z it became a 57.1 MB file in 25 seconds.
7z is the winner for me here: slightly less compression, but twice as fast.

Compressing a 30 MB wave file resulted, with rar, in a 20 MB file in 6 seconds,
and with 7z in a 25 MB file in 10 seconds.
Rar is clearly the winner here.

So I use different compression tools for different types of files.
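For anyone who wants to repeat this kind of comparison, timing both archivers on the same input is enough (the file names are placeholders, and the switches are just the usual maximum-compression settings):

  # rar at maximum compression, then 7z at maximum compression, both timed
  time rar a -m5 test.rar bootpart.img
  time 7z a -mx=9 test.7z bootpart.img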


Chris Maaskant

Carlos E. R. wrote:

> I/O wait is not only raw read. The disks can be pretty fast if it simply
> needs to copy one large file from source to destination. If it has to find
> a thousand files all over the disk, time is much longer. I’ll try to draw
> an inaccurate picture.
>
> * Get a filename from the list.
> * Locate and read the directory entry. If it is several levels deep, seek several sectors.
> * Locate and read the inode entry.
> * Locate and read the data sectors.
>
> To do this, the head has to move around at least four times per file. It
> is this part that is slow.
>
> Plus compressing, locating tokens in the large dictionary (disk, memory?),
> writing compressed archive (and dictionary?).
>
>
> You can find out whether this is the problem with an applet GNOME has - I don’t
> know if there is an equivalent in KDE, because even gkrellm doesn’t have this
> feature. The GNOME CPU applet can show I/O wait, if the colour for it is
> changed from the default black to something else (I chose violet).
>
> I often see when some task is unusually slow that there is a largish
> violet zone, which means that the cpu is waiting for the disk.

I’ll have to test this next weekend.
I’ll use some smaller files and test different settings,
like different software, and reading from one disk while writing to another.

Thanks for the explanation :slight_smile:

Chris Maaskant

Something else you might check is the disk space available for your temporary files.

If, for example, you have /home on a separate partition, / is usually a much smaller partition, which may severely limit the space available for /tmp.
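Checking is quick, and 7z can be pointed at a different working directory for its temporary files with the -w switch (the path below is only an example):

  # how much room is left on the filesystem holding /tmp?
  df -h /tmp
  # keep 7z's temporary files on a roomier filesystem
  7z a -w/some/bigger/partition archive.7z bigfile.img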

Just a thought.

gogalthorp wrote:

>
> Compression is a complex problem: it uses lots of temporary storage and may
> need to revise things mid-stream because of a change in the frequency of the
> tokens it uncovers. There are lots of lookups to see whether a token already
> exists or needs to be added to the list, and 100 GB is a very large file.
> Setting the niceness lower may help it use more CPU, but even at 100% it would
> still take a very long time, and then your machine would do little else.
> Generally things are scheduled so that other programs get time slices too.
>
Yeah, it was quite nice to have my system not choking while compressing.
If it weren’t for the CPU fan making more noise, I wouldn’t have noticed it at
all.
I’ll test some settings next weekend to see if they make a reasonable
difference.

Thnx.


Chris Maaskant