Abysmal multithreading I/O? performance since 12.2

Hi,

I recently upgraded openSUSE 12.1 to 12.2 on our server via zypper dup. Some things worked fine, others did not. For example, NXServer and GNOME no longer work together :confused: so I can only use KDE through NXServer now, but that’s a minor problem compared to the threading or multitasking problems I encounter:
The server is equipped with 4x8-core Opterons and 256 GB of RAM and is usually used for memory-intensive MATLAB simulations. In 12.1 I already had MATLAB threading issues due to the Java version bundled with MATLAB. After switching to the system’s Java the problems vanished.
Now on 12.2, however, MATLAB threads exhibit the same horrible slowdowns as in 12.1, and no change of Java version seems to help at all.
In addition, I have a few FORTRAN simulations that can be run in parallel. They compute separate things and only rely on a set of shared files in a common directory, meaning they are inherently I/O bound. In 12.1 I could run, say, 20 of these without noticing a slowdown; one process took approx. 10-15 minutes. In 12.2, when running 16 in parallel, one process takes more than 300 minutes and still doesn’t finish. At first I thought they had run into some kind of deadlock, but according to ‘strace’ this isn’t the case. It just seems that the I/O is terribly slow.
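One way to check whether the time really goes into waiting for I/O rather than into CPU contention is to sample the aggregate `cpu` line of `/proc/stat` twice and look at the share of time booked as `iowait`. The following is only a minimal Python sketch, assuming a Linux `/proc` filesystem; the function names and the 5-second interval are illustrative choices and not part of the setup described above.

```python
import time

def parse_cpu_line(stat_text):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def iowait_share(before, after):
    """Fraction of elapsed CPU time booked as iowait between two samples.

    Column order: user nice system idle iowait irq softirq steal guest ...
    Guest time is already included in user/nice, so only the first
    eight columns are summed.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0

def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the iowait share."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_share(before, after)

if __name__ == "__main__":
    print("iowait share: {:.1%}".format(sample()))
```

Running this once with a single FORTRAN job and once with 16 in parallel would show whether the iowait share grows with the number of jobs. A low value while the load is high would point away from the disks and towards the scheduler or memory.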
At the same time I see dozens of kworker/xx and watchdog processes with high CPU times, apparently trying to schedule something? The load increases to >100 when I would expect <20.

I’m running the default 12.2 kernel (3.4.6-2.10-default) with no other major modifications to the base system, and the update went smoothly, if I remember correctly.

Does anyone have some idea for me? Frankly I’m out of ideas and also a little disappointed because we urgently need a working system.

Please also tell me which additional information to provide, if there’s anything missing.

Thanks a lot for your assistance.

On 10/09/2012 04:26 PM, glucke wrote:
> At the same time I see dozens of kworker/xx

please show the output of


uname -a

why run KDE on your ‘server’?

other than that i saw a thread somewhere about problems with multi-core
on 12.2, as i recall it was something about the default cores being set
to one, but i can’t now find the thread, so don’t recall the fix…

maybe it was in a mail list…

but, for sure you need not run the linux worlds most resource hungry GUI
needlessly if you are concerned about performance.

oh wait! i see you are running the default kernel…i’m not certain that
one has multi-core enabled anyway…and, i think the desktop kernel (not
the default kernel) is the one installed with a 64bit system, for a
reason…
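Whether the running kernel actually sees all cores can be checked directly instead of guessed. Below is a small Python sketch, assuming Python 3 on Linux; the helper name `cpu_summary` is made up for illustration. As far as I know, an `SMP` tag in the `uname -a` output also indicates a kernel built with multiprocessor support.

```python
import os

def cpu_summary():
    """Return (logical CPUs reported by the OS, CPUs this process may run on)."""
    configured = os.cpu_count()  # may be None if it cannot be determined
    if hasattr(os, "sched_getaffinity"):
        # Linux only: size of this process's CPU affinity mask
        usable = len(os.sched_getaffinity(0))
    else:
        usable = configured
    return configured, usable

if __name__ == "__main__":
    configured, usable = cpu_summary()
    print("logical CPUs:", configured)
    print("usable by this process:", usable)
```

On the 4x8-core machine described in this thread both numbers should presumably be 32; a value of 1 would confirm the single-core suspicion.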


dd

I am especially surprised about the watchdogs.

It is clearly a regression -> http://bugzilla.novell.com/

That said, you can just try with the latest stable kernel: http://download.opensuse.org/repositories/Kernel:/stable/standard/

please don’t turn this into an argument for or against KDE. I have non-Linux-savvy users who need a GUI at least for launching MATLAB and copying files. The maybe 1% additional load for displaying the GUI or whatever is irrelevant to me. And since any GTK application currently segfaults my nxclient/server, there is no other option.

uname -a currently gives:
Linux hostname 3.4.6-2.10-default #1 SMP Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux
I just ran the first update after the distupgrade and the situation seems to have improved a little. Now I can run four of my FORTRAN programs in parallel without kworkers or watchdogs hogging the CPU. From eight or ten onwards it gets difficult again (as opposed to the >24 possible with 12.1).

I agree, on a machine like this a desktop environment shouldn’t be the issue. If you run "top" without any MATLAB activity, do you see anything extraordinary?

Hmm, I see the occasional (idle?) Java process (MATLAB workers are Java-based) but only at 2-10 percent CPU, and it vanishes promptly. Apart from that I don’t think so.
What confuses me most is that the problem arises at two seemingly (very) different workloads. The FORTRAN programs do a lot of I/O while the MATLAB processes mainly calculate (or at least the problem arises only when calculating and not when doing I/O in MATLAB).
Could the common denominator be memory access? Both programs presumably copy/move a lot of memory. I’m unfamiliar with the caching mechanisms and strategies for disk files but the files in the working directory surely get read and written a lot.
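On the caching question: Linux keeps file data in RAM (the page cache) and writes modified ("dirty") pages back to disk in the background. How much is cached and how much is waiting for writeback can be read from `/proc/meminfo`. Here is a rough Python sketch, assuming a Linux `/proc`; the function name and the selection of fields are illustrative only.

```python
def parse_meminfo(text, keys=("MemTotal", "Cached", "Dirty", "Writeback")):
    """Extract selected fields (values in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])  # first token is the number
    return values

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for name, kb in info.items():
        print("{:<10} {:>12.1f} MiB".format(name, kb / 1024))
```

Watching `Dirty` and `Writeback` while the FORTRAN jobs run would show whether writes are piling up in memory. This is only a diagnostic and does not change any caching behaviour.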

I also just noticed a number of “migration/xx” processes already with >30 minutes CPU time. What do these normally do? Could this be any indication?
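As far as I know, the `migration/N` threads are per-CPU kernel threads that the scheduler uses to move tasks between CPUs, so a lot of accumulated time there hints at heavy task migration or load balancing. To compare how much CPU time the `kworker`, `watchdog` and `migration` threads have collected, `/proc/<pid>/stat` can be parsed. The following Python sketch assumes a Linux `/proc`; the name prefixes and function names are my own choices, and matching by name prefix would also pick up any user process that happens to use such a name.

```python
import os

def parse_stat(stat_text):
    """Return (comm, utime + stime in clock ticks) from /proc/<pid>/stat text."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    comm = stat_text[lpar + 1:rpar]
    rest = stat_text[rpar + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return comm, utime + stime

def kernel_thread_times(prefixes=("kworker", "watchdog", "migration"),
                        proc="/proc"):
    """List (cpu_seconds, comm) for matching tasks, busiest first."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, ticks = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or entry was unreadable
        if comm.startswith(prefixes):
            result.append((ticks / ticks_per_sec, comm))
    return sorted(result, reverse=True)

if __name__ == "__main__":
    for seconds, comm in kernel_thread_times()[:20]:
        print("{:>10.1f} s  {}".format(seconds, comm))
```

Comparing the output with and without the simulations running would show which of these threads actually accumulate time under load.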

Perhaps I should really try the ‘stable’ kernel. We already used it once because of Infiniband support.

On 10/09/2012 08:16 PM, glucke wrote:
> please don’t turn this into an argument for or against KDE.

you brought up the “dozens of kworker/xx and watchdog processes…The
load increases to >100 when I would expect <20.”–from where might you
think those come?

i do not know if XFCE will run with your nxclient/server, but it will
certainly not spawn kworkers, and as far as i know, no (or few)
watchdogs either…

and, if you have not yet killed all the desktop searcher junk (akonadi,
nepomuk, strigi, tracker) you might consider that also…

and cut out all the desktop effects you can live without.


dd

kworkers and watchdogs have nothing to do with KDE. And they are not ‘spawned’ by KDE either because they are kernel entities that are present anyways but usually dormant. They get activated when the kernel has work to do like caching or migrating threads.
Of course I disabled the search stuff, at least for the users where this was a problem, but still, even with all this active, there won’t be a noticeable peak in load. KDE is that bloated to use 32 cores to their max. GUI overhead in general is negligible when doing scientific (or whatever) computing.

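oh :slight_smile: this should read:

> KDE is NOT that bloated to use 32 cores to their max. GUI overhead in
> general is negligible when doing scientific (or whatever) computing.

Freudian slip :wink: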

On 2012-10-09 16:26, glucke wrote:

> Does anyone have some idea for me? Frankly I’m out of ideas and also a
> little disappointed because we urgently need a working system.

Are you using the same filesystem, or did you change that when upgrading?

With a system like yours, I would have two installations on two partitions: one 12.1, another
12.2, so that I could test the next version while still having the previous version to revert
to if it did not work right.

Or, if the system setup is complex and you really need to do upgrades, then I would use a test
partition with 12.2 prior to upgrading 12.1.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

glucke wrote:
> oh :slight_smile: this should read:
> glucke;2494800 Wrote:
>> KDE is NOT that bloated to use 32 cores to their max. GUI overhead in
>> general is negligible when doing scientific (or whatever) computing.
>
> Freudian slip :wink:

I’ve been following this thread with interest, although without any
helpful ideas I’m afraid. It seems that no one here has managed to solve
the problem after quite a while, so I’d suggest that you ask on the
mailing list, where there might be more people running similar systems
and more Suse employees.

On 10/10/2012 08:36 AM, glucke wrote:
>
> kworkers and watchdogs have nothing to do with KDE. And they are not
> ‘spawned’ by KDE either because they are kernel entities that are
> present anyways but usually dormant.

of course, it is possible that your multiple MATLAB and FORTRAN
simulations are prompting the kernel’s excessive kworker/watchdog
threads…but, i have seen several instances of KDE users (with neither
the math nor other simulations you have) complaining of them running
rampant and sucking up almost all apparent computing power–also for no
apparent reason!!

so, i would say you probably need to pay close attention to RedDwarf and
Dave’s most recent posts…(log a bug and ask the opensuse mail list…)


dd