Korn-shell script freezing

Hello everybody,

Recently we have had a big problem running our Korn-shell scripts on SUSE Linux systems at some customers’ sites. The problem occurs only on the SUSE platform (SLES) with kernel version 2.6.16.60-0.21-smp (output of the “uname -r” command).

The scenario is basically like this:

We have two kinds of Korn-shell scripts, the “parent” script and the “child” script. At its start, the parent script spawns a limited number of child script processes in the background. The number of concurrent children is limited by a constant (typically about 10) and controlled by “semaphore” files (ordinary files in the filesystem): every child signals its start and its end by creating a separate semaphore file for each of the two events. The parent process cycles in a loop, processing the semaphore files and removing them from the filesystem afterwards, and saves the current number of processes in a special file of its own.
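
A minimal sketch of the scheme described above, not the actual production scripts. All names (SEMDIR, COUNT_FILE, MAX_CHILDREN, TOTAL_JOBS, child) are made up for illustration, the limit is lowered to 2 to keep the run short, and the parent counts a child as running from the moment it spawns it, which is a simplification:

```shell
# Sketch only; written in POSIX-style syntax that ksh accepts.
# All names below are hypothetical.
SEMDIR=$(mktemp -d)          # directory holding the semaphore files
COUNT_FILE=$SEMDIR/count     # parent's own file with the current child count
MAX_CHILDREN=2               # concurrency limit (about 10 in the real setup)
TOTAL_JOBS=4                 # total number of children to run

child() {
    id=$1
    : > "$SEMDIR/start.$id"              # signal "started"
    echo "child $id: processing"         # stand-in for the real work
    : > "$SEMDIR/end.$id"                # signal "finished"
    # Wait until the parent has processed and removed the end semaphore.
    while [ -e "$SEMDIR/end.$id" ]; do
        sleep 1
    done
}

running=0; started=0; finished=0
while [ "$finished" -lt "$TOTAL_JOBS" ]; do
    # Spawn children in the background while below the limit.
    while [ "$running" -lt "$MAX_CHILDREN" ] && [ "$started" -lt "$TOTAL_JOBS" ]; do
        started=$((started + 1))
        running=$((running + 1))
        child "$started" &
    done
    # Process and remove the semaphore files of finished children.
    for f in "$SEMDIR"/end.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        id=${f##*.}
        rm -f "$SEMDIR/start.$id" "$f"
        running=$((running - 1))
        finished=$((finished + 1))
    done
    echo "$running" > "$COUNT_FILE"      # save the current number of children
    sleep 1
done
wait                                     # reap the background children
final_count=$(cat "$COUNT_FILE")
rm -rf "$SEMDIR"
echo "finished=$finished running=$final_count"
```

In this sketch a child cannot exit until the parent removes its end semaphore, so a parent that stops cycling leaves every child waiting, which matches the symptom described below.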

Now, under some circumstances, the parent process seems to become “frozen” (ps ax shows the “S” state), while the child processes keep running normally, waiting for their semaphore files to be removed by the parent. Normally the child process spawns another script which starts some database processing, but we are able to reproduce the “freezing” state with simple echo commands in place of the real processing, so it seems not to matter what the real processing is.

We ran some tests to get to the bottom of this and found out that:
- the state typically occurs when the OS is loaded by a lot of external processes/threads (context switches?), but neither CPU nor RAM needs to be 100% used
- once the parent script has become “frozen”, it is only very rarely woken up by the OS, even after the overload is over (external processes killed)
- we are not able to reproduce this behaviour on other Linux/Unix systems (so far we have tried CentOS, Ubuntu and AIX)
- to trigger the state, we have to overload a machine with 4 CPUs with many more processes than a machine with 2 CPUs.
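
One way to gather more detail than the “S” column of ps ax is to read the state and the kernel wait channel of the frozen process from /proc. The following is only a diagnostic sketch for Linux. The helper names proc_state and proc_wchan are hypothetical, a background sleep stands in for the frozen parent, and it is an assumption that /proc/PID/wchan is populated on the affected kernel:

```shell
# Sketch for Linux only; proc_state and proc_wchan are hypothetical helper names.
proc_state() {
    # Second field of the "State:" line, e.g. S (sleeping), R (running), D (disk sleep).
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

proc_wchan() {
    # Kernel function the process sleeps in; may be 0 or empty,
    # depending on the kernel and on permissions.
    cat "/proc/$1/wchan" 2>/dev/null
    echo
}

sleep 30 &            # stand-in for the frozen parent script
pid=$!
sleep 1               # give it time to reach its sleeping state
state=$(proc_state "$pid")
echo "pid=$pid state=$state wchan=$(proc_wchan "$pid")"
kill "$pid"
```

Run against the PID of the real parent script, the wait channel may hint at whether it is sleeping in a timer, a wait for a child, or file I/O.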

If anyone has a clue about this, thank you in advance for your note here. It’s quite urgent!

zdenovan

When I started reading your post, I very soon came to the mention of SLES. Do you realize that this is the openSUSE Forums and that SLES is discussed at SUSE Linux Enterprise Server (SLES) - NOVELL FORUMS?

In any case, I had some ksh problems like hangs earlier (in SuSE 9 and also later in openSUSE). Since then I always replace the installed ksh file with one I have had for ages. It works to my satisfaction.

Thank you for your reply! Please, do you remember what the major difference between those two versions of the script was?


As to the subforum I used:
I wanted to post my question in the SLES forum, but I have no rights to do anything there except read messages, so I decided to post it in Scripting. In the lower left corner I see:

You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

Zdenovan

Or do you mean the ksh interpreter binary?

Yes, I do. I must correct my statements above a bit. I only replace the installed ksh with mine if I encounter problems. I could go into a bit more detail, but I do not think that would help you much. What I can do is send it to you by e-mail (size 640700 bytes). Then it is up to you what to do with it.

If you want it, please send me an e-mail address via PM.

And about SLES: we normally try to redirect people there because openSUSE and SLES are different and most of us cannot recreate SLES problems or look into SLES systems. When you use SLES you should have a support contract. Without one you cannot use the SLES/SLED Forums, AFAIK.

OK, I have sent my e-mail address to you by PM; please send the binary there. By the way, our installed ksh is only 5144 bytes, it uses shared libs …

My installed ksh (openSUSE 11.2) has 5656 bytes.

Sent it. Try it and do what you want with it.