OpenSUSE 10.2 freezing on several machines

Hello Everyone,

We have a large number of machines using OpenSUSE 10.2 deployed at customer sites. These machines display a Java (Swing) point of sale application.

We were seeing random freezes on these machines (UI freezes and machines drop off of the network and become unreachable). This typically happens when the systems are idle. As for the frequency of freezing: some days several machines in different locations freeze; other times the machines work for days without a hiccup.

These machines are usually built using the same hardware and imaged with PartImage.

One thing we have noticed is that the frequency of the freezing noticeably increased when we updated the machines from Java 1.4 to 1.6.

When the application is run using the same Java version on a different set of hardware and on openSUSE 11.x, this problem goes away.

Nothing out of the ordinary was found when the logs in /var/log were investigated after recovering from a crash.

Would appreciate any pointers or tools that can aid in the debugging.

On 10/24/2012 06:46 PM, silentecho wrote:

> We have a large number of machines using OpenSUSE 10.2 deployed at
> customer sites.

wow! openSUSE 10.2 went past its end of life and out of support in
November 2008 and therefore has had zero security patches available
for four years…i hope they are not exposed to the internet because an
unpatched 10.2 is hmmmmmmmmmm, not well protected…

are you, by chance (and hopefully you are), actually running SUSE Linux
Enterprise 10 SP2? if so, it is (i think) still supported…

you can learn for sure which you are using by this command:


cat /etc/SuSE-release

if you are running the enterprise product you just need to go to their
forum, here: http://forums.suse.com/

but, if you are actually running openSUSE 10.2 you are welcome to hang
out here and see if anyone happens along who remembers much about that
and can help you…

but i wouldn’t stake my business on that happening…

> We were seeing random freezes on these machines (UI freezes and
> machines drop off of the network and become unreachable).

oh no…they are on “the network”…so, i’d suggest you do need to
consider the possibility that the freezes might be a sign that you have
had visitors…

i would begin by booting from a Live CD and closely inspecting for signs
of a root kit or whatever…

> This typically happens when the systems are idle.
> As for the frequency of freezing:
> some days several machines in different locations freeze; other times the
> machines work for days without a hiccup.

see, that could mean that a distant party is tasking the machines to
(say) spew spam, or participate in a DoS attack…

i am NOT saying that is what is going on…just that it is a possibility…

> These machines are usually built using the same hardware and imaged
> with PartImage.
>
> One thing we have noticed was that the frequency of the freezing has
> noticeably increased when we updated the machines from Java 1.4 to 1.6.

i’m running still supported 11.4 and SUN Java 1.6 and have had no
‘freeze’ in over a year…so, i can’t troubleshoot from here…

> When the application is run using the same Java version on a different
> set of hardware and on openSUSE 11.x, this problem goes away.

do you also have openSUSE 11.x in production? today only 11.4 is
supported, and that runs out in a couple of weeks…

you might need to look at converting your customers to a longer
supported distribution, or have a look at the enterprise version at
suse.com (there are many alternatives, of course)

> Nothing out of the ordinary was found when the logs in /var/log were
> investigated after recovering from a crash.

that is not good news, because root kits are routinely built to not
log their activity…

> Would appreciate any pointers or tools that can aid in the debugging.

only you can determine what is best, but i would highly recommend you
take steps to ensure the security of those systems…

and, hang around here as i am by FAR not the most experienced person
here and you may get a much different view (didn’t start messing with
linux until about '98)…might even run into one who also is running
10.2 and knows exactly which knob to turn to reinstate solid 24x7 service!!


dd http://tinyurl.com/DD-Caveat

Thanks for the heads up. But these machines are inside a VPN. And we have been able to replicate this problem in-house on a freshly imaged machine, so I do not think this is due to malicious access.

It is quite possible that your systems are not compromised. And running software for such a long time will assure you a stable system … as long as all goes well. But now you have something that almost nobody here will have. Thus it will be almost impossible for anybody here to compare or recreate your problems with something they have. Also the combination of Java 1.6 with openSUSE 10.2 will be a very scarce one (and that is a very optimistic phrase) and probably was never a supported or tested combination.

In other words, though people are generally willing to help, your chances are very low in this case. Sorry that I cannot be more optimistic.

hcvv wrote:
> It is very well possible that your systems are not compromised. And
> running software for such a long time will assure you a stable system
> … as long as all goes well. But now you have something that almost
> nobody here will have. Thus it will be almost impossible for anybody
> here to compare or recreate your problems with something they have. Also
> the combination of Java 1.6 with openSUSE 10.2 will be a very scarce one
> (and that is a very optimistic phrase) and probably was never a
> supported or tested combination.
>
> In other words, though people are generally willing to help, your
> chances are very low in this case. Sorry that I cannot be more
> optimistic.

I would agree with Henk’s assessment. The only thought I can offer is
that intermittent failures and changing rates of failure make me think
about race conditions and deadlocks. Perhaps the newer Java has more
parallelism or faster code paths or something like that.
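If a Java-level deadlock is suspected, the JVM can be asked directly. Below is a minimal sketch, assuming a Java 6 or later runtime (where `ThreadMXBean.findDeadlockedThreads()` is available); the class name `DeadlockWatchdog` and its methods are hypothetical, not part of the poster's application. It only detects deadlocks between Java threads, so it will not help if the kernel or the whole machine hangs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: asks the JVM whether any Java threads are deadlocked. */
public class DeadlockWatchdog {

    /** Returns one description per deadlocked thread, or an empty array if none. */
    public static String[] findDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = bean.getThreadInfo(ids, Integer.MAX_VALUE);
        String[] out = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // An entry is null if the thread exited between the two calls.
            out[i] = (infos[i] == null) ? "(thread no longer alive)" : infos[i].toString();
        }
        return out;
    }

    /** Starts a daemon thread that checks periodically and prints any deadlock to stderr. */
    public static Thread start(final long intervalMillis) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    for (String description : findDeadlocks()) {
                        System.err.println("DEADLOCK: " + description);
                    }
                    try {
                        Thread.sleep(intervalMillis);
                    } catch (InterruptedException e) {
                        return; // stop when interrupted
                    }
                }
            }
        }, "deadlock-watchdog");
        t.setDaemon(true); // do not keep the JVM alive just for the watchdog
        t.start();
        return t;
    }
}
```

Calling `DeadlockWatchdog.start(10000)` early in the application's startup would check every ten seconds. Redirecting stderr to a file keeps whatever was reported available after a restart.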

When you say that they drop off the network, do you mean that they are
completely unreachable using ping or telnet or whatever, or just that
your application doesn’t respond? And when this happens, do they still
respond to local console input (at shell level, not your app’s GUI)?

If the machine is completely dead, then it seems very likely to be a
kernel problem, and you will need to update the OS to fix the problem.
Or recode your app to avoid the bug. If the machine is still alive, then
maybe it’s a logic error in your app that you can fix.

My instinct would be to move to a current system - perhaps SLED or SLES
if you need long-term operation. If you can’t do that, I would start
adding debug logging to try to find out what it is doing just before it
crashes. Maybe debug logging would stop it being ‘idle’ enough so the
problem disappears :)
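As one possible form of such debug logging, here is a minimal sketch of a heartbeat that appends a timestamped line to a file at a fixed interval and forces it to disk, so the last line written before a freeze survives a hard reset. The class name `Heartbeat`, the log path and the logged fields are assumptions for illustration only. If the heartbeat keeps ticking while the UI is frozen, the problem is more likely in the application; if it stops along with everything else, the JVM or the machine itself has hung.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical helper: records periodic "still alive" lines that survive a hard freeze. */
public class Heartbeat {

    /** Appends one timestamped line and syncs it to disk. Returns false on I/O failure. */
    public static boolean beat(File logFile, String note) {
        String stamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date());
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(logFile, true); // true = append
            out.write((stamp + " " + note + "\n").getBytes("UTF-8"));
            out.getFD().sync(); // ask the OS to flush to the device before we continue
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (out != null) {
                try { out.close(); } catch (IOException ignored) { }
            }
        }
    }

    /** Starts a daemon thread that writes a heartbeat line every intervalMillis. */
    public static Thread start(final File logFile, final long intervalMillis) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    long freeKb = Runtime.getRuntime().freeMemory() / 1024;
                    beat(logFile, "alive threads=" + Thread.activeCount() + " freeKB=" + freeKb);
                    try {
                        Thread.sleep(intervalMillis);
                    } catch (InterruptedException e) {
                        return; // stop when interrupted
                    }
                }
            }
        }, "heartbeat");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

For example, `Heartbeat.start(new File("/var/log/pos-heartbeat.log"), 5000)` (the path is hypothetical) would leave a record accurate to about five seconds of when the machine stopped responding, which can then be compared across sites and against the entries in /var/log.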

On 10/25/2012 10:06 AM, silentecho wrote:
> so I do not think this is due to malicious access.

very happy to hear that!!

still, i wouldn’t know where to begin in helping to unravel the
problem…hmmmm (let me state the obvious):

  • the best time to leave an unsupported version is while it is still
    supported

  • and, there are several distributions with longer lives; one i know
    of, which is still supported, already existed the day someone there
    settled on 10.2, and that would be SLES 10 (with an easy step to
    SLES 11, which goes on and on…)

now, this just dawned on me: there might be a very easy move from
openSUSE 10.2 to SLES 10, and it has some years to go in support…and,
the cost to just have routine updates (without hand holding) is very
low; well, there are free forums, like these, for cost-free hand holding…

i do know they rolled SUN Java into SLES a while back…oh!! you can
free download and install SLES 10 or 11 with FREE 90 day support and
with that can update it to current (with the new java and kernel, etc
etc etc) and see how your app runs…if you like it you can just let the
free 90 days run out and patch it manually … something to think about…


dd http://tinyurl.com/DD-Caveat

IMHO, now is your moment to upgrade the OS. As already said, SLES or SLED would be a good option, or the latest openSUSE.