I have a customer who is running our Linux application server product on SUSE Linux. They are having a mysterious problem where the application server process terminates for no apparent reason. Nothing is written to the logs that the application itself creates, and nothing appears to be written to any of the system logs I checked. They also followed some instructions on the Novell website to enable core dumps, but no dump files are being created (which, again, leads me to believe the application is not actually crashing).
We suspect that some other process might be shutting down the daemon, but we really don’t know.
Is there some way to get an additional level of system logging, and somehow have the OS or a ‘watchdog’ process log any time that process is terminated, what signal it was terminated with, what the return value was (if any), and where the termination signal originated from (e.g. from the application itself, or from a different process)?
You could attach ‘strace’ to the process, though depending on the
complexity of the app this can get complicated quickly, and the
amount of data you generate will be substantial if it works properly in
any case. Other methods, well, I’m not sure. Does anything show up in
the various other logs? Does it happen at a particular time of day? Do
other machines (test/QA/etc.) do this at all?
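If strace turns out to be too heavy, a lighter option might be a small wrapper that starts the server as its child and logs how it ended. The following is only a minimal sketch in Python, not something from your product: the names describe_exit and run_and_log are made up for illustration, and it assumes the server can be run in the foreground (if it forks and detaches, the wrapper only sees the launcher exit). It reports the exit status or the signal number, but it cannot tell you which process sent the signal.

```python
import logging
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a Popen return code into a human-readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_log(cmd):
    """Start cmd as a child, wait for it to end, and log how it ended."""
    proc = subprocess.Popen(cmd)
    logging.info("started pid %d: %s", proc.pid, cmd)
    returncode = proc.wait()
    logging.warning("pid %d %s", proc.pid, describe_exit(returncode))
    return returncode


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

# Hypothetical usage (the path and flag are placeholders, not real options):
# run_and_log(["/path/to/appserver", "--foreground"])
```

A negative return code here means the child was killed by a signal, which would at least separate "the app called exit" from "something killed it".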
Good luck.
I’m not sure about strace: our application server already uses a pretty substantial amount of RAM (it caches a lot of data from a database in memory), and I’d be wary of adding more load on top of that. Thanks for the suggestion, though. I’ll keep it in mind, as it might be worthwhile.
I’ve asked the customer whether this issue happens on any other computer (or even a virtual machine) set up with substantially the same OS and software, but didn’t get any real response to that inquiry. I’m hoping to convince him to try it on another machine, as I think it’s just something wonky on that computer. We have quite a few other customers running our product on Linux, and haven’t heard any other complaints of this nature.