Can this be done? Automatically restarting a program when it crashes?

On 2012-03-18 12:16, hcvv wrote:
>
> Contemplating your information, I switched from thinking about using
> -cron- to something more in line with Carlos’ ideas.

> Code:
> --------------------
> #!/bin/bash
>
> # This runs the wspr program in an endless loop.
> # Because sometimes a line is read after wspr crashes, it reads from /dev/null.
> # To stop the endless loop type Ctrl-C.
>
> while true
> do wspr </dev/null
> echo "=== Restarting ==="
> done
> --------------------

I was thinking of this idea, from a script I used some time ago to restart
the nscd service, which used to crash randomly. It was run from cron… no, I’m
mistaken, I don’t remember how I started it. Anyway:


#!/bin/bash
# watchdog to restart the nscd service

# case idea taken from "307:rc"
/usr/sbin/rcnscd status ; status=$?
echo "Status= $status"
case $status in
    [1-47])  echo "failed"
        /bin/logger -p user.warn -t watchdog "nscd is not running, restarting. -- Bugzilla 387202; see root's crontab to disable this wd"
        /usr/sbin/rcnscd restart
        ;;
    [56])   echo "skipped"
        ;;
    0|*) echo "Nothing to do"
        ;;
esac

The idea is that programs use an exitcode to inform the parent of what
happened. A “0” means “no problems, normal exit”.
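
As a quick illustration of that point, bash keeps the exit code of the last command in `$?`. The commands `true` and `false` are only stand-ins here and have nothing to do with wspr:

```shell
#!/bin/bash
# 'true' always exits with 0, 'false' always exits with a nonzero code.
true
ok_status=$?
false
fail_status=$?
echo "true exited with $ok_status, false exited with $fail_status"
```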

Let me try to adapt your script with this idea:


#!/bin/bash
while true
do
    wspr
    status=$?
    case $status in
        0)  echo "Normal exit."
            exit
            ;;
        *)  echo "Exitcode= $status"
            echo "=== Problems! Restarting... ==="
            ;;
    esac
done

I have done no testing at all, there might be bugs. Of course, if the wspr
program always exits with zero, even when it crashes, it will not work. If it
does work, closing the program normally will also end the script.

If I were to use it, I would also add a timestamp.
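
A rough sketch of what the timestamp could look like, equally untested against wspr itself. The function name `run_until_clean_exit` is made up for this example, and the one-second pause between restarts is an assumption of mine, not something the restart loop needs:

```shell
#!/bin/bash
# Hypothetical helper: rerun a command until it exits with 0,
# printing a timestamp in front of every message.
run_until_clean_exit() {
    local status
    while true
    do
        "$@"
        status=$?
        if [ "$status" -eq 0 ]
        then
            echo "$(date '+%F %T') Normal exit."
            return 0
        fi
        echo "$(date '+%F %T') Exitcode= $status"
        echo "$(date '+%F %T') === Problems! Restarting... ==="
        sleep 1    # assumed pause, so a program that fails at once does not spin
    done
}

# Usage, assuming wspr is in the PATH:
#   run_until_clean_exit wspr
```

With the timestamps in the output it should be easier to tell afterwards when each restart happened.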


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

I understand (and in fact you mentioned this earlier and I then understood) fully how to use return codes.

It might be a better solution, but it would involve a lot of testing, especially on those crashes (of which there seem to be at least two versions, with and without reading from stdin). My idea is that when the OP is satisfied with what we have, we could spare him all that finding out and communicating about the return codes he gets.

On 2012-03-18 16:56, hcvv wrote:
>
> I understand (and in fact you mentioned this earlier and I then
> understood) fully how to use return codes.

I didn’t.
I had to search all my scripts, I did not remember how to do it. O:-)

> It might be a better solution, but it would involve a lot of testing,
> especially on those crashes (of which there seem to be at least two
> versions, with and without reading from -stdin-). My idea is that when
> the OP is satisfied with what we have, we could spare him all that
> finding out and communicating about the return codes he gets.

Well, it would not be a problem for me to write all those changes if it
were my problem, it is easy to do - but I was a programmer. I suppose he
can do that himself :-?

But basically we only have to consider two cases, 0 and the rest, which is
what the script covers. If it works, clicking close on the program would work.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

You could shorten the whole considerably:


while true
do      wspr </dev/null && break
      echo "Restarting"
done

But we have to know for certain what the return codes are when the program crashes. And we do not know whether they are nonzero.

When I use e.g. kwrite for a test, it returns 0 on a forced close of the window, but that is not a crash.
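
For comparison, here is a small sketch of what bash reports when a child really is killed by a signal. The child shell that sends itself SIGSEGV is only a stand-in for a crashing program; whether wspr ever dies this way is an assumption, since we have not seen it crash yet. Bash reports a child killed by signal N as 128+N, and SIGSEGV is signal 11 on Linux, so this should normally show 139:

```shell
#!/bin/bash
# Stand-in for a crashing program: a child shell that sends itself SIGSEGV.
bash -c 'kill -SEGV $$'
crash_status=$?
# A value above 128 means the child was terminated by a signal.
echo "Exit code after simulated crash: $crash_status"
```

A crash of that kind would make the `&& break` fail, so the loop would restart the program.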

On 2012-03-18 21:26, hcvv wrote:
>
> You could shorten the whole considerably:
>
> Code:
> --------------------
>
> while true
> do wspr </dev/null && break
> echo "Restarting"
> done
> --------------------

Yes… I guess. But you would not learn the exitcodes the software uses
(and I’m curious about those), which can also be useful for reporting upstream.

Why redirect the input from /dev/null? That part is not clear to me. I read
your comment on that, but I don’t understand it :-?


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

I am testing as I type this. It indeed works as you describe it.
I had a problem trying to make it executable from the command line, so I used the Permissions tab of the Dolphin properties dialog, which does the same thing but with a GUI.

http://www.slhess.com/pictures/runwspr.png

Please, please, do not post such difficult pictures. I thought that with more than 1000 posts here, you would have noticed that we post parts of computer sessions by copying/pasting them from the terminal window into the post between CODE tags. http://forums.opensuse.org/english/information-new-users/advanced-how-faq-read-only/451526-posting-code-tags-guide.html

You can of course use Dolphin or any other filemanager to do the chmod for you. But it is strange that the file can not be found. Did you place it there or somewhere else?

Now let us wait for a crash to happen. And please report back then.

No, I saved it to my /home folder.
The image was to show the file exists.
I guess I could have done that with a list command of some type.
I don’t use the command line that much so I remember a limited number of commands.

Thanks, this is quite useful to me.

It is not very clever to put the file elsewhere and then try to do something with it in the place I advised you to put it!

I strongly advise you to move it to the correct place. Your personal bin directory is made for it. Use it, it is in your PATH.

There is no .bin in my /home/flamebait directory and I am not allowed to copy it to /bin.
Do I need to elevate my privileges to move it?
Sorry if I am so dumb I don’t usually even think about this.

Answered my own question and the answer is yes.

On 2012-03-18 22:46, FlameBait wrote:
>
> There is no .bin in my /home/flamebait directory and I am not allowed to
> copy it to /bin.

Then create it. It is “/home/flamebait/bin”.
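
Something along these lines should do it. This is only a sketch, and it assumes the script is called runwspr and currently sits directly in your home directory:

```shell
#!/bin/bash
# Create the personal bin directory if it does not exist yet.
mkdir -p "$HOME/bin"

# Move the script there, assuming it is in the home directory right now.
if [ -f "$HOME/runwspr" ]
then
    mv "$HOME/runwspr" "$HOME/bin/"
fi

# Make sure the script is executable once it is in place.
if [ -f "$HOME/bin/runwspr" ]
then
    chmod u+x "$HOME/bin/runwspr"
fi
```

No root privileges are needed for any of this, because everything happens inside your own home directory.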


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Actually I moved it to /bin and did away with the need to


sh runwspr

On 2012-03-18 23:36, FlameBait wrote:
>
> Actually I moved it to /bin and did away with the need to

It would be better in “/home/flamebait/bin” as we told you, otherwise it
will be lost when you install the newer openSUSE version.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

It has been done. Path is /home/flamebait/bin now.

Who talked about putting something in .bin earlier in this thread?
Who talked about putting something in /bin earlier in this thread?
In both cases: nobody!

What is the use of me putting spare time into carefully preparing a solution, including the instructions to put it in place, when you do not follow them and invent characters that were never there along the way? This is rather frustrating.

hcvv I must have got my wires crossed. Sorry.
It’s been moved back to /home/flamebait

I am still waiting for something to disturb wspr but nothing has yet. Watch it work flawlessly in 12.1.
I know opening the pulseaudio mixer will crash it, as it steals the resources the stream is using.
So I don’t mess with the pulseaudio mixer while it’s running.

I repeat for the last time: it should be in /home/flamebait/bin and nowhere else.

LOL. That is where I moved it from. I am listening. Just thick sometimes. It’s back in there now.

OK, I have had my first instance of wspr crashing. However, this time it was in such a state that runwspr couldn’t catch it: it was locked up, so I had to press Ctrl-C to stop wspr and then run runwspr again. This is not the way wspr always exits or crashes, however, as I know from much past experience.

Here is the screen dump, including the Ctrl-C:


Exception in Tkinter callback
Traceback (most recent call last):
  File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 1410, in __call__
    return self.func(*args)
  File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 495, in callit
    func(*args)
  File "/usr//share/wspr/wspr.py", line 1106, in update
    draw_axis()
  File "/usr//share/wspr/wspr.py", line 447, in draw_axis
    c.create_line(0,j,i1,j,fill='black')
  File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 2201, in create_line
    return self._create('line', args, kw)
  File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 2189, in _create
    *(args + self._options(cnf, kw))))
TclError: invalid command name "7"

^CTraceback (most recent call last):
  File "/usr//share/wspr/wspr.py", line 1762, in <module>
    root.mainloop()
  File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 1017, in mainloop
    self.tk.mainloop(n)
KeyboardInterrupt