29-Jun-2009, 10:09
pmichelmp3
Screen saver hangs in select() call - kernel problem?

I'm creating a new thread because none of the existing screen saver threads address my problem.

Problem:
On my laptop, many different screen savers will start and sometimes eat up all the CPU time. At that point the screen saver process is jammed somewhere in a kernel call, so even if I press a key, the screen saver still owns the keyboard and mouse input. That totally prevents me from unlocking it, and also from opening a tty shell with CTRL-ALT-F?
Often I have had to use a power reset to reboot the laptop... quite embarrassing for a Unix system.

Based on an strace log, I'm pretty sure the screen saver hangs in a select() kernel call... but why?

Note: when the laptop jams, I can ssh to it, log in and kill the faulty screen saver process.
Via ssh, before killing the process, I tried to attach to it using gdb, but even gdb can't attach: it tries, but I never get a prompt, so I can't do "where".


Sometimes I don't have the luxury of having another PC to ssh from!

Please don't tell me to simply disable the screen saver; I have seen that lame solution many times in other threads. The reason I'm posting this is to find a solution and exchange with resourceful people... maybe learn something in the process, and improve Linux.

--------------------------------------

Here is the latest development:
I made a cron job that monitors for a greedy screen saver process and does a few things when this occurs:
1- it attaches an strace to it
2- it checks the CPU usage and kills the process if required, using SIGABRT (to create a core dump)
...all this goes to log files.
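
The CPU check and kill step could look roughly like the sketch below. This is not my actual cron job, only an illustration under assumptions: the 90% threshold, the function names and the choice of `ps -eo pid,pcpu,args` are all hypothetical, and the strace step is left out.

```python
import os
import signal
import subprocess

CPU_LIMIT = 90.0          # percent; hypothetical threshold
TARGET = "keuphoria.kss"  # the screen saver forced in this post


def parse_ps(ps_output):
    """Parse `ps -eo pid,pcpu,args` output into (pid, cpu, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), float(parts[1]), parts[2]))
    return rows


def greedy_pids(rows, target=TARGET, limit=CPU_LIMIT):
    """Return PIDs whose command mentions `target` and whose CPU% exceeds `limit`."""
    return [pid for pid, cpu, cmd in rows if target in cmd and cpu > limit]


def check_once():
    """One watchdog pass: list all processes, then abort the greedy ones."""
    out = subprocess.run(["ps", "-eo", "pid,pcpu,args"],
                         capture_output=True, text=True, check=True).stdout
    for pid in greedy_pids(parse_ps(out)):
        # SIGABRT's default action is to terminate and dump core.
        os.kill(pid, signal.SIGABRT)
```

Keep in mind that the `pcpu` value from ps is averaged over the lifetime of the process, so a process that only recently started spinning takes a while to cross the threshold.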

To make the cron job simpler, I forced the keuphoria.kss screen saver (instead of having a random one).

Below I'm pasting the strace log and a "ps" extract.

So now I'm pretty sure it hangs in the select() kernel call that usually times out...
But I'm kind of stuck. I'm not sure what I can do next. It looks like a kernel problem: I don't think a process should hang so deep that even gdb can't attach to it...

Is it possible that, for some reason, the select() call sometimes does not time out?
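
For reference, here is a minimal sketch of the two ways select() appears in the strace log: with a timeout (the `{0, 378}` form is a timeval of 0 seconds and 378 microseconds) and with a NULL timeout, as in `select(4, [3], [3], NULL, NULL)`, which waits until a descriptor is ready. The pipe and the 50 ms value are only illustrative assumptions, not taken from the screen saver.

```python
import os
import select
import time


def wait_readable(fd, timeout):
    """Wait until fd is readable or the timeout expires; return (ready, seconds waited)."""
    start = time.monotonic()
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable), time.monotonic() - start


r, w = os.pipe()                      # empty pipe: r is not readable yet
timed_out = wait_readable(r, 0.05)    # 50 ms timeout, like {0, 50000} in strace notation
os.write(w, b"x")                     # now r has data
no_timeout = wait_readable(r, None)   # None is the NULL timeout: waits until ready
os.close(r)
os.close(w)
print("with timeout:", timed_out[0], "without:", no_timeout[0])
```

With a timeout, the call comes back even though nothing happened on the descriptor. Without one, it only returns because data was written first.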



STRACE LOG: (last lines)
read(3, 0x6518f4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
select(14, [3 4 7 9 13], [], [], {0, 378}) = 0 (Timeout)
ioctl(14, 0x4008642a, 0x7ffff76839d0) = 0
ioctl(14, 0xc0086444, 0x7ffff7683970) = 0
ioctl(14, 0x40046445, 0x7ffff7683960) = 0
ioctl(14, 0x40206443, 0x7ffff7683a10) = 0
ioctl(14, 0xc0086444, 0x7ffff76839c0) = 0
ioctl(14, 0xc0086444, 0x7ffff7683970) = 0
ioctl(14, 0x40046445, 0x7ffff7683960) = 0
ioctl(14, 0x40206443, 0x7ffff7683a10) = 0
ioctl(14, 0xc0086444, 0x7ffff76839c0) = 0
brk(0xd10000) = 0xd10000
brk(0xd50000) = 0xd50000
ioctl(14, 0xc0086444, 0x7ffff7683ad0) = 0
ioctl(14, 0x40046445, 0x7ffff7683ac0) = 0
ioctl(14, 0xc0086444, 0x7ffff7683ad0) = 0
ioctl(14, 0x40046445, 0x7ffff7683ac0) = 0
ioctl(14, 0xc0086444, 0x7ffff7683b10) = 0
ioctl(14, 0x40046445, 0x7ffff7683b00) = 0
ioctl(14, 0x40206443, 0x7ffff7683bb0) = 0
ioctl(14, 0xc0086444, 0x7ffff7683b60) = 0
ioctl(14, 0xc0086444, 0x7ffff76839c0) = 0
ioctl(14, 0x40046445, 0x7ffff76839b0) = 0
ioctl(14, 0x40206443, 0x7ffff7683a60) = 0
ioctl(14, 0xc0086444, 0x7ffff7683a10) = 0
ioctl(14, 0x6458, 0) = -1 EINVAL (Invalid argument)
select(4, [3], [3], NULL, NULL) = 1 (out [3])
writev(3, [{"\227\5\6\0\233\2 \4\26\10\"\5\1\0\1\0.\6#\5\351\1.\0\234\4\3\0|\0\0 \0"..., 44}], 1) = 44
read(3, 0x6518f4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
select(14, [3 4 7 9 13], [], [], {0, 15323}) = 0 (Timeout)
read(3, 0x6518f4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
select(14, [3 4 7 9 13], [], [], {0, 33}) = 0 (Timeout)
--- SIGINT (Interrupt) @ 0 (0) ---
Process 6061 detached
^^^^---- The cron job sends a SIGABRT to force a core dump, but according to strace the process receives a SIGINT
*** and no core file is generated anywhere
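
One thing worth checking for the missing core file is the core size limit of the screen saver process itself: a soft limit of 0 disables core dumps. Whether that is the cause here is only an assumption. A small sketch that reads the limit from /proc/<pid>/limits (the function names are hypothetical):

```python
def core_limit(limits_text):
    """Return the (soft, hard) 'Max core file size' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            fields = line.split()
            return fields[4], fields[5]  # soft and hard limit columns
    return None


def read_core_limit(pid):
    """Read the core file size limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return core_limit(fh.read())
```

For example, `read_core_limit(7381)` would report the limits of the keuphoria process from the ps listing while it is still alive.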

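The ioctl request numbers in the strace log can be split into their bit fields with the sketch below, which assumes the generic Linux ioctl layout (as used on x86_64). All of them carry type 0x64, the letter 'd', which is the letter the DRM graphics subsystem uses for its ioctls, so fd 14 is probably the graphics device. That is an inference from the numbers, not something the log states.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its generic bit fields."""
    return {
        "dir": (request >> 30) & 0x3,        # 0 none, 1 write, 2 read, 3 read+write
        "size": (request >> 16) & 0x3FFF,    # size of the argument struct in bytes
        "type": chr((request >> 8) & 0xFF),  # subsystem letter
        "nr": request & 0xFF,                # command number within that type
    }


# Request numbers copied from the strace log in this post.
for req in (0x4008642A, 0xC0086444, 0x40046445, 0x40206443, 0x6458):
    print(hex(req), decode_ioctl(req))
```
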
-------
LOG of "ps" just before the cron job kills the screen saver:

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
...
1 S pmichel 4524 1 0 80 0 - 38176 - 08:19 ? 00:00:01 kdesktop [kdeinit]
0 S pmichel 7378 4524 0 80 0 - 33084 - 08:49 ? 00:00:00 /opt/kde3/bin/kdesktop_lock
0 R pmichel 7381 7378 58 99 19 - 31684 - 08:49 ? 00:02:11 /opt/kde3/bin/keuphoria.kss -root

^^^--- see, keuphoria is in state R and eating CPU time (C = 58, TIME 00:02:11; the 99 in that row is the PRI column)
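
Since the columns of the ps listing above are easy to misread, here is a small sketch that pairs each header name with its value for the keuphoria row. In this ps format, C is the processor utilization, PRI the priority and NI the nice value. The helper name is hypothetical.

```python
HEADER = "F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD"
ROW = ("0 R pmichel 7381 7378 58 99 19 - 31684 - 08:49 ? 00:02:11 "
       "/opt/kde3/bin/keuphoria.kss -root")


def label_columns(header, row):
    """Pair each ps column name with its value; CMD keeps its embedded spaces."""
    names = header.split()
    values = row.split(None, len(names) - 1)
    return dict(zip(names, values))


fields = label_columns(HEADER, ROW)
for name in ("S", "C", "PRI", "NI", "TIME"):
    print(name, fields[name])
```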


Thanks for your input
__________________
Box: Asrock AGP/PCIe x86_64 - openSUSE 11.1 - Core2 Quad - 2Gig ram - GeForce 7600 GS - Kde 3.5.9r49.1
Lap: Dell D630 x86_64 - openSUSE 11.1 - Core2 Duo - 4Gig ram - Intel 965 GM - Kde 3.5.10r21.11