freeze up problem, may be related to 4965 card (?)

I too have the freeze-up problem that many others have reported, and I’ve been trying to solve it for days now. I have a Toshiba laptop with an Intel 4965 card and am running 11.1 (64-bit) with KDE 4.1. I get lockups several times a day, sometimes multiple times in one hour. About 70% of the time the keyboard and mouse are completely unresponsive and the Caps Lock LED is flashing. The only solution is a hard reboot using the power button (ouch). The other 30% of the time the only difference is that the LED is not flashing.

No events are recorded in any logs at the time of the lockups, and they’ve occurred at apparently random times: with large file transfers going on, with no transfers going on, while typing in KWrite with no other activity, while surfing this forum and many others for solutions to this problem, during idle time, and so on.

I’d actually had the same problem earlier with 11.0 and resolved it then by going from 228.57.2.21-1.9 firmware for the 4965 to 228.57.2.21-1.10, but I’ve tried the same with 11.1 with no luck.

I’d updated to 11.1 when the problem reoccurred, so I reformatted the partitions and started over with a fresh install – no difference.

(I’ve removed the Beagle completely.)

I do note one interesting thing in system activity that I do not understand: three instances of iwlagn running (iwlagn, iwlagn/0 and iwlagn/1). I don’t have a clue what that might mean.

I’ve read many, many reports here and elsewhere of others with the same problem, so I’m not alone. (Note https://bugzilla.novell.com/show_bug.cgi?id=444149 in particular.)

Any thoughts?

I’ve had to hard reboot five times already this morning. The last time I rebooted the PC I hadn’t even touched it yet before it froze up. I hope somebody has some ideas, because this is the worst problem I’ve had since I started with SUSE 9.something or other. I’m completely stumped.

caprus wrote:
> I’ve had to hard reboot five times already this morning. The last time
> I rebooted the PC I hadn’t even touched it yet before it froze up. I
> hope somebody has some ideas, because this is the worst problem I’ve had
> since I started with SUSE 9.something or other. I’m completely stumped.

Have you done an extended memory test?

Sorry to say I’ve spent most of my tech career working on MS-based OSes. How do I test extended mem with openSUSE?

Also, just curious why you suspect extended mem if (a) many of the freeze-ups occur while the system is idle and (b) the system worked with 11.0 but not with 11.1.

No criticism intended, grateful for the help, but curious about the reasoning.

caprus wrote:
> Also, just curious why you suspect extended mem if (a) many of the
> freeze-ups occur while the system is idle and (b) the system worked with
> 11.0 but not with 11.1.
>
> No criticism intended, grateful for the help, but curious about the
> reasoning.

Whenever someone says that their system has started freezing for no reason, I
always suspect memory. That is particularly true when they have just come from
Windows, but a new kernel version will exercise memory differently.

I said nothing about extended memory. Testing with Linux is the same as for
Windows - run memtest86+. Check your installation medium to see if it has an
entry for testing memory. If not, download the memtest86+ iso image and burn it
to a CD.

Larry

memtest86+ complete, no errors found

caprus wrote:
> memtest86+ complete, no errors found

How long did it run? Sometimes it takes 24 hours to expose a memory error that
Linux finds in minutes.

11.5 hrs, 2 passes completed. (Can’t spare the PC more than that, had to use it.)

Perhaps that was sufficient.

Do you have another Linux machine on your local network? If so, you should try
to set up a netconsole to capture any kernel output when the freezes occur.
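
On the receiving machine, anything that listens on a UDP port will do, since netconsole sends the kernel messages as plain text over UDP. Here is a minimal sketch in Python, assuming netconsole's usual default target port of 6666; the output file name `netconsole.log` and the function names are just placeholders for illustration.

```python
import socket
from datetime import datetime


def format_record(payload: bytes, sender: str, when: datetime) -> str:
    """Turn one received datagram into a timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{when.isoformat(timespec='seconds')} {sender} {text}"


def listen(port: int = 6666, logfile: str = "netconsole.log") -> None:
    """Append every datagram received on `port` to `logfile` until interrupted.

    The port default is an assumption (netconsole's usual target port);
    the log file name is hypothetical.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        with open(logfile, "a", encoding="utf-8") as out:
            while True:
                payload, (sender, _) = sock.recvfrom(65535)
                out.write(format_record(payload, sender, datetime.now()) + "\n")
                # Flush after every message: the sender may freeze at any moment.
                out.flush()
```

Call `listen()` on the second machine and leave it running. The freezing machine still has to be pointed at the listener's address through the netconsole boot parameter or module option; the kernel's netconsole documentation gives the exact syntax.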

I’m setting up an old laptop for the purpose. I will post results when I get them.

I just reviewed the log file after the seventh freeze-up of the day. For the first time I actually saw some entries during the period of time preceding the freeze-up, and it was at a time when nobody was home, and the PC had been idle for over an hour. Don’t know if there’s anything helpful here.

(I’m burning a DVD now so I can load the second PC. Will continue working on that project.)

Jan 6 15:33:36 linux-x205 -- MARK --
Jan 6 15:42:38 linux-x205 smartd[3255]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 137 to 144
Jan 6 15:42:39 linux-x205 smartd[3255]: Device: /dev/sdb [SAT], SMART Usage Attribute: 9 Power_On_Hours changed from 83 to 82
Jan 6 15:56:01 linux-x205 kernel: CE: hpet increasing min_delta_ns to 15000 nsec
Jan 6 15:57:05 linux-x205 kernel: CE: hpet increasing min_delta_ns to 22500 nsec
Jan 6 16:06:37 linux-x205 kernel: CE: hpet increasing min_delta_ns to 33750 nsec
Jan 6 16:12:23 linux-x205 syslog-ng[2339]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=330’, processed=‘center(received)=274’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=4’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=17’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=154’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=112’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=17’, processed=‘destination(warn)=20’, processed=‘source(src)=274’
Jan 6 16:33:56 linux-x205 -- MARK --
Jan 6 16:42:38 linux-x205 smartd[3255]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 144 to 148
Jan 6 17:02:38 linux-x205 -- MARK --
Jan 6 17:12:23 linux-x205 syslog-ng[2339]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=338’, processed=‘center(received)=282’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=4’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=17’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=158’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=116’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=17’, processed=‘destination(warn)=20’, processed=‘source(src)=282’
Jan 6 17:12:38 linux-x205 smartd[3255]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 148 to 144
Jan 6 17:32:38 linux-x205 -- MARK --

caprus wrote:
> I just reviewed the log file after the seventh freeze-up of the day. For
> the first time I actually saw some entries during the period of time
> preceding the freeze-up, and it was at a time when nobody was home, and
> the PC had been idle for over an hour. Don’t know if there’s anything
> helpful here.
>
> (I’m burning a DVD now so I can load the second PC. Will continue
> working on that project.)
>
> Jan 6 15:33:36 linux-x205 -- MARK --
> Jan 6 15:42:38 linux-x205 smartd[3255]: Device: /dev/sda [SAT], SMART
> Usage Attribute: 194 Temperature_Celsius changed from 137 to 144
> Jan 6 15:42:39 linux-x205 smartd[3255]: Device: /dev/sdb [SAT], SMART
> Usage Attribute: 9 Power_On_Hours changed from 83 to 82
> Jan 6 15:56:01 linux-x205 kernel: CE: hpet increasing min_delta_ns to
> 15000 nsec
> Jan 6 15:57:05 linux-x205 kernel: CE: hpet increasing min_delta_ns to
> 22500 nsec
> Jan 6 16:06:37 linux-x205 kernel: CE: hpet increasing min_delta_ns to
> 33750 nsec

The smartd messages can be ignored. I think that the temps are in F,
not C. On the other hand, the CE: hpet messages indicate that the high
precision timer is failing.
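
To see how those adjustments progress, the values can be pulled straight out of the log. This is a rough sketch that assumes the message format shown in the quoted lines above; the function name is only for illustration.

```python
import re

# Pattern taken from the message text in the log excerpt quoted in this thread.
HPET_RE = re.compile(r"CE: hpet increasing min_delta_ns to (\d+) nsec")


def hpet_min_deltas(log_lines):
    """Return the min_delta_ns values in order of appearance."""
    values = []
    for line in log_lines:
        match = HPET_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


sample = [
    "Jan 6 15:56:01 linux-x205 kernel: CE: hpet increasing min_delta_ns to 15000 nsec",
    "Jan 6 15:57:05 linux-x205 kernel: CE: hpet increasing min_delta_ns to 22500 nsec",
    "Jan 6 16:06:37 linux-x205 kernel: CE: hpet increasing min_delta_ns to 33750 nsec",
]
deltas = hpet_min_deltas(sample)
# Ratio between each value and the one before it.
ratios = [later / earlier for earlier, later in zip(deltas, deltas[1:])]
print(deltas, ratios)
```

For the three lines posted, each value is 1.5 times the previous one. Feeding the function the whole of /var/log/messages would show whether the increases keep coming before every freeze.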

To test, try adding an option to the boot line in the GRUB menu. You
should add ‘clocksource=XXX’, where XXX is tsc, jiffies, or pit.
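
After each reboot it is worth confirming which clocksource the kernel actually ended up using, since it may fall back to another one. On kernels that expose the clocksource sysfs directory, that means reading two small files. A sketch, assuming the usual path `/sys/devices/system/clocksource/clocksource0` (the helper name is made up for illustration):

```python
from pathlib import Path

# Usual sysfs location; treat it as an assumption and adjust if your kernel differs.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.is_dir():
        current, available = read_clocksources()
        print("current:  ", current)
        print("available:", " ".join(available))
    else:
        print("clocksource sysfs directory not found")
```

If the name passed on the boot line does not show up as the current clocksource, the kernel did not accept it, and the test for that option does not count.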

Larry

Interesting results:

With clocksource=tsc the system froze as soon as the boot splash screen appeared (tried multiple times, always the same).

With clocksource=jiffies the system would not go into runlevel 5 (no matter what I tried).

With clocksource=pit the system boots OK and runs, but the appearance of the screen is terrible: gradients appear as steps, and fonts and graphics are horribly pixelated.

I’m going to run with =pit long enough to see if the system locks up or not. If it seems to be stable then I guess I’ll have to look at my video drivers (again, arghh!)

Running with the PIT timer shouldn’t cause a problem. On my Turion 64
X2 CPU, the TSC counter cannot be used as it changes frequency when
the clock is slowed for power saving.

In your dmesg output, you should see a section that looks something like

Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC: PIT calibration confirmed by PMTIMER.
TSC: using PIT calibration value
Detected 2000.126 MHz processor.
spurious 8259A interrupt: IRQ7.
Console: colour dummy device 80x25
console [tty0] enabled
Checking aperture…
No AGP bridge found
Node 0: aperture @ f002000000 size 32 MB
Aperture beyond 4GB. Ignoring.

Find that section and post it. I would be interested in seeing it both
with the =pit boot option, and with no option - if the system will
stay up long enough for you to capture it. To help you with that, use
the command ‘dmesg > boot_noopt’ to capture the data to the file
boot_noopt. You can then reboot with the =pit option and examine the
file at leisure.
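
Once both captures exist, comparing them by eye is tedious. Here is a short sketch that keeps only the timer-related lines of a saved dmesg file so the two boots can be read side by side. The keyword list and the second file name `boot_pit` are assumptions for illustration, not anything the kernel defines.

```python
import sys

# Keywords chosen for illustration; extend or trim as needed.
KEYWORDS = ("tsc", "hpet", "pit", "clocksource", "pmtimer", "delay loop")


def timer_lines(lines):
    """Keep only lines mentioning one of the keywords (case-insensitive)."""
    return [line.rstrip("\n") for line in lines
            if any(word in line.lower() for word in KEYWORDS)]


def timer_lines_from_file(path):
    """Read a file written with `dmesg > file` and return its timer-related lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return timer_lines(handle)


if __name__ == "__main__":
    # Example: python timer_lines.py boot_noopt boot_pit
    for path in sys.argv[1:]:
        print(f"== {path} ==")
        for line in timer_lines_from_file(path):
            print(line)
```

The match is a plain substring test, so it will also catch unrelated lines that happen to contain one of the keywords; that is acceptable for a quick look.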

Larry

system has crashed twice since the change to clocksource=pit

I’m tempted to remove the line, but I’ll await your input first.

Kernel command line: root=/dev/disk/by-id/ata-TOSHIBA_MK1637GSX_77AXF7DGS-part7 resume=/dev/disk/by-id/ata-TOSHIBA_MK1637GSX_77AXF7DGS-part6 splash=silent clocksource=pit vga=0x314
bootsplash: silent mode.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC: PIT calibration confirmed by PMTIMER.
TSC: using PIT calibration value
Detected 1994.986 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Checking aperture…
No AGP bridge found
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Placing software IO TLB between 0x20000000 - 0x24000000
Memory: 4044756k/5242880k available (2731k kernel code, 147928k reserved, 2620k data, 1472k init)

Here is the same section with the pit entry removed. (The system crashed three more times before I could capture and send this.)

Still trying to set up the second system, but the hard drive is bad in it. (I’m not having a good day)

Kernel command line: root=/dev/disk/by-id/ata-TOSHIBA_MK1637GSX_77AXF7DGS-part7 resume=/dev/disk/by-id/ata-TOSHIBA_MK1637GSX_77AXF7DGS-part6 splash=silent vga=0x314
bootsplash: silent mode.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC: PIT calibration confirmed by PMTIMER.
TSC: using PIT calibration value
Detected 1994.961 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Checking aperture…
No AGP bridge found
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Placing software IO TLB between 0x20000000 - 0x24000000
Memory: 4044756k/5242880k available (2731k kernel code, 147928k reserved, 2620k data, 1472k init)
CPA: page pool initialized 1 of 1 pages preallocated
hpet clockevent registered
Calibrating delay loop (skipped), value calculated using timer frequency… 3989.92 BogoMIPS (lpj=7979844)

Here’s the history of one entire session (with clocksource=pit) from boot to crash with the PC not touched (I was out of the building, so it was idle):

Jan 7 08:50:09 linux-x205 -- MARK --
Jan 7 08:52:10 linux-x205 smartd[3189]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 141 to 144
Jan 7 09:21:56 linux-x205 syslog-ng[2347]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=299’, processed=‘center(received)=246’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=2’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=16’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=138’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=102’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=16’, processed=‘destination(warn)=19’, processed=‘source(src)=246’
Jan 7 09:41:56 linux-x205 -- MARK --
Jan 7 09:52:10 linux-x205 smartd[3189]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 144 to 148
Jan 7 10:21:56 linux-x205 syslog-ng[2347]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=304’, processed=‘center(received)=251’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=2’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=16’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=141’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=104’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=16’, processed=‘destination(warn)=19’, processed=‘source(src)=251’
Jan 7 10:41:56 linux-x205 -- MARK --
Jan 7 10:52:10 linux-x205 smartd[3189]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 148 to 144
Jan 7 11:21:56 linux-x205 syslog-ng[2347]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=309’, processed=‘center(received)=256’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=2’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=16’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=144’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=106’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=16’, processed=‘destination(warn)=19’, processed=‘source(src)=256’
Jan 7 11:41:56 linux-x205 -- MARK --
Jan 7 12:01:56 linux-x205 -- MARK --
Jan 7 12:21:56 linux-x205 syslog-ng[2347]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=314’, processed=‘center(received)=261’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=2’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=16’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=147’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=108’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=16’, processed=‘destination(warn)=19’, processed=‘source(src)=261’
Jan 7 12:41:56 linux-x205 -- MARK --
Jan 7 13:01:56 linux-x205 -- MARK --
Jan 7 13:21:56 linux-x205 syslog-ng[2347]: Log statistics; dropped=‘pipe(/dev/xconsole)=0’, dropped=‘pipe(/dev/tty10)=0’, processed=‘center(queued)=319’, processed=‘center(received)=266’, processed=‘destination(newsnotice)=0’, processed=‘destination(acpid)=2’, processed=‘destination(firewall)=0’, processed=‘destination(null)=2’, processed=‘destination(mail)=2’, processed=‘destination(mailinfo)=2’, processed=‘destination(console)=16’, processed=‘destination(newserr)=0’, processed=‘destination(newscrit)=0’, processed=‘destination(messages)=150’, processed=‘destination(mailwarn)=0’, processed=‘destination(localmessages)=0’, processed=‘destination(netmgm)=110’, processed=‘destination(mailerr)=0’, processed=‘destination(xconsole)=16’, processed=‘destination(warn)=19’, processed=‘source(src)=266’
Jan 7 13:22:10 linux-x205 smartd[3189]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 144 to 141
Jan 7 13:42:10 linux-x205 -- MARK --

caprus wrote:
> system has crashed twice since the change to clocksource=pit

Are these crashes the same as the ones with the default clocksource?
Do they occur at the same frequency? When the “crash” happens, what do
you see? Do you have a “caps lock” light? Is it flashing at about 1 Hz
when the crash happens?

Larry

Yes, they are the same. Intervals vary from 5 minutes to 10+ hours. The system freezes completely, i.e. the mouse cursor does not move, mouse buttons appear to have no effect, and the keyboard is dead. The only visible sign is the Caps Lock LED flashing as you describe, at about 1 Hz.

(I have had a couple of crashes without the flashing LED, but not many.)