Problem with heat management on HP ZBook 15

Hi all,

The laptop is running openSUSE 12.3. Temperature under normal workload is in the range of 60-70 °C, which I already find a bit much. If I use the discrete NVIDIA card via Bumblebee’s optirun for a 3D game, it can go up to 90 °C. I find this is too much, and I really get sweaty hands from it.

Additional observations:

  1. A few weeks ago, the fan would from time to time make a noticeable sound, like expelling excess heat. That doesn’t seem to happen any more.
  2. When I close the lid of the laptop it gets really hot, even without a 3D application or optirun.

So, I was digging into the issue and came across acpi, sensors, sensors-detect and thermald:

Output of acpi -t as root:

If 'acpi' is not a typo you can use command-not-found to lookup the package that contains it, like this:
    cnf acpi

That is weird, isn’t it?

sensors gives:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +60.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +59.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +60.0°C  (high = +84.0°C, crit = +100.0°C)

acpitz-virtual-0
Adapter: Virtual device
temp1:        +56.0°C  (crit = +128.0°C)
temp2:         +0.0°C  (crit = +128.0°C)
temp3:         +0.0°C  (crit = +128.0°C)
temp4:         +0.0°C  (crit = +128.0°C)
temp5:        +32.0°C  (crit = +128.0°C)
temp6:       +114.0°C  (crit = +128.0°C)
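
A rough way to pick out the odd readings in output like the above is to parse it and flag anything that reads exactly zero or sits close to its crit value. This is only a sketch: the function name, the 15 °C margin and the idea that a 0.0 reading means an unpopulated zone are assumptions for illustration, not something lm-sensors itself reports. The sample text is the acpitz block from this post.

```python
import re

# Matches lines such as "temp6:  +114.0°C  (crit = +128.0°C)".
LINE = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r"(?:.*?crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C)?"
)

SAMPLE = """\
acpitz-virtual-0
Adapter: Virtual device
temp1:        +56.0°C  (crit = +128.0°C)
temp2:         +0.0°C  (crit = +128.0°C)
temp3:         +0.0°C  (crit = +128.0°C)
temp4:         +0.0°C  (crit = +128.0°C)
temp5:        +32.0°C  (crit = +128.0°C)
temp6:       +114.0°C  (crit = +128.0°C)
"""


def suspicious(sensors_text, margin=15.0):
    """Return [(label, temp, reason)] for readings worth a second look."""
    findings = []
    for line in sensors_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # chip name, adapter line or blank line
        label = match["label"]
        temp = float(match["temp"])
        crit = float(match["crit"]) if match["crit"] else None
        if temp == 0.0:
            # Assumption: a flat zero is a zone the firmware does not fill in.
            findings.append((label, temp, "reads zero"))
        elif crit is not None and crit - temp <= margin:
            findings.append((label, temp, "close to crit"))
    return findings


if __name__ == "__main__":
    for label, temp, reason in suspicious(SAMPLE):
        print(f"{label}: {temp:.1f} °C ({reason})")
```

To check a live system, the text passed to suspicious() could be the captured output of the sensors command instead of SAMPLE.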

sensors-detect spits out:
http://i.imgur.com/nfZ8XYd.png

I followed up on it by installing thermald, which is running:

thermald.service - Thermal Daemon Service
   Loaded: loaded (/usr/lib/systemd/system/thermald.service; enabled)
   Active: active (running) since Wed 2015-07-08 21:28:52 SGT; 18min ago
 Main PID: 2147 (thermald)
   CGroup: /system.slice/thermald.service
           └─2147 /usr/sbin/thermald --no-daemon --dbus-enable

Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 0:Processor
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 4:intel_powerclamp
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 6:intel_pstate
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 5:rapl_controller
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 0:Processor
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 4:intel_powerclamp
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 6:intel_pstate
Jul 08 21:31:15 voya.site thermald[2147]: thd_trip_cdev_state_reset index 5:rapl_controller
Jul 08 21:31:16 voya.site thermald[2147]: Read set point 0


But the behavior hasn’t changed. So I tried editing the thermald configuration, even though I read that thermald is buggy on Haswell, which is the architecture of my chip. In the process I came across /sys/class/thermal:
cooling_device0 through cooling_device3 all only list ‘Processor’ in the file ‘Device’
cooling_device4 lists ‘intel_powerclamp’, at least. But where is the fan? Is it not a cooling device? Or is it listed somewhere else?
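
On the question of where the fan might be listed: the kernel's thermal sysfs interface gives each cooling_deviceN directory a type, cur_state and max_state file, and fans exposed by a hwmon driver usually appear as fanN_input under /sys/class/hwmon rather than under /sys/class/thermal. A minimal Python sketch that lists both, assuming those standard paths. The function names are made up for illustration. On many laptops the fan is run by the embedded controller and shows up in neither place, and on older kernels the hwmon attributes may sit in a device/ subdirectory that this sketch does not look into.

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def cooling_devices(thermal_root="/sys/class/thermal"):
    """Return [(name, type, cur_state, max_state)] for each cooling device."""
    rows = []
    for path in sorted(glob.glob(os.path.join(thermal_root, "cooling_device*"))):
        rows.append((
            os.path.basename(path),
            read_text(os.path.join(path, "type")),
            read_text(os.path.join(path, "cur_state")),
            read_text(os.path.join(path, "max_state")),
        ))
    return rows


def fan_inputs(hwmon_root="/sys/class/hwmon"):
    """Return [(chip_name, file_name, rpm_text)] for every fan*_input found."""
    rows = []
    pattern = os.path.join(hwmon_root, "hwmon*", "fan*_input")
    for path in sorted(glob.glob(pattern)):
        chip = read_text(os.path.join(os.path.dirname(path), "name"))
        rows.append((chip, os.path.basename(path), read_text(path)))
    return rows


if __name__ == "__main__":
    for row in cooling_devices():
        print("cooling:", row)
    fans = fan_inputs()
    if not fans:
        print("no fan*_input files found under /sys/class/hwmon")
    for row in fans:
        print("fan:", row)
```

An empty fan list does not prove the fan is broken, only that no loaded driver reports it.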

I honestly don’t dare to go further on my own, so help from the community would be much appreciated. Please let me know which additional information might be needed.

Thanks in advance,

Kai

Hmm, so nobody?

Maybe it is better to make a thread about my acpi question only first…

On Thu, 16 Jul 2015 03:36:01 +0000, Kaiak wrote:

> Hmm, so nobody?
>
> Maybe it is better to make a thread about my acpi question only first…

Well, ultimately, if the hardware is being driven hard enough, then
you’re going to have heat generated.

The solution is to throttle the CPU or the GPU, if possible on the
system, but that will affect performance.

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

Thanks Jim, for the reply.

Maybe I should ask more generally:
Why doesn’t ‘acpi -t’ give any output?

Where would I find information about the fan controls in my system?
In /sys/class/thermal I don’t seem to find any hint of it.

And lastly, why does CPU activity, and consequently heat, go up when I am away from the keyboard?

I didn’t seem to have heat problems a few weeks ago, so I think the problem might have been introduced by one of the recent updates.

This is what it looks like when I don’t touch the computer for about a minute:
http://i.imgur.com/SwS1i67.png
So, clearly when not active the system load and temperature increase. Why is that? What is the computer doing?

Have you tried disabling Indexing, Baloo or whatever the hell it’s called nowadays?

On Fri, 17 Jul 2015 07:06:01 +0000, Kaiak wrote:

> This is what it looks like when I don’t touch the computer for about a
> minute:
> [image: http://i.imgur.com/SwS1i67.png]
> So, clearly when not active the system load and temperature increase.
> Why is that? What is the computer doing?

I would be inclined to use the ‘top’ command (in a terminal window) to
see what’s hitting the CPU.

Jim
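
For an idle-time spike that vanishes as soon as the keyboard is touched, it can help to log the busiest processes without watching top live. The following is only a sketch of that idea: it takes two snapshots of /proc/<pid>/stat and ranks processes by the CPU ticks they used in between. The function names and the 2 second interval are arbitrary choices for illustration.

```python
import glob
import time


def parse_stat_line(line):
    """Return (pid, command, cpu_ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so split on the last ')' and then on the first '('.
    head, _, rest = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = rest.split()
    # rest starts at field 3 (state); utime and stime are fields 14 and 15.
    ticks = int(fields[11]) + int(fields[12])
    return int(pid_text), comm, ticks


def snapshot(proc_root="/proc"):
    """Map pid -> (command, cpu_ticks) for every readable process."""
    result = {}
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as handle:
                pid, comm, ticks = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was malformed
        result[pid] = (comm, ticks)
    return result


def busiest(before, after, count=5):
    """Return [(delta_ticks, pid, command)], largest CPU use first."""
    deltas = []
    for pid, (comm, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, comm))
    return sorted(deltas, reverse=True)[:count]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(2)
    for delta, pid, comm in busiest(first, snapshot()):
        print(f"{delta:6d} ticks  pid {pid:6d}  {comm}")
```

Run in a loop with the output redirected to a file, this would leave a record of what was busy while nobody was at the machine.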



On 2015-07-08 16:16, Kaiak wrote:
>
> Hi all,
>
> The laptop is running openSUSE 12.3. Temperature under normal workload
> is in the range of 60-70 °C, which I already find a bit much. If I use
> the discrete NVIDIA card via Bumblebee’s optirun for a 3D game, it can go
> up to 90 °C. I find this is too much, and I really get sweaty hands from
> it.
>
> Additional observations:
> 1) A few weeks ago, the fan would from time to time make a noticeable
> sound, like expelling excess heat. That doesn’t seem to happen any more.
>
> 2) When I close the lid of the laptop it gets really hot, even without
> a 3D application or optirun.

Well, you should configure your machine to suspend or hibernate on lid
closing. I say this because on some laptops part of the air intake or
exhaust is on the top, so that the air flow is impeded when you close
the lid, and thus it overheats.

If that is not the case, on many laptops the keyboard itself is a heat
radiator, intentionally or accidentally; you close the lid and heat
builds up.

Especially on machines with powerful graphics.

I would also consider placing that laptop on top of a cooling fan
platform. They connect to a USB connector in the laptop itself, or via
a mains adaptor to AC. I prefer the latter, less stress on the laptop.


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Yes, I’m having the same problem on an HP 8510W laptop, also with an NVIDIA card. It started crashing randomly this week, and I realised that the fan was not running above idle. I updated everything via YaST on 7/7, and again on 7/20 (I had been away much of that time and didn’t use the machine much).

I have a partition with 13.1 on it, and the fan works normally when that is booted, so the update apparently broke something. The 7/20 update included lmsensors, so I reverted it, but that had no effect.

I had to install sensors to get the output, and it was simply this:

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +32.0°C (high = +105.0°C, crit = +105.0°C)

Hmm…I can’t seem to edit my previous message. Here is the Sensors output from 13.1; 13.2 is only seeing one CPU core as noted above…

acpitz-virtual-0
Adapter: Virtual device
temp1: +65.0°C (crit = +105.0°C)
temp2: +43.0°C (crit = +105.0°C)
temp3: +20.0°C (crit = +110.0°C)
temp4: +30.0°C (crit = +110.0°C)
temp5: +43.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +35.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +35.0°C (high = +105.0°C, crit = +105.0°C)

On 2015-07-22 13:56, rschaffter wrote:
>
> Hmm…I can’t seem to edit my previous message. Here is the Sensors
> output from 13.1; 13.2 is only seeing one CPU core as noted above…

Maybe you have to run “sensors-detect” again :-?


Cheers / Saludos,

Carlos E. R.


Yes, I did that. It did indicate that coretemp wasn’t loaded; loading it has helped stability, but the fan still isn’t running. My guess is that an update borked the chipset driver module. lsmod shows that none of the usual power management modules (fan, battery, button) are loaded, and modprobe thermal_sys doesn’t give an error, but doesn’t do anything, either.
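
The lsmod check described above can be repeated quickly on both installs (the working 13.1 and the broken one) with a few lines of Python that read /proc/modules, which is where lsmod gets its list. This is a sketch only: the list of names comes from this post, and the function names are made up. A driver that is compiled into the kernel never appears in /proc/modules, so a name reported here is not proof that the driver is absent.

```python
# Module names taken from the post above; adjust as needed.
WANTED = ["fan", "battery", "button", "thermal"]


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing(wanted, proc_modules_text):
    """Return the wanted names that are not listed as loaded, in order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not a Linux system, or /proc is not mounted
    print("not listed as loaded:", missing(WANTED, text))
```

Comparing the two lists side by side would show whether the update really changed which ACPI modules get loaded.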

Here’s what I’ve got; as can be seen, no fan, and a single-core Core2 Duo… :\

# inxi -xxxSMCGIsz
Resuming in non X mode: glxinfo not found. For package install advice run: inxi --recommends
System:    Host: 8510WLINUX Kernel: 3.16.7-21-desktop i686 (32 bit, gcc: 4.8.3) 
           Desktop KDE 4.14.9 (Qt 4.8.6) Distro: /etc/SuSE-release corrupted, use -% to override
Machine:   System: Hewlett-Packard product: HP Compaq 8510w version: F.0F Chassis: type: 10
           Mobo: Hewlett-Packard model: 30C5 version: KBC Version 71.36
           Bios: Hewlett-Packard version: 68MVD Ver. F.0F date: 02/05/2008
CPU:       Single core Intel Core2 Duo CPU T9300 (-UP-) cache: 6144 KB flags: (lm nx sse sse2 sse3 sse4_1 ssse3 vmx) bmips: 4987.48 clocked at 2493.741 MHz 
Graphics:  Card: NVIDIA G84GLM [Quadro FX 570M] bus-ID: 01:00.0 
           X.org: 1.16.1 driver: nvidia tty size: 80x37 Advanced Data: N/A for root 
Sensors:   System Temperatures: cpu: 55.0C mobo: N/A gpu: 0.0:75C 
           Fan Speeds (in rpm): cpu: N/A 
Info:      Processes: 168 Uptime: 8:15 Memory: 2096.7/3970.0MB Runlevel: 5 Gcc sys: 4.8.3 Client: Shell inxi: 1.7.24 

It sounds like your graphics card needs a new coat of thermal compound. That is a relatively common problem with modern graphics cards, and the fix is not software related: you’d need to remove the old thermal compound and apply a new coat. As the compound ages and dries out, it no longer transfers heat to the heatsink that the fan cools, so the fan runs faster as the temperature rises, but with bad contact your GPU chip will still overheat.
Before unscrewing your graphics card, make sure that it’s the graphics card that’s overheating and not the CPU (the thermal grease on CPUs can be replaced too, but I’ve never needed to do it).

edit. tldr
So it’s a laptop. I’m still thinking it’s NVIDIA’s fault. I’ve never done it under Linux, but there are a few Windows apps that show the graphics card’s temperature, and I’ve had personal experience with graphics cards overheating on both desktops and laptops.
If you have Windows, run GPU-Z to see your graphics card’s temperature:
http://www.techpowerup.com/downloads/SysInfo/GPU-Z/
See if the temperature rises abnormally while watching accelerated videos (use mpc-hc or mpc-be under Windows).
Changing the thermal grease on a laptop’s graphics chip is a real pain, as getting to it might require dismantling the whole machine; better hope it’s something else.

On 2015-07-23 04:36, I A wrote:
>
> it sounds like your graphic card needs a new coat of thermal compound,
> it’s a relatively common problem with modern graphic cards, the fix is
> not software related,

He said that there is no problem when he boots 13.1 instead, which
proves it is a software issue.


Cheers / Saludos,

Carlos E. R.


The nvidia-settings program has temperature reporting.
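
With the proprietary driver, the same reading is also available on the command line through nvidia-settings -q GPUCoreTemp -t, which prints just the value. A small Python sketch that wraps that call, assuming nvidia-settings is installed and an X session is running (the function names are made up for illustration, and whether the attribute is available depends on the card and driver version):

```python
import subprocess


def parse_terse(output):
    """Return the integer readings from terse nvidia-settings output."""
    return [int(line) for line in output.split() if line.isdigit()]


def gpu_core_temps():
    """Query GPUCoreTemp; return a list of readings in °C, or [] on failure."""
    try:
        result = subprocess.run(
            ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return []  # nvidia-settings is not installed
    if result.returncode != 0:
        return []  # no X display, or the driver rejected the query
    return parse_terse(result.stdout)


if __name__ == "__main__":
    temps = gpu_core_temps()
    if temps:
        print("GPU core temperature(s):", temps)
    else:
        print("could not read GPUCoreTemp")
```

Logging this next to the CPU readings would show whether it is the GPU or the CPU that heats up.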

Be sure that the kernel-firmware package is installed.