ATI FirePro M7740 went unstable - any ideas?

The temperature went from around 56°C (fairly constant) before to between 49 and 55°C now (varying, often around 50°C). But the show is not about temp, remember? :wink:

Will see if the machine is more stable now… Have to wait.

Hi
Yes, but it could also be hardware related, given the age of the equipment…

Also look at trying DRI set to 3;


Option "DRI" "3"

Post the output from the xorg log after a reboot, also reset the temperature settings manually…

I’m a little confused, should I enter

Option "DRI" "3"

in the console? What else? Why reboot? I’m getting old… :wink:

Hi
Put it in the 20-radeon.conf file, then reboot so the change takes effect (or just restart the X server).
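
For anyone else unsure where the option goes, here is a minimal sketch of what the relevant section might look like. It assumes the radeon driver and a Device section in 20-radeon.conf (usually under /etc/X11/xorg.conf.d/); the Identifier value is a made-up placeholder, so merge the Option line into your existing section rather than copying this as-is.

```shell
#!/bin/sh
# Print a minimal Device section with DRI 3 enabled.
# Assumption: the radeon driver is in use; "Radeon" is a hypothetical Identifier.
radeon_dri3_section() {
    cat <<'EOF'
Section "Device"
    Identifier "Radeon"
    Driver     "radeon"
    Option     "DRI" "3"
EndSection
EOF
}

# Show the snippet so it can be compared with the existing 20-radeon.conf
radeon_dri3_section
```

Nothing is written to disk here; the function only prints the text, so the actual file edit stays a manual step done as root.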

On Thu 23 Nov 2017 04:46:01 PM CST, malcolmlewis wrote:

Hi
FWIW, I created a new script, systemd service and sysconfig file for the newer
kernel options, plus an rpm. It works with both the radeon and amdgpu kernel
modules…

GitHub Project: malcolmlewis/amd_gpu_power_profile (a systemd service for setting AMD GPU power profiles)
Script: https://raw.githubusercontent.com/malcolmlewis/amd_gpu_power_profile/master/amd_gpu_power_profile
OBS Project: Welcome - openSUSE Build Service
Packages: openSUSE Software


Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE Leap 42.2|GNOME 3.20.2|4.4.92-18.36-default

Hi again!

I did this DRI 3 thing and rebooted. I manually set “battery” and “low” again (with the echo commands you proposed).

What makes me think it’s not related to the age of the hardware:

  • I don’t see it with Win7

  • It has never happened at night so far, always while doing something (minor stuff). I never use the machine for watching DVDs, 3D stuff or anything else that puts more load on the GPU

  • I tried another power supply, just in case. Didn’t help.

I could try Leap (I would have to install it on a USB stick or so…)

No crash for 48h! Haven’t had that for some weeks! Looking good. Wait’n see… :slight_smile:

No crashes for days, running very stable. Forgot to set the graphics to low energy after a reboot and it took just an hour to crash. Proof of concept…

GPU heatsink needs new thermal compound… stubborn dust bunnies in the fan? New kernel regression…?

Everything is fine here, as long as I manually set the GPU to low power mode.
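
For reference, the manual step can be sketched as below. This assumes the radeon DPM sysfs interface under /sys/class/drm/card0/device, with “battery” written to power_dpm_state and “low” to power_dpm_force_performance_level; the function name is made up for illustration, and the exact echo commands proposed earlier may have differed.

```shell
#!/bin/sh
# Write the "battery" state and "low" performance level to the radeon DPM files.
# $1: device directory (assumed default: card0). Needs root on real hardware.
set_gpu_low_power() {
    dev="${1:-/sys/class/drm/card0/device}"
    echo battery > "$dev/power_dpm_state" || return 1
    echo low > "$dev/power_dpm_force_performance_level" || return 1
}
```

Run `set_gpu_low_power` as root after each boot. The setting does not survive a reboot, which matches the crash seen after forgetting it.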

Hi
I redid the script and systemd service (see my previous post)… so if you install this, it will happen automatically… you just need to remove the radeon one.

Very cool!

What would I need to remove first? What is the best NOOB way to install? The package? But would I need to start the systemd thingy manually? :slight_smile:

Hi
Install, enable and start… you need to be the root user…


zypper in https://download.opensuse.org/repositories/home:/malcolmlewis:/TESTING/openSUSE_Tumbleweed/noarch/amd_gpu_power_profile-0.0.1-1.1.noarch.rpm

(run following command to see current status)
amd_gpu_power_profile

systemctl enable amd_gpu_power_profile
systemctl start amd_gpu_power_profile
systemctl status amd_gpu_power_profile

(run following command to see current status)
amd_gpu_power_profile

Profiles can be changed via the YaST /etc/sysconfig editor under System -> amd_gpu_power_profile

Pörfekt!

Followed your instructions, rebooted, and it came back with profile “battery” and “low”.

Many, many thanks, you saved my lovely old machine! rotfl!


PS:

You must spread some Reputation around before giving it to malcomlewis again.

Sorry, no star today… Maybe someone else? :shame:

Any chance this script needs an update? Since kernel 4.20, things are quite unstable again here :frowning: Any help is highly appreciated!

Hi
So you are using the radeon driver?

I need to see what’s down in (don’t have any radeon cards…);


ls -la /sys/class/drm/card0/device/

/sbin/lspci -nnk | grep -A3 VGA
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV740/M97 GL [FirePro M7740] [1002:94a3]
        Subsystem: Dell Device [1028:02ef]
        Kernel driver in use: radeon
        Kernel modules: radeon

and

ls -la /sys/class/drm/card0/device
lrwxrwxrwx 1 root root 0 Feb  7 15:55 /sys/class/drm/card0/device -> ../../../0000:01:00.0

Hi
My bad, meant;


ls -la /sys/class/drm/card0/device/

Missing the / to see what options are present…

…here we go :slight_smile: https://paste.opensuse.org/91427824 Just wanted to add: since the 4.20 kernel, the machine has several times gone into a state that was not even recoverable with ALT+PRINT REISUB.

Hi
So if you cat the following items, what do you see?

-rw-r--r--  1 root root      4096 Feb  7 15:58 power_dpm_force_performance_level
-rw-r--r--  1 root root      4096 Feb  7 15:58 power_dpm_state
-rw-r--r--  1 root root      4096 Feb  7 16:56 power_method
-rw-r--r--  1 root root      4096 Feb  7 15:58 power_profile
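
To save some typing, a small loop can print all four files in one go. This is only a sketch: it assumes the files live under /sys/class/drm/card0/device as in the listing, and the function name is made up for illustration.

```shell
#!/bin/sh
# Print the contents of the four radeon power files, one per line.
# $1: device directory (assumed default: card0).
show_gpu_power_files() {
    dev="${1:-/sys/class/drm/card0/device}"
    for f in power_dpm_force_performance_level power_dpm_state \
             power_method power_profile; do
        if [ -r "$dev/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dev/$f")"
        else
            # File missing or not readable on this system
            printf '%s: (not readable)\n' "$f"
        fi
    done
}

show_gpu_power_files
```

The files are world-readable according to the listing, so this should not need root.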