Mobility Radeon HD 5730 / 6570M - Problem after update

paulcee · August 2, 2018, 3:18pm

Greetings:

Registered this morning, with the intention of posting a question regarding the Mobility Radeon HD 5730/6570M card and Tumbleweed. However, I have resolved the problem, and wanted to
share the solution.

Problem occurred after the 2018-07-27 Tumbleweed update. Running on an older iMac 27, with an i7, 16GB…updated wifi and SSD. Runs GREAT, and as much as I don’t like Apple, I have to admit
they make an excellent monitor. Plus, you can pick these up for about $300 on eBay, but I digress. Things were running great, until the aforementioned update. After that, the system booted just fine, but
I’d get an absolutely black screen after X was initialized. Was able to SSH into the system, saw no errors in the X logs, but saw some telltales in the dmesg output:

   33.371427] radeon 0000:01:00.0: ring 0 stalled for more than 10100msec
   33.371436] radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000001ad last fence id 0x00000000000001bf on ring 0)
   33.408187] radeon 0000:01:00.0: Saved 567 dwords of commands on ring 0.
   33.408205] radeon 0000:01:00.0: GPU softreset: 0x00000019
   33.408207] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5703CA0
   33.408208] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xFC000007
   33.408210] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
   33.408211] radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
   33.408213] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
   33.408214] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
   33.408216] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011000
   33.408217] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00068406
   33.408219] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80878647
   33.408221] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
   33.416180] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
   33.416232] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
   33.417381] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
   33.417382] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
   33.417383] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
   33.417385] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
   33.417386] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
   33.417388] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
   33.417389] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
   33.417391] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
   33.417392] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
   33.417394] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
   33.417416] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
   33.438223] [drm] PCIE gen 2 link speeds already enabled
   33.442388] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
   33.442498] radeon 0000:01:00.0: WB enabled
   33.442501] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0x00000000db38e065
   33.442502] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0x000000009f073723
   33.442805] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x000000005fb4bb12
   33.459389] [drm] ring test on 0 succeeded in 1 usecs
   33.459395] [drm] ring test on 3 succeeded in 2 usecs
   33.635271] [drm] ring test on 5 succeeded in 1 usecs
   33.635275] [drm] UVD initialized successfully.
   35.606692] [drm:radeon_dp_link_train [radeon]] *ERROR* displayport link status failed
   35.606737] [drm:radeon_dp_link_train [radeon]] *ERROR* clock recovery failed
   36.699225] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait timed out.
   36.699274] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-110).

…which led me to looking up other things, where I ran across this:

    20.824205] [drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed 0x13
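For anyone chasing the same symptom, lines like these can be pulled out of a long dmesg dump with a small filter. This is only a rough Python sketch: the function name `radeon_telltales` is made up for illustration, and the pattern list is taken from the excerpt in this post, so it is not a complete catalogue of radeon failure messages.

```python
import re

# Patterns lifted from the dmesg excerpt in this thread (an assumption:
# other radeon failures may be worded differently and will be missed).
PATTERNS = [
    r"GPU lockup",
    r"ring \d+ stalled",
    r"\*ERROR\*",
    r"gem object lookup failed",
]
_TELLTALE = re.compile("|".join(PATTERNS))


def radeon_telltales(dmesg_text):
    """Return dmesg lines that mention radeon/drm and match a telltale pattern."""
    hits = []
    for line in dmesg_text.splitlines():
        # Only look at lines from the radeon driver or the drm subsystem.
        if ("radeon" in line or "[drm" in line) and _TELLTALE.search(line):
            hits.append(line.strip())
    return hits
```

Feed it the text of `dmesg` (for example, saved to a file over SSH) and print whatever comes back; an empty list means none of these particular patterns showed up.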

The “gem object lookup” led me to a suggestion to remove the xf86-video-ati package. A “zypper rm xf86-video-ati” command and a reboot later, and things were back to normal. Had to go
into the KDE control panel and re-enable compositing, but aside from that, no errors.
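To double-check which X driver ended up in charge after removing the package, the Xorg log can be scanned for driver tags. The sketch below is an assumption-laden helper, not anything official: `active_x_drivers` is a hypothetical name, and it relies on Xorg tagging driver messages like `(II) RADEON(0):` (xf86-video-ati) or `(II) modeset(0):` (the built-in modesetting driver). The log is usually at /var/log/Xorg.0.log, but the location can differ on setups where X does not run as root.

```python
import re

# Matches driver-tagged Xorg log messages such as "(II) modeset(0): ..."
# or "(==) RADEON(0): ...". Lines without a "NAME(<number>):" tag are ignored.
_DRIVER_TAG = re.compile(r"\((?:II|WW|EE|--|\*\*|==)\) ([A-Za-z_]+)\(\d+\):")


def active_x_drivers(xorg_log_text):
    """Return the set of driver tags that appear in the given Xorg log text."""
    return {m.group(1) for m in _DRIVER_TAG.finditer(xorg_log_text)}
```

If the result contains only `modeset`, the server is using the built-in driver; seeing `RADEON` would suggest xf86-video-ati is still installed and being picked up.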

My initial diagnosis was that the libdrm_radeon1 package was to blame. Tried installing the ‘experimental’ package from software.opensuse.org, which had no effect. Not sure where the problem
lies, here…perhaps someone at SuSE will know, but from what I read, it had to do with the updates to that package, and the fact that there were updates to the main Xorg pieces that now perform
the same/similar functions to the mentioned xf86-video-ati package, and the two were stepping on each other’s toes.

Hope it helps someone

mrmazda · August 2, 2018, 8:28pm

Nice you were able to work it out on your own.

The upstream direction for probably at least two years now is apparently to have all AMD/ATI, all Intel, and all NVidia (if not others as well) FOSS users on the very same modesetting driver that has been an integral part of the server since 1.17.x. Thus, the individual xf86-video-* drivers apparently get less thorough testing, and will eventually be deprecated (if they haven’t been upstream already).

paulcee · August 2, 2018, 9:41pm

Had to poke around a while…I did wait a few days, and run a couple of updates, thinking “Well, it’s a bug…they’ll fix it in a bit”, but kept digging.

What you said mirrored what was said in other places, too, that the individual xf86-video-whatever drivers are being done away with, for the most part. Personally, I’d have put something in to remove such packages, when the update came out, to avoid having these issues for the users. Small thing, but it did take me some time to thread through. Was forced to use Windows for a few days…was horrible.

mrmazda · August 2, 2018, 10:59pm

Mass paradigm overhauls like KMS and non-root Xorg take time to implement and debug. I’m sure there are far more video hardware iterations in existence than the developers have physically available, or time to test with. They count on reports from users of hardware they don’t have to become aware of escaped bugs. Both kinds of drivers need to remain available until confidence is high enough that material numbers of people with the misfortune to have untested hardware won’t get locked out by a necessary “upgrade” from a no longer supported installation.

One thing that was done to smooth the transition with AMD can be found in the form of /etc/X11/xorg_pci_ids/, from which followed an amdgpu howto here among others.

paulcee · August 3, 2018, 1:08am

Good point, and I agree, and can feel the pain of trying to upgrade/develop a big system like that. Had to upgrade the card to an Evergreen chipset in this iMac, to get eDP support, because only the DisplayPort was working before then (with compositing), but running with nomodeset would let it come up. Had screen tearing and artifacting, but an upgrade (which was a little bit of a pain) to a used card from eBay ($80), fixed that.

mrmazda wrote: “One thing that was done to smooth the transition with AMD can be found in the form of /etc/X11/xorg_pci_ids/, from which followed an amdgpu howto here among others.”

Yes, I remember YEARS ago trying to get ATI graphics working with ANYTHING Linux related was challenging. This new driver worked right out of the gate, which is what was so disappointing/annoying when the upgrade disabled it. No complaints, mind you…and was glad to have worked out a solution.