
Thread: AMDGPU errors and occasional crashes/hangs

  1. #11
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    BIOS is the latest. As I have 4 sticks of memory, rather than run Memtest overnight I'll swap them over (in pairs) and see if the error still shows up, as I don't believe Memtest can test shared graphics memory since it's reserved in the BIOS. Once swapped I'll run Memtest again. Currently the error only shows up after booting, and the system has not crashed since.

    Stuart

  2. #12
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    Well, I have swapped the memory round and re-run Memtestx86 with no errors, and then ran the hammer test separately twice with no errors reported. On reboot I looked at the journal and the errors I was getting no longer show up. The only thing I can think of is that removing and reseating the memory has maybe fixed it.

    I'll watch it for a few days and see if the errors come back.
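
One way to keep an eye on this over a few days is to count the amdgpu-related kernel messages for each boot and compare. A minimal sketch, assuming the messages of interest contain the string "amdgpu"; count_amdgpu_lines is a made-up helper name, and the pattern will also count ordinary driver start-up messages unless it is tightened:

```shell
# Count lines on stdin that mention amdgpu (case-insensitive).
# Hypothetical helper: narrow the pattern to the exact messages you are seeing.
count_amdgpu_lines() {
    grep -ci 'amdgpu' || true   # grep exits 1 when the count is 0; do not treat that as failure
}

# Usage (not run here): kernel messages from the current boot
#   journalctl -b -k --no-pager | count_amdgpu_lines
# and from the previous boot, for comparison
#   journalctl -b -1 -k --no-pager | count_amdgpu_lines
```

Adding -p err to the journalctl call restricts the output to messages logged at error priority or higher, which may be a better filter if the driver logs the faults at that level.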

    Stuart

  3. #13

    Default Re: AMDGPU errors and occasional crashes/hangs

    Quote Originally Posted by broadstairs View Post
    Well, I have swapped the memory round and re-run Memtestx86 with no errors, and then ran the hammer test separately twice with no errors reported. On reboot I looked at the journal and the errors I was getting no longer show up. The only thing I can think of is that removing and reseating the memory has maybe fixed it.
    Possibly dirty contacts on the memory modules.
    You can clean the contacts with a rubber (eraser) or alcohol.

  4. #14
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    I spoke too soon. A few minutes after booting I saw the errors again, but only a few, and none since. I have checked on the BIOS and there is an update, which I will load tomorrow and see what happens.

    Stuart

  5. #15
    Join Date
    Jan 2014
    Location
    Erlangen
    Posts
    2,662
    Blog Entries
    1

    Default Re: AMDGPU errors and occasional crashes/hangs

    Quote Originally Posted by broadstairs View Post
    I spoke too soon. A few minutes after booting I saw the errors again, but only a few, and none since. I have checked on the BIOS and there is an update, which I will load tomorrow and see what happens.
    Clean memory slots and camera lenses with 99.9% isopropanol. Use interdental brushes for thorough cleaning.
    AMD Athlon 4850e (2009), openSUSE 13.1, KDE 4, Intel i3-4130 (2014), i7-6700K (2016), i5-8250U (2018), AMD Ryzen 5 3400G (2020), openSUSE Tumbleweed, KDE Plasma 5

  6. #16
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    On further investigation I have found elsewhere that issues like mine can happen at random, and that adding iommu=pt to the kernel boot options has fixed the problem for others. I tried this today and so far there has been no recurrence of any errors since I booted with this option. Somewhere else it was suggested to add idle=nomwait, but I've not tried that yet.
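
For anyone wanting to try the same thing: on openSUSE the option is normally appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (or set through YaST's boot loader module), and the configuration is then regenerated with grub2-mkconfig -o /boot/grub2/grub.cfg. After rebooting it is worth confirming that the option actually reached the kernel. A minimal sketch; has_kernel_param is a made-up helper name:

```shell
# Return success if PARAM appears as a whole word in the given command line string.
# Usage: has_kernel_param "<cmdline string>" "<param>"
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (not run here): check the running kernel
#   has_kernel_param "$(cat /proc/cmdline)" iommu=pt && echo "iommu=pt is active"
```

Matching on whole words avoids a false positive from a different option that merely ends in the same text, such as amd_iommu=pt.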

    Searching here, I found there was a kernel fix on 19th April 2021 which addressed some problems with amdgpu. My problems only started after this, around the 26th April, which is probably when I installed that update. Is it possible that this update has a bearing on my problem? It is also interesting that I had the DRM option enabled in my browser to play DRM content, and if I remember correctly this was around the time I had the hang/crash. I subsequently turned off that browser option and it has not crashed since.

    Anyway I'll see how it goes over the next few days.

    Stuart

  7. #17

    Default Re: AMDGPU errors and occasional crashes/hangs

    I have a Radeon Vega Picasso integrated GPU and recently started seeing these "retry page fault" errors (but not the GPU resets or the other errors in your original post), and also screen and keyboard freezes in some of the instances where these errors happen.

    I'm running Ubuntu, but I dropped by to say that in my case the problem turned out to be the version of the linux-firmware package. I reverted from version 1.197 to version 1.190.5 and things are back to the previous level of stability.

    So you might want to try downgrading your linux-firmware package as well.

    Details of my issue are here, if you're curious: https://bugs.launchpad.net/ubuntu/+s...e/+bug/1928393
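
On Ubuntu, checking and pinning the firmware package might look roughly like the sketch below. The version numbers are the ones quoted in this post. Whether apt can still see the older version depends on the configured repositories (otherwise the .deb has to be fetched by hand). version_gt is a made-up helper that relies on GNU sort -V:

```shell
# True if version $1 sorts strictly after version $2 (GNU sort -V ordering).
version_gt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage (not run here, needs root and an Ubuntu system):
#   apt-cache policy linux-firmware              # show installed and candidate versions
#   sudo apt-get install linux-firmware=<older>  # <older> = full version string listed by apt-cache
#   sudo apt-mark hold linux-firmware            # stop later upgrades from replacing it
version_gt 1.197 1.190.5 && echo "1.197 is newer than 1.190.5"
```

Holding the package is the important part: without it the next routine upgrade would pull the newer firmware straight back in.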

  8. #18
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    Thanks, that's very interesting. I think I will now open a bug with openSUSE and see what they have to say. I will continue to monitor, and if the problem does reappear I will downlevel the f/w package.

    Stuart

  9. #19
    Join Date
    Jan 2016
    Location
    UK
    Posts
    754

    Default Re: AMDGPU errors and occasional crashes/hangs

    Quote Originally Posted by broadstairs View Post
    Thanks, that's very interesting. I think I will now open a bug with openSUSE and see what they have to say. I will continue to monitor, and if the problem does reappear I will downlevel the f/w package.

    Stuart
    Because of packaging differences I am unable to find an openSUSE package for kernel-firmware-all older than the April 2021 package, which I think introduced the issue, so for now I cannot try downlevelling the f/w, unless someone has access to an older package they can point me to.

    Still no errors showing since I added iommu=pt to kernel options on boot.

    Stuart

  10. #20
    Join Date
    Jan 2014
    Location
    Erlangen
    Posts
    2,662
    Blog Entries
    1

    Default Re: AMDGPU errors and occasional crashes/hangs

    Quote Originally Posted by broadstairs View Post
    Still no errors showing since I added iommu=pt to kernel options on boot.
    Without ever tinkering I get:
    Code:
    3400G:~ # journalctl -b --grep iommu 
    -- Logs begin at Thu 2021-04-29 05:00:44 CEST, end at Fri 2021-05-14 11:50:28 CEST. -- 
    May 13 04:15:39 3400G kernel: iommu: Default domain type: Passthrough  
    May 13 04:15:39 3400G kernel: pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
    May 13 04:15:39 3400G kernel: pci 0000:00:01.0: Adding to iommu group 0 
    ...
    May 13 04:15:39 3400G kernel: pci 0000:08:00.6: Adding to iommu group 8 
    May 13 04:15:39 3400G kernel: pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40 
    May 13 04:15:40 3400G kernel: AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de> 
    3400G:~ #
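
To run the same check on another machine without reading through the whole log, the "Default domain type" line can be pulled out on its own. A minimal sketch, assuming the kernel prints the line in the format shown above; iommu_domain_type is a made-up helper name:

```shell
# Print the IOMMU default domain type found in kernel log text on stdin.
iommu_domain_type() {
    sed -n 's/.*iommu: Default domain type: \([A-Za-z-]*\).*/\1/p'
}

# Usage (not run here):
#   journalctl -b -k --no-pager | iommu_domain_type
```

If this already prints Passthrough on a boot without iommu=pt, the kernel was defaulting to passthrough mode anyway, so adding the option would not be expected to change much on that machine.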
