
Thread: 3D engines causing frequent GPU lockups

  1. #1
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    3D engines causing frequent GPU lockups

    https://bugzilla.opensuse.org/show_bug.cgi?id=1046962
    https://bugs.freedesktop.org/show_bug.cgi?id=101672

    A GPU lockup has once again been introduced in Mesa and/or the RadeonSI driver. As usual with this sort of bug, the image freezes in place, the audio stops and every form of input becomes unresponsive (including the NumLock/CapsLock keyboard LEDs). The only option is to power the machine off and back on. I started noticing this crash roughly a month ago, after a distribution upgrade (openSUSE Tumbleweed).

    The crash appears to be caused only by 3D rendering. It is random but very frequent. It is triggered by a variety of game engines, and I've noticed it with at least the following ones:

    - Blender 3D: When opening certain scenes in Blender and switching to Weight Paint mode, the system is bound to crash within 5 minutes of use.

    - Second Life: Native Linux viewers for Second Life also trigger it, after roughly 5 to 30 minutes by my estimate.

    - Xonotic (Darkplaces engine): Starting a game will freeze the machine anywhere from instantly (the moment the game starts) to 30 minutes at most.

    - The Dark Mod (idTech 4 engine): The same freeze occurs when playing The Dark Mod, anywhere from instantly to roughly 10 minutes at most.

    - Minecraft: The native version of Minecraft can also trigger the crash, within at most 1 hour of play, especially on servers with a lot of geometry.

    My OS is openSUSE Tumbleweed x64. My current Mesa version is 17.1.3. I first noticed this in 17.1.1, but I don't know whether the issue was introduced in 17.1.0 or earlier. My video card is a Radeon R7 370 (Gigabyte), Pitcairn Islands GPU, GCN 1.0, RadeonSI. Official product page: http://www.gigabyte.com/products/pro....aspx?pid=5469
    openSUSE Tumbleweed x64, KDE Framework 5

  2. #2
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    I found an important clue while running Xonotic with the following environment variables:

    Code:
    MESA_DEBUG=1
    MESA_LOG_FILE=/foo/bar/mesa_err.log
    This generates a log that can still be read after restarting the machine. It contains only one line, but that line may point to the cause:

    Code:
    Mesa: User error: GL_INVALID_OPERATION in glGetQueryObjectiv(out of bounds)
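The same two variables can also be scoped to a single program instead of the whole session. A minimal Python sketch, assuming the game is launched from a small script; only MESA_DEBUG and MESA_LOG_FILE come from the post, and the helper names are hypothetical:

```python
import os
import subprocess


def mesa_debug_env(log_file, base_env=None):
    """Return a copy of the environment with Mesa's debug logging enabled.

    MESA_DEBUG=1 turns on Mesa's GL error reporting, and MESA_LOG_FILE
    sends those messages to a file instead of stderr.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_DEBUG"] = "1"
    env["MESA_LOG_FILE"] = str(log_file)
    return env


def run_with_mesa_debug(cmd, log_file):
    """Launch cmd (a list of arguments) with the debug variables set only for it."""
    return subprocess.run(cmd, env=mesa_debug_env(log_file), check=False)
```

For example, `run_with_mesa_debug(["xonotic"], "/home/user/mesa_err.log")`, where the binary name and log path are placeholders.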
    openSUSE Tumbleweed x64, KDE Framework 5

  3. #3
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    The same thing happens with Mesa 17.1.5 (as with 17.1.4): the release notes say a core crash has been fixed, yet this freeze persists after updating to the latest version.

    Something very unusual happened, however. I rebooted my machine and started testing with Xonotic. After about 10 minutes of play I got my first freeze, but this time it did not lock up the machine: only Xonotic itself hung (the image froze and the sound died), so I was able to alt-tab to my desktop. The system detected that the process was unresponsive and killed it, and I noticed that it had NOT used any CPU or memory while it was frozen. I tested again, and after about 15 minutes I got another freeze; this time it froze the entire computer as usual (including SSH).

    I performed the suggested test of monitoring the files in /sys/kernel/debug/dri/0 through my SSH connection, to check whether this might be caused by a VRAM leak. The most relevant file there was radeon_vram, which seems to be 2.0 GB at all times (which makes sense, as that is the amount of VRAM on my video card). I used the command "watch -n 1 cat /sys/kernel/debug/dri/0/radeon_vram" to monitor it, but it did not show any changes in the file. I'm attaching a screenshot of that directory and its contents.
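`watch` redraws the whole file every second, which makes small changes easy to miss. A rough Python sketch of a change-only watcher, assuming a text file such as radeon_vram_mm (a binary file would fail to decode) and root access to debugfs; it prints only the lines that were added or removed between polls:

```python
import difflib
import time
from pathlib import Path


def changed_lines(old_text, new_text):
    """Return only the removed ('-') and added ('+') lines between two snapshots."""
    diff = list(
        difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm="", n=0)
    )
    # diff[0:2] are the '---'/'+++' headers; '@@' lines only mark hunk positions.
    return [line for line in diff[2:] if not line.startswith("@@")]


def watch_changes(path, interval=1.0, polls=None):
    """Poll a text file and print timestamped changes; polls=None runs until Ctrl+C."""
    path = Path(path)
    previous = path.read_text()
    done = 0
    while polls is None or done < polls:
        time.sleep(interval)
        current = path.read_text()
        for line in changed_lines(previous, current):
            print(time.strftime("%H:%M:%S"), line, flush=True)
        previous = current
        done += 1
```

For example, `watch_changes("/sys/kernel/debug/dri/0/radeon_vram_mm")`, run as root and stopped with Ctrl+C.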

    openSUSE Tumbleweed x64, KDE Framework 5

  4. #4
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    I used parallel SSH sessions to monitor changes in the files suggested by Max Staudt, with the command:

    Code:
    watch -n 1 cat filename
    The relevant files that existed and that I was able to watch are:

    Code:
    /sys/kernel/debug/dri/0/clients
    /sys/kernel/debug/dri/0/gem_names
    /sys/kernel/debug/dri/0/radeon_gtt_mm
    /sys/kernel/debug/dri/0/radeon_vram_mm
    /sys/kernel/debug/dri/0/ttm_dma_page_pool
    /sys/kernel/debug/dri/0/ttm_page_pool
    I'm attaching captures of each output, each showing its file at most 1 second before the freeze. As I understand it, these files hold information about VRAM allocation, which should show whether this could be a progressive memory leak.

    Very important note: it took me hours to obtain these outputs, and for a while I thought an update had fixed the freeze altogether. For over 2 hours I was able to run every game engine that produced this crash without a single freeze, which had never happened before! However, the freeze returned after I restarted my machine, so it is still present. I have no idea whether something in my system causes it to happen only some of the time, but hopefully these files will reveal something.
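Because the SSH session dies with the machine, whatever `watch` shows on screen is lost unless it is captured in time. A minimal sketch that appends a timestamped snapshot of the files listed above to a log once per second and forces each round to disk with os.fsync, assuming it runs as root and that dri/0 is the Radeon card; a hard lockup can still interrupt a write, so this improves the odds of keeping the last snapshot rather than guaranteeing it:

```python
import os
import time
from pathlib import Path

# The files listed above; card index 0 is an assumption.
DEBUGFS_FILES = [
    "/sys/kernel/debug/dri/0/clients",
    "/sys/kernel/debug/dri/0/gem_names",
    "/sys/kernel/debug/dri/0/radeon_gtt_mm",
    "/sys/kernel/debug/dri/0/radeon_vram_mm",
    "/sys/kernel/debug/dri/0/ttm_dma_page_pool",
    "/sys/kernel/debug/dri/0/ttm_page_pool",
]


def snapshot(paths):
    """Read every file once and return a single timestamped text block."""
    parts = ["=== " + time.strftime("%Y-%m-%d %H:%M:%S") + " ==="]
    for p in paths:
        try:
            body = Path(p).read_text()
        except OSError as exc:  # missing file or insufficient permissions
            body = f"<unreadable: {exc}>\n"
        parts.append(f"--- {p}\n{body}")
    return "\n".join(parts) + "\n"


def log_snapshots(paths, log_path, interval=1.0, rounds=None):
    """Append a snapshot every interval seconds; rounds=None runs until interrupted."""
    done = 0
    with open(log_path, "a") as log:
        while rounds is None or done < rounds:
            log.write(snapshot(paths))
            log.flush()             # move Python's buffer into the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to the disk now
            done += 1
            if rounds is None or done < rounds:
                time.sleep(interval)
```

For example, `log_snapshots(DEBUGFS_FILES, "/root/dri-snapshots.log")`; the log path is a placeholder, and the last block in the file is the closest snapshot to the freeze.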

    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/clients                                                                                                                                                    linux-qz0r.site: Thu Jul 27 00:25:38 2017
    
                 command   pid dev master a   uid      magic
                       X  1766   0   y    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
                       X  1766 128   n    y     0          0
    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/gem_names                                                                                                                                                  linux-qz0r.site: Thu Jul 27 00:25:38 2017
    
      name     size handles refcount
         1  8847360       2        1
         2  8294400       3        2
    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/radeon_gtt_mm                                                                                                                                              linux-qz0r.site: Thu Jul 27 00:25:38 2017
    
    0x0000000000000000-0x0000000000000001: 1: used
    0x0000000000000001-0x0000000000000011: 16: used
    0x0000000000000011-0x0000000000000111: 256: used
    0x0000000000000111-0x0000000000000211: 256: used
    0x0000000000000211-0x0000000000000311: 256: used
    0x0000000000000311-0x0000000000000321: 16: used
    0x0000000000000321-0x0000000000000331: 16: used
    0x0000000000000331-0x0000000000000332: 1: used
    0x0000000000000332-0x0000000000000333: 1: used
    0x0000000000000333-0x0000000000000334: 1: used
    0x0000000000000334-0x0000000000000434: 256: used
    0x0000000000000434-0x0000000000000444: 16: used
    0x0000000000000444-0x0000000000000445: 1: used
    0x0000000000000445-0x0000000000000448: 3: used
    0x0000000000000448-0x0000000000000449: 1: used
    0x0000000000000449-0x000000000000044a: 1: used
    0x000000000000044a-0x000000000000054a: 256: used
    0x000000000000054a-0x000000000000054b: 1: used
    0x000000000000054b-0x000000000000054c: 1: used
    0x000000000000054c-0x000000000000055c: 16: used
    0x000000000000055c-0x000000000000055d: 1: used
    0x000000000000055d-0x000000000000055e: 1: used
    0x000000000000055e-0x000000000000056e: 16: used
    0x000000000000056e-0x000000000000057e: 16: used
    0x000000000000057e-0x0000000000000583: 5: free
    0x0000000000000583-0x0000000000000584: 1: used
    0x0000000000000584-0x0000000000000594: 16: used
    0x0000000000000594-0x000000000000059c: 8: free
    0x000000000000059c-0x000000000000059d: 1: used
    0x000000000000059d-0x000000000000059e: 1: used
    0x000000000000059e-0x00000000000005ab: 13: used
    0x00000000000005ab-0x00000000000005ad: 2: used
    0x00000000000005ad-0x00000000000005ae: 1: used
    0x00000000000005ae-0x00000000000005b1: 3: free
    0x00000000000005b1-0x00000000000005b2: 1: used
    0x00000000000005b2-0x00000000000005b5: 3: free
    0x00000000000005b5-0x00000000000005b6: 1: used
    0x00000000000005b6-0x00000000000005b7: 1: used
    0x00000000000005b7-0x00000000000005c7: 16: used
    0x00000000000005c7-0x00000000000005d2: 11: used
    0x00000000000005d2-0x00000000000005d3: 1: used
    0x00000000000005d3-0x00000000000005d4: 1: used
    0x00000000000005d4-0x00000000000005d5: 1: used
    0x00000000000005d5-0x00000000000005d6: 1: used
    0x00000000000005d6-0x00000000000005d7: 1: used
    0x00000000000005d7-0x00000000000005d8: 1: used
    0x00000000000005d8-0x00000000000005d9: 1: used
    0x00000000000005d9-0x00000000000005da: 1: used
    0x00000000000005da-0x00000000000005db: 1: used
    0x00000000000005db-0x00000000000005dc: 1: used
    0x00000000000005dc-0x00000000000005dd: 1: used
    0x00000000000005dd-0x00000000000005de: 1: used
    0x00000000000005de-0x00000000000005df: 1: used
    0x00000000000005df-0x00000000000005e2: 3: free
    0x00000000000005e2-0x00000000000005e3: 1: used
    0x00000000000005e3-0x00000000000005e4: 1: used
    0x00000000000005e4-0x0000000000000624: 64: free
    0x0000000000000624-0x0000000000000634: 16: used
    0x0000000000000634-0x0000000000000644: 16: used
    0x0000000000000644-0x0000000000000654: 16: used
    0x0000000000000654-0x0000000000000655: 1: used
    0x0000000000000655-0x0000000000000656: 1: used
    0x0000000000000656-0x0000000000000657: 1: used
    0x0000000000000657-0x000000000000065d: 6: free
    0x000000000000065d-0x000000000000065e: 1: used
    0x000000000000065e-0x000000000000065f: 1: used
    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/radeon_vram_mm                                                                                                                                             linux-qz0r.site: Thu Jul 27 00:25:38 2017
    
    0x0000000000000000-0x0000000000000040: 64: used
    0x0000000000000040-0x0000000000000165: 293: used
    0x0000000000000165-0x00000000000001d6: 113: used
    0x00000000000001d6-0x00000000000005d6: 1024: used
    0x00000000000005d6-0x00000000000005d7: 1: used
    0x00000000000005d7-0x00000000000005d8: 1: used
    0x00000000000005d8-0x0000000000000dc1: 2025: used
    0x0000000000000dc1-0x0000000000000dc5: 4: used
    0x0000000000000dc5-0x0000000000000dc6: 1: used
    0x0000000000000dc6-0x0000000000000dc7: 1: used
    0x0000000000000dc7-0x0000000000000dc8: 1: used
    0x0000000000000dc8-0x0000000000000dc9: 1: used
    0x0000000000000dc9-0x0000000000000dd1: 8: used
    0x0000000000000dd1-0x0000000000000de1: 16: used
    0x0000000000000de1-0x0000000000000df1: 16: used
    0x0000000000000df1-0x0000000000000df5: 4: used
    0x0000000000000df5-0x0000000000000df9: 4: used
    0x0000000000000df9-0x0000000000000dfd: 4: used
    0x0000000000000dfd-0x0000000000000e01: 4: used
    0x0000000000000e01-0x0000000000000e05: 4: used
    0x0000000000000e05-0x0000000000000e0d: 8: used
    0x0000000000000e0d-0x0000000000000e2d: 32: used
    0x0000000000000e2d-0x0000000000000e2e: 1: used
    0x0000000000000e2e-0x0000000000000e30: 2: free
    0x0000000000000e30-0x0000000000000e40: 16: used
    0x0000000000000e40-0x0000000000000e48: 8: used
    0x0000000000000e48-0x0000000000000e50: 8: used
    0x0000000000000e50-0x0000000000000ea0: 80: free
    0x0000000000000ea0-0x0000000000000ea8: 8: used
    0x0000000000000ea8-0x0000000000000eb0: 8: used
    0x0000000000000eb0-0x0000000000000eb8: 8: used
    0x0000000000000eb8-0x0000000000000ec0: 8: used
    0x0000000000000ec0-0x0000000000000ed0: 16: used
    0x0000000000000ed0-0x0000000000000ee0: 16: used
    0x0000000000000ee0-0x0000000000000ef0: 16: used
    0x0000000000000ef0-0x0000000000000ef8: 8: used
    0x0000000000000ef8-0x0000000000000f08: 16: used
    0x0000000000000f08-0x0000000000000f09: 1: used
    0x0000000000000f09-0x0000000000000f10: 7: free
    0x0000000000000f10-0x0000000000000f20: 16: used
    0x0000000000000f20-0x0000000000000f30: 16: used
    0x0000000000000f30-0x0000000000000f31: 1: used
    0x0000000000000f31-0x0000000000000f38: 7: free
    0x0000000000000f38-0x0000000000000f40: 8: used
    0x0000000000000f40-0x0000000000000f48: 8: used
    0x0000000000000f48-0x0000000000000f49: 1: used
    0x0000000000000f49-0x0000000000000f51: 8: used
    0x0000000000000f51-0x0000000000000f61: 16: used
    0x0000000000000f61-0x0000000000000f68: 7: free
    0x0000000000000f68-0x0000000000000f69: 1: used
    0x0000000000000f69-0x0000000000000f70: 7: free
    0x0000000000000f70-0x0000000000000f78: 8: used
    0x0000000000000f78-0x0000000000000f80: 8: used
    0x0000000000000f80-0x0000000000000f88: 8: used
    0x0000000000000f88-0x0000000000000f98: 16: used
    0x0000000000000f98-0x0000000000000fa0: 8: used
    0x0000000000000fa0-0x0000000000000fa1: 1: used
    0x0000000000000fa1-0x0000000000000fa8: 7: free
    0x0000000000000fa8-0x0000000000000fa9: 1: used
    0x0000000000000fa9-0x0000000000000fb9: 16: used
    0x0000000000000fb9-0x0000000000000fc1: 8: used
    0x0000000000000fc1-0x0000000000000fd1: 16: used
    0x0000000000000fd1-0x0000000000000fd9: 8: used
    0x0000000000000fd9-0x0000000000000fe0: 7: free
    0x0000000000000fe0-0x0000000000000ff0: 16: used
    0x0000000000000ff0-0x0000000000000ff8: 8: used
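To compare captures like these over time, it helps to reduce each dump to totals. A small Python sketch that sums the used and free ranges of a radeon_gtt_mm or radeon_vram_mm dump, reports the largest free block, and checks that each `start-end: size` line is self-consistent; the units are whatever the allocator reports (most likely pages, which is an assumption). Both captures above stop after roughly one screenful, so they may have been cut off by `watch` at the terminal height; reading the file directly gives the complete list:

```python
import re

# Matches lines such as "0x00000000000005d8-0x0000000000000dc1: 2025: used".
MM_LINE = re.compile(
    r"^\s*0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+):\s*(\d+):\s*(used|free)\s*$"
)


def summarize_mm(text):
    """Return (total_used, total_free, largest_free_block) for an allocator dump."""
    totals = {"used": 0, "free": 0}
    largest_free = 0
    for line in text.splitlines():
        match = MM_LINE.match(line)
        if not match:
            continue  # skip the 'Every 1,0s' header, blank lines, etc.
        start, end = int(match.group(1), 16), int(match.group(2), 16)
        size, state = int(match.group(3)), match.group(4)
        if end - start != size:
            raise ValueError(f"inconsistent range: {line.strip()}")
        totals[state] += size
        if state == "free":
            largest_free = max(largest_free, size)
    return totals["used"], totals["free"], largest_free
```

Comparing these three numbers across snapshots taken minutes apart would show whether used space keeps growing, or whether the free space is fragmenting into smaller and smaller holes.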
    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/ttm_dma_page_pool                                                                                                                                          linux-qz0r.site: Thu Jul 27 00:25:38 2017
    
             pool      refills   pages freed    inuse available     name
               wc         5008             0     3833    16199 radeon 0000:03:00.0
           cached        22077         83375     4929        4 radeon 0000:03:00.0
    Code:
    Every 1,0s: cat /sys/kernel/debug/dri/0/ttm_page_pool                                                                                                                                              linux-qz0r.site: Thu Jul 27 00:25:37 2017
    
      pool      refills   pages freed     size
        wc            0             0        0
        uc            0             0        0
    wc dma            0             0        0
    uc dma            0             0        0
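The ttm_dma_page_pool table can be reduced the same way. A sketch that parses its rows and converts the in-use page count to MiB, assuming 4 KiB pages (the usual x86-64 page size). A single snapshot says little on its own, but `inuse` values that grow steadily across snapshots would be consistent with a leak:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86-64


def parse_dma_pool(text):
    """Parse ttm_dma_page_pool rows into {pool: {column: value}}.

    Rows look like: pool, refills, pages freed, inuse, available, device name.
    Header and 'Every 1,0s' lines are skipped because their columns are not numeric.
    If several devices share a pool name, the last row wins (fine for one GPU).
    """
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and all(f.isdigit() for f in fields[1:5]):
            refills, freed, inuse, available = (int(f) for f in fields[1:5])
            pools[fields[0]] = {
                "refills": refills,
                "pages_freed": freed,
                "inuse": inuse,
                "available": available,
                "device": " ".join(fields[5:]),
            }
    return pools


def inuse_mib(pools):
    """Sum the in-use pages of all pools and convert them to MiB."""
    return sum(p["inuse"] for p in pools.values()) * PAGE_SIZE / 2**20
```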
    openSUSE Tumbleweed x64, KDE Framework 5

  5. #5
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    I used 'dmesg -w' over SSH to monitor the kernel log as the system froze. I did not see anything of interest, and no new messages were printed before the crash took place. The only arguably suspicious line was:

    Code:
    [ 1286.800069] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
    Nevertheless, I discovered another important factor during my tests. I decided to look through my BIOS settings again, as I remembered leaving a memory overclocking setting called Performance Enhance enabled. In the past, with a different set of RAM modules, this option caused the exact same freeze when I was watching YouTube (1080p @ 60fps videos). Later I got new RAM, and because of how my clocks are synced I'm running it at an underclocked (and therefore more stable) frequency, so I figured I could leave this setting enabled without problems. The highly erratic timing of the freezes threw me off (one time after 10 minutes, another after 2 hours), and a crash this obvious would have been all over the bug tracker by now if Mesa were the cause.

    After disabling it, I no longer seem to get any immediate system freezes. It will, however, take more testing to confirm it was that option, so please give me a few more weeks before we close this. If my theory is proven wrong, I'll immediately post a new comment and let everyone know.
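For `dmesg -w` captures longer than the one above, a filter makes it quicker to spot GPU-related lines shortly before a freeze. A minimal Python sketch; the keyword list is an assumption (a crude whole-word match on radeon, amdgpu, drm and gpu) and should be adjusted as needed:

```python
import re

# "[ 1286.800069] message" -> (seconds since boot, message)
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

# Assumed keyword list: whole-word, case-insensitive match.
GPU_PATTERN = re.compile(r"\b(radeon|amdgpu|drm|gpu)\b", re.IGNORECASE)


def parse_dmesg(text):
    """Yield (timestamp, message) for every line that carries a dmesg timestamp."""
    for line in text.splitlines():
        match = DMESG_LINE.match(line)
        if match:
            yield float(match.group(1)), match.group(2)


def gpu_messages(text):
    """Return only the timestamped messages that mention a GPU-related keyword."""
    return [(ts, msg) for ts, msg in parse_dmesg(text) if GPU_PATTERN.search(msg)]
```

Looking at the last few entries returned by `gpu_messages` (and their timestamps relative to the freeze) is usually more telling than scrolling through the full log.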
    openSUSE Tumbleweed x64, KDE Framework 5

  6. #6
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    The freeze still happens with the Performance Enhance BIOS setting turned off, so the crash is not caused by my overclocking settings. It took 2 hours of continuous Minecraft play for it to occur again.

    I noticed an important clue: in Minecraft, the system only seems to crash after mobs have loaded into view. When I explore a world where no entities spawn (even one full of voxel geometry), the freeze has never happened so far. This made me realize that all the engines I noticed the freeze with have one thing in common: a skeletal mesh is loaded into view. Could this be an issue related to animated models?

    Note that I don't suspect Vertex Buffer Objects (VBOs) are the cause: I once turned off VBOs in Minecraft, restarted the game, and still got a system freeze.
    openSUSE Tumbleweed x64, KDE Framework 5

  7. #7
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    For the time being, I've decided to test whether this also happens with the SI scheduler enabled in RadeonSI. To make sure it applies to all games across the system, I added the following line to ~/.profile and restarted:

    Code:
    export R600_DEBUG=sisched
    So far I've managed to play Minecraft for over an hour several times, including in areas with many mobs (and therefore skeletal models) in view, without a freeze. However, it will take much more testing to be sure this makes a difference; there is no real verdict yet. I'll also follow the advice to test with SuperTuxKart, which should be an easier test case for other developers.

    If the SI scheduler does turn out to fix the problem, it would mean this is a bug specific to the old scheduler (which is still the default, hence the environment variable needed to switch). That would make sense, since IIRC the scheduler affects how the shader compiler orders instructions for the GPU, which is a plausible place for an error that freezes the system.
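Setting R600_DEBUG in ~/.profile affects every program in the session. To enable sisched for one game at a time instead, a small sketch could add the flag to whatever the variable already contains, so other debug options are not overwritten. It assumes R600_DEBUG accepts a comma-separated list of options, and the helper names are hypothetical:

```python
import os
import subprocess


def add_debug_flag(env, flag, var="R600_DEBUG"):
    """Return a copy of env with flag appended to a comma-separated debug variable.

    Existing options are kept, the flag is not duplicated, and env is not modified.
    """
    new_env = dict(env)
    options = [opt for opt in new_env.get(var, "").split(",") if opt]
    if flag not in options:
        options.append(flag)
    new_env[var] = ",".join(options)
    return new_env


def launch_with_flag(cmd, flag="sisched"):
    """Start cmd with the flag set for this process only, not the whole session."""
    return subprocess.run(cmd, env=add_debug_flag(os.environ, flag), check=False)
```

For example, `launch_with_flag(["supertuxkart"])`, where the binary name is a placeholder for whichever game is being tested.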
    openSUSE Tumbleweed x64, KDE Framework 5

  8. #8
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    To rule out a hardware issue, I ran two Memtest86 5.01 sessions from a Clonezilla bootable CD. The first ran for 5 hours during the day and the second for over 10 hours overnight. The program registered only 3 passes in total, but it did not find any errors. I'm attaching a picture in case it shows any useful information.

    openSUSE Tumbleweed x64, KDE Framework 5

  9. #9
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    887

    Re: 3D engines causing frequent GPU lockups

    I wasn't sure whether to bump this thread, as the original issue has clearly been fixed over nearly a year of countless kernel, Mesa and driver updates. Unfortunately, I now have a new issue that behaves just like what I described here back then: when certain 3D engines are running, there is a chance that after a few minutes the machine instantly freezes and becomes completely unusable until it is powered off and back on. I don't know when the new crash was introduced, since I haven't played many 3D games recently, but I'd assume it was sometime within the last few months.

    I now have kernel 4.15.3 and Mesa 18.0.0. Again, my video card is a Radeon R7 370 from Gigabyte (RadeonSI, GCN 1.0, AMD Pitcairn Islands). I'm running the openSUSE Tumbleweed x64 rolling release distribution.

    Can someone please explain how to debug these instant system freezes whenever they get introduced into system components? I can't capture any output at the time of the crash, because the entire machine stops working until it is restarted (likely including SSH), but maybe I can make it log information that I can retrieve after rebooting? Any useful info will help; just please nothing dangerous that might permanently break my OS.
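One common approach is to keep the log on disk rather than on screen. If the systemd journal is persistent (stored under /var/log/journal), `journalctl -k -b -1` shows the kernel messages of the previous boot. As a complement, a minimal Python sketch that runs a streaming command such as `dmesg -w` and writes every line to a file with os.fsync, so the last messages before a freeze have a better chance of surviving the reboot; it is a best-effort measure, not a guarantee:

```python
import os
import subprocess


def tee_to_disk(cmd, log_path):
    """Run cmd and append each output line to log_path, fsync-ing after every line.

    Returns the command's exit code once it finishes (dmesg -w never finishes
    on its own, so in practice it runs until the machine freezes or Ctrl+C).
    """
    with open(log_path, "a") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            log.write(line)
            log.flush()             # move Python's buffer into the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to the disk now
    return proc.returncode
```

For example, start `tee_to_disk(["dmesg", "-w"], "/root/dmesg-live.log")` as root before launching the game; the path is a placeholder, and the file can be read after the reboot.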
    openSUSE Tumbleweed x64, KDE Framework 5

  10. #10
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    29,717
    Blog Entries
    15

    Re: 3D engines causing frequent GPU lockups

    Hi
    Are you using radeon or amdgpu? If you're using radeon, I would switch to amdgpu and see if that makes a difference.

    Can you show the output of:
    Code:
    /usr/sbin/lspci -nnk | grep -A3 VGA
    The switch is pretty easy: add a blacklist entry, add some boot options and rebuild the initrd...
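The relevant lines of that output are `Kernel driver in use` and `Kernel modules`. A Python sketch that extracts both for each VGA device; if the modules line lists both radeon and amdgpu, the kernel appears to offer both drivers for the card. (For GCN 1.0 cards like Pitcairn, the switch is usually done with the `radeon.si_support=0 amdgpu.si_support=1` kernel parameters, assuming a kernel built with amdgpu SI support.) The function names are hypothetical:

```python
import subprocess


def vga_driver_info(lspci_output):
    """Extract the driver in use and the candidate modules for each VGA device.

    Expects `lspci -nnk` text: a device line followed by indented detail
    lines such as "Kernel driver in use: radeon".
    """
    devices = []
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # A new device starts here; only VGA controllers are tracked.
            current = {"device": line, "driver": None, "modules": []} if "VGA" in line else None
            if current is not None:
                devices.append(current)
        elif current is not None:
            detail = line.strip()
            if detail.startswith("Kernel driver in use:"):
                current["driver"] = detail.split(":", 1)[1].strip()
            elif detail.startswith("Kernel modules:"):
                current["modules"] = [m.strip() for m in detail.split(":", 1)[1].split(",")]
    return devices


def query_lspci():
    """Run lspci -nnk (same path as in the post) and parse the result."""
    result = subprocess.run(
        ["/usr/sbin/lspci", "-nnk"], capture_output=True, text=True, check=True
    )
    return vga_driver_info(result.stdout)
```

For example, `query_lspci()[0]["driver"]` would report which driver currently drives the first VGA device.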
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
