Thread: 3D engines causing frequent GPU lockups

  1. #21
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    I've been testing this crash in Xonotic for the past two days, since it's a game I have a lot of experience customizing. What I found is pretty interesting and should be a good start in shedding light on this bug.

    Initially the system froze somewhere between 10 and 40 minutes into a session. After changing a few cvars, I seem to have almost entirely gotten rid of it: in nearly 5 hours of continuous testing, only one lockup took place! Below are the cvar overrides I added to my autoexec.cfg for the test. At least one of them had an influence... I'm still working on pinning down which, and that will take several more days given how rarely the issue now occurs.

    Code:
    r_batch_multidraw 0 // old: 1
    r_batch_dynamicbuffer 0 // old: 1
    r_depthfirst 0 // old: 2
    gl_vbo 0 // old: 3
    gl_vbo_dynamicindex 0 // old: 1
    gl_vbo_dynamicvertex 0 // old: 1
    r_glsl_skeletal 0 // old: 1
    vid_samples 1 // old: 4
    gl_texture_anisotropy 0 // old: 16
    I know the issue has something to do with triangles or vertices: the crash seems more frequent when a lot of players or objects are present, indicating that an increased surface count may be a contributor. I've suspected mesh data stored on the video card to be the culprit, especially shared data where multiple objects use one instance of a mesh in video memory. This is why my bet is currently on gl_vbo (Vertex Buffer Objects / GL_ARB_vertex_buffer_object) being the variable that made a difference... then again, I still got a lockup even with it disabled, so at best it heavily mitigated the crash.

    This belief is reinforced by my previous experience in Blender: The only scene causing the GPU lockup is one where several high-poly objects share common mesh data, and the crash always occurred upon me adding a Subdivision Surface to just one of them (increasing its polygon count). It's been confirmed that as of Blender 2.77 (I have 2.79) VBO is indeed enabled in the 3D viewport. Note that I was also using the untextured viewport, thus I doubt textures play a role.

    Lastly I ruled out the possibility of overheating having anything to do with it: During the first 3 hours in which I got no lockup, the temperature in my room was above 26°C. When I did get that one lockup later at night, the temperature of my room had long dropped to 23°C. The stress on the GPU was the same at all times, absolutely no settings were changed including the map.
    openSUSE Tumbleweed x64, KDE Framework 5

  2. #22
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    Testing is still well underway. There's nothing conclusive yet, but I should share one piece of information early on.

    To my surprise, it would appear the culprit may be either Anti-Aliasing or Anisotropic Filtering. I decided to re-enable their cvars first in Xonotic since I honestly suspected them the least... the moment I did that, all hell broke loose again: in 30 minutes I had two system lockups! Then I disabled them once more, and could play a 40 minute match with no problem.

    I have no idea which of the two it could be, but I should be getting there in the following days. I'm slowly re-enabling the other cvars first to rule them out, then I'll see whether AA or texture filtering is behind the crashes.

  3. #23
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    And we have a verdict. The influential factor is by far the anti-aliasing, at least in the case of Xonotic. The other cvars I previously mentioned have absolutely no effect on the frequency of this freeze.

    Today I enabled the feature again and tried playing another match: I instantly got two lockups, one after 8 minutes and the other after only 20 seconds! I then disabled it and let the bots play again while I was away: This time the machine froze after more than 2 whole hours of experiencing no issues.

    I find it interesting how the probability of the freeze seems to scale with the number of samples: If I use 4x AA ("vid_samples 4"), I get a crash roughly every 30 minutes... if I disable AA ("vid_samples 1"), I get a crash less than once per 2 hours... 30 minutes * 4x = 2 hours. Maybe this is just me seeing patterns but I thought I should suggest the idea.

    I'd like to hear some thoughts from the developers or experienced users at this point. Can we close in on the source of this GPU lockup, knowing that anti-aliasing greatly affects its frequency in the DarkPlaces engine? Are there any open bugs about AA-related X11 crashes I should check out? What else can I test, ideally still under Xonotic where I have the best test case prepared?

  4. #24
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    I should add another detail to the discussion. I know this may be a separate issue which might have nothing to do with the crash, but at the same time I wouldn't be surprised if it does: Glitched graphics often indicate something going wrong with the display, such as corrupt textures in video memory, which may ultimately lead to just such a lockup.

    On occasion, certain programs (namely Firefox and Blender) glitch out and draw broken rectangles all over the window. Some of those glitches are just boxes of random colors, others contain pieces of past images (for instance I saw patterns from my lock screen background). Sometimes they quickly disappear on their own, at other times I have to restart the program as it becomes illegible and unusable. If I move anything the squares flicker all over the place. The glitches continue even after I disable desktop effects, thus KDE compositing should have nothing to do with it.

    Attached is a screenshot of the glitch happening in Blender, showing its window covered in the corrupt squares. I'm curious what your opinion is. Again I know this may be an unrelated issue, but I'm wondering whether it indicates some video storage corruption that's also leading up to the lockups.


  5. #25
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    15,802

    Re: 3D engines causing frequent GPU lockups

    Maybe a thermal problem; check the fans. Maybe monitor the temp of the GPU.

  6. #26
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    Quote: Originally Posted by gogalthorp
    Maybe a thermal problem; check the fans. Maybe monitor the temp of the GPU.
    Both video card fans work fine; I check them every now and then and blow the dust out. I have a temperature monitor in KSysGuard as well as a Plasma widget: the GPU temperature always stays roughly the same as the CPU's, around 46°C, which is a very safe and normal temp... and it does not change while Firefox and Blender experience the visual glitch described above. The corrupt squares wouldn't have anything to do with the temp anyway, whereas the lockup doesn't seem to be caused by GPU stress but rather by specific circumstances like certain meshes or textures or shaders or other obscure factors.

  7. #27
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    I'm still struggling to debug this. The more I see the more my jaw drops.

    First of all, the rule that disabling anti-aliasing decreases the frequency of the freeze (see the comments above) was just patched out: AA no longer has any effect either, it always freezes between 0 and 30 minutes now.

    I ran the following new tests in Xonotic, none of which had any influence:

    - Running with the following environment variable set: R600_DEBUG=checkir,precompile,nooptvariant,nodpbb,nodfsm,nodma,nowc,nooutoforder,nohyperz,norbplus,no2d,notiling,nodcc,nodccclear,nodccfb,nodccmsaa

    - Disabling all shaders, even turning off OpenGL 2.0 support entirely.

    - Resetting the entire BIOS to its failsafe defaults, making sure that neither overclocking nor any other settings are involved.

    - Running under both an X11 and a Wayland session (Plasma). Under Wayland it crashes instantly, so it's even worse.

    - Verifying that this occurs with both the "radeon" and "amdgpu" kernel modules, meaning the video driver makes no difference either.
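
    For anyone who wants to repeat the first test above, here is a minimal sketch of how the R600_DEBUG variable can be assembled and passed to the game. The flag list is the one quoted above; the join_flags helper and the xonotic-sdl launcher name are assumptions for illustration, so adjust them to the local install. The script only prints the command rather than running it.

```shell
#!/bin/sh
# Join all arguments with commas, the comma-separated form used for R600_DEBUG above.
join_flags() {
    # The subshell keeps the IFS change local; "$*" joins on the first IFS character.
    ( IFS=,; printf '%s\n' "$*" )
}

# Flags copied from the test described above.
FLAGS=$(join_flags checkir precompile nooptvariant nodpbb nodfsm nodma nowc \
    nooutoforder nohyperz norbplus no2d notiling nodcc nodccclear nodccfb nodccmsaa)

# XONOTIC_BIN is an assumed launcher name, not taken from the thread.
XONOTIC_BIN=${XONOTIC_BIN:-xonotic-sdl}

# Print the command instead of executing it, so the sketch is safe to paste.
echo "R600_DEBUG=$FLAGS $XONOTIC_BIN"
```

    Removing flags from the list a few at a time would show whether any single one changes the behaviour.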

    It's clear to me at this point that this is the work of a professional: The code causing the crash is carefully maintained and injected into my system. If this was just a bug, at least one of the countless things I tried would have affected it somehow, it's impossible for a randomly occurring bug to survive so many different settings and environments... the issue instead is adaptive, so that the moment I find and disable one implementation another is activated within minutes to keep the crashes going. I imagine the objective is to block the user from finding a solution and ultimately censor them from using specific programs. I find it unbelievable that someone out there is actively doing this.

    Please help me get to the bottom of this: The crash clearly acts by simulating some sort of bug, so there must be a vulnerability deep in the system which hidden code is exploiting. I offered a lot of test data on this report: If the developers read this, please let me know what to try next!

  8. #28
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    I decided to turn my attention to the last logical thing I can imagine: DPM (Dynamic Power Management) and the clocks on my video card. The kernel added support for realtime tuning of the frequencies a while ago, so I was pondering if the default setup may have led to excess overclocking.

    I left a console to watch the file /sys/kernel/debug/dri/0/amdgpu_pm_info which I understand contains the video card frequencies. The maximum "power level" I seem to reach is 4, at sclk 101500 and mclk 140000. I'm attaching the peak output of this file here.

    My video card is supposed to run at 1015 MHz (core clock) and 5600 MHz (memory clock). I don't fully understand how the numbers in that file translate to frequencies, but from what I've heard they represent MHz * 100. If that is the case, my GPU clock is just right (101500 / 100 = 1015 MHz), whereas my VRAM (140000 / 100 = 1400 MHz) is actually under-clocked to a quarter of its default frequency! Can anyone confirm this so at least the hypothesis of bad clocks is out of the way?
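
    A minimal sketch of that conversion, assuming the sclk/mclk values really are in units of 10 kHz (so dividing by 100 gives MHz). The helper names to_mhz and field are made up for illustration. The last line additionally assumes the card uses GDDR5 memory, whose advertised "effective" speed is four times the real memory clock; if both assumptions hold, 1400 MHz would correspond to the advertised 5600 MHz rather than to an underclock.

```shell
#!/bin/sh
# Assumption: amdgpu_pm_info reports sclk/mclk in units of 10 kHz,
# so an integer division by 100 yields MHz.
to_mhz() {
    echo $(( $1 / 100 ))
}

# Extract the number following "<key>: " from a "power level" line.
# $1 = key (e.g. sclk), $2 = the full line.
field() {
    printf '%s\n' "$2" | sed -n "s/.*$1: \([0-9][0-9]*\).*/\1/p"
}

# The peak line from the output posted in this thread.
LINE='power level 4    sclk: 101500 mclk: 140000 vddc: 1163 vddci: 1000 pcie gen: 2'

SCLK_MHZ=$(to_mhz "$(field sclk "$LINE")")
MCLK_MHZ=$(to_mhz "$(field mclk "$LINE")")

echo "core:   ${SCLK_MHZ} MHz"
# Assumption: GDDR5, where the marketed rate is 4x the real memory clock.
echo "memory: ${MCLK_MHZ} MHz real, $(( MCLK_MHZ * 4 )) MHz effective"
```

    With the "power level 4" line from my output this gives a core clock of 1015 MHz and a memory clock of 1400 MHz (5600 MHz effective).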

    I may try testing with the kernel parameters "radeon.dpm=0 amdgpu.dpm=0" later: I tried doing so briefly but the performance is too horrible to play a game, so I'll instead leave a bot match running in spectator mode while I'm away.

    Code:
    Every 2.0s: cat /sys/kernel/debug/dri/0/amdgpu_pm_info                                            linux-qz0r.site: Mon Apr  2 01:18:47 2018
    
    Clock Gating Flags Mask: 0x0
            Graphics Medium Grain Clock Gating: Off
            Graphics Medium Grain memory Light Sleep: Off
            Graphics Coarse Grain Clock Gating: Off
            Graphics Coarse Grain memory Light Sleep: Off
            Graphics Coarse Grain Tree Shader Clock Gating: Off
            Graphics Coarse Grain Tree Shader Light Sleep: Off
            Graphics Command Processor Light Sleep: Off
            Graphics Run List Controller Light Sleep: Off
            Graphics 3D Coarse Grain Clock Gating: Off
            Graphics 3D Coarse Grain memory Light Sleep: Off
            Memory Controller Light Sleep: Off
            Memory Controller Medium Grain Clock Gating: Off
            System Direct Memory Access Light Sleep: Off
            System Direct Memory Access Medium Grain Clock Gating: Off
            Bus Interface Medium Grain Clock Gating: Off
            Bus Interface Light Sleep: Off
            Unified Video Decoder Medium Grain Clock Gating: Off
            Video Compression Engine Medium Grain Clock Gating: Off
            Host Data Path Light Sleep: Off
            Host Data Path Medium Grain Clock Gating: Off
            Digital Right Management Medium Grain Clock Gating: Off
            Digital Right Management Light Sleep: Off
            Rom Medium Grain Clock Gating: Off
            Data Fabric Medium Grain Clock Gating: Off
    
    uvd    vclk: 0 dclk: 0
    power level 4    sclk: 101500 mclk: 140000 vddc: 1163 vddci: 1000 pcie gen: 2

  9. #29
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    27,828
    Blog Entries
    15

    Re: 3D engines causing frequent GPU lockups

    Hi
    What is the following set to?
    Code:
    cat /sys/class/drm/card0/device/power_dpm_force_performance_level

  10. #30
    Join Date
    Jan 2009
    Location
    Romania, Bucharest
    Posts
    810

    Re: 3D engines causing frequent GPU lockups

    Quote: Originally Posted by malcolmlewis
    Hi
    What is the following set to?
    Code:
    cat /sys/class/drm/card0/device/power_dpm_force_performance_level
    Code:
    mircea@linux-qz0r:~> cat /sys/class/drm/card0/device/power_dpm_force_performance_level
    auto
