
Thread: Graphics card suddenly causes boot crash with mce error

  #1
    Join Date: Jan 2009 | Location: Romania, Bucharest | Posts: 933

    Graphics card suddenly causes boot crash with mce error

    Something strange and unsettling happened to me today. I woke up to find that my screen no longer powered back on when I moved the mouse, which isn't an entirely unique occurrence. I restarted and was surprised to see the monitor power itself off right before the login screen, and this time I couldn't do a clean shutdown by pressing the power button. It soon became apparent that the computer would stay frozen for roughly a minute, then restart itself and repeat the cycle. After one restart I was able to catch the following error message in the console:



    I figured it must be hardware-related: I hadn't installed any updates or changed the system configuration in over a week, and this didn't happen yesterday on the exact same system. To confirm it, I booted a live image and got exactly the same behavior there. I pulled out the memory modules and tried them in sets, disconnected all hard drives, tried two different screens (over HDMI and DisplayPort cables), booted two kernels (5.14 and 5.15), tried radeon vs. amdgpu, and reset the CMOS via the pins... In the end, the only thing that worked was removing my video card and plugging in an older one.

    What makes this extremely bizarre is that I get a picture right up until the kernel boots: I can enter the BIOS just fine and see GRUB, with no GPU freezes or graphical corruption... It seems to be Linux detecting an error and freaking out over it. All error messages are prefixed with "mce" and, oddly enough, reference a CPU issue, but the rest of my hardware works just fine, so it's not the processor, thank god.

    Does anyone know what could break in a video card that would make Linux do this? I saw a reference to the `mcelog` command for these errors, but like I said, the machine becomes completely inoperable after the message is printed, so I can't issue any commands. If you can suggest further tests I'll take a look, but please mention everything I could test first, as I don't feel comfortable plugging and unplugging the video card so often and risking damage to my motherboard (I already did it twice today). If this is a hardware issue that can't be solved from the kernel side, I have no choice but to spend a large sum of money I didn't want to spend... I figured I'd ask for help here first so I know I tried everything else.
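    Since the machine dies before commands like `mcelog` can be run, one option may be to boot with the working (fallback) card and read the crashed boot's kernel messages back from the journal. A minimal sketch, assuming a persistent systemd journal: if the journal is volatile, or the kernel hung before the journal reached the disk, the previous boot's messages simply won't be there. The function names are hypothetical.

```shell
# Sketch: read machine-check messages logged during the previous (crashed) boot.
# Assumption: the systemd journal is persistent (/var/log/journal exists).
# With a volatile journal, or if the kernel hung before the journal was
# written to disk, the previous boot's messages will not be available.

# Keep only lines that look like machine-check reports
# (kernel MCE messages are prefixed with "mce:").
filter_mce() {
    grep -i -E 'mce:|machine check'
}

# -k: kernel messages only; -b -1: the boot before the current one.
# Use "journalctl --list-boots" to pick an older boot if needed.
show_previous_boot_mce() {
    journalctl -k -b -1 --no-pager | filter_mce
}
```

    Usage: boot with the working card, source the file, then run `show_previous_boot_mce`; reading the full system journal may require root.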

  #2
    Join Date: Aug 2008 | Location: Brazil | Posts: 3,278

    Re: Graphics card suddenly causes boot crash with mce error

    AFAIU the nvidia/nouveau drivers are only loaded after X starts (or concomitantly). Before that, your GPU is using fbdev and/or another intermediate driver. That's why you can only use, say, 4K resolution after the desktop starts, not in the BIOS, GRUB, or the splash screen.

    So perhaps it's a driver problem (the driver may have been corrupted somehow) or, more probably, a hardware failure. Before discarding the card, though, I'd try it in another system, with a live CD or a different driver.
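    Rather than reasoning about when the driver loads, you can check which kernel driver is actually bound to a card on a running system (for example when comparing radeon against amdgpu). A minimal sketch, assuming pciutils (`lspci`) is installed and the card is an AMD one (1002 is AMD/ATI's PCI vendor ID); the function names are hypothetical. The card's HDMI audio function will be listed as well.

```shell
# Sketch: show which kernel driver is bound to each AMD/ATI PCI device.
# Assumptions: pciutils (lspci) is installed; 1002 is AMD/ATI's PCI vendor ID.

# From "lspci -k" output, keep the device lines (they start with the bus
# address) and the "Kernel driver in use" lines (indented with a tab).
driver_summary() {
    grep -E '^[0-9a-f]|Kernel driver in use'
}

# -nn: names plus numeric IDs; -k: kernel driver/modules; -d 1002: AMD/ATI only.
show_amd_drivers() {
    lspci -nnk -d 1002: | driver_summary
}
```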

  #3

    Re: Graphics card suddenly causes boot crash with mce error

    Which video cards?
    I'd say it's a video card hardware failure.

  #4
    Join Date: Jun 2008 | Location: East of Podunk | Posts: 33,246 | Blog Entries: 15

    Re: Graphics card suddenly causes boot crash with mce error

    Hi
    Maybe a firmware update is available?

    Code:
    fwupdtool get-devices

    Bank 5 is RAM; maybe run memtest?
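    The firmware check suggested above can be taken one step further by asking fwupd whether updates exist. A minimal sketch, assuming fwupd is installed and its daemon is running; the function name is hypothetical, some commands may prompt for authentication, and an older card may not be listed at all if fwupd has no plugin for it.

```shell
# Sketch: ask fwupd which devices it manages and whether updates exist.
# Assumptions: fwupd is installed and its daemon is running; an older
# graphics card may not appear at all if fwupd does not support it.
check_firmware_updates() {
    if ! command -v fwupdmgr >/dev/null 2>&1; then
        echo "fwupdmgr not found; install the fwupd package first" >&2
        return 1
    fi
    fwupdmgr get-devices   # devices fwupd can see (look for the graphics card)
    fwupdmgr refresh       # download the latest metadata from LVFS
    fwupdmgr get-updates   # list available firmware updates, if any
}
```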

    If you now put the other graphics card back in, does the error come back?
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
    If you find this post helpful and are logged into the web interface,
    please show your appreciation and click on the star below... Thanks!

  #5
    Join Date: Jan 2009 | Location: Romania, Bucharest | Posts: 933

    Re: Graphics card suddenly causes boot crash with mce error

    I haven't had time to run a full memtest yet. I did, however, try booting with the RAM modules two at a time, and got the same issue with every combination. With another video card I have no problems; this problem only occurs when the old one is plugged in.

  #6
    Join Date: Jun 2008 | Location: East of Podunk | Posts: 33,246 | Blog Entries: 15

    Re: Graphics card suddenly causes boot crash with mce error

    Hi
    Can you tell us which card works and which doesn't? Maybe a firmware update is available for the card in question?

  #7
    Join Date: Jan 2009 | Location: Romania, Bucharest | Posts: 933

    Re: Graphics card suddenly causes boot crash with mce error

    The card I'm falling back on is old enough that I've forgotten what model it is. The one with the issue is an XFX AMD Radeon™ R9 390X.

    Like I said, I don't suspect a software problem: the system worked fine for at least a week after the latest updates, and I haven't changed any configuration since the issue began. I presume it must be a physical issue with the card, but seemingly one that only throws off the Linux kernel or drivers without breaking anything else. I tried swapping the cards twice; the broken one always fails to boot while the fallback always works flawlessly.

  #8
    Join Date: Jun 2008 | Location: East of Podunk | Posts: 33,246 | Blog Entries: 15

    Re: Graphics card suddenly causes boot crash with mce error

    Hi
    So it needs external power; is that all OK (as in, your power supply)?

  #9
    Join Date: Jan 2009 | Location: Romania, Bucharest | Posts: 933

    Re: Graphics card suddenly causes boot crash with mce error

    The old (broken) card takes two additional power connectors, a 6-pin plus an 8-pin... the older (fallback) card has only one 6-pin connector and works fine. The PSU's cables are configurable (6-pin or 8-pin), so last time I tried swapping which cable goes into which connector. That had no effect, so I don't suspect a bad socket. A new card is supposed to arrive soon (I needed an upgrade anyway), so I'll see how it goes.

  #10
    Join Date: Jun 2008 | Location: East of Podunk | Posts: 33,246 | Blog Entries: 15

    Re: Graphics card suddenly causes boot crash with mce error

    Hi
    What's your new card? I suspect you just need to rock on with what works. Do you have a second PCIe slot for the non-working card? If so, plug both in and see whether it works as an offload device (assuming you have spare power).
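    If a second slot and enough power are available, one way to test the offload suggestion is to compare which GPU renders with and without `DRI_PRIME=1`. A minimal sketch, assuming Mesa drivers (radeon/amdgpu), `glxinfo` from the Mesa demos, and a running graphical session; the function names are hypothetical. If both lines name the same GPU, offloading to the second card isn't happening. Keep in mind that with the failing card installed, the machine may not get as far as a desktop at all.

```shell
# Sketch: compare the renderer used by default with the one used when
# PRIME render offload is requested via DRI_PRIME=1.
# Assumptions: Mesa drivers and glxinfo are installed, and this runs
# inside a graphical session.

# Extract the renderer line from glxinfo output on stdin.
renderer_of() {
    grep 'OpenGL renderer string'
}

compare_renderers() {
    echo "default: $(glxinfo | renderer_of)"
    echo "offload: $(DRI_PRIME=1 glxinfo | renderer_of)"
}
```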
