Small squares of random pixels static artifact on video card

Is this card defective, or am I only seeing a power supply, driver or other issue? Generally, how do you diagnose or reproduce such an issue (under Windows) for the repair technicians?

I’m using a compositing window manager (KWin) for my desktop (i.e., one which renders all windows to a texture and displays that with optional zooming and panning effects). Block artifacts eventually appear on the desktop (actually on the texture, as they seem to ‘stick’ to a certain area of the desktop even when zooming out). Each appears to be a random bitmap of about 16x16 pixels, and each has a single color with transparency (i.e., each pixel is either lit 100% in the block’s color or transparent). Each block seems strictly aligned to a grid that has the same cell size as a single block. A block disappears once the window content underneath is refreshed (i.e., that part of the screen gets overwritten).

The card is new and has never been overclocked. It started to show these artifacts after the first or second week. It was not used interactively in the first week (i.e., I have no data on whether it produced these signs out of the box), as the computer was running burn-in tests.

As a quick test, replacing the card with an nVidia 9500 GT does make the artifacts disappear, but of course it is not a solution.

Using openSUSE 12.1 (x86_64) Asparagus;
Linux 3.1.9-1.4-desktop #1 SMP PREEMPT Fri Jan 27 08:55:10 UTC 2012 (efb5ff4) x86_64 x86_64 x86_64 GNU/Linux;
nvidia NVIDIA-SMI 2.285.05, Driver Version: 285.05.33;
GPU card: Gainward nVidia GTX-280 3GB.

http://i41.tinypic.com/2afbwns.jpg

http://imageupload.org/en/file/219718/gpu-artifact-zoom.jpg.html

> As a quick test, replacing the card with an nVidia 9500 GT does make the artifacts disappear, but of course it is not a solution.

Sounds like it is.

Why the older nvidia driver?

Thank you for your views.

It is not a good long term solution because the 9500 GT is a much inferior card and it is borrowed from a different PC anyway.

I’m using this version of the driver because this is the official CUDA driver version. Anyway, I’ve tried different versions between 275 and 290 without success.

Can you verify the card is good in a different machine/install and/or OS?
Presumably this is a new phenomenon?

What I meant was: It might be that the card is on the way out.

Looks like memory corruption on the card. My GT9800 showed exactly this, no matter which NVIDIA driver version. I thought I had it fixed, then after a week the machine froze. I replaced the card with a new NVIDIA one, which has run flawlessly on the same driver.

Well, the card is new - it was bought in early April (just weeks ago). In the first week, I didn’t use KDE, as it was only running batch processing jobs. However, as soon as I started to use it interactively, I encountered the glitch within a few days.

I’m a bit hesitant because the whole computer was bought new and assembled, and I did a fresh SUSE install on it (I used a different system on my previous PCs), so I didn’t know what to expect.

Sadly, as this card has high power demands, my only other machine here doesn’t have a power supply that is powerful enough for testing. A local computer shop was kind enough to do an hour of basic testing for me on one of their simple Windows 7 bench machines, but the card did not show such signs there.

I’ll try to negotiate with the retailer.

Thanks, this is very insightful information. It is also unfortunate that the issue (currently) does not cause stability problems for me, and might only be confirmed by manual inspection (or perhaps by a specialized in-house CUDA memory tester utility?). Though maybe it will get worse after some more stress.

The card is on the way back. I’ll keep you posted on the resolution. By the way, sorry for the typo, the card is actually a GTX-580.

Finally someone with the same problem as me !

I have the EXACT same squares randomly appearing with my monstrous GTX580 which works perfectly well on Windows ! (Very high demanding games, hours of them, without anything remotely looking like a glitch on screen). So I guess we can rule out the video card : the problem lies elsewhere, either in the driver or in the Xorg server.

I hope one day someone will come here as a hero and tell us why it’s happening :smiley:

If anyone has any idea, my relevant config parts:
openSUSE 12.1 x86 - 3.1.10-1.9-default #1 SMP
Gainward nvidia GTX 580 3GB RAM
nvidia proprietary driver v295.40
Gnome 3 Desktop Environment

Edit: apparently we have the exact same card, I had to double check I wasn’t the one who posted your message seeing how much our problem is similar :smiley:

On 05/13/2012 12:46 AM, xwolfi wrote:
>
> Finally someone with the same problem as me !
>
> I have the EXACT same squares randomly appearing with my monstrous
> GTX580 which works perfectly well on Windows ! (Very high demanding
> games, hours of them, without anything remotely looking like a glitch on
> screen). So I guess we can rule out the video card : the problem lies
> elsewhere, either in the driver or in the Xorg server.
>
> I hope one day someone will come here as a hero and tell us why it’s
> happening :smiley:
>
> If anyone has any idea, my relevant config parts:
> openSUSE 12.1
> nvidia GTX 580

Apparently, you need someone to state the obvious. Your system has a hardware
or software problem. :slight_smile:

A successful run with Windows does not rule out a hardware problem with the card
as it and Linux do things very differently. Perhaps the card functions perfectly
when loading the card’s RAM from one region of host memory, but fails from
another. It could also be a RAM failure in the host that does not show up in
Windows because it never gets used there. It could be a marginal power supply
that cannot handle the load imposed by quicker operations of the Linux kernel.
It might be that Linux works the GPU harder and it gets hotter. I could go on
for a lot longer, but I think you probably get the idea. Any statements
regarding the correctness of the hardware between Linux and Windows are
essentially limited to a dead/non-dead determination. It is difficult to
determine effects much more subtle than that.

More likely it is a bug in the driver. As you did not state which one you are
using, any further speculation would be pointless.

Well I did state which one I was using: nvidia proprietary driver v295.40, or is there something else I should give ?

The problem certainly lies, as you said, in the driver, and indeed you’re right about the Windows thing, but on the other hand we are not really being absolute here; the important thing is to rule out very improbable causes. As I spent 4 months using my card without any problem on Windows, and the card works correctly most of the time on Linux, except for small squares like the OP showed from time to time, it would be counter-productive to keep the hardware option open.

But in an absolute world it could be some subtle electric variations that only happen by chance when I run linux and not when I run Windows :smiley: Or it could even be my screen badly reacting to the gnome desktop environment - everything is possible.

To stay serious a minute here, we can find out what we have in common, the OP and me, and what we haven’t:

Common: Exact same Gainward GTX580 card
Exact same “version” of openSUSE (12.1)

Different: OpenSuse Architecture (he has x64, I have i686)
Driver version (he has 285.05.33, I have 295.40)
Desktop environment (he has KDE4, I have Gnome 3) - but it isn’t really relevant I guess anyway
Linux kernel version (he has 3.1.9-1.4-desktop, I have 3.1.10-1.9-default)

What can we rule out in your opinion ? If you still insist on a hardware problem, then I’ll try to install Fedora and the proprietary drivers to see if there is any difference, but that would take some time and I’d like to find other, easier options before trying it. The driver option is the most probable, but there aren’t that many of us with the problem (the OP is the only one I ever found) while the GTX580 sold well - which kinda points towards a hardware problem :frowning:

Edit: Actually some others have the same problem : P9X79 + i7 3820 + GTX 580 + Linux(Gentoo, Ubuntu) = square multicolor artifacts - nV News Forums

OK. You are aware that 295.40 has lots of bugs and that you should be using 295.49.

When you change distros, you only change the user-space stuff. The kernel and
the external graphics drivers are identical as long as the kernel version is the
same.

My experience is that power management in GNU/Linux is not as good as in MS-Windows, mainly because the proprietary video driver in MS-Windows is much better, and also because of power regressions in the GNU/Linux kernel which, while improved compared to some kernel versions back, are still not up to the level of MS-Windows. I concede this experience comes from much older hardware. What I saw was a case where the power supply was only ‘just’ barely adequate: the PC ran well under MS-Windows with serious graphic use, but the supply was simply not adequate for GNU/Linux, and I saw small squares of random pixels, static artifacts, etc. in GNU/Linux that I did not see in MS-Windows.

When I moved the graphic card to a PC with a more powerful power supply, the artifacts in GNU/Linux (which were never present in MS-Windows) disappeared for GNU/Linux. I’m not saying this is your problem, but rather I am recommending caution in coming to a conclusion on this.

Unfortunately it’s not always so straightforward.

Well yeah, I kinda see what you mean, but I have an 800W Corsair modular power supply, so I guess I’ll try more recent drivers then.

Thanks for your updates, and sorry for the late reply, but we’ve just received the replacement card from far away (or at least we hope it’s a new one). I’ll examine your posts and links and respond to each one a bit later.

Standard ESD handling precautions were again followed when inserting the card.

The computer had just been started and was running mostly idle on a desktop. Basically clicking on windows and doing text editing produced the artifact again on this new card in less than half an hour. Room HVAC was running, so overheating is not probable.

Chassis: Supermicro sc743tq-865b-sq black 4u rack
(from text: PSU 865W, AC cooling redundant, 80+ certified)
SUPERMICRO SC743TQ-865 Black 4U Rack Server Chassis, 8x SAS/SATA HS, 865W Super Quiet HE PSU : AVADirect Custom Computer Component

All 3 power cables inserted into the GPU.

We have ECC host memory, so undetected host memory errors are unlikely.

If nothing helps, we’ll write OpenCL tests for memory and processing consistency.
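For what it's worth, the write/read-back idea behind such a memory consistency test can be sketched in a few lines. This is only an illustration in plain Python: a `bytearray` stands in for GPU memory, and `make_pattern`, `roundtrip` and `find_mismatches` are hypothetical names, not part of OpenCL or CUDA. A real test would replace `roundtrip` with an actual copy to the device and back.

```python
def make_pattern(size, seed):
    """Build a deterministic pseudo-random byte pattern (simple LCG, an arbitrary choice)."""
    state = seed & 0xFFFFFFFF
    out = bytearray(size)
    for i in range(size):
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        out[i] = state >> 24  # top 8 bits, always in 0..255
    return out


def roundtrip(buffer):
    """Hypothetical stand-in for 'copy to the card and read back'; here it only copies."""
    return bytearray(buffer)


def find_mismatches(expected, actual, block=256):
    """Return the start offsets of block-sized chunks whose contents differ."""
    bad = []
    for off in range(0, len(expected), block):
        if expected[off:off + block] != actual[off:off + block]:
            bad.append(off)
    return bad


if __name__ == "__main__":
    pattern = make_pattern(4096, seed=1)
    readback = roundtrip(pattern)
    readback[1000] ^= 0x01  # simulate a single flipped bit
    print(find_mismatches(pattern, readback))  # prints [768]
```

Reporting block offsets rather than single bytes is a deliberate choice here: it would show whether corrupted regions line up with a fixed grid, as the visible artifacts seem to.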

Verdict: driver bug. The problem has been silently fixed in the new Nvidia beta driver version 302.07 released on the 2nd of May - a week after detecting and RMA’ing the card!

Note that the regular download pages do not list this driver, so you need to go to either the Unix archive or the “beta or older downloads” page to fetch it.

Linux AMD64 Display Driver Archi

Can I mark this topic solved?

I too am experiencing a similar problem. I apologize if I should have started a new thread, as this thread was marked closed or resolved. My box has an nvidia GeForce 8300 GS. The OS is OpenSuSE 12.3 with all the latest updates, as far as I know. The problem can start just a few minutes after logging in, or hours later. The machine is always on. All I have to do is log out and log back in and the problem stops, for a while.

On to the actual problem: What usually happens is the mouse pointer, or the “icons” on the task bar, or the “icons” on the tabs on firefox, or all the “graphical” elements on almost any or all programs become corrupted. By this I mean they may be totally or partially replaced with random pixels. Or, sometimes, they may be replaced with what are obviously portions of some random graphic - like a piece of a picture from a web page. Another common problem will occur in firefox, but then it seems to spread throughout the desktop, and that is all the graphical elements will be replaced with random parts of other graphics…

It appears to me that video memory is somehow being corrupted. It looks like the desktop environment (kde4) or x.org is somehow getting a pointer stack hosed. When you log out and log back in, all is good as new.

When I run ‘lspci -k’ this is the relevant info:
01:00.0 VGA compatible controller: NVIDIA Corporation G86 [GeForce 8300 GS] (rev a1)
Subsystem: NVIDIA Corporation Device 0494
Kernel driver in use: nouveau

Again the specs:
OS: OpenSuSE 12.3
Driver: nouveau
Card: GeForce 8300 GS

What other info would be helpful???

Thanks for any brainstorming help in advance!
Terry.

As a follow up, when I run:


mach:/home/someone# grep -i LoadModule /var/log/Xorg.0.log
[ 91398.789] (II) LoadModule: "dri2"
[ 91398.789] (II) LoadModule: "glamoregl"
[ 91398.906] (II) LoadModule: "glx"
[ 91398.922] (II) LoadModule: "nvidia"
[ 91398.933] (II) UnloadModule: "nvidia"
[ 91398.933] (II) LoadModule: "nouveau"
[ 91398.934] (II) LoadModule: "nv"
[ 91398.948] (II) LoadModule: "modesetting"
[ 91398.956] (II) LoadModule: "fbdev"
[ 91398.966] (II) LoadModule: "vesa"
[ 91398.975] (II) LoadModule: "fbdevhw"
[ 91398.994] (II) LoadModule: "dri"
[ 91399.270] (II) LoadModule: "fb"
[ 91399.307] (II) LoadModule: "exa"
[ 91399.307] (II) LoadModule: "shadowfb"
[ 91399.308] (II) UnloadModule: "nv"
[ 91399.308] (II) UnloadModule: "modesetting"
[ 91399.308] (II) UnloadModule: "fbdev"
[ 91399.308] (II) UnloadModule: "vesa"
[ 91400.053] (II) LoadModule: "evdev"

Does this help? Why is x.org trying to load nv and all those other modules after it loaded nouveau???
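To make the grep output above easier to read, here is a minimal sketch, assuming the `LoadModule` / `UnloadModule` line format shown in it, that reduces the log to the modules which were loaded and not unloaded again. The function name `modules_still_loaded` and the `SAMPLE` excerpt are only for illustration; in practice you would pass in the lines of /var/log/Xorg.0.log.

```python
import re

# Matches lines such as: [ 91398.922] (II) LoadModule: "nvidia"
# The timestamp is not part of the pattern, so lines that lost their
# leading "[" when pasted are still recognised.
MODULE_EVENT = re.compile(r'\(II\)\s+(LoadModule|UnloadModule):\s+"([^"]+)"')


def modules_still_loaded(log_lines):
    """Return modules that were loaded and not later unloaded, in load order."""
    loaded = []
    for line in log_lines:
        match = MODULE_EVENT.search(line)
        if not match:
            continue
        event, name = match.groups()
        if event == "LoadModule":
            if name not in loaded:
                loaded.append(name)
        elif name in loaded:
            loaded.remove(name)
    return loaded


# Excerpt of the log lines posted above.
SAMPLE = """\
[ 91398.906] (II) LoadModule: "glx"
[ 91398.922] (II) LoadModule: "nvidia"
[ 91398.933] (II) UnloadModule: "nvidia"
[ 91398.933] (II) LoadModule: "nouveau"
[ 91398.934] (II) LoadModule: "nv"
[ 91398.966] (II) LoadModule: "vesa"
[ 91399.308] (II) UnloadModule: "nv"
[ 91399.308] (II) UnloadModule: "vesa"
[ 91400.053] (II) LoadModule: "evdev"
""".splitlines()

if __name__ == "__main__":
    print(modules_still_loaded(SAMPLE))  # prints ['glx', 'nouveau', 'evdev']
```

On this excerpt only glx, nouveau and evdev remain, which would be consistent with the server trying several candidate drivers and dropping the ones it does not end up using. That reading is an assumption based on the log alone.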

An example of the problem in firefox:
https://docs.google.com/file/d/0B3JnXpQJbBZKR0JaSHhNSk55U0k/edit?usp=sharing