NVIDIA graphics card not recognised

I have two NVIDIA graphics cards in my machine, and the second card isn’t being recognised: no driver is loaded for it.

Here is the relevant section of lspci -v (compared using diff --side-by-side), with just the “invisible” card installed on the left and both cards installed on the right.

**03:00.0** VGA compatible controller: NVIDIA Corporation Device  | **01:00.0** VGA compatible controller: NVIDIA Corporation GF108GL
        Subsystem: NVIDIA Corporation **Device 1c82**             |         Subsystem: NVIDIA Corporation **Device 0835**
        **Physical Slot: 4 **                                     |         **Physical Slot: 2**
        Flags: bus master, fast devsel, latency 0, IRQ 16     |         Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at f9000000 (32-bit, non-prefetchable) [size=1] |         Memory at d0000000 (64-bit, prefetchable) [size=128]
        Memory at d0000000 (64-bit, prefetchable) [size=32]  |         Memory at d8000000 (64-bit, prefetchable) [size=32]
        I/O ports at e000 [size=128]                                    I/O ports at e000 [size=128]
        Expansion ROM at fa000000 [disabled] [size=512]      |         [virtual] Expansion ROM at fa000000 [disabled] [size=
        Capabilities: [60] Power Management version 3                   Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 6 |         Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 6
        Capabilities: [78] Express Legacy Endpoint, MSI 00    |         Capabilities: [78] Express Endpoint, MSI 00
                                                              >         Capabilities: [b4] Vendor Specific Information: Len=1
        Capabilities: [100] Virtual Channel                             Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting       <
        Capabilities: [128] Power Budgeting <?>                         Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting          <
        Capabilities: [600] Vendor Specific Information: ID=0           Capabilities: [600] Vendor Specific Information: ID=0
        Capabilities: [900] #19                               |         Kernel driver in use: nvidia
        Kernel modules: nouveau                               |         Kernel modules: nouveau, nvidia_drm, nvidia


03:00.1 Audio device: NVIDIA Corporation Device 0fb9 (rev a1) | 01:00.1 Audio device: NVIDIA Corporation GF108 High Definitio
        Subsystem: NVIDIA Corporation Device 1c82             |         Subsystem: NVIDIA Corporation Device 0835
        Physical Slot: 4                                      |         Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, IRQ 17               Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at fa080000 (32-bit, non-prefetchable) [size=1
        Capabilities: [60] Power Management version 3                   Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 6           Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 6
        Capabilities: [78] Express Endpoint, MSI 00                     Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting          <
        Kernel driver in use: snd_hda_intel                             Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel                Kernel modules: snd_hda_intel

Can you advise:

  1. what additional evidence / logs etc would help locate the failure, and/or
  2. how I can “force” recognition of the invisible card & load the driver?

Thanks in advance,
David

Well, the left output does “see” the card, and even has the nouveau driver loaded. I haven’t had any NVIDIA-based systems at hand, but IIRC there’s an option in the NVIDIA installer to handle multiple cards. I’m not sure, though.

Unfortunately:
http://www.tomshardware.co.uk/answers/id-3229891/nvidia-drivers-unstable-multiple-cards.html

I think your problem may stem from trying to mix and match cards from different series. If you tried three cards from the same generation you might have better luck.

I forgot to say thank you … your comment pointed me in the right direction and some quick searching found an answer - the two cards won’t mix. :frowning:

I’ve sorted the situation by swapping one of the NVIDIA cards for an ATI one, and now both cards are seen and respond.

Sometimes it’s useful having a few bits stripped from old machines …

It looks like both the nouveau and the proprietary nvidia drivers were loaded. They usually do not mix well, especially when you have 2 or more cards. Having different cards is not a problem as long as you do not want to use them in SLI mode.
You can try blacklisting the nouveau driver: add the line "blacklist nouveau" to /etc/modprobe.d/50-blacklist.conf, run mkinitrd, and reboot. Then only the proprietary nvidia driver will get loaded.
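The blacklist step can be sketched roughly as follows. This is only an illustration: the helper name add_blacklist is made up here, and the demo writes to a scratch file rather than the real /etc/modprobe.d/50-blacklist.conf, so it can be tried without root.

```shell
# Hypothetical helper: append "blacklist <module>" to a modprobe config
# file unless that exact line is already there.
add_blacklist() {
    # $1 = module name, $2 = config file
    grep -qsxF "blacklist $1" "$2" || printf 'blacklist %s\n' "$1" >> "$2"
}

# Demo against a scratch file (assumption: stands in for the real config file)
scratch=$(mktemp)
add_blacklist nouveau "$scratch"
add_blacklist nouveau "$scratch"   # second call changes nothing
cat "$scratch"

# On the real system, as root, the equivalent would be:
#   add_blacklist nouveau /etc/modprobe.d/50-blacklist.conf
#   mkinitrd
#   reboot
```

Appending instead of overwriting keeps any entries the distribution already ships in that file.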

To check whether the nvidia and/or nouveau drivers are loaded, run (probably as root):
lsmod | egrep 'nvidia|nouveau'
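lsmod only shows which modules are loaded, not which card each one is bound to. As a rough sketch, the "Kernel driver in use" lines that lspci -k prints can be reduced to one line per graphics card. The function name gpu_drivers and the sample text are illustrative only; the sample is a trimmed, hand-assembled mix of the two columns in the first post, not real output from one run.

```shell
# Hypothetical helper: read `lspci -k` style text on stdin and print
# "<slot> <driver>" per graphics controller, or "(none)" if nothing is bound.
gpu_drivers() {
    awk '
        /^[0-9a-f]/ {                          # unindented line = new device
            if (slot != "") print slot, drv
            slot = ""; drv = "(none)"
            if ($0 ~ /VGA compatible controller|3D controller/) slot = $1
        }
        /Kernel driver in use:/ && slot != "" { drv = $NF }
        END { if (slot != "") print slot, drv }
    '
}

# Illustrative input only (assumption: both cards listed in one run)
sample='03:00.0 VGA compatible controller: NVIDIA Corporation Device 1c82
        Kernel modules: nouveau
01:00.0 VGA compatible controller: NVIDIA Corporation GF108GL
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia'
printf '%s\n' "$sample" | gpu_drivers
```

On a live machine the input would come from the real command instead, e.g. /sbin/lspci -k | gpu_drivers. A card that reports "(none)" is visible on the bus but has no driver attached.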

Sorry, I should have been clearer … I did quite a bit of messing round trying to get both cards seen by the nvidia driver, including reverting to the nouveau driver. The lspci with the single card was from when I had the nouveau driver loaded & had taken out one of the cards to see if it was a hardware problem (it wasn’t). I then switched to the nvidia driver (needed for CUDA) and then added back the second card (first card disappeared), which is when I saved the next lspci.

Now, the two cards are present and working …

$ /sbin/lspci -nn
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X300/X550/X1050 Series] [1002:5b63]
01:00.1 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X300/X550/X1050 Series] (Secondary) [1002:5b73]
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
$ lsmod | egrep -i 'nvidia|nouveau|gfx|vga'
nvidia_drm             49152  0 
nvidia_modeset        860160  2 nvidia_drm
drm_kms_helper        155648  1 nvidia_drm
drm                   393216  4 ttm,drm_kms_helper,nvidia_drm
nvidia_uvm            704512  0 
nvidia              13160448  10 nvidia_modeset,nvidia_uvm
$ deviceQuery
deviceQuery Starting...


 CUDA Device Query (Runtime API) version (CUDART static linking)


Detected 1 CUDA Capable device(s)


Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4039 MBytes (4235001856 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1392 MHz (1.39 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >


deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

… so, I’m now happy enough … until I break something …

I should add that I’ve just started a Deep Learning course, and this is to save me a small fortune in renting cloud VMs + GPUs for DNN training …