How to identify problematic hardware by ID or PCIe connection

Hi,
I’m getting hundreds of error log entries per minute from some device. I’ve used lspci and dmidecode and hunted through my BIOS, but I can’t find any way to link their output to my actual hardware. The problem seems to be with PCI device 00:1c.4 and ID 8086:7ab. How am I supposed to track the device down?

9/10/22 4:16 PM    kernel    thunderbolt 0000:09:00.0: AER: can't recover (no error_detected callback)
9/10/22 4:16 PM    kernel    xhci_hcd 0000:3d:00.0: AER: can't recover (no error_detected callback)
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER: device recovery failed
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER: can't find device of ID00e4
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER: Uncorrected (Non-Fatal) error received: 0000:00:1c.4
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4:   device [8086:7abc] error status/mask=00100000/00004000
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4:    [20] UnsupReq               (First)
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER:   TLP Header: 34000000 07000052 00000000 00000000

9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4:    [20] UnsupReq               (First)
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: AER:   TLP Header: 34000000 07000052 00000000 00000000
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
9/10/22 4:16 PM    kernel    pcieport 0000:00:1c.4:   device [8086:7abc] error status/mask=00100000/00004000


lspci -v -s 00:1c.4
00:1c.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 (rev 11) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 143
        Bus: primary=00, secondary=07, subordinate=70, sec-latency=0
        I/O behind bridge: 00006000-00008fff [size=12]
        Memory behind bridge: 54000000-820fffff [size=737]
        Prefetchable memory behind bridge: 0000004000000000-000000404a0fffff [size=1185]
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [98] Subsystem: ASUSTeK Computer Inc. Device 8694
        Capabilities: [a0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [220] Access Control Services
        Capabilities: [200] L1 PM Substates
        Capabilities: [150] Precision Time Measurement
        Capabilities: [a30] #19
        Capabilities: [a90] #25
        Kernel driver in use: pcieport


Hi
Thunderbolt ports, by the look of it. See https://bugzilla.kernel.org/show_bug.cgi?id=215453. Are you using Thunderbolt? I would disable it temporarily in the BIOS and see if that helps.

It's 8086:7abc, not 8086:7ab: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5. Many users complain about it. Try updating the BIOS; new CPU firmware may help.
Check dmesg.
PCIe errors can be the result of a bent motherboard, which is typical with Alder Lake. Try to disassemble and carefully reassemble the PC, if you can, or ask the manufacturer about a warranty claim.
Try a lower PCIe version: 3.0 or 2.x.

Thanks malcolm. Indeed, I have TB though nothing is plugged into it. Will test after disabling.

Thanks, I have the newest BIOS and Intel firmware from OBS kernel stable backport.

I don’t think it would be a bent board because this one has a steel plate on the entire backside – very stiff. Good suggestions though. I can try dropping PCIe version but I still need to identify which device/slot/lane is causing issues.

I still want to know how a user is supposed to identify hardware from the logs!
8086:7abc = ???
PCH PCI Express Root Port #5 = Do I need to ask the manufacturer for help identifying this?
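
For what it's worth, some of the numbers in those log lines can be decoded by hand. The sketch below is only an illustration, not a diagnostic tool: it assumes the "ID00e4" in the "can't find device of ID..." message is a standard 16-bit PCIe Requester ID (8 bits bus, 5 bits device, 3 bits function) and that the bracketed pair in "device [8086:7abc]" is vendor:device. The function names are made up for this example.

```python
import re

def decode_requester_id(hex_id: str) -> str:
    """Split a 16-bit PCIe Requester ID into bus:device.function."""
    value = int(hex_id, 16)
    bus = (value >> 8) & 0xFF      # upper 8 bits: bus number
    device = (value >> 3) & 0x1F   # next 5 bits: device number
    function = value & 0x7         # lowest 3 bits: function number
    return f"{bus:02x}:{device:02x}.{function}"

def vendor_device(log_line: str):
    """Pull the (vendor, device) hex IDs out of an AER 'device [vvvv:dddd]' line."""
    match = re.search(r"device \[([0-9a-f]{4}):([0-9a-f]{4})\]", log_line)
    return match.groups() if match else None

line = "pcieport 0000:00:1c.4:   device [8086:7abc] error status/mask=00100000/00004000"
print(decode_requester_id("00e4"))  # 00:1c.4
print(vendor_device(line))          # ('8086', '7abc')
```

So under that assumption ID 00e4 is just 00:1c.4 again, i.e. the root port itself, and 8086 is Intel's vendor ID. Neither number names a physical slot on its own.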

Hi
It should be the Thunderbolt port…
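
One way to check that from the output already posted: the lspci dump for 00:1c.4 shows "secondary=07, subordinate=70", so every device whose bus number falls in 07..70 (hex) sits somewhere behind that root port. A minimal sketch, assuming the addresses in the log are in the usual [domain:]bus:device.function form; the helper names are hypothetical.

```python
def parse_bdf(address: str):
    """Parse '[domain:]bus:device.function' into (bus, device, function) ints."""
    parts = address.split(":")
    bus_hex, devfn = parts[-2], parts[-1]
    dev_hex, fn = devfn.split(".")
    return int(bus_hex, 16), int(dev_hex, 16), int(fn, 16)

def is_behind_bridge(address: str, secondary: int, subordinate: int) -> bool:
    """True if the device's bus number lies in the bridge's downstream bus range."""
    bus, _, _ = parse_bdf(address)
    return secondary <= bus <= subordinate

# Bus range taken from the 'lspci -v -s 00:1c.4' output in the first post.
SECONDARY, SUBORDINATE = 0x07, 0x70

for addr in ("0000:09:00.0", "0000:3d:00.0", "0000:00:1c.4"):
    print(addr, is_behind_bridge(addr, SECONDARY, SUBORDINATE))
```

Both the thunderbolt device (09:00.0) and the xhci_hcd device (3d:00.0) from the log land inside that range, which fits the Thunderbolt controller hanging off Root Port #5. This only narrows things down; it does not prove which device is at fault. Running lspci -tv should show the same parent/child relationship as a tree.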