Is my Z640 in trouble? How can I diagnose a connection problem?

I spent a good deal of time attempting to solve a network connection issue and concluded it was a wiring problem. I thought I had fixed it, as the HP diagnostics that come with the Z640 could not find any issue with the system when run from Windows 10.

Sadly, after a brief respite the problem has recurred. The performance of the network connection appears to deteriorate over time until neither Leap 15.4 nor Windows 10 can connect using the NIC, and a USB wifi dongle shows only very low speeds.
The LAN cable connection now no longer works and the wifi is very slow. I checked the connections with my laptop and both methods were near the maximum speed for my ISP connection, so it is my Z640 that is the problem, not the network.

I am planning to do a thorough clean of my machine to remove dust and grime but meanwhile are there any diagnostic tools I can use to find out if there is a problem?

@Budgie2 So are the laptop wifi and ethernet the same speed? E.g. it’s common for laptops to run at 100 Mb/s rather than 1000 Mb/s. I assume you also used the USB wifi dongle in both machines?

If you set the Z640 to the same speed as the laptop ethernet, is the issue present or not? Did you run the same amount of data through the laptop as the desktop?

Check the data in/out on the interface with ip stats show dev eth0 group link. Do you see errors in the output?
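
For anyone following along, the same counters can also be read straight from sysfs. On systems where ip has no stats subcommand, ip -s link show dev eth0 prints comparable per-interface totals. Below is a minimal sketch, assuming the wired interface is eth0 as in this thread. The function name link_errors and its optional second argument are made up for illustration.

```shell
# Print the kernel's error and drop counters for one interface.
# Usage: link_errors IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net; the override is a hypothetical
# convenience so the function can be tried against a copy of that tree.
link_errors() {
    iface=$1
    base=${2:-/sys/class/net}
    stats="$base/$iface/statistics"
    if [ ! -d "$stats" ]; then
        echo "no statistics directory for $iface" >&2
        return 1
    fi
    # Only counters that exist and are readable are printed.
    for counter in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors collisions; do
        if [ -r "$stats/$counter" ]; then
            printf '%s=%s\n' "$counter" "$(cat "$stats/$counter")"
        fi
    done
}
```

Run link_errors eth0 before and after a large transfer. Counters that keep climbing point at the cable, switch port or NIC rather than at the ISP link.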

I use USB to Ethernet devices here, work fine…

What’s a Z640? Is that some hardware that’s newer than 15.4 and needs newer kernel and/or drivers?

@mrmazda They are references to HP Workstation models. I have a Z440; the Z640 can have two CPUs. And no, they are circa 2016 AND support Linux, specifically SLED and derivatives… as well as all the Linux tools to update the BIOS, and they are also supported by fwupd.

Among recent Intel motherboard chipsets used by HP: H410, B460, H470, Q470, W480, Z490, H510, B560, H570, W580, Z590, H610, B660, Q670, W680, Z690, B760, H770, Z790

Among recent AMD motherboard chipsets used by HP: B350, X370, B450, X470, A520, B550, X570, A620, B650, X670

Can anyone see why I asked? Better too much info than not enough when asking for help.

@mrmazda I tend to look at previous user posts… especially if they forget to post their desktop environment… e.g. Search results for '@Budgie2 Z640' - openSUSE Forums

Hi and thanks for the replies. I was running Leap 15.4 with KDE desktop. Sorry I left that out. BTW Malcolm, the Z640 to which I refer has two processors installed and was intended to be my main workstation. So much for good intentions!

I note Malcolm’s comments and will repeat his suggestions to make sure I am repeating same tests on the Z640 and the laptop.

I am not sure how I can set the speed yet, I was just using Broadband Speedchecker. My ISP has a limit of 40 Mb/s and the laptop was showing between 36 and 38 last night so I do not think there are any contention issues on the wan connection. Sadly my Z640 was only showing around 2 Mb/s or less.
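
On Linux the negotiated link speed can be read from sysfs and, if wanted, restricted with ethtool. This is a minimal sketch, assuming the wired interface is eth0 as elsewhere in this thread. The helper name link_speed and its optional second argument are made up for illustration, and the ethtool lines are left as comments because they need root and change the live link.

```shell
# Report the negotiated speed (Mb/s) and duplex of an interface from sysfs.
# Usage: link_speed IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net (the override is hypothetical,
# only there so the function can be tried on a copy of the tree).
link_speed() {
    iface=$1
    base=${2:-/sys/class/net}
    dir="$base/$iface"
    # Reading "speed" fails while the link is down, so check the result.
    if ! speed=$(cat "$dir/speed" 2>/dev/null); then
        echo "cannot read speed for $iface (link down?)" >&2
        return 1
    fi
    duplex=$(cat "$dir/duplex" 2>/dev/null || echo unknown)
    printf '%s Mb/s, %s duplex\n' "$speed" "$duplex"
}

# To compare like with like, the desktop can be limited to 100 Mb/s.
# 0x008 is the ethtool advertise mask for 100baseT full duplex.
#   ethtool -s eth0 advertise 0x008
# The setting is not persistent; a reboot returns to the driver defaults.
```

Note that a 40 Mb/s ISP line cannot show the difference between a 100 Mb/s and a 1000 Mb/s link, so a result of 2 Mb/s points at something other than the negotiated speed alone.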

I shall do the spring clean first but then will use phoronix test suite if I can remember how to drive it. I shall then be able to provide all the info requested. Of course there may be a bigger problem if my connection is so poor that I cannot install phoronix!

Here are the details of my Z640 system:-

alastair@HP-Z640-1:~> phoronix-test-suite system-info


Phoronix Test Suite v10.8.4
System Information


  PROCESSOR:              2 x Intel Xeon E5-2620 v3 @ 3.20GHz
    Core Count:           12                                       
    Thread Count:         24                                       
    Extensions:           SSE 4.2 + AVX2 + AVX + RDRAND + FSGSBASE 
    Cache Size:           30 MB                                    
    Microcode:            0x49                                     
    Core Family:          Haswell                                  
    Scaling Driver:       intel_pstate powersave                   

  GRAPHICS:               llvmpipe
    OpenGL:               4.5 Mesa 21.2.4 (LLVM 11.0.1 256 bits) 
    Monitor:              ASUS PB258                             
    Screen:               2560x1440                              

  MOTHERBOARD:            HP 212A v1.01
    BIOS Version:         M60 v02.59            
    Chipset:              Intel Xeon E7 v3/Xeon 
    Audio:                Realtek ALC221        
    Network:              Intel I218-LM         

  MEMORY:                 64GB

  DISK:                   2000GB KINGSTON SNVS2000GB
                      + 256GB Samsung SSD 850
                      + 1000GB Western Digital WD1003FBYX-23
                      + 0GB USB CRW-CF/MD
                      + 0GB USB CRW-SD
                      + 0GB USB CRW-MS
    File-System:          btrfs                                                   
    Mount Options:        relatime rw space_cache ssd subvol=/@/home subvolid=263 
    Disk Scheduler:       NONE                                                    
    Disk Details:         Block Size: 4096                                        

  OPERATING SYSTEM:       openSUSE 15.4
    Kernel:               5.14.21-150400.24.41-default (x86_64)                                                                                     
    Desktop:              KDE Plasma 5.24.4                                                                                                         
    Display Server:       X Server 1.20.3                                                                                                           
    Security:             itlb_multihit: KVM: Mitigation of VMX disabled                                                                            
                          + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable                                        
                          + mds: Mitigation of Clear buffers; SMT vulnerable                                                                        
                          + meltdown: Mitigation of PTI                                                                                             
                          + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable                                                            
                          + retbleed: Not affected                                                                                                  
                          + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp                                                     
                          + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization                                      
                          + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected 
                          + srbds: Not affected                                                                                                     
                          + tsx_async_abort: Not affected                                                                                           

alastair@HP-Z640-1:~> 

All the above looks fine and I shall start looking at suitable modules for testing but meanwhile I think I may have identified a problem with this machine.

The machine has been set up for dual boot with Leap 15.4 and Windows 10.

It is usually remote in a cupboard so I also set up wol capability.

The grub boot has been set to default to the Leap 15.4 system and, on occasions I need the machine to run Windows 10. If it is subsequently shut using the normal windows shut down, it cannot be started using wol. This is an energy saving artefact from Microsoft.

I therefore set up a dedicated shut down button which puts the machine in power state 3 and this is usually fine.

However if I re-start from windows, because I wish to resume working on Leap 15.4, a message “WER Fault” flashes up for a second as it shuts down before re-starting.

When this happens and the machine re-starts, my Grub system defaults to Leap 15.4 as expected but then fails to make a network connection. It seems my system is being messed up and continues in this state. The only way I have found to fix this is to remove power from (unplug) my system and nic for a couple of minutes. If I then reconnect the machine starts as intended and I then do have a good network connection.
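
One way to see what differs between a failed boot after Windows and a good boot after removing power is to compare the kernel messages for the NIC. This is a minimal sketch, assuming the I218-LM is handled by the e1000e driver and using the interface names eth0, eno1 and enp0s25 that appear later in this thread. The function name nic_kernel_lines is made up for illustration.

```shell
# Keep only kernel log lines that mention the wired NIC or its driver.
# Reads log text on stdin and writes the matching lines to stdout.
nic_kernel_lines() {
    grep -iE 'e1000e|eth0|eno1|enp0s25'
}

# Usage (needs permission to read the journal):
#   journalctl -k -b    | nic_kernel_lines   # current boot
#   journalctl -k -b -1 | nic_kernel_lines   # previous boot, if the journal is persistent
```

Saving the output from a bad boot and a good boot to two files and running diff on them should show whether the driver reports anything different, such as a reset or a change in link speed.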

Still checking options but it appears the problem results from the windows re-start.

From what command?

I’m surprised to see whatever command generated the above announced OpenGL differently from inxi:

> inxi -S --vs
inxi 3.3.24-00 (2022-12-10)
System:
  Kernel: 5.14.21-150400.24.33-default arch: x86_64 bits: 64
    Desktop: KDE v: 3.5.10 Distro: openSUSE Leap 15.4
> inxi -Cx
CPU:
  Info: dual core model: Intel Core i3-4150T bits: 64 type: MT MCP
    arch: Haswell rev: 3 cache: L1: 128 KiB L2: 512 KiB L3: 3 MiB
  Speed (MHz): avg: 2701 high: 2702 min/max: 800/3000 cores: 1: 2700 2: 2700
    3: 2702 4: 2702 bogomips: 24000
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
> inxi -Gaz --zl
Graphics:
  Device-1: Intel 4th Generation Core Processor Family Integrated Graphics
    vendor: Micro-Star MSI driver: i915 v: kernel arch: Gen-7.5
    process: Intel 22nm built: 2013 ports: active: HDMI-A-1
    empty: HDMI-A-2,VGA-1 bus-ID: 00:02.0 chip-ID: 8086:041e class-ID: 0300
  Display: x11 server: X.Org v: 1.20.3 driver: X: loaded: modesetting
    unloaded: fbdev,vesa alternate: intel dri: crocus gpu: i915 display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (15.98x10.00")
    s-diag: 479mm (18.85")
  Monitor-1: HDMI-A-1 mapped: HDMI-1 model: Samsung SMS24A850
    serial:  built: 2012 res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2
    size: 518x324mm (20.39x12.76") diag: 611mm (24.1") ratio: 16:10 modes:
    max: 1920x1200 min: 720x400
  API: OpenGL v: 4.6 Mesa 21.2.4 renderer: Mesa Intel HD Graphics 4400 (HSW
    GT2) compat-v: 3.1 direct render: Yes

Does inxi do the same for you? If it matches mine, good. If not, there’s a graphics config problem somewhere.

I never have trouble on account of shutting down from Win10, because that never happens. I always exit Windows by rebooting and either booting Linux, or powering down while the Grub menu is up.

@mrmazda the HP Z workstation CPUs don’t have integrated graphics… so no Intel GPU unless it’s an Intel Arc… all are discrete PCIe x16 v3.0 slots…

Same issue/question with NVidia:

# pinxi -Gaz --vs
pinxi 3.3.24-00 (2022-12-10)
Graphics:
  Device-1: NVIDIA GF119 [NVS 310] vendor: Hewlett-Packard driver: nouveau
    v: kernel non-free: series: 390.xx+ status: legacy-active (EOL~late 2022)
    arch: Fermi code: GF1xx process: 40/28nm built: 2010-16 pcie: gen: 1
    speed: 2.5 GT/s lanes: 16 ports: active: DP-1,DP-2 empty: none
    bus-ID: 01:00.0 chip-ID: 10de:107d class-ID: 0300 temp: 46.0 C
  Display: x11 server: X.Org v: 1.20.3 driver: X: loaded: modesetting
    dri: nouveau gpu: nouveau display-ID: :0 screens: 1
  Screen-1: 0 s-res: 4480x1440 s-dpi: 120 s-size: 948x304mm (37.32x11.97")
    s-diag: 996mm (39.19")
  Monitor-1: DP-1 pos: primary,left model: Acer K272HUL serial: 
    built: 2018 res: 2560x1440 hz: 60 dpi: 109 gamma: 1.2
    size: 598x336mm (23.54x13.23") diag: 686mm (27") ratio: 16:9 modes:
    max: 2560x1440 min: 720x400
  Monitor-2: DP-2 pos: right model: NEC EA243WM serial:  built: 2011
    res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2 size: 519x324mm (20.43x12.76")
    diag: 612mm (24.1") ratio: 16:10 modes: max: 1920x1200 min: 640x480
  API: OpenGL v: 4.3 Mesa 21.2.4 renderer: NVD9 direct-render: Yes

Ok, so some thoughts based on what I’m seeing here.

Cleaning your machine isn’t going to accomplish anything relative to the onboard NIC and (if I am understanding correctly, but you really didn’t clearly explain) onboard WiFi. There may be an issue with the controller or controllers. How are you running each OS? Is this a dual-boot, or is one living inside of a virtual machine being hosted by the other? If you’ve dual booted your system, then to me that suggests a hardware problem if two discrete OS installations are having the same problem.

You mentioned a USB dongle WiFi (if I understood what you wrote correctly) but you didn’t really explain how that is involved in the situation. With all the respect in the world, you’re making the rest of us have to interpret and assume a lot of things, which of course impacts our ability to help you.

If a separate device running in parallel with this other computer experiences NO issues at all, and multiple OSs on the device to which you are referring experience the same problem, my immediate assumption is you have a hardware issue (failing motherboard? bad RAM? multiple failing controllers? failing chipset?).

It would be really helpful to us if you would clearly state:

  1. What kind of computer is this, exactly?
  2. How are the OSs that you’re using on it actually installed?
  3. Are you using a built-in Wireless NIC on that computer in addition to a USB WiFi dongle?
  4. Are you having network throughput problems across multiple different bus types?

Before tearing down your machine, I would download (on a different computer which works) the MemTest86 disk image and write it to a flash drive, then boot the affected system with it and let it check your RAM.

Also, just for laughs, maybe consider grabbing a different distro which has a live image ISO (Linux Mint, for example) and write that to a flash drive and run from that instead of your present system drive. It’s conceivable you have a failing system drive which isn’t failing in a way that’s setting off any alarm bells. Run that for an extended period of time and see if you still have network performance issues.

AFAIK the Z640 is the same as the Z440, just with the additional riser card for the second CPU and probably the 1 kW power supply…

I have the same NIC (and others…)

MOTHERBOARD:            HP 212B v1.01
    BIOS Version:         M60 v02.59                                    
    Chipset:              Intel Xeon E7 v3/Xeon                         
    Audio:                Realtek ALC221                                
    Network:              Intel I218-LM + 4 x Realtek RTL8111/8168/8411 

I run rasdaemon here to keep an eye on the system:

ras-mc-ctl --summary

No Memory errors.

No PCIe AER errors.

No ARM processor errors.

No Extlog errors.

No devlink errors.
No disk errors.
No Memory failure errors.

No MCE errors.

No receding hairline errors

No excessive amounts of money in the bank account errors

Whoops! :rofl:

I have been trying to set up ras and get the following result:-

alastair@HP-Z640-1:~> sudo ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1169.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1170.
alastair@HP-Z640-1:~> 

I have clearly missed something in trying to set this up. Please advise.
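
The "no such table: mc_event" message most likely means the SQLite database exists but the rasdaemon service has not yet run with recording enabled, so the tables were never created. This is a minimal sketch for checking that. The database path shown is the usual rasdaemon default and is an assumption here, as is the presence of the sqlite3 command-line tool. The helper name has_table is made up for illustration.

```shell
# Succeed if the table name given as $1 appears in a whitespace-separated
# table list on stdin (the format printed by sqlite3's .tables command).
has_table() {
    tr -s '[:space:]' '\n' | grep -qxF -- "$1"
}

# Usage (as root); the path is an assumed default, adjust if needed:
#   systemctl enable --now rasdaemon
#   sqlite3 /var/lib/rasdaemon/ras-mc_event.db .tables | has_table mc_event \
#       && echo "mc_event present" || echo "mc_event missing"
```

If the table is still missing after the service has started, systemctl status rasdaemon should show whether the daemon is actually running and recording.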

Meanwhile I am sure I have identified the problem and it has nothing to do with the Leap 15.4 system or my NIC. The problem is the shutdown state of Windows 10. There are many options in Windows 10 and my aim was to shut down using the Windows shutdown command in a script:

shutdown /s /f /t 3

I thought this shut down to sleep state 3 or sleep. If so it doesn’t work. I must shut down completely with the start button shutdown. This ensures that when I start the machine by hand it starts grub and boots whatever system is selected and works correctly with network connection working.

On reflection and as indicated by mrmazda a full shut down from windows should not be a problem. I will check the restart and clarify exactly what scenario causes the windows problem. Meanwhile I have moved the machine to where I can always start it with the hardware button.

@Budgie2 What version of rasdaemon? I updated the Tumbleweed version.

To do a full shutdown in Windows it’s shutdown /s /t 3, or via the menu system: once the shutdown options are displayed, hold the Shift key down, select Shut down, and keep holding Shift until it starts to power down.

Hi Malcolm, on this Z640 I have v0.6.7.18.git+7ccf12f-1504002.7 from my repo.

Regarding the windows 10 issue, the red button I configured, (I thought using advice or a link from you a while ago,) is set with an extra /f as above but afaik this is the default. I was mistaken regarding the shutdown state since the red button script is supposed to do a complete shutdown but whatever I do, I get the problem when rebooting having left windows.

A restart from windows definitely causes the problem.

Since I rarely use the Windows system on this machine I am reconciled to unplugging if necessary, now I know what to do to get my network connection back. The only other issue might be associated with PoE on the switch, but again afaik the switch only gets its own power over PoE and does not serve it to the client ports, so that should not make any difference unless there is an issue with the on-board NIC. If I can get rasdaemon to work I might learn more.

Meanwhile many thanks once more. I sometimes wonder why I have all the difficult problems but I do try to solve them myself for hours, with much reading to boot, before asking questions.

Hi,
I had been hoping for some further help with ras but meanwhile whatever the problem I have had with network connection it has become slightly worse.

Now if I boot into windows it works fine but even if I do a full shutdown using the shutdown /s /t 3 command or use the normal windows shutdown I cannot restart the machine, boot into Leap 15.4 and get a network connection.

Grub starts OK when the machine is powered on and if I then select Leap 15.4, which is set as the default on grub, the machine continues and I can log into my Leap 15.4 system but then there is no network connection.

To overcome this problem I must unplug the power cord after the windows shutdown and then restart with the hardware button which then continues as normal.

As stated above the condition has deteriorated over time from when I first started this thread but HP diagnostics have failed to find any issues at all. I hope ras will turn up more details if I can get it to run.

I revisit this thread in hope of getting an answer. To recap I have a Z640 in my room which is the one giving me the network connection problem described above.

My other HP Z640 has now been put into service in the server closet and is accessed remotely from my desk. It now boots to Windows 10 but I also have Leap 15.4 installed just in case.

The two machines are essentially the same except that my machine here has two processors installed whereas the remote machine only has one.

I have been working on a new printer installation and moving the office computers around and this has required me to reconfigure all the relevant computers to access the new and relocated printers. The remote machines, ie two IBM servers and the remote Z640 I have to boot using wol. I have never had any issues with the IBM servers but the Z640 will no longer boot from wol. It is not as bad as the problem reported above because if I start the machine with the button on the machine I still get a network connection but this is a new problem, in a way possibly related to the above. I really do need to investigate further.
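
One thing worth ruling out on the remote Z640 is whether Wake-on-LAN is still enabled on the NIC itself, since the setting can be changed by whichever OS ran last. This is a minimal sketch, assuming the interface is eth0. The function name wol_mode is made up for illustration.

```shell
# Extract the active Wake-on-LAN mode from `ethtool IFACE` output on stdin.
# "g" means wake on magic packet, "d" means disabled.
# The "Supports Wake-on:" line is skipped because its first field differs.
wol_mode() {
    awk '$1 == "Wake-on:" { print $2; exit }'
}

# Usage (as root):
#   ethtool eth0 | wol_mode     # show the current mode
#   ethtool -s eth0 wol g       # enable magic-packet wake (not persistent by itself)
```

If this prints g under Leap but wake-up only fails after the machine was last shut down from Windows, the Windows driver and power settings are the more likely place to look.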

What I noticed when booting the problem machine is that the screen shows (briefly, as it is scrolling) that the eth0 connection is UP, but when I eventually log in there is no internet connection and I must unplug the power connection from the computer and then start again. It does not work if I just unplug the LAN cable.

I am beginning to think there is a M$ plot to stop me booting into Linux!!!

I have spent some more time on this problem, which is that if I reboot from a running Windows session into Leap 15.4 I have no network connection unless I unplug the machine and then restart it in Leap 15.4.

When I ran ip neigh the first time, the entry showed INCOMPLETE. It seems my problem might have been caused by the two L2/3 switches that are between my workstation and the DNS and DHCP servers set up in the UTM, and the way in which ARP works.

I have my Leap 15.4 system on a static IP but had not set a static address on the Windows 10 system, so I thought this might have been confusing the ARP tables being used. I therefore set the Windows system to the same static IP that I have configured on Leap 15.4.

Sadly this has not solved my problem but I am showing the output of the various tests I have used to date in trying to diagnose the problem.

First the results from Leap 15.4 having just re-started from a windows which had dhcp when last booted:-

alastair@HP-Z640-1:~> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 70:5a:0f:3a:fa:27 brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp0s25

alastair@HP-Z640-1:~> ip neigh
192.168.169.129 dev eth0  INCOMPLETE
alastair@HP-Z640-1:~>

alastair@HP-Z640-1:~> ping 192.168.169.129
PING 192.168.169.129 (192.168.169.129) 56(84) bytes of data.
^C
--- 192.168.169.129 ping statistics ---
13 packets transmitted, 0 received, 100% packet loss, time 12268ms
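
For reference, INCOMPLETE in ip neigh means an ARP request was sent for that address but no reply has come back, while the UP and LOWER_UP flags above only say that the link has carrier. This is a minimal sketch for pulling the state of one neighbour out of the ip neigh output so it can be logged after each boot. The function name neigh_state is made up for illustration.

```shell
# Print the neighbour (ARP) state for one address from `ip neigh` output
# read on stdin. The state is the last field of the matching line.
neigh_state() {
    awk -v ip="$1" '$1 == ip { print $NF; exit }'
}

# Usage:
#   ip neigh | neigh_state 192.168.169.129
```

A state that stays INCOMPLETE or FAILED while the link is UP suggests frames are being lost in at least one direction between the NIC and the switch, rather than a problem with IP addressing.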

I then changed the Windows system to the same static IP and ran the same tests on Leap 15.4:-

alastair@HP-Z640-1:~> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 70:5a:0f:3a:fa:27 brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp0s25

alastair@HP-Z640-1:~> ip neigh
192.168.169.129 dev eth0 lladdr 7c:5a:1c:82:f1:43 REACHABLE
alastair@HP-Z640-1:~>


alastair@HP-Z640-1:~> ping 192.168.169.129
PING 192.168.169.129 (192.168.169.129) 56(84) bytes of data.
64 bytes from 192.168.169.129: icmp_seq=1 ttl=64 time=0.287 ms
64 bytes from 192.168.169.129: icmp_seq=2 ttl=64 time=0.274 ms
64 bytes from 192.168.169.129: icmp_seq=3 ttl=64 time=0.331 ms
^C
--- 192.168.169.129 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.274/0.297/0.331/0.024 ms
alastair@HP-Z640-1:~>

I then tried restarting Leap 15.4 again without having unplugged just in case a cache needed to be flushed:-

alastair@HP-Z640-1:~> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 70:5a:0f:3a:fa:27 brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp0s25
alastair@HP-Z640-1:~>

alastair@HP-Z640-1:~> ip neigh
192.168.169.129 dev eth0 lladdr 7c:5a:1c:82:f1:43 DELAY
alastair@HP-Z640-1:~>

alastair@HP-Z640-1:~> ping 192.168.169.129
PING 192.168.169.129 (192.168.169.129) 56(84) bytes of data.
From 192.168.169.137 icmp_seq=3 Destination Host Unreachable
From 192.168.169.137 icmp_seq=4 Destination Host Unreachable
From 192.168.169.137 icmp_seq=5 Destination Host Unreachable
From 192.168.169.137 icmp_seq=6 Destination Host Unreachable
From 192.168.169.137 icmp_seq=7 Destination Host Unreachable
From 192.168.169.137 icmp_seq=8 Destination Host Unreachable
^C
--- 192.168.169.129 ping statistics ---
9 packets transmitted, 0 received, +6 errors, 100% packet loss, time 8124ms
pipe 3
alastair@HP-Z640-1:~>

Still no internet connection. Finally I unplugged the system and then started Leap 15.4 to resume this thread.
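
A cheaper test than pulling the power cord is to unload and reload the NIC driver once the fault appears. This is only a sketch and may well not help: since removing mains power is what clears the fault, the NIC may be holding some state on standby power that a driver reload does not reset. It assumes the interface is eth0 and that the I218-LM is bound to the e1000e driver. The function name nic_driver and its optional second argument are made up for illustration.

```shell
# Print the kernel driver bound to an interface by resolving the sysfs
# device/driver symlink.
# Usage: nic_driver IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net (hypothetical override).
nic_driver() {
    iface=$1
    base=${2:-/sys/class/net}
    link="$base/$iface/device/driver"
    if [ ! -L "$link" ]; then
        echo "no driver link for $iface" >&2
        return 1
    fi
    basename "$(readlink "$link")"
}

# Untested idea for this fault (as root, from a local console):
#   drv=$(nic_driver eth0)                   # expected: e1000e
#   modprobe -r "$drv" && modprobe "$drv"    # unload and reload the driver
#   ping -c 3 192.168.169.129
```

If the ping works after the reload, the problem is in how the driver finds the hardware after Windows, and the kernel log from that moment would be worth posting.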