Leap 15 and a weird LAN crash dump

Hi,
I have openSUSE on an HP DL380 G9, and I've found some crashes in the messages log. The server only disconnects from the LAN (Cisco Nexus), while the second server (active/passive) works fine. eth4 and eth5 are bonded with LACP (over vPC on the Nexus switches too).
What caused it? Bad network cables? Ports on the Nexus? Or some bug in the kernel? The second server runs the same version and shows no errors.
Thanks and best regards
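Because eth4 and eth5 form an LACP bond, the bonding driver's per-slave status can help separate a physical link problem (cable, switch port) from a driver problem. A minimal sketch, assuming the bond is called bond0 (a hypothetical name; use the real one): it reads /proc/net/bonding/<bond> and prints each slave's MII status and link failure count. A count that keeps rising on one slave would point more toward cabling or the switch port, but that is a heuristic, not a diagnosis.

```shell
# Summarise per-slave state from /proc/net/bonding/<bond> output read on stdin.
# Prints one line per slave: "<interface> mii=<status> link_failures=<count>".
# The bond-level "MII Status" line (before any slave) is ignored.
bond_slave_summary() {
  awk -F': ' '
    $1 == "Slave Interface" { slave = $2 }
    $1 == "MII Status" && slave != "" { mii[slave] = $2 }
    $1 == "Link Failure Count" && slave != "" {
      print slave " mii=" mii[slave] " link_failures=" $2
    }'
}

# Usage (bond0 is an assumed name):
#   bond_slave_summary < /proc/net/bonding/bond0
```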

May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:923(eth4)]begin crash dump -----------------
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:933(eth4)]def_idx(0x1040) def_att_idx(0x8d6) attn_state(0x0) spq_prod_idx(0x58) next_stats_cnt(0x103a)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:938(eth4)]DSB: attn bits(0x0) ack(0x10) id(0x0) idx(0x8d6)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:939(eth4)] def (0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1555 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0) igu_sb_id(0x0) igu_seg_id(0x1) pf_id(0x0) vnic_id(0x0) vf_id(0xff) vf_valid (0x0) state(0x1)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp0: rx_bd_prod(0x66dc) rx_bd_cons(0x515) rx_comp_prod(0xb3cd) rx_comp_cons(0xb201) *rx_cons_sb(0xb201)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0x7fc0) last_max_sge(0x7bd8) fp_hc_idx(0x7c75)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp0: tx_pkt_prod(0x839) tx_pkt_cons(0x839) tx_bd_prod(0xda58) tx_bd_cons(0xda57) *tx_cons_sb(0x839)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel[1548]: Last message 'bnx2x: [bnx2x_panic_' repeated 1 times, suppressed by syslog-ng on mail1.fnhk.cz
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0x7c75 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0xb201 0x0 0x0 0x0 0x839 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x0) igu_sb_id (0x2) igu_seg_id(0x0) time_to_expire (0x62c2a02) timer_value(0xff)
May 14 11:43:21 mail1 kernel: SM[1] _flags (0x0) igu_sb_id (0x2) igu_seg_id(0x0) time_to_expire (0x633d661) timer_value(0xff)
May 14 11:43:21 mail1 kernel: INDEX[0] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[1] flags (0x2) timeout (0x6)
May 14 11:43:21 mail1 kernel: INDEX[2] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[3] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[4] flags (0x1) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[5] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[6] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[7] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp1: rx_bd_prod(0x88d2) rx_bd_cons(0x70b) rx_comp_prod(0xaab8) rx_comp_cons(0xa8ec) *rx_cons_sb(0xa8ec)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0xa180) last_max_sge(0x9d8a) fp_hc_idx(0xcadb)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp1: tx_pkt_prod(0x976b) tx_pkt_cons(0x976b) tx_bd_prod(0xfc1d) tx_bd_cons(0xfc1c) *tx_cons_sb(0x976b)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp1: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel[1548]: Last message 'bnx2x: [bnx2x_panic' repeated 1 times, suppressed by syslog-ng on mail1.fnhk.cz
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0xcadb 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0xa8ec 0x0 0x0 0x0 0x976b 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x0) igu_sb_id (0x3) igu_seg_id(0x0) time_to_expire (0x631e594) timer_value(0xff)
May 14 11:43:21 mail1 kernel: SM[1] __flags (0x0) igu_sb_id (0x3) igu_seg_id(0x0) time_to_expire (0x5df69c2) timer_value(0xff)
May 14 11:43:21 mail1 kernel: INDEX[0] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[1] flags (0x2) timeout (0x6)
May 14 11:43:21 mail1 kernel: INDEX[2] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[3] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[4] flags (0x1) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[5] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[6] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[7] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp2: rx_bd_prod(0xeac7) rx_bd_cons(0x900) rx_comp_prod(0x56ad) rx_comp_cons(0x54e1) *rx_cons_sb(0x54e1)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0x6000) last_max_sge(0x5c11) fp_hc_idx(0x32d2)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp2: tx_pkt_prod(0x69f9) tx_pkt_cons(0x69f9) tx_bd_prod(0x3c39) tx_bd_cons(0x3c38) *tx_cons_sb(0x69f9)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp2: tx_pkt_prod(0x2) tx_pkt_cons(0x2) tx_bd_prod(0x8) tx_bd_cons(0x7) *tx_cons_sb(0x2)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp2: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0x32d2 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0x54e1 0x0 0x0 0x0 0x69f9 0x2 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x0) igu_sb_id (0x4) igu_seg_id(0x0) time_to_expire (0x6312da2) timer_value(0xff)
May 14 11:43:21 mail1 kernel: SM[1] _flags (0x0) igu_sb_id (0x4) igu_seg_id(0x0) time_to_expire (0x499f9f0) timer_value(0xff)
May 14 11:43:21 mail1 kernel: INDEX[0] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[1] flags (0x2) timeout (0x6)
May 14 11:43:21 mail1 kernel: INDEX[2] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[3] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[4] flags (0x1) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[5] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[6] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[7] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp3: rx_bd_prod(0xcb8c) rx_bd_cons(0x9c5) rx_comp_prod(0xf723) rx_comp_cons(0xf557) *rx_cons_sb(0xf576)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0x7740) last_max_sge(0x735e) fp_hc_idx(0x99d5)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp3: tx_pkt_prod(0xe6d4) tx_pkt_cons(0xe6c7) tx_bd_prod(0xa2f4) tx_bd_cons(0xa266) *tx_cons_sb(0xe6d4)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp3: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel[1548]: Last message 'bnx2x: [bnx2x_panic' repeated 1 times, suppressed by syslog-ng on mail1.fnhk.cz
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0x99fd 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0xf576 0x0 0x0 0x0 0xe6d4 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x1) igu_sb_id (0x5) igu_seg_id(0x0) time_to_expire (0x633d7ff) timer_value(0x6)
May 14 11:43:21 mail1 kernel: SM[1] _flags (0x0) igu_sb_id (0x5) igu_seg_id(0x0) time_to_expire (0x5df68c7) timer_value(0xff)
May 14 11:43:21 mail1 kernel: INDEX[0] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[1] flags (0x2) timeout (0x6)
May 14 11:43:21 mail1 kernel: INDEX[2] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[3] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[4] flags (0x1) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[5] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[6] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[7] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp4: rx_bd_prod(0x1cf7) rx_bd_cons(0xb30) rx_comp_prod(0x54b2) rx_comp_cons(0x52e6) *rx_cons_sb(0x52e6)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0xa080) last_max_sge(0x9c98) fp_hc_idx(0xd7a4)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp4: tx_pkt_prod(0x63a) tx_pkt_cons(0x63a) tx_bd_prod(0xcc87) tx_bd_cons(0xcc86) *tx_cons_sb(0x63a)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp4: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel[1548]: Last message 'bnx2x: [bnx2x_panic' repeated 1 times, suppressed by syslog-ng on mail1.fnhk.cz
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0xd7a4 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0x52e6 0x0 0x0 0x0 0x63a 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x0) igu_sb_id (0x6) igu_seg_id(0x0) time_to_expire (0x61e9bda) timer_value(0xff)
May 14 11:43:21 mail1 kernel: SM[1] _flags (0x0) igu_sb_id (0x6) igu_seg_id(0x0) time_to_expire (0xe87b54ac) timer_value(0xff)
May 14 11:43:21 mail1 kernel: INDEX[0] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[1] flags (0x2) timeout (0x6)
May 14 11:43:21 mail1 kernel: INDEX[2] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[3] flags (0x0) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[4] flags (0x1) timeout (0x0)
May 14 11:43:21 mail1 kernel: INDEX[5] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[6] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: INDEX[7] flags (0x3) timeout (0xc)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:990(eth4)]fp5: rx_bd_prod(0xae0a) rx_bd_cons(0xc43) rx_comp_prod(0xeecf) rx_comp_cons(0xed03) *rx_cons_sb(0xed03)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:993(eth4)] rx_sge_prod(0xb600) last_max_sge(0xb204) fp_hc_idx(0x7a8)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp5: tx_pkt_prod(0x6276) tx_pkt_cons(0x6276) tx_bd_prod(0xe78c) tx_bd_cons(0xe78b) *tx_cons_sb(0x6276)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1010(eth4)]fp5: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
May 14 11:43:21 mail1 kernel[1548]: Last message 'bnx2x: [bnx2x_panic' repeated 1 times, suppressed by syslog-ng on mail1.fnhk.cz
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1021(eth4)] run indexes (0x7a8 0x0)
May 14 11:43:21 mail1 kernel: bnx2x: [bnx2x_panic_dump:1027(eth4)] indexes (0x0 0xed03 0x0 0x0 0x0 0x6276 0x0 0x0)pf_id(0x0) vf_id(0xff) vf_valid(0x0) vnic_id(0x0) same_igu_sb_1b(0x1) state(0x1)
May 14 11:43:21 mail1 kernel: SM[0] __flags (0x0) igu_sb_id (0x7) igu_seg_id(0x0) time_to_expire (0x62b612d) timer_value(0xff)
May 14 11:43:21 mail1 kernel: SM[1] __flags (0x0) igu_sb_id (0x7) igu_seg_id(0x0) time_to_expire (0x5bf8978) timer_value(0xff)
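When reading a bnx2x panic dump like the one above, one thing worth skimming is whether any TX ring still had work outstanding, i.e. whether tx_pkt_prod differs from tx_pkt_cons. In this dump only fp3 does (0xe6d4 vs 0xe6c7). That fits a stuck transmit queue, but it is an interpretation of the counters, not a confirmed root cause. A small sketch that pulls those pairs out of the log; the /var/log/messages path in the usage line is an assumption.

```shell
# Print every TX ring in a bnx2x panic dump (read on stdin) whose packet
# producer index differs from its consumer index, i.e. packets still pending.
# The kernel prints these as plain %x hex, so string comparison is enough.
pending_tx_rings() {
  sed -n 's/.*\(fp[0-9][0-9]*\): tx_pkt_prod(\(0x[0-9a-f]*\)) tx_pkt_cons(\(0x[0-9a-f]*\)).*/\1 \2 \3/p' |
  while read -r ring prod cons; do
    if [ "$prod" != "$cons" ]; then
      echo "$ring tx_pkt_prod=$prod tx_pkt_cons=$cons"
    fi
  done
}

# Usage (log path assumed):
#   grep 'bnx2x_panic_dump' /var/log/messages | pending_tx_rings
```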


Questions:

  • Which openSUSE version?
  • Are you using the kernel's default BNX driver or the HP-supplied one?
  • Do you have the latest HP/BNX firmware installed? And the latest HP firmware in general?
  • Is the firmware identical on both servers?
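Several of these questions can be answered directly on each server. A minimal sketch, assuming ethtool is installed: `ethtool -i <iface>` reports the driver, driver version and firmware-version of an interface, and the helper below keeps just those fields so the output from both servers can be compared. The `ssh mail2` usage line assumes root SSH access between the hosts.

```shell
# Keep only the driver, version and firmware-version fields of `ethtool -i`
# output (read on stdin), printed as key=value lines.
nic_versions() {
  awk -F': ' '$1 == "driver" || $1 == "version" || $1 == "firmware-version" { print $1 "=" $2 }'
}

# Usage: compare eth4 on both servers (SSH access to mail2 is assumed).
#   ethtool -i eth4 | nic_versions > /tmp/mail1.txt
#   ssh mail2 ethtool -i eth4 | nic_versions > /tmp/mail2.txt
#   diff /tmp/mail1.txt /tmp/mail2.txt && echo "NIC driver/firmware identical"
# System BIOS version, for the same comparison:
#   dmidecode -s bios-version
```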

Hi,
thanks for the answer.
I use Leap 15 with no HP drivers, just the default kernel. The version is the same on both servers (they were updated at the same time).

mail1:~ # rpm -qa | grep firmware
kernel-firmware-20180525-lp150.2.3.1.noarch

kernel version:
Linux mail1 4.12.14-lp150.12.13-default #1 SMP Wed Aug 8 19:31:27 UTC 2018 (942604c) x86_64 x86_64 x86_64 GNU/Linux

Anyway, I'm not sure whether the servers' firmware is the same. The servers were updated about two years ago.

Thanks
JK

Hi there,
I found some errors (different from those described above) on the second server:
May 14 11:51:24 mail2 kernel: bnx2x: [bnx2x_acquire_hw_lock:2030(eth4)]Timeout
May 14 11:51:24 mail2 kernel: bnx2x: [bnx2x_release_hw_lock:2064(eth4)]lock_status 0x0 resource_bit 0x1. Unlock was called but lock wasn’t taken!

Could it be caused by the "older" kernel firmware? And is this the correct kernel firmware for the network cards?
zypper lu:
v | openSUSE-Leap-15.0-Oss    | kernel-default  | 4.12.14-lp150.12.13.1 | 4.12.14-lp150.11.4    | x86_64
v | openSUSE-Leap-15.0-Update | kernel-default  | 4.12.14-lp150.12.13.1 | 4.12.14-lp150.12.58.1 | x86_64
v | openSUSE-Leap-15.0-Update | kernel-firmware | 20180525-lp150.2.3.1  | 20190312-lp150.2.16.1 | noarch
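On the question of whether kernel-firmware is the right package: the in-kernel bnx2x driver loads firmware files from /lib/firmware at runtime, and `modinfo -F firmware bnx2x` lists the file names the module declares. As far as I know, those files are separate from the firmware flashed on the adapter itself, which is what HP's packages update. A minimal sketch, assuming the default /lib/firmware location, that checks each declared file is present:

```shell
# Read firmware file names (one per line) on stdin and report whether each
# exists under the firmware directory given as $1 (default /lib/firmware).
check_fw_files() {
  fwdir=${1:-/lib/firmware}
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    if [ -e "$fwdir/$f" ]; then
      echo "present: $f"
    else
      echo "MISSING: $f"
    fi
  done
}

# Usage:
#   modinfo -F firmware bnx2x | check_fw_files
#   rpm -qf /lib/firmware/bnx2x/*   # which package ships these files
```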

Thanks and best regards
J.Karliak

Yes, that’s why I was asking if those had the latest firmware.

That particular bug was caused, at least in the past, by buggy firmware in the LAN chipset itself. I would urge you to first update the DL380 G9 BIOS/system firmware to the latest version, and then update the LAN adapter firmware; both are available from HP's support website. We had similar issues with the Broadcom chips under heavy load.

You can find the LAN firmware here:

https://support.hpe.com/hpesc/public/km/product/1009087943/Product#t=DriversandSoftware&sort=%40hpescuniversaldate%20descending&layout=table&f:@kmswsoftwaretypekey=[swt8000029]

and BIOS / System OS firmware here:

https://support.hpe.com/hpesc/public/km/product/1009087943/Product#t=DriversandSoftware&sort=%40hpescuniversaldate%20descending&layout=table&f:@kmswsoftwaretypekey=[swt8000194]

Both are offered as .rpm and/or .scexe packages, which you can use to update the firmware from Linux as root or with sudo.

Ok,
thank you very much for the advice.
Best regards
J.Karliak

Good afternoon,
which driver version can I use for openSUSE Leap 15?
For the RPMs tg3-kmp-default-3.137y_k4.12.14_23-2.sles15.x86_64.rpm and tg3-kmp-default-3.137y_k4.4.73_5-2.sles12sp3.x86_64.rpm there are failed dependencies:
warning: /usr/local/src/tg3-kmp-default-3.137y_k4.4.73_5-2.sles12sp3.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 26c2b797: NOKEY
error: Failed dependencies:
ksym(default:__dev_kfree_skb_any) = 26ffcb6c is needed by tg3-kmp-default-3.137y_k4.4.73_5-2.sles12sp3.x86_64
ksym(default:__free_page_frag) = 6fbf07cd is needed by tg3-kmp-default-3.137y_k4.4.73_5-2.sles12sp3.x86_64
ksym(default:__napi_schedule) = 2ba94939 is needed by tg3-kmp-default-3.137y_k4.4.73_5-2.sles12sp3.x86_64

warning: /usr/local/src/tg3-kmp-default-3.137y_k4.12.14_23-2.sles15.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 26c2b797: NOKEY
error: Failed dependencies:
ksym(default:__cpu_online_mask) = 31cd8869 is needed by tg3-kmp-default-3.137y_k4.12.14_23-2.sles15.x86_64
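Two observations on this output, offered as hints rather than a definitive answer. First, as far as I know, SUSE KMP file names encode the kernel they were built against after `_k` (here 4.4.73-5 for the SLES 12 SP3 package and 4.12.14-23 for the SLES 15 one), and the `ksym(...) = <checksum>` requirements are symbol checksums from that kernel, so a running kernel with different checksums (here 4.12.14-lp150.12.13) yields failed dependencies like these. Second, the crash dumps above come from the bnx2x driver, whereas these are tg3 driver packages, so it is worth confirming which driver eth4 actually uses. A small sketch for both checks; the helper name `kmp_built_for` is hypothetical.

```shell
# Print the kernel release a SUSE KMP package file name was built against.
# Assumes the "<version>_k<kernel release with '-' written as '_'>-<release>"
# naming, e.g. tg3-kmp-default-3.137y_k4.12.14_23-2.sles15.x86_64.rpm -> 4.12.14-23
kmp_built_for() {
  sed -n 's/.*_k\([0-9][0-9.]*_[0-9][0-9]*\)-.*/\1/p' | tr '_' '-'
}

# Usage:
#   basename /usr/local/src/tg3-kmp-default-3.137y_k4.12.14_23-2.sles15.x86_64.rpm | kmp_built_for
#   uname -r                            # running kernel, for comparison
#   ethtool -i eth4 | grep '^driver:'   # which driver eth4 really uses
```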

Thanks and best regards
J.Karliak