Raspberry Pi 3B+ swapper, then USB errors: "ChHltd set, but reason is unknown"

One problem with an apparent solution, and a second problem without one. The second isn’t critical, but both reports might help others running an environment like mine.

I’m running Leap 15, 4.12.14-lp150.12.45-default #1 SMP, on a Raspberry Pi-3B+. I boot off a µSD or off an SSD drive connected via USB3.

I haven’t been working with this system since about September. When I came back to it and did an update (“zypper -vvv -t patch --no-recommends”), it became unstable (I don’t recall which kernel it updated from, but it may have been 12.28). The system would hang under any substantial load; the console error message reported a CPU timeout and mentioned “swapper”. I rebuilt the swap partition multiple times and even tried a swap file, but the problem persisted. I wasn’t consistent in looking at error logs, but today I checked “journalctl” and found the following entry:

Jan 19 16:32:32 Pi-7 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
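
For anyone who wants to pull these entries out of a long journal, here is a minimal Python sketch that extracts the CPU number, stall time, and task name from a line like the one above. The function name `parse_soft_lockup` is just something I made up for illustration. One thing worth keeping in mind: as far as I know, “swapper/0” is the name of the kernel’s idle task on CPU 0, so the message by itself may not point at swap space.

```python
import re

# Pattern for the soft-lockup report as it appears in the journal line above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]:]+):(?P<pid>\d+)\]"
)

def parse_soft_lockup(line):
    """Return (cpu, seconds, task, pid) for a soft-lockup line, else None."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return int(m.group("cpu")), int(m.group("secs")), m.group("task"), int(m.group("pid"))

# The entry quoted above, copied from the journal.
LINE = ("Jan 19 16:32:32 Pi-7 kernel: NMI watchdog: BUG: soft lockup - "
        "CPU#0 stuck for 22s! [swapper/0:0]")

hit = parse_soft_lockup(LINE)
if hit is not None:
    print("CPU %d stuck for %ds in task %s (pid %d)" % hit)
```

To scan a whole log, call `parse_soft_lockup` on each line of saved journal output and keep the results that are not `None`.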

The problem was worse if I was running in XFCE, but even if I booted to command line, a loaded system would hang if I did anything intensive (that is, likely to invoke swapping).

Until today.

Today I did a “zypper up” and the resulting install included “udev” and some other packages. I’ve let the system run for 6 hours, done some memory-intensive work, and checked “swapon -s”, which shows that it is, indeed, using swap space. So I think that problem was resolved with the latest update.
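
If you would rather check swap use from a script than by eye, the summary that “swapon -s” prints is the same table the kernel exposes in /proc/swaps, with sizes in KiB. Below is a rough Python sketch that parses text in that layout. The function name `swap_usage` and the sample device and numbers are hypothetical, not taken from my system.

```python
def swap_usage(proc_swaps_text):
    """Parse /proc/swaps-style text into a list of (device, size_kib, used_kib)."""
    rows = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:  # ignore blank or malformed lines
            continue
        name, size, used = fields[0], fields[2], fields[3]
        rows.append((name, int(size), int(used)))
    return rows

# Hypothetical sample in the layout of /proc/swaps; device and numbers are made up.
SAMPLE = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/dev/sda3                               partition\t2097148\t51200\t-2\n"
)

for name, size, used in swap_usage(SAMPLE):
    print(f"{name}: {used} of {size} KiB in use")
```

On a running system you could pass it the real table with `swap_usage(open("/proc/swaps").read())`; a non-zero “used” value means swap is actually being touched.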

If you’re seeing hangs like this with Leap 15 on RPi-3B+, that update might fix it for you.

BUT …

In looking at the “journalctl” logs, I see MANY entries that look like this:

Jan 29 17:08:34 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
Jan 29 17:09:09 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 3 - ChHltd set, but reason is unknown
Jan 29 17:09:09 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
Jan 29 17:09:15 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 2 - ChHltd set, but reason is unknown
Jan 29 17:09:15 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
Jan 29 17:09:18 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 6 - ChHltd set, but reason is unknown
Jan 29 17:09:18 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x06600029
Jan 29 17:09:18 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 4 - ChHltd set, but reason is unknown
Jan 29 17:09:18 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
Jan 29 17:09:50 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 1 - ChHltd set, but reason is unknown
Jan 29 17:09:50 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
Jan 29 17:09:54 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 2 - ChHltd set, but reason is unknown
Jan 29 17:09:54 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021
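
Since there are so many of these, a per-channel count is easier to read than the raw log. Here is a small Python sketch that tallies the “ChHltd set” messages by host channel. The helper name `chhltd_by_channel` is my own invention, and the sample lines are a few of the entries copied from the excerpt above.

```python
import re
from collections import Counter

# Matches the dwc2 halt message and captures the channel number.
CHHLTD = re.compile(
    r"dwc2 \S+: dwc2_hc_chhltd_intr_dma: Channel (\d+) - ChHltd set"
)

def chhltd_by_channel(lines):
    """Count 'ChHltd set' messages per host channel in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = CHHLTD.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# A few entries copied from the journal excerpt above.
SAMPLE = [
    "Jan 29 17:09:09 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 3 - ChHltd set, but reason is unknown",
    "Jan 29 17:09:09 Pi-6 kernel: dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600021",
    "Jan 29 17:09:15 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 2 - ChHltd set, but reason is unknown",
    "Jan 29 17:09:54 Pi-6 kernel: dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 2 - ChHltd set, but reason is unknown",
]

for channel, n in sorted(chhltd_by_channel(SAMPLE).items()):
    print(f"channel {channel}: {n} halt message(s)")
```

The same function works on a saved journal file, for example `chhltd_by_channel(open("journal.txt"))`, where `journal.txt` is a hypothetical dump of the kernel messages.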

Looking at other forums, I see that this issue was reported for Raspbian last year. It seems to be related to handling FIQ interrupts for the USB driver. A search pointed to

//dwh/usb_iip/dev/software/otg/linux/drivers/dwc_otg_hcd_intr.c

as its origin.

Has anyone else seen this? If so, have you found a fix? System seems stable, but clearly not running optimally.

Correction: system still hangs, but not as predictably.

Came back after dinner to find that I couldn’t log into Leap 15 on the console or via ssh from my Mac. Rebooted. “journalctl” showed long sequences of messages about “ChHltd set …” over the prior 2-3 hours, but no “swapper” entries. Did find a couple of entries like this, though:

Jan 29 18:44:47 Pi-6 kernel: usb 1-1.1: clear tt 3 (9053) error -71

Looks like there is some continuing problem with USB device support (booted this off SSD/USB3 on the Pi-3B+).

Hi :slight_smile:
Sounds like it’s 3B+ specific, I have zero entries for Leap 15 on my 3B running since July last year… just normal entries for dwc2/usb.

Not seeing any kernel crashes in the logs? Are you using wifi on the B+?

Hi, Malcolm!

Yes, I think it may be 3B+ specific. I moved Pi’s around among various cases when I got back to VT from MT in Oct; I think I may have been running the 3B in this case over the summer — no problems — and now it’s a 3B+ in the case. Good point: I’d forgotten that. I have a 3B I can repurpose to check it out with … I’ll post if I get a chance to try it.

No kernel crashes … just hangs with the swapper message, and no response from console or remote ssh. Surprised that journalctl reported any timeout/swapper message, since it’s completely unresponsive when it hangs.

I’m using both eth0 and wlan0 on the 3B+. Both work. And I’m seeing occasional journalctl messages reporting errors on each of them, at which time the network connections die. So there seem to be several issues going on at the same time: no idea if they’re related.

I’m updating TW on another microSD to see if it is more stable. (I’d be surprised, but worth a try.)

I ran Raspbian on that Pi-3B+ / SSD-USB3, to check it out, and it ran fine — for long periods. So I don’t think it’s Pi-3B+ hardware (which I at one point thought it might be).

Hope you’re staying warm. Polar vortex visiting us (again) tonight … only -5 or so this time. Good thing we’re OK with cold weather. :slight_smile:

This isn’t a critical issue: I’ve got lots of other things to work on, and I’m happy to just wait a while until it’s fixed in some subsequent update. Or maybe TW will be more stable.

But if you see any solutions, I’d be happy to try them.

David

Never mind … this turned out to be (I think) an intermittent hardware problem on one of my 3B+'s. That particular 3B+ ran Raspbian perfectly well for long periods of time. But running Leap, any number of odd things popped up to hang the system. Swapped things around and it seemed to go with the Leap disk, not the RPi-3B+.

But over the last couple of days, I tried to do fresh installs of various openSUSE distributions on that particular RPi-3B+, and they all failed in various ways. Finally did a fresh install of TW on a different RPi-3B+ and it worked just fine. Connected my old Leap 15 drive and booted on that second RPi-3B+: no problems. Ran it for a while: no problems doing the things that crashed the first RPi-3B+. Looking at journalctl, the dwc2 error reports no longer appear on this second RPi-3B+.

My guess is that the USB port on that first RPi-3B+ is picking up noise and causing corrupted data and extraneous interrupts. No explanation for why Raspbian was working but Leap wasn’t, but in my latest tests, Raspbian started hanging the same way.

I hate to throw away a $35 computer, but considering the number of days I’ve put into finding this – wish I’d done it a week ago.