ehci_hcd fatal error with USB 2.0 hard drives

djorlando24 · May 30, 2011, 10:53am

I’m getting ehci_hcd failures at random times when copying files between USB 2.0 hard disks on a fresh installation of openSUSE 11.4 on my new i7 machine.
The kernel version is 2.6.37-0.5-desktop, x86_64. The error is very reproducible, but it happens at random times and is not tied to any particular disk sector or file.

It’s becoming very annoying because it corrupts files on my drives if they are open when the error occurs. >:(

I cannot find any previous threads on the board to do with this kind of ehci_hcd failure, so I’m making a new thread.

I have two USB 2.0 hard drives plugged in, and I’m trying to copy files from one to the other, either in a terminal or with Nautilus. When the failure happens, every USB device on the root hub those drives are attached to suddenly disappears in the middle of the file transfer. That includes the keyboard and mouse, so I lose every USB device on that root hub at once.

If the keyboard is on a separate root hub, I keep control of the PC. In that case, lsusb shows that the root hub the drives were connected to has vanished, not just the drives themselves. The whole root hub is gone.

I have confirmed that it is NOT a problem with the drives themselves, because
(a) I have tried many different drives, of different brands, and they all do it,
(b) all my drives work flawlessly under Windows 7 and
(c) this never happened under my old openSUSE 11.2 install on my old x86_64 Core 2 Duo PC with the same hard drives.

They are all NTFS-formatted and I use NTFS-3g to write to them. I don’t have a choice about NTFS, since the drives carry data from other Windows PCs. I’d much prefer ext3, but what can you do?

As far as I can tell, the root cause is that the ehci_hcd module is failing at a low level; it doesn’t look like the NTFS-3g driver is to blame. I have no idea how to fix this. I don’t think it’s my USB 2.0 drives going into sleep mode, because it happens in the middle of a file copy!

I am very keen to get this fixed, because it’s driving me mad!
Please let me know if you have a clue about this or can redirect me to developers who can…

Dan

=========================
/var/log/messages output follows…

/var/log/messages shows these kinds of errors in the lead-up to a failure. Nothing suspicious appears until suddenly we see this:

kernel: [505.042722] hub 1-1:1.0: cannot reset port 3 (err = -110)
… same message repeated 4 times …
kernel: [509.075401] hub 1-1:1.0: Cannot enable port 3. Maybe the USB cable is bad?
kernel: [509.083593] hub 1-1:1.0: cannot disable port 3 (err = -110)
kernel: [511.091780] hub 1-1:1.0: cannot reset port 3 (err = -110)
… same message repeated 4 times …
…

The above errors repeat about five times until the ntfs-3g driver notices a problem:

ntfs-3g[3640]: ntfs_attr_pread error reading '/MyFolder/myfile' at offset 194101248: 131072 <> 122880: Input/output error
ntfs-3g[3640]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
… repeats a few times …
…
Then, about a second later, all the OTHER USB ports on the same root hub suddenly start to fail:
…
kernel: [529.238984] hub 1-1:1.0: cannot reset port 5 (err = -110)
kernel: [529.238998] hub 1-1:1.0: cannot disable port 3 (err = -110)
…
And then we get new errors indicating the failure of /dev/sdc1:
…
kernel: [529.239041] sd 11:0:0:0: Device offlined - not ready after error recovery
kernel: [529.239045] sd 11:0:0:0: [sdc] Unhandled error code
kernel: [529.239046] sd 11:0:0:0: [sdc] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
kernel: [529.239048] sd 11:0:0:0: [sdc] CDB: Read(10): 28 00 19 eb 9f c7 00 00 10 00
kernel: [529.239052] sd 11:0:0:0: [sdc] end_request: I/O error, dev sdc, sector 434872263
kernel: [529.239054] quiet_error: 48 callbacks suppressed
kernel: [529.239055] Buffer I/O error on device sdc1, local block 54359025
kernel: [529.239059] Buffer I/O error on device sdc1, local block 54359026
kernel: [529.239068] sd 11:0:0:0: rejecting I/O to offline device
…
… same error repeats 3 times …
kernel: [529.239094] sd 11:0:0:0: [sdc] Unhandled error code
…

We then see continuous repeats of all the above error messages as the system keeps trying, unsuccessfully, to write to /dev/sdc1.

Then we start seeing errors like:
…
kernel: [534.280843] hub 1-1:1.0: hub_port_status failed (err = -110)
…

and more errors about “Maybe the USB cable is bad?”
These errors start popping up for EVERY USB device on the same root hub.

After many hundreds of these errors, ehci_hcd finally fails completely:

kernel: [572.685841] usb 1-1.3: USB disconnect, address 7
kernel: [573.066911] usb 1-1.1: reset high speed USB device using ehci_hcd and address 3
kernel: [573.077009] ehci_hcd 0000:00:1a.0: fatal error
kernel: [573.077010] ehci_hcd 0000:00:1a.0: force halt; handshake ffffc9000187e824 000040000 000040000 -> -110
kernel: [573.077013] ehci_hcd 0000:00:1a.0: HC died; cleaning up
kernel: [573.077015] usb 1-1.1: device descriptor read/8, error -19
…

NTFS-3g then unmounts ALL the drives on that root hub, and everything falls in a heap as every I/O operation and every device on that hub fails in turn.
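For anyone reading these logs: the negative numbers are kernel error codes, i.e. negated Linux errno values, so `err = -110` is ETIMEDOUT and `error -19` is ENODEV. A small Python sketch that decodes them, assuming it runs on a Linux host (errno numbering differs between operating systems):

```python
import errno
import os

# Negative error codes copied from the kernel log excerpts in this post.
LOG_ERRORS = [-110, -19]

def describe(err: int) -> str:
    """Return 'ERR: NAME (message)' for a negative kernel error code.

    Assumes Linux errno numbering; the same numbers can mean
    different things on other operating systems.
    """
    code = -err  # the kernel reports failures as negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{err}: {name} ({os.strerror(code)})"

if __name__ == "__main__":
    for err in LOG_ERRORS:
        print(describe(err))
```

On Linux this reports -110 as a timeout and -19 as "no such device", which is consistent with the hub port first timing out and the device then disappearing from the bus.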

jdmcdaniel3 · May 30, 2011, 7:30pm

The USB 2.0 driver does work in openSUSE, so one has to wonder whether there is some sort of hardware issue. The log messages even suggest checking your cables, and in one case another user here had a faulty wireless keyboard that drove all of his hard drives crazy. I would mix things up: swap cables around, and perhaps replace the batteries in the old wireless keyboard and mouse. Keep in mind that hardware often works just fine until it stops working, and the failure is often unrelated to whatever the software happens to be doing at the time.

Thank You,

djorlando24 · May 31, 2011, 6:12am

I agree, a hardware issue is the most likely cause. The question is: why doesn’t this happen when I use the drives on other OSes (Mac, Windows)? Surely a hardware fault would show up on those systems as well, perhaps in a different way. I’m just wondering whether ehci_hcd’s error handling is not very robust.

I’ve moved my keyboard and mouse to a separate USB root hub to isolate that problem. They’re not wireless, so that’s not an issue.

I will have to try different drive and cable combinations to see if I can find a particular drive and/or cable that causes the error. At least then I can try replacing it.
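When swapping drives and cables around, it may help to count which hub ports actually report errors across several failures. A rough Python sketch, assuming the log lines follow the `hub 1-1:1.0: cannot reset port 3 (err = -110)` format quoted above and that you pass it a copy of /var/log/messages (reading the original usually needs root). The function name `tally_port_errors` is hypothetical, not part of any openSUSE tool:

```python
import re
import sys
from collections import Counter

# Matches hub port errors such as:
#   hub 1-1:1.0: cannot reset port 3 (err = -110)
#   hub 1-1:1.0: Cannot enable port 3. Maybe the USB cable is bad?
# The colon after the hub name is optional to tolerate hand-copied lines.
PORT_ERROR = re.compile(
    r"hub (?P<hub>\S+?):? cannot (?P<action>reset|enable|disable) port (?P<port>\d+)",
    re.IGNORECASE,
)

def tally_port_errors(lines):
    """Count hub port errors, keyed by (hub, port, action)."""
    counts = Counter()
    for line in lines:
        m = PORT_ERROR.search(line)
        if m:
            counts[(m["hub"], int(m["port"]), m["action"].lower())] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally_ports.py /path/to/copy-of-messages
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (hub, port, action), n in sorted(tally_port_errors(f).items()):
            print(f"{hub} port {port}: cannot {action} x{n}")
```

If the counts keep pointing at the same port no matter which drive is plugged into it, that suggests the port, cable or hub rather than a particular drive; if they follow one drive from port to port, that drive (or its cable) is the better suspect.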

Cheers