[13.1] system hang after USB reset during fsck of external ext4 drive


uname -a
Linux dhcppc0 3.11.10-21-desktop #1 SMP PREEMPT Mon Jul 21 15:28:46 UTC 2014 (9a9565d) x86_64 x86_64 x86_64 GNU/Linux

I have a WDC_WD1600JD_22HBB0 250G external SATA laptop drive connected to USB Bus 001 Device 005 via a JMicron (152d:2337) ATA/ATAPI bridge adapter.
I have formatted this as ext4 (/dev/sdc1) using gparted with label ‘old tosh’ and there are approx 32GB of files on the volume.

I want to profile and store the bad blocks on this drive, so I do a


#/sbin/fsck.ext4 -c -c -C 0 /dev/sdc1

and prepare for a 20-30 hour wait…

The badblock check proceeds and it is at 10% after about 2h30m.

I leave the machine alone for a few more hours (2-3) and I return to find that the machine is unresponsive - the mouse cursor
does not respond to mouse movements, although the (usb) mouse is still powered. The caps lock key does not toggle
the caps lock led, although the (usb) keyboard is still powered.

On the screen I read the terminal output and it says

done
old tosh: updating bad block inode

but no signs of it returning to the # root prompt.

The operation seems to have completed far too early.

I look in /var/log/messages and see this sequence (timestamps omitted, but time consistent with failure time)

My USB drive is sd 7:0:0:0: at the start of the fsck operation


sd 7:0:0:0: [sdc] Unhandled error code
sd 7:0:0:0: [sdc]
Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
sd 7:0:0:0: [sdc] CDB:
Write(10): 2a 00 06 f1 40 00 00 00 f0 00
end_request: I/O error, dev sdc, sector 116473856
usb 1-2: reset high-speed USB device number 6 using ehci-pci
usb 1-2: reset high-speed USB device number 6 using ehci-pci
usb 1-2: reset high-speed USB device number 6 using ehci-pci
usb 1-2: reset high-speed USB device number 6 using ehci-pci
usb 1-2: reset high-speed USB device number 6 using ehci-pci
sd 7:0:0:0: Device offlined - not ready after error recovery
sd 7:0:0:0: [sdc] Unhandled error code
sd 7:0:0:0: [sdc]
Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
sd 7:0:0:0: [sdc] CDB:
Write(10): 2a 00 07 01 bc f0 00 00 f0 00
end_request: I/O error, dev sdc, sector 117554416
sd 7:0:0:0: rejecting I/O to offline device
sd 7:0:0:0: [sdc] killing request
sd 7:0:0:0: [sdc] Unhandled error code
sd 7:0:0:0: [sdc]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 7:0:0:0: [sdc] CDB:
Write(10): 2a 00 07 01 bd e0 00 00 20 00
end_request: I/O error, dev sdc, sector 117554656
sd 7:0:0:0: rejecting I/O to offline device
sd 7:0:0:0: rejecting I/O to offline device
sd 7:0:0:0: rejecting I/O to offline device
...
sd 7:0:0:0: rejecting I/O to offline device

the log floods with this last message, repeated every few microseconds or so, e.g.

2014-11-14T05:13:27.998640+00:00 dhcppc0 kernel: [21315.550191] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:27.998643+00:00 dhcppc0 kernel: [21315.550215] sd 7:0:0:0: rejecting I/O to offline device

then the device re-attaches at timestamp 21341.283023 and this is the complete log up to the lock-up and re-boot


2014-11-14T05:13:30.862797+00:00 dhcppc0 kernel: [21318.415214] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862799+00:00 dhcppc0 kernel: [21318.415238] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862801+00:00 dhcppc0 kernel: [21318.415262] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862803+00:00 dhcppc0 kernel: [21318.415285] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862805+00:00 dhcppc0 kernel: [21318.415309] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862808+00:00 dhcppc0 kernel: [21318.415333] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862810+00:00 dhcppc0 kernel: [21318.415357] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862812+00:00 dhcppc0 kernel: [21318.415378] usb 1-2: USB disconnect, device number 6
2014-11-14T05:13:30.862814+00:00 dhcppc0 kernel: [21318.415381] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862817+00:00 dhcppc0 kernel: [21318.415404] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862819+00:00 dhcppc0 kernel: [21318.415427] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.862821+00:00 dhcppc0 kernel: [21318.415451] sd 7:0:0:0: rejecting I/O to offline device
2014-11-14T05:13:30.978177+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x6cfbd0
2014-11-14T05:13:31.023019+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x730030
2014-11-14T05:13:31.023667+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x6cfbd0
2014-11-14T05:13:53.730763+00:00 dhcppc0 kernel: [21341.283023] usb 1-2: new high-speed USB device number 7 using ehci-pci
2014-11-14T05:13:53.844763+00:00 dhcppc0 kernel: [21341.397974] usb 1-2: New USB device found, idVendor=152d, idProduct=2337
2014-11-14T05:13:53.844788+00:00 dhcppc0 kernel: [21341.397980] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=5
2014-11-14T05:13:53.844791+00:00 dhcppc0 kernel: [21341.397983] usb 1-2: Product: USB to ATA/ATAPI Bridge
2014-11-14T05:13:53.844793+00:00 dhcppc0 kernel: [21341.397986] usb 1-2: Manufacturer: JMicron
2014-11-14T05:13:53.844795+00:00 dhcppc0 kernel: [21341.397988] usb 1-2: SerialNumber: D50AA92969FF
2014-11-14T05:13:53.847762+00:00 dhcppc0 kernel: [21341.400075] usb-storage 1-2:1.0: USB Mass Storage device detected
2014-11-14T05:13:53.847788+00:00 dhcppc0 kernel: [21341.400258] scsi8 : usb-storage 1-2:1.0
2014-11-14T05:13:53.850404+00:00 dhcppc0 mtp-probe: checking bus 1, device 7: "/sys/devices/pci0000:00/0000:00:13.5/usb1/1-2"
2014-11-14T05:13:53.852007+00:00 dhcppc0 mtp-probe: bus: 1, device: 7 was not an MTP device
2014-11-14T05:13:54.920329+00:00 dhcppc0 kernel: [21342.473154] scsi 8:0:0:0: Direct-Access     WDC WD25 00BEVT-22A23T0        PQ: 0 ANSI: 2 CCS
2014-11-14T05:13:54.920353+00:00 dhcppc0 kernel: [21342.473440] sd 8:0:0:0: Attached scsi generic sg3 type 0
2014-11-14T05:13:54.929761+00:00 dhcppc0 kernel: [21342.482625] sd 8:0:0:0: [sdd] 488397168 512-byte logical blocks: (250 GB/232 GiB)
2014-11-14T05:13:54.930785+00:00 dhcppc0 kernel: [21342.483632] sd 8:0:0:0: [sdd] Write Protect is off
2014-11-14T05:13:54.930798+00:00 dhcppc0 kernel: [21342.483637] sd 8:0:0:0: [sdd] Mode Sense: 28 00 00 00
2014-11-14T05:13:54.931775+00:00 dhcppc0 kernel: [21342.484626] sd 8:0:0:0: [sdd] No Caching mode page found
2014-11-14T05:13:54.931795+00:00 dhcppc0 kernel: [21342.484630] sd 8:0:0:0: [sdd] Assuming drive cache: write through
2014-11-14T05:13:54.934821+00:00 dhcppc0 kernel: [21342.487997] sd 8:0:0:0: [sdd] No Caching mode page found
2014-11-14T05:13:54.934843+00:00 dhcppc0 kernel: [21342.488020] sd 8:0:0:0: [sdd] Assuming drive cache: write through
2014-11-14T05:13:54.947107+00:00 dhcppc0 kernel: [21342.500292]  sdd: sdd1
2014-11-14T05:13:54.956768+00:00 dhcppc0 kernel: [21342.509156] sd 8:0:0:0: [sdd] No Caching mode page found
2014-11-14T05:13:54.956792+00:00 dhcppc0 kernel: [21342.509164] sd 8:0:0:0: [sdd] Assuming drive cache: write through
2014-11-14T05:13:54.956795+00:00 dhcppc0 kernel: [21342.509170] sd 8:0:0:0: [sdd] Attached SCSI disk
2014-11-14T05:13:55.231909+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x7300b0
2014-11-14T05:13:55.439810+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x7300b0
2014-11-14T05:13:55.448218+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x6cf950
2014-11-14T05:13:55.716810+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: in handle_volume_mount
2014-11-14T05:13:55.815765+00:00 dhcppc0 kernel: [21343.368045] EXT4-fs (sdd1): warning: mounting fs with errors, running e2fsck is recommended
2014-11-14T05:13:55.830269+00:00 dhcppc0 kernel: [21343.377245] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: (null)
2014-11-14T05:13:55.838491+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x6cf950
2014-11-14T05:13:55.846963+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: libbluray/bdnav/index_parse.c:162: indx_parse(): error opening /run/media/user/old tosh/BDMV/index.bdmv
2014-11-14T05:13:55.855869+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: libbluray/bdnav/index_parse.c:162: indx_parse(): error opening /run/media/user/old tosh/BDMV/BACKUP/index.bdmv
2014-11-14T05:13:55.857520+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: emit_signal: 0x71d480
2014-11-14T05:08:57.221125+00:00 dhcppc0 udisksd[3375]: message repeated 28 times:  Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD1600JD_22HBB0_WD_WMAL91398348: Error updating SMART data: sk_disk_smart_status: Input/output error (udisks-error-quark, 0)]
2014-11-14T05:13:55.864764+00:00 dhcppc0 udisksd[3375]: Mounted /dev/sdd1 at /run/media/user/old tosh on behalf of uid 1000
2014-11-14T05:13:55.873000+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug: in volume_mount_cb
2014-11-14T05:13:55.873759+00:00 dhcppc0 org.gtk.Private.UDisks2VolumeMonitor[2489]: ### debug:  success
2014-11-14T05:15:01.626777+00:00 dhcppc0 systemd[1]: Starting Session 25 of user root.
2014-11-14T05:15:01.664319+00:00 dhcppc0 systemd[1]: Started Session 25 of user root.

the drive re-attaches as /dev/sdd, not /dev/sdc

This is the second time this has happened. After the first occurrence, I made the following
changes

-change APM to prevent spin-down

#/sbin/hdparm -B128 /dev/sdc

-another spin-down/standby change

#/sbin/hdparm -S0 /dev/sdc

-added

options usbcore autosuspend=-1

to /etc/modprobe.d/99-local.conf

I installed sdparm and did


#/sbin/sdparm -a /dev/sdc

but no STANDBY parameter is listed.

So - my questions

Why does the USB drive get reset by ehci-pci ?
Why does the USB drive go offline ?
Why does the fsck indicate premature completion ?
Why does Linux lock solid and require a power-off re-boot when this happens ?

I noticed a similar problem. On error a USB disk is reconnected as a different device. It used to be reconnected as the same device, but forced to a read-only state.

I originally reported this as:

Bug 891168 Device names are changing dynamically, making the system disappear.

In my case, this was due to a cabling problem, so I closed the bug when I found that.

more on the USB drive and what I am trying to do.

The drive is in SMART pre-fail on 1 parameter, but I don’t care.

Please don’t say ‘back up your data and replace’ - I can’t afford a new drive and any lost files can easily be replaced.

I want to leverage the bad block mapping feature of ext4 to extend the usable life. I previously had it formatted as NTFS and used it for > 6 months by managing the bad blocks myself with 1G guard files around the bad spots.

the ext4 error behaviour listed by

#tune2fs -l /dev/sdc1

is ‘continue’, but something is stepping in to remount the device or the partition as ro in

cat /proc/mounts

WHY ? This is a bug, surely ? The filesystem default is being deliberately ignored.

once this has happened, a

#mount -o remount,rw /var/run/media/user/old\ tosh

fails with a ‘Read-only filesystem’ error. I have to unmount and re-mount to fix this.

I have issued

#/sbin/hdparm -r0 /dev/sdc

to clear any read-only bit for the device, but this does nothing to permit the mount operation to succeed

When drives start to fail they will continue to lose sectors. So even if you manage to map the current set of bad you will be faced with a new set ect etc.

Sorry to say that drive is in its death throws and I doubt you can manually keep a head of the problems :’(

has been stable for > 6 months as a data drive with bad blocks, as I said in previous post.