USB HDD periodically switches off and on

kore_kontingency · July 27, 2010, 2:39pm

Hi everyone,

I’ve got an external USB HDD (Transcend 25M, 500 GB), and it has recently started to behave strangely.

In short: it seems to be switching off and on at irregular intervals.

There is only one ‘ext3’ partition on the drive, mounted via ‘fstab’ during system startup.

Symptoms:

sometimes it shuts down during file operations (i.e. it unmounts and disappears from /dev, then usually reappears after a couple of seconds);
recently it has started shutting down right after a file operation completes (not always). I experimented a bit with various files and can’t see any clear correlation between this and file types or sizes;
it always shuts down at the very beginning of ‘fsck’. No warnings, nothing: the process goes to sleep (state S in ‘top’) and that’s it;
lately, during (or after, I don’t know) POST, when USB devices are identified, it just freezes the whole machine. I didn’t have the patience to wait beyond 5 minutes. Activity resumes when I pull out the drive’s USB plug;
even if it makes it past GRUB, there’s a chance ‘fsck’ during startup will say ‘bad superblock’ and send me to the ‘file system recovery’ prompt, where a manual ‘fsck’ goes to ‘S’ again;
it is sometimes really slow, e.g. it took ~20 minutes to copy a 7 GB directory with about 10 video files.

Power and cables:
The drive came with a USB to mini-USB cable. The USB end is dual, i.e. there is an additional plug just for power. It doesn’t make any difference, though: the drive behaves the same with other “single” cables.
I tried unplugging almost everything else except the keyboard and mouse (that is, the tablet, card reader, scanner and printer), to no avail.
I also plugged the drive into different ports, with no change. I doubt the USB ports are faulty, though, as the motherboard is rather new (about 3 months old).

What has been tried and hasn’t helped:

switching off ‘USB Legacy Support’ in the BIOS;
removing the drive from the list of bootable devices;
changing the type of external USB storage (Auto / forced Hard Drive);
plugging the drive into different ports.

What hasn’t been tried and why:
Haven’t attempted formatting it yet, as there are tons of photos and videos stored inside, most of them private and unique (in one copy) - would be a painful blow to lose them to this. Yes, I am at fault here for not making backups, but still.

Can the problem be fixed or should I prepare for the worst?
I would be extremely grateful for any suggestions, pointers, links to obscure threads, etc.

hcvv · July 27, 2010, 4:27pm

To me (a hardware dummy I admit) it seems like a hardware problem. The first thing I would do is to make sure I have a backup (better two).

The facts you show make it clear, imho, that the device really is gone from the system, as if its cable had been pulled (otherwise the device special files would not have disappeared). When it does so at its own irregular whim, who knows when it will stop coming back.

hendersj · July 28, 2010, 10:20am

On Tue, 27 Jul 2010 14:36:01 +0000, hcvv wrote:

> To me (a hardware dummy I admit) it seems like a hardware problem.

I would concur (not with the hardware dummy bit, but with it seeming like
a hardware problem). Might be interesting to see what’s recorded in
/var/log/messages when the device disconnects.

To the OP - how does the drive get its power? If it is powered by the
USB connection, it could be a faulty USB port on the PC and not the drive
itself. If it’s powered by an external power supply, it could be a
faulty power connector on the drive’s case.

In either case, I would also recommend a backup in case the drive is
starting to fail. Having twice had to do recovery on external storage
devices in the 1 TB range, I can tell you it’s no fun not having a backup
(I was able to recover some data in both cases, but not nearly as much as
I’d hoped - fortunately it wasn’t critical data).

Other than that, what I’d be inclined to do is:

sudo tail -f /var/log/messages

in a terminal window (you have to do this as root because
/var/log/messages isn’t world readable) and see what’s reported there when
the drive’s power kicks off. That might give some clues (or it might just
show symptoms that the drive’s not connected any more).
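
Since /var/log/messages carries a lot of unrelated traffic, it may help to filter the output down to lines about USB, the SCSI disk layer and the ext3 journal. A minimal sketch follows; the function name and the pattern list are my own assumptions, so extend them to match whatever your kernel actually logs.

```shell
# Keep only kernel lines about USB devices, SCSI disks and the ext3 journal.
# Reads log lines on stdin. The pattern list is an assumption, not a
# complete catalogue of relevant kernel messages.
filter_usb_events() {
    grep --line-buffered -E 'usb [0-9]|usb-storage|sd [0-9]+:|I/O error|JBD|EXT3-fs'
}

# Usage (follows the log until interrupted with Ctrl-C):
#   sudo tail -f /var/log/messages | filter_usb_events
```

The `--line-buffered` option keeps grep from holding lines back while the log is being followed.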

Jim

Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

kore_kontingency · July 28, 2010, 2:21pm

Thanks for your replies.

The drive is USB-powered: I use either an ordinary USB to mini-USB cable or the cable included with the drive, which is the same but with an additional plug for power only.

Reproduced the problem and here is what ‘cat /var/log/messages’ has to say about it:

Jul 28 16:05:59 machine-mother kernel: [87333.306063] sd 6:0:0:0: [sde] Unhandled error code
Jul 28 16:05:59 machine-mother kernel: [87333.306074] sd 6:0:0:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 28 16:05:59 machine-mother kernel: [87333.306085] end_request: I/O error, dev sde, sector 677547729
Jul 28 16:05:59 machine-mother kernel: [87333.306148] sd 6:0:0:0: [sde] Unhandled error code
Jul 28 16:05:59 machine-mother kernel: [87333.306154] sd 6:0:0:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 28 16:05:59 machine-mother kernel: [87333.306163] end_request: I/O error, dev sde, sector 677547969
Jul 28 16:05:59 machine-mother kernel: [87333.306206] sd 6:0:0:0: [sde] Unhandled error code
Jul 28 16:05:59 machine-mother kernel: [87333.306213] sd 6:0:0:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 28 16:05:59 machine-mother kernel: [87333.306222] end_request: I/O error, dev sde, sector 677547985
Jul 28 16:05:59 machine-mother kernel: [87333.306262] sd 6:0:0:0: [sde] Unhandled error code
Jul 28 16:05:59 machine-mother kernel: [87333.306269] sd 6:0:0:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 28 16:05:59 machine-mother kernel: [87333.306278] end_request: I/O error, dev sde, sector 677548225
Jul 28 16:05:59 machine-mother kernel: [87333.306325] sd 6:0:0:0: [sde] Unhandled error code
Jul 28 16:05:59 machine-mother kernel: [87333.306333] sd 6:0:0:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jul 28 16:05:59 machine-mother kernel: [87333.306343] end_request: I/O error, dev sde, sector 677547729
Jul 28 16:05:59 machine-mother kernel: [87333.306635] usb 1-2.2: USB disconnect, address 6
Jul 28 16:05:59 machine-mother kernel: [87333.367499] JBD: Detected IO errors while flushing file data on sde1
Jul 28 16:05:59 machine-mother kernel: [87333.367536] journal_bmap: journal block not found at offset 12 on sde1
Jul 28 16:05:59 machine-mother kernel: [87333.367541] Aborting journal on device sde1.
Jul 28 16:05:59 machine-mother kernel: [87333.367544] ------------[ cut here ]------------
Jul 28 16:05:59 machine-mother kernel: [87333.367551] WARNING: at /usr/src/packages/BUILD/kernel-desktop-2.6.31.5/linux-2.6.31/fs/buffer.c:1158 mark_buffer_dirty+0x94/0xb0()
Jul 28 16:05:59 machine-mother kernel: [87333.367558] Hardware name: System Product Name
Jul 28 16:05:59 machine-mother kernel: [87333.367561] Modules linked in: wacom snd_pcm_oss snd_mixer_oss snd_seq edd af_packet cpufreq_conservative 
cpufreq_userspace cpufreq_powersave acpi_cpufreq speedstep_lib binfmt_misc 
nls_utf8 nls_cp437 vfat fat ext4 jbd2 crc16 fuse loop dm_mod snd_hda_codec_analog 
snd_usb_audio snd_usb_lib epl(C) snd_hda_intel snd_rawmidi 8139too snd_seq_device 
snd_hda_codec snd_hwdep snd_pcm snd_timer 8139cp pcspkr sg i2c_i801 sr_mod 
joydev serio_raw iTCO_wdt snd iTCO_vendor_support snd_page_alloc wmi 
asus_atk0110 button cdrom thermal nvidia(P) fan processor thermal_sys reiserfs 
ide_pci_generic ide_core ata_generic [last unloaded: preloadtrace]
Jul 28 16:05:59 machine-mother kernel: [87333.367627] Pid: 4418, comm: kjournald Tainted: P         C 2.6.31.5-0.1-desktop #1
Jul 28 16:05:59 machine-mother kernel: [87333.367632] Call Trace:
Jul 28 16:05:59 machine-mother kernel: [87333.367643]  [<c020845a>] try_stack_unwind+0x17a/0x1a0
Jul 28 16:05:59 machine-mother kernel: [87333.367649]  [<c020708c>] dump_trace+0x6c/0x130
Jul 28 16:05:59 machine-mother kernel: [87333.367655]  [<c0208008>] show_trace_log_lvl+0x58/0x80
Jul 28 16:05:59 machine-mother kernel: [87333.367661]  [<c0208056>] show_trace+0x26/0x40
Jul 28 16:05:59 machine-mother kernel: [87333.367667]  [<c06929e3>] dump_stack+0x79/0x91
Jul 28 16:05:59 machine-mother kernel: [87333.367674]  [<c0251408>] warn_slowpath_common+0x78/0xc0
Jul 28 16:05:59 machine-mother kernel: [87333.367680]  [<c0251471>] warn_slowpath_null+0x21/0x40
Jul 28 16:05:59 machine-mother kernel: [87333.367685]  [<c033d984>] mark_buffer_dirty+0x94/0xb0
Jul 28 16:05:59 machine-mother kernel: [87333.367692]  [<c03a6937>] journal_update_superblock+0x77/0xf0
Jul 28 16:05:59 machine-mother kernel: [87333.367698]  [<c03a6b50>] __journal_abort_soft+0x90/0xc0
Jul 28 16:05:59 machine-mother kernel: [87333.367703]  [<c03a7989>] journal_bmap+0x99/0xa0
Jul 28 16:05:59 machine-mother kernel: [87333.367709]  [<c03a7b3f>] journal_next_log_block+0x6f/0xa0
Jul 28 16:05:59 machine-mother kernel: [87333.367715]  [<c03a3727>] journal_commit_transaction+0x387/0xc90
Jul 28 16:05:59 machine-mother kernel: [87333.367721]  [<c03a7520>] kjournald+0xc0/0x210
Jul 28 16:05:59 machine-mother kernel: [87333.367727]  [<c026e5c4>] kthread+0x84/0x90
Jul 28 16:05:59 machine-mother kernel: [87333.367733]  [<c0204d8b>] kernel_thread_helper+0x7/0x1c
Jul 28 16:05:59 machine-mother kernel: [87333.367738] ---[ end trace 6baec2872647c777 ]---
Jul 28 16:05:59 machine-mother kernel: [87333.367753] __journal_remove_journal_head: freeing b_committed_data
Jul 28 16:05:59 machine-mother kernel: [87333.367786] journal commit I/O error
Jul 28 16:05:59 machine-mother kernel: [87333.545080] usb 1-2.2: new high speed USB device using ehci_hcd and address 10
Jul 28 16:05:59 machine-mother kernel: [87333.612693] ext3_abort called.
Jul 28 16:05:59 machine-mother kernel: [87333.612705] EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted journal
Jul 28 16:05:59 machine-mother kernel: [87333.612720] Remounting filesystem read-only
Jul 28 16:05:59 machine-mother kernel: [87333.636687] usb 1-2.2: New USB device found, idVendor=152d, idProduct=2329
Jul 28 16:05:59 machine-mother kernel: [87333.636699] usb 1-2.2: New USB device strings: Mfr=1, Product=11, SerialNumber=5
Jul 28 16:05:59 machine-mother kernel: [87333.636708] usb 1-2.2: Product: StoreJet Transcend
Jul 28 16:05:59 machine-mother kernel: [87333.636714] usb 1-2.2: Manufacturer: JMicron
Jul 28 16:05:59 machine-mother kernel: [87333.636720] usb 1-2.2: SerialNumber: 152D20329000
Jul 28 16:05:59 machine-mother kernel: [87333.636869] usb 1-2.2: configuration #1 chosen from 1 choice
Jul 28 16:05:59 machine-mother kernel: [87333.637295] scsi7 : SCSI emulation for USB Mass Storage devices
Jul 28 16:05:59 machine-mother kernel: [87333.637396] usb-storage: device found at 10
Jul 28 16:05:59 machine-mother kernel: [87333.637398] usb-storage: waiting for device to settle before scanning
Jul 28 16:06:00 machine-mother kernel: [87334.635854] scsi 7:0:0:0: Direct-Access     StoreJet  Transcend            PQ: 0 ANSI: 2 CCS
Jul 28 16:06:00 machine-mother kernel: [87334.636968] sd 7:0:0:0: Attached scsi generic sg4 type 0
Jul 28 16:06:00 machine-mother kernel: [87334.637670] sd 7:0:0:0: [sdf] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Jul 28 16:06:00 machine-mother kernel: [87334.638459] sd 7:0:0:0: [sdf] Write Protect is off
Jul 28 16:06:00 machine-mother kernel: [87334.638473] sd 7:0:0:0: [sdf] Mode Sense: 34 00 00 00
Jul 28 16:06:00 machine-mother kernel: [87334.638478] sd 7:0:0:0: [sdf] Assuming drive cache: write through
Jul 28 16:06:00 machine-mother kernel: [87334.640580] sd 7:0:0:0: [sdf] Assuming drive cache: write through
Jul 28 16:06:00 machine-mother kernel: [87334.640593]  sdf: sdf1
Jul 28 16:06:00 machine-mother kernel: [87334.663438] usb-storage: device scan complete
Jul 28 16:06:00 machine-mother kernel: [87334.665302] sd 7:0:0:0: [sdf] Assuming drive cache: write through
Jul 28 16:06:00 machine-mother kernel: [87334.665317] sd 7:0:0:0: [sdf] Attached SCSI disk

Seems there is some sort of problem with sectors, but I don’t know what to make of it.
Making backups little by little, but I’m out of free space, and it looks like I’ll have to sacrifice some of the data.

Anyway, what do you think of these errors?
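
To get an overview of an excerpt like the one above without reading every line, something along these lines might help. It is only a sketch: the function name is made up, and it assumes the message wording shown in my log (‘I/O error, dev … sector N’ and ‘USB disconnect’), which may differ on other kernels.

```shell
# Summarise a saved log excerpt read from stdin: number of block-layer I/O
# errors, number of distinct failing sectors, and number of USB disconnects.
# Assumes the sector number is the last field of each "I/O error, dev" line.
summarize_disk_log() {
    awk '
        /I\/O error, dev/ {
            errors++
            sector = $NF
            if (!(sector in seen)) { seen[sector] = 1; distinct++ }
        }
        /USB disconnect/ { disconnects++ }
        END {
            printf "io_errors=%d distinct_sectors=%d usb_disconnects=%d\n", errors, distinct, disconnects
        }
    '
}

# Usage:
#   summarize_disk_log < saved-excerpt.txt
```

Run on the excerpt above, this should count five I/O errors over four distinct sectors (677547729 appears twice) and one USB disconnect.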

hendersj · July 28, 2010, 11:57pm

On Wed, 28 Jul 2010 12:36:01 +0000, kore kontingency wrote:

> Seems there is some sort of problem with sectors, but I don’t know what
> to make of it.

It could be the start of a hard drive failure. Depending on how the
drive behaves when it encounters a disconnect initiated by the host, the
errors could be the root cause or they could be a symptom (i.e., read
errors or I/O errors are not unusual if you disconnect a drive that’s
actively in use).
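
One rough way to look at the cause-or-symptom question is to compare the kernel timestamps of the first I/O error and the first USB disconnect in a saved excerpt. The sketch below does that; the function name is hypothetical and it assumes the bracketed timestamp format seen in the posted log. Treat the result as a hint only, since the order in which messages are logged does not prove which event caused the other.

```shell
# Print the kernel timestamps of the first I/O error and the first USB
# disconnect found in a log excerpt read from stdin.
first_event_times() {
    awk '
        # Extract the kernel timestamp from a field like "[87333.306085]".
        function ts(line,   s) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/)) {
                s = substr(line, RSTART + 1, RLENGTH - 2)
                gsub(/ /, "", s)
                return s
            }
            return ""
        }
        /I\/O error, dev/ && io == ""   { io = ts($0) }
        /USB disconnect/  && disc == "" { disc = ts($0) }
        END { print "first_io_error=" io " first_disconnect=" disc }
    '
}

# Usage:
#   first_event_times < saved-excerpt.txt
```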

> Making backups little by little, but I’m out of free space, and it looks
> like I’ll have to sacrifice some of the data.
>
> Anyway, what do you think of these errors?

Maybe try swapping the cable out and see if it’s a bad cable. If that
doesn’t work, try either on a different system if you can or a different
USB port (a different system would be better because you’d eliminate the
USB controller on the system you are using).

Jim


user · July 29, 2010, 1:06am

On 2010-07-28 14:36, kore kontingency wrote:
>
> Thanks for your replies.
>
> The drive is usb-powered: it’s either a usual USB-miniUSB cable or
> (included with the drive) the same cable but with an additional plug for
> powering purposes only.

I hate those.

>
> Reproduced the problem and here is what ‘cat /var/log/messages’ has to
> say about it:
>
>
> Code:
…
> Jul 28 16:05:59 machine-mother kernel: [87333.367544] ------------[ cut here ]------------
> Jul 28 16:05:59 machine-mother kernel: [87333.367551] WARNING: at /usr/src/packages/BUILD/kernel-desktop-2.6.31.5/linux-2.6.31/fs/buffer.c:1158 mark_buffer_dirty+0x94/0xb0()
> Jul 28 16:05:59 machine-mother kernel: [87333.367558] Hardware name: System Product Name
> Jul 28 16:05:59 machine-mother kernel: [87333.367561] Modules linked in: wacom snd_pcm_oss snd_mixer_oss snd_seq edd af_packet cpufreq_conservative

This part is a kernel bug that you should report in Bugzilla. It may be the cause, or it may simply
be triggered by your problem.

> Jul 28 16:05:59 machine-mother kernel: [87333.367627] Pid: 4418, comm: kjournald Tainted: P         C 2.6.31.5-0.1-desktop #1

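Tainted: the module list in your log marks nvidia with (P), a proprietary module, and epl with (C), a
staging driver. The kernel devs may refuse to look at it unless you remove the tainting modules and
reproduce the same problem.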

> Seems there is some sort of problem with sectors, but I don’t know what
> to make of it.

I’m not sure whether you have bad sectors, or the disk simply stops responding and the kernel reports the
sectors it failed to read because of that.
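
One read-only experiment that might help tell the two cases apart is to read a reported sector again once the drive has come back. This is only a sketch: the function name is invented, and it assumes the sector numbers in the log are 512-byte units counted from the start of the whole device, which matches the ‘512-byte logical blocks’ line in your log. Note that the device node changed from sde to sdf after the reconnect, so check the current name first.

```shell
# Read one 512-byte sector from a block device (or an image file) and write
# it to stdout. Nothing is written to the device. The exit status is that
# of dd, so a failed read returns non-zero.
read_sector() {
    dev=$1
    sector=$2
    dd if="$dev" bs=512 skip="$sector" count=1 2>/dev/null
}

# Usage (as root; /dev/sdX is a placeholder for the current device node):
#   read_sector /dev/sdX 677547729 > /dev/null && echo "sector readable"
```

If the same sector reads fine on a later attempt, that would point more towards the drive dropping out than towards a permanently bad sector, though a single successful read is not conclusive.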

> Making backups little by little, but I’m out of free space, and it
> looks like I’ll have to sacrifice some of the data.

You need another disk to do a backup. And another OS, and perhaps another computer, to do it.
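
Because the drive can vanish in the middle of a copy, it may be worth copying file by file and recording failures instead of letting one error abort the whole run. The following is a minimal sketch under that assumption; the function name and the three arguments are hypothetical. It skips files that already exist at the destination, so it can be run again after each reconnect.

```shell
# Copy every regular file under SRC to DEST one at a time. Files that fail
# to copy are listed in FAILED_LIST and any partial copy is removed, so a
# disconnect costs one file rather than the whole run. Files already
# present in DEST are skipped. File names containing newlines are not
# handled.
backup_tree() {
    src=$1
    dest=$2
    failed=$3
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        target="$dest/$rel"
        [ -e "$target" ] && continue
        mkdir -p "$(dirname "$target")"
        cp -p "$src/$rel" "$target" 2>/dev/null || {
            rm -f "$target"
            printf '%s\n' "$rel" >> "$failed"
        }
    done
}

# Usage (all three paths are placeholders):
#   backup_tree /mnt/usbdisk /mnt/backupdisk/copy /tmp/failed-files.txt
```

One caveat: if the machine freezes in the middle of a file, a truncated copy can be left behind and will be skipped on the next run, so it is worth comparing file sizes afterwards.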

–
Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

pacolaser · December 8, 2010, 12:51pm

I am experiencing a similar problem with a new Dell desktop running SUSE 11.3 64-bit and a 1 TB USB drive with a separate power supply. The partitions are 500 GB ext4 and 400 GB FAT32. The external drive is for making backup copies of /home (including a Windows 7 x64 virtual machine). We left the copy running overnight, and in the morning it had only managed 30 GB (of 220 GB) before crashing. A new attempt crashed immediately.

I searched and found several similar reports since May 2010.

knurpht · December 8, 2010, 1:47pm

Next time, open a new thread instead of reusing an old one with a ‘similar’ problem. My 2 cents for now: back up whatever you can, then replace the drive.