My poor HDD

My HDD has been working too hard, and it brings everything to a halt.

I don’t have a clue why, of course.

I’m not copying files or doing anything special when it happens; I just have programs open that I’m not actively using.

I hope this is not affecting all openSUSE users.

I have the latest openSUSE and update regularly. This problem has been affecting me for some months now.

Thank you for your attention.

Disks eventually fail. And sometimes they fail in odd ways (seek failures, spin up failures or slowness, logic failures).

That’s perhaps what you are seeing. The sure test is to replace the disk and see if that fixes the problems.

On 2014-06-14 03:26, nrickert wrote:
> That’s perhaps what you are seeing. The sure test is to replace the
> disk and see if that fixes the problems.

I concur.

But you can ask the disk about its own health, with smartctl. Use “-a”
to see the logs, then do a short test, then a long test.

Typically you see failed sectors (the remap count) in the “-a” output.

Hint: “man smartctl” for help.
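
As a rough sketch of that sequence, the three steps could be wrapped like this. The function names are made up for illustration and /dev/sdX is a placeholder for your own disk; only the smartctl options themselves come from the tool. smartctl normally has to be run as root.

```shell
# Hypothetical helper names; only the smartctl options are real.
# Replace /dev/sdX with your own disk and run as root.

# Print identity, attributes, error log and self-test log.
smart_report() {
    smartctl -a "$1"
}

# Start the short self-test (usually a couple of minutes).
smart_short_test() {
    smartctl -t short "$1"
}

# Start the long (extended) self-test, which can take hours.
smart_long_test() {
    smartctl -t long "$1"
}

# Typical order, letting each test finish before the next step:
#   smart_report /dev/sdX
#   smart_short_test /dev/sdX
#   smart_report /dev/sdX
#   smart_long_test /dev/sdX
#   smart_report /dev/sdX
```

The tests run inside the drive itself, so each test command returns immediately; the result shows up in the self-test log of a later “-a” report.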


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Thanks for coming to the rescue :slight_smile:

I will read the manual for “smartctl” during the day.

On 2014-06-14 09:26, binarydepth wrote:
>
> Thanks for coming to the rescue :slight_smile:
>
> I will read the manual for “smartctl” during the day.

Welcome.

You do not need to read it all, just the start, then jump to the
examples section at the end. Later you can go back to find out exactly
what an option from the examples that interests you does. :slight_smile:

And for background info on what SMART is, there is a good explanation on
Wikipedia.

You can paste here the output of the “-a” option and we’ll help you to
interpret it (please do so inside code tags, the ‘#’ button in the forum
editor).

The long SMART test on modern disks (modern by a decade or so :wink: ) does
a surface test.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-11-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Travelstar 5K500.B
Device Model:     Hitachi HTS545050B9A300
Serial Number:    101014PBN40417F8SKBE
LU WWN Device Id: 5 000cca 5f1ee41ec
Firmware Version: PB4OC60F
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Jun 14 20:26:30 2014 AST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  645) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 158) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   147   147   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   097   097   000    Old_age   Always       -       5747
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   069   069   000    Old_age   Always       -       13905
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       3312
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       173
193 Load_Cycle_Count        0x0012   074   074   000    Old_age   Always       -       269283
194 Temperature_Celsius     0x0002   177   177   000    Old_age   Always       -       31 (Min/Max 19/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# Sample configuration file for smartd.  See man smartd.conf.

# Home page is: http://smartmontools.sourceforge.net

# $Id: smartd.conf 3651 2012-10-18 15:11:36Z samm2 $

# smartd will re-read the configuration file if it receives a HUP
# signal

# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.

# You can usually identify which hard disks are on your system by
# looking in /proc/ide and in /proc/scsi.

# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices.  DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found.  Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.
# Adding -d removable prevents error messages after disconnecting of
# monitored removable discs.
#DEVICESCAN -d removable
DEVICESCAN -d removable

# Alternative setting to ignore temperature and power-on hours reports
# in syslog.
#DEVICESCAN -d removable -I 194 -I 231 -I 9

# Alternative setting to report more useful raw temperature in syslog.
#DEVICESCAN -d removable -R 194 -R 231 -I 9

# Alternative setting to report raw temperature changes >= 5 Celsius
# and min/max temperatures.
#DEVICESCAN -d removable -I 194 -I 231 -I 9 -W 5

# First (primary) ATA/IDE hard disk.  Monitor all attributes, enable
# automatic online data collection, automatic Attribute autosave, and
# start a short self-test every day between 2-3am, and a long self test
# Saturdays between 3-4am.
#/dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

# Monitor SMART status, ATA Error Log, Self-test log, and track
# changes in all attributes except for attribute 194
#/dev/hdb -H -l error -l selftest -t -I 194 

# Monitor all attributes except normalized Temperature (usually 194),
# but track Temperature changes >= 4 Celsius, report Temperatures
# >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
# Send mail on SMART failures or when Temperature is >= 55 Celsius.
#/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com

# An ATA disk may appear as a SCSI device to the OS. If a SCSI to
# ATA Translation (SAT) layer is between the OS and the device then
# this can be flagged with the '-d sat' option. This situation may
# become common with SATA disks in SAS and FC environments.
# /dev/sda -a -d sat

# A very silent check.  Only report SMART health status if it fails
# But send an email in this case
#/dev/hdc -H -C 0 -U 0 -m admin@example.com

# First two SCSI disks.  This will monitor everything that smartd can
# monitor.  Start extended self-tests Wednesdays between 6-7pm and
# Sundays between 1-2 am
#/dev/sda -d scsi -s L/../../3/18
#/dev/sdb -d scsi -s L/../../7/01

# Monitor 4 ATA disks connected to a 3ware 6/7/8000 controller which uses
# the 3w-xxxx driver. Start long self-tests Sundays between 1-2, 2-3, 3-4, 
# and 4-5 am.
# NOTE: starting with the Linux 2.6 kernel series, the /dev/sdX interface
# is DEPRECATED.  Use the /dev/tweN character device interface instead.
# For example /dev/twe0, /dev/twe1, and so on.
#/dev/sdc -d 3ware,0 -a -s L/../../7/01
#/dev/sdc -d 3ware,1 -a -s L/../../7/02
#/dev/sdc -d 3ware,2 -a -s L/../../7/03
#/dev/sdc -d 3ware,3 -a -s L/../../7/04

# Monitor 2 ATA disks connected to a 3ware 9000 controller which
# uses the 3w-9xxx driver (Linux, FreeBSD). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
#/dev/twa0 -d 3ware,0 -a -s L/../../2/01
#/dev/twa0 -d 3ware,1 -a -s L/../../2/03

# Monitor 2 SATA (not SAS) disks connected to a 3ware 9000 controller which
# uses the 3w-sas driver (Linux). Start long self-tests Tuesdays
# between 1-2 and 3-4 am.
# On FreeBSD /dev/tws0 should be used instead
#/dev/twl0 -d 3ware,0 -a -s L/../../2/01
#/dev/twl0 -d 3ware,1 -a -s L/../../2/03

# Same as above for Windows. Option '-d 3ware,N' is not necessary,
# disk (port) number is specified in device name.
# NOTE: On Windows, DEVICESCAN works also for 3ware controllers.
#/dev/hdc,0 -a -s L/../../2/01
#/dev/hdc,1 -a -s L/../../2/03

# Monitor 3 ATA disks directly connected to a HighPoint RocketRAID. Start long
# self-tests Sundays between 1-2, 2-3, and 3-4 am. 
#/dev/sdd -d hpt,1/1 -a -s L/../../7/01
#/dev/sdd -d hpt,1/2 -a -s L/../../7/02
#/dev/sdd -d hpt,1/3 -a -s L/../../7/03

# Monitor 2 ATA disks connected to the same PMPort which connected to the
# HighPoint RocketRAID. Start long self-tests Tuesdays between 1-2 and 3-4 am
#/dev/sdd -d hpt,1/4/1 -a -s L/../../2/01
#/dev/sdd -d hpt,1/4/2 -a -s L/../../2/03

# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
#   -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
#   -T TYPE set the tolerance to one of: normal, permissive
#   -o VAL  Enable/disable automatic offline tests (on/off)
#   -S VAL  Enable/disable attribute autosave (on/off)
#   -n MODE No check. MODE is one of: never, sleep, standby, idle
#   -H      Monitor SMART Health Status, report if failed
#   -l TYPE Monitor SMART log.  Type is one of: error, selftest
#   -f      Monitor for failure of any 'Usage' Attributes
#   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#   -M TYPE Modify email warning behavior (see man page)
#   -s REGE Start self-test when type/date matches regular expression (see man page)
#   -p      Report changes in 'Prefailure' Normalized Attributes
#   -u      Report changes in 'Usage' Normalized Attributes
#   -t      Equivalent to -p and -u Directives
#   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
#   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
#   -i ID   Ignore Attribute ID for -f Directive
#   -I ID   Ignore Attribute ID for -p, -u or -t Directive
#   -C ID   Report if Current Pending Sector count non-zero
#   -U ID   Report if Offline Uncorrectable count non-zero
#   -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
#   -v N,ST Modifies labeling of Attribute N (see man page)
#   -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
#   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#   -P TYPE Drive-specific presets: use, ignore, show, showall
#    #      Comment: text after a hash sign is ignored
#    \      Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z]
# DEVICESCAN may be followed by any desired Directives.

On 2014-06-15 02:36, binarydepth wrote:
>
> Code:
> --------------------
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-11-desktop] (SUSE RPM)

> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

> 9 Power_On_Hours 0x0012 069 069 000 Old_age Always - 13905

That’s a bit old for a laptop. Did you say it is a laptop? :-?

“Travelstar” hints at a laptop, right?

> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
> 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0

0, that’s good.
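
To pull just those sector counters out of a report, something like the following could work. It is only a sketch: sector_health is a made-up name, and it assumes the ten-column attribute table layout shown in the smartctl output above.

```shell
# Hypothetical helper: reads "smartctl -A" (or "-a") output on stdin and
# prints the raw values of the sector-related attributes.
# Returns 1 if any of them is non-zero, 0 otherwise.
sector_health() {
    awk '
        # NF >= 10 skips other tables whose first column is also a number.
        NF >= 10 && ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) {
            printf "%s=%s\n", $2, $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage (as root, /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | sector_health || echo "sector problems reported"
```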

> SMART Self-test log structure revision number 1
> No self-tests have been logged. [To run self-tests, use: smartctl -t]

Well, you need to run the short and long tests.

> Code:
> --------------------
> # Sample configuration file for smartd. See man smartd.conf.

> --------------------

That is not needed.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Actually, Travelstar is a model-type for some Hitachi SATA drives.

I think they also made a series of laptops with the same name.

On 2014-06-16 02:36, Fraser Bell wrote:

> Actually, Travelstar is a model-type for some Hitachi SATA drives.

Oh.

> I think they also made a series of laptops with the same name.

It is also a 5400 rpm model, which is typical of laptops. I suppose I
could google the model, though, but I prefer being told :wink:

Yep, it is a laptop model.

http://www.hgst.com/tech/techlib.nsf/techdocs/FFA370A7BF845F87862574FE0003054C/$file/TS5K500B_DS_final.pdf


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

It’s an Acer Timeline X 5820T-6825

… with the original WD drive in it?

If so, I am betting on drive failure. My WD in my Acer failed more than a year ago, but diagnostics could not detect the problem, because it was an electronic problem that not even the WD diagnostics would detect.

Platters and surfaces were fine. It ran sometimes and showed no life at other times. I believe it is a manufacturer’s design defect, but it is not acknowledged (and the warranty is long gone), so an otherwise perfectly good drive is useless.

Ubuntu is running flawlessly on it. :wink: I’m making a distro on SUSE Studio and would like to read the docs on how everything is configured on openSUSE. Not the time for that now, though.

On 2014-06-26 21:36, binarydepth wrote:

> Ubuntu is running flawlessly on it. :wink: I’m making a distro on SUSE
> Studio and would like to read the docs on how everything is configured
> on openSUSE. Not the time for that now, though.

Writing to a relatively modern disk forces it to remap the bad sectors
it finds, if any, so that you no longer see them and the disk works well
again.

In fact, when I have proof of bad sectors (the long smart test I
suggested finds them), I rewrite the entire disk with zeroes, to force a
remapping, and further detection of bad sectors.
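
A cautious sketch of such a zero fill follows. It destroys everything on the target, so it is meant for a disk that is not mounted (for example from a live system) and only after the data has been backed up. zero_fill_disk is a made-up helper name and /dev/sdX is a placeholder; the only real commands used are dd and sync.

```shell
# DESTRUCTIVE: overwrites the whole target with zeroes.
# Hypothetical helper; /dev/sdX in the usage line is a placeholder.
zero_fill_disk() {
    target="$1"
    # Refuse anything that is not a block device, so a typo cannot
    # create a huge file of zeroes on a mounted filesystem.
    if [ ! -b "$target" ]; then
        echo "refusing: '$target' is not a block device" >&2
        return 1
    fi
    # dd writes until it reaches the end of the device and then stops
    # with "No space left on device"; for a whole-disk fill that is expected.
    dd if=/dev/zero of="$target" bs=1048576
    # Flush anything still buffered before checking the disk again.
    sync
}

# Usage (as root):
#   zero_fill_disk /dev/sdX
#   smartctl -A /dev/sdX    # then look at the reallocated/pending counts
```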


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Guys,
Before analyzing the disk, why should you assume the OP is analyzing his problem correctly? Just because apps open slowly, how can you even begin to suspect a disk problem?

Before considering that, I’d want to know:

  • Is there sufficient RAM for the system he’s running? Use the free tool to understand available memory and memory used for buffers; see my wiki
    http://en.opensuse.org/User:Tsu2/free_tool
    BTW: if you use “heavy” apps for a while, then decide to change projects and no longer use those heavy apps, I describe the command that clears your memory buffers without having to reboot
  • How much free space do you have on the disk? Are you storing many files, maybe even multi-booting?
  • Which specific apps are running simultaneously, and how many of them?
  • How long has the system been running?
  • Which version of openSUSE are you running? Recent versions of openSUSE have moved many parts of the OS off the disk and into memory instead. The benefit is that there is less disk access. The drawback is that if you don’t have sufficient RAM, it may cause swapping, which means disk access.
  • Are you transferring large files? Certain operations like transferring files may create large temporary files. The effect will vary depending on everything: distro version, mount configurations, app used for transferring, system resources, and more.
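
Most of these questions can be answered from a terminal with “free -m”, “df -h” and “uptime”. As a small sketch of the numbers free works from, the function below summarises a /proc/meminfo style file in MiB. mem_summary is a made-up helper, not part of any package, and the buffer-clearing command mentioned in the comment is my assumption about what the wiki page describes.

```shell
# Hypothetical helper: summarise memory figures in MiB.
# Reads /proc/meminfo by default, or the file given as first argument.
mem_summary() {
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "MemFree:"   { mem_free = $2 }
        $1 == "Buffers:"   { buffers = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free = $2 }
        END {
            # Values in meminfo are in kB; divide by 1024 for MiB.
            printf "total=%d free=%d buffers=%d cached=%d swap_used=%d (MiB)\n",
                total / 1024, mem_free / 1024, buffers / 1024, cached / 1024,
                (swap_total - swap_free) / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Related commands for the other questions:
#   df -h      # free space per mounted filesystem
#   uptime     # how long the system has been running, plus load average
# Assumption: the buffer-clearing command is probably the usual
#   sync; echo 3 > /proc/sys/vm/drop_caches    (as root)
```

A swap_used value well above zero on a machine that feels sluggish would point towards memory pressure rather than a failing disk.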

I’ve used the 500GB and 320GB Travelstars several times in the past (not currently; personally I’ve upgraded practically all my laptop drives to 1TB). It’s not a performance freak but has given me good, reliable service. Anything is possible, but I wouldn’t suspect the drive without real reason.

IMO,
TSU

I have 8GB of RAM. Dunno if that’s enough, but it would give openSUSE a bad reputation if not.

I really don’t see any evidence that the programs I use could be causing this. The first suspect, for me of course, is the “System Configuration”.

There’s no use in doing all those checks for me. Which apps do you consider “heavy”? Server apps?

Two things happen in openSUSE 13.1: everything is normal, up and running, and then suddenly the HDD light turns on like crazy and the system gets really laggy.

Only one thing happens in Ubuntu: everything works fine. If I could, I would do further testing, but I need to get work done.

So where is the problem that you mention? In the kernel?

If I do a test and “bad” sectors are found, can I see how much is damaged?

PS: I have to read about sectors again. I learned about that 15 years ago.

On 2014-06-27 13:46, binarydepth wrote:

>
> So where is the problem that you mention? In the kernel?

Of course not. Bad sectors are a hardware problem.

> If I do a test and “bad” sectors are found, can I see how much is
> damaged?

More or less.

> PS: I have to read about sectors again. I learned about that 15 years
> ago.

Simply run “smartctl --test=long /dev/whatever”, and after the time it
tells you, query the results with “smartctl -a /dev/whatever”. You may
keep using the computer during the hours the test takes, albeit it can
become slow, sluggish, or even totally unresponsive. Just wait it out if
it happens, do not reboot.
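
For anyone who would rather not watch the clock, here is a minimal sketch that starts the long test and polls until the drive stops reporting a test in progress. The function names are hypothetical, /dev/sdX is a placeholder, and the assumption that status codes 240 to 249 mean “in progress” is my reading of how smartctl prints the “Self-test execution status” line, so check it against your own output.

```shell
# Hypothetical helper: extract the numeric code from the
# "Self-test execution status: ( NNN)" line of "smartctl -a" output on stdin.
selftest_status_code() {
    sed -n 's/^Self-test execution status:[^(]*( *\([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical helper: start the long test, wait for it, print the report.
run_long_test() {
    disk="$1"
    smartctl --test=long "$disk" || return 1
    while :; do
        sleep 600    # check every ten minutes
        code=$(smartctl -a "$disk" | selftest_status_code)
        # Assumption: 240-249 means a self-test is still running.
        case "$code" in
            24[0-9]) ;;
            *) break ;;
        esac
    done
    smartctl -a "$disk"
}

# Usage (as root):
#   run_long_test /dev/sdX
```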

Maybe no problem will be detected. It is just a routine visit to the
doctor; he may find nothing, which is good. Actually, this test should
be routinely run by everybody.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)