Hard drive failure is imminent! How do I back everything up, including repos, packages, etc.?

I just got a BIOS message: “Hard disk failure is imminent! Please back up your hard disk and have it replaced!”

So now I’m backing everything up, but I’ve got tons of repos added, tons of packages installed, apps, etc., and I have no idea how to figure out what they all are or how to properly back them up. I also need to back up my virtual machines. Can someone tell me how to do this?

Thanks!

Hello 6tr6tr,

Just to make sure: you already have backups of your data, and now you want to back up the list of repositories you have and the packages you’ve got installed.

You can make a list of your repositories with this command:

zypper lr -u

The -u argument shows the URI of each repository, so you can enter it in YaST to add the repository again.

To make a file containing all the installed packages, start YaST->Software->Software Management.
Select “File” at the top of the window and select Export.
This will generate a file containing all the **user**-installed packages.
When you need to import them, simply start Software Management again and select File->Import.

Remember that the file only contains the user-installed packages, so I recommend you only use this file on a system with the same openSUSE version.

I’m not sure how best to back up your virtual machines.
If you use VirtualBox, you can simply make a backup of your ~/.VirtualBox folder.
On the new system, make sure VirtualBox is installed, put this folder back, and it should work.

Best of luck!:wink:

Ouch

VMs will be in their own files wherever you put them. If you use VirtualBox, I believe it defaults to your home directory. I have a special partition that I always use just for VM files.

Use Clonezilla or other image type backup to make images of the partitions.

Note that when restoring to a new drive, you will need to be able to boot from an external Linux disk so you can edit the GRUB configuration under /boot/grub and the /etc/fstab file to point to the new drive name.

On 2010-09-17 17:06, 6tr6tr wrote:
>
> I just got a BIOS message: “Hard disk failure is imminent! Please back
> up your hard disk and have it replaced!”

Run “smartctl --health /dev/yourdevicename” and “smartctl -a /dev/yourdevicename”, and post the results here.

> So now I’m backing everything up, but I’ve got tons of repos added, tons
> of packages installed, apps, etc., and I have no idea how to figure out
> what they all are or how to properly back them up. I also need to back
> up my virtual machines. Can someone tell me how to do this?

Just copy everything, file by file, to the same destination on another disk. Use rsync. Partition
and format that disk first in any layout you like (or the same layout as the original).

Or image the disk, as an image file, to another disk that is big enough. That disk is not the replacement disk,
just a fast place to make a backup in a hurry. Use dd or dd_rescue (check the exact name; GNU ddrescue is a similar tool).


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Minas Tirith))

Implement your disaster recovery plan??? :wink:

First, I would check your drive with the smartctl command to see what’s up
with the drive… overheating maybe? Download the manufacturer’s
diagnostic tools to verify it as well.

What drive and size is it?

For your virtual machines you should just be able to drag/drop the
directories (that’s all I do with VMware Workstation), but check for
hidden files too. You could also bzip the directories up and
drag/drop.
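
If you would rather script the copy than drag and drop, here is a minimal Python sketch, not a tested backup tool. It walks a VM directory (hidden files included), hashes each file while copying it so the source is only read once, and then re-reads the copy to compare checksums. The function names and the demo directory layout are made up for illustration; point it at wherever your VM files really live.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time so large disk images do not fill RAM


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_file_verified(src, dst):
    """Copy src to dst, hashing the source on the way; True if the copy matches."""
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(CHUNK), b""):
            digest.update(chunk)
            fout.write(chunk)
    return digest.hexdigest() == sha256_of(dst)


def copy_tree_verified(src_dir, dst_dir):
    """Copy a directory tree (dotfiles included); return relative paths that failed."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    # pathlib's rglob("*") also yields names starting with a dot.
    for src in sorted(src_dir.rglob("*")):
        target = dst_dir / src.relative_to(src_dir)
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif src.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            if not copy_file_verified(src, target):
                failed.append(src.relative_to(src_dir).as_posix())
    return failed


# Demo on a throwaway directory with a hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    vm_dir = Path(tmp) / "vm"
    backup_dir = Path(tmp) / "backup"
    (vm_dir / ".hidden").mkdir(parents=True)
    (vm_dir / "disk.vdi").write_bytes(b"\0" * 4096)
    (vm_dir / ".hidden" / "settings.xml").write_text("<vm/>")
    failed = copy_tree_verified(vm_dir, backup_dir)
    copied = sorted(
        p.relative_to(backup_dir).as_posix()
        for p in backup_dir.rglob("*")
        if p.is_file()
    )
    print("copied:", copied, "failed:", failed)
```

Make sure the VMs are shut down before copying, otherwise the disk images can change underneath you.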

Otherwise I would pull the drive if it is going to fail and then connect it
externally to do your recovery.


Cheers Malcolm °¿° (Linux Counter #276890)
SUSE Linux Enterprise Desktop 11 (i586) Kernel 2.6.32.19-0.2-pae
up 3 days 2:56, 2 users, load average: 0.00, 0.07, 0.12
ASUS eeePC 1000HE ATOM N280 1.66GHz | GPU Mobile 945GM/GMS/GME

How do I connect it externally to pull stuff off the drive?

I turned off that computer (I did back up my most critical data first, but it’s dual-boot with Windows and I didn’t get a chance to back up the Windows stuff) and I want to figure out what steps to take before proceeding. Here’s what I will do:

  1. Back up the data on the Windows partition

  2. Back up the VirtualBox disks
    (From what I understand, it’s not as simple as copying the “.vdi” files. I see a lot of different suggestions on what to do. Any idea which is the right one?)
    How do I backup/restore my VDIs - Hands-On (View topic) • virtualbox.org
    How To: Properly Backup a VirtualBox Machine (.VDI) | The Linux Daily
    How to copy and transfer or backup a Virtualbox Virtual Machine .vdi - my-guides.net

  3. Back up the repos and installed packages
    a. zypper lr -u
    b. YaST -> Software -> File -> Export

(I’m thinking that I might install openSUSE 11.3 on the new HD. But since I have 11.2 on the current one, can I just change all the 11.2 references to 11.3 for the packages/repos?)

  4. Install openSUSE on the new HD (is there anything I need to do to make this work?)

  5. Reinstall everything

Hello 6tr6tr,

In most cases it will work, but there could be an 11.3 repository missing a package you had on 11.2.
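
As a rough illustration of "changing the 11.2 references to 11.3", here is a small Python sketch. The URIs are examples of the usual layout, not your actual repository list, and `retarget_repo_uri` is a made-up helper name. It only rewrites the text; whether a matching 11.3 repository really exists still has to be checked for each URI.

```python
import re


def retarget_repo_uri(uri, old="11.2", new="11.3"):
    """Swap the release number in a repo URI, leaving longer version strings alone."""
    # Match "11.2" only when it is not part of a longer number such as 1.11.2.5.
    pattern = r"(?<![0-9.])" + re.escape(old) + r"(?![0-9.])"
    return re.sub(pattern, new, uri)


# Illustrative URIs only; use the ones your own "zypper lr -u" printed.
old_uris = [
    "http://download.opensuse.org/distribution/11.2/repo/oss/",
    "http://download.opensuse.org/update/11.2/",
    "http://download.opensuse.org/repositories/KDE:/Extra/openSUSE_11.2/",
]
new_uris = [retarget_repo_uri(uri) for uri in old_uris]
for uri in new_uris:
    print(uri)
```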

Well, it’s just like a fresh install.
The only catch is that when you put back your home folder backup, it could contain outdated config files for some applications.

Best of luck!:wink:

Right, but someone mentioned something about having to prepare GRUB for the new HD. Is this correct?

Hi
I have one of those USB->SATA/IDE(44/40) converters (comes with a power
brick) to connect drives when things like this happen…

Have you verified the issue with the drive, over temperature, failing
sectors etc?


Cheers Malcolm °¿° (Linux Counter #276890)

Hello 6tr6tr,

This is only true when you make an exact copy of your old drive
and put it on your new drive.

In this case all settings in Grub will point to the old drive.
It’s the same with /etc/fstab.

But when you do a fresh install and put your data back, then this isn’t a problem because the installer generates new files pointing to the new drive.
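
For the cloned-drive case, this is roughly what the /etc/fstab fix amounts to. It is a minimal Python sketch that assumes the entries name the disk by something tied to the old drive, for example a /dev/disk/by-id path. The device names below are placeholders, not real ones, and `retarget_fstab` is a hypothetical helper. The same idea applies to the device references in the GRUB configuration.

```python
def retarget_fstab(text, device_map):
    """Return fstab text with the device column rewritten according to device_map."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comments; only the first column names the device.
        if fields and not fields[0].startswith("#") and fields[0] in device_map:
            line = line.replace(fields[0], device_map[fields[0]], 1)
        result.append(line)
    return "\n".join(result) + "\n"


# Placeholder names: substitute what "ls -l /dev/disk/by-id" shows on your system.
OLD = "/dev/disk/by-id/ata-OLD_DRIVE_MODEL_SERIAL"
NEW = "/dev/disk/by-id/ata-NEW_DRIVE_MODEL_SERIAL"
sample_fstab = (
    "# device  mountpoint  type  options  dump  pass\n"
    f"{OLD}-part1 swap swap defaults 0 0\n"
    f"{OLD}-part2 /    ext4 acl,user_xattr 1 1\n"
    "proc /proc proc defaults 0 0\n"
)
device_map = {f"{OLD}-part{n}": f"{NEW}-part{n}" for n in (1, 2)}
new_fstab = retarget_fstab(sample_fstab, device_map)
print(new_fstab)
```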

Good luck!:wink:

Not yet. I’m booting from a USB stick first and grabbing all the data off the drive. Once I’m done with that, I’ll boot into the installed system and check the HD, as well as get the list of repos and packages. I’ll post back here when I do.

Here’s the results of those smartctl commands:

--health:


SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0007   100   001   025    Pre-fail  Always   In_the_past 8320
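
A note on why this reads PASSED and still lists a marginal attribute: as I understand the smartctl documentation, an attribute is failing now when VALUE is at or below THRESH, and it is reported as In_the_past when only WORST is at or below THRESH. The Python sketch below applies that rule to three rows copied from the report; the function names are my own.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   001   025    Pre-fail  Always   In_the_past 8320
  5 Reallocated_Sector_Ct   0x0033   100   100   011    Pre-fail  Always       -       0
"""


def parse_attributes(text):
    """Parse smartctl attribute rows into dicts, skipping the header line."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
        })
    return rows


def classify(attr):
    """Apply the threshold rule to one parsed attribute."""
    if attr["value"] <= attr["thresh"]:
        return "failing_now"
    if attr["worst"] <= attr["thresh"]:
        return "failed_in_past"
    return "ok"


statuses = {attr["name"]: classify(attr) for attr in parse_attributes(SAMPLE)}
for name, status in statuses.items():
    print(f"{name:24} {status}")
```

Spin_Up_Time has VALUE 100 but WORST 001 against a threshold of 025, so at some point the drive dipped below the threshold, even though it is above it now.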

-a


smartctl 5.39 2009-08-08 r2872~ [i686-pc-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint P80 series
Device Model:     SAMSUNG SP0802N             
Serial Number:    S00JJ20Y175651              
Firmware Version: TK200-04                    
User Capacity:    80,060,424,192 bytes        
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7                                              
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0               
Local Time is:    Fri Sep 17 17:36:54 2010 EDT                   

==> WARNING: May need -F samsung2 or -F samsung3 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled                                 

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.              
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever  
                                        been run.                               
Total time to complete Offline                                                  
data collection:                 (3000) seconds.                                
Offline data collection                                                         
capabilities:                    (0x1b) SMART execute Offline immediate.        
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new         
                                        command.                                    
                                        Offline surface scan supported.             
                                        Self-test supported.                        
                                        No Conveyance Self-test supported.          
                                        No Selective Self-test supported.           
SMART capabilities:            (0x0003) Saves SMART data before entering            
                                        power-saving mode.                          
                                        Supports SMART auto save timer.             
Error logging capability:        (0x01) Error logging supported.                    
                                        No General Purpose Logging support.         
Short self-test routine                                                             
recommended polling time:        (   1) minutes.                                    
Extended self-test routine                                                          
recommended polling time:        (  50) minutes.                                    

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0        
  3 Spin_Up_Time            0x0007   100   001   025    Pre-fail  Always   In_the_past 8320     
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2595     
  5 Reallocated_Sector_Ct   0x0033   100   100   011    Pre-fail  Always       -       0        
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0        
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0        
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       525379   
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       8        
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0        
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1338     
194 Temperature_Celsius     0x0022   103   082   000    Old_age   Always       -       45       
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       4268     
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0        
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0        
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0        
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       952      
200 Multi_Zone_Error_Rate   0x000a   100   100   051    Old_age   Always       -       0        
201 Soft_Read_Error_Rate    0x000a   100   100   051    Old_age   Always       -       0        

SMART Error Log Version: 1
ATA Error Count: 952 (device log contains only the most recent five errors)
        CR = Command Register [HEX]                                        
        FR = Features Register [HEX]                                       
        SC = Sector Count Register [HEX]                                   
        SN = Sector Number Register [HEX]                                  
        CL = Cylinder Low Register [HEX]                                   
        CH = Cylinder High Register [HEX]                                  
        DH = Device/Head Register [HEX]                                    
        DC = Device Command Register [HEX]                                 
        ER = Error register [HEX]                                          
        ST = Status register [HEX]                                         
Powered_Up_Time is measured from power on, and printed as                  
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,                      
SS=sec, and sss=millisec. It "wraps" after 49.710 days.                    

Error 952 occurred at disk power-on lifetime: 4376 hours (182 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH                              
  -- -- -- -- -- -- --                              
  84 51 08 b8 a6 08 e1  Error: ICRC, ABRT at LBA = 0x0108a6b8 = 17344184

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 b8 a6 08 e1 00      00:38:17.625  WRITE DMA           
  ec 00 00 00 00 00 a0 00      00:38:17.563  IDENTIFY DEVICE     
  ef 03 46 00 00 00 a0 00      00:38:17.563  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      00:38:17.563  IDENTIFY DEVICE                 
  00 00 01 01 00 00 a0 00      00:38:17.438  NOP [Abort queued commands]     

Error 951 occurred at disk power-on lifetime: 4376 hours (182 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH                              
  -- -- -- -- -- -- --                              
  84 51 08 b8 a6 08 e1  Error: ICRC, ABRT at LBA = 0x0108a6b8 = 17344184

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 b8 a6 08 e1 00      00:38:17.375  WRITE DMA           
  ec 00 00 00 00 00 a0 00      00:38:17.375  IDENTIFY DEVICE     
  ef 03 46 00 00 00 a0 00      00:38:17.375  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      00:38:17.375  IDENTIFY DEVICE                 
  00 00 01 01 00 00 a0 00      00:38:17.188  NOP [Abort queued commands]     

Error 950 occurred at disk power-on lifetime: 4376 hours (182 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH                              
  -- -- -- -- -- -- --                              
  84 51 08 b8 a6 08 e1  Error: ICRC, ABRT at LBA = 0x0108a6b8 = 17344184

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 b8 a6 08 e1 00      00:38:17.188  WRITE DMA
  ec 00 00 00 00 00 a0 00      00:38:17.125  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:38:17.125  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      00:38:17.125  IDENTIFY DEVICE
  00 00 01 01 00 00 a0 00      00:38:16.938  NOP [Abort queued commands]

Error 949 occurred at disk power-on lifetime: 4376 hours (182 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 b8 a6 08 e1  Error: ICRC, ABRT at LBA = 0x0108a6b8 = 17344184

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 b8 a6 08 e1 00      00:38:16.875  WRITE DMA
  c8 00 08 10 e2 29 e7 00      00:38:16.875  READ DMA
  c8 00 08 c8 e1 29 e7 00      00:38:16.875  READ DMA
  c8 00 08 68 e0 29 e7 00      00:38:16.875  READ DMA
  c8 00 08 08 e0 29 e7 00      00:38:16.813  READ DMA

Error 948 occurred at disk power-on lifetime: 4375 hours (182 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 40 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000040 = 64

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 01 40 00 00 e0 00      00:07:40.125  WRITE DMA
  ec 00 00 00 00 00 a0 00      00:07:40.063  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:07:40.063  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      00:07:40.063  IDENTIFY DEVICE
  00 00 01 01 00 00 a0 00      00:07:39.875  NOP [Abort queued commands]

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging
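
For anyone reading the error entries above: the register bytes can be decoded by hand. This Python sketch assumes 28-bit LBA addressing (low nibble of DH, then CH, CL, SN) and the standard ATA error register bits for ICRC (0x80) and ABRT (0x04). It reproduces the LBA values smartctl printed; the helper name is my own.

```python
ERROR_BITS = {0x80: "ICRC", 0x04: "ABRT"}  # only the two bits seen in this log


def decode_error_registers(line):
    """Decode 'ER ST SC SN CL CH DH' hex bytes from a smartctl error entry."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    # 28-bit LBA: bits 27-24 from DH's low nibble, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    return {"lba": lba, "flags": flags, "sector_count": sc}


newest = decode_error_registers("84 51 08 b8 a6 08 e1")
oldest = decode_error_registers("84 51 01 40 00 00 e0")
print(newest)
print(oldest)
```

All five logged errors are interface CRC errors on WRITE DMA commands, and the error count of 952 equals the UDMA_CRC_Error_Count raw value.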


On 2010-09-18 00:06, 6tr6tr wrote:

> ==> WARNING: May need -F samsung2 or -F samsung3 enabled; see manual for details.

Better read that.

> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

> 4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2595

> 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 525379

Is this true? It would mean the disk has been running for 60 years and was only powered on 2600 times.

Either you didn’t copy paste the correct data, or something is wrong/buggy.
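
To put numbers on that, here is a quick Python check. The first line is the reading above, treating the raw value as hours. The second is only a guess: smartctl has a `-v 9,halfminutes` option for drives that count power-on time in 30-second units, and the raw value would then be close to the 4376 hours shown in the error log of the same report.

```python
RAW_POWER_ON = 525379   # raw value of attribute 9 in the report
ERROR_LOG_HOURS = 4376  # "disk power-on lifetime" of the newest logged error

# Reading 1: the raw value is in hours.
years_if_hours = RAW_POWER_ON / (24 * 365)

# Reading 2 (assumption): the raw value is in half-minutes, 120 per hour.
hours_if_half_minutes = RAW_POWER_ON / 120

print(f"as hours:        {years_if_hours:.1f} years")
print(f"as half-minutes: {hours_if_half_minutes:.0f} hours "
      f"(error log says {ERROR_LOG_HOURS})")
```

The second reading is about half a year of running time, which fits 2595 start/stop cycles far better than 60 years does.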

> No self-tests have been logged. [To run self-tests, use: smartctl -t]

You should do so. At least the short test.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

The copy paste is correct.

I already replaced the HD, but I’ll put the old one in another computer soon and run those tests on it.

Hi
How old is this drive???

Interesting comment here:
http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=106

Manufacturer’s diagnostic tool
http://www.samsung.com/global/business/hdd/support/utilities/Support_HUTIL.html

Are you sure you have good cooling in your system? 45C seems a tad high for an
80GB drive. I run 3x500GB drives and they are at 30-33C; my 36GB Raptors
ran at around 25C…

Hardware_ECC_Recovered
http://newsgroups.derkeiler.com/Archive/Uk/uk.comp.os.linux/2007-10/msg00188.html


Cheers Malcolm °¿° (Linux Counter #276890)