Previously perfectly working install died after a reboot

openSUSE 11.1 x86_64

The system was running fine until I rebooted it. Now it drops out of the default green loading screen and shows me:

bootsplash: status on console 0 changed to off
/dev/disk/by-id/ata-WDC_WD6400AAKS-75A7B0_WD-WMASY1366430-PART1: | (plus some progress indicator)

Then it comes up with a bunch of warnings about multiply-claimed blocks, followed by things that seem to go well, and then:

Adding 208840K swap on /dev/sda3. Priority: -1 extents:1 across 208440k
blogd: no message logging because var file system is not accessible.Failed

fsck failed. Please repair manually and reboot. The root file system is currently mounted read-only. To remount it read-write do:
bash# mount -n -o remount,rw /
Attention: Only CONTROL-D will reboot the system in this maintenance mode. Shutdown or reboot will not work.

So I did as it asked and ran fsck, which reported "(There are 8 inodes containing multiply-claimed blocks.)"

Then it asked a lot of questions to which, afaik, I could only answer yes (the prompts ended in <y> instead of <y/n>). So I thought what the hell and just answered Y to everything, as the files it mentioned were of no interest anyway… OpenOffice templates and whatnot.

Now when I reboot it does the same thing again… aborting the splash screen and scanning the disk… but much slower than before, and it seems to get stuck at a certain percentage only to resume ~1 minute later.

When it finally reached 100% it spammed:
Illegal block number passed to ext2fs_test_block_bitmap #xxxxxxxxxx for multiple claimed block map.

WTF is going on and how do I fix it? My guess would be that my disk is dying on me? (I’ll just hit submit now and check this post tomorrow… as the computer in question seems to be busy doing something and I need sleep.)

Sounds like the drive is on its way out, as you assume.
You could try the repair option on the DVD. It’s normally good at fixing errors in the file system, if that’s what’s wrong.

If the DVD can’t fix it, I would replace the drive ASAP.

Geoff

Luckily, when I woke up this morning I was greeted by the login screen, which did let me log in successfully… so the system, which happens to be my server, is running again. Are there any commands I can run to check the disk’s health? I might have to consider ordering a new disk… though I’m not really looking forward to it, as the motherboard failed previously and I had to replace the CPU, motherboard and RAM. (And this was supposed to be my way of saving money on the electricity bill, by having this PC run as a print/file server instead of my desktop… not really working out with all the hardware replacements.)

I did run
smartctl -d ata -a /dev/sda

smartctl 5.39 2008-10-24 22:33 [x86_64-suse-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA family
Device Model:     WDC WD6400AAKS-75A7B0                        
Serial Number:    WD-WMASY1366430                              
Firmware Version: 01.03B01                                     
User Capacity:    640,135,028,736 bytes                        
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8                                              
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Dec 25 14:35:11 2008 CET                       
SMART support is: Available - device has SMART capability.           
SMART support is: Enabled                                            

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.                                                                         
                                        Auto Offline Data Collection: Enabled.    
Self-test execution status:      (   0) The previous self-test routine completed  
                                        without error or no self-test has ever    
                                        been run.                                 
Total time to complete Offline                                                    
data collection:                 (11580) seconds.                                 
Offline data collection                                                           
capabilities:                    (0x7b) SMART execute Offline immediate.          
                                        Auto Offline data collection on/off support.                                                                                
                                        Suspend Offline collection upon new       
                                        command.                                  
                                        Offline surface scan supported.           
                                        Self-test supported.                      
                                        Conveyance Self-test supported.           
                                        Selective Self-test supported.            
SMART capabilities:            (0x0003) Saves SMART data before entering          
                                        power-saving mode.                        
                                        Supports SMART auto save timer.           
Error logging capability:        (0x01) Error logging supported.                  
                                        General Purpose Logging supported.        
Short self-test routine                                                           
recommended polling time:        (   2) minutes.                                  
Extended self-test routine                                                        
recommended polling time:        ( 136) minutes.                                  
Conveyance self-test routine                                                      
recommended polling time:        (   5) minutes.                                  
SCT capabilities:              (0x303f) SCT Status supported.                     
                                        SCT Feature Control supported.            
                                        SCT Data Table supported.                 

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                                                    
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                                                            
  3 Spin_Up_Time            0x0027   158   152   021    Pre-fail  Always       -       5058                                                                         
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       189                                                                          
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                                                            
  7 Seek_Error_Rate         0x002e   200   200   051    Old_age   Always       -       0                                                                            
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3631                                                                         
 10 Spin_Retry_Count        0x0032   100   100   051    Old_age   Always       -       0                                                                            
 11 Calibration_Retry_Count 0x0032   100   100   051    Old_age   Always       -       0                                                                            
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       189                                                                          
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       167                                                                          
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       189                                                                          
194 Temperature_Celsius     0x0022   105   098   000    Old_age   Always       -       42                                                                           
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                                                            
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                                                            
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                                                            
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                                                            
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0                                                                            
240 Head_Flying_Hours       0x0032   096   096   000    Old_age   Always       -       3442                                                                         
241 Unknown_Attribute       0x0032   200   200   000    Old_age   Always       -       3653305820                                                                   
242 Unknown_Attribute       0x0032   200   200   000    Old_age   Always       -       4444854459                                                                   

SMART Error Log Version: 1
No Errors Logged          

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Which seems to indicate everything is just dandy?
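For anyone reading along, here is a rough sketch of how the attribute table above could be checked automatically. It only looks at the four attributes usually associated with failing media (IDs 5, 196, 197 and 198) and flags any nonzero raw value. The function name check_smart_attrs is made up for this example, and the sample rows are copied from the output above.

```shell
#!/bin/sh
# Sketch: flag nonzero raw values for the SMART attributes most often tied to
# failing media. Reads attribute rows in the table layout shown above on stdin.
# check_smart_attrs is a hypothetical helper name, not part of smartmontools.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID; field 2 is the name and
        # field 10 is RAW_VALUE in this table layout.
        $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) {
            status = ($10 + 0 > 0) ? "WARN" : "ok"
            printf "%-4s %s raw=%s\n", status, $2, $10
        }
    '
}

# Rows copied from the smartctl output above; all four raw values are 0.
check_smart_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
EOF
```

On a live system the same function could be fed with "smartctl -A /dev/sda | check_smart_attrs". Clean attributes do not prove the disk is healthy. The log above also says no self-tests have been logged, so running "smartctl -t long /dev/sda" and reading the result later with "smartctl -l selftest /dev/sda" would give a more complete picture.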

Can someone confirm for sure that this is a hardware error? I got it again today… different files though.

One of them was a CSS file I’ve been working on, which I could save / write to fine, but Eclipse was having problems committing it over svn, saying something about permission denied.
So I’m starting to suspect it might not be an entirely hardware-related problem but more of a software one…

Fixed it by running e2fsck with the root file system not mounted (how the hell does it run programs without a root anyway…)
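A minimal sketch of that kind of offline check, for anyone who lands here with the same problem. It assumes a rescue or live system and an ext2/ext3 partition. The device /dev/sdXN is a placeholder (the thread never says which partition holds the root), and is_mounted and check_fs are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: /dev/sdXN below is a placeholder, not the poster's root partition.

# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE is listed in the mounts
# table (defaults to /proc/mounts). The table may list the device under a
# different name, so treat "not mounted" with some care.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# check_fs DEVICE: run a forced e2fsck, but only when DEVICE is not mounted.
check_fs() {
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted" >&2
        return 1
    fi
    e2fsck -f "$1"
}

# Example, from a rescue system, as root:
#   check_fs /dev/sdXN
```

The point of the guard is that repairing a file system while it is mounted read-write can make the damage worse, which is also why the boot messages above drop to a read-only root before asking for a manual repair.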

Googling ‘multiply claimed block’ didn’t bring up anything useful either.