boot fail

openSUSE 13.2
KDE 4
After a power failure the system fails to boot. Please see the screenshot.
https://www.mediafire.com/view/kb0dskv5o79m5qu/DSCN9630.JPG

It did boot, but it is in emergency mode. My guess is that one or more partitions are not mounting due to the power failure.

First, run fdisk -l (note that is a lowercase L, not a one).

This lists your partitions so you have a reference.

Then run fsck /dev/sdX# for each partition,
where X is the drive letter and # is the partition number.

Note: if this is a dual-boot machine, some partitions may be FAT or NTFS; just ignore those. If fsck finds a problem, it will give instructions. If in doubt, you can post screenshots here for further advice.
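If fdisk is not available, the same list can be built from blkid output. Here is a minimal sketch, assuming blkid prints its usual `/dev/sdXN: ... TYPE="..."` lines; the helper name list_fsck_commands is made up for illustration. It only prints the fsck commands for ext2/3/4 partitions and skips FAT, NTFS and swap, so you can review the list before running anything.

```shell
# Print an fsck command for every ext2/3/4 partition found in blkid-style
# input on stdin. FAT, NTFS and swap lines are skipped.
# Nothing is executed here; the output is only a list to review.
list_fsck_commands() {
    # blkid lines look like: /dev/sda1: UUID="..." TYPE="ext4"
    sed -n 's|^\(/dev/[^:]*\):.*TYPE="ext[234]".*|fsck \1|p'
}

# Typical use from a root shell, with the partitions unmounted:
#   blkid | list_fsck_commands
```

Run the printed commands one at a time so you can read what fsck reports for each partition.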

That’s emergency mode. Probably the system fails to mount the root partition?

Try to do as the message suggests, i.e. run “journalctl” to view the system logs.
And have a look inside /run/initramfs/rdsosreport.txt (e.g. with “cat /run/initramfs/rdsosreport.txt”), and maybe post it.

What happens when you type “exit”?

Maybe it would suffice to run a file system check of the / or /home partition.

fsck /dev/sda1 -r

Replace /dev/sda1 with the device your root/home partition is on. “blkid” should show all partitions.

Thank you very much.

fdisk -l

gives the error “command not found”,
but I have only one partition on the drive, so

fsck /dev/sda1

helped

Apparently / failed to mount because of the unclean shutdown/some errors on the filesystem (that’s why the system dropped to emergency mode in the first place), and fdisk is not included in the initrd (initial ramdisk that is used for booting).
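Since the emergency shell only has what was packed into the initrd, it can save some guessing to check which tools are there before relying on them. A small sketch (the function name available_tools is just for illustration; `command -v` is a standard shell builtin):

```shell
# Report, for each name given, whether the current shell can find it.
available_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: available"
        else
            echo "$tool: missing"
        fi
    done
}

# Example in the emergency shell:
#   available_tools fdisk blkid fsck journalctl
```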

Fortunately fsck was able to fix the errors though… :wink:

Only one partition?

I have the system partition on /dev/sda1.

The other partitions are home and swap (/dev/sda2 and /dev/sda3).

Another problem is that this error repeats every time I reboot the system or do a full power off and on (without using hibernate or standby).

fsck finds many errors (100+)

Have you run smartctl to check the health of the drive? It sounds like the drive may be dying. Also, are you shutting down properly, i.e. not just turning off the machine but going through the shutdown procedure?

Thank you very much

Yes, I power off properly using the GUI shutdown.

The smartctl offline test output is shown below, but I can’t tell whether the test passed or not.

smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.6-2-desktop] (SUSE RPM)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD10EZRX-00L4HB0
Serial Number:    WD-WCC4J3780450
LU WWN Device Id: 5 0014ee 25f6ae130
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Thu Dec  4 13:26:54 2014 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (13980) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 159) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.


SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   126   124   021    Pre-fail  Always       -       4700
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       583
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       530
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       576
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       12821
194 Temperature_Celsius     0x0022   116   101   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1
No Errors Logged


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%       527         -
# 2  Extended offline    Aborted by host               90%       524         -
# 3  Extended offline    Completed without error       00%         2         -
# 4  Conveyance offline  Completed without error       00%         0         -


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

If you are shutting down properly, you should not be seeing corruption. The smartctl output looks OK; that is only the short test, but it should still indicate any problems.

Which file system are you using: ext4 or Btrfs?

On 2014-12-04 11:36, rafisv wrote:

> smartctl offline test is shown below. But I can’t understand if test is
> passed or not.

Yes and no.

> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED

Health is OK, so far.

> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:

These are good.
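For a quick check of the attribute table without reading every row, here is a rough sketch. It assumes the usual `smartctl -A` column layout shown above, with the raw value in the last column, and it looks only at attributes 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors), which are the ones people commonly watch for a failing disk. The function name is made up.

```shell
# Read "smartctl -A" output on stdin and print any of the attributes
# 5, 197 or 198 whose raw value (last column) is non-zero.
# Prints nothing when all three raw values are zero.
nonzero_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 {
        if ($NF + 0 != 0) print $2 "=" $NF
    }'
}

# Usage as root:
#   smartctl -A /dev/sda | nonzero_sector_counts
```

With the table posted above this prints nothing, since all three raw values are 0.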

> SMART Error Log Version: 1
> No Errors Logged

Ok


>   SMART Self-test log structure revision number 1
>   Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>   # 1  Extended offline    Interrupted (host reset)      90%       527         -
>   # 2  Extended offline    Aborted by host               90%       524         -
>   # 3  Extended offline    Completed without error       00%         2         -
>   # 4  Conveyance offline  Completed without error       00%         0         -

But this is not: the test did not complete, one was aborted, the other
interrupted, and very early. You have to trigger the long test, and
allow it to complete, which will take 159 minutes or more. You can use
the machine while it runs, but the system will become sluggish.
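Roughly, the long test is started with `smartctl -t long` and the result is read back later with `smartctl -l selftest`. The helper below is only a sketch (its name is invented) that looks at entry "# 1" of that log, which is the newest one, and says whether it completed.

```shell
# Report the status of the most recent self-test (entry "# 1") from
# "smartctl -l selftest" output read on stdin.
latest_selftest_status() {
    awk '!found && /^# *1 / {
        if ($0 ~ /Completed without error/) print "completed"
        else print "not completed"
        found = 1
    }
    END { if (!found) print "no self-test logged" }'
}

# Usage as root (assuming the disk is /dev/sda):
#   smartctl -t long /dev/sda
#   ... wait at least the recommended polling time, 159 minutes here ...
#   smartctl -l selftest /dev/sda | latest_selftest_status
```

Do not reboot or power off while the test runs, otherwise it will show up as aborted or interrupted again.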


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

2robin_listas
Thank you for such a comprehensive description. I strongly believed that I had completed the long test on the second attempt, but now I see that this is not the case. I will run it once more and report shortly.

2gogalthorp
I use ext4.

Looks like the drive is fine.

smartctl --all /dev/sda
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.16.6-2-desktop] (SUSE RPM)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD10EZRX-00L4HB0
Serial Number:    WD-WCC4J3780450
LU WWN Device Id: 5 0014ee 25f6ae130
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sun Dec  7 23:58:58 2014 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (13980) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 159) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.


SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   127   124   021    Pre-fail  Always       -       4641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       598
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       554
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       591
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       13411
194 Temperature_Celsius     0x0022   107   101   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1
No Errors Logged


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       553         -
# 2  Extended offline    Interrupted (host reset)      90%       527         -
# 3  Extended offline    Aborted by host               90%       524         -
# 4  Extended offline    Completed without error       00%         2         -
# 5  Conveyance offline  Completed without error       00%         0         -


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



On 2014-12-07 22:06, rafisv wrote:
>
> Looks like the drive is fine.

Yes, it does. Now you know :slight_smile:


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

2robin_listas thank you very much

This happened again recently (for the fourth time). The screen output is below. I mostly use hibernate and reboot only after updates. Could this be the reason?
https://www.mediafire.com/folder/939964df7q9oy/Opensuse

Yes indeed. How much memory do you have and how much swap?
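Hibernation writes the memory image to swap, which is why the two sizes matter. A quick way to read both numbers is from /proc/meminfo. This is just a sketch: the function name is made up, and it only prints the sizes without judging whether the swap is large enough.

```shell
# Print total RAM and total swap in MiB, read from a meminfo-format file
# (defaults to /proc/meminfo, where the values are given in kB).
swap_vs_ram() {
    awk '/^MemTotal:/  { mem = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { printf "RAM %d MiB, swap %d MiB\n", mem / 1024, swap / 1024 }' \
        "${1:-/proc/meminfo}"
}

# Usage:
#   swap_vs_ram
```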

On 2014-12-20 19:26, rafisv wrote:
>
> 2robin_listas thank you very much
>
> This repeated again recently (fourth time). The screen output is below. I
> mostly use hibernate and reboot only after updates. May this be a
> reason?
> https://www.mediafire.com/folder/939964df7q9oy/Opensuse

No, that looks like the consequence of a failed resume from hibernation.

When the machine is hibernated, and for whatever reason the restore from
hibernation procedure fails, the consequences are similar to an abrupt
power failure while the computer is running: the filesystem gets very
corrupted. The system had many files opened, half written, half in ram.
Mostly temporary files used by desktop applications, I think. All those
stay “opened” while the machine is frozen.

If, for whatever reason, the awakening fails (most often because of bugs), then the
filesystem is so corrupted that it needs an extensive fsck. And if the
fsck fails, you get dumped into emergency mode…

(That file mentioned in your first photo might have been
crucial to read and find out the reason. That and the
output of “journalctl”)

That is how it is; there is nothing you can do about it. Try to find out why the restore failed…

For instance: are you using the proprietary NVIDIA driver? If so, which
version? I had to go back to 331.79 because of problems with hibernation
on newer versions.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

There was a bug in 13.2 that caused an fsck to run immediately on resume from hibernation, before the saved state had been restored.
This fsck could then actually corrupt your filesystem.
https://bugzilla.novell.com/show_bug.cgi?id=906592
My reiserfs got corrupted twice because of this (my first and only filesystem corruptions on this system in 11 years… :wink: ), and only “reiserfsck --rebuild-tree” was able to fix it.

There has been an update to dracut recently that should fix that, but if you still have those problems try to run “sudo mkinitrd” and reboot to make sure that the fix is really in your initrd.

On 2014-12-21 16:16, wolfi323 wrote:

> There was a bug in 13.2 which caused a fsck immediately after hibernate,
> before the saved state has been restored.

Oh, yes, I heard of it. initrd did it. Nasty.

> This fsck could then actually corrupt your filesystem.

A royal corruption. Happened to me once (because I booted to another
partition while “hibernated”).

> https://bugzilla.novell.com/show_bug.cgi?id=906592
> My reiserfs got corrupted twice because of this, and only “reiserfsck
> --rebuild-tree” was able to fix it.

No surprise… :-/

> There has been an update to dracut recently that should fix that, but if
> you still have those problems try to run “sudo mkinitrd” and reboot to
> make sure that the fix is really in your initrd.

:frowning:


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

I will try to copy that rdsosreport.txt to a USB stick next time.
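For the copy itself, something along these lines should work from the emergency shell. It is only a sketch: the function name save_boot_report and the USB device name /dev/sdb1 are assumptions, so check with blkid which device the stick really is.

```shell
# Copy the given files into a destination directory (for example a mounted
# USB stick). Missing files are skipped with a message instead of aborting.
save_boot_report() {
    dest=$1
    shift
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "$dest/"
        else
            echo "skipping missing file: $f" >&2
        fi
    done
    sync   # flush the data to the stick before unmounting it
}

# Usage from the emergency shell (/dev/sdb1 is an assumed device name):
#   mount /dev/sdb1 /mnt
#   journalctl -b --no-pager > /mnt/journal.txt
#   save_boot_report /mnt /run/initramfs/rdsosreport.txt
#   umount /mnt
```

Unmounting before pulling the stick matters, otherwise the copied report may end up truncated.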