...and I can't find my way home....

Running 12.2 and as always I keep a separate disc for /home. In this case it’s a 1Tb drive parked as /dev/sdb.
And of course, like a moron I haven’t backed up since May last year…:shame:

Suddenly this week the system locked up and after several hours it was obvious nothing more was going to happen.
I had no keyboard control, mouse clicks did nothing. The only thing I had left was to give it a “Raytheon restart” as we used to call it in the Navy. (powerdown, powerup)

I got the expected issues: the system started in emergency mode. /dev/sda, with the system files on it, appears to be fine.
But /dev/sdb with the /home partition would not mount.

I went ahead and ran fsck manually on /dev/sdb and got a lot of errors but not the yes/no types of responses I’ve seen from fsck in the past.
This was lengthy and in the end mentioned several “bad blocks”.

So I popped in another old 10Gb IDE drive and installed a minimal 13.1 system on it and put my new 2Tb drive in for the /home partition and got a working system.
Then I put the unmountable old 1T drive into a SATA external and tried to access the data on it that way so that I could get critical files at least.
But it won’t mount at all.
This is the message from Dolphin when trying to mount the errant drive externally:

An error occurred while accessing '931.5 GiB Removable Media', the system responded:
The requested operation has failed:
Error mounting /dev/sdd1 at /run/media/JeepNut/8dfd05e1-5941-4269-a444-3aec97032d2f:
Command-line `mount -t "ext4" -o "uhelper=udisks2,nodev,nosuid" "/dev/sdd1" "/run/media/JeepNut/8dfd05e1-5941-4269-a444-3aec97032d2f"' exited with non-zero exit status 32:
mount: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so.

I’ve always used removable media racks so it’s easy enough to yank the 13.1 system disk, pop in the 12.2 system drive and swap the new /home partition drive for the old and re-run fsck so I can get that information available to post as well. I’m sure that will be helpful in determining just how screwed I am.

So two questions at this point:

  1. What log name should I look for to grab that detail?
  2. Any recommendations based on the above info?

Thanks!
JeepNut

I have a 1T drive. I first installed it in an external disk enclosure, and mounted it. I used “gdisk” to create a basic partition table, but fortunately, I did not actually partition it.

Here’s what I saw happening:

Mounting via an external disk enclosure, I was informed that the physical blocksize is 4K and the logical blocksize is also 4K.

When I later installed the same drive as an internal disk, it showed as physical blocksize=4K, logical blocksize=512.

That mismatch of logical blocksize is a problem. The partitioning is done in terms of logical blocksize.
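
To make the mismatch concrete, here is a rough sketch of the arithmetic. The start sector of 2048 is only an example value, and the device name in the comments is a placeholder, not something taken from my own disk:

```shell
# A partition table stores the start of each partition as a sector number.
# Where that lands on the platter depends on the logical sector size in use.
start=2048                     # example start sector from a partition table

as_512=$((start * 512))        # byte offset when the interface reports 512-byte sectors
as_4096=$((start * 4096))      # byte offset when the enclosure reports 4096-byte sectors

echo "512-byte logical sectors:  partition begins at byte $as_512"
echo "4096-byte logical sectors: partition begins at byte $as_4096"

# To see what a given connection reports (as root; /dev/sdX is a placeholder):
#   blockdev --getss /dev/sdX      # logical sector size
#   blockdev --getpbsz /dev/sdX    # physical sector size
```

Same table, same number, but the system looks for the filesystem in two different places, so it only finds it through the kind of connection the table was written with.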

If you are seeing the same problem, then your best hope is to install it as an internal drive, even if that means using duct tape to temporarily hold it in place.

On 2014-01-19 19:26, SomeSuSEUser wrote:

> I went ahead and ran fsck manually on /dev/sdb and got a lot of error
> but not the yes/no types of responses I’ve seen from fsck in the past.
> This was lengthy and in the end mentioned several “bad blocks”.

Sure it was “/dev/sdb”?


> --------------------
>     failed:  Error mounting /dev/sdd1 at /run/media/JeepNut/8dfd05e1-5941-4269-a444-3aec97032d2f:
>     Command-line `mount -t "ext4" -o "uhelper=udisks2,nodev,nosuid" "/dev/sdd1" "/run/media/JeepNut/8dfd05e1-5941-4269-a444-3aec97032d2f"'
>     exited with non-zero exit status 32: mount: wrong fs type, bad option, bad superblock on /dev/sdd1, missing codepage or helper program,

> --------------------

Notice the difference. It is trying to mount “/dev/sdd1”, which means it
is the first partition of the disk. But the fsck you ran above was done
on the raw device, no partitions.

How come?


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

Apologies for the confusion… I’m on the ragged edge of having a clue about what I’m doing… :?

Thanks robin_listas for calling me out on these discrepancies:

> Sure it was “/dev/sdb”?

well… no actually it was /dev/sdb1…

> Notice the difference. It is trying to mount “/dev/sdd1”, which means it is the first partition of the disk. But the fsck you ran above was done
> on the raw device, no partitions.
> How come?

I do notice now… the difference is that in this case I had attached the drive via USB after having created a 13.1 system to work with.
So it was then id’d as /dev/sdd1 in that system when I tried to access it.
The fsck I mentioned was done on the original system setup so the drive was installed as /dev/sdb.

So now that I’ve had the entire day to try and gather some hopefully more accurate info this time :wink: …and certainly more detailed…

I’ve reset the system drives to the original configuration. 12.2 system drive and the errant 1Tb /home drive in the racks.
Booted up again and back into emergency mode as before. Here are the results and lordie it was painful.
Not knowing how else to get it, I’ve manually typed every line here since I can’t seem to get it out of a log file somehow…


Popeye:~ #  fdisk -l
Disk /dev/sda: 10.3 GB, 10254827520 bytes
255 heads, 63 sectors/track, 1246 cylinders, total 20028960 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000775cc

Device        Boot   Start            End           Blocks     Id    System
/dev/sda1       *      2048           17911807      8954880    83    Linux
/dev/sda2              17911808       20027391      1057792    82    Linux swap / Solaris

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e6935

Device        Boot   Start       End           Blocks       Id    System
/dev/sdb1             2048       1953523711    976760832    83    Linux

So then I ran fsck manually. robin_listas got me thinking so I did it twice…
.


Popeye:~ #  fsck /dev/sdb
fsck from util-linux 2.21.2
e2fsck 1.42.4 (12-Jun-2012)
ext2fs_open2:  Bad magic number in super-block
fsck.ext2:  Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a correct ext2 filesystem.
If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Unfortunately I’m not sure what most of that means. Then I re-ran fsck on /dev/sdb1 like I’d actually done before.
I was going to try and somehow copy out the loads of info it throws, but my web search tells me that there is no log created for fsck errors and that the output is sent only to the screen. I’m sure that makes sense to someone with a lot of knowledge of the system but to me it sounds totally stupid.
Am I missing the obvious somehow? But anyway… I give the hundreds of lines of text from my screen as best I can.

There are lots of groups of text that seem to repeat.
I can only scroll up so far on screen but they appear to be “sets” of data that are identical other than that first set of numbers (9xxx.xxxxxx)
That number increments on each line and I suppose is just a sequential “event” number or whatever. Here is one set of the repeating data…


[9404.503775]  ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[9404.505391]  ata6.00: BMDMA stat 0x24
[9404.506992]  ata6.00: failed command: READ DMA
[9404.508578]  ata6.00: cmd c8/00:08:68:09:00/00:00:00:00:00/e0 tag 0 dma 131072 in
[9404.508579]           res 51/40:00:68:09:00/00:00:00:00:00/00 Emask 0x9 (media error)
[9404.511810]  ata6.00: status: { DRDY ERR }
[9404.513417]  ata6.00: error: { UNC }

That identical set repeats many, many times scrolling down the screen over and over and then these lines…


[9404.648463]  end_request:  I/O error, dev sdb, sector 2408
[9404.640074]  Buffer I/O error on device sdb1,   logical block 45
[9404.651698]  Buffer I/O error on device sdb1,   logical block 46
[9404.653281]  Buffer I/O error on device sdb1,   logical block 47
[9404.654857]  Buffer I/O error on device sdb1,   logical block 48
[9404.656408]  Buffer I/O error on device sdb1,   logical block 49
[9404.657940]  Buffer I/O error on device sdb1,   logical block 50
[9404.659641]  Buffer I/O error on device sdb1,   logical block 51
[9404.660957]  Buffer I/O error on device sdb1,   logical block 52
[9404.662434]  Buffer I/O error on device sdb1,   logical block 53

and then immediately afterward, the sets noted above begin again with a slight alteration (note the byte count marked with asterisks):


[9xxx.xxxxxx]  ata6.00: BMDMA stat 0x24
[9xxx.xxxxxx]  ata6.00: failed command: READ DMA
[9xxx.xxxxxx]  ata6.00: cmd c8/00:08:68:09:00/00:00:00:00:00/e0 tag 0 dma **4096** in
[9xxx.xxxxxx]           res 51/40:00:68:09:00/00:00:00:00:00/00 Emask 0x9 (media error)
[9xxx.xxxxxx]  ata6.00: status: { DRDY ERR }
[9xxx.xxxxxx]  ata6.00: error: { UNC }

That set repeats exactly 6 times, then:


[9xxx.xxxxxx]  end_request:  I/O error, dev sdb, sector 2408
[9xxx.xxxxxx]  Buffer I/O error on device sdb1, logical block 45
fsck.ext4:  Attempt to read block from filesystem resulted in short read while trying to open  /dev/sdb1
Could this be a zero-length partition?
Popeye:~ #

And I’m back to the cmd prompt with no opportunity to answer the question.

I seriously need to kick my own butt for not having backed up my password file, calendars, documents, check registers, etc for a year.
I’m so aggravated with my stupidity that I can’t begin to tell you.

So pray tell… is there any hope here and if so, what tools do I need to recover data from this drive?

Thanks all!

I would run testdisk and see if it can help restore your data on the drive.

On 2014-01-20 01:36, SomeSuSEUser wrote:
>
> Apologies for the confusion… I’m on the ragged edge of having a clue
> about what I’m doing… :?

Ok.

> I’ve reset the system drives to the original configuration. 12.2 system
> drive and the errant 1Tb /home drive in the racks.
> Booted up again and back into emergency mode as before. Here are the
> results and lordie it was painful.
> Not knowing how else to get it, I’ve manually typed every line here
> since I can’t seem to get it out of a log file somehow…

With a pipe to a file on a usb stick mounted manually, or a photo with a
camera. Typing all that IS a pain.
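
A minimal sketch of the pipe idea, for whoever lands here later. The function name `run_logged`, the mount point and the device names in the comments are all made up for illustration; adjust them to whatever the stick shows up as:

```shell
# Run a command, show its output on screen, and keep a copy in a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    # 2>&1 sends error messages down the same pipe as the normal output
    "$@" 2>&1 | tee "$log"
}

# Hypothetical use from the emergency shell (placeholder names throughout):
#   mkdir -p /mnt/usb && mount /dev/sdX1 /mnt/usb
#   run_logged /mnt/usb/fsck.log fsck -n /dev/sdb1   # -n: read-only, answer "no" to everything
#   dmesg > /mnt/usb/dmesg.log                       # the [timestamp] lines come from the kernel, not fsck
#   umount /mnt/usb
```

The lines that start with a bracketed number are kernel messages printed on the console, so they will not be in the fsck capture; `dmesg` is where those come from.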

>
>
> Code:
> --------------------
>
> Popeye:~ # fdisk -l
> Disk /dev/sda: 10.3 GB, 10254827520 bytes

> Device Boot Start End Blocks Id System
> /dev/sda1 * 2048 17911807 8954880 83 Linux
> /dev/sda2 17911808 20027391 1057792 82 Linux swap / Solaris
>
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes

> Device Boot Start End Blocks Id System
> /dev/sdb1 2048 1953523711 976760832 83 Linux
>
> --------------------

So it is “/dev/sdb1”.

>
>
> So then I ran fsck manually. robin_listas got me thinking so I did it
> twice…
> .
>
> Code:
> --------------------
>
> Popeye:~ # fsck /dev/sdb

That one will fail. If it writes something there it destroys data…

> --------------------
>
>
> Unfortunately I’m not sure what most of that means. Then I re-ran fsck
> on /dev/sdb1 like I’d actually done before.
> I was going to try and somehow copy out the loads of info it throws, but
> my web search tells me that there is no log created for fsck errors and
> that the output is sent only to the screen.

There is no log because there is no safe place to store it at that time…

> That number increments on each line and I suppose is just a sequential
> “event” number or whatever. Here is one set of the repeating data…
>
>
> Code:
> --------------------
>
> [9404.503775] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [9404.505391] ata6.00: BMDMA stat 0x24
> [9404.506992] ata6.00: failed command: READ DMA
> [9404.508578] ata6.00: cmd c8/00:08:68:09:00/00:00:00:00:00/e0 tag 0 dma 131072 in
> [9404.508579] res 51/40:00:68:09:00/00:00:00:00:00/00 Emask 0x9 (media error)
> [9404.511810] ata6.00: status: { DRDY ERR }
> [9404.513417] ata6.00: error: { UNC }
> --------------------

Wow. That’s bad, but not related to fsck.
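
For the curious, a rough decode of the hex fields in that "cmd" line. The field order used here (command/feature:count:LBA low:LBA mid:LBA high, then the extended bytes, then the device register) is my understanding of how the kernel's libata driver prints a command, so treat it as an assumption:

```shell
# cmd c8/00:08:68:09:00/00:00:00:00:00/e0
#     c8 = READ DMA, 08 = sector count, 68 09 00 = LBA bytes (low, mid, high),
#     e0 = device register (its low 4 bits hold LBA bits 24-27)
count=$((0x08))
lba=$(( ((0xe0 & 0x0f) << 24) | (0x00 << 16) | (0x09 << 8) | 0x68 ))

echo "READ DMA of $count sectors ($((count * 512)) bytes) starting at sector $lba"
```

That comes out at sector 2408, the same sector the "end_request: I/O error" line complains about, and 8 sectors of 512 bytes is the 4096 seen in the later retries. In the "res" line, the 40 after the slash is the error register, which is what gets printed as UNC (uncorrectable data).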

>
>
> That identical set repeats many, many times scrolling down the screen
> over and over and then these lines…
>
>
> Code:
> --------------------
>
> [9404.648463] end_request: I/O error, dev sdb, sector 2408
> [9404.640074] Buffer I/O error on device sdb1, logical block 45
> [9404.651698] Buffer I/O error on device sdb1, logical block 46
> [9404.653281] Buffer I/O error on device sdb1, logical block 47

> --------------------

Bad as well.
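
The sector and block numbers in those lines tie together with simple arithmetic. A small sketch, using the values from the fdisk output and assuming the kernel was reading sdb1 in 4096-byte buffers (that block size is my assumption, it is not printed anywhere in the log):

```shell
sector=2408        # failing sector on the whole disk, from the end_request line
part_start=2048    # first sector of /dev/sdb1, from fdisk -l
sector_size=512    # logical sector size, from fdisk -l
buf_size=4096      # ASSUMPTION: buffer size used for reads on sdb1

offset=$(( (sector - part_start) * sector_size ))
block=$(( offset / buf_size ))

echo "byte offset inside sdb1: $offset"
echo "logical block on sdb1:   $block"
```

So the unreadable spot sits only about 180 KiB into the partition, right where the first filesystem metadata lives, which is why nothing can even open it.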

>
> That set repeats exactly 6 times, then:

Six failed read attempts. Figures. I think.

>
>
> Code:
> --------------------
>
> [9xxx.xxxxxx] end_request: I/O error, dev sdb, sector 2408
> [9xxx.xxxxxx] Buffer I/O error on device sdb1, logical block 45
> fsck.ext4: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
> Could this be a zero-length partition?
> Popeye:~ #
>
> --------------------
>
>
> And I’m back to the cmd prompt with no opportunity to answer the
> question.

There is no possible answer.

> I seriously need to kick my own butt for not having backed up my
> password file, calendars, documents, check registers, etc for a year.
> I’m so aggravated with my stupidity that I can’t begin to tell you.
>
> So pray tell… is there any hope here and if so, what tools do I need
> to recover data from this drive?

Well… that’s a hardware problem. Hopefully, it is just a bad cable or
bad connection on the disk hardware. At worst, it could be the
controller on the motherboard or the hard disk.

You said you used a disk caddy? Maybe it is damaged.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

On 2014-01-20 04:26, Argedion wrote:
>
> I would run ‘testdisk’ (http://www.cgsecurity.org/wiki/TestDisk) and see
> if it can help restore your data on the drive.

No way. With those read errors, impossible.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

LOL Argedion, DON’T YOU BELIEVE HIM.
No disrespect intended to Carlos but…

To be fair, in the end it wasn’t really TestDisk that solved this for me, but Argedion pushed me in a direction that turned out to be a really good path to go down…

I’m anal about logging my system details, always have been. Same with every vehicle I’ve ever owned. The last Jeep I let go of to my granddaughter went with a 132 page log book… but I’d had that Jeep for 19yrs so there was a lot of documented maintenance, repairs, and commentary re: quirks and little known facts.
So here is what I’ve written in my log about this “hopelessly dead” drive issue…

WOW! The fact that I’m even editing this file is flatly stupendous.
The disk on which I store the /home partition crashed and would not boot.
And as I hadn’t backed up in a year, I basically had lost EVERYTHING other than the system itself.
There’s Lesson#1.
But I did a LOT of reading, a little more trying, asked various sources, and got a fair number of people telling me I was just screwed.
Well guess what? Here I am editing the same file I “lost” last week.

         I poked around with several things including a GParted Live CD which was highly touted, and a Knoppix Live CD which was also recommended but I really NEVER figured out just what the hell I was supposed to actually DO with them.  Admittedly pressed for time, I was in search of the quickest and most reliable recovery method and didn't have the luxury of becoming a forensic computer analysis expert.  Excellent tools in those I'm sure, but I kept looking....
         At some point a poster on the openSuSE forum asked if I'd tried  TestDisk   which I had not, but in doing some homework regarding that I found it advertised to be on the Knoppix CD.  So I booted up the old system w/ Knoppix Live disc and tried to find  TestDisk  but couldn't find it or get anything to run that showed that label.

I had ALSO read about some other tools on a disc called Digital Forensic / Incident Response GNOME Boot CD. I found the dd_rescue tool on that disc and it sounded like what I really was looking for, so I decided to try and figure out if it would even be helpful and then maybe try that. After poking around with that quite a bit, I was able to figure out how to copy ALL the data bit by bit off of the failing 1Tb drive, to a 2Tb disk in a USB adapter. To heck with the errors, just dupe the thing so I can play with it and see what MIGHT be possible…
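
I did not write down the exact dd_rescue command, so I won't pretend to quote it. The idea of "copy everything and keep going past the errors" can be sketched with plain dd instead, which is a different and cruder tool. The example below runs on an ordinary temp file so it is safe to try; the device names in the comments are placeholders:

```shell
# Stand-in for the failing disk: a small ordinary file.
src=$(mktemp)
img=$(mktemp)
printf 'pretend this is /home' > "$src"

# conv=noerror : carry on after a read error instead of stopping
# conv=sync    : pad every short or failed read with zeros up to the block size,
#                so everything after a bad spot stays at the right offset
dd if="$src" of="$img" bs=4096 conv=noerror,sync 2>/dev/null

echo "source bytes: $(wc -c < "$src")"
echo "image bytes:  $(wc -c < "$img")"

# On real hardware this would look something like the line below (as root).
# Triple-check if= and of= first, because swapping them wipes the source:
#   dd if=/dev/sdb of=/dev/sdX bs=4096 conv=noerror,sync
```

The image comes out padded to a whole 4096-byte block, which shows what the sync option does. Tools made for rescue work (dd_rescue, GNU ddrescue) retry and keep track of bad areas, so they are the better choice for an actual dying drive.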

And it WORKED! COOL! But then once that copy was made, (which took many HOURS by the way) I wasn’t sure what to do with it next.

         I took it to another system I had built w/ 13.1 on it and it wouldn't mount normally via a USB connection on that system either.  At that point it had been 3 or 4 days of trying to figure out how to save the data but since I was NOW working with a different drive (the copy) I figured WTH.  Worse comes to worse I'll maybe toast a 2Tb drive but as a last resort, I figured I would take the failing drive in to someone to have the data recovered if possible.

         So I stuck the 2Tb copy drive in a USB adapter and used a 13.1 media system I'm still in the process of trying to build and tried to access it via YaST / Partition tools.  I'm not totally unfamiliar with the disc partitioner, but I surely don't know a lot.  But once in the partitioner I thought I'd reset the partition table without formatting anything and hope for the best.  Tried some defaults and a new mount point but it still wouldn't mount.

So back to the partitioner for a second go-round and I noticed this gibberish about journaling. Haven’t a CLUE what all that means really, but all I did was change it from ordered to writeback and the thing WORKED! I have no idea WHY it worked but it worked.
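
As far as I can tell, that journaling setting corresponds to the ext3/ext4 mount option `data=` (ordered is the default, writeback is the looser mode). I did it through YaST, so the following is only a guess at a command-line equivalent, with a placeholder device and mount point, and with read-only added as a precaution. It prints the command rather than running it:

```shell
copy=/dev/sdX1        # hypothetical: the partition on the 2Tb copy
target=/mnt/rescue    # hypothetical mount point

# Build the command and print it; nothing is mounted by this snippet.
cmd="mount -t ext4 -o ro,data=writeback $copy $target"
echo "$cmd"
```

Once the device name is confirmed, the printed line can be run as root by hand.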

         I've since pulled all the data off the 2Tb drive now onto a 3Tb backup drive and am playing with files there to see what I can salvage and move onto the new system I'll build for my daily driver.  (the system I'm using now to write this is another box I'm building up as a media center and then I'm dumping cable TV entirely).  I'll not mess with this 2TB backup or the 1Tb original worn out drive until I'm sure I've got back all I can get back from the 3Tb archival drive.

         But I'm just as giggly as a schoolgirl that I was able to pull this off.  I'm still reassigning permissions and ownerships on things to get the right accesses but that is NOTHING compared to what I’ve been through and to have back my check registers, family photos, original technical compositions, thousands of documents and details regarding 30 years wrenching Jeeps, not to mention 100Gb of music files I've compiled over 20 years...and on and on... nearly a full 1Tb of data!!  Well, it's just priceless.  I'm quite sure that more than 90% of it is fine.

         Just goes to show you, perseverance and a little dumb luck DO work from time to time even when those who “know something” tell you otherwise.  And this isn't the first time I've been down the road and learned that lesson.  I learned a LONG time ago to trust old grey-haired shadetree mechanics over "professionally trained" mechanics with no grease under their nails.  Books are great supplements to experience but they'll never replace it.

Which is why I was so determined this time to make it work. And work it has.

So that’s my story and I’m sticking to it.
Hoping that this is helpful to someone down the road.