Strange random error - could not see SATA drive

I experienced a strange random error that appears to have been my PC losing contact with its 1.5 TB SATA drive.

This has happened only once so far (about two hours ago) on my Intel Core i7 920 PC with an Asus P6T Deluxe V2 motherboard running openSUSE-11.4 with KDE-4.4.4. A reboot failed to fix the problem, but a cold shutdown and restart did.

Some details …

I had a zillion apps open: I was connected to my mother’s PC in North America (from here in Europe) using vnc, and had thunderbird, firefox, many konsole sessions, and various leafpad editor sessions running.

Firefox appeared to hang, so I tried to close the Firefox page, but the PC appeared frozen (although the mouse would move). Then after about 60 seconds, the screen flashed, firefox closed (crashed may be more accurate) and I had my desktop back, … sort of … I was able to go to the desktop area where vnc was running and access my mother’s PC. But I was missing almost all the icons from my KDE desktop and most of the icons from the panel at the bottom of the desktop.

I tried going to <CTRL><ALT><F1> and then <CTRL><ALT><F7> but that made no difference. I tried going to <CTRL><ALT><F1> to log in but it would not let me log in as a regular user, nor as root. Nor could I ssh into the PC. After <CTRL><ALT><F7>, back at the crippled GUI, I typed ‘shutdown -r now’ (as root) but the command was not recognized. Neither was ‘top’ (as a regular user), nor ‘fdisk’ (as root). ‘dir’ was recognized as a regular user, as was ‘cd’. But I did not believe the information those commands were giving me (it did not appear up to date).

Eventually I killed X with <CTRL><ALT><Backspace> (twice) and then forced a shutdown/restart with <CTRL><ALT><DELETE>, but then the PC hung during the shutdown, with the hard drive light on solid. That never happens. After a minute of that I decided to hit the ‘hardware’ reset. The PC restarted, but was incredibly slow to go through the motherboard boot menus, and it reported the hard drive as not detected and would not boot. I did a <CTRL><ALT><DELETE> to restart, pressed DEL to go to BIOS, and then noted in BIOS that the SATA drive was not detected.

I checked the temperature and the PC was at a nominal cool temperature (38 to 40 deg C).

I then switched OFF the PC. I also turned the power switch on the back OFF for good measure. I waited a minute. Then I turned the PC ON again.

The PC booted normally as if nothing had happened.

Most bizarre and a bit of a fright. My Linux is usually rock solid.

Anyway, … no more investigation for now. I’m doing a massive backup of all data since my last backup. This will take a while as I have about 400 GB of home movies (at various stages of processing) from my home video camera to back up to an external hard drive.

I may pick up another 2TB external drive this weekend, just to ensure I have multiple backups.

Here is an extract of some of the /var/log/messages entries from today: SUSE Paste- var/log/messages

I think the problem occurred around 21:55. Before that there was an unrelated firefox crash (about an hour earlier).

The 22:07 entry was the last before 22:40, when I successfully shut down and restarted the PC.
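For anyone wanting to narrow a large /var/log/messages down to the suspect period, here is a rough sketch of how that could be done. It assumes the traditional syslog line format with an HH:MM:SS timestamp, and the patterns it searches for (ataN port names, "I/O error", "SATA link") are only my guess at what a drive dropout might leave behind; the function names are made up for illustration.

```python
import re
from datetime import time

# Assumption: each log line carries an HH:MM:SS timestamp somewhere near the start.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")

# Assumption: strings that may indicate disk trouble; adjust to your own log.
DISK_RE = re.compile(r"\bata\d+|i/o error|sata link", re.IGNORECASE)


def in_window(line, start, end):
    """Return True if the first HH:MM:SS found in the line lies within [start, end]."""
    match = TIME_RE.search(line)
    if not match:
        return False
    hour, minute, second = (int(part) for part in match.groups())
    if hour > 23 or minute > 59 or second > 59:
        return False
    return start <= time(hour, minute, second) <= end


def suspicious_lines(lines, start, end):
    """Yield lines inside the time window that also match a disk-related pattern."""
    for line in lines:
        if in_window(line, start, end) and DISK_RE.search(line):
            yield line.rstrip("\n")
```

To use it, open the log file (reading it normally needs root) and pass the file object as `lines`, for example with `time(21, 45)` and `time(22, 10)` as the window around the 21:55 event. The window does not handle a period that crosses midnight.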

Most bizarre !

Glad things are working now, but 1st order of business is backup!

I found an interesting article on SMART drive monitoring that you can read here.

S.M.A.R.T. - Wikipedia, the free encyclopedia

So, there is no doubt you did crash, but what was it that died? I see you say the computer was cool, but how did you determine that? Just the CPU? What about memory or hard disk temps? I see lots of output on the SMART system, but I am not sure what it means. I must say that the CPU fans that Intel uses on the i7 seem mighty small to me, and I have decided to use a better one from Zalman (CNPS7000C-AICu). Nothing massive, but more than twice the size of the Intel heat sink/fan setup. The larger fan setup can also cool the memory. A curious thing I noticed with the stock cooler was that during heavy usage, such as a kernel compile (of which I have done a lot), the i7 seemed to start slowing down. I also noticed the CPU heat would drop like a rock when the workload let up. I have also started using memory with built-in heat spreaders, but I am not sure of the impact. And I have switched to a newer case that places the power supply at the bottom and uses a much larger fan at the top (Antec 900). It also puts a fan directly on the hard drives. Your crash could even just be blamed on KDE, but you are not pushing the latest version or anything.

Bottom line guess: CPU too hot, hard drive too hot, OR memory error. But as you said, the PC was not hot, so I have no idea what happened.

Thank You,

Most probably a hardware failure of some sort.

As soon as you can (after the backup of course) you should open the case and check/change the SATA cables and the power connectors - especially if you’re using an IDE->SATA power adapter - and perhaps use something like the pmagic 6.0 live CD to run the HD/SMART tests. At least that’s what I’d do.
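If smartmontools is installed, the SMART checks can also be run from the installed system rather than a live CD. A minimal sketch, assuming the drive is /dev/sda (that device name is a guess, check yours) and that the script runs as root; the helper names are made up for illustration:

```python
import shutil
import subprocess


def smartctl_commands(device):
    """Build the smartctl invocations for a basic check of one drive."""
    return [
        ["smartctl", "-H", device],           # overall health self-assessment
        ["smartctl", "-A", device],           # vendor-specific attribute table
        ["smartctl", "-t", "short", device],  # start a short self-test
    ]


def run_checks(device):
    """Run each command and collect (command, exit code, output).

    Returns None if smartctl is not on the PATH.
    """
    if shutil.which("smartctl") is None:
        return None
    results = []
    for cmd in smartctl_commands(device):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode, proc.stdout))
    return results
```

The short self-test runs in the background on the drive itself; its result shows up later in the self-test log (`smartctl -l selftest` on the same device).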

Currently I have one 2TB and two 1TB Seagate HDs in my main box at home, running a (properly backed up) media server. No problems until now, but I have a feeling they won’t last long - they were dirt cheap…

I’m thinking hardware also.

When the PC appeared non-responsive, before restarting, I felt the case, and it did not appear hot. When rendering videos for many hours (I’ve had this Intel Core i7 running a batch job for > 24 hrs at a time) I can typically feel warm air coming out of the PC’s side air vents. I felt no such hot air this time. … hmmm … my fans are VERY quiet normally, so if they had stopped I am not sure I would notice. I cleaned the fans a few weeks ago as well.

When I restarted the PC to BIOS and it did not see the SATA drive, I went to the BIOS power management menu, which shows the various PC temperatures. That is where I obtained the 35-40 degree C temperature. Mind you, by then I was into a second restart, and if the PC cools fast then ???
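Since a reading taken in BIOS after two restarts may miss a heat spike, temperatures could instead be sampled while Linux is running. A rough sketch, assuming the kernel exposes sensors under /sys/class/hwmon as temp*_input files in millidegrees Celsius (which sensors appear depends on the drivers loaded); the 70 degree limit is an arbitrary placeholder, not a recommendation:

```python
from pathlib import Path


def read_temps(base="/sys/class/hwmon"):
    """Return {sensor file: degrees C} for every readable temp*_input file."""
    temps = {}
    for sensor in sorted(Path(base).glob("hwmon*/temp*_input")):
        try:
            # The files hold an integer in millidegrees Celsius.
            temps[str(sensor)] = int(sensor.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
    return temps


def too_hot(temps, limit_c=70.0):
    """Return only the sensors at or above limit_c (an assumed threshold)."""
    return {name: value for name, value in temps.items() if value >= limit_c}
```

Calling `read_temps()` every minute or so from a small loop and appending the result to a file would leave a record to compare against the time of any future freeze.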

I’m going to pay more attention to /var/log/messages over the next few days. I also have to walk into town today to pick up my CD/DVD writer (it was broken and under warranty), and while at the PC shop I may buy a second Fanec Fanbox 2TB (USB-3.0) external hard drive for an additional backup. While I have some backups (with ~5TB of external offline storage/backup), my main PC’s drive is a 1.5 TB SATA and I find I am constantly shuffling external drives to find the space to back up. Hence more offline backup storage is needed (or I could give up my home video hobby - which I am loath to do).
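To cut down on the drive shuffling, a quick check of which mounted drives actually have room could be run before starting a backup. A small sketch; the 5% safety margin is my own assumption, and the mount points passed in would be whatever your external drives are mounted as:

```python
import shutil

GB = 10**9  # decimal gigabyte, as drive vendors count it


def fits(required_bytes, free_bytes, margin=0.05):
    """True if the data fits with a safety margin left over (margin is assumed)."""
    return required_bytes * (1 + margin) <= free_bytes


def targets_with_room(required_bytes, mount_points):
    """Return (mount point, free bytes) for each mounted target with enough room."""
    candidates = []
    for mount_point in mount_points:
        try:
            free = shutil.disk_usage(mount_point).free
        except OSError:
            continue  # not mounted or not readable
        if fits(required_bytes, free):
            candidates.append((mount_point, free))
    return candidates
```

For the roughly 400 GB of video mentioned earlier, the call would be `targets_with_room(400 * GB, [...])` with the list of mount points filled in.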

Good suggestion. I will do that today when I re-insert my CD/DVD writer, which was under warranty (and which I pick up today).