Help me figure out why my system is unstable

I’ve been struggling with trying to get this system running right for ages now. Here’s my hardware:
PSU - Corsair HX520W
Motherboard - Asus A8N-SLI Deluxe 18.05 BIOS
CPU - Athlon 64 3200+
Memory - Corsair TWINX1024-3200C2
Video - GeForce 7300
Sound - Turtle Beach Santa Cruz (cs46xx)
TV Tuner - PCHDTV-5500
Optical - NEC DVD-RW 3550A
Hard Disk - Seagate st316023as 160GB SATA
Monitor - Viewsonic GA771

Past problems:
Firmware test - openSUSE Forums
Locking up on install - openSUSE Forums
http://forums.opensuse.org/applications/404437-gdm-problem.html

I’m running SUSE 11.1. I’ve tried the 64-bit version and the 32-bit version. Right now I can’t even get it installed properly. Most recently, when I tried to install the 64-bit version, I would get to the software installation step and it would start telling me to insert the SUSE install disk 1. Of course, it was in the drive, as it had to be read to get that far in the install. I thought, “maybe there is a problem with the 64-bit drivers for the disk controller.” So I tried the 32-bit version. That worked great until the point in the install where the system reboots. It wouldn’t recognize the HD as a boot disk.

I tried running the install media check on the 64-bit version. It locked up about 45% of the way through the check. I rebooted and did it again and it worked fine. When I tried the 32-bit media, it went fine. I’ve run the memtest without problems (for days).

I thought perhaps it was a problem with my disk controller or HD. I ran Spinrite (a HD check utility) on the HD. It found no errors.

So what the heck else can I check? Any suggestions?

BTW, my HD is partitioned as follows:
Partition 1 - /boot ext3
Partition 2 - swap
Partition 3 - LVM with volumes for /, /home, /usr, /var, /opt
Partition 4 - media partition for MythTV storage ext3.

I’m installing the base system with GNOME, but also installing KDE4 base system and desktop environment, kernel sources and 32-bit compatibility on the 64-bit install.

So how is it unstable? You say: “Right now I can’t even get it installed properly.”

Are you saying you did get it installed and it was not good?
And now you are trying to install again and are having problems with the media disc?

I would check the disc again. As for the hardware, you can check here:
Hardware - openSUSE
Not a comprehensive list

On Mon, 2009-07-20 at 18:26 +0000, Yippee38 wrote:

> I’m running SUSE 11.1. I’ve tried the 64-bit version and the 32-bit
> version. Right now I can’t even get it installed properly. Most
> recently, when I tried to install the 64-bit version, I would get to the
> software installation step and it would start telling me to insert the
> SUSE install disk 1. Of course, it was in the drive as it had to be
> read to get that far in the install. I thought, “maybe there is a
> problem with the 64-bit drivers for the disk controller.” So I tried
> the 32-bit version. That worked great until I tried to boot the system
> at the point in the install where it reboots. It wouldn’t recognize the
> HD as a boot disk.

Solution to this one (maybe)… it’s likely your clock is way out of sync
with the rest of the world. Last time I had this issue, that was the
cause.

Go into your bios and manually set the time of your machine to something
roughly correct and see if that helps. AFAIK, this has only been an
issue on very recent suse’s.

I checked the system time/date and it was fine. Thanks for the suggestion though.

I am saying that I am trying to re-install it and I continue to have all kinds of weird problems. If it was the same problem every time, that would be one thing. Many seemingly unrelated problems make me think my system is not stable. Does that make sense?

As I said, I did have trouble with the media disk once so I tried the 32-bit install. That time the install appeared to work fine, but then the system would not boot into that just installed system. It gave an “unable to find a boot partition” error.

That hardware link really doesn’t help at all. There is no entry for “Hard Disks, Optical Storage, Flash” in the Hardware Troubleshooting section. My DVD-RW drive isn’t listed in the compatibility section, but that could mean nothing, or it could mean everything. Not being listed doesn’t help me though.

If you have verified that the media are OK, then the next step is to
check memory. Run memtest for 24 hours. Even though Windows may work
with the memory, Linux uses some sections that seldom get used by Windows.

As I mentioned in the original post, I’ve run the memory test several times, sometimes for days and have gotten no problems since I’ve gotten my memory timings figured out. I may run it again though.

I’ve found something interesting. I’ve run the media check 3 more times on the 32-bit install and it has passed every time. I did the same on the 64-bit install. Two of the three times, it locked up part way through (about 1/3 of the way through the first time, and about 1/2 way through the last time). When it locks up, the progress bar stops moving, the drive stops spinning, and all three lights on the keyboard start flashing in unison. The second test (of the three attempts) passed. Would that be an indication that the media is bad, or of something else? It seems strange that it does fine with the 32-bit media, but not with the 64-bit. That makes me think it is the media. However, if it is the media, why would it pass sometimes? And locking up with the keyboard lights flashing sounds more like a kernel panic to me.

I tried the 64-bit media check again a few times. The first two times it passed again. The third time, it loaded the kernel then went to the SUSE loading screen and froze. The fourth time it passed again.

Are you running the media check on the same dvd drive all the time? Sounds like a flaky dvd drive to me. Media should either pass or fail
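The "pass or fail" reasoning can be tested from any working system by reading the same disc (or ISO image) several times and comparing checksums. What follows is only a minimal sketch: the function names are made up for illustration, and the path you pass in (for example an optical device node) is an assumption about your setup. Identical digests on every pass suggest the media and the reader are consistent; differing digests or read errors point toward the drive, cabling, or memory rather than the disc itself.

```python
import hashlib

def digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file or device at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so a full DVD image never sits in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def repeated_read_check(path, passes=3):
    """Read `path` `passes` times and return the set of distinct digests seen.

    A set with exactly one element means every read returned the same bytes.
    """
    return {digest(path) for _ in range(passes)}
```

Calling `repeated_read_check("suse-11.1.iso", passes=5)` (a hypothetical file name) and checking that the returned set has a single element is the whole test. A hard lockup like the one described in this thread would of course never return at all, which is itself a strong hint toward hardware.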

Yippee38 wrote:
> I tried the 64-bit media check again a few times. The first two times
> it passed again. The third time, it loaded the kernel then went to the
> SUSE loading screen and froze. The fourth time it passed again.

You definitely have a hardware problem. That kernel should not error.
Have you checked your PSU? You mentioned changing memory timings. Is
your CPU overclocked?

caf4926 wrote:
> Are you running the media check on the same dvd drive all the time?
> Sounds like a flaky dvd drive to me. Media should either pass or fail

ah, yes and no…

i’m with Larry on this one: it is a hardware issue…
but, you are right too: the media IS good or it would never pass muster…

so, it could be the DVD drive is sometimes reading wrong, OR there is
some other hardware problem which does not ALWAYS show itself…

lots of things can cause that…probably the most often is an
intermittent connection fault…maybe a corroded point that sometimes
passes enough current, and sometimes does not…

if it were me (since the flashing LEDs means kernel panic, and that is
most often caused by BAD RAM) i’d pull the RAM and lightly spray the
contacts with an electrical connection lubricant/cleaner (BLUE STUFF
used to be the best name there, 20 to 30 years ago)… same for
connections between the motherboard and DVD/CD, hard/floppy drive(s)…

sometimes just pulling the connections apart and putting them back
together is enough…

sometimes there is an itty-bitty crack in the printed ‘wiring’ of the
motherboard that usually is okay but sometimes is not…

sometimes, through improper handling, a static electricity surge is
introduced into a chip which damages but doesn’t destroy leads (read:
http://www.build-your-own-cheap-computer.com/static-electricity.html)

and, sometimes it is just an old, tired, flaky and failing power
supply unit that is being asked to provide more power than it
should–though, in this case i’d rule that out, unless that Corsair HX
was dunked in water prior to installing :wink:

sometimes there is just not a perfect ground connection at all of the
various points where there should be…over tightening some of those
places on the motherboard just wears through the printed ‘wire’ and
leaves only the lead contacting the boards non-conductive, plastic
substructure…

i’d wiggle all of the connections that i didn’t take off and reconnect
(with POWER line removed from the wall…of course…NOT just computer
turned off)

good luck…finding the problem is usually a lot harder than fixin’.


brassy

I’d say it’s either:

a ) Bad media (always burn at the slowest speed and with, supposedly, good media)

b ) Bad, or picky, drive

c ) Bad memory. Try removing one DIMM if you have two.

Here’s my 50p worth -

  1. Go into BIOS and set everything to failsafe, or default.
  2. Remove all the extra gizmos from the computer apart from the basics, ie keep the mainboard/1xHard drive/1xMem chip (if possible), 1x CD drive. As bare as possible so that you can boot.
  3. Make sure your CD drive is NOT on the same cable as your hard drive!

Now try to install, but don’t play around, just use all the default options, ie no logical volumes and extra software.

Next I would be looking at ACPI.

If all goes well, add one piece of hardware at a time to see what happens.

Good luck :).
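The strip-down-and-add-back procedure above can be written out as a simple loop. This is purely an illustration of the logic: `check` is a hypothetical stand-in for a manual step (power down, fit the part, boot, run the media check), and the function name and `runs` parameter are invented here. Because the faults in this thread are intermittent, each configuration is checked several times before the next part is added.

```python
def find_first_bad_component(base, extras, check, runs=3):
    """Add components one at a time to a known-good base configuration.

    `check(installed)` is a hypothetical callback returning True when the
    system passes its test with the given list of parts fitted.
    Returns the first part whose addition makes any run fail, or None.
    """
    installed = list(base)
    for part in extras:
        installed.append(part)
        # Intermittent faults need several runs per configuration.
        if not all(check(installed) for _ in range(runs)):
            return part
    return None
```

For example, with `base=["mainboard", "hard drive", "DIMM 1", "DVD drive"]` and `extras=["DIMM 2", "sound card", "TV tuner"]`, the function returns whichever extra first causes a failed check. A `None` result after several runs suggests the original fault was a seating or connection problem that the rebuild itself fixed.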

VERY good advice growbag!

and it can be followed along with my thoughts that there might be a
corroded connection:

  • set BIOS to failsafe, or default
  • unplug//clean contacts and replug EVERYthing, but:
    – only one RAM stick
    – plug in ONE CD drive (NOT on the same cable as the hard drive)
    – plug in only ONE hard drive
  • Now try to install, but don’t play around (etc etc as you wrote)
  • next, look at ACPI


brassy

Thanks for all the feedback guys.

I ran some additional media checks last night. I ran the 32-bit 3 more times and it passed every time. I downloaded the 64-bit software again, checked the md5 sum, and burned it again at 4x (slowest I can do). I then ran the check and it locked up the first time. It seems very strange that it runs fine in 32-bit, but occasionally fails on the 64-bit. That definitely sounds like hardware or BIOS to me, but specifically where in those two areas, who knows.
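For anyone repeating the md5 step on the downloaded image, it can be done along these lines. This is a sketch only: the function names are invented for illustration, and the expected checksum has to come from the official download page (none is reproduced here).

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def matches_published_sum(path, expected_hex):
    """Compare the file's MD5 against a published checksum string."""
    # Published sums are sometimes upper case or carry trailing whitespace.
    return md5_of(path) == expected_hex.strip().lower()
```

A matching sum only proves the downloaded file is intact. It says nothing about the burned disc or about the drive reading it, which is why the image can verify cleanly and the media check can still lock up.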

I do have only one optical drive.

My CPU is not overclocked.

My PSU is new for this build, and has never been immersed in water. :wink:

I’m a bit worried about the motherboard. Although I’ve replaced every piece of hardware on the machine since starting this build (including replacing the IDE drive with a SATA drive), the motherboard I’m using is used. If there is a problem with that, it’s going to be a big problem since it’s a socket 939 motherboard and will be hard to find a new replacement.

I will run through some of the suggested steps after work today. I’ll let you know what I find.

The media check should not care about your hardware, so long as the hardware is OK. So if the two dvd’s were good they should both pass. Does that make sense?

So have you tried another install with the 32bit?

A trick I learned while I was a PC tech was to clean metal contacts using an eraser (as in a pencil eraser). Not as thorough as using a cleaning solution, but effective nonetheless.

My 2c on the issue. :slight_smile:

When I unplugged everything and plugged it back in, I did exactly that. I used to use that when I was a PC tech way back when.

I am making progress. Stripping the system down to the basics, without changing anything in the BIOS, has given me all successful checks. I am now at the point where I’m adding the second memory stick back in (the last piece of hardware to be added back in). We’ll see if this works or not. It may have been something not seated properly (I hope).

I did notice that the hard drive is **** hot. It is beyond warm. I’d call it hot to the touch. Nowhere near burn-you hot, but more than just warm. I’d guess that it’s about 75 °F (24 °C) in here, so it’s not the best environment, but the drive is still hotter than I would have expected. I may have to add another fan to this chassis.

> I did notice that the hard drive is **** hot … I may have to add
> another fan to this chassis.

ah, HEAT problems too…there is an unstable maker!

can you place the hard drive in a bay with an empty slot above and
below (sometimes just moving it away from (say) being between and
right up against the CD/DVD drive and a floppy drive, helps a lot)…

look here: http://www.endpcnoise.com/cgi-bin/e/computercooling.html

and read about wires, “wind tunnels” and case fans that blow in
or out…good stuff…


brassy

24°C is nothing for a hard drive, over 45°C and I would start to think about adding a fan ;).

You say it’s a used mainboard?

First thing I would do is check all the capacitors on the mainboard. Look for domed tops, they should all be nice and flat. If domed, or you see a gooey mess on the mainboard at the base of one or more capacitors, then they will need replacing.

Also check for mainboard damage around the CPU, inexperienced builders (and careless, yes I’ve done it once myself!) have a knack of allowing heatsinks to scratch the board.

Plus you did clean all the old heatsink compound off the CPU and apply new stuff before installing the heatsink I’m assuming?

Thanks. That’s good to know. The HD is mounted vertically all by itself. The nearest hardware to it is the LED/IR board, and that’s about 2 inches away.

> You say it’s a used mainboard?
>
> First thing I would do is check all the capacitors on the mainboard. Look for domed tops, they should all be nice and flat. If domed, or you see a gooey mess on the mainboard at the base of one or more capacitors, then they will need replacing.
>
> Also check for mainboard damage around the CPU, inexperienced builders (and careless, yes I’ve done it once myself!) have a knack of allowing heatsinks to scratch the board.
>
> Plus you did clean all the old heatsink compound off the CPU and apply new stuff before installing the heatsink I’m assuming?

I did check the caps. I’ve had to replace them on a couple of motherboards before. Not a fun job. The very first thing I did when I received this board was to check over all the traces very carefully. I saw no signs of scratches.

Yeah. I replace the compound every time I remove the heatsink.

The good news is that after I put the second stick of memory in, I ran 3 media checks with the 64-bit OS and they all passed. Something must not have been seated properly. I do think that when I removed the PSU connector to the motherboard I noticed that one of the wires was not seated in the plug all the way. I’m not sure if that was it, or if something just wasn’t seated properly.

I screwed everything back in place and now I’m going to run some more media checks just to be sure. I may turn on the motherboard’s temperature monitoring, just to be on the safe side though.
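Once a system is installed, temperatures can also be watched from Linux rather than only from the BIOS. The sketch below assumes a kernel that exposes sensors under `/sys/class/hwmon` with values in millidegrees Celsius, and that a driver for the board's sensor chip is loaded; whether that holds for this particular board is not confirmed anywhere in this thread. The function names are invented for illustration.

```python
import glob

def parse_millidegrees(text):
    """Convert a sysfs reading such as '45000\\n' to degrees Celsius."""
    return int(text.strip()) / 1000.0

def read_temperatures(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {sensor_path: degrees_celsius} for every readable sensor file.

    Returns an empty dict when no sensors are exposed (assumed layout).
    """
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                readings[path] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            # Some sensor files exist but cannot be read; skip them.
            continue
    return readings
```

Printing `read_temperatures()` every few seconds during a media check would show whether anything climbs shortly before a lockup. An empty result means no sensor driver is exposing data, not that the hardware is cool.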