SATA problem - disk, controller or other?

I’m having some grief. I’ve posted elsewhere about a disk that is going faulty on my system, which is based on an MSI K8M Neo-V m/b. My current strategy is to add a new disk on a new controller. There’s good news and bad news. The good news is that everything is detected and sometimes works. The bad news is that it doesn’t work all the time :(

I’ve borrowed a Sweex PU102 PCI SATA card, which uses the SiI 3512 chip, and I’ve bought a Samsung HD103SJ 1 TB SATA disk (Samsung SpinPoint F3 Desktop Class, internal, 300 MBps, 7200 rpm).

I’ve partitioned the disk and installed Ubuntu 10.04 (I run both openSUSE 11.2 and Ubuntu 10.04 on the machine and I had to pick one!). It boots and kind of runs, but with lots of flakiness and lockups. There are lots of lines like this in /var/log/messages:

Jan  5 22:53:27 piglet kernel: [  157.390039] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  5 22:53:32 piglet kernel: [  162.390035] ata5: hard resetting link
Jan  5 22:53:33 piglet kernel: [  162.740037] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  5 22:53:33 piglet kernel: [  162.780287] ata5.00: configured for UDMA/100
Jan  5 22:53:33 piglet kernel: [  162.780294] ata5.00: device reported invalid CHS sector 0
Jan  5 22:53:33 piglet kernel: [  162.780302] ata5: EH complete
Jan  5 22:54:03 piglet kernel: [  193.040089] ata5: hard resetting link
Jan  5 22:54:03 piglet kernel: [  193.390060] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  5 22:54:03 piglet kernel: [  193.430287] ata5.00: configured for UDMA/100
Jan  5 22:54:03 piglet kernel: [  193.430295] ata5.00: device reported invalid CHS sector 0
Jan  5 22:54:03 piglet kernel: [  193.430308] ata5: EH complete
Jan  5 22:54:07 piglet kernel: [  197.042033] ata5.00: limiting speed to UDMA/66:PIO4
Jan  5 22:54:07 piglet kernel: [  197.042070] ata5: hard resetting link
Jan  5 22:54:07 piglet kernel: [  197.390059] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  5 22:54:07 piglet kernel: [  197.430288] ata5.00: configured for UDMA/66
Jan  5 22:54:07 piglet kernel: [  197.430305] ata5: EH complete
Jan  5 22:54:08 piglet kernel: [  197.821413] ata5.00: configured for UDMA/66
Jan  5 22:54:08 piglet kernel: [  197.821437] ata5: EH complete
Jan  5 22:54:38 piglet kernel: [  228.040099] ata5: hard resetting link
Jan  5 22:54:38 piglet kernel: [  228.390046] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  5 22:54:38 piglet kernel: [  228.430286] ata5.00: configured for UDMA/66
Jan  5 22:54:38 piglet kernel: [  228.430309] ata5: EH complete

I can find lots of threads here and elsewhere about people with problems where some of these lines occur, but none that I’ve been able to use to clarify my own troubles.
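In case it helps anyone compare runs, here is a rough Python sketch that counts the hard resets and speed step-downs per port in log lines like the ones above. The function name `summarize_ata_events` and the regular expression are my own illustration, not part of any tool, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches the libata part of a syslog line, e.g. "ata5: hard resetting link"
# or "ata5.00: limiting speed to UDMA/66:PIO4". The device suffix (".00")
# is folded into the port name so events are counted per port.
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.+)$")

def summarize_ata_events(lines):
    """Count hard resets and speed step-downs per ATA port (hypothetical helper)."""
    resets = Counter()
    stepdowns = Counter()
    for line in lines:
        match = ATA_LINE.search(line.rstrip())
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("hard resetting link"):
            resets[port] += 1
        elif message.startswith("limiting speed to"):
            stepdowns[port] += 1
    return resets, stepdowns

sample = [
    "Jan  5 22:53:32 piglet kernel: [  162.390035] ata5: hard resetting link",
    "Jan  5 22:54:07 piglet kernel: [  197.042033] ata5.00: limiting speed to UDMA/66:PIO4",
    "Jan  5 22:54:07 piglet kernel: [  197.042070] ata5: hard resetting link",
]
resets, stepdowns = summarize_ata_events(sample)
print(resets["ata5"], stepdowns["ata5"])  # 2 resets, 1 step-down in this sample
```

To run it on a real log, pass an open file instead of the list, for example `summarize_ata_events(open("/var/log/messages"))`. Comparing the counts before and after a hardware change (cable, power connector, controller) gives a cruder but quicker signal than reading the log by eye.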

I rebooted into the existing openSUSE 11.2 installation (which is on the disk that is slowly failing with increasing bad sectors) and looked at the new disk with smartctl:
smartctl 5.39 2009-08-08 r2872~ x86_64-unknown-li - Smartctl-a-hd103sj#1

I then ran a long test (smartctl -t long, which took 157 minutes) and had another look:
smartctl 5.39 2009-08-08 r2872~ x86_64-unknown-li - Smartctl-a-hd103sj#2

As far as I can tell, this says there were no problems. I also used dd to copy 5 GB from /dev/zero to the disk a few times; it completed without errors and nothing appeared in the log.
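One limitation of the dd test is that writing zeros only shows that the writes complete; it does not check that the data reads back intact. A minimal sketch of a write-then-verify test in Python follows. The helper name `write_and_verify` and the file name are made up for illustration, and the read-back may be served from the page cache rather than the disk unless the file is much larger than RAM or the caches are dropped first.

```python
import hashlib
import os
import tempfile

def write_and_verify(path, total_bytes, chunk_size=1024 * 1024):
    """Write random data to path, read it back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            out.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        out.flush()
        os.fsync(out.fileno())  # ask the OS to commit the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as src:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()

# Small demonstration in a temporary directory; for a real test, point the
# path at a file on the disk under suspicion and use a much larger size.
with tempfile.TemporaryDirectory() as tmp:
    ok = write_and_verify(os.path.join(tmp, "testfile.bin"), 4 * 1024 * 1024)
    print("match" if ok else "MISMATCH")
```

A mismatch would point at data corruption somewhere between memory and the platters; a match combined with link resets in the log would suggest the error handling is recovering, at the cost of speed.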

I also ran hdparm -tT:

 Timing cached reads:   1044 MB in  2.00 seconds = 521.51 MB/sec
 Timing buffered disk reads:  244 MB in  3.01 seconds =  81.07 MB/sec

and this is what lspci shows:

00:00.0 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:0b.0 Mass storage controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01)
00:0c.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394 Controller (Link)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce 6200] (rev a1)

Does this give enough information for anybody to offer an opinion on whether this is a disk problem or a controller problem or something else? What other information would be useful to track down the problem?

Over the next few days I’m hoping to be able to test with another new disk and another new controller, but it would be useful to have some clue what to do in the meantime :(

No idea, really, but I’d try two things:

  1. As the m/b is somewhat old, is there any kind of setting (in the BIOS or on the HD) to fix it at 1.5 Gbps? With an old SATA HD I had to set a jumper or the BIOS wouldn’t even see it; now it may be automatic (nothing you can do) or done by software.
  2. If the drive you want to copy is not in a RAID, I see from your m/b specs that you have two SATA ports. So you could disconnect the other drive, remove the controller board, plug the new drive in and make your copy with a disk-tool live CD like PMagic (it’s excellent for stuff like this).
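On point 1, the log in the first post may already say something about the link speed. As far as I know, the kernel prints SStatus and SControl in hex, and both registers pack three 4-bit fields (DET, SPD, IPM) as laid out in the SATA specification. A small Python sketch to unpack them, where the function name `decode_sata_register` is just for illustration:

```python
def decode_sata_register(value_hex):
    """Split an SStatus/SControl value (hex string) into its 4-bit fields."""
    value = int(value_hex, 16)
    return {
        "DET": value & 0xF,         # bits 0-3: device detection
        "SPD": (value >> 4) & 0xF,  # bits 4-7: speed (negotiated or allowed)
        "IPM": (value >> 8) & 0xF,  # bits 8-11: interface power management
    }

# SPD meanings for the two generations relevant here.
SPEED_NAMES = {1: "Gen1 (1.5 Gbps)", 2: "Gen2 (3.0 Gbps)"}

sstatus = decode_sata_register("113")   # value from the log
scontrol = decode_sata_register("310")  # value from the log

print(sstatus)   # {'DET': 3, 'SPD': 1, 'IPM': 1}
print(scontrol)  # {'DET': 0, 'SPD': 1, 'IPM': 3}
print("negotiated:", SPEED_NAMES.get(sstatus["SPD"], "unknown"))
print("limit:", SPEED_NAMES.get(scontrol["SPD"], "no limit"))
```

If I read it right, SStatus 113 means a device is present with the link up at Gen1 and the interface active, and SControl 310 means the host side is already limiting negotiation to Gen1. So the link seems to be held at 1.5 Gbps already, and a jumper on the drive may not change much with this controller.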

Also, is your PSU powerful enough? You might consider disconnecting other peripherals not in use, like CD drives, etc.

Thanks for the ideas bruno.

There’s nothing in the BIOS to configure SATA, AFAIK. One of my new drives has a jumper (but see below).

The motherboard controller doesn’t work with SATA2 drives at all. That’s why I’m using a new controller.

The PSU is 400 W which should be enough but see below.

Since my first posting I’ve now tried some more things:

(1) I’ve tried a second new disk (320 GB Samsung) which does have a speed forcing jumper. I was able to partition it but could not even format the partitions. I’m going to test that drive elsewhere; SMART says it’s A-OK (no errors logged, self-test passes).

(2) I tried again with my new 1 TB disk, but connected its power in place of the system’s PATA drive rather than in addition to the existing drives. I also disconnected the PATA drive’s interface cable. The new drive now seems to work properly, though it’s too early to be sure. I don’t know whether the improvement is due to a power issue or a bus issue or something else (phase of the moon?).

(3) I’ve ordered another controller and cables as well, but it hasn’t arrived yet.

Cheers, Dave

PS I couldn’t login to the forums until I deleted my cookies - that’s very annoying and is apparently a known issue! And I couldn’t see this form properly until I disabled the CSS. Also annoying. Two reasons I prefer email!

Have you tried an IDE (PATA) drive? I have a different problem, but when I use an IDE drive the problem just goes away. All the best.

> I tried again with my new 1 TB disk, but connected its power in place of the system’s PATA drive rather than in addition to the existing drives. I also disconnected the PATA drive’s interface cable. The new drive now seems to work properly, though it’s too early to be sure. I don’t know whether the improvement is due to a power issue or a bus issue or something else (phase of the moon?).

Have you checked the BIOS for an option to disable IDE/PATA? There are some known issues with running IDE and SATA together in some cases.

> (3) I’ve ordered another controller and cables as well, but it hasn’t arrived yet.

I would not do too much on this until the new controller arrives. Unless it uses the same chip, you will be in for some more work.

> Have you checked the BIOS for an option to disable IDE/PATA? There are some known issues with running IDE and SATA together in some cases.

On second thoughts, you are probably using an IDE CD/DVD drive, in which case you don’t want to disable IDE and have already done all that’s needed there. I haven’t heard of the problem being proven to occur with a hard disk and CD/DVD drive combination.

@fettest, thanks for the suggestion but I have no wish to ever buy another IDE drive. I’m working hard to eliminate them from my life!

@dvhenry, yes I could probably disable IDE/PATA but as you guessed, I’m using a PATA DVD drive for now.

The controller I’ve ordered is based on a VIA VT6421A, whilst the one I’ve borrowed uses a SiI 3512. But the motherboard uses an older VIA SATA chip, so the sata_via driver is already loaded and I’m hoping I won’t even need to rebuild the initrd when the new controller arrives. But we’ll see.

I’ve done some more tests, copying a few 100 GB to the new drive. The new drive is working but there are occasional errors logged and it steps the link down to UDMA/66. So there’s still something not quite right.

> I’ve done some more tests, copying a few 100 GB to the new drive. The new drive is working but there are occasional errors logged and it steps the link down to UDMA/66. So there’s still something not quite right.

This is a really bad sign that will certainly compromise your system reliability.

Did you check for BIOS updates? Read the changelogs first, to avoid flashing the BIOS unnecessarily (always something to think twice, or thrice, about before doing).

If not, I’d consider a m/b upgrade if possible.

brunomcl wrote:
>> I’ve done some more tests, copying a few 100 GB to the new drive. The
>> new drive is working but there are occasional errors logged and it steps
>> the link down to UDMA/66. So there’s still something not quite right.
>
> This is a real bad sign, that will certainly compromise your system
> reliability.
>
> Did you check for bios updates? Read the changelogs first, to avoid
> flashing the bios unnecessarily (always something to think twice - or
> trice - before doing).
>
> If not I’d consider a m/b upgrade if possible.

Just to let everybody know, this turned out to be a software problem. Or
rather there’s a kernel patch that fixes a hardware incompatibility.

The VIA VT6421A-based controller that I bought works with current
generation Samsung (and WD) disks only with recent kernels.
Specifically, those where a patch that rejoices in the name “Joseph
Chan’s magic patch” has been applied.

Cheers, Dave