Dear all,
maybe someone here has a better idea of what is happening than I do…
In August 2014 I set up a home server (Dell T20) which has 4 internal SATA ports (Intel). I wanted 1 OS drive (500GB, 2.5") and 4 3TB drives in RAID6 for data, so I needed an extra SATA port, supplied via a PCIe SATA card.
A SiI based card is not detected by the server (card issue, server issue?) - while a Marvell based card works in that it is recognised by the server, so far so good.
Over time, I noticed that one disk would drop from the array. The disk could sometimes be re-added, sometimes it failed again immediately, with no pattern. Having finally had enough, I'm trying to figure out where the issue lies…
This issue has occurred on OpenSuSE 13.2 and persists in Leap 42.1.
(I have also contacted TechSupport, but I sort of hope that there may be some people with personal experience here.)
- The card has a chip that is labelled as 88SE9128, and the card is also sold as having an 88SE9128. However, both a DOS based BIOS update utility and Linux identify the chip as an 88SE9123.
Label on the Chip:
88SE9128-NAA2
PPH7290.04
1322 B1P
TW
(StarTech card with 2 SATA 3 ports, PEXSAT32)
- Using dd to zero fill the drive (outside of mdadm), the process initially starts with around 50-54MB/s and then drops, generally to 10-20MB/s, I have seen it as low as 500KB/s.
The process WILL fail, but the amount written before the failure varies (1GB, 2.5GB, 20GB) with no discernible pattern.
This happens on a 3TB drive, but also on a 2TB drive that I tested.
- After failing, the drive will not show up in the partitioner in YaST. When querying the SMART data, it now reports a size of 600PB until the server is rebooted, after which the drive shows up again and the SMART data comes back with no problems.
- The SMART data for the drive is fine, and the drive itself is most definitely fine too (recertified return). It also passes a SeaTools test. In case anybody wants to know, it is a 7200rpm, 3TB Seagate Barracuda.
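A rough way to log where the write slows down and where it fails is sketched below. This is not the dd command used above, just a minimal Python equivalent under some assumptions: the function name `timed_zero_fill` and the 1MiB chunk size are made up for illustration, and `O_SYNC` is used so that each write is timed against the device rather than the page cache.

```python
import os
import time


def timed_zero_fill(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write zeros to `path`; return a list of (bytes_written_so_far, MB_per_s).

    Hypothetical helper for illustration. Stops at the first I/O error and
    reports how many bytes had been written up to that point.
    """
    zeros = bytes(chunk_bytes)
    samples = []
    written = 0
    # O_SYNC (where available) makes each write wait for the device.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.monotonic()
            try:
                done = os.write(fd, zeros[:n])
            except OSError as err:
                print(f"I/O error after {written} bytes: {err}")
                break
            elapsed = time.monotonic() - start
            written += done
            rate = done / elapsed / 1e6 if elapsed > 0 else float("inf")
            samples.append((written, rate))
    finally:
        os.close(fd)
    return samples
```

Pointing `path` at a block device (run as root) overwrites its contents, exactly like dd does, so it should only be used on a disk that holds nothing of value. The returned samples show at which offset the rate drops from the initial 50-54MB/s and at which offset the error occurs.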
Tech Support Suggestion:
Inquiring with StarTech, it was suggested that I should try the card in another computer - the only suitable machine for this is my Windows PC (Windows 7, 64bit) - where I had to install the dedicated Marvell Driver.
- With it, the card seemingly works fine, at least it works on both the 2TB drive doing an 8GB CrystalDisk benchmark and on the 3TB drive doing a 16GB CrystalDisk benchmark.
- This time the drives got around 170-192MB/s sequential read and a good 150MB/s sequential write.
(This is in line with the reported speed when looking at an array rebuild on the internal Intel SATA ports in the server.)
Searching around, it seems that people consider the Marvell chips evil - well, unfortunately, I’m not in a position to purchase a RAID card costing several hundred pounds.
- One person apparently had issues and resolved them by limiting the rebuild speed - I don’t think this applies as dd slows down rapidly before reporting an I/O error.
(Link: http://tobias.kleemann.net/2013/08/solved-linux-software-raid-1-fallover-problem/ )
- Searching around some more, it is suggested that the Marvell chip corrupts drives larger than 3TB. However, this has only been reported by a Russian user (a big thanks to online translators) and is echoed once in one forum and in a comment on SATA/SAS card candidates. It is suggested that the problem is an overflow problem in the controller, but I cannot evaluate the accuracy of that statement. It was also reported with a firmware slightly older than the latest version, which is 1.0.0.38.
(Link: http://blog.zorinaq.com/?e=10 - see post by Artem Ryabov and here is a more detailed analysis in Russian http://ru-root.livejournal.com/2659575.html )
So, any ideas, suggestions?
IF there is a driver problem in the controller, this could account for the 3TB drives - BUT would not explain why the 2TB drive also fails with dd. The slow write speed in Linux is also peculiar - as Windows is a little over 3 times as quick and normally it is Windows that is slow…
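To put rough numbers on the overflow idea: the small calculation below assumes the overflow would be in a 32-bit sector counter with 512-byte sectors. That mechanism is only a guess, nothing in the linked posts confirms it. Under that assumption the limit sits at about 2.2TB, so a 3TB drive crosses it while a 2TB drive does not.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
LBA_BITS = 32        # assumption: the overflow is in a 32-bit sector counter

# Largest capacity addressable under these assumptions: 2**32 sectors.
limit_bytes = (2 ** LBA_BITS) * SECTOR_BYTES


def crosses_limit(drive_bytes):
    """True if a drive of this size has sectors beyond the assumed limit."""
    return drive_bytes > limit_bytes


print("limit:", limit_bytes, "bytes")            # limit: 2199023255552 bytes
for label, size in (("2TB", 2 * 10**12), ("3TB", 3 * 10**12)):
    print(label, "crosses limit:", crosses_limit(size))
# 2TB crosses limit: False
# 3TB crosses limit: True
```

If the assumption holds, an overflow of this kind could not be what makes the 2TB drive fail with dd.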
It is not the SATA cable, as the same cable was used in Windows - and I also tried swapping it to no avail.