Software RAID 5 really slow.

Hi,

I have set up a software RAID with mdadm. It consists of four HDDs (one Samsung HD203WI and three HD204UI), each 2 TB.

When I run benchmarks with bonnie++ or hdparm I get about 60 MB/s write speed and 70 MB/s read speed. Each single drive in the array manages > 100 MB/s read when tested with “hdparm -t”.
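
In case it matters, the benchmarks were run roughly like this (a sketch, not my exact command lines; the test directory and user name are placeholders, and -s 8g keeps the bonnie++ data set at twice my 4 GB of RAM):

# hdparm -t /dev/md0
# bonnie++ -d /home/benchmark -s 8g -n 0 -u pepe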

Here is some information about my setup:


# mdadm -D /dev/md0 
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug 11 18:22:42 2011
     Raid Level : raid5
     Array Size : 5860539648 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 1953513216 (1863.02 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Aug 25 04:51:32 2011
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 256K

           Name : raid:0
           UUID : 7457b089:6b1627d6:ce57b6fd:9a201cf6
         Events : 1486299

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1


My chunk size is 256K; I also tried 128K, but the performance was just as bad.


# tune2fs -l /dev/md0
tune2fs 1.41.14 (22-Dec-2010)
Filesystem volume name:   <none>
Last mounted on:          /home
Filesystem UUID:          02a772b5-21f2-4ed8-be22-a9d4c7dda2e8
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              366288896
Block count:              1465134912
Reserved block count:     73256745
Free blocks:              426107103
Free inodes:              365087866
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      674
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              64
RAID stripe width:        192
Flex block group size:    16
Filesystem created:       Tue Dec  7 18:40:27 2010
Last mount time:          Thu Aug 25 02:26:32 2011
Last write time:          Thu Aug 25 02:26:32 2011
Mount count:              31
Maximum mount count:      32
Last checked:             Thu Aug 11 18:23:52 2011
Check interval:           15552000 (6 months)
Next check after:         Tue Feb  7 17:23:52 2012
Lifetime writes:          4370 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       180879512
Default directory hash:   half_md4
Directory Hash Seed:      db9eb473-727f-447e-abee-b226dc2bbcfc
Journal backup:           inode blocks

I adjusted the stride and stripe-width values as this howto https://raid.wiki.kernel.org/index.php/RAID_setup#ext2.2C_ext3.2C_and_ext4 describes (at least I think I did…).
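
For reference, the arithmetic behind those two values (just restating the howto): stride = chunk size / filesystem block size = 256 KiB / 4 KiB = 64, and stripe width = stride * number of data disks = 64 * (4 - 1) = 192. The values the filesystem actually carries can be read back like this:

# tune2fs -l /dev/md0 | grep -iE 'stride|stripe'
RAID stride:              64
RAID stripe width:        192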


# blockdev --getra /dev/md0
64

# blockdev --getra /dev/sd{c,d,e,f}
256
256
256
256

I tried different read-ahead values; the best for me was 64. That was a surprise, since I had read a lot of claims that increasing this value would not hurt performance. In my case, with values other than 64 or 192 I get read speeds of ~30-40 MB/s. I went all the way up to 32768, but the performance stayed bad.
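
Roughly how I compared the read-ahead values (a sketch, not my exact shell history):

for ra in 64 128 192 256 1024 4096 16384 32768; do
    blockdev --setra $ra /dev/md0
    echo "readahead = $ra sectors"
    hdparm -t /dev/md0
done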


# fdisk -l /dev/sd{c,d,e,f}

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00028e50

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0004a004

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  3907028991  1953513472   fd  Linux raid autodetect

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0008f827

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048  3907028991  1953513472   fd  Linux raid autodetect

Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006c778

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048  3907028991  1953513472   fd  Linux raid autodetect


As far as I know, a start sector of 2048 means that the partitions are correctly aligned.
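
The arithmetic behind that: 2048 * 512 B = 1048576 B = 1 MiB, which is a multiple of 4 KiB, so the partition start lines up with 4K physical sectors as well. Quick check:

# echo $(( 2048 * 512 % 4096 ))
0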

I'm using openSUSE 11.4 x64 with the latest patches from the update repositories, running the 2.6.37.6-0.7-desktop kernel.
I have 4 GB of RAM and an Atom D525.

What is wrong with my setup? I'm out of ideas…

Regards
pepe

On Thu, 25 Aug 2011 03:16:03 +0000, oopepe wrote:

> What is wrong with my setup?

How many disk controllers are involved in the setup?

You may be maxing out the data bus if you just have 1 or 2 controllers.
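
A rough way to count them (a sketch; the grep pattern is just a guess at the relevant controller classes):

# lspci | grep -iE 'sata|ahci|raid|ide'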

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

I'm using the onboard RAID controller of my ZOTAC NM10-DTX board. It is configured not to do any RAID. I think it is from JMicron.

When I do hdparm -t on a single disk of the RAID I get read speeds of up to 130 MB/s. Doesn't this show that the controller can do better?
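
To see which controller each RAID disk actually hangs off, the sysfs links show the full PCI/ATA path (a sketch; the exact path layout depends on the kernel):

# readlink -f /sys/block/sd{c,d,e,f}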

Do those 2 TB disk drives have 4K or 512 B sectors? If 4K, were they partitioned correctly? If not, reads will not be affected, but it would really slow down writes.

I think they are correctly aligned at sector 2048; see the last code box of my initial post. I googled a bit and found that the HD203WI has 512 B sectors while the HD204UI has 4K sectors. fdisk, however, reports only 512 B sectors for all disks. Is fdisk right?
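
This is what I looked at to see what the kernel and the drives themselves report (a sketch; from what I read the HD204UI hides its 4K sectors behind 512 B emulation, so these may still say 512):

# cat /sys/block/sd{c,d,e,f}/queue/physical_block_size
# hdparm -I /dev/sdc | grep -i 'sector size'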

oopepe wrote:
> lwfinger;2377980 Wrote:
>> Do those 2 TB disk drives have 4K or 512 B sectors? If 4K, were they
>> partitioned correctly? If not, reads will not be affected, but it would
>> really slow down writes.
>
> I think they are correctly aligned at sector 2048; see the last code box
> of my initial post. I googled a bit and found that the HD203WI has 512 B
> sectors while the HD204UI has 4K sectors. fdisk, however, reports only
> 512 B sectors for all disks. Is fdisk right?

I think you might get better answers by typing HD204UI into Google and then asking on some of the forums that come up, where there are people using these drives.

In particular, the one about firmware updates is important if you
haven’t checked/upgraded it already.

You might also google your mainboard. I don’t know anything about Atom
processors/chipsets but the ZOTAC NM10-DTX looks interesting. But at:

<http://www.overclock3d.net/reviews/cpu_mainboard/zotac_nm10-dtx_motherboard/5>

they say “At an average Read Speed of 90.2MB/sec”, so I think you’re
being over-optimistic. Having said that, you may be able to improve your
performance some. So either there or some other forums may help you.

Cheers, Dave

Is fdisk right?

No, fdisk is wrong. The drives pretend to have 512 B sectors. I see the same here with WD Caviar Green drives.

As far as I know, a start sector of 2048 means that the partitions are correctly aligned.

Correct. If they were not, your write speeds would be much worse.

Hi again,

I solved my problem :slight_smile:

This idea was the right one. It turned out that the ZOTAC NM10-DTX mainboard is a bit complicated. It has seven SATA ports, six internal and one eSATA. Two internal ports come from the Intel NM10 chipset. The motherboard also has a JMicron JMB324 chipset, which adds two extra SATA interfaces. One of those interfaces is used for eSATA, but the other one gets divided into the remaining four internal SATA ports by the JMB324 SATA port multiplier (see this thread: NM10-DTX SATA RAID Configuration Issues - ZOTAC Z-SPOT - It’s Time to Play!); these are referred to as the “raid” ports.
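
For anyone who wants to confirm this on their own board: the kernel log should mention the port multiplier when libata probes it (a sketch; the exact wording may vary between kernel versions):

# dmesg | grep -i 'port multiplier'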

So the problem is that those four extra “raid” SATA ports on the motherboard share a single real SATA connection. That cannot be good for performance.

Keeping all that in mind, I reshuffled which HDD is plugged into which SATA port. I'm now using the two SATA ports from the NM10 chipset for two of the RAID disks, and the eSATA port plus one of the four “raid” ports for the other two HDDs. This way each HDD is connected to its own SATA interface. The HDD sitting on one of the four “raid” ports shares its SATA link with my boot HDD, but that's OK, since there isn't much going on on my boot HDD.

After some more tweaking and benchmarking of the RAID parameters, I ended up with the following for optimal performance:


blockdev --setra 16384 /dev/md0
echo 16384 > /sys/block/md0/md/stripe_cache_size
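
One note for anyone copying these: both settings are lost on a reboot, so they have to be reapplied at boot time; on openSUSE something like /etc/init.d/boot.local can do that (a sketch):

# added to /etc/init.d/boot.local (or any local boot script)
blockdev --setra 16384 /dev/md0
echo 16384 > /sys/block/md0/md/stripe_cache_size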

Using all this, I get an enormous performance gain:


# hdparm -t /dev/md0
/dev/md0:
 Timing buffered disk reads: 642 MB in  3.02 seconds = 212.57 MB/sec

The bonnie++ results improved similarly: per-character write 72 MB/s, block write 163 MB/s, rewrite 84 MB/s, per-character read 81 MB/s, and block read 229 MB/s. YAY :slight_smile:

Regards
pepe

oopepe wrote:
> Hi again,
>
> I solved my problem :slight_smile:

That’s great; thanks for the feedback.

Have you checked the firmware? You don’t want to brick your drives :frowning:

@oopepe

Thank you for this very good feedback. It will be a reference for others and makes this thread valuable.

Hi, yes, I updated the firmware before putting the drives into my PC. I had heard about the issue from the smartmontools site. Nonetheless, thanks for the advice.