Hard drives operating below optimal speed?

I’ve recently finished upgrading my system and got to play with a bunch of BIOS and system settings. That reminded me of an issue that’s been bothering me for a while: although my CPU and memory are now running at optimal settings, overall performance remains pretty slow… and the culprit always seems to be hard drive speed. Opening large directories takes a ton of time, and if the HDD is busy many programs freeze for several seconds. I really want to fix this if possible! I’ve done a lot of research today on how to optimize HDD performance but haven’t found much helpful info, so I’m asking my questions here.

First some background on my setup. I have two Seagate drives: A 400 GB SATA2 with the root and swap partition, and a 2 TB SATA3 for the home partition. The home and root partitions are both ext4. All controllers are set to AHCI mode (no RAID). The drives are plugged into SATA3 ports on the motherboard and share the same controller… there are also two external SATA brackets connected to another controller, but nothing is plugged into those so they shouldn’t be relevant. My OS is Linux openSUSE Tumbleweed x64.

I first checked dmesg to confirm that each drive is indeed running at the correct speed. One is set to 3.0 Gb/s (SATA 2) whereas the other is at 6.0 Gb/s (SATA 3), so this part seems to be in order.

mircea@linux-qz0r:~> dmesg | grep -i sata | grep 'link up'
[    2.899011] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.899032] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.016080] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
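
For anyone who wants to tabulate this rather than read it by eye, here is a minimal sketch that pulls the negotiated link speed per ATA port out of lines like the ones above. The function name `link_speeds` is just an illustration, and note that the port numbers alone do not say which drive sits on which port.

```python
import re

# Sample lines taken from the dmesg output in this post.
DMESG = """\
    2.899011] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    2.899032] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    3.016080] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""

LINK_RE = re.compile(r"(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def link_speeds(dmesg_text):
    """Map each ATA port name to its negotiated link speed in Gbit/s."""
    return {m.group(1): float(m.group(2)) for m in LINK_RE.finditer(dmesg_text)}


for port, gbps in sorted(link_speeds(DMESG).items()):
    print(f"{port}: {gbps} Gbit/s")
```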

What bothers me, apart from drive I/O freezing my processes, is that from what I’ve read I should be reaching practical drive speeds of roughly the link speed divided by 10. This means that for a SATA2 drive I should be getting 300 MB/s, whereas for a SATA3 drive I should be seeing 600 MB/s. Yet a simple test with hdparm shows that this is far from the case.

mircea@linux-qz0r:~> sudo hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   19738 MB in  1.99 seconds = 9902.31 MB/sec
 Timing buffered disk reads: 220 MB in  3.01 seconds =  73.17 MB/sec
mircea@linux-qz0r:~> sudo hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   19256 MB in  1.99 seconds = 9657.30 MB/sec
 Timing buffered disk reads: 412 MB in  3.00 seconds = 137.33 MB/sec

This is pretty confusing, especially since those speeds seem to vary a bit. Nothing else is reading / writing on the hard drive during that test.
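
To put numbers on the "divide by 10" rule, here is a rough sketch of the arithmetic. SATA uses 8b/10b encoding, so ten bits on the wire carry one byte of data, which gives the ceiling of the link itself; a buffered read is also bounded by how fast the platters can deliver data, so treat the ceiling as an upper bound rather than an expected result. The pairing of each drive with a link speed is my assumption from the setup described above.

```python
import re


def interface_ceiling_mb_s(link_gbps):
    """Payload ceiling of a SATA link in MB/s.

    8b/10b encoding: 10 bits on the wire carry one data byte,
    which is where the "divide by 10" rule of thumb comes from.
    """
    return link_gbps * 1000 / 10


READ_RE = re.compile(r"Timing buffered disk reads:.*=\s*([\d.]+) MB/sec")


def buffered_read_mb_s(hdparm_text):
    """Extract the buffered read rate from `hdparm -t` output."""
    match = READ_RE.search(hdparm_text)
    if match is None:
        raise ValueError("no 'Timing buffered disk reads' line found")
    return float(match.group(1))


# Lines copied from the hdparm -tT runs above. The link speed per drive
# is an assumption (sda on the 3.0 Gb/s link, sdb on the 6.0 Gb/s link).
RUNS = {
    "sda": (3.0, " Timing buffered disk reads: 220 MB in  3.01 seconds =  73.17 MB/sec"),
    "sdb": (6.0, " Timing buffered disk reads: 412 MB in  3.00 seconds = 137.33 MB/sec"),
}

for dev, (gbps, line) in RUNS.items():
    ceiling = interface_ceiling_mb_s(gbps)
    measured = buffered_read_mb_s(line)
    print(f"{dev}: {measured:.2f} MB/s measured, {ceiling:.0f} MB/s link ceiling "
          f"({100 * measured / ceiling:.0f}% of the link)")
```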

Here are my other hdparm outputs which offer more info about the drives. I’d like to know if any of those settings appear out of the ordinary.

mircea@linux-qz0r:~> sudo hdparm /dev/sda

/dev/sda:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 1024 (on)
 geometry      = 48641/255/63, sectors = 781420655, start = 0

mircea@linux-qz0r:~> sudo hdparm -i /dev/sda

/dev/sda:

 Model=ST3400620AS, FwRev=3.AAK, SerialNo=9QH0AFLD
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=781420655
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

mircea@linux-qz0r:~> sudo hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       ST3400620AS                             
        Serial Number:      9QH0AFLD
        Firmware Revision:  3.AAK   
Standards:
        Supported: 7 6 5 4 
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:   781420655
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        device size with M = 1024*1024:      381553 MBytes
        device size with M = 1000*1000:      400087 MBytes (400 GB)
        cache/buffer size  = 16384 KBytes
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 254, current value: 0
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
                Device-initiated interface power management
           *    Software settings preservation
Security: 
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase

Checksum: correct
mircea@linux-qz0r:~> sudo hdparm /dev/sdb

/dev/sdb:
 multcount     =  8 (on)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 1024 (on)
 geometry      = 243200/255/63, sectors = 3907020911, start = 0

mircea@linux-qz0r:~> sudo hdparm -i /dev/sdb

/dev/sdb:

 Model=ST32000641AS, FwRev=CC12, SerialNo=9WM07W4C
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=8
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907020911
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode

mircea@linux-qz0r:~> sudo hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
        Model Number:       ST32000641AS                            
        Serial Number:      9WM07W4C
        Firmware Revision:  CC12    
        Transport:          Serial
Standards:
        Used: unknown (minor revision code 0x0029) 
        Supported: 8 7 6 5 
        Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  3907020911
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:     1907725 MBytes
        device size with M = 1000*1000:     2000394 MBytes (2000 GB)
        cache/buffer size  = unknown
        Nominal Media Rotation Rate: 7200
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 8
        Recommended acoustic management value: 254, current value: 0
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
                Device-initiated interface power management
           *    Software settings preservation
Security: 
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        316min for SECURITY ERASE UNIT. 316min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50019c655c1
        NAA             : 5
        IEEE OUI        : 000c50
        Unique ID       : 019c655c1
Checksum: correct
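
As a side note on reading these dumps: hdparm's own legend says that `*` marks the current active mode, and both drives show `*udma6`. A small sketch for pulling that out of `hdparm -I` output follows; the helper name `active_dma_mode` is hypothetical. An active UDMA mode suggests DMA is in use, though I would treat that as an inference rather than proof.

```python
import re


def active_dma_mode(hdparm_identify_text):
    """Return the transfer mode that hdparm marks with '*', or None."""
    for line in hdparm_identify_text.splitlines():
        if line.strip().startswith("DMA:"):
            match = re.search(r"\*(\w+)", line)
            return match.group(1) if match else None
    return None


# The DMA line as it appears in the hdparm -I output above.
SAMPLE = "        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 "

print(active_dma_mode(SAMPLE))
```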

One entry that seems to be missing is the “using_dma” flag; this may indicate that DMA is disabled, which would be a very bad thing. I’ve also read that certain fstab mount options may improve performance, though most of the parameters people discuss remain confusing to me. This is my current /etc/fstab configuration:

/dev/disk/by-id/ata-ST3400620AS_9QH0AFLD-part1 /                    ext4       acl,user_xattr        1 1
/dev/disk/by-id/ata-ST3400620AS_9QH0AFLD-part2 swap                 swap       defaults              0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
/dev/disk/by-id/ata-ST32000641AS_9WM07W4C-part1 /home                ext4       defaults             1 2

Is there any way to improve HDD performance (especially for the root partition) or is everything as optimal as it gets? I’m planning on getting an SSD once I have the money, but until then I figured I’d check how the existing drives could be improved.

Hi
Are you using a SATA3 cable for the drive that supports it? I ask because I don’t see Gen3 signaling speed (6.0Gb/s) in your output.

Maybe change the scheduler (BFQ)? Partition location can have an impact, as in where the data is on the disk…

Huh… I thought SATA cables are all the same. Are SATA2 cables and SATA3 cables two different things? I don’t know which I’m using in that case and will need to check, thanks for pointing this out.

How do I change the disk scheduler rules system wide? The only way I know of is an option in KSysGuard when you right-click a process, which is per process and only applied temporarily.

Hi
They’re not meant to be different, but depending on the cable used, the age of the SATA2 cables, etc., there could be a performance hit. I’m not sure whether Seagate implements a check to see what cable is present. That said, I would still expect to see the output mentioned in my previous post.

I enable it by adding the following to the grub kernel options; TW is ready to go and will change automagically;


scsi_mod.use_blk_mq=1

Fair enough. The cables I’m using were bought years ago, but I never used them before so they’re as good as new. I don’t suspect they are the reason for the slowdowns.

That sounds like an interesting parameter, I shall give it a try tomorrow! How do I check that it gets enabled properly once I’ve booted? I assume it’s safe and doesn’t introduce any risk of data breakage.
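
One way to check that the option was actually passed at boot is to look at /proc/cmdline, which holds the command line the running kernel was started with. A minimal sketch follows; the sample command line is made up for illustration, and the helper names are hypothetical.

```python
def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def blk_mq_requested(cmdline):
    """True if scsi_mod.use_blk_mq=1 appears on the given command line."""
    return kernel_params(cmdline).get("scsi_mod.use_blk_mq") == "1"


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, not taken from the poster's machine.
EXAMPLE = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 quiet scsi_mod.use_blk_mq=1"
print(blk_mq_requested(EXAMPLE))
```

On a live system, `blk_mq_requested(read_cmdline())` would run the same check against the real boot options.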

Hi
You can see the scheduler in use via;


cat /sys/block/sda/queue/scheduler 
[mq-deadline] kyber bfq none

In my case /dev/sda is an SSD, so it uses mq-deadline. I suggest you check before and after… There is no impact on the installation, as the change is made afterwards by adding the grub option :wink:

See: https://lwn.net/Articles/720675/

A google on “I/O scheduler” will give you all sorts of articles etc…
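
A small sketch of how to read that file programmatically: the kernel lists every scheduler available for the device and wraps the active one in square brackets. The function name `parse_schedulers` is only illustrative.

```python
def parse_schedulers(text):
    """Return (active, available) from the contents of queue/scheduler."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


# The line shown in the reply above.
active, available = parse_schedulers("[mq-deadline] kyber bfq none")
print("active:", active)
print("available:", ", ".join(available))
```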

Awesome. Currently that file reads out:

noop deadline [cfq]

I’m seeing some articles suggest that if I echo the name of a scheduler to that file, the change will be applied in realtime with no need to restart. Will that work for testing or is the file read-only?

Also some benchmarks suggest that kyber might be even better. I should probably enable that as well and see how it compares. Does it also have a kernel parameter?
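
For what it's worth, the scheduler file is writable by root, and a change made that way takes effect immediately but does not survive a reboot. Below is a hedged sketch of the same thing as `echo name > /sys/block/sda/queue/scheduler`; the function `set_scheduler` and its `sys_block` parameter are my own illustration, with the base directory exposed so the logic can be tried against a scratch directory first.

```python
from pathlib import Path


def set_scheduler(name, device="sda", sys_block=Path("/sys/block")):
    """Select an I/O scheduler for one block device at runtime (needs root).

    Only names the kernel currently offers for the device are accepted.
    """
    path = Path(sys_block) / device / "queue" / "scheduler"
    available = [entry.strip("[]") for entry in path.read_text().split()]
    if name not in available:
        raise ValueError(f"{name!r} is not offered for {device}: {available}")
    path.write_text(name)
```

A scheduler that is not in the list (for example kyber while the file still reads `noop deadline [cfq]`) is rejected before anything is written.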

Hi
Set the grub parameter I posted above to get bfq, and the available schedulers will change to mq-deadline, kyber and bfq.

Thanks, that did it.

mircea@linux-qz0r:~> cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none

I didn’t see a noticeable improvement in boot time but get the feeling that some directories open a little faster… will have to see how it goes after some use time. If the new scheduler is indeed better and has been around for 4 years, I’m confused as to why it’s not enabled by default to this day… will that change anytime soon in either the kernel or openSUSE Tumbleweed?

One last question: How do I set a specific scheduler at boot time? I can echo to /sys/block/*/queue/scheduler after logging in, but is there a grub2 or fstab parameter I can use to make the change permanent?

Hi
You need to create a udev rule, or rather copy the existing one and tweak it to kyber when the time comes;


cat /usr/lib/udev/rules.d/60-ssd-scheduler.rules

I would change it to match nv rather than sd and change deadline to kyber. When you have finished your tweaks, copy the file into the /etc/udev/rules.d directory and call it, say, 61-nvme-scheduler.rules

Then run the command (as root user) udevadm trigger and see if it swaps.

After consulting folks on the openSUSE IRC channel, I opened a ticket requesting that this scheduler be enabled by default. From my tests it works perfectly so far, and others have been using it for well over a year without any issues. Furthermore, it’s said that the old framework is bound to be deprecated in future kernel versions, and from what I’ve heard other distros are using blk_mq already.

https://bugzilla.opensuse.org/show_bug.cgi?id=1102569

Hi
Oops, I was thinking about the other thread. Yours are sdN, so the rotational value would be 1 (0 is for SSDs)… call it, say, 61-kyber-scheduler.rules; the number (61) is important so that it runs after any other rules…
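
Since the contents of the stock rules file are not shown in this thread, the following is only a sketch of the general shape such a rule usually takes, not a copy of the openSUSE file. It builds a single udev rule line that matches sdN disks by their `queue/rotational` attribute and assigns `queue/scheduler`; the helper name `scheduler_rule` is hypothetical.

```python
def scheduler_rule(scheduler, rotational):
    """Build one udev rule line that sets the I/O scheduler for sdN disks.

    rotational: 1 for spinning hard drives, 0 for SSDs.
    """
    return (
        'ACTION=="add|change", KERNEL=="sd[a-z]", '
        f'ATTR{{queue/rotational}}=="{rotational}", '
        f'ATTR{{queue/scheduler}}="{scheduler}"'
    )


# Rule text for rotating disks using kyber, as discussed above.
print(scheduler_rule("kyber", 1))
```

The printed line would go into a file such as /etc/udev/rules.d/61-kyber-scheduler.rules, followed by `udevadm trigger` as root, and the result can be checked in /sys/block/sda/queue/scheduler.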

Hi
So you have seen an improvement in your tests?

It feels ever so slightly better so far: I think some directories may be opening up a bit faster than before, whereas logins are a tiny bit quicker. I’ll have to use it for a bit to say with more certainty, but I do get the general impression some operations are a little faster now.

Hi
OK, so in Tumbleweed it’s 60-io-scheduler.rules that sets it all up, copy that one and tweak as required if you want to try kyber

This is a bit more advanced so I’m good for now. I think bfq suits me well… I’m most interested in seeing how the blk_mq framework works with its current default settings, at least for the time being.

Not the cause of your HDD “stuttering”, but… first, I’d use a SATA3 HDD for root/swap. It is faster, although R/W speed depends on other things besides the SATA port. Also, a 400GB disk sounds old; for me an upgrade would be an option, if possible. Better yet, replace it with or add an SSD for root and home, and also for swap if you see frequent paging. You can mount your HDD directories under home.

Second, I’ve used a SATA card for two DVD drives, and would sometimes see partial or, most frequently, full freezes requiring a reboot. The card appeared to work normally under Windows, so it was possibly a driver issue. You may want to physically remove yours from the box for testing.

Just my 2c