Thread: Software Raid 1 and IO-Wait

  1. #1

    Software Raid 1 and IO-Wait


    I just installed 11.1 64-bit on a SATA drive. The system has two additional 1 TB HDDs, which I partitioned and set up as a RAID 1 device with mdadm. I formatted the resulting /dev/md0 with ext3.

    My first attempt was to make one big partition on each disk, resulting in two partitions of around 931 GB each. The RAID setup was successful and I started copying files to it. The first copy, around 1.5 GB of small files via Samba, went fine. Then I copied a 100 GB file from a local file system to the RAID. After 4 GB of the file had been written, the cp process went into uninterruptible sleep (D). I could not figure out why that happened.

    My second attempt was to divide each disk into two partitions of around 465 GB and mirror them accordingly. With those smaller partitions the copy of the 100 GB file worked, and so far I haven't noticed any problems.

    Does anyone know why this happens or what can be done to avoid the issue?

    Kind regards

  2. #2

    Re: Software Raid 1 and IO-Wait

    Additional info: I tried to remove a few files from the new RAID 1 setup and the rm process went into uninterruptible sleep...

    I found the following in dmesg:

    md: md0: resync done.
    md: resync of RAID array md1
    md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
    md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
    md: using 128k window, over a total of 489171080 blocks.
    RAID1 conf printout:
    --- wd:2 rd:2
    disk 0, wo:0, o:1, dev:sdb1
    disk 1, wo:0, o:1, dev:sdc1
    __ratelimit: 129 callbacks suppressed
    cat[5451] general protection ip:7fe4330d2afd sp:7fff3b2e4f80 error:0 in[7fe4330c8000+1e000]
    attempt to access beyond end of device
    sdb1: rw=1152921504606846977, want=1152921505464877136, limit=975161565
    general protection fault: 0000 [1] SMP
    last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
    CPU 0
    Modules linked in: xt_physdev ppdev parport_pc lp parport usblp joydev st ide_disk ide_cd_mod sco bridge stp bnep rfcomm l2cap bluetooth ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc af_packet ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack ip_tables cpufreq_conservative ip6table_filter cpufreq_userspace cpufreq_powersave ip6_tables powernow_k8 x_tables ipv6 fuse loop raid1 dm_mod snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd rtc_cmos soundcore rtc_core sr_mod button serio_raw r8169 i2c_piix4 wmi rtc_lib mii pcspkr k8temp(N) cdrom fglrx(PX) i2c_core sg usbhid hid ff_memless ehci_hcd ohci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic atiixp ide_core ata_generic pata_atiixp ahci libata scsi_mod dock thermal processor thermal_sys hwmon
    Supported: No
    Pid: 5478, comm: md0_raid1 Tainted: P #1
    RIP: 0010:[<ffffffffa04dabad>] [<ffffffffa04dabad>] raid1_end_write_request+0x1b/0x222 [raid1]
    RSP: 0018:ffff8802048e9d20 EFLAGS: 00010282
    RAX: ffffffffa04dab92 RBX: efff8800b6043c40 RCX: ffff8802048e9c40
    RDX: 0000000000008631 RSI: 00000000fffffffb RDI: ffff8800b6040f40
    RBP: 0000000000000000 R08: 0000000000000014 R09: 0000000000000000
    R10: 000000000000000a R11: ffffffffa04dab92 R12: 1000000000000001
    R13: ffff8800b6040f40 R14: ffff8802048e9ed8 R15: 0000000000000008
    FS: 00007f0c71279710(0000) GS:ffffffff80a40080(0000) knlGS:00000000b7289970
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00007ff100c72028 CR3: 000000017519d000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process md0_raid1 (pid: 5478, threadinfo ffff8802048e8000, task ffff88022a0f8340)
    Stack: 0000000000000001 000000003a1fc8dd ffff8800b6040f40 1000000000000001
    1000000033248050 ffff8802048e9ed8 0000000000000008 ffffffff80339a84
    ffff880229468280 0000000000000000 ffffe20008244700 0000000000000008
    Call Trace:
    [<ffffffff80339a84>] generic_make_request+0x127/0x3dc
    [<ffffffffa04d9c7e>] flush_pending_writes+0x63/0x7e [raid1]
    [<ffffffffa04da727>] raid1d+0x42/0x384 [raid1]
    [<ffffffff80401248>] md_thread+0xe5/0x103
    [<ffffffff8024f9e7>] kthread+0x47/0x73
    [<ffffffff8020cf79>] child_rip+0xa/0x11

    Code: 00 00 20 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e c3 41 57 41 56 41 55 49 89 fd 41 54 55 31 ed 53 48 83 ec 08 48 8b 5f 60 4c 8b 47 18 <48> 8b 7b 18 4c 8d 73 18 48 89 da 48 89 f8 48 c1 e8 03 83 e0
    RIP [<ffffffffa04dabad>] raid1_end_write_request+0x1b/0x222 [raid1]
    RSP <ffff8802048e9d20>
    ---[ end trace a42997f35e89f153 ]---

    This looks as if the partitions are not exactly the same size. I set them up during the install and gave them exactly the same sizes. As the disks are exactly the same model, I didn't really expect problems.
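As a quick sanity check on the numbers in the "attempt to access beyond end of device" message, here is a minimal Python sketch that only does arithmetic on values copied from the log. The helper name `strip_bit60` is made up for illustration. The observation is that both `rw` and `want` have bit 60 set, and that `want` falls inside the device limit once that bit is cleared. One possible reading is that a single bit was flipped somewhere, rather than the partition sizes being wrong, but that is an assumption and not something this log alone can prove.

```python
# Values copied verbatim from the dmesg output above.
rw = 1152921504606846977
want = 1152921505464877136
limit = 975161565            # end of sdb1 in 512-byte sectors, per the kernel
rbx = 0xefff8800b6043c40     # RBX from the register dump

BIT60 = 1 << 60


def strip_bit60(value: int) -> int:
    """Return value with bit 60 cleared (hypothetical helper)."""
    return value & ~BIT60


print(hex(rw))                      # 0x1000000000000001
print(hex(want))                    # 0x1000000033248050
print(strip_bit60(rw))              # 1
print(strip_bit60(want))            # 858030160
print(strip_bit60(want) <= limit)   # True

# Toggling the same bit in RBX gives a value shaped like the other
# kernel pointers in the dump (ffff8800...). Again only an observation.
print(hex(rbx ^ BIT60))             # 0xffff8800b6043c40
```

If the same bit shows up in unrelated values like this, checking the hardware (for example with a memory test) seems a reasonable next step before blaming the partition layout.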

  3. #3

    Re: Software Raid 1 and IO-Wait

    Sorry for adding a third reply...
    I examined /proc/partitions and the output shows exactly the same partition sizes.

       8    16  976762584 sdb
       8    17  487580782 sdb1
       8    18  489171217 sdb2
       8    32  976762584 sdc
       8    33  487580782 sdc1
       8    34  489171217 sdc2
       9     0  487580644 md0
       9     1  489171080 md1
    The resync was still running when the processes went into uninterruptible sleep. Sadly, they stayed in that state after the resyncs had finished.

    For completeness the output of /proc/mdstat:
    Personalities : [raid1] 
    md1 : active raid1 sdc2[1] sdb2[0]
          489171080 blocks super 1.0 [2/2] [UU]
          bitmap: 0/467 pages [0KB], 512KB chunk
    md0 : active raid1 sdc1[1] sdb1[0]
          487580644 blocks super 1.0 [2/2] [UU]
          bitmap: 25/465 pages [100KB], 512KB chunk
    unused devices: <none>
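The size comparison can also be done mechanically. The following is a minimal Python sketch that parses the /proc/partitions lines pasted above and checks each mirror pair; the function name `parse_partitions` and the `MIRRORS` mapping are illustrative, with the pairing taken from the /proc/mdstat output in this post. It assumes /proc/partitions reports sizes in 1 KiB blocks, so doubling a value gives 512-byte sectors for comparison with the `limit=975161565` in the dmesg output.

```python
# Lines copied from the /proc/partitions output above.
PARTITIONS = """\
   8    16  976762584 sdb
   8    17  487580782 sdb1
   8    18  489171217 sdb2
   8    32  976762584 sdc
   8    33  487580782 sdc1
   8    34  489171217 sdc2
   9     0  487580644 md0
   9     1  489171080 md1
"""

# Array membership as shown in /proc/mdstat.
MIRRORS = {"md0": ("sdb1", "sdc1"), "md1": ("sdb2", "sdc2")}


def parse_partitions(text: str) -> dict:
    """Map device name to size in 1 KiB blocks."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have four fields: major, minor, #blocks, name.
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


sizes = parse_partitions(PARTITIONS)
for md, (first, second) in MIRRORS.items():
    same = sizes[first] == sizes[second]
    spare = sizes[first] - sizes[md]
    print(md, "members equal:", same, "member minus array (KiB):", spare)

# sdb1 in 512-byte sectors, for comparison with limit=975161565.
print(sizes["sdb1"] * 2)
```

The members of each pair match, and each array is slightly smaller than its members, presumably because of the space reserved for the superblock and bitmap. The sector count for sdb1 comes out one below the kernel's limit, which would fit a partition with an odd number of sectors being rounded down to whole KiB.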

  4. #4

    Re: Software Raid 1 and IO-Wait

    I noticed a kjournald in uninterruptible sleep. Maybe this is connected to the following kernel bug:

    Can anyone confirm this?
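For anyone trying to reproduce this, here is a minimal Python sketch for listing processes that are currently in uninterruptible sleep (state D), such as the kjournald mentioned above. It reads the state field from /proc/<pid>/stat on Linux. The function names are made up for illustration, and this is just one way to do it; `ps` shows the same state in its STAT column.

```python
import os


def parse_stat(stat_line: str):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc"):
    """Return a list of (pid, comm) for processes in state D."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to scan
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running it a few times in a row helps tell a process that is briefly waiting on I/O from one that is stuck for good.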
