Thread: Leap 15 install on RAID hangs

  1. #1

    Leap 15 install on RAID hangs

    Hello!
    This has happened multiple times now (maybe I should say: every time since Leap 15). I start the installation, create a 500MB RAID1 with EXT4 for /boot and a 60GB RAID10 with XFS for / (layout o2, chunk size 1M), select the "Server" system role, add some packages (about 600MB download, 2.5GB installed size) and let it go.
    After some time (2-5 minutes, not always the same), the install stalls (well, halts, never to continue); the RAID10 rebuild also stalls and never moves any further.

    On the 4th virtual console there are messages about "task md1_resync blocked for more than 480 seconds" and various "kworker/xxx blocked for more than 480 seconds". The disks are all new and checked, with no bad sectors; the machine is a new Dell PowerEdge T30 (this also happened on a random old PC with two good disks, and with SSDs).

    After a reboot, if I let the RAID finish syncing, I can complete the install using "current partitions" and only running mkfs on them again. If I create the RAIDs from scratch, I get the same problem. It looks like a race condition between writes to the RAID and its first sync (should I mention that this is officially supported?).

    I have been doing this for years the same way, and this kind of problem never happened before.

    Has anybody seen such behavior?

    Best regards,
    Sinisa

  2. #2

    Re: Leap 15 install on RAID hangs

    You might consider creating your RAID 10 array <after> you've completed your installation...

    To my eye, this SLES documentation is completely applicable to openSUSE (including openSUSE 15)

    https://www.suse.com/documentation/s...tml#cha.raid10

    TSU

  3. #3

    Re: Leap 15 install on RAID hangs

    Originally posted by tsu2:
        You might consider creating your RAID 10 array <after> you've completed your installation...

    That is not really a solution: that way I cannot have rootfs on RAID10 (and I really want that). I am already doing that for other filesystems (/home, /data, ...).

    It seems to me that there is a race condition between the RAID10 sync and (XFS?) writes to the same RAID volume, but I am not enough of a developer to pinpoint the problem.

    My workaround for now is to create the md arrays for /boot and / in advance, then start the install...
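
    A minimal sketch of what "creating the arrays in advance" could look like, written as a small Python helper that only prints the mdadm command lines and does not run them. The device and array names (/dev/sda1, /dev/md0, ...) are assumptions for illustration and do not come from this thread; the layout and chunk size follow the description in the first post. The final --wait call assumes the point of the workaround is to let the initial resync finish before the installer writes anything.

```python
import shlex

def mdadm_create(md, level, devices, extra=()):
    """Build (but do not run) an 'mdadm --create' command as an argv list."""
    return ["mdadm", "--create", md, f"--level={level}",
            f"--raid-devices={len(devices)}", *extra, *devices]

# Hypothetical partitions: adjust to the real disks before use.
boot = mdadm_create("/dev/md0", 1, ["/dev/sda1", "/dev/sdb1"])
root = mdadm_create("/dev/md1", 10, ["/dev/sda2", "/dev/sdb2"],
                    extra=["--layout=o2", "--chunk=1024"])  # chunk in KiB, i.e. 1M
wait = ["mdadm", "--wait", "/dev/md1"]  # returns once resync activity has finished

for argv in (boot, root, wait):
    print(shlex.join(argv))
```

    Printing instead of executing keeps the sketch harmless; the lines can be reviewed and pasted into a rescue shell once the device names are correct.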

    I'm setting up a test machine to test with different filesystems (EXT4, Btrfs) and RAID layouts; I will get back with results.

    Sinisa

  4. #4

    Re: Leap 15 install on RAID hangs

    So it happened again on a new machine: AMD Ryzen, 32GB RAM, 2x WD Red 2TB (I know, not a "server" disk, but good enough for testing).
    I started a "Net" install and created a 500MB RAID1 with EXT4 for /boot and a 40GB RAID10 (layout o2) with XFS for /. When the installation reached 9% it stopped. I switched to vc2; /proc/mdstat says the sync is at 21.8% and not moving any further...

    15 minutes later, dmesg says:
    [ 1463.260482] INFO: task kworker/2:1:55 blocked for more than 480 seconds.
    [ 1463.260485] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.260486] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.260488] kworker/2:1 D 0 55 2 0x00000000
    [ 1463.260534] Workqueue: xfs-eofblocks/md1 xfs_eofblocks_worker [xfs]
    [ 1463.260536] Call Trace:
    [ 1463.260546] ? __schedule+0x23f/0x870
    [ 1463.260549] schedule+0x28/0x80
    [ 1463.260552] rwsem_down_write_failed+0x153/0x320
    [ 1463.260594] ? xlog_grant_head_check+0x42/0xd0 [xfs]
    [ 1463.260599] ? call_rwsem_down_write_failed+0x13/0x20
    [ 1463.260601] call_rwsem_down_write_failed+0x13/0x20
    [ 1463.260605] down_write+0x20/0x30
    [ 1463.260641] xfs_free_eofblocks+0x11a/0x1c0 [xfs]
    [ 1463.260678] xfs_inode_free_eofblocks+0x179/0x1b0 [xfs]
    [ 1463.260713] ? xfs_inode_ag_walk_grab+0x5f/0x90 [xfs]
    [ 1463.260744] xfs_inode_ag_walk.isra.14+0x191/0x420 [xfs]
    [ 1463.260776] ? __xfs_inode_clear_eofblocks_tag+0x120/0x120 [xfs]
    [ 1463.260781] ? load_balance+0x13c/0x920
    [ 1463.260785] ? sched_clock+0x5/0x10
    [ 1463.260816] ? __xfs_inode_clear_eofblocks_tag+0x120/0x120 [xfs]
    [ 1463.260819] ? radix_tree_gang_lookup_tag+0xc4/0x130
    [ 1463.260849] ? __xfs_inode_clear_eofblocks_tag+0x120/0x120 [xfs]
    [ 1463.260879] xfs_inode_ag_iterator_tag+0x73/0xb0 [xfs]
    [ 1463.260910] xfs_eofblocks_worker+0x29/0x40 [xfs]
    [ 1463.260915] process_one_work+0x1da/0x3f0
    [ 1463.260919] worker_thread+0x2b/0x3f0
    [ 1463.260922] ? process_one_work+0x3f0/0x3f0
    [ 1463.260925] kthread+0x11a/0x130
    [ 1463.260928] ? kthread_create_on_node+0x40/0x40
    [ 1463.260930] ret_from_fork+0x22/0x40
    [ 1463.260934] INFO: task kworker/0:2:118 blocked for more than 480 seconds.
    [ 1463.260936] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.260936] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.260937] kworker/0:2 D 0 118 2 0x00000000
    [ 1463.260947] Workqueue: md md_submit_flush_data [md_mod]
    [ 1463.260948] Call Trace:
    [ 1463.260952] ? __schedule+0x23f/0x870
    [ 1463.260955] schedule+0x28/0x80
    [ 1463.260959] wait_barrier+0x11c/0x170 [raid10]
    [ 1463.260963] ? wait_woken+0x80/0x80
    [ 1463.260966] raid10_write_request+0x178/0x910 [raid10]
    [ 1463.260969] ? wait_woken+0x80/0x80
    [ 1463.260973] ? mempool_alloc+0x55/0x160
    [ 1463.260976] raid10_make_request+0xbc/0x130 [raid10]
    [ 1463.260979] ? wait_woken+0x80/0x80
    [ 1463.260985] md_make_request+0x93/0x230 [md_mod]
    [ 1463.260990] generic_make_request+0x101/0x2e0
    [ 1463.260994] ? raid10_write_request+0x6cc/0x910 [raid10]
    [ 1463.260997] raid10_write_request+0x6cc/0x910 [raid10]
    [ 1463.260999] ? wait_woken+0x80/0x80
    [ 1463.261002] ? mempool_alloc+0x55/0x160
    [ 1463.261004] ? sched_clock+0x5/0x10
    [ 1463.261007] ? sched_clock_cpu+0xc/0xb0
    [ 1463.261010] ? pick_next_task_fair+0x494/0x530
    [ 1463.261013] raid10_make_request+0xbc/0x130 [raid10]
    [ 1463.261019] md_submit_flush_data+0x36/0x70 [md_mod]
    [ 1463.261022] process_one_work+0x1da/0x3f0
    [ 1463.261026] worker_thread+0x2b/0x3f0
    [ 1463.261029] ? process_one_work+0x3f0/0x3f0
    [ 1463.261031] kthread+0x11a/0x130
    [ 1463.261034] ? kthread_create_on_node+0x40/0x40
    [ 1463.261036] ret_from_fork+0x22/0x40
    [ 1463.261062] INFO: task kworker/u32:0:5018 blocked for more than 480 seconds.
    [ 1463.261063] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.261064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.261065] kworker/u32:0 D 0 5018 2 0x00000000
    [ 1463.261072] Workqueue: writeback wb_workfn (flush-9:1)
    [ 1463.261074] Call Trace:
    [ 1463.261077] ? __schedule+0x23f/0x870
    [ 1463.261080] schedule+0x28/0x80
    [ 1463.261083] wait_barrier+0x11c/0x170 [raid10]
    [ 1463.261086] ? wait_woken+0x80/0x80
    [ 1463.261089] raid10_write_request+0x178/0x910 [raid10]
    [ 1463.261091] ? wait_woken+0x80/0x80
    [ 1463.261094] ? mempool_alloc+0x55/0x160
    [ 1463.261097] raid10_make_request+0xbc/0x130 [raid10]
    [ 1463.261099] ? wait_woken+0x80/0x80
    [ 1463.261105] md_make_request+0x93/0x230 [md_mod]
    [ 1463.261109] ? pagevec_lookup_tag+0x1d/0x30
    [ 1463.261111] ? write_cache_pages+0xdf/0x430
    [ 1463.261113] generic_make_request+0x101/0x2e0
    [ 1463.261116] ? submit_bio+0x6c/0x140
    [ 1463.261118] submit_bio+0x6c/0x140
    [ 1463.261152] xfs_submit_ioend+0x70/0x1a0 [xfs]
    [ 1463.261186] xfs_vm_writepages+0xaa/0xc0 [xfs]
    [ 1463.261189] do_writepages+0x3c/0xd0
    [ 1463.261195] ? ata_scsi_security_inout_xlat+0x140/0x140
    [ 1463.261198] ? ata_scsi_translate+0xce/0x1a0
    [ 1463.261200] ? __writeback_single_inode+0x3d/0x320
    [ 1463.261202] __writeback_single_inode+0x3d/0x320
    [ 1463.261205] ? fprop_reflect_period_percpu.isra.5+0x70/0xb0
    [ 1463.261208] writeback_sb_inodes+0x18a/0x430
    [ 1463.261211] __writeback_inodes_wb+0x5d/0xb0
    [ 1463.261214] wb_writeback+0x243/0x2d0
    [ 1463.261217] ? wb_workfn+0x16d/0x3f0
    [ 1463.261219] wb_workfn+0x16d/0x3f0
    [ 1463.261222] process_one_work+0x1da/0x3f0
    [ 1463.261226] worker_thread+0x2b/0x3f0
    [ 1463.261229] ? process_one_work+0x3f0/0x3f0
    [ 1463.261231] kthread+0x11a/0x130
    [ 1463.261233] ? kthread_create_on_node+0x40/0x40
    [ 1463.261237] ? do_syscall_64+0x7b/0x140
    [ 1463.261240] ? SyS_exit_group+0x10/0x10
    [ 1463.261242] ret_from_fork+0x22/0x40
    [ 1463.261245] INFO: task md1_resync:5145 blocked for more than 480 seconds.
    [ 1463.261247] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.261247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.261248] md1_resync D 0 5145 2 0x00000000
    [ 1463.261251] Call Trace:
    [ 1463.261254] ? __schedule+0x23f/0x870
    [ 1463.261256] ? wait_woken+0x80/0x80
    [ 1463.261258] schedule+0x28/0x80
    [ 1463.261262] raise_barrier+0x83/0x160 [raid10]
    [ 1463.261264] ? wait_woken+0x80/0x80
    [ 1463.261268] raid10_sync_request+0x1ea/0x1d50 [raid10]
    [ 1463.261275] ? is_mddev_idle+0xc9/0x109 [md_mod]
    [ 1463.261282] ? is_mddev_idle+0xa4/0x109 [md_mod]
    [ 1463.261288] md_do_sync+0x882/0xe90 [md_mod]
    [ 1463.261292] ? cpumask_next_and+0x26/0x40
    [ 1463.261294] ? wait_woken+0x80/0x80
    [ 1463.261301] ? find_pers+0x70/0x70 [md_mod]
    [ 1463.261306] ? md_thread+0x10d/0x140 [md_mod]
    [ 1463.261312] md_thread+0x10d/0x140 [md_mod]
    [ 1463.261315] kthread+0x11a/0x130
    [ 1463.261317] ? kthread_create_on_node+0x40/0x40
    [ 1463.261320] ? do_syscall_64+0x7b/0x140
    [ 1463.261323] ? SyS_exit_group+0x10/0x10
    [ 1463.261325] ret_from_fork+0x22/0x40
    [ 1463.261331] INFO: task xfsaild/md1:5167 blocked for more than 480 seconds.
    [ 1463.261332] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.261332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.261333] xfsaild/md1 D 0 5167 2 0x00000000
    [ 1463.261335] Call Trace:
    [ 1463.261339] ? __schedule+0x23f/0x870
    [ 1463.261341] schedule+0x28/0x80
    [ 1463.261345] wait_barrier+0x11c/0x170 [raid10]
    [ 1463.261347] ? wait_woken+0x80/0x80
    [ 1463.261350] raid10_write_request+0x178/0x910 [raid10]
    [ 1463.261352] ? wait_woken+0x80/0x80
    [ 1463.261355] ? mempool_alloc+0x55/0x160
    [ 1463.261358] raid10_make_request+0xbc/0x130 [raid10]
    [ 1463.261360] ? wait_woken+0x80/0x80
    [ 1463.261366] md_make_request+0x93/0x230 [md_mod]
    [ 1463.261371] ? crc32c_pcl_intel_update+0x93/0xa0 [crc32c_intel]
    [ 1463.261374] generic_make_request+0x101/0x2e0
    [ 1463.261377] ? submit_bio+0x6c/0x140
    [ 1463.261378] submit_bio+0x6c/0x140
    [ 1463.261412] _xfs_buf_ioapply+0x2fa/0x4a0 [xfs]
    [ 1463.261445] ? xfs_buf_delwri_submit_buffers+0xe8/0x260 [xfs]
    [ 1463.261476] ? xfs_buf_submit+0x61/0x210 [xfs]
    [ 1463.261506] xfs_buf_submit+0x61/0x210 [xfs]
    [ 1463.261537] xfs_buf_delwri_submit_buffers+0xe8/0x260 [xfs]
    [ 1463.261579] ? xfsaild+0x343/0x710 [xfs]
    [ 1463.261619] ? xfsaild+0x343/0x710 [xfs]
    [ 1463.261656] xfsaild+0x343/0x710 [xfs]
    [ 1463.261693] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
    [ 1463.261696] ? kthread+0x11a/0x130
    [ 1463.261698] kthread+0x11a/0x130
    [ 1463.261700] ? kthread_create_on_node+0x40/0x40
    [ 1463.261703] ? do_syscall_64+0x7b/0x140
    [ 1463.261705] ? SyS_exit_group+0x10/0x10
    [ 1463.261707] ret_from_fork+0x22/0x40
    [ 1463.261712] INFO: task rpm:5337 blocked for more than 480 seconds.
    [ 1463.261713] Not tainted 4.12.14-lp150.11-default #1
    [ 1463.261713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.261714] rpm D 0 5337 3865 0x00000000
    [ 1463.261716] Call Trace:
    [ 1463.261720] ? __schedule+0x23f/0x870
    [ 1463.261723] schedule+0x28/0x80
    [ 1463.261758] _xfs_log_force_lsn+0x1d5/0x310 [xfs]
    [ 1463.261761] ? file_check_and_advance_wb_err+0x2c/0xc0
    [ 1463.261764] ? wake_up_q+0x70/0x70
    [ 1463.261793] xfs_file_fsync+0xda/0x1a0 [xfs]
    [ 1463.261796] do_fsync+0x38/0x60
    [ 1463.261799] SyS_fdatasync+0xf/0x20
    [ 1463.261801] do_syscall_64+0x7b/0x140
    [ 1463.261804] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 1463.261807] RIP: 0033:0x7fb6ebfdf1a4
    [ 1463.261808] RSP: 002b:00007ffd2d96bd98 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
    [ 1463.261810] RAX: ffffffffffffffda RBX: 0000000000bf4350 RCX: 00007fb6ebfdf1a4
    [ 1463.261812] RDX: 0000000000be43a0 RSI: 0000000000bf4350 RDI: 0000000000000004
    [ 1463.261813] RBP: 0000000000b5d170 R08: 0000000000c853b0 R09: 00007fb6ecd07b60
    [ 1463.261814] R10: 0000000000c83758 R11: 0000000000000246 R12: 0000000000000000
    [ 1463.261815] R13: 0000000000000064 R14: 0000000000010830 R15: 0000000000c83708
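
    To make a trace like the one above easier to read, here is a rough Python sketch that lists each blocked task together with the first reliable stack frame below schedule(). The function name summarize_hung_tasks and the regular expressions are made up for this illustration; the sample text is a shortened excerpt of the log above.

```python
import re

TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def summarize_hung_tasks(dmesg_text):
    """Map each blocked task name to the function it is waiting in."""
    result, current = {}, None
    for line in dmesg_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = task.group(1)
            continue
        frame = FRAME_RE.search(line)
        if not frame or current is None or current in result:
            continue
        unreliable, func = frame.group(1), frame.group(2)
        # Frames marked "?" are stack leftovers; schedule() itself says nothing.
        if unreliable or func in ("schedule", "__schedule"):
            continue
        result[current] = func
    return result

SAMPLE = """\
[ 1463.261245] INFO: task md1_resync:5145 blocked for more than 480 seconds.
[ 1463.261254] ? __schedule+0x23f/0x870
[ 1463.261256] ? wait_woken+0x80/0x80
[ 1463.261258] schedule+0x28/0x80
[ 1463.261262] raise_barrier+0x83/0x160 [raid10]
[ 1463.261331] INFO: task xfsaild/md1:5167 blocked for more than 480 seconds.
[ 1463.261339] ? __schedule+0x23f/0x870
[ 1463.261341] schedule+0x28/0x80
[ 1463.261345] wait_barrier+0x11c/0x170 [raid10]
"""

for name, func in summarize_hung_tasks(SAMPLE).items():
    print(f"{name}: {func}")
```

    On this excerpt the resync thread is reported in raise_barrier and the XFS writer in wait_barrier, both in the raid10 module, which at least fits the suspicion of a race between the initial sync and filesystem writes.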

  5. #5

    Re: Leap 15 install on RAID hangs

    I just did a fresh Leap 42.3 install the same way: same PC, Net install, the same partitions (after deleting everything first), same RAID config, and it completed as expected.

    Will try 15.0 again a bit later...

  6. #6

    Re: Leap 15 install on RAID hangs

    Just tested with the latest Leap 15.1 Alpha: Dell PowerEdge T30, two 1TB HDDs in AHCI mode.

    I ran the install from the NET CD and created two partitions on both disks, the first 500MB and the second 60GB. I then created a RAID1 mirror over the 500MB partitions for /boot and a RAID10 over the 60GB partitions for / (with 1MB stripe size and o2 layout), continued to the Server selection and started the install.

    Everything was working OK until 22%, when it stopped. /proc/mdstat says resync is at 32.7%. Waited 15 minutes, then rebooted...


    Tried Leap 15.0: deleted all partitions, created them all again the same way, and started the install, which stopped at 18% (RAID resync at 23.5%).

    Back to 42.3: deleted all partitions, created them all again the same way, and started the install, which finished without problems.


    I'd say there is definitely something wrong with LEAP 15+ ...



    Best regards,
    Sinisa

  7. #7

    Re: Leap 15 install on RAID hangs

    Hello everyone,

    I feel like I'm talking to myself, but here it is again:

    New setup with two 120GB SSDs: Net install of Leap 15.0, server selection, 500MB /boot on RAID1 and 110GB / on RAID10, o2 layout, XFS.

    Everything was going smoothly until 60%, when it stopped. Back on VT-2, cat /proc/mdstat says the resync is at 89.9% and never moves any further.

    There were no strange messages in dmesg, nor on the other VTs.



    Next, I tried everything the same way, except that I paused the package installation by clicking Abort, waited until the initial sync was over, and then clicked "Continue installation". Everything went OK.
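
    The "wait until the initial sync is over" step can be checked from a console instead of by eye. The following is a minimal Python sketch, assuming the usual /proc/mdstat text layout; the function names and the sample text are made up for illustration (only the 21.8% figure echoes an earlier post). It returns False when the percentage stops advancing, which is the symptom described in this thread.

```python
import re
import time

MD_RE = re.compile(r"^(md\d+) :")
SYNC_RE = re.compile(r"(?:resync|recovery)\s*=\s*(?:([\d.]+)%|DELAYED|PENDING)")

def resync_progress(mdstat_text):
    """Return {array: percent} for arrays still syncing (0.0 if only queued)."""
    progress, current = {}, None
    for line in mdstat_text.splitlines():
        m = MD_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = SYNC_RE.search(line)
        if s and current:
            progress[current] = float(s.group(1)) if s.group(1) else 0.0
    return progress

def wait_for_sync(read_mdstat, interval=30, stall_polls=10, sleep=time.sleep):
    """True once no resync is running, False if progress stops advancing."""
    last, unchanged = None, 0
    while True:
        progress = resync_progress(read_mdstat())
        if not progress:
            return True
        unchanged = unchanged + 1 if progress == last else 0
        if unchanged >= stall_polls:
            return False
        last = progress
        sleep(interval)

def read_proc_mdstat():
    with open("/proc/mdstat") as fh:
        return fh.read()

# Illustrative sample only; block counts and speeds are invented.
SAMPLE = """\
Personalities : [raid1] [raid10]
md1 : active raid10 sdb2[1] sda2[0]
      62881792 blocks super 1.2 2 offset-copies [2/2] [UU]
      [====>................]  resync = 21.8% (13711360/62881792) finish=4.1min speed=196000K/sec

md0 : active raid1 sdb1[1] sda1[0]
      511936 blocks super 1.2 [2/2] [UU]
"""

print(resync_progress(SAMPLE))
# On a real system: wait_for_sync(read_proc_mdstat)
```

    Passing the reader and the sleep function as parameters keeps the polling loop testable without a real md array.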


    Since 15.1 is still in alpha and has the same issue, I'd like to see this fixed before release; I don't think anything can be done for 15.0.

    Best regards,
    Sinisa

  8. #8

    Re: Leap 15 install on RAID hangs

    If you want things fixed, report it on Bugzilla. All of us here are just users.

  9. #9

    Re: Leap 15 install on RAID hangs

    Originally posted by gogalthorp:
        If you want things fixed report on bugzilla all here are just users

    Well, the title says:


    so I thought that I might get some Technical Help here...

    Will try bugzilla.

  10. #10

    Re: Leap 15 install on RAID hangs

    We help if we can, but in general we do not fix bugs.

    Complex RAID environments can be tricky; in any case, you really need to know the ins and outs.
