Hi.
The v4.12.8 kernel hangs in the I/O path after resuming from suspend-to-RAM. I
have blk-mq enabled and have tried both the BFQ and mq-deadline schedulers with
the same result. Tasks get stuck in uninterruptible sleep, and the hung task
detector prints the stack traces I'm pasting below.
The stack traces show that I/O hangs in md_super_wait(), which waits for "all
superblock writes that were scheduled to complete". Given the word "scheduled",
should I also try the "none" scheduler with blk-mq enabled?
While I'm trying to reproduce it in a VM without much luck (it happens on my
laptop rarely, roughly 1 out of 10 suspend-resume cycles) and re-checking it
with blk-mq disabled: is this by any chance something already known?
Ideally, I'd like to reproduce it in a VM and capture a vmcore.
Any suggestions are welcome. Thanks.
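For reference, switching the scheduler at runtime on the backing devices is
what I have in mind (a minimal sketch; sdX is a placeholder for the real disks
under the dm-crypt/md stack):

cat /sys/block/sdX/queue/scheduler           # the active scheduler is shown in brackets
echo none > /sys/block/sdX/queue/scheduler   # "none" is available with blk-mq enabled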
===
[ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120 seconds.
[ 9460.165983] Not tainted 4.12.0-pf7 #1
[ 9460.165989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.165996] md0_raid10 D 0 225 2 0x00000000
[ 9460.166005] Call Trace:
[ 9460.166027] __schedule+0x6e7/0xcd0
[ 9460.166043] schedule+0x3d/0xd0
[ 9460.166067] md_super_wait+0x6e/0xa0 [md_mod]
[ 9460.166130] ? wake_bit_function+0x60/0x60
[ 9460.166143] write_page+0x185/0x310 [md_mod]
[ 9460.166156] bitmap_update_sb+0x12f/0x140 [md_mod]
[ 9460.166167] md_update_sb.part.60+0x354/0x840 [md_mod]
[ 9460.166179] ? percpu_ref_switch_to_percpu+0x36/0x40
[ 9460.166190] md_check_recovery+0x214/0x4b0 [md_mod]
[ 9460.166201] raid10d+0x62/0x13c0 [raid10]
[ 9460.166213] ? lock_timer_base+0x81/0xa0
[ 9460.166222] ? try_to_del_timer_sync+0x53/0x80
[ 9460.166232] ? del_timer_sync+0x39/0x40
[ 9460.166246] ? schedule_timeout+0x19b/0x330
[ 9460.166261] ? call_timer_fn+0x160/0x160
[ 9460.166279] md_thread+0x120/0x160 [md_mod]
[ 9460.166290] ? md_thread+0x120/0x160 [md_mod]
[ 9460.166298] ? wake_bit_function+0x60/0x60
[ 9460.166309] kthread+0x124/0x140
[ 9460.166319] ? find_pers+0x70/0x70 [md_mod]
[ 9460.166353] ? kthread_create_on_node+0x70/0x70
[ 9460.166360] ret_from_fork+0x25/0x30
[ 9460.166371] INFO: task dmcrypt_write:254 blocked for more than 120 seconds.
[ 9460.166376] Not tainted 4.12.0-pf7 #1
[ 9460.166381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.166386] dmcrypt_write D 0 254 2 0x00000000
[ 9460.166492] Call Trace:
[ 9460.166501] __schedule+0x6e7/0xcd0
[ 9460.166511] schedule+0x3d/0xd0
[ 9460.166522] md_write_start+0xe3/0x270 [md_mod]
[ 9460.166529] ? wake_bit_function+0x60/0x60
[ 9460.166538] raid10_make_request+0x3f/0x140 [raid10]
[ 9460.166549] md_make_request+0xa9/0x2a0 [md_mod]
[ 9460.166557] ? __schedule+0x6ef/0xcd0
[ 9460.166567] generic_make_request+0x11e/0x2f0
[ 9460.166578] dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 9460.166586] ? dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 9460.166596] ? wake_up_q+0x80/0x80
[ 9460.166605] kthread+0x124/0x140
[ 9460.166613] ? kthread+0x124/0x140
[ 9460.166621] ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
[ 9460.166628] ? kthread_create_on_node+0x70/0x70
[ 9460.166635] ret_from_fork+0x25/0x30
[ 9460.166677] INFO: task konversation:7110 blocked for more than 120 seconds.
[ 9460.166683] Not tainted 4.12.0-pf7 #1
[ 9460.166687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.166693] konversation D 0 7110 6972 0x00000000
[ 9460.166700] Call Trace:
[ 9460.166709] __schedule+0x6e7/0xcd0
[ 9460.166719] schedule+0x3d/0xd0
[ 9460.166824] io_schedule+0x16/0x40
[ 9460.166833] wait_on_page_bit+0xeb/0x130
[ 9460.166841] ? page_cache_tree_insert+0xc0/0xc0
[ 9460.166882] prepare_pages+0x1b5/0x1d0 [btrfs]
[ 9460.166915] __btrfs_buffered_write+0x267/0x6d0 [btrfs]
[ 9460.166948] btrfs_file_write_iter+0x1c1/0x500 [btrfs]
[ 9460.166956] ? cp_new_stat+0x152/0x180
[ 9460.166966] __vfs_write+0xe4/0x140
[ 9460.166975] vfs_write+0xb1/0x1a0
[ 9460.166984] SyS_write+0x55/0xc0
[ 9460.166994] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167001] RIP: 0033:0x7fd020f09c0d
[ 9460.167006] RSP: 002b:00007ffe27e70360 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[ 9460.167013] RAX: ffffffffffffffda RBX: 0000000003178840 RCX:
00007fd020f09c0d
[ 9460.167019] RDX: 000000000000007d RSI: 0000000003241598 RDI:
000000000000002b
[ 9460.167024] RBP: 00007fd0211c72a0 R08: 00007fd021a55010 R09:
0000000000000030
[ 9460.167029] R10: 00007fd0211caad8 R11: 0000000000000293 R12:
0000000000000000
[ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15:
0000000000000001
[ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120
seconds.
[ 9460.167080] Not tainted 4.12.0-pf7 #1
[ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.167089] akonadi_imap_re D 0 7363 7107 0x00000000
[ 9460.167096] Call Trace:
[ 9460.167105] __schedule+0x6e7/0xcd0
[ 9460.167114] schedule+0x3d/0xd0
[ 9460.167149] wait_for_writer+0xb1/0xe0 [btrfs]
[ 9460.167157] ? wake_bit_function+0x60/0x60
[ 9460.167190] btrfs_sync_log+0xe2/0xad0 [btrfs]
[ 9460.167201] ? dput.part.24+0x51/0x1d0
[ 9460.167210] ? dput+0x13/0x20
[ 9460.167242] ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
[ 9460.167275] btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167302] ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167313] vfs_fsync_range+0x4b/0xb0
[ 9460.167320] do_fsync+0x3d/0x70
[ 9460.167327] SyS_fdatasync+0x13/0x20
[ 9460.167336] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167342] RIP: 0033:0x7fcdf0d7522d
[ 9460.167348] RSP: 002b:00007ffdcb64e9a0 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.167355] RAX: ffffffffffffffda RBX: 0000007b8c7c9f80 RCX:
00007fcdf0d7522d
[ 9460.167361] RDX: 0000000000000023 RSI: 0000007b8c7ca228 RDI:
0000000000000010
[ 9460.167366] RBP: 0000000000000001 R08: 0000007b8c7c9270 R09:
000000000000002f
[ 9460.167371] R10: 0000007b8c7c9830 R11: 0000000000000293 R12:
0000007b8c7b88f0
[ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15:
0000007b8c7b66b0
[ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120
seconds.
[ 9460.167492] Not tainted 4.12.0-pf7 #1
[ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.167501] akonadi_maildis D 0 7366 7107 0x00000000
[ 9460.167507] Call Trace:
[ 9460.167516] __schedule+0x6e7/0xcd0
[ 9460.167525] schedule+0x3d/0xd0
[ 9460.167557] wait_log_commit+0xd6/0x100 [btrfs]
[ 9460.167564] ? wake_bit_function+0x60/0x60
[ 9460.167593] btrfs_sync_log+0x13e/0xad0 [btrfs]
[ 9460.167603] ? dput.part.24+0x51/0x1d0
[ 9460.167611] ? dput+0x13/0x20
[ 9460.167640] ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
[ 9460.167670] btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167696] ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167705] vfs_fsync_range+0x4b/0xb0
[ 9460.167712] do_fsync+0x3d/0x70
[ 9460.167718] SyS_fdatasync+0x13/0x20
[ 9460.167727] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167733] RIP: 0033:0x7ff91da0822d
[ 9460.167737] RSP: 002b:00007ffdbded6fb0 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.167744] RAX: ffffffffffffffda RBX: 000000841e8cc480 RCX:
00007ff91da0822d
[ 9460.167749] RDX: 0000000000000028 RSI: 000000841e902608 RDI:
000000000000000f
[ 9460.167754] RBP: 0000000000000001 R08: 000000841e8aff00 R09:
000000000000002f
[ 9460.167760] R10: 000000841e91a1d0 R11: 0000000000000293 R12:
000000841e910cc0
[ 9460.167764] R13: 0000000000000000 R14: 000000000000000f R15:
000000841e9165c0
[ 9460.167786] INFO: task chromium:7600 blocked for more than 120 seconds.
[ 9460.167791] Not tainted 4.12.0-pf7 #1
[ 9460.167796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.167800] chromium D 0 7600 7020 0x00000000
[ 9460.167807] Call Trace:
[ 9460.167815] __schedule+0x6e7/0xcd0
[ 9460.167824] schedule+0x3d/0xd0
[ 9460.167831] io_schedule+0x16/0x40
[ 9460.167838] wait_on_page_bit+0xeb/0x130
[ 9460.167847] ? page_cache_tree_insert+0xc0/0xc0
[ 9460.167879] btrfs_page_mkwrite+0x1bf/0x400 [btrfs]
[ 9460.167889] do_page_mkwrite+0x31/0x90
[ 9460.167896] ? current_time+0x38/0x70
[ 9460.167905] do_wp_page+0x200/0x4b0
[ 9460.167911] ? file_update_time+0x60/0x110
[ 9460.167920] __handle_mm_fault+0x832/0xd50
[ 9460.167930] handle_mm_fault+0xde/0x220
[ 9460.167938] __do_page_fault+0x24d/0x510
[ 9460.167946] do_page_fault+0x22/0x30
[ 9460.167954] page_fault+0x28/0x30
[ 9460.167959] RIP: 0033:0x64de23cd9c
[ 9460.167964] RSP: 002b:00007ffdfd4ad4c0 EFLAGS: 00010202
[ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX:
0000398ce8da6510
[ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI:
0000398cef048700
[ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09:
0000000000000000
[ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12:
0000000000000001
[ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:
0000000000000000
[ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120
seconds.
[ 9460.168019] Not tainted 4.12.0-pf7 #1
[ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.168028] BrowserBlocking D 0 7639 7020 0x00000000
[ 9460.168035] Call Trace:
[ 9460.168044] __schedule+0x6e7/0xcd0
[ 9460.168054] schedule+0x3d/0xd0
[ 9460.168061] io_schedule+0x16/0x40
[ 9460.168068] wait_on_page_bit_common+0xe3/0x180
[ 9460.168076] ? page_cache_tree_insert+0xc0/0xc0
[ 9460.168083] __filemap_fdatawait_range+0x12a/0x1a0
[ 9460.168093] filemap_fdatawait_range+0x14/0x30
[ 9460.168128] btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
[ 9460.168157] ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[ 9460.168186] btrfs_sync_file+0x17c/0x3f0 [btrfs]
[ 9460.168195] vfs_fsync_range+0x4b/0xb0
[ 9460.168202] do_fsync+0x3d/0x70
[ 9460.168209] SyS_fdatasync+0x13/0x20
[ 9460.168218] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.168223] RIP: 0033:0x7f80b8fcd22d
[ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX:
00007f80b8fcd22d
[ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI:
0000000000000119
[ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09:
0000398ce88693c0
[ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12:
00007f808c8b77d8
[ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:
0000398cef877800
[ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120
seconds.
[ 9460.168272] Not tainted 4.12.0-pf7 #1
[ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.168282] Chrome_SyncThre D 0 7867 7020 0x00000000
[ 9460.168289] Call Trace:
[ 9460.168297] __schedule+0x6e7/0xcd0
[ 9460.168306] schedule+0x3d/0xd0
[ 9460.168338] wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
[ 9460.168348] ? wake_bit_function+0x60/0x60
[ 9460.168377] btrfs_log_changed_extents+0x28a/0x620 [btrfs]
[ 9460.168412] ? release_extent_buffer+0x98/0xb0 [btrfs]
[ 9460.168444] btrfs_log_inode+0x97e/0x1140 [btrfs]
[ 9460.168668] btrfs_log_inode_parent+0x277/0xa30 [btrfs]
[ 9460.168676] ? touch_atime+0xc1/0xd0
[ 9460.168686] ? refcount_inc+0x9/0x30
[ 9460.168719] ? join_transaction+0x122/0x450 [btrfs]
[ 9460.168750] btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[ 9460.168779] btrfs_sync_file+0x2b3/0x3f0 [btrfs]
[ 9460.168787] vfs_fsync_range+0x4b/0xb0
[ 9460.168794] do_fsync+0x3d/0x70
[ 9460.168801] SyS_fdatasync+0x13/0x20
[ 9460.168810] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.168815] RIP: 0033:0x7f80b8fcd22d
[ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX:
00007f80b8fcd22d
[ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
00000000000000fb
[ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09:
0000000000000001
[ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12:
00000000004b0000
[ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:
0000398ceb9a1048
[ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120
seconds.
[ 9460.168858] Not tainted 4.12.0-pf7 #1
[ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.168867] BrowserBlocking D 0 8011 7020 0x00000000
[ 9460.168873] Call Trace:
[ 9460.168882] __schedule+0x6e7/0xcd0
[ 9460.168892] schedule+0x3d/0xd0
[ 9460.168898] io_schedule+0x16/0x40
[ 9460.168905] wait_on_page_bit_common+0xe3/0x180
[ 9460.168912] ? page_cache_tree_insert+0xc0/0xc0
[ 9460.168919] __filemap_fdatawait_range+0x12a/0x1a0
[ 9460.168929] filemap_fdatawait_range+0x14/0x30
[ 9460.168961] btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
[ 9460.168989] ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[ 9460.169015] btrfs_sync_file+0x17c/0x3f0 [btrfs]
[ 9460.169025] vfs_fsync_range+0x4b/0xb0
[ 9460.169033] do_fsync+0x3d/0x70
[ 9460.169040] SyS_fdatasync+0x13/0x20
[ 9460.169050] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.169055] RIP: 0033:0x7f80b8fcd22d
[ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX:
00007f80b8fcd22d
[ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI:
000000000000011d
[ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09:
0000398ce8870200
[ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12:
00007f807d5f17d8
[ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:
0000398cece75000
[ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120
seconds.
[ 9460.169103] Not tainted 4.12.0-pf7 #1
[ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 9460.169113] TaskSchedulerBa D 0 9603 7020 0x00000000
[ 9460.169120] Call Trace:
[ 9460.169128] __schedule+0x6e7/0xcd0
[ 9460.169138] schedule+0x3d/0xd0
[ 9460.169166] wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
[ 9460.169175] ? wake_bit_function+0x60/0x60
[ 9460.169204] btrfs_log_changed_extents+0x28a/0x620 [btrfs]
[ 9460.169237] ? release_extent_buffer+0x98/0xb0 [btrfs]
[ 9460.169265] btrfs_log_inode+0x97e/0x1140 [btrfs]
[ 9460.169278] ? __set_page_dirty_nobuffers+0x117/0x150
[ 9460.169310] ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 9460.169346] ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
[ 9460.169378] btrfs_log_inode_parent+0x277/0xa30 [btrfs]
[ 9460.169386] ? touch_atime+0xc1/0xd0
[ 9460.169395] ? refcount_inc+0x9/0x30
[ 9460.169427] ? join_transaction+0x122/0x450 [btrfs]
[ 9460.169457] btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[ 9460.169490] btrfs_sync_file+0x2b3/0x3f0 [btrfs]
[ 9460.169498] vfs_fsync_range+0x4b/0xb0
[ 9460.169505] do_fsync+0x3d/0x70
[ 9460.169512] SyS_fdatasync+0x13/0x20
[ 9460.169522] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.169527] RIP: 0033:0x7f80b8fcd22d
[ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX:
000000000000004b
[ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX:
00007f80b8fcd22d
[ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
0000000000000117
[ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09:
0000000000000001
[ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12:
0000000000000001
[ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:
0000398ce9b50d80
===
Hi.
I've re-checked this issue with 4.12.9, and it is still there.
Also, I've managed to reproduce it in a VM with non-virtio disks (just an
-hda/-hdb pair in QEMU).
I'm not able to reproduce it with blk-mq disabled. Also, with blk-mq enabled
the scheduler does not make any difference; even setting "none" triggers the
hang on resume.
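For completeness, the reproducer setup looks roughly like this (a sketch;
image names and memory size are arbitrary, and I toggle blk-mq via the guest
kernel command line):

# two emulated IDE disks, no virtio
qemu-system-x86_64 -enable-kvm -m 2048 -hda disk0.img -hdb disk1.img

# guest kernel command line: scsi_mod.use_blk_mq=1 (hangs) vs. =0 (no hang so far)
# inside the guest, suspend to RAM:
echo mem > /sys/power/state
# then wake the guest from the QEMU monitor:
system_wakeup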
Next, I will check for a possible change between v4.12.7 and v4.12.8 (I don't
remember this issue being triggered on v4.12.7, or maybe I just haven't
noticed it), and I will also recompile the kernel with debug info to capture a
proper vmcore.
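The vmcore part is the usual kdump setup, roughly as follows (a sketch; the
paths and crashkernel size are just what I expect to use on this box):

# build the kernel with CONFIG_DEBUG_INFO=y, CONFIG_CRASH_DUMP=y, CONFIG_PROC_VMCORE=y
# boot with crashkernel=256M, then load the capture kernel:
kexec -p /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img \
      --append="root=/dev/sda1 irqpoll nr_cpus=1 reset_devices"
# once the hang shows up, force a crash dump (as long as sysrq still responds):
echo c > /proc/sysrq-trigger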
On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> Hi.
>
> The v4.12.8 kernel hangs in the I/O path after resuming from suspend-to-RAM.
> I have blk-mq enabled and have tried both the BFQ and mq-deadline schedulers
> with the same result. Tasks get stuck in uninterruptible sleep, and the hung
> task detector prints the stack traces I'm pasting below.
>
> The stack traces show that I/O hangs in md_super_wait(), which waits for "all
> superblock writes that were scheduled to complete". Given the word
> "scheduled", should I also try the "none" scheduler with blk-mq enabled?
>
> While I'm trying to reproduce it in a VM without much luck (it happens on my
> laptop rarely, roughly 1 out of 10 suspend-resume cycles) and re-checking it
> with blk-mq disabled: is this by any chance something already known?
>
> Ideally, I'd like to reproduce it in a VM and capture a vmcore.
>
> Any suggestions are welcome. Thanks.
>
> […]
Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:
> Hi.
>
> I've re-checked this issue with 4.12.9, and it is still there.
>
> Also, I've managed to reproduce it in a VM with non-virtio disks (just an
> -hda/-hdb pair in QEMU).
>
> I'm not able to reproduce it with blk-mq disabled. Also, with blk-mq enabled
> the scheduler does not make any difference; even setting "none" triggers the
> hang on resume.
>
> Next, I will check for a possible change between v4.12.7 and v4.12.8 (I don't
> remember this issue being triggered on v4.12.7, or maybe I just haven't
> noticed it), and I will also recompile the kernel with debug info to capture
> a proper vmcore.
>
> On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> > […]
Hello Oleksandr,
Oleksandr Natalenko - 26.08.17, 12:48:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
>
> On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:
[…]
> > I've re-checked this issue with 4.12.9, and it is still there.
[…]
> > On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> > > Hi.
> > >
> > > The v4.12.8 kernel hangs in the I/O path after resuming from
> > > suspend-to-RAM. I have blk-mq enabled and have tried both the BFQ and
> > > mq-deadline schedulers with the same result. Tasks get stuck in
> > > uninterruptible sleep, and the hung task detector prints the stack
> > > traces I'm pasting below.
I did have occasional hangs on resuming from suspend-to-RAM with 4.12, but… I
am not certain that it's related to I/O issues.
The hangs were gone when I tried a kernel config from a friend that still uses
CFQ. I then adapted it to use 1000 Hz, the low-latency desktop preemption
model and blk-mq as I used before, also enabled optimization for the processor
type in this ThinkPad T520, and got hangs on resuming from suspend-to-RAM
again.
Since backing out the change from 250 Hz to 1000 Hz, and from the low-latency
desktop model back to the lesser low-latency option, did not help, I am now
re-using the config from my friend minus quite a few drivers and unneeded
kernel features, but otherwise almost unchanged, i.e. back with CFQ as well.
So far so good, but it needs at least 4-5 days of additional testing to be
sure.
Also… when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1 didn't
work and so on… so it may easily be a completely different issue.
I did not see much point in reporting it so far… as I have no idea how to
reliably pinpoint the issue. It happens once every few days, so a bisect is
out of the question (this is a production machine for me anyway). It appears
to be a hard freeze, so there is no debug data… it's one of these "you don't
get to debug me" hangs again. I really have no idea how to get a hold on such
complexity. I am hoping to at least pinpoint the exact kernel option that
triggers this issue, but it may take weeks to do so. I'd really love a way for
the kernel to at least write out debug data before hanging completely.
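If it helps, my rough plan for pinpointing the option is simply to diff the
two configs and flip the differing options one by one; these are the ones I
suspect are involved (just my guess at what the toggles map to):

# in the kernel source tree, compare the known-good and the hanging config
scripts/diffconfig /path/to/config.good /path/to/config.bad
# candidates I changed:
#   CONFIG_HZ_250 -> CONFIG_HZ_1000
#   CONFIG_PREEMPT_VOLUNTARY -> CONFIG_PREEMPT     (low-latency desktop)
#   CONFIG_SCSI_MQ_DEFAULT / CONFIG_DM_MQ_DEFAULT  (blk-mq by default)
#   CONFIG_IOSCHED_CFQ vs. the blk-mq schedulers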
Thanks,
Martin
> > > […]
> > > [ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX:
> > > 0000398ce8da6510
> > > [ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI:
> > > 0000398cef048700
> > > [ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12:
> > > 0000000000000001
> > > [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:
> > > 0000000000000000
> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120
> > > seconds.
> > > [ 9460.168019] Not tainted 4.12.0-pf7 #1
> > > [ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168028] BrowserBlocking D 0 7639 7020 0x00000000
> > > [ 9460.168035] Call Trace:
> > > [ 9460.168044] __schedule+0x6e7/0xcd0
> > > [ 9460.168054] schedule+0x3d/0xd0
> > > [ 9460.168061] io_schedule+0x16/0x40
> > > [ 9460.168068] wait_on_page_bit_common+0xe3/0x180
> > > [ 9460.168076] ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.168083] __filemap_fdatawait_range+0x12a/0x1a0
> > > [ 9460.168093] filemap_fdatawait_range+0x14/0x30
> > > [ 9460.168128] btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > > [ 9460.168157] ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > > [ 9460.168186] btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > > [ 9460.168195] vfs_fsync_range+0x4b/0xb0
> > > [ 9460.168202] do_fsync+0x3d/0x70
> > > [ 9460.168209] SyS_fdatasync+0x13/0x20
> > > [ 9460.168218] entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.168223] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI:
> > > 0000000000000119
> > > [ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09:
> > > 0000398ce88693c0
> > > [ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12:
> > > 00007f808c8b77d8
> > > [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:
> > > 0000398cef877800
> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120
> > > seconds.
> > > [ 9460.168272] Not tainted 4.12.0-pf7 #1
> > > [ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168282] Chrome_SyncThre D 0 7867 7020 0x00000000
> > > [ 9460.168289] Call Trace:
> > > [ 9460.168297] __schedule+0x6e7/0xcd0
> > > [ 9460.168306] schedule+0x3d/0xd0
> > > [ 9460.168338] wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
> > > [ 9460.168348] ? wake_bit_function+0x60/0x60
> > > [ 9460.168377] btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > > [ 9460.168412] ? release_extent_buffer+0x98/0xb0 [btrfs]
> > > [ 9460.168444] btrfs_log_inode+0x97e/0x1140 [btrfs]
> > > [ 9460.168668] btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > > [ 9460.168676] ? touch_atime+0xc1/0xd0
> > > [ 9460.168686] ? refcount_inc+0x9/0x30
> > > [ 9460.168719] ? join_transaction+0x122/0x450 [btrfs]
> > > [ 9460.168750] btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > > [ 9460.168779] btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > > [ 9460.168787] vfs_fsync_range+0x4b/0xb0
> > > [ 9460.168794] do_fsync+0x3d/0x70
> > > [ 9460.168801] SyS_fdatasync+0x13/0x20
> > > [ 9460.168810] entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.168815] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > > 00000000000000fb
> > > [ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09:
> > > 0000000000000001
> > > [ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12:
> > > 00000000004b0000
> > > [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:
> > > 0000398ceb9a1048
> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120
> > > seconds.
> > > [ 9460.168858] Not tainted 4.12.0-pf7 #1
> > > [ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168867] BrowserBlocking D 0 8011 7020 0x00000000
> > > [ 9460.168873] Call Trace:
> > > [ 9460.168882] __schedule+0x6e7/0xcd0
> > > [ 9460.168892] schedule+0x3d/0xd0
> > > [ 9460.168898] io_schedule+0x16/0x40
> > > [ 9460.168905] wait_on_page_bit_common+0xe3/0x180
> > > [ 9460.168912] ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.168919] __filemap_fdatawait_range+0x12a/0x1a0
> > > [ 9460.168929] filemap_fdatawait_range+0x14/0x30
> > > [ 9460.168961] btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > > [ 9460.168989] ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > > [ 9460.169015] btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > > [ 9460.169025] vfs_fsync_range+0x4b/0xb0
> > > [ 9460.169033] do_fsync+0x3d/0x70
> > > [ 9460.169040] SyS_fdatasync+0x13/0x20
> > > [ 9460.169050] entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.169055] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI:
> > > 000000000000011d
> > > [ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09:
> > > 0000398ce8870200
> > > [ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12:
> > > 00007f807d5f17d8
> > > [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:
> > > 0000398cece75000
> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120
> > > seconds.
> > > [ 9460.169103] Not tainted 4.12.0-pf7 #1
> > > [ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.169113] TaskSchedulerBa D 0 9603 7020 0x00000000
> > > [ 9460.169120] Call Trace:
> > > [ 9460.169128] __schedule+0x6e7/0xcd0
> > > [ 9460.169138] schedule+0x3d/0xd0
> > > [ 9460.169166] wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
> > > [ 9460.169175] ? wake_bit_function+0x60/0x60
> > > [ 9460.169204] btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > > [ 9460.169237] ? release_extent_buffer+0x98/0xb0 [btrfs]
> > > [ 9460.169265] btrfs_log_inode+0x97e/0x1140 [btrfs]
> > > [ 9460.169278] ? __set_page_dirty_nobuffers+0x117/0x150
> > > [ 9460.169310] ? btree_set_page_dirty+0xe/0x10 [btrfs]
> > > [ 9460.169346] ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
> > > [ 9460.169378] btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > > [ 9460.169386] ? touch_atime+0xc1/0xd0
> > > [ 9460.169395] ? refcount_inc+0x9/0x30
> > > [ 9460.169427] ? join_transaction+0x122/0x450 [btrfs]
> > > [ 9460.169457] btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > > [ 9460.169490] btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > > [ 9460.169498] vfs_fsync_range+0x4b/0xb0
> > > [ 9460.169505] do_fsync+0x3d/0x70
> > > [ 9460.169512] SyS_fdatasync+0x13/0x20
> > > [ 9460.169522] entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.169527] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > > 0000000000000117
> > > [ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09:
> > > 0000000000000001
> > > [ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12:
> > > 0000000000000001
> > > [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:
> > > 0000398ce9b50d80
> > > ===
--
Martin
I recompiled v4.13-rc6 with debug info and lockdep enabled.
Unfortunately, I still cannot use the crash utility, because 7.1.9 does not
support v4.13 (and git HEAD with the required fix for 5-level page tables does
not compile for me).
But I've got a much nicer stack trace plus lockdep output (see the paste below).
It looks like the kernel is blocked on the device mutex here:
1510 void scsi_rescan_device(struct device *dev)
1511 {
1512         struct scsi_device *sdev = to_scsi_device(dev);
1513
1514         device_lock(dev); // here
and the md0_raid10 kthread is blocked at the I/O scheduling point in
blk_mq_get_tag(). Again, with blk-mq disabled this issue is not reproducible.
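(For reference, whether SCSI really runs in blk-mq mode and which elevator each
disk behind md0 uses can be double-checked from sysfs; sda/sdb are simply the
device names on my laptop:)
===
# Y means SCSI uses blk-mq, N means the legacy request path
cat /sys/module/scsi_mod/parameters/use_blk_mq
# the active elevator for each disk is shown in square brackets
grep -H . /sys/block/sd[ab]/queue/scheduler
===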
===
[ 62.789665] INFO: task kworker/2:1:36 blocked for more than 20 seconds.
[ 62.793858] Not tainted 4.13.0-rc6 #3
[ 62.796537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 62.800467] kworker/2:1 D13856 36 2 0x00000000
[ 62.802934] Workqueue: events ata_scsi_dev_rescan
[ 62.805207] Call Trace:
[ 62.806957] __schedule+0x8d7/0x930
[ 62.809956] schedule+0x73/0x90
[ 62.812677] schedule_preempt_disabled+0x13/0x20
[ 62.816012] __mutex_lock+0x54f/0x820
[ 62.819050] ? scsi_rescan_device+0x21/0x90
[ 62.821930] ? mark_held_locks+0x58/0x80
[ 62.823994] mutex_lock_nested+0x16/0x20
[ 62.825984] ? mutex_lock_nested+0x16/0x20
[ 62.828032] scsi_rescan_device+0x21/0x90
[ 62.830174] ata_scsi_dev_rescan+0xa1/0x110
[ 62.832732] process_one_work+0x32a/0x620
[ 62.835328] worker_thread+0x221/0x3f0
[ 62.837553] kthread+0x124/0x130
[ 62.839215] ? process_one_work+0x620/0x620
[ 62.841293] ? kthread_stop+0x260/0x260
[ 62.843887] ret_from_fork+0x2a/0x40
[ 62.846664] INFO: task kworker/0:1:37 blocked for more than 20 seconds.
[ 62.849751] Not tainted 4.13.0-rc6 #3
[ 62.851724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 62.856832] kworker/0:1 D13664 37 2 0x00000000
[ 62.859750] Workqueue: events ata_scsi_dev_rescan
[ 62.862611] Call Trace:
[ 62.864213] __schedule+0x8d7/0x930
[ 62.866859] schedule+0x73/0x90
[ 62.869507] schedule_preempt_disabled+0x13/0x20
[ 62.871662] __mutex_lock+0x54f/0x820
[ 62.873541] ? scsi_rescan_device+0x21/0x90
[ 62.875580] ? mark_held_locks+0x58/0x80
[ 62.877871] mutex_lock_nested+0x16/0x20
[ 62.880760] ? mutex_lock_nested+0x16/0x20
[ 62.883544] scsi_rescan_device+0x21/0x90
[ 62.886288] ata_scsi_dev_rescan+0xa1/0x110
[ 62.888766] process_one_work+0x32a/0x620
[ 62.891593] worker_thread+0x221/0x3f0
[ 62.894223] kthread+0x124/0x130
[ 62.895974] ? process_one_work+0x620/0x620
[ 62.898643] ? kthread_stop+0x260/0x260
[ 62.902062] ret_from_fork+0x2a/0x40
[ 62.905109] INFO: task kworker/1:1:40 blocked for more than 20 seconds.
[ 62.908068] Not tainted 4.13.0-rc6 #3
[ 62.910115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 62.913326] kworker/1:1 D13888 40 2 0x00000000
[ 62.915904] Workqueue: events ata_scsi_dev_rescan
[ 62.918133] Call Trace:
[ 62.919958] __schedule+0x8d7/0x930
[ 62.922615] schedule+0x73/0x90
[ 62.925042] schedule_preempt_disabled+0x13/0x20
[ 62.928377] __mutex_lock+0x54f/0x820
[ 62.930915] ? scsi_rescan_device+0x21/0x90
[ 62.933510] ? mark_held_locks+0x58/0x80
[ 62.936079] mutex_lock_nested+0x16/0x20
[ 62.938122] ? mutex_lock_nested+0x16/0x20
[ 62.940394] scsi_rescan_device+0x21/0x90
[ 62.942405] ata_scsi_dev_rescan+0xa1/0x110
[ 62.944491] process_one_work+0x32a/0x620
[ 62.946580] worker_thread+0x221/0x3f0
[ 62.948719] kthread+0x124/0x130
[ 62.950708] ? process_one_work+0x620/0x620
[ 62.953185] ? kthread_stop+0x260/0x260
[ 62.955792] ret_from_fork+0x2a/0x40
[ 62.958235] INFO: task md0_raid10:1716 blocked for more than 20 seconds.
[ 62.962058] Not tainted 4.13.0-rc6 #3
[ 62.965328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 62.970630] md0_raid10 D13696 1716 2 0x00000000
[ 62.972990] Call Trace:
[ 62.975155] __schedule+0x8d7/0x930
[ 62.977618] ? sbitmap_get+0x4e/0xa0
[ 62.979601] schedule+0x73/0x90
[ 62.981206] io_schedule+0x11/0x40
[ 62.982862] blk_mq_get_tag+0x13e/0x250
[ 62.984641] ? blk_mq_get_request+0xdb/0x400
[ 62.986555] ? do_wait_intr_irq+0x80/0x80
[ 62.988503] blk_mq_get_request+0x1a8/0x400
[ 62.990636] blk_mq_make_request+0x22f/0x7c0
[ 62.992783] ? generic_make_request+0xcb/0x360
[ 62.995084] generic_make_request+0xf8/0x360
[ 62.997735] submit_bio+0x10f/0x120
[ 62.999500] ? submit_bio+0x10f/0x120
[ 63.002395] md_super_write+0xc8/0xe0
[ 63.005273] md_update_sb+0x533/0x6e0
[ 63.007286] md_check_recovery+0x1ee/0x4b0
[ 63.010229] raid10d+0x4b/0x15e0
[ 63.012121] ? finish_wait+0x57/0x60
[ 63.015065] ? trace_hardirqs_on_caller+0x178/0x1a0
[ 63.018147] ? trace_hardirqs_on+0xd/0x10
[ 63.020969] md_thread+0xd9/0x120
[ 63.023033] ? md_thread+0xd9/0x120
[ 63.025723] ? do_wait_intr_irq+0x80/0x80
[ 63.028288] kthread+0x124/0x130
[ 63.029947] ? find_pers+0x60/0x60
[ 63.031410] ? kthread_stop+0x260/0x260
[ 63.032922] ? umh_complete+0x40/0x40
[ 63.034415] ? call_usermodehelper_exec_async+0x12f/0x160
[ 63.037659] ret_from_fork+0x2a/0x40
[ 63.039245] INFO: task md0_resync:1720 blocked for more than 20 seconds.
[ 63.042346] Not tainted 4.13.0-rc6 #3
[ 63.044393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.048591] md0_resync D13104 1720 2 0x00000000
[ 63.051279] Call Trace:
[ 63.053335] __schedule+0x8d7/0x930
[ 63.056151] ? blk_flush_plug_list+0xd5/0x250
[ 63.058725] schedule+0x73/0x90
[ 63.060382] io_schedule+0x11/0x40
[ 63.062212] blk_mq_get_tag+0x13e/0x250
[ 63.064222] ? blk_mq_get_request+0xdb/0x400
[ 63.066339] ? do_wait_intr_irq+0x80/0x80
[ 63.068875] blk_mq_get_request+0x1a8/0x400
[ 63.071329] blk_mq_make_request+0x22f/0x7c0
[ 63.073599] ? generic_make_request+0xcb/0x360
[ 63.076392] generic_make_request+0xf8/0x360
[ 63.078033] raid10_sync_request+0x214a/0x2290
[ 63.080008] ? raid10_sync_request+0x214a/0x2290
[ 63.081820] ? find_held_lock+0x35/0xa0
[ 63.083345] ? rcu_read_unlock+0x3e/0x5d
[ 63.086222] ? is_mddev_idle+0x101/0x113
[ 63.088684] md_do_sync+0x960/0xe90
[ 63.090960] ? do_wait_intr_irq+0x80/0x80
[ 63.093329] md_thread+0xd9/0x120
[ 63.095277] ? md_thread+0xd9/0x120
[ 63.097438] ? trace_hardirqs_on+0xd/0x10
[ 63.100310] kthread+0x124/0x130
[ 63.102471] ? find_pers+0x60/0x60
[ 63.104895] ? kthread_stop+0x260/0x260
[ 63.107208] ? umh_complete+0x40/0x40
[ 63.110149] ? call_usermodehelper_exec_async+0x12f/0x160
[ 63.113570] ret_from_fork+0x2a/0x40
[ 63.138339] INFO: task systemd-sleep:2599 blocked for more than 20 seconds.
[ 63.141320] Not tainted 4.13.0-rc6 #3
[ 63.143560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.147738] systemd-sleep D12640 2599 1 0x00000080
[ 63.150645] Call Trace:
[ 63.152842] __schedule+0x8d7/0x930
[ 63.156430] schedule+0x73/0x90
[ 63.158521] async_synchronize_cookie_domain+0x6c/0xa0
[ 63.161252] ? do_wait_intr_irq+0x80/0x80
[ 63.163461] async_synchronize_full+0x12/0x20
[ 63.166359] dpm_resume+0x303/0x480
[ 63.168656] ? rcu_read_lock_sched_held+0x78/0xa0
[ 63.171555] dpm_resume_end+0xc/0x20
[ 63.174369] suspend_devices_and_enter+0x62f/0x990
[ 63.177177] ? printk+0x3e/0x46
[ 63.179845] ? rcu_read_lock_sched_held+0x78/0xa0
[ 63.182931] pm_suspend+0x2a8/0x490
[ 63.185463] state_store+0xaf/0xe0
[ 63.187429] kobj_attr_store+0xf/0x20
[ 63.190402] sysfs_kf_write+0x3d/0x50
[ 63.192772] kernfs_fop_write+0x114/0x180
[ 63.195435] __vfs_write+0x23/0x110
[ 63.197935] ? rcu_read_lock_sched_held+0x78/0xa0
[ 63.201616] ? rcu_sync_lockdep_assert+0x27/0x50
[ 63.204820] ? __sb_start_write+0x95/0x1f0
[ 63.207522] vfs_write+0xc0/0x180
[ 63.209411] SyS_write+0x44/0xa0
[ 63.211155] do_syscall_64+0x64/0x120
[ 63.212996] entry_SYSCALL64_slow_path+0x25/0x25
[ 63.215264] RIP: 0033:0x7f160e214bf0
[ 63.217244] RSP: 002b:00007fffa8f08e08 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 63.222682] RAX: ffffffffffffffda RBX: 0000000000000004 RCX:
00007f160e214bf0
[ 63.226774] RDX: 0000000000000004 RSI: 0000565398088390 RDI:
0000000000000004
[ 63.231227] RBP: 0000565398088390 R08: 0000565398088240 R09:
00007f160e6e28c0
[ 63.234773] R10: 00007f160e4d5ad8 R11: 0000000000000246 R12:
0000000000000004
[ 63.237877] R13: 0000000000000001 R14: 0000565398088160 R15:
0000000000000004
[ 63.240832] INFO: task kworker/u8:20:2615 blocked for more than 20 seconds.
[ 63.243874] Not tainted 4.13.0-rc6 #3
[ 63.246036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.249754] kworker/u8:20 D14128 2615 2 0x00000080
[ 63.252260] Workqueue: events_unbound async_run_entry_fn
[ 63.254765] Call Trace:
[ 63.256338] __schedule+0x8d7/0x930
[ 63.258270] schedule+0x73/0x90
[ 63.260107] async_synchronize_cookie_domain+0x6c/0xa0
[ 63.262473] ? do_wait_intr_irq+0x80/0x80
[ 63.264494] ? async_sdev_thaw+0x20/0x20
[ 63.266467] ? scsi_bus_thaw+0x20/0x20
[ 63.268372] async_synchronize_full_domain+0x13/0x20
[ 63.271074] scsi_bus_resume_common+0x95/0x140
[ 63.274168] scsi_bus_resume+0x10/0x20
[ 63.277058] dpm_run_callback+0x142/0x290
[ 63.279259] device_resume+0x192/0x1e0
[ 63.280883] async_resume+0x18/0x40
[ 63.282321] async_run_entry_fn+0x32/0xd0
[ 63.283856] process_one_work+0x32a/0x620
[ 63.285390] worker_thread+0x221/0x3f0
[ 63.286885] kthread+0x124/0x130
[ 63.288515] ? process_one_work+0x620/0x620
[ 63.290482] ? kthread_stop+0x260/0x260
[ 63.292203] ret_from_fork+0x2a/0x40
[ 63.294060] INFO: task kworker/u8:21:2616 blocked for more than 20 seconds.
[ 63.296818] Not tainted 4.13.0-rc6 #3
[ 63.299821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.305221] kworker/u8:21 D14392 2616 2 0x00000080
[ 63.308313] Workqueue: events_unbound async_run_entry_fn
[ 63.312513] Call Trace:
[ 63.314752] __schedule+0x8d7/0x930
[ 63.317683] schedule+0x73/0x90
[ 63.319485] async_synchronize_cookie_domain+0x6c/0xa0
[ 63.321883] ? do_wait_intr_irq+0x80/0x80
[ 63.323875] ? async_sdev_thaw+0x20/0x20
[ 63.325848] ? scsi_bus_thaw+0x20/0x20
[ 63.328287] async_synchronize_full_domain+0x13/0x20
[ 63.330451] scsi_bus_resume_common+0x95/0x140
[ 63.332381] scsi_bus_resume+0x10/0x20
[ 63.334129] dpm_run_callback+0x142/0x290
[ 63.336353] device_resume+0x192/0x1e0
[ 63.338854] async_resume+0x18/0x40
[ 63.341101] async_run_entry_fn+0x32/0xd0
[ 63.343197] process_one_work+0x32a/0x620
[ 63.345300] worker_thread+0x221/0x3f0
[ 63.347356] kthread+0x124/0x130
[ 63.349463] ? process_one_work+0x620/0x620
[ 63.352192] ? kthread_stop+0x260/0x260
[ 63.354697] ret_from_fork+0x2a/0x40
[ 63.357611] INFO: task kworker/u8:22:2617 blocked for more than 20 seconds.
[ 63.361106] Not tainted 4.13.0-rc6 #3
[ 63.363990] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.367847] kworker/u8:22 D14392 2617 2 0x00000080
[ 63.370561] Workqueue: events_unbound async_run_entry_fn
[ 63.374235] Call Trace:
[ 63.376351] __schedule+0x8d7/0x930
[ 63.379545] schedule+0x73/0x90
[ 63.382199] async_synchronize_cookie_domain+0x6c/0xa0
[ 63.385132] ? do_wait_intr_irq+0x80/0x80
[ 63.387533] ? async_sdev_thaw+0x20/0x20
[ 63.389684] ? scsi_bus_thaw+0x20/0x20
[ 63.391849] async_synchronize_full_domain+0x13/0x20
[ 63.395454] scsi_bus_resume_common+0x95/0x140
[ 63.397842] scsi_bus_resume+0x10/0x20
[ 63.399883] dpm_run_callback+0x142/0x290
[ 63.402873] device_resume+0x192/0x1e0
[ 63.405915] async_resume+0x18/0x40
[ 63.408723] async_run_entry_fn+0x32/0xd0
[ 63.411627] process_one_work+0x32a/0x620
[ 63.413672] worker_thread+0x221/0x3f0
[ 63.415661] kthread+0x124/0x130
[ 63.417452] ? process_one_work+0x620/0x620
[ 63.419750] ? kthread_stop+0x260/0x260
[ 63.421781] ret_from_fork+0x2a/0x40
[ 63.423653] INFO: task kworker/u8:24:2619 blocked for more than 20 seconds.
[ 63.426647] Not tainted 4.13.0-rc6 #3
[ 63.428599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 63.431762] kworker/u8:24 D14416 2619 2 0x00000080
[ 63.433650] Workqueue: events_unbound async_run_entry_fn
[ 63.435956] Call Trace:
[ 63.437711] __schedule+0x8d7/0x930
[ 63.440567] ? sbitmap_get+0x4e/0xa0
[ 63.442571] schedule+0x73/0x90
[ 63.444331] io_schedule+0x11/0x40
[ 63.446731] blk_mq_get_tag+0x13e/0x250
[ 63.448974] ? blk_mq_get_request+0xdb/0x400
[ 63.451208] ? do_wait_intr_irq+0x80/0x80
[ 63.453716] blk_mq_get_request+0x1a8/0x400
[ 63.455700] blk_mq_alloc_request+0x56/0xc0
[ 63.457883] blk_get_request+0x29/0x100
[ 63.459658] scsi_execute+0x3b/0x230
[ 63.461536] sd_start_stop_device+0x93/0x120
[ 63.463616] sd_resume+0x67/0x70
[ 63.465310] do_scsi_resume+0x16/0x20
[ 63.467217] scsi_dev_type_resume+0x26/0x70
[ 63.469287] async_sdev_resume+0x10/0x20
[ 63.471178] async_run_entry_fn+0x32/0xd0
[ 63.473197] process_one_work+0x32a/0x620
[ 63.474939] worker_thread+0x221/0x3f0
[ 63.476752] kthread+0x124/0x130
[ 63.478406] ? process_one_work+0x620/0x620
[ 63.480031] ? kthread_stop+0x260/0x260
[ 63.481579] ret_from_fork+0x2a/0x40
[ 63.483548]
Showing all locks held in the system:
[ 63.487087] 4 locks held by kworker/2:1/36:
[ 63.489578] #0: ("events"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.494571] #1: ((&ap->scsi_rescan_task)){+.+...}, at:
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[ 63.499045] #2: (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>]
ata_scsi_dev_rescan+0x34/0x110
[ 63.505049] #3: (&dev->mutex){......}, at: [<ffffffff816d4b41>]
scsi_rescan_device+0x21/0x90
[ 63.510159] 4 locks held by kworker/0:1/37:
[ 63.512237] #0: ("events"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.516225] #1: ((&ap->scsi_rescan_task)){+.+...}, at:
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[ 63.520622] #2: (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>]
ata_scsi_dev_rescan+0x34/0x110
[ 63.524931] #3: (&dev->mutex){......}, at: [<ffffffff816d4b41>]
scsi_rescan_device+0x21/0x90
[ 63.528790] 4 locks held by kworker/1:1/40:
[ 63.530879] #0: ("events"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.536187] #1: ((&ap->scsi_rescan_task)){+.+...}, at:
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[ 63.540720] #2: (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>]
ata_scsi_dev_rescan+0x34/0x110
[ 63.544863] #3: (&dev->mutex){......}, at: [<ffffffff816d4b41>]
scsi_rescan_device+0x21/0x90
[ 63.549947] 1 lock held by khungtaskd/461:
[ 63.552274] #0: (tasklist_lock){.+.+..}, at: [<ffffffff810ce5da>]
debug_show_all_locks+0x4a/0x170
[ 63.557027] 1 lock held by md0_raid10/1716:
[ 63.559209] #0: (&mddev->reconfig_mutex){+.+.+.}, at:
[<ffffffff81832cc7>] md_check_recovery+0xe7/0x4b0
[ 63.587846] 2 locks held by btrfs-transacti/1867:
[ 63.590223] #0: (&fs_info->transaction_kthread_mutex){+.+...}, at:
[<ffffffff81372779>] transaction_kthread+0x49/0x190
[ 63.595024] #1: (&fs_info->tree_log_mutex){+.+...}, at:
[<ffffffff81377c1a>] btrfs_commit_transaction+0x3aa/0x910
[ 63.600911] 5 locks held by systemd-sleep/2599:
[ 63.603394] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811eaeaf>]
vfs_write+0xaf/0x180
[ 63.607819] #1: (&of->mutex){+.+.+.}, at: [<ffffffff8126bf99>]
kernfs_fop_write+0xb9/0x180
[ 63.612001] #2: (s_active#146){.+.+.+}, at: [<ffffffff8126bfa1>]
kernfs_fop_write+0xc1/0x180
[ 63.616079] #3: (pm_mutex){+.+.+.}, at: [<ffffffff810d89f2>] pm_suspend
+0x472/0x490
[ 63.621514] #4: (acpi_scan_lock){+.+.+.}, at: [<ffffffff814e53e2>]
acpi_scan_lock_acquire+0x12/0x20
[ 63.627617] 3 locks held by kworker/u8:20/2615:
[ 63.629583] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.633125] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.636934] #2: (&dev->mutex){......}, at: [<ffffffff816ae8f9>]
device_resume+0x89/0x1e0
[ 63.641510] 3 locks held by kworker/u8:21/2616:
[ 63.643984] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.649344] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.655669] #2: (&dev->mutex){......}, at: [<ffffffff816ae8f9>]
device_resume+0x89/0x1e0
[ 63.660176] 3 locks held by kworker/u8:22/2617:
[ 63.662542] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.668704] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.674307] #2: (&dev->mutex){......}, at: [<ffffffff816ae8f9>]
device_resume+0x89/0x1e0
[ 63.678492] 2 locks held by kworker/u8:24/2619:
[ 63.680737] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.685153] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>]
process_one_work+0x1b3/0x620
[ 63.689648]
[ 63.691151] =============================================
===
On Saturday, 26 August 2017, 12:48:01 CEST, Oleksandr Natalenko wrote:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
> …SNIP…
On 26/08/17 12:19, Martin Steigerwald wrote:
> Also… when a hang happened, the mouse pointer was frozen, Ctrl-Alt-F1 didn't
> work and so on… so it may easily be a completely different issue.
>
> I did not see much point in reporting it so far, as I have no idea how to
> reliably pinpoint the issue. It happens once every few days, so a bisect is
> out of the question (it is anyway, since this is a production machine for me);
> it appears to be a hard freeze, so no debug data… it's one of those "you don't
> get to debug me" hangs again. I really have no idea how to get a hold on such
> complexity. I am hoping to at least pinpoint the exact kernel option that
> triggers this issue, but it may take weeks to do so. I'd really love a way for
> the kernel to at least write out debug data before hanging completely.
This sounds like what I'm getting - SuSE 42.3, no raid, doesn't happen
when the power lead is in (but I think the laptop is configured not to
suspend when powered, precisely to avoid exactly this).
It happens to me quite often, unfortunately, but it seems a KDE issue in
that applications work fine, UNTIL KDE seems to get control of the mouse
at which point I can't do anything.
I've got a feeling it's related to wireless networking actually, my
setup is somewhat borked because KDE, systemd, and wifi don't seem to
work nicely together :-(
Cheers,
Wol
Wols Lists - 26.08.17, 18:17:
> On 26/08/17 12:19, Martin Steigerwald wrote:
> > …SNIP…
>
> This sounds like what I'm getting - SuSE 42.3, no raid, doesn't happen
> when the power lead is in (but I think the laptop is configured not to
> suspend when powered, precisely to avoid exactly this).
>
> It happens to me quite often, unfortunately, but it seems a KDE issue in
> that applications work fine, UNTIL KDE seems to get control of the mouse
> at which point I can't do anything.
>
> I've got a feeling it's related to wireless networking actually, my
> setup is somewhat borked because KDE, systemd, and wifi don't seem to
> work nicely together :-(
Well, my issue isn't related to wireless networking, as my laptop at home is
using good old cable-based ethernet. I do use Plasma, though. The issue of the
original poster appears to be blk-mq-related.
As written, I have no idea whether my issue is related to blk-mq in any way.
That's the next step I'd test: if the current kernel runs stable, then I'd
enable blk-mq by default again and test whether the hangs reappear. Unless I
can pinpoint the issue to blk-mq, I have nothing further to comment here, I
think. (And I didn't intend to use the thread for every hang-on-resume issue
that currently may exist. I just don't know yet whether my issue is related to
blk-mq, so I chimed in.)
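(For the record, enabling blk-mq by default here just means flipping boot
parameters; a sketch of what I would add for that re-test, assuming GRUB:)
===
# append to the kernel command line (e.g. GRUB_CMDLINE_LINUX in
# /etc/default/grub), then regenerate grub.cfg and reboot:
#   scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1
grub-mkconfig -o /boot/grub/grub.cfg
===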
Thanks,
--
Martin
On Sat, Aug 26, 2017 at 12:48:01PM +0200, Oleksandr Natalenko wrote:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
BTW, given that it hangs during resume, it isn't easy to collect debug
info, and there should have been lots of useful info there.
You mentioned that you can reproduce it on QEMU; could you
share the exact RAID10 setup, such as which disks are behind
the RAID10?
Or can you reproduce the issue without RAID10 involved?
I will try to reproduce it in my VM if you can provide the
above info.
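(As a side note, when the machine is only partially hung and a shell is still
alive, a low-tech way to dump the blocked tasks and held locks into the kernel
log, assuming sysrq and lockdep are enabled, is:)
===
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger   # dump tasks in uninterruptible (D) state
echo d > /proc/sysrq-trigger   # show all locks held (needs lockdep)
===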
Thanks,
Ming
Hi.
Here is disk setup for QEMU VM:
===
[root@archmq ~]# smartctl -i /dev/sda
…
Device Model: QEMU HARDDISK
Serial Number: QM00001
Firmware Version: 2.5+
User Capacity: 4,294,967,296 bytes [4.29 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is: Sun Aug 27 09:31:54 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
[root@archmq ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
`-sda1 8:1 0 4G 0 part
`-md0 9:0 0 4G 0 raid10
`-system 253:0 0 4G 0 crypt
|-system-boot 253:1 0 512M 0 lvm /boot
|-system-swap 253:2 0 512M 0 lvm [SWAP]
`-system-root 253:3 0 3G 0 lvm /
sdb 8:16 0 4G 0 disk
`-sdb1 8:17 0 4G 0 part
`-md0 9:0 0 4G 0 raid10
`-system 253:0 0 4G 0 crypt
|-system-boot 253:1 0 512M 0 lvm /boot
|-system-swap 253:2 0 512M 0 lvm [SWAP]
`-system-root 253:3 0 3G 0 lvm /
sr0 11:0 1 1024M 0 rom
[root@archmq ~]# mdadm --misc --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sat Jul 29 16:37:05 2017
Raid Level : raid10
Array Size : 4191232 (4.00 GiB 4.29 GB)
Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Aug 27 09:30:33 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : far=2
Chunk Size : 512K
Name : archiso:0
UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
Events : 485
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
===
In words: 2 virtual disks, a RAID10 setup with far-2 layout, LUKS on top of it,
then LVM, then ext4 for /boot, swap, and btrfs for /.
I couldn't reproduce the issue with a single disk without RAID.
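(For anyone who wants to replay this layering in a VM, here is a rough sketch of
how such a stack can be assembled; these are not necessarily the exact commands
I used originally, partitioning is omitted, and the names simply mirror the
lsblk output above:)
===
# 2 disks -> RAID10 (far-2) -> LUKS -> LVM -> ext4 / swap / btrfs
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 \
    /dev/sda1 /dev/sdb1
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 system
pvcreate /dev/mapper/system
vgcreate system /dev/mapper/system
lvcreate -L 512M -n boot system
lvcreate -L 512M -n swap system
lvcreate -l 100%FREE -n root system
mkfs.ext4 /dev/system/boot
mkswap /dev/system/swap
mkfs.btrfs /dev/system/root
===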
On Sunday, 27 August 2017, 8:02:00 CEST, Ming Lei wrote:
> …SNIP…
On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> Hi.
>
> …SNIP…
> I couldn't reproduce the issue with single disk without RAID.
Could you verify if the following patch fixes your issue?
From 9fa53d708ebc1d5b87e62e542dc54272529da244 Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Mon, 28 Aug 2017 19:59:08 +0800
Subject: [PATCH] blk-mq: align to legacy path for implementing blk_execute_rq
In the legacy path, when a request is run via blk_execute_rq(),
it is added to q->queue_head directly, and the I/O scheduler's
queue is bypassed because neither merging nor sorting is
needed.
When a SCSI device is put into the quiesce state, such as during
system suspend, we need to add RQF_PM requests to the
head of the queue.
This patch fixes an I/O hang after system resume.
Reported-by: Oleksandr Natalenko <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
block/blk-core.c | 2 +-
block/blk-exec.c | 2 +-
block/blk-flush.c | 2 +-
block/blk-mq-sched.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
block/blk-mq-sched.h | 2 ++
5 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..fb75bc646ebc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2330,7 +2330,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
if (q->mq_ops) {
if (blk_queue_io_stat(q))
blk_account_io_start(rq, true);
- blk_mq_sched_insert_request(rq, false, true, false, false);
+ blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
return BLK_STS_OK;
}
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 5c0f3dc446dc..4565aa6bb624 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -61,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* be reused after dying flag is set
*/
if (q->mq_ops) {
- blk_mq_sched_insert_request(rq, at_head, true, false, false);
+ blk_mq_sched_insert_request_bypass(rq, at_head, true, false, false);
return;
}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index ed5fe322abba..51e89e5c525a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
- blk_mq_sched_insert_request(rq, false, true, false, false);
+ blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..eeeea026fb47 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -354,6 +354,64 @@ static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
blk_mq_add_to_requeue_list(rq, false, true);
}
+static void blk_mq_flush_hctx(struct blk_mq_hw_ctx *hctx,
+ struct elevator_queue *e,
+ const bool has_sched_dispatch,
+ struct list_head *rqs)
+{
+ LIST_HEAD(list);
+
+ if (!has_sched_dispatch)
+ blk_mq_flush_busy_ctxs(hctx, &list);
+ else {
+ while (true) {
+ struct request *rq;
+
+ rq = e->type->ops.mq.dispatch_request(hctx);
+ if (!rq)
+ break;
+ list_add_tail(&rq->queuelist, &list);
+ }
+ }
+
+ list_splice_tail(&list, rqs);
+}
+
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+ bool run_queue, bool async,
+ bool can_block)
+{
+ struct request_queue *q = rq->q;
+ struct elevator_queue *e = q->elevator;
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
+ struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+ LIST_HEAD(list);
+ const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
+
+ if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+ blk_mq_sched_insert_flush(hctx, rq, can_block);
+ return;
+ }
+
+ if (at_head)
+ list_add_tail(&rq->queuelist, &list);
+ else {
+ blk_mq_flush_hctx(hctx, e, has_sched_dispatch, &list);
+ list_add_tail(&rq->queuelist, &list);
+ run_queue = true;
+ }
+
+ spin_lock(&hctx->lock);
+ if (at_head)
+ list_splice(&list, &hctx->dispatch);
+ else
+ list_splice_tail(&list, &hctx->dispatch);
+ spin_unlock(&hctx->lock);
+
+ if (run_queue)
+ blk_mq_run_hw_queue(hctx, async);
+}
+
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async, bool can_block)
{
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..4d01697a627f 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -18,6 +18,8 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async, bool can_block);
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+ bool run_queue, bool async, bool can_block);
void blk_mq_sched_insert_requests(struct request_queue *q,
struct blk_mq_ctx *ctx,
struct list_head *list, bool run_queue_async);
--
2.9.5
--
Ming
Ming Lei - 28.08.17, 20:58:
> On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > Hi.
> >
> > …SNIP…
> > I couldn't reproduce the issue with single disk without RAID.
>
> Could you verify if the following patch fixes your issue?
Could this also apply to non-MD-RAID systems? I am using BTRFS RAID 1
with two SSDs. So far it runs stable with CFQ.
Thanks,
Martin
> …SNIP…
--
Martin
On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 20:58:
> > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > Hi.
> > >
> > > …SNIP…
> > > I couldn't reproduce the issue with single disk without RAID.
> >
> > Could you verify if the following patch fixes your issue?
>
> > Could this also apply to non-MD-RAID systems? I am using BTRFS RAID 1
> > with two SSDs. So far it runs stable with CFQ.
It is meant to fix Oleksandr's issue wrt. blk-mq, so it does not look relevant
to your case.
--
Ming
Hi.
On Monday, 28 August 2017, 14:58:28 CEST, Ming Lei wrote:
> Could you verify if the following patch fixes your issue?
> …SNIP…
I've applied it to v4.12.9 and rechecked — the issue is still there,
unfortunately. The stack trace is the same as before.
Were you able to reproduce it in a VM?
Should I re-check it with v4.13-rc7?
Any other suggestions?
Thanks.
On Mon, Aug 28, 2017 at 08:22:26PM +0200, Oleksandr Natalenko wrote:
> Hi.
>
> On Monday, 28 August 2017, 14:58:28 CEST, Ming Lei wrote:
> > Could you verify if the following patch fixes your issue?
> > …SNIP…
>
> I've applied it to v4.12.9 and rechecked — the issue is still there,
> unfortunately. Stacktrace is the same as before.
>
> Were you able to reproduce it in a VM?
Yes, I can.
>
> Should I re-check it with v4.13-rc7?
>
> Any other suggestions?
Please test it with v4.13-rc7 first.
--
Ming
Hello.
I re-tested with v4.13-rc7 plus the proposed patch and got the same result.
Let me know if any additional testing is needed.
===
[ 82.638148] INFO: task md0_raid10:193 blocked for more than 20 seconds.
[ 82.642804] Not tainted 4.13.0-pf1 #1
[ 82.646998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 82.649007] md0_raid10 D 0 193 2 0x00000000
[ 82.650799] Call Trace:
[ 82.652118] __schedule+0x239/0x890
[ 82.653469] schedule+0x3d/0x90
[ 82.654649] md_super_wait+0x6e/0xa0 [md_mod]
[ 82.656186] ? wait_woken+0x80/0x80
[ 82.657333] md_update_sb.part.59+0x3df/0x840 [md_mod]
[ 82.659084] ? percpu_ref_switch_to_percpu+0x36/0x40
[ 82.660477] md_check_recovery+0x453/0x520 [md_mod]
[ 82.662125] raid10d+0x62/0x1420 [raid10]
[ 82.663457] ? __schedule+0x241/0x890
[ 82.664766] ? schedule+0x3d/0x90
[ 82.665816] ? schedule_timeout+0x208/0x390
[ 82.666785] md_thread+0x120/0x160 [md_mod]
[ 82.668117] ? md_thread+0x120/0x160 [md_mod]
[ 82.669487] ? wait_woken+0x80/0x80
[ 82.670531] kthread+0x125/0x140
[ 82.671602] ? find_pers+0x70/0x70 [md_mod]
[ 82.672452] ? kthread_create_on_node+0x70/0x70
[ 82.673844] ret_from_fork+0x25/0x30
[ 82.674991] INFO: task dmcrypt_write:226 blocked for more than 20 seconds.
[ 82.678246] Not tainted 4.13.0-pf1 #1
[ 82.679336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 82.681175] dmcrypt_write D 0 226 2 0x00000000
[ 82.682522] Call Trace:
[ 82.683104] __schedule+0x239/0x890
[ 82.684099] schedule+0x3d/0x90
[ 82.684754] md_write_start+0xe3/0x270 [md_mod]
[ 82.685599] ? wait_woken+0x80/0x80
[ 82.686273] raid10_make_request+0x3f/0x140 [raid10]
[ 82.687258] md_make_request+0xa2/0x290 [md_mod]
[ 82.688132] ? _raw_spin_unlock_irq+0x10/0x30
[ 82.689660] ? finish_task_switch+0x75/0x200
[ 82.690624] generic_make_request+0x125/0x320
[ 82.691722] dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 82.693120] ? dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 82.694760] ? wake_up_q+0x80/0x80
[ 82.695752] kthread+0x125/0x140
[ 82.696831] ? kthread+0x125/0x140
[ 82.697964] ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
[ 82.699340] ? kthread_create_on_node+0x70/0x70
[ 82.700728] ret_from_fork+0x25/0x30
[ 82.702378] INFO: task NetworkManager:432 blocked for more than 20 seconds.
[ 82.704022] Not tainted 4.13.0-pf1 #1
[ 82.705264] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 82.709068] NetworkManager D 0 432 1 0x00000000
[ 82.710189] Call Trace:
[ 82.711041] __schedule+0x239/0x890
[ 82.712162] schedule+0x3d/0x90
[ 82.713223] io_schedule+0x16/0x40
[ 82.714324] wait_on_page_bit_common+0xe7/0x170
[ 82.715644] ? page_cache_tree_insert+0xc0/0xc0
[ 82.717008] __filemap_fdatawait_range+0x10d/0x170
[ 82.718562] ? __filemap_fdatawrite_range+0xc1/0x100
[ 82.720103] ? __filemap_fdatawrite_range+0xcd/0x100
[ 82.721561] file_write_and_wait_range+0x78/0xa0
[ 82.722839] xfs_file_fsync+0x5f/0x210 [xfs]
[ 82.724085] vfs_fsync_range+0x4b/0xb0
[ 82.725152] do_fsync+0x3d/0x70
[ 82.726535] SyS_fsync+0x10/0x20
[ 82.727953] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 82.729196] RIP: 0033:0x7fcd816f6c8d
[ 82.730336] RSP: 002b:00007ffd4650d4d0 EFLAGS: 00000293 ORIG_RAX:
000000000000004a
[ 82.732192] RAX: ffffffffffffffda RBX: 0000000000000013 RCX:
00007fcd816f6c8d
[ 82.733922] RDX: 00007ffd4650d4f0 RSI: 00007ffd4650d4f0 RDI:
0000000000000013
[ 82.735313] RBP: 000056436710ff00 R08: 0000000000000000 R09:
00000000313d26e9
[ 82.738973] R10: 000000000000003d R11: 0000000000000293 R12:
00007ffd4650d638
[ 82.740531] R13: 0000000000000001 R14: 0000000000000002 R15:
00005643670eb1c0
[ 82.741869] INFO: task sync:676 blocked for more than 20 seconds.
[ 82.743116] Not tainted 4.13.0-pf1 #1
[ 82.744286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 82.745935] sync D 0 676 542 0x00000000
[ 82.747338] Call Trace:
[ 82.748281] __schedule+0x239/0x890
[ 82.749421] schedule+0x3d/0x90
[ 82.750458] io_schedule+0x16/0x40
[ 82.751420] wait_on_page_bit_common+0xe7/0x170
[ 82.752668] ? page_cache_tree_insert+0xc0/0xc0
[ 82.753873] __filemap_fdatawait_range+0x10d/0x170
[ 82.755153] ? finish_wait+0x56/0x70
[ 82.756226] filemap_fdatawait_keep_errors+0x27/0x50
[ 82.757568] sync_inodes_sb+0x204/0x2a0
[ 82.758788] ? SyS_tee+0x3d0/0x3d0
[ 82.759926] sync_inodes_one_sb+0x16/0x20
[ 82.761152] iterate_supers+0x94/0x100
[ 82.762419] sys_sync+0x44/0xb0
[ 82.763499] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 82.764495] RIP: 0033:0x7fd85d7941d7
[ 82.765736] RSP: 002b:00007ffe648d4f48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[ 82.769068] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007fd85d7941d7
[ 82.770663] RDX: 00007fd85da50e01 RSI: 0000000000000000 RDI: 00007fd85d8197d3
[ 82.772211] RBP: 0000000000002710 R08: 0000000000000000 R09: 0000000000000000
[ 82.774138] R10: 000000000000082c R11: 0000000000000206 R12: 00007fd85da4ead8
[ 82.776430] R13: 0000000000000030 R14: 0000000000c6cfd0 R15: 00007fd85da4ea80
===
On Tuesday 29 August 2017 at 2:24:25 CEST, Ming Lei wrote:
> On Mon, Aug 28, 2017 at 08:22:26PM +0200, Oleksandr Natalenko wrote:
> > Hi.
> >
> > On Monday 28 August 2017 at 14:58:28 CEST, Ming Lei wrote:
> > > Could you verify if the following patch fixes your issue?
> > > …SNIP…
> >
> > I've applied it to v4.12.9 and rechecked — the issue is still there,
> > unfortunately. Stacktrace is the same as before.
> >
> > Were you able to reproduce it in a VM?
>
> Yes, I can.
>
> > Should I re-check it with v4.13-rc7?
> >
> > Any other suggestions?
>
> Please test it with v4.13-rc7 first.
Hi,
On Tue, Aug 29, 2017 at 05:52:42PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> Re-tested with v4.13-rc7 + proposed patch and got the same result.
Maybe there is another issue; I didn't use dm-crypt on raid10, so I will
test it your way to see if I can reproduce it.
BTW, could you share with us which blk-mq scheduler you are using on SATA?
The patch I posted should address one issue with the "none" scheduler.
--
Ming
On Wed, Aug 30, 2017 at 10:15:37AM +0800, Ming Lei wrote:
> Hi,
>
> On Tue, Aug 29, 2017 at 05:52:42PM +0200, Oleksandr Natalenko wrote:
> > Hello.
> >
> > Re-tested with v4.13-rc7 + proposed patch and got the same result.
>
> Maybe there is another issue; I didn't use dm-crypt on raid10, so I will
> test it your way to see if I can reproduce it.
>
> BTW, could you share with us which blk-mq scheduler you are using on SATA?
> The patch I posted should address one issue with the "none" scheduler.
Can't reproduce it even when putting dm-crypt on raid10 after applying
my patch.
Could you apply the following debug patch and provide the dmesg log
after running the commands below?
# echo 9 > /proc/sys/kernel/printk
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
BTW, it would also help to provide the names of the two SATA disks behind the raid10.
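(For reference, the md member disks and the scheduler they currently use can be
listed roughly like this; /dev/sda is only an example name:)
# cat /proc/mdstat
# lsblk -s /dev/md0
# cat /sys/block/sda/queue/scheduler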
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..75b13248ea1c 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -53,17 +53,22 @@ static int scsi_dev_type_suspend(struct device *dev,
 {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
         int err;
+        struct scsi_device *sdev = to_scsi_device(dev);
 
         /* flush pending in-flight resume operations, suspend is synchronous */
         async_synchronize_full_domain(&scsi_sd_pm_domain);
 
-        err = scsi_device_quiesce(to_scsi_device(dev));
+        sdev_printk(KERN_WARNING, sdev, "%s: enter\n", __func__);
+        err = scsi_device_quiesce(sdev);
         if (err == 0) {
+                sdev_printk(KERN_WARNING, sdev, "%s: before suspend\n", __func__);
                 err = cb(dev, pm);
+                sdev_printk(KERN_WARNING, sdev, "%s: after suspend\n", __func__);
                 if (err)
-                        scsi_device_resume(to_scsi_device(dev));
+                        scsi_device_resume(sdev);
         }
         dev_dbg(dev, "scsi suspend: %d\n", err);
+        sdev_printk(KERN_WARNING, sdev, "%s: exit\n", __func__);
         return err;
 }
 
@@ -72,9 +77,13 @@ static int scsi_dev_type_resume(struct device *dev,
 {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
         int err = 0;
+        struct scsi_device *sdev = to_scsi_device(dev);
 
+        sdev_printk(KERN_WARNING, sdev, "%s: enter\n", __func__);
+        sdev_printk(KERN_WARNING, sdev, "%s: before resume\n", __func__);
         err = cb(dev, pm);
-        scsi_device_resume(to_scsi_device(dev));
+        sdev_printk(KERN_WARNING, sdev, "%s: after resume\n", __func__);
+        scsi_device_resume(sdev);
         dev_dbg(dev, "scsi resume: %d\n", err);
 
         if (err == 0) {
@@ -83,6 +92,7 @@ static int scsi_dev_type_resume(struct device *dev,
                 pm_runtime_enable(dev);
         }
 
+        sdev_printk(KERN_WARNING, sdev, "%s: exit\n", __func__);
         return err;
 }
--
Ming
Hello.
Addressing your questions below.
> Can't reproduce it even when putting dm-crypt on raid10 after applying
> my patch.
Just a side note: dm-crypt is not necessary here; I am able to trigger
the hang with RAID10 and LVM only.
> BTW, could you share with us which blk-mq scheduler you are using on SATA?
Okay, now it makes sense. Previously, without your patch I was able to
reproduce the issue with "mq-deadline", "bfq" and "none". Now, I cannot
trigger it with "none" set, but "mq-deadline" still hangs for me. Does
this mean each scheduler should be modified separately?
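(For reference, switching the scheduler per disk is done through sysfs, roughly
like this; the active scheduler is printed in brackets, and the device name is
an example:)
# cat /sys/block/sda/queue/scheduler
# echo none > /sys/block/sda/queue/scheduler
# echo mq-deadline > /sys/block/sda/queue/scheduler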
> Could you apply the following debug patch and provide the dmesg log
> after running the commands below?
Is it still relevant, given that I confirm the issue to be fixed with "none"?
Thanks.
30.08.2017 07:17, Ming Lei wrote:
> …SNIP…
> Can't reproduce it even when putting dm-crypt on raid10 after applying
> my patch.
> …SNIP…
Hi,
On Wed, Aug 30, 2017 at 08:15:02AM +0200, [email protected] wrote:
> Hello.
>
> Addressing your questions below.
>
> > Can't reproduce it even when putting dm-crypt on raid10 after applying my
> > patch.
>
> Just a side note: dm-crypt is not necessary here; I am able to trigger
> the hang with RAID10 and LVM only.
>
> > BTW, could you share with us which blk-mq scheduler you are using on SATA?
>
> Okay, now it makes sense. Previously, without your patch I was able to
> reproduce the issue with "mq-deadline", "bfq" and "none". Now, I cannot
> trigger it with "none" set, but "mq-deadline" still hangs for me. Does this
> mean each scheduler should be modified separately?
No, it shouldn't.
>
> > Could you apply the following debug patch and provide the dmesg log
> > after running the commands below?
>
> Is it still relevant, given that I confirm the issue to be fixed with "none"?
Please try the following patch and previous patch together and see
if they make a difference:
From bc5626b4b65c7ff26567e75f42584c2c43ffe7c1 Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Wed, 30 Aug 2017 15:53:01 +0800
Subject: [PATCH] blk-mq: add requests in the tail of hctx->dispatch
So that we can allow a request to be inserted at the head
of hctx->dispatch.
Signed-off-by: Ming Lei <[email protected]>
---
block/blk-mq-sched.c | 2 +-
block/blk-mq.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index eeeea026fb47..b40dd063d61f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
          * the dispatch list.
          */
         spin_lock(&hctx->lock);
-        list_add(&rq->queuelist, &hctx->dispatch);
+        list_add_tail(&rq->queuelist, &hctx->dispatch);
         spin_unlock(&hctx->lock);
         return true;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4603b115e234..fed3d0c16266 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
                 blk_mq_put_driver_tag(rq);
 
                 spin_lock(&hctx->lock);
-                list_splice_init(list, &hctx->dispatch);
+                list_splice_tail_init(list, &hctx->dispatch);
                 spin_unlock(&hctx->lock);
 
                 /*
--
2.9.5
--
Ming
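(For testing, applying both patches on top of v4.13-rc7 amounts to roughly the
following; the branch name and the patch file names are placeholders:)
$ git checkout -b blk-mq-test v4.13-rc7
$ git am 0001-blk-mq-align-to-legacy-path-for-implementing-blk_execute_rq.patch
$ git am 0002-blk-mq-add-requests-in-the-tail-of-hctx-dispatch.patch
$ make -j"$(nproc)" && sudo make modules_install install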
Hi.
So, current summary:
1) first patch + debug patch: I can reproduce the issue in the wild, but not
with pm_test set;
2) first patch + debug patch + second patch: I cannot reproduce the issue at
all, neither with "none" nor with "mq-deadline".
Thus, "blk-mq: align to legacy path for implementing blk_execute_rq" +
"blk-mq: add requests in the tail of hctx->dispatch" look like a proper
fix. Thanks for that!
> I set up dm-crypt on /dev/md0 directly; could you show me how to set up
> LVM on raid10?
If still relevant, here is a layout:
===
sda 8:0 0 8G 0 disk
|-sda1 8:1 0 128M 0 part /boot/EFI
`-sda2 8:2 0 7.9G 0 part
`-md0 9:0 0 7.9G 0 raid10
`-system-root 253:0 0 7.9G 0 lvm /
sdb 8:16 0 8G 0 disk
|-sdb1 8:17 0 128M 0 part
`-sdb2 8:18 0 7.9G 0 part
`-md0 9:0 0 7.9G 0 raid10
`-system-root 253:0 0 7.9G 0 lvm /
===
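(If it helps with reproducing, a layout like the one above can be recreated in a
VM roughly as follows; the partition names, the VG/LV names and the filesystem
are examples:)
# mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sda2 /dev/sdb2
# pvcreate /dev/md0
# vgcreate system /dev/md0
# lvcreate -l 100%FREE -n root system
# mkfs.ext4 /dev/system/root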
Anything else you'd like /me to test?
30.08.2017 10:06, Ming Lei wrote:
> …SNIP…
> Please try the following patch and previous patch together and see
> if they make a difference:
> …SNIP…
Hi,
On Wed, Aug 30, 2017 at 12:58:21PM +0200, [email protected] wrote:
> Hi.
>
> So, current summary:
>
> 1) first patch + debug patch: I can reproduce the issue in the wild, but not
> with pm_test set;
That is interesting, since I always test via pm_test.
> 2) first patch + debug patch + second patch: I cannot reproduce the issue at
> all, neither with "none" nor with "mq-deadline".
>
> Thus, "blk-mq: align to legacy path for implementing blk_execute_rq" +
> "blk-mq: add requests in the tail of hctx->dispatch" look like a proper fix.
> Thanks for that!
That is great, I will prepare a formal one for merge.
>
> > I set up dm-crypt on /dev/md0 directly; could you show me how to set up
> > LVM on raid10?
>
> If still relevant, here is a layout:
Probably not relevant, simply because I use pm_test.
>
> ===
> sda 8:0 0 8G 0 disk
> |-sda1 8:1 0 128M 0 part /boot/EFI
> `-sda2 8:2 0 7.9G 0 part
> `-md0 9:0 0 7.9G 0 raid10
> `-system-root 253:0 0 7.9G 0 lvm /
> sdb 8:16 0 8G 0 disk
> |-sdb1 8:17 0 128M 0 part
> `-sdb2 8:18 0 7.9G 0 part
> `-md0 9:0 0 7.9G 0 raid10
> `-system-root 253:0 0 7.9G 0 lvm /
> ===
>
> Anything else you'd like /me to test?
You have done enough, thanks a lot for your report and test!
--
Ming
Ming Lei - 28.08.17, 21:32:
> On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 20:58:
> > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > Hi.
> > > >
> > > > Here is disk setup for QEMU VM:
[…]
> > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > then
> > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > >
> > > > I couldn't reproduce the issue with single disk without RAID.
> > >
> > > Could you verify if the following patch fixes your issue?
> >
> > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > 1 with two SSDs. So far with CFQ it runs stable.
>
> It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
My findings are different:
On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for Intel
Core/newer Xeon I see this:
1) Running with CFQ: No hang after resume
2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within first 1-2
days.
However with 4.12.9 with CONFIG_HZ=250, CONFIG_PREEMPT_VOLUNTARY=y + no CPU
optimizations: No hang after resume, no matter whether I use CFQ or scsi-mq +
BFQ.
Both configs attached.
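(In kconfig terms the difference is roughly CONFIG_HZ_1000 vs. CONFIG_HZ_250,
CONFIG_PREEMPT vs. CONFIG_PREEMPT_VOLUNTARY and CONFIG_MCORE2 vs.
CONFIG_GENERIC_CPU; they can be compared directly from the attached configs,
for example like this, with the file names standing in for the attachments:)
$ grep -E '^CONFIG_(HZ|PREEMPT|MCORE2|GENERIC_CPU)' config-4.12.10-hang config-4.12.9-ok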
Also 4.13.2 with CFQ is stable so far.
I am looking forward to fixes appearing in 4.13.x with x>2 and will retest with BFQ
then, unless you have different advice.
I am running Debian Sid/Experimental from BTRFS RAID 1 with two SSDs
merkaba:~> lsscsi | grep ATA
[0:0:0:0] disk ATA INTEL SSDSA2CW30 0362 /dev/sda
[2:0:0:0] disk ATA Crucial_CT480M50 MU03 /dev/sdb
on ThinkPad T520 (Sandybridge).
Thanks,
--
Martin
On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 21:32:
> > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 20:58:
> > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > Hi.
> > > > >
> > > > > Here is disk setup for QEMU VM:
> […]
> > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > > then
> > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > >
> > > > > I couldn't reproduce the issue with single disk without RAID.
> > > >
> > > > Could you verify if the following patch fixes your issue?
> > >
> > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > 1 with two SSDs. So far with CFQ it runs stable.
> >
> > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
>
> My findings are different:
>
> On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for Intel
> Core/newer Xeon I see this:
>
> 1) Running with CFQ: No hang after resume
>
> 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within first 1-2
> days.
Hi Martin,
Thanks for your report!
Could you test the following patchset to see if it fixes your issue?
https://marc.info/?l=linux-block&m=150579298505484&w=2
--
Ming
On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 20:58:
> > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > Hi.
> > >
> > > Here is disk setup for QEMU VM:
> > >
> > > ===
> > > [root@archmq ~]# smartctl -i /dev/sda
> > > …
> > > Device Model: QEMU HARDDISK
> > > Serial Number: QM00001
> > > Firmware Version: 2.5+
> > > User Capacity: 4,294,967,296 bytes [4.29 GB]
> > > Sector Size: 512 bytes logical/physical
> > > Device is: Not in smartctl database [for details use: -P showall]
> > > ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
> > > Local Time is: Sun Aug 27 09:31:54 2017 CEST
> > > SMART support is: Available - device has SMART capability.
> > > SMART support is: Enabled
> > >
> > > [root@archmq ~]# lsblk
> > > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > > sda 8:0 0 4G 0 disk
> > > `-sda1 8:1 0 4G 0 part
> > >
> > > `-md0 9:0 0 4G 0 raid10
> > >
> > > `-system 253:0 0 4G 0 crypt
> > >
> > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > >
> > > `-system-root 253:3 0 3G 0 lvm /
> > >
> > > sdb 8:16 0 4G 0 disk
> > > `-sdb1 8:17 0 4G 0 part
> > >
> > > `-md0 9:0 0 4G 0 raid10
> > >
> > > `-system 253:0 0 4G 0 crypt
> > >
> > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > >
> > > `-system-root 253:3 0 3G 0 lvm /
> > >
> > > sr0 11:0 1 1024M 0 rom
> > >
> > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > >
> > > /dev/md0:
> > > Version : 1.2
> > >
> > > Creation Time : Sat Jul 29 16:37:05 2017
> > >
> > > Raid Level : raid10
> > > Array Size : 4191232 (4.00 GiB 4.29 GB)
> > >
> > > Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > >
> > > Raid Devices : 2
> > >
> > > Total Devices : 2
> > >
> > > Persistence : Superblock is persistent
> > >
> > > Update Time : Sun Aug 27 09:30:33 2017
> > >
> > > State : clean
> > >
> > > Active Devices : 2
> > >
> > > Working Devices : 2
> > >
> > > Failed Devices : 0
> > >
> > > Spare Devices : 0
> > >
> > > Layout : far=2
> > >
> > > Chunk Size : 512K
> > >
> > > Name : archiso:0
> > > UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > >
> > > Events : 485
> > >
> > > Number Major Minor RaidDevice State
> > >
> > > 0 8 1 0 active sync /dev/sda1
> > > 1 8 17 1 active sync /dev/sdb1
> > >
> > > ===
> > >
> > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > then
> > > LVM, then ext4 for boot, swap and btrfs for /.
> > >
> > > I couldn't reproduce the issue with single disk without RAID.
> >
> > Could you verify if the following patch fixes your issue?
Yes, the patch should address this kind of issue, not related
with RAID specially, and the latest version can be found in the
following link:
https://marc.info/?l=linux-block&m=150579298505484&w=2
--
Ming
Ming Lei - 21.09.17, 06:20:
> On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 20:58:
> > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > Hi.
> > > >
> > > > Here is disk setup for QEMU VM:
> > > >
> > > > ===
> > > > [root@archmq ~]# smartctl -i /dev/sda
> > > > …
> > > > Device Model: QEMU HARDDISK
> > > > Serial Number: QM00001
> > > > Firmware Version: 2.5+
> > > > User Capacity: 4,294,967,296 bytes [4.29 GB]
> > > > Sector Size: 512 bytes logical/physical
> > > > Device is: Not in smartctl database [for details use: -P
> > > > showall]
> > > > ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS
> > > > 340-2000
> > > > Local Time is: Sun Aug 27 09:31:54 2017 CEST
> > > > SMART support is: Available - device has SMART capability.
> > > > SMART support is: Enabled
> > > >
> > > > [root@archmq ~]# lsblk
> > > > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > > > sda 8:0 0 4G 0 disk
> > > > `-sda1 8:1 0 4G 0 part
> > > >
> > > > `-md0 9:0 0 4G 0 raid10
> > > >
> > > > `-system 253:0 0 4G 0 crypt
> > > >
> > > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > > >
> > > > `-system-root 253:3 0 3G 0 lvm /
> > > >
> > > > sdb 8:16 0 4G 0 disk
> > > > `-sdb1 8:17 0 4G 0 part
> > > >
> > > > `-md0 9:0 0 4G 0 raid10
> > > >
> > > > `-system 253:0 0 4G 0 crypt
> > > >
> > > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > > >
> > > > `-system-root 253:3 0 3G 0 lvm /
> > > >
> > > > sr0 11:0 1 1024M 0 rom
> > > >
> > > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > >
> > > > /dev/md0:
> > > > Version : 1.2
> > > >
> > > > Creation Time : Sat Jul 29 16:37:05 2017
> > > >
> > > > Raid Level : raid10
> > > > Array Size : 4191232 (4.00 GiB 4.29 GB)
> > > >
> > > > Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > > >
> > > > Raid Devices : 2
> > > >
> > > > Total Devices : 2
> > > >
> > > > Persistence : Superblock is persistent
> > > >
> > > > Update Time : Sun Aug 27 09:30:33 2017
> > > >
> > > > State : clean
> > > >
> > > > Active Devices : 2
> > > >
> > > > Working Devices : 2
> > > >
> > > > Failed Devices : 0
> > > >
> > > > Spare Devices : 0
> > > >
> > > > Layout : far=2
> > > >
> > > > Chunk Size : 512K
> > > >
> > > > Name : archiso:0
> > > > UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > > >
> > > > Events : 485
> > > >
> > > > Number Major Minor RaidDevice State
> > > >
> > > > 0 8 1 0 active sync /dev/sda1
> > > > 1 8 17 1 active sync /dev/sdb1
> > > >
> > > > ===
> > > >
> > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > then
> > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > >
> > > > I couldn't reproduce the issue with single disk without RAID.
> > >
> > > Could you verify if the following patch fixes your issue?
>
> Yes, the patch should address this kind of issue, not related
> with RAID specially, and the latest version can be found in the
> following link:
>
> https://marc.info/?l=linux-block&m=150579298505484&w=2
Thank you.
So if I understand correctly, I can just add
https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test
as a remote and go from there.
Thanks,
--
Martin
Martin Steigerwald - 21.09.17, 09:30:
> Ming Lei - 21.09.17, 06:20:
> > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 20:58:
> > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > Hi.
> > > > >
> > > > > Here is disk setup for QEMU VM:
> > > > >
> > > > > ===
> > > > > [root@archmq ~]# smartctl -i /dev/sda
> > > > > …
> > > > > Device Model: QEMU HARDDISK
> > > > > Serial Number: QM00001
> > > > > Firmware Version: 2.5+
> > > > > User Capacity: 4,294,967,296 bytes [4.29 GB]
> > > > > Sector Size: 512 bytes logical/physical
> > > > > Device is: Not in smartctl database [for details use: -P
> > > > > showall]
> > > > > ATA Version is: ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS
> > > > > 340-2000
> > > > > Local Time is: Sun Aug 27 09:31:54 2017 CEST
> > > > > SMART support is: Available - device has SMART capability.
> > > > > SMART support is: Enabled
> > > > >
> > > > > [root@archmq ~]# lsblk
> > > > > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > > > > sda 8:0 0 4G 0 disk
> > > > > `-sda1 8:1 0 4G 0 part
> > > > >
> > > > > `-md0 9:0 0 4G 0 raid10
> > > > >
> > > > > `-system 253:0 0 4G 0 crypt
> > > > >
> > > > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > > > >
> > > > > `-system-root 253:3 0 3G 0 lvm /
> > > > >
> > > > > sdb 8:16 0 4G 0 disk
> > > > > `-sdb1 8:17 0 4G 0 part
> > > > >
> > > > > `-md0 9:0 0 4G 0 raid10
> > > > >
> > > > > `-system 253:0 0 4G 0 crypt
> > > > >
> > > > > |-system-boot 253:1 0 512M 0 lvm /boot
> > > > > |-system-swap 253:2 0 512M 0 lvm [SWAP]
> > > > >
> > > > > `-system-root 253:3 0 3G 0 lvm /
> > > > >
> > > > > sr0 11:0 1 1024M 0 rom
> > > > >
> > > > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > > >
> > > > > /dev/md0:
> > > > > Version : 1.2
> > > > >
> > > > > Creation Time : Sat Jul 29 16:37:05 2017
> > > > >
> > > > > Raid Level : raid10
> > > > > Array Size : 4191232 (4.00 GiB 4.29 GB)
> > > > >
> > > > > Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > > > >
> > > > > Raid Devices : 2
> > > > >
> > > > > Total Devices : 2
> > > > >
> > > > > Persistence : Superblock is persistent
> > > > >
> > > > > Update Time : Sun Aug 27 09:30:33 2017
> > > > >
> > > > > State : clean
> > > > >
> > > > > Active Devices : 2
> > > > >
> > > > > Working Devices : 2
> > > > >
> > > > > Failed Devices : 0
> > > > >
> > > > > Spare Devices : 0
> > > > >
> > > > > Layout : far=2
> > > > >
> > > > > Chunk Size : 512K
> > > > >
> > > > > Name : archiso:0
> > > > > UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > > > >
> > > > > Events : 485
> > > > >
> > > > > Number Major Minor RaidDevice State
> > > > >
> > > > > 0 8 1 0 active sync /dev/sda1
> > > > > 1 8 17 1 active sync /dev/sdb1
> > > > >
> > > > > ===
> > > > >
> > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > it,
> > > > > then
> > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > >
> > > > > I couldn't reproduce the issue with single disk without RAID.
> > > >
> > > > Could you verify if the following patch fixes your issue?
> >
> > Yes, the patch should address this kind of issue, not related
> > with RAID specially, and the latest version can be found in the
> >
> > following link:
> > https://marc.info/?l=linux-block&m=150579298505484&w=2
>
> Thank you.
>
> So if I understand correctly, I can just add
>
> https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test
>
> as a remote and go from there.
https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test
and check out the my_v4.13-safe-scsi-quiesce_V5_for_test branch, of course.
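(That is, roughly the following; the remote name "ming" is arbitrary:)
$ git remote add ming https://github.com/ming1/linux.git
$ git fetch ming
$ git checkout -b my_v4.13-safe-scsi-quiesce_V5_for_test ming/my_v4.13-safe-scsi-quiesce_V5_for_test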
--
Martin
Ming Lei - 21.09.17, 06:17:
> On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 21:32:
> > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > Ming Lei - 28.08.17, 20:58:
> > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > > Hi.
> >
> > > > > > Here is disk setup for QEMU VM:
> > […]
> >
> > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > > it,
> > > > > > then
> > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > >
> > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > >
> > > > > Could you verify if the following patch fixes your issue?
> > > >
> > > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > > 1 with two SSDs. So far with CFQ it runs stable.
> > >
> > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
> >
> > My findings are different:
> >
> > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > Intel Core/newer Xeon I see this:
> >
> > 1) Running with CFQ: No hang after resume
> >
> > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > first 1-2 days.
>
> Hi Martin,
>
> Thanks for your report!
>
> Could you test the following patchset to see if it fixes your issue?
>
> https://marc.info/?l=linux-block&m=150579298505484&w=2
Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f I
now have a bit more than three days and 8 hours of uptime without a hang on
resume from memory or disk, despite using blk-mq + BFQ.
So it looks like this patch set fixes the issue for me. To say for sure, I'd say
some more days of testing are needed. But it looks like the hang on resume is
fixed by this patch set.
Thanks,
--
Martin
On Sun, Sep 24, 2017 at 07:33:00PM +0200, Martin Steigerwald wrote:
> Ming Lei - 21.09.17, 06:17:
> > On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 21:32:
> > > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > > Ming Lei - 28.08.17, 20:58:
> > > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > > > Hi.
> > >
> > > > > > > Here is disk setup for QEMU VM:
> > > […]
> > >
> > > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > > > it,
> > > > > > > then
> > > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > > >
> > > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > > >
> > > > > > Could you verify if the following patch fixes your issue?
> > > > >
> > > > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > > > 1 with two SSDs. So far with CFQ it runs stable.
> > > >
> > > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
> > >
> > > My findings are different:
> > >
> > > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > > Intel Core/newer Xeon I see this:
> > >
> > > 1) Running with CFQ: No hang after resume
> > >
> > > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > > first 1-2 days.
> >
> > Hi Martin,
> >
> > Thanks for your report!
> >
> > Could you test the following patchset to see if it fixes your issue?
> >
> > https://marc.info/?l=linux-block&m=150579298505484&w=2
>
> Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
> quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f I
> now have a bit more than three days and 8 hours of uptime without a hang on
> resume from memory or disk, despite using blk-mq + BFQ.
>
> So it looks like this patch set fixes the issue for me. To say for sure, I'd say
> some more days of testing are needed. But it looks like the hang on resume is
> fixed by this patch set.
Martin, thanks for your test, it is great to see your issue is fixed
by this patchset.
Also, I remember that your issue wasn't related to MD, and that you were
actually using BTRFS (RAID); I just want to double-check with you:
is that true?
--
Ming
Ming Lei - 25.09.17, 10:59:
> On Sun, Sep 24, 2017 at 07:33:00PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 21.09.17, 06:17:
> > > On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > > > Ming Lei - 28.08.17, 21:32:
> > > > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > > > Ming Lei - 28.08.17, 20:58:
> > > > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko
wrote:
> > > > > > > > Hi.
> > > >
> > > > > > > > Here is disk setup for QEMU VM:
> > > > […]
> > > >
> > > > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout,
> > > > > > > > LUKS on
> > > > > > > > it,
> > > > > > > > then
> > > > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > > > >
> > > > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > > > >
> > > > > > > Could you verify if the following patch fixes your issue?
> > > > > >
> > > > > > Could this also apply to non MD RAID systems? I am using BTRFS
> > > > > > RAID
> > > > > > 1 with two SSDs. So far with CFQ it runs stable.
> > > > >
> > > > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for
> > > > > you.
> > > >
> > > > My findings are different:
> > > >
> > > > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > > > Intel Core/newer Xeon I see this:
> > > >
> > > > 1) Running with CFQ: No hang after resume
> > > >
> > > > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > > > first 1-2 days.
> > >
> > > Hi Martin,
> > >
> > > Thanks for your report!
> > >
> > > Could you test the following patchset to see if it fixes your issue?
> > >
> > > https://marc.info/?l=linux-block&m=150579298505484&w=2
> >
> > Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
> > quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f
> > I
> > now have a bit more than three days and 8 hours of uptime without a hang
> > on
> > resume from memory or disk, despite using blk-mq + BFQ.
> >
> > So it looks like this patch set fixes the issue for me. To say for sure
> > I´d say some more days of testing are needed. But it looks like the hang
> > on resume is fixed by this patch set.
>
> Martin, thanks for your test, it is great to see your issue is fixed
> by this patchset.
>
> Also, I remember that your issue wasn't related to MD, and that you were
> actually using BTRFS (RAID); I just want to double-check with you:
> is that true?
Yes. But *additionally*, as I explained, the hangs only occurred with a
CONFIG_HZ=1000, low-latency + CPU-optimized kernel, not with a CONFIG_HZ=250,
voluntary-preemption + generic x86 kernel. This hints at a timing issue to me.
So blk-mq + BFQ + CONFIG_HZ=1000 + low latency + CPU optimization = hang.
I did not investigate whether blk-mq without a scheduler would work. And I did
not investigate which of the three changes to the kernel configuration
triggers the issue. I suspect it's either CONFIG_HZ or the preemption option or
both of them.
My original mail had both the working and the non-working kernel configurations
attached.
Thanks,
--
Martin