Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758936Ab2BNDQw (ORCPT ); Mon, 13 Feb 2012 22:16:52 -0500 Received: from mail-lpp01m020-f174.google.com ([209.85.217.174]:45088 "EHLO mail-lpp01m020-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751735Ab2BNDQs (ORCPT ); Mon, 13 Feb 2012 22:16:48 -0500 MIME-Version: 1.0 In-Reply-To: <3125411.WeeEOF8PnR@f209> References: <3125411.WeeEOF8PnR@f209> From: Paul Hartman Date: Mon, 13 Feb 2012 21:16:26 -0600 Message-ID: Subject: Re: Deadlock? To: lkml Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 17553 Lines: 309 On Mon, Feb 13, 2012 at 5:17 AM, Markus wrote: > Hi! > > I noted some kind of deadlock, where I was not able to write to the raid6, > while writing to each disk would still work. > This caused many processes to "wait" in d-state, thus making it impossible to > unmount, cleanly reboot, sync, ... I'm using RAID5 and kernel 3.2.2 and encountered a hang when running e4defrag on some large files today. When I tried to browse to that directory in Midnight Commander, it also became D state hung. After magic sysrq REISUB, raid is resyncing. I don't know if it's related, or useful, but here's my dmesg output: [1555304.508443] INFO: task e4defrag:9929 blocked for more than 120 seconds. [1555304.508446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555304.508449] e4defrag D 000000015ccdc212 0 9929 21774 0x00000000 [1555304.508454] ffff880309b93d38 0000000000000082 ffff880309b93cc8 ffffffff00000000 [1555304.508458] ffff8801009d0000 0000000000010980 ffff880309b93fd8 ffff880309b92000 [1555304.508463] 0000000000010980 0000000000004000 ffff880309b93fd8 0000000000010980 [1555304.508467] Call Trace: [1555304.508474] [] ? sub_preempt_count+0x9d/0xd0 [1555304.508480] [] ? _raw_spin_unlock_irqrestore+0x12/0x40 [1555304.508484] [] ? __wake_up+0x4e/0x70 [1555304.508487] [] ? get_parent_ip+0x11/0x50 [1555304.508490] [] ? get_parent_ip+0x11/0x50 [1555304.508493] [] ? sub_preempt_count+0x9d/0xd0 [1555304.508496] [] ? get_parent_ip+0x11/0x50 [1555304.508500] [] ? get_parent_ip+0x11/0x50 [1555304.508504] [] ? __lock_page+0x70/0x70 [1555304.508507] [] schedule+0x3a/0x50 [1555304.508510] [] io_schedule+0x87/0xd0 [1555304.508513] [] sleep_on_page+0x9/0x10 [1555304.508516] [] __wait_on_bit+0x57/0x80 [1555304.508520] [] wait_on_page_bit+0x6e/0x80 [1555304.508524] [] ? autoremove_wake_function+0x40/0x40 [1555304.508528] [] ? pagevec_lookup_tag+0x20/0x30 [1555304.508531] [] filemap_fdatawait_range+0xfa/0x190 [1555304.508537] [] sys_sync_file_range+0x176/0x180 [1555304.508541] [] system_call_fastpath+0x16/0x1b [1555424.340835] INFO: task e4defrag:9929 blocked for more than 120 seconds. [1555424.340838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555424.340841] e4defrag D 000000015ccdc212 0 9929 21774 0x00000000 [1555424.340845] ffff880309b93d38 0000000000000082 ffff880309b93cc8 ffffffff00000000 [1555424.340850] ffff8801009d0000 0000000000010980 ffff880309b93fd8 ffff880309b92000 [1555424.340854] 0000000000010980 0000000000004000 ffff880309b93fd8 0000000000010980 [1555424.340859] Call Trace: [1555424.340866] [] ? sub_preempt_count+0x9d/0xd0 [1555424.340872] [] ? _raw_spin_unlock_irqrestore+0x12/0x40 [1555424.340875] [] ? __wake_up+0x4e/0x70 [1555424.340878] [] ? get_parent_ip+0x11/0x50 [1555424.340881] [] ? get_parent_ip+0x11/0x50 [1555424.340885] [] ? sub_preempt_count+0x9d/0xd0 [1555424.340888] [] ? get_parent_ip+0x11/0x50 [1555424.340891] [] ? get_parent_ip+0x11/0x50 [1555424.340895] [] ? __lock_page+0x70/0x70 [1555424.340899] [] schedule+0x3a/0x50 [1555424.340902] [] io_schedule+0x87/0xd0 [1555424.340905] [] sleep_on_page+0x9/0x10 [1555424.340908] [] __wait_on_bit+0x57/0x80 [1555424.340911] [] wait_on_page_bit+0x6e/0x80 [1555424.340916] [] ? autoremove_wake_function+0x40/0x40 [1555424.340919] [] ? pagevec_lookup_tag+0x20/0x30 [1555424.340923] [] filemap_fdatawait_range+0xfa/0x190 [1555424.340928] [] sys_sync_file_range+0x176/0x180 [1555424.340932] [] system_call_fastpath+0x16/0x1b [1555544.173120] INFO: task jbd2/md2-8:2842 blocked for more than 120 seconds. [1555544.173123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555544.173126] jbd2/md2-8 D 000000015cd20110 0 2842 2 0x00000000 [1555544.173131] ffff88030fba1ce0 0000000000000046 0000000000000002 0000000100000000 [1555544.173136] ffff88030f9c1650 0000000000010980 ffff88030fba1fd8 ffff88030fba0000 [1555544.173140] 0000000000010980 0000000000004000 ffff88030fba1fd8 0000000000010980 [1555544.173144] Call Trace: [1555544.173152] [] ? native_sched_clock+0x28/0x90 [1555544.173158] [] schedule+0x3a/0x50 [1555544.173163] [] jbd2_journal_commit_transaction+0x17c/0x13a0 [1555544.173166] [] ? __schedule+0x46e/0xa90 [1555544.173171] [] ? get_parent_ip+0x11/0x50 [1555544.173175] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173179] [] ? wake_up_bit+0x40/0x40 [1555544.173183] [] kjournald2+0xb2/0x210 [1555544.173187] [] ? wake_up_bit+0x40/0x40 [1555544.173190] [] ? commit_timeout+0x10/0x10 [1555544.173194] [] kthread+0x96/0xa0 [1555544.173198] [] kernel_thread_helper+0x4/0x10 [1555544.173204] [] ? kthread_worker_fn+0x190/0x190 [1555544.173207] [] ? gs_change+0xb/0xb [1555544.173231] INFO: task e4defrag:9929 blocked for more than 120 seconds. [1555544.173233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555544.173235] e4defrag D 000000015ccdc212 0 9929 21774 0x00000000 [1555544.173238] ffff880309b93d38 0000000000000082 ffff880309b93cc8 ffffffff00000000 [1555544.173242] ffff8801009d0000 0000000000010980 ffff880309b93fd8 ffff880309b92000 [1555544.173245] 0000000000010980 0000000000004000 ffff880309b93fd8 0000000000010980 [1555544.173249] Call Trace: [1555544.173252] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173256] [] ? _raw_spin_unlock_irqrestore+0x12/0x40 [1555544.173259] [] ? __wake_up+0x4e/0x70 [1555544.173261] [] ? get_parent_ip+0x11/0x50 [1555544.173264] [] ? get_parent_ip+0x11/0x50 [1555544.173267] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173270] [] ? get_parent_ip+0x11/0x50 [1555544.173272] [] ? get_parent_ip+0x11/0x50 [1555544.173276] [] ? __lock_page+0x70/0x70 [1555544.173279] [] schedule+0x3a/0x50 [1555544.173281] [] io_schedule+0x87/0xd0 [1555544.173284] [] sleep_on_page+0x9/0x10 [1555544.173287] [] __wait_on_bit+0x57/0x80 [1555544.173290] [] wait_on_page_bit+0x6e/0x80 [1555544.173293] [] ? autoremove_wake_function+0x40/0x40 [1555544.173296] [] ? pagevec_lookup_tag+0x20/0x30 [1555544.173299] [] filemap_fdatawait_range+0xfa/0x190 [1555544.173303] [] sys_sync_file_range+0x176/0x180 [1555544.173307] [] system_call_fastpath+0x16/0x1b [1555544.173310] INFO: task mc:22249 blocked for more than 120 seconds. [1555544.173312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555544.173313] mc D 000000015cd1eecf 0 22249 9847 0x00000000 [1555544.173317] ffff8802f7ea9b48 0000000000000082 ffff8802f7ea9a28 0000000000000000 [1555544.173321] ffff8802fc390000 0000000000010980 ffff8802f7ea9fd8 ffff8802f7ea8000 [1555544.173324] 0000000000010980 0000000000004000 ffff8802f7ea9fd8 0000000000010980 [1555544.173327] Call Trace: [1555544.173331] [] ? __lru_cache_add+0x72/0xd0 [1555544.173334] [] ? get_parent_ip+0x11/0x50 [1555544.173337] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173339] [] ? get_parent_ip+0x11/0x50 [1555544.173342] [] ? get_parent_ip+0x11/0x50 [1555544.173346] [] ? unmap_underlying_metadata+0x50/0x50 [1555544.173349] [] schedule+0x3a/0x50 [1555544.173351] [] io_schedule+0x87/0xd0 [1555544.173354] [] sleep_on_buffer+0x9/0x10 [1555544.173357] [] __wait_on_bit_lock+0x52/0xb0 [1555544.173360] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173363] [] ? unmap_underlying_metadata+0x50/0x50 [1555544.173366] [] out_of_line_wait_on_bit_lock+0x73/0x90 [1555544.173369] [] ? autoremove_wake_function+0x40/0x40 [1555544.173372] [] __lock_buffer+0x2e/0x30 [1555544.173375] [] do_get_write_access+0x5a9/0x6c0 [1555544.173379] [] ? sub_preempt_count+0x9d/0xd0 [1555544.173382] [] ? jbd2_journal_add_journal_head+0xd6/0x1f0 [1555544.173385] [] jbd2_journal_get_write_access+0x2c/0x50 [1555544.173389] [] __ext4_journal_get_write_access+0x39/0x80 [1555544.173392] [] ext4_reserve_inode_write+0x88/0xb0 [1555544.173395] [] ext4_mark_inode_dirty+0x39/0x1e0 [1555544.173399] [] ? filldir64+0xd0/0xd0 [1555544.173402] [] ext4_dirty_inode+0x38/0x60 [1555544.173404] [] __mark_inode_dirty+0x3b/0x220 [1555544.173407] [] touch_atime+0x115/0x160 [1555544.173410] [] vfs_readdir+0xb6/0xc0 [1555544.173413] [] sys_getdents+0x80/0xe0 [1555544.173416] [] system_call_fastpath+0x16/0x1b [1555664.005410] INFO: task jbd2/md2-8:2842 blocked for more than 120 seconds. [1555664.005413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555664.005416] jbd2/md2-8 D 000000015cd20110 0 2842 2 0x00000000 [1555664.005421] ffff88030fba1ce0 0000000000000046 0000000000000002 0000000100000000 [1555664.005426] ffff88030f9c1650 0000000000010980 ffff88030fba1fd8 ffff88030fba0000 [1555664.005430] 0000000000010980 0000000000004000 ffff88030fba1fd8 0000000000010980 [1555664.005434] Call Trace: [1555664.005442] [] ? native_sched_clock+0x28/0x90 [1555664.005448] [] schedule+0x3a/0x50 [1555664.005453] [] jbd2_journal_commit_transaction+0x17c/0x13a0 [1555664.005456] [] ? __schedule+0x46e/0xa90 [1555664.005462] [] ? get_parent_ip+0x11/0x50 [1555664.005465] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005469] [] ? wake_up_bit+0x40/0x40 [1555664.005473] [] kjournald2+0xb2/0x210 [1555664.005477] [] ? wake_up_bit+0x40/0x40 [1555664.005480] [] ? commit_timeout+0x10/0x10 [1555664.005483] [] kthread+0x96/0xa0 [1555664.005488] [] kernel_thread_helper+0x4/0x10 [1555664.005492] [] ? kthread_worker_fn+0x190/0x190 [1555664.005495] [] ? gs_change+0xb/0xb [1555664.005521] INFO: task e4defrag:9929 blocked for more than 120 seconds. [1555664.005523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555664.005525] e4defrag D 000000015ccdc212 0 9929 21774 0x00000000 [1555664.005529] ffff880309b93d38 0000000000000082 ffff880309b93cc8 ffffffff00000000 [1555664.005533] ffff8801009d0000 0000000000010980 ffff880309b93fd8 ffff880309b92000 [1555664.005537] 0000000000010980 0000000000004000 ffff880309b93fd8 0000000000010980 [1555664.005542] Call Trace: [1555664.005545] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005549] [] ? _raw_spin_unlock_irqrestore+0x12/0x40 [1555664.005553] [] ? __wake_up+0x4e/0x70 [1555664.005556] [] ? get_parent_ip+0x11/0x50 [1555664.005559] [] ? get_parent_ip+0x11/0x50 [1555664.005562] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005566] [] ? get_parent_ip+0x11/0x50 [1555664.005569] [] ? get_parent_ip+0x11/0x50 [1555664.005573] [] ? __lock_page+0x70/0x70 [1555664.005576] [] schedule+0x3a/0x50 [1555664.005579] [] io_schedule+0x87/0xd0 [1555664.005582] [] sleep_on_page+0x9/0x10 [1555664.005585] [] __wait_on_bit+0x57/0x80 [1555664.005589] [] wait_on_page_bit+0x6e/0x80 [1555664.005592] [] ? autoremove_wake_function+0x40/0x40 [1555664.005596] [] ? pagevec_lookup_tag+0x20/0x30 [1555664.005599] [] filemap_fdatawait_range+0xfa/0x190 [1555664.005604] [] sys_sync_file_range+0x176/0x180 [1555664.005608] [] system_call_fastpath+0x16/0x1b [1555664.005612] INFO: task mc:22249 blocked for more than 120 seconds. [1555664.005614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555664.005616] mc D 000000015cd1eecf 0 22249 9847 0x00000000 [1555664.005620] ffff8802f7ea9b48 0000000000000082 ffff8802f7ea9a28 0000000000000000 [1555664.005624] ffff8802fc390000 0000000000010980 ffff8802f7ea9fd8 ffff8802f7ea8000 [1555664.005628] 0000000000010980 0000000000004000 ffff8802f7ea9fd8 0000000000010980 [1555664.005632] Call Trace: [1555664.005636] [] ? __lru_cache_add+0x72/0xd0 [1555664.005639] [] ? get_parent_ip+0x11/0x50 [1555664.005642] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005646] [] ? get_parent_ip+0x11/0x50 [1555664.005649] [] ? get_parent_ip+0x11/0x50 [1555664.005653] [] ? unmap_underlying_metadata+0x50/0x50 [1555664.005656] [] schedule+0x3a/0x50 [1555664.005659] [] io_schedule+0x87/0xd0 [1555664.005663] [] sleep_on_buffer+0x9/0x10 [1555664.005666] [] __wait_on_bit_lock+0x52/0xb0 [1555664.005669] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005673] [] ? unmap_underlying_metadata+0x50/0x50 [1555664.005676] [] out_of_line_wait_on_bit_lock+0x73/0x90 [1555664.005680] [] ? autoremove_wake_function+0x40/0x40 [1555664.005684] [] __lock_buffer+0x2e/0x30 [1555664.005687] [] do_get_write_access+0x5a9/0x6c0 [1555664.005691] [] ? sub_preempt_count+0x9d/0xd0 [1555664.005695] [] ? jbd2_journal_add_journal_head+0xd6/0x1f0 [1555664.005698] [] jbd2_journal_get_write_access+0x2c/0x50 [1555664.005703] [] __ext4_journal_get_write_access+0x39/0x80 [1555664.005707] [] ext4_reserve_inode_write+0x88/0xb0 [1555664.005710] [] ext4_mark_inode_dirty+0x39/0x1e0 [1555664.005714] [] ? filldir64+0xd0/0xd0 [1555664.005717] [] ext4_dirty_inode+0x38/0x60 [1555664.005721] [] __mark_inode_dirty+0x3b/0x220 [1555664.005724] [] touch_atime+0x115/0x160 [1555664.005728] [] vfs_readdir+0xb6/0xc0 [1555664.005731] [] sys_getdents+0x80/0xe0 [1555664.005734] [] system_call_fastpath+0x16/0x1b [1555783.837700] INFO: task jbd2/md2-8:2842 blocked for more than 120 seconds. [1555783.837703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1555783.837706] jbd2/md2-8 D 000000015cd20110 0 2842 2 0x00000000 [1555783.837711] ffff88030fba1ce0 0000000000000046 0000000000000002 0000000100000000 [1555783.837715] ffff88030f9c1650 0000000000010980 ffff88030fba1fd8 ffff88030fba0000 [1555783.837719] 0000000000010980 0000000000004000 ffff88030fba1fd8 0000000000010980 [1555783.837724] Call Trace: [1555783.837731] [] ? native_sched_clock+0x28/0x90 [1555783.837737] [] schedule+0x3a/0x50 [1555783.837742] [] jbd2_journal_commit_transaction+0x17c/0x13a0 [1555783.837746] [] ? __schedule+0x46e/0xa90 [1555783.837751] [] ? get_parent_ip+0x11/0x50 [1555783.837754] [] ? sub_preempt_count+0x9d/0xd0 [1555783.837759] [] ? wake_up_bit+0x40/0x40 [1555783.837763] [] kjournald2+0xb2/0x210 [1555783.837766] [] ? wake_up_bit+0x40/0x40 [1555783.837770] [] ? commit_timeout+0x10/0x10 [1555783.837773] [] kthread+0x96/0xa0 [1555783.837777] [] kernel_thread_helper+0x4/0x10 [1555783.837781] [] ? kthread_worker_fn+0x190/0x190 [1555783.837784] [] ? gs_change+0xb/0xb -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/