From: Akira Fujita Subject: Re: [BUG] xfstest269 causes deadlock on linux-3.9.0 (ext4) Date: Mon, 27 May 2013 10:49:11 +0900 Message-ID: <51A2BB97.7030802@rs.jp.nec.com> References: <51908CF4.2000703@rs.jp.nec.com> <20130514213755.GA10769@quack.suse.cz> <5192F998.6080706@rs.jp.nec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Cc: ext4 development To: Jan Kara Return-path: Received: from TYO201.gate.nec.co.jp ([210.143.35.51]:65450 "EHLO tyo201.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755697Ab3E0BuU (ORCPT ); Sun, 26 May 2013 21:50:20 -0400 In-Reply-To: <5192F998.6080706@rs.jp.nec.com> Sender: linux-ext4-owner@vger.kernel.org List-ID: Hi Jan, (2013/05/15 11:57), Akira Fujita wrote: > Hi, > > (2013/05/15 6:37), Jan Kara wrote: >> Hello, >> >> On Mon 13-05-13 15:49:24, Akira Fujita wrote: >>> I ran into the deaclock with xfs_test 269 on linux-3.9.0. >>> It seems happen between jbd2_log_wait_commit, sleep_on_buffer >>> and writeback_indoes (Please see ps log below). >>> Once it occurs we can't touch FS anymore. >>> In my case 300 - 1000 trials to occur. Is this known issue? >>> >>> The following kernels seems to have same problem: >>> - linux-3.5-rc5 >>> - linux-3.8.5 >>> - linux-3.9-rc7 >>> And now I'm trying it on linux-3.10-rc1. >>> >>> # ./check generic/269 >>> FSTYP -- ext4 >>> PLATFORM -- Linux/x86_64 mcds1 3.9.0 >>> MKFS_OPTIONS -- /dev/sda12 >>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sda12 /mnt/mp2 >>> >>> >>> # ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm >>> PID TID CLS RTPRIO NI PRI PSR %CPU STAT WCHAN COMMAND >>> 1 1 TS - 0 19 0 0.0 Ss poll_schedule_ti init >>> 2 2 TS - 0 19 0 0.0 S kthreadd kthreadd >>> 3 3 TS - 0 19 0 0.0 S smpboot_thread_f ksoftirqd/0 >>> ... >>> 2391 2391 TS - 0 19 2 0.1 D jbd2_log_wait_co flush-8:0 >>> ... >>> 22647 22647 TS - 0 19 3 0.0 S worker_thread kworker/3:1 >>> 22655 22655 TS - 0 19 0 0.0 S hrtimer_nanoslee sleep >>> 22657 22657 TS - 0 19 2 0.0 R+ - ps >>> 25330 25330 TS - 0 19 0 0.0 S worker_thread kworker/0:0 >>> 28963 28963 TS - 0 19 1 0.0 S+ wait loop_xfstests.s >>> 28964 28964 TS - 0 19 1 0.0 S+ wait check >>> 29180 29180 TS - 0 19 3 0.0 S kjournald2 jbd2/sda11-8 >>> 29181 29181 TS - -20 39 3 0.0 S< rescuer_thread ext4-dio-unwrit >>> 29199 29199 TS - 0 19 3 0.0 S+ wait 269 >>> 29391 29391 TS - 0 19 0 0.6 D sleep_on_buffer jbd2/sda12-8 >>> 29392 29392 TS - -20 39 3 0.0 S< rescuer_thread ext4-dio-unwrit >>> 29394 29394 TS - 0 19 0 0.0 S wait fsstress >>> 29505 29505 TS - 0 19 3 0.0 D writeback_inodes fsstress >>> >>> # df -T /dev/sda11 /dev/sda12 >>> Filesystem Type 1K-blocks Used Available Use% Mounted on >>> /dev/sda11 ext4 9857264 22308 9327564 1% /mnt/mp1 >>> /dev/sda12 ext4 499656 499656 0 100% /mnt/mp2 >> Thanks for report. No I don't think this problem has been reported >> before. Seeing that sda12 is out of space and fsstress hangs in >> writeback_inodes(), I suspect we have some deadlock in ENOSPC recovery path >> when we want to flush data to disk to reduce delalloc uncertainty. Can you >> run 'echo w >/proc/sysrq-trigger' when the deadlock happens and post your >> dmesg here? Thanks! >> > > Thanks for reply. > I'll take that information when the deadlock happens again. Finally, I got the following messages on linux-3.9.0. Could you take a look? #echo w > /proc/sysrq-trigger /var/log/messages: ----------------------- May 23 19:15:50 mcds1 kernel: [210728.230978] SysRq : Show Blocked State May 23 19:15:50 mcds1 kernel: [210728.230985] task PC stack pid father May 23 19:15:50 mcds1 kernel: [210728.230992] flush-8:0 D ffffffff81806d60 3192 2361 2 0x00000000 May 23 19:15:50 mcds1 kernel: [210728.230997] ffff880214d27a08 0000000000000046 0000000000000282 ffff88021608d330 May 23 19:15:50 mcds1 kernel: [210728.231001] 0000000000000000 ffff880215d95ac0 0000000000012980 ffff880214d27fd8 May 23 19:15:50 mcds1 kernel: [210728.231005] ffff880214d26010 0000000000012980 0000000000012980 ffff880214d27fd8 May 23 19:15:50 mcds1 kernel: [210728.231008] Call Trace: May 23 19:15:50 mcds1 kernel: [210728.231018] [] schedule+0x24/0x70 May 23 19:15:50 mcds1 kernel: [210728.231023] [] jbd2_log_wait_commit+0xb5/0x130 May 23 19:15:50 mcds1 kernel: [210728.231029] [] ? wake_up_bit+0x40/0x40 May 23 19:15:50 mcds1 kernel: [210728.231033] [] ? __jbd2_log_start_commit+0xa4/0xb0 May 23 19:15:50 mcds1 kernel: [210728.231037] [] jbd2_journal_force_commit_nested+0x5b/0xa0 May 23 19:15:50 mcds1 kernel: [210728.231041] [] ext4_da_writepages+0x348/0x5a0 May 23 19:15:50 mcds1 kernel: [210728.231045] [] ? check_preempt_wakeup+0x1b7/0x2a0 May 23 19:15:50 mcds1 kernel: [210728.231051] [] do_writepages+0x1b/0x30 May 23 19:15:50 mcds1 kernel: [210728.231056] [] __writeback_single_inode+0x44/0x220 May 23 19:15:50 mcds1 kernel: [210728.231061] [] writeback_sb_inodes+0x23f/0x430 May 23 19:15:50 mcds1 kernel: [210728.231065] [] wb_writeback+0xed/0x2b0 May 23 19:15:50 mcds1 kernel: [210728.231071] [] ? lock_timer_base+0x37/0x70 May 23 19:15:50 mcds1 kernel: [210728.231075] [] wb_do_writeback+0x9d/0x240 May 23 19:15:50 mcds1 kernel: [210728.231079] [] ? lock_timer_base+0x70/0x70 May 23 19:15:50 mcds1 kernel: [210728.231084] [] bdi_writeback_thread+0xa2/0x230 May 23 19:15:50 mcds1 kernel: [210728.231088] [] ? wb_do_writeback+0x240/0x240 May 23 19:15:50 mcds1 kernel: [210728.231092] [] ? wb_do_writeback+0x240/0x240 May 23 19:15:50 mcds1 kernel: [210728.231095] [] kthread+0xc6/0xd0 May 23 19:15:50 mcds1 kernel: [210728.231099] [] ? kthread_freezable_should_stop+0x70/0x70 May 23 19:15:50 mcds1 kernel: [210728.231104] [] ret_from_fork+0x7c/0xb0 May 23 19:15:50 mcds1 kernel: [210728.231108] [] ? kthread_freezable_should_stop+0x70/0x70 May 23 19:15:50 mcds1 kernel: [210728.231124] jbd2/sda12-8 D ffffffff81806d60 4736 25056 2 0x00000000 May 23 19:15:50 mcds1 kernel: [210728.231128] ffff8801e436db78 0000000000000046 ffff880215c06f00 ffffffff81c10420 May 23 19:15:50 mcds1 kernel: [210728.231131] ffff880200011200 ffff8801f8f296b0 0000000000012980 ffff8801e436dfd8 May 23 19:15:50 mcds1 kernel: [210728.231135] ffff8801e436c010 0000000000012980 0000000000012980 ffff8801e436dfd8 May 23 19:15:50 mcds1 kernel: [210728.231138] Call Trace: May 23 19:15:50 mcds1 kernel: [210728.231142] [] ? __wait_on_buffer+0x30/0x30 May 23 19:15:50 mcds1 kernel: [210728.231146] [] schedule+0x24/0x70 May 23 19:15:50 mcds1 kernel: [210728.231150] [] io_schedule+0x87/0xd0 May 23 19:15:50 mcds1 kernel: [210728.231154] [] sleep_on_buffer+0x9/0x10 May 23 19:15:50 mcds1 kernel: [210728.231157] [] __wait_on_bit+0x57/0x80 May 23 19:15:50 mcds1 kernel: [210728.231161] [] ? submit_bh+0x13c/0x1f0 May 23 19:15:50 mcds1 kernel: [210728.231164] [] ? __wait_on_buffer+0x30/0x30 May 23 19:15:50 mcds1 kernel: [210728.231168] [] out_of_line_wait_on_bit+0x73/0x90 May 23 19:15:50 mcds1 kernel: [210728.231172] [] ? autoremove_wake_function+0x40/0x40 May 23 19:15:50 mcds1 kernel: [210728.231175] [] __wait_on_buffer+0x26/0x30 May 23 19:15:50 mcds1 kernel: [210728.231179] [] jbd2_journal_commit_transaction+0xee0/0x1960 May 23 19:15:50 mcds1 kernel: [210728.231183] [] ? __schedule+0x3ad/0x7a0 May 23 19:15:50 mcds1 kernel: [210728.231188] [] ? lock_timer_base+0x37/0x70 May 23 19:15:50 mcds1 kernel: [210728.231191] [] kjournald2+0xc2/0x240 May 23 19:15:50 mcds1 kernel: [210728.231195] [] ? wake_up_bit+0x40/0x40 May 23 19:15:50 mcds1 kernel: [210728.231199] [] ? commit_timeout+0x10/0x10 May 23 19:15:50 mcds1 kernel: [210728.231202] [] kthread+0xc6/0xd0 May 23 19:15:50 mcds1 kernel: [210728.231206] [] ? kthread_freezable_should_stop+0x70/0x70 May 23 19:15:50 mcds1 kernel: [210728.231210] [] ret_from_fork+0x7c/0xb0 May 23 19:15:50 mcds1 kernel: [210728.231213] [] ? kthread_freezable_should_stop+0x70/0x70 May 23 19:15:50 mcds1 kernel: [210728.231216] umount D ffffffff81806d60 5448 25241 24864 0x00000000 May 23 19:15:50 mcds1 kernel: [210728.231220] ffff8801f3a89c28 0000000000000086 ffff88021fc929f0 ffff88021608cba0 May 23 19:15:50 mcds1 kernel: [210728.231223] ffff8801f3a89bf8 ffff8801f84d8790 0000000000012980 ffff8801f3a89fd8 May 23 19:15:50 mcds1 kernel: [210728.231226] ffff8801f3a88010 0000000000012980 0000000000012980 ffff8801f3a89fd8 May 23 19:15:50 mcds1 kernel: [210728.231230] Call Trace: May 23 19:15:50 mcds1 kernel: [210728.231234] [] schedule+0x24/0x70 May 23 19:15:50 mcds1 kernel: [210728.231237] [] schedule_timeout+0x15d/0x1f0 May 23 19:15:50 mcds1 kernel: [210728.231241] [] ? __cond_resched+0x25/0x40 May 23 19:15:50 mcds1 kernel: [210728.231245] [] wait_for_completion+0x94/0x100 May 23 19:15:50 mcds1 kernel: [210728.231248] [] ? try_to_wake_up+0x2a0/0x2a0 May 23 19:15:50 mcds1 kernel: [210728.231252] [] ? bdi_queue_work+0x81/0x110 May 23 19:15:50 mcds1 kernel: [210728.231256] [] writeback_inodes_sb_nr+0x83/0xb0 May 23 19:15:50 mcds1 kernel: [210728.231261] [] writeback_inodes_sb+0x5a/0x70 May 23 19:15:50 mcds1 kernel: [210728.231264] [] __sync_filesystem+0x4a/0x50 May 23 19:15:50 mcds1 kernel: [210728.231267] [] sync_filesystem+0x32/0x60 May 23 19:15:50 mcds1 kernel: [210728.231273] [] generic_shutdown_super+0x36/0xe0 May 23 19:15:50 mcds1 kernel: [210728.231277] [] kill_block_super+0x2c/0x80 May 23 19:15:50 mcds1 kernel: [210728.231281] [] deactivate_locked_super+0x45/0x70 May 23 19:15:50 mcds1 kernel: [210728.231284] [] deactivate_super+0x45/0x60 May 23 19:15:50 mcds1 kernel: [210728.231287] [] mntput_no_expire+0xf3/0x150 May 23 19:15:50 mcds1 kernel: [210728.231290] [] sys_umount+0xcd/0x3f0 May 23 19:15:50 mcds1 kernel: [210728.231294] [] system_call_fastpath+0x16/0x1b Regards, Akira Fujita