From: Yongqiang Yang
Subject: Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock
Date: Fri, 29 Nov 2013 12:58:29 +0800
To: Masayoshi MIZUMA
Cc: Jan Kara, Andreas Dilger, "Theodore Ts'o", Ext4 Developers List, Linux Filesystem Mailing List, Christoph Hellwig, Toshiyuki Okajima
References: <20110207205325.FB6A.61FB500B@jp.fujitsu.com> <20111209105616.5A4F.61FB500B@jp.fujitsu.com> <20111215214105.B2DD.61FB500B@jp.fujitsu.com>
In-Reply-To: <20111215214105.B2DD.61FB500B@jp.fujitsu.com>

How was the bug eventually fixed? I cannot find the accepted patch.

Thanks,
Yongqiang.

On Thu, Dec 15, 2011 at 8:41 PM, Masayoshi MIZUMA wrote:
>
> (2011/12/09 10:56), Masayoshi MIZUMA wrote:
>
>>
>> (2011/02/07 20:53), Masayoshi MIZUMA wrote:
>>
>> > Hi,
>> >
>> > When I checked the freeze feature of the ext4 filesystem using the fsfreeze
>> > command on 2.6.38-rc3, I got the following messages:
>>
>> Hi,
>>
>> I checked the freeze function using the test program below on 3.2.0-rc4,
>> and got the following messages; the test program hung.
>> I think this bug is still present in 3.2.0-rc4...
>
> I think the problem is as follows.
> When ext4_page_mkwrite() races with freeze_super(), ext4_page_mkwrite()
> can add an inode that needs writeback to a list (bdi_writeback.b_dirty)
> even though sb->s_frozen is already SB_FREEZE_WRITE or SB_FREEZE_TRANS.
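The check-then-act race described above can be replayed deterministically. The sketch below is a hypothetical, single-threaded C model, not kernel code: `struct model_sb`, `dirty_inodes` (standing in for `bdi_writeback.b_dirty`), the `writers` count, and the helpers `writer_enter()`, `writer_exit()` and `freezer_try_finish()` are all illustrative names. `racy_schedule()` follows process A and process B step by step; `guarded_schedule()` assumes one possible remedy, a freeze-protection count that the freezer must see drain to zero before it syncs. In a genuinely concurrent version, the writer's check-and-register step would also have to be atomic with respect to the freezer raising the level.

```c
#include <stdbool.h>

/* Hypothetical model: names echo the kernel symbols in the thread,
 * but this is a sequential replay, not kernel code. */
enum freeze_level { SB_UNFROZEN = 0, SB_FREEZE_WRITE = 1, SB_FREEZE_TRANS = 2 };

struct model_sb {
    enum freeze_level s_frozen;
    int dirty_inodes; /* stands in for bdi_writeback.b_dirty */
    int writers;      /* hypothetical freeze-protection count */
};

/* Replays the race: A checks, B freezes and syncs, A dirties.
 * Returns the dirty-inode count left on the frozen superblock. */
int racy_schedule(void)
{
    struct model_sb sb = { SB_UNFROZEN, 0, 0 };

    bool may_write = sb.s_frozen < SB_FREEZE_WRITE; /* A: vfs_check_frozen() */

    sb.s_frozen = SB_FREEZE_WRITE;                  /* B: freeze_super() */
    sb.dirty_inodes = 0;                            /* B: sync_filesystem() */
    sb.s_frozen = SB_FREEZE_TRANS;

    if (may_write)                                  /* A: acts on a stale check */
        sb.dirty_inodes++;                          /* A: __mark_inode_dirty() */

    return sb.dirty_inodes;
}

/* Writer registers before acting; refuses if a freeze is in progress. */
static bool writer_enter(struct model_sb *sb)
{
    if (sb->s_frozen >= SB_FREEZE_WRITE)
        return false; /* a real writer would sleep until thaw */
    sb->writers++;
    return true;
}

static void writer_exit(struct model_sb *sb)
{
    sb->writers--;
}

/* Freezer may only sync once registered writers have drained. */
static bool freezer_try_finish(struct model_sb *sb)
{
    if (sb->writers > 0)
        return false; /* a real freezer would sleep here */
    sb->dirty_inodes = 0; /* sync_filesystem() */
    sb->s_frozen = SB_FREEZE_TRANS;
    return true;
}

/* Same interleaving with the hypothetical protection in place.
 * Returns dirty inodes left once frozen, or -1 if the freeze never finished. */
int guarded_schedule(void)
{
    struct model_sb sb = { SB_UNFROZEN, 0, 0 };

    bool entered = writer_enter(&sb);    /* A: register + check */
    sb.s_frozen = SB_FREEZE_WRITE;       /* B: stop new writers */
    bool done = freezer_try_finish(&sb); /* B: A still active, so it waits */

    if (entered) {
        sb.dirty_inodes++;               /* A: __mark_inode_dirty() */
        writer_exit(&sb);
    }
    if (!done)
        done = freezer_try_finish(&sb);  /* B: retries and syncs the inode */

    return done ? sb.dirty_inodes : -1;
}
```

`racy_schedule()` returns 1 (an inode is left dirty on a superblock frozen at SB_FREEZE_TRANS), while `guarded_schedule()` returns 0 because the freezer cannot complete its sync until the writer has finished dirtying the inode.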
>
> process A                     | process B
> ------------------------------+-----------------------------------------------
> ext4_page_mkwrite()           |
> => vfs_check_frozen()         |
>                               | freeze_super()
>                               |  sb->s_frozen = SB_FREEZE_WRITE
> =>__block_page_mkwrite()      |  => sync_filesystem()
>   :                           |       # write inodes which are in the list.
>   :                           |  sb->s_frozen = SB_FREEZE_TRANS
>   :                           |
>    =>__mark_inode_dirty       |
>      # add inode to the list. |
> ------------------------------+-----------------------------------------------
>
> As a result, if the "flush" kthread writes back the inode that was
> added by ext4_page_mkwrite() while thaw_super() runs concurrently,
> the deadlock happens.
>
> Thanks,
> Masayoshi Mizuma
>
>>
>> The test program:
>> -----------------------------------------------------------
>> #!/bin/bash
>>
>> DEV_1=/dev/sda5
>> MNT_1=/tmp/sda5
>> LOOP=500
>>
>> if [[ ! -d $MNT_1 ]]
>> then
>>     mkdir -p $MNT_1
>> fi
>>
>> mkfs -t ext4 $DEV_1
>> mount $DEV_1 $MNT_1
>>
>> ./fsstress -d $MNT_1/tmp -n 10000 -p 100 > /dev/null 2>&1 &
>> PID=$!
>>
>> for ((i=0; i<$LOOP; i++))
>> do
>>     echo LOOP: $i
>>     fsfreeze -f $MNT_1
>>     fsfreeze -u $MNT_1
>> done
>>
>> kill $PID
>> -----------------------------------------------------------
>>
>> The messages I got when I ran the test program are below.
>> -------------------------------------------------------------
>> INFO: task flush-8:0:720 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> flush-8:0 D 0000000100521461 0 720 2 0x00000000
>> ffff8800b4c41a40 0000000000000046 0000000000000000 0000000000000000
>> 0000000000013440 ffff8800b4c41fd8 ffff8800b4c40010 0000000000013440
>> ffff8800b4c41fd8 0000000000013440 ffffffff81a0d020 ffff8800b464d4e0
>> Call Trace:
>> [] ? prepare_to_wait+0x5e/0x90
>> [] schedule+0x3f/0x60
>> [] ext4_journal_start_sb+0x145/0x1b0 [ext4]
>> [] ? wake_up_bit+0x40/0x40
>> [] ? ext4_meta_trans_blocks+0xb5/0xc0 [ext4]
>> [] ext4_da_writepages+0x29d/0x620 [ext4]
>> [] ? blk_finish_plug+0x18/0x50
>> [] do_writepages+0x21/0x40
>> [] writeback_single_inode+0x180/0x3b0
>> [] writeback_sb_inodes+0x1a1/0x260
>> [] wb_writeback+0xde/0x2b0
>> [] ? try_to_del_timer_sync+0x86/0xe0
>> [] wb_do_writeback+0xa6/0x260
>> [] ? lock_timer_base+0x70/0x70
>> [] bdi_writeback_thread+0xaa/0x270
>> [] ? wb_do_writeback+0x260/0x260
>> [] ? wb_do_writeback+0x260/0x260
>> [] kthread+0x96/0xa0
>> [] kernel_thread_helper+0x4/0x10
>> [] ? kthread_worker_fn+0x1a0/0x1a0
>> [] ? gs_change+0x13/0x13
>>
>> INFO: task fsstress:4376 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> fsstress D ffff88009b52dda8 0 4376 4364 0x00000080
>> ffff88009b52dcb8 0000000000000082 ffffffff810d7e09 ffff88009b52dcc0
>> 0000000000013440 ffff88009b52dfd8 ffff88009b52c010 0000000000013440
>> ffff88009b52dfd8 0000000000013440 ffff88009b4d54e0 ffff8800a1481560
>> Call Trace:
>> [] ? trace_clock_local+0x9/0x10
>> [] schedule+0x3f/0x60
>> [] schedule_timeout+0x1fd/0x2e0
>> [] ? trace_nowake_buffer_unlock_commit+0x43/0x60
>> [] ? __switch_to+0x194/0x320
>> [] ? ftrace_raw_event_sched_switch+0x103/0x110
>> [] wait_for_common+0x11d/0x190
>> [] ? try_to_wake_up+0x2b0/0x2b0
>> [] wait_for_completion+0x1d/0x20
>> [] writeback_inodes_sb_nr+0x7f/0xa0
>> [] writeback_inodes_sb+0x5f/0x80
>> [] ? __sync_filesystem+0x90/0x90
>> [] __sync_filesystem+0x4e/0x90
>> [] sync_one_sb+0x1f/0x30
>> [] iterate_supers+0x7a/0xd0
>> [] sys_sync+0x34/0x70
>> [] system_call_fastpath+0x16/0x1b
>> -------------------------------------------------------------
>>
>> The test program for xfstests is below.
>> -------------------------------------------------------------
>> #! /bin/bash
>> # FSQA Test No. 277
>> #
>> # Run fsstress and freeze/unfreeze in parallel
>> #
>> #-----------------------------------------------------------------------
>> # Copyright (c) 2006 Silicon Graphics, Inc. All Rights Reserved.
>> #
>> # This program is free software; you can redistribute it and/or
>> # modify it under the terms of the GNU General Public License as
>> # published by the Free Software Foundation.
>> #
>> # This program is distributed in the hope that it would be useful,
>> # but WITHOUT ANY WARRANTY; without even the implied warranty of
>> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> # GNU General Public License for more details.
>> #
>> # You should have received a copy of the GNU General Public License
>> # along with this program; if not, write the Free Software Foundation,
>> # Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
>> #
>> #-----------------------------------------------------------------------
>> #
>> # creator
>> owner=m.mizuma@jp.fujitsu.com
>>
>> seq=`basename $0`
>> echo "QA output created by $seq"
>>
>> here=`pwd`
>> tmp=/tmp/$$
>> status=0 # success is the default!
>> trap "rm -f $tmp.*; exit \$status" 0 1 2 3 15
>>
>> # get standard environment, filters and checks
>> . ./common.rc
>> . ./common.filter
>>
>> _workout()
>> {
>>     echo ""
>>     echo "Run fsstress"
>>     echo ""
>>     num_iterations=500
>>     out=$SCRATCH_MNT/fsstress.$$
>>     args="-p100 -n10000 -d $out"
>>     echo "fsstress $args" >> $here/$seq.full
>>     $FSSTRESS_PROG $args > /dev/null 2>&1 &
>>     pid=$!
>>     echo "Run xfs_freeze in parallel"
>>     for ((i=0; i < num_iterations; i++))
>>     do
>>         xfs_freeze -f $SCRATCH_MNT | tee -a $seq.full
>>         xfs_freeze -u $SCRATCH_MNT | tee -a $seq.full
>>     done
>>     kill $pid 2> /dev/null
>>     wait $pid
>> }
>>
>> # real QA test starts here
>> _supported_fs generic
>> _supported_os Linux
>> _need_to_be_root
>> _require_scratch
>>
>> _scratch_mkfs >> $seq.full 2>&1
>> _scratch_mount
>>
>> if ! _workout; then
>>     umount $SCRATCH_DEV 2>/dev/null
>>     exit
>> fi
>>
>> if ! _scratch_unmount; then
>>     echo "failed to umount"
>>     status=1
>>     exit
>> fi
>> _check_scratch_fs
>> status=$?
>> exit
>> -------------------------------------------------------------
>>
>> Thanks,
>> Masayoshi Mizuma
>>
>> >
>> > ---------------------------------------------------------------------
>> > Feb 7 15:05:09 RX300S6 kernel: INFO: task fsfreeze:2104 blocked for more than 120 seconds.
>> > Feb 7 15:05:09 RX300S6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Feb 7 15:05:09 RX300S6 kernel: fsfreeze D ffff880076d5f040 0 2104 2018 0x00000000
>> > Feb 7 15:05:09 RX300S6 kernel: ffff88005a9f3d98 0000000000000086 ffff88005a9f3d38 ffffffff00000000
>> > Feb 7 15:05:09 RX300S6 kernel: 0000000000014d40 ffff880076d5eab0 ffff880076d5f040 ffff88005a9f3fd8
>> > Feb 7 15:05:09 RX300S6 kernel: ffff880076d5f048 0000000000014d40 ffff88005a9f2010 0000000000014d40
>> > Feb 7 15:05:09 RX300S6 kernel: Call Trace:
>> > Feb 7 15:05:09 RX300S6 kernel: [] rwsem_down_failed_common+0xb5/0x140
>> > Feb 7 15:05:09 RX300S6 kernel: [] rwsem_down_write_failed+0x13/0x20
>> > Feb 7 15:05:09 RX300S6 kernel: [] call_rwsem_down_write_failed+0x13/0x20
>> > Feb 7 15:05:09 RX300S6 kernel: [] ? down_write+0x32/0x40
>> > Feb 7 15:05:09 RX300S6 kernel: [] thaw_super+0x28/0xd0
>> > Feb 7 15:05:09 RX300S6 kernel: [] do_vfs_ioctl+0x368/0x560
>> > Feb 7 15:05:09 RX300S6 kernel: [] ? sys_newfstat+0x33/0x40
>> > Feb 7 15:05:09 RX300S6 kernel: [] sys_ioctl+0xa1/0xb0
>> > Feb 7 15:05:09 RX300S6 kernel: [] system_call_fastpath+0x16/0x1b
>> > ...
>> > Feb 7 15:07:09 RX300S6 kernel: INFO: task flush-8:0:1409 blocked for more than 120 seconds.
>> > Feb 7 15:07:09 RX300S6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Feb 7 15:07:09 RX300S6 kernel: flush-8:0 D ffff880037777a30 0 1409 2 0x00000000
>> > Feb 7 15:07:09 RX300S6 kernel: ffff880037c95a80 0000000000000046 ffff88007c8037a0 0000000000000000
>> > Feb 7 15:07:09 RX300S6 kernel: 0000000000014d40 ffff8800377774a0 ffff880037777a30 ffff880037c95fd8
>> > Feb 7 15:07:09 RX300S6 kernel: ffff880037777a38 0000000000014d40 ffff880037c94010 0000000000014d40
>> > Feb 7 15:07:09 RX300S6 kernel: Call Trace:
>> > Feb 7 15:07:09 RX300S6 kernel: [] ext4_journal_start_sb+0x75/0x130 [ext4]
>> > Feb 7 15:07:09 RX300S6 kernel: [] ? autoremove_wake_function+0x0/0x40
>> > Feb 7 15:07:09 RX300S6 kernel: [] ext4_da_writepages+0x27a/0x640 [ext4]
>> > Feb 7 15:07:09 RX300S6 kernel: [] do_writepages+0x21/0x40
>> > Feb 7 15:07:09 RX300S6 kernel: [] writeback_single_inode+0x98/0x240
>> > Feb 7 15:07:09 RX300S6 kernel: [] writeback_sb_inodes+0xce/0x170
>> > Feb 7 15:07:09 RX300S6 kernel: [] writeback_inodes_wb+0x99/0x160
>> > Feb 7 15:07:09 RX300S6 kernel: [] wb_writeback+0x2bb/0x430
>> > Feb 7 15:07:09 RX300S6 kernel: [] wb_do_writeback+0x22c/0x280
>> > Feb 7 15:07:09 RX300S6 kernel: [] bdi_writeback_thread+0xb2/0x260
>> > Feb 7 15:07:09 RX300S6 kernel: [] ? bdi_writeback_thread+0x0/0x260
>> > Feb 7 15:07:09 RX300S6 kernel: [] ? bdi_writeback_thread+0x0/0x260
>> > Feb 7 15:07:09 RX300S6 kernel: [] kthread+0x96/0xa0
>> > Feb 7 15:07:09 RX300S6 kernel: [] kernel_thread_helper+0x4/0x10
>> > Feb 7 15:07:09 RX300S6 kernel: [] ? kthread+0x0/0xa0
>> > Feb 7 15:07:09 RX300S6 kernel: [] ? kernel_thread_helper+0x0/0x10
>> > ---------------------------------------------------------------------
>> >
>> > I think the following deadlock happened:
>> >
>> > [flush-8:0:1409]                            | [fsfreeze:2104]
>> > --------------------------------------------+--------------------------------
>> > writeback_inodes_wb                         |
>> >  pin_sb_for_writeback                       |
>> >   down_read_trylock(&sb->s_umount)          |
>> >  writeback_sb_inodes                        | thaw_super
>> >   writeback_single_inode                    |  down_write(&sb->s_umount)
>> >    do_writepages                            |  # stop until flush-8:0 releases
>> >     ext4_da_writepages                      |  # read lock of sb->s_umount...
>> >      ext4_journal_start_sb                  |
>> >       vfs_check_frozen                      |
>> >        wait_event((sb)->s_wait_unfrozen,    |
>> >          ((sb)->s_frozen < (level)))        |
>> >        # stop until being woken up by       |
>> >        # fsfreeze...                        |
>> > --------------------------------------------+--------------------------------
>> >
>> > Could anyone check this problem?
>> >
>> > Thanks,
>> > Masayoshi Mizuma
>> >
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best Wishes
Yongqiang Yang