Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755094Ab0AEAr6 (ORCPT ); Mon, 4 Jan 2010 19:47:58 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755067Ab0AEArn (ORCPT ); Mon, 4 Jan 2010 19:47:43 -0500 Received: from cantor2.suse.de ([195.135.220.15]:56176 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754414Ab0AEArl (ORCPT ); Mon, 4 Jan 2010 19:47:41 -0500 Date: Tue, 5 Jan 2010 01:47:42 +0100 From: Jan Kara To: Greg Kroah-Hartman , "Theodore Ts'o" Cc: linux-kernel@vger.kernel.org, stable@kernel.org, stable-review@kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk, Dmitry Monakhov , Jan Kara Subject: Re: [PATCH 83/97] ext4: Fix potential quota deadlock Message-ID: <20100105004741.GD3252@quack.suse.cz> References: <20100105003133.GA7199@kroah.com> <1262651630-7354-83-git-send-email-gregkh@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1262651630-7354-83-git-send-email-gregkh@suse.de> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11226 Lines: 259 On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote: > From: Dmitry Monakhov > > commit d21cd8f163ac44b15c465aab7306db931c606908 upstream. Ted has found a serious bug in this patch over Christmas and fixed it. Ted, do you prefer to just discard this patch or will you add your fixes? Honza > We have to delay vfs_dq_claim_space() until allocation context destruction. > Currently we have following call-trace: > ext4_mb_new_blocks() > /* task is already holding ac->alloc_semp */ > ->ext4_mb_mark_diskspace_used > ->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */ > ->ext4_mb_release_context() /* drop ac->alloc_semp here */ > > Let's move quota claiming to ext4_da_update_reserve_space() > > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.32-rc7 #18 > ------------------------------------------------------- > write-truncate-/3465 is trying to acquire lock: > (&s->s_dquot.dqptr_sem){++++..}, at: [] dquot_claim_space+0x3b/0x1b0 > > but task is already holding lock: > (&meta_group_info[i]->alloc_sem){++++..}, at: [] ext4_mb_load_buddy+0xb2/0x370 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #3 (&meta_group_info[i]->alloc_sem){++++..}: > [] __lock_acquire+0xd7b/0x1260 > [] lock_acquire+0xba/0xd0 > [] down_read+0x51/0x90 > [] ext4_mb_load_buddy+0xb2/0x370 > [] ext4_mb_free_blocks+0x46c/0x870 > [] ext4_free_blocks+0x73/0x130 > [] ext4_ext_truncate+0x76c/0x8d0 > [] ext4_truncate+0x187/0x5e0 > [] vmtruncate+0x6b/0x70 > [] inode_setattr+0x62/0x190 > [] ext4_setattr+0x25a/0x370 > [] notify_change+0x151/0x340 > [] do_truncate+0x6d/0xa0 > [] may_open+0x1d4/0x200 > [] do_filp_open+0x1eb/0x910 > [] do_sys_open+0x6d/0x140 > [] sys_open+0x2e/0x40 > [] sysenter_do_call+0x12/0x32 > > -> #2 (&ei->i_data_sem){++++..}: > [] __lock_acquire+0xd7b/0x1260 > [] lock_acquire+0xba/0xd0 > [] down_read+0x51/0x90 > [] ext4_get_blocks+0x47/0x450 > [] ext4_getblk+0x61/0x1d0 > [] ext4_bread+0x1f/0xa0 > [] ext4_quota_write+0x12c/0x310 > [] qtree_write_dquot+0x93/0x120 > [] v2_write_dquot+0x28/0x30 > [] dquot_commit+0xab/0xf0 > [] ext4_write_dquot+0x77/0x90 > [] ext4_mark_dquot_dirty+0x2f/0x50 > [] dquot_alloc_inode+0x101/0x180 > [] ext4_new_inode+0x602/0xf00 > [] ext4_create+0x89/0x150 > [] vfs_create+0xa2/0xc0 > [] do_filp_open+0x7a7/0x910 > [] do_sys_open+0x6d/0x140 > [] sys_open+0x2e/0x40 > [] sysenter_do_call+0x12/0x32 > > -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}: > [] __lock_acquire+0xd7b/0x1260 > [] lock_acquire+0xba/0xd0 > [] mutex_lock_nested+0x65/0x2d0 > [] vfs_load_quota_inode+0x4bd/0x5a0 > [] vfs_quota_on_path+0x5f/0x70 > [] ext4_quota_on+0x112/0x190 > [] sys_quotactl+0x44a/0x8a0 > [] sysenter_do_call+0x12/0x32 > > -> #0 (&s->s_dquot.dqptr_sem){++++..}: > [] __lock_acquire+0x1091/0x1260 > [] lock_acquire+0xba/0xd0 > [] down_read+0x51/0x90 > [] dquot_claim_space+0x3b/0x1b0 > [] ext4_mb_mark_diskspace_used+0x36f/0x380 > [] ext4_mb_new_blocks+0x34a/0x530 > [] ext4_ext_get_blocks+0x122b/0x13c0 > [] ext4_get_blocks+0x226/0x450 > [] mpage_da_map_blocks+0xc3/0xaa0 > [] ext4_da_writepages+0x506/0x790 > [] do_writepages+0x22/0x50 > [] __filemap_fdatawrite_range+0x6d/0x80 > [] filemap_flush+0x2b/0x30 > [] ext4_alloc_da_blocks+0x5c/0x60 > [] ext4_release_file+0x75/0xb0 > [] __fput+0xf9/0x210 > [] fput+0x27/0x30 > [] filp_close+0x4c/0x80 > [] put_files_struct+0x6e/0xd0 > [] exit_files+0x47/0x60 > [] do_exit+0x144/0x710 > [] do_group_exit+0x38/0xa0 > [] get_signal_to_deliver+0x2ac/0x410 > [] do_notify_resume+0xb9/0x890 > [] work_notifysig+0x13/0x21 > > other info that might help us debug this: > > 3 locks held by write-truncate-/3465: > #0: (jbd2_handle){+.+...}, at: [] start_this_handle+0x38f/0x5c0 > #1: (&ei->i_data_sem){++++..}, at: [] ext4_get_blocks+0xb6/0x450 > #2: (&meta_group_info[i]->alloc_sem){++++..}, at: [] ext4_mb_load_buddy+0xb2/0x370 > > stack backtrace: > Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18 > Call Trace: > [] ? printk+0x1d/0x22 > [] print_circular_bug+0xca/0xd0 > [] __lock_acquire+0x1091/0x1260 > [] ? sched_clock_local+0xd2/0x170 > [] ? trace_hardirqs_off_caller+0x20/0xd0 > [] lock_acquire+0xba/0xd0 > [] ? dquot_claim_space+0x3b/0x1b0 > [] down_read+0x51/0x90 > [] ? dquot_claim_space+0x3b/0x1b0 > [] dquot_claim_space+0x3b/0x1b0 > [] ext4_mb_mark_diskspace_used+0x36f/0x380 > [] ext4_mb_new_blocks+0x34a/0x530 > [] ? ext4_ext_find_extent+0x25d/0x280 > [] ext4_ext_get_blocks+0x122b/0x13c0 > [] ? sched_clock_local+0xd2/0x170 > [] ? sched_clock_cpu+0x120/0x160 > [] ? cpu_clock+0x4f/0x60 > [] ? trace_hardirqs_off_caller+0x20/0xd0 > [] ? down_write+0x8c/0xa0 > [] ext4_get_blocks+0x226/0x450 > [] ? sched_clock_cpu+0x120/0x160 > [] ? cpu_clock+0x4f/0x60 > [] ? trace_hardirqs_off+0xb/0x10 > [] mpage_da_map_blocks+0xc3/0xaa0 > [] ? find_get_pages_tag+0x16c/0x180 > [] ? find_get_pages_tag+0x0/0x180 > [] ? __mpage_da_writepage+0x16d/0x1a0 > [] ? pagevec_lookup_tag+0x2e/0x40 > [] ? write_cache_pages+0xdb/0x3d0 > [] ? __mpage_da_writepage+0x0/0x1a0 > [] ext4_da_writepages+0x506/0x790 > [] ? cpu_clock+0x4f/0x60 > [] ? sched_clock_local+0xd2/0x170 > [] ? sched_clock_cpu+0x120/0x160 > [] ? sched_clock_cpu+0x120/0x160 > [] ? ext4_da_writepages+0x0/0x790 > [] do_writepages+0x22/0x50 > [] __filemap_fdatawrite_range+0x6d/0x80 > [] filemap_flush+0x2b/0x30 > [] ext4_alloc_da_blocks+0x5c/0x60 > [] ext4_release_file+0x75/0xb0 > [] __fput+0xf9/0x210 > [] fput+0x27/0x30 > [] filp_close+0x4c/0x80 > [] put_files_struct+0x6e/0xd0 > [] exit_files+0x47/0x60 > [] do_exit+0x144/0x710 > [] ? lock_release_holdtime+0x33/0x210 > [] ? _spin_unlock_irq+0x27/0x30 > [] do_group_exit+0x38/0xa0 > [] ? trace_hardirqs_on+0xb/0x10 > [] get_signal_to_deliver+0x2ac/0x410 > [] do_notify_resume+0xb9/0x890 > [] ? trace_hardirqs_off_caller+0x20/0xd0 > [] ? lock_release_holdtime+0x33/0x210 > [] ? autoremove_wake_function+0x0/0x50 > [] ? trace_hardirqs_on_caller+0x134/0x190 > [] ? trace_hardirqs_on+0xb/0x10 > [] ? security_file_permission+0x14/0x20 > [] ? vfs_write+0x131/0x190 > [] ? do_sync_write+0x0/0x120 > [] ? sysenter_do_call+0x27/0x32 > [] work_notifysig+0x13/0x21 > > CC: Theodore Ts'o > Signed-off-by: Dmitry Monakhov > Signed-off-by: Jan Kara > Signed-off-by: Greg Kroah-Hartman > --- > fs/ext4/inode.c | 9 +++++++-- > fs/ext4/mballoc.c | 6 ------ > 2 files changed, 7 insertions(+), 8 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 374d38c..9fc624d 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1088,7 +1088,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks) > static void ext4_da_update_reserve_space(struct inode *inode, int used) > { > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); > - int total, mdb, mdb_free; > + int total, mdb, mdb_free, mdb_claim = 0; > > spin_lock(&EXT4_I(inode)->i_block_reservation_lock); > /* recalculate the number of metablocks still need to be reserved */ > @@ -1101,7 +1101,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) > > if (mdb_free) { > /* Account for allocated meta_blocks */ > - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks; > + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks; > + BUG_ON(mdb_free < mdb_claim); > + mdb_free -= mdb_claim; > > /* update fs dirty blocks counter */ > percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free); > @@ -1112,8 +1114,11 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) > /* update per-inode reservations */ > BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks); > EXT4_I(inode)->i_reserved_data_blocks -= used; > + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim); > spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); > > + vfs_dq_claim_block(inode, used + mdb_claim); > + > /* > * free those over-booking quota for metadata blocks > */ > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 7d71148..82b9778 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -2755,12 +2755,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, > if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) > /* release all the reserved blocks if non delalloc */ > percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks); > - else { > - percpu_counter_sub(&sbi->s_dirtyblocks_counter, > - ac->ac_b_ex.fe_len); > - /* convert reserved quota blocks to real quota blocks */ > - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len); > - } > > if (sbi->s_log_groups_per_flex) { > ext4_group_t flex_group = ext4_flex_group(sbi, > -- > 1.6.6 > -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/