Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mx1.redhat.com ([209.132.183.28]:35883 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753382Ab3GJAV1 (ORCPT ); Tue, 9 Jul 2013 20:21:27 -0400
Date: Tue, 9 Jul 2013 20:21:20 -0400
From: "J. Bruce Fields"
To: Dave Chinner
Cc: Al Viro, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Theodore Ts'o", Andreas Dilger
Subject: Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
Message-ID: <20130710002120.GM32574@pad.fieldses.org>
References: <1372882356-14168-1-git-send-email-bfields@redhat.com>
	<1372882356-14168-2-git-send-email-bfields@redhat.com>
	<20130709220411.GK3438@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20130709220411.GK3438@dastard>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields"
> > 
> > We want to do this elsewhere as well.
> > 
> > Cc: "Theodore Ts'o"
> > Cc: Andreas Dilger
> > Signed-off-by: J. Bruce Fields
> > ---
> >  fs/ext4/ext4.h        |    2 --
> >  fs/ext4/ioctl.c       |    4 ++--
> >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> >  include/linux/fs.h    |    3 +++
> >  5 files changed, 36 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5aae3d1..3590abe 100644

Thanks for the comment:

> Just to throw a spanner in the works - have you considered that
> other filesystems might have different inode lock ordering rules?
> 
> For example, XFS locks multiple inodes in ascending inode number
> order, not ordered by pointer address. Hence we end up different
> inode lock ordering at different layers of the stack and I can't see
> that ending well....

What lock(s) is it taking exactly, where?
If there's a possible deadlock, can we come up with a compatible
ordering?

> > diff --git a/fs/inode.c b/fs/inode.c
> > index 00d5fc3..b8afbc7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
> >  EXPORT_SYMBOL(unlock_new_inode);
> >  
> >  /**
> > + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + */
> > +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > +{
> > +	if (inode1 < inode2) {
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> > +	} else {
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> > +	}
> > +}
> > +EXPORT_SYMBOL(lock_two_nondirectories);
> 
> What makes this specific to non-directories?

See http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>

The only caller outside ext4 is vfs_rename_other.  I think we could make
it work for directories too if necessary, though the ordering would be
more complicated.  Currently there's no reason.

> If it's not to be used for directory inodes, then there should be
> WARN_ON_ONCE() guards in the code...

Sure.  So something like the following.

Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
Maybe we should keep that too if it's not overkill.

--b.

commit ad9a94b0e91d6057734e9835782e0c2cdc148bdc
Author: J. Bruce Fields
Date:   Wed Apr 18 15:16:33 2012 -0400

    vfs: pull ext4's double-i_mutex-locking into common code
    
    We want to do this elsewhere as well.
    
    Also catch any attempts to use it for directories (where this ordering
    would conflict with ancestor-first directory ordering in lock_rename).
    
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Acked-by: Jeff Layton
    Acked-by: "Theodore Ts'o"
    Signed-off-by: J. Bruce Fields

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					     struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					   struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode that is released first
- * @inode2:	the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..8f3c6fa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,37 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
+	WARN_ON_ONCE(inode1 == inode2);
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
  * @hashval:	hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic
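
For anyone who wants to poke at the ordering question outside the kernel
tree, here is a rough userspace sketch of the same address-ordered
scheme, using pthread mutexes in place of i_mutex.  struct obj,
sort_pair(), lock_two() and unlock_two() are made-up names for
illustration only, not kernel API, and the assert() merely stands in for
the inode1 == inode2 check discussed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode; only the lock matters here. */
struct obj {
	pthread_mutex_t lock;
};

/* Order the pair so that *a has the lower address.  Comparing through
 * uintptr_t avoids relying on '<' between unrelated pointers. */
static void sort_pair(struct obj **a, struct obj **b)
{
	if ((uintptr_t)*a > (uintptr_t)*b) {
		struct obj *tmp = *a;
		*a = *b;
		*b = tmp;
	}
}

/* Take both locks, lower address first, so that two threads passing
 * the same pair in opposite argument order agree on the order. */
static void lock_two(struct obj *a, struct obj *b)
{
	assert(a != b);	/* locking one mutex twice would self-deadlock */
	sort_pair(&a, &b);
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_two(struct obj *a, struct obj *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

The scheme only helps if every path that takes both locks sorts on the
same key.  A path that sorted by some other key, such as an inode
number, could pick the opposite order for the same pair, which is the
concern raised above about XFS.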