=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.22-rc6 #14
-------------------------------------------------------
elinks/4461 is trying to acquire lock:
(&(&ip->i_lock)->mr_lock/1){----}, at: [<c01ef3d0>] xfs_ilock+0x50/0xd0
but task is already holding lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<c01ef3d0>] xfs_ilock+0x50/0xd0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&ip->i_lock)->mr_lock){----}:
[<c0137e0d>] __lock_acquire+0xe0d/0xfc0
[<c013802d>] lock_acquire+0x6d/0x90
[<c012e4f1>] down_write_nested+0x41/0x60
[<c01ef412>] xfs_ilock+0x92/0xd0
[<c01eff98>] xfs_iget_core+0x398/0x5c0
[<c01f027e>] xfs_iget+0xbe/0x120
[<c02098bd>] xfs_trans_iget+0xfd/0x160
[<c01f379c>] xfs_ialloc+0xac/0x500
[<c020a46c>] xfs_dir_ialloc+0x6c/0x2a0
[<c02109ae>] xfs_create+0x33e/0x630
[<c021bf7f>] xfs_vn_mknod+0x19f/0x210
[<c016fd12>] vfs_mknod+0xa2/0xf0
[<c03c3dbf>] unix_bind+0x1bf/0x2e0
[<c0373ba4>] sys_bind+0x54/0x80
[<c037518d>] sys_socketcall+0x7d/0x260
[<c01029ce>] sysenter_past_esp+0x5f/0x99
[<ffffffff>] 0xffffffff
-> #0 (&(&ip->i_lock)->mr_lock/1){----}:
[<c0137c85>] __lock_acquire+0xc85/0xfc0
[<c013802d>] lock_acquire+0x6d/0x90
[<c012e6e1>] down_read_nested+0x41/0x60
[<c01ef3d0>] xfs_ilock+0x50/0xd0
[<c020dd45>] xfs_lock_inodes+0x175/0x190
[<c02054af>] xfs_lock_for_rename+0x22f/0x280
[<c02055af>] xfs_rename+0xaf/0x8c0
[<c021bb64>] xfs_vn_rename+0x34/0x80
[<c0170237>] vfs_rename+0x307/0x370
[<c0171e3b>] sys_renameat+0x1bb/0x1f0
[<c0171e98>] sys_rename+0x28/0x30
[<c01029ce>] sysenter_past_esp+0x5f/0x99
[<ffffffff>] 0xffffffff
other info that might help us debug this:
3 locks held by elinks/4461:
#0: (&inode->i_mutex/1){--..}, at: [<c016f459>] lock_rename+0xe9/0xf0
#1: (&inode->i_mutex){--..}, at: [<c03c8a8c>] mutex_lock+0x1c/0x20
#2: (&(&ip->i_lock)->mr_lock){----}, at: [<c01ef3d0>] xfs_ilock+0x50/0xd0
stack backtrace:
[<c010398a>] show_trace_log_lvl+0x1a/0x30
[<c01046a2>] show_trace+0x12/0x20
[<c0104765>] dump_stack+0x15/0x20
[<c0135d4c>] print_circular_bug_tail+0x6c/0x80
[<c0137c85>] __lock_acquire+0xc85/0xfc0
[<c013802d>] lock_acquire+0x6d/0x90
[<c012e6e1>] down_read_nested+0x41/0x60
[<c01ef3d0>] xfs_ilock+0x50/0xd0
[<c020dd45>] xfs_lock_inodes+0x175/0x190
[<c02054af>] xfs_lock_for_rename+0x22f/0x280
[<c02055af>] xfs_rename+0xaf/0x8c0
[<c021bb64>] xfs_vn_rename+0x34/0x80
[<c0170237>] vfs_rename+0x307/0x370
[<c0171e3b>] sys_renameat+0x1bb/0x1f0
[<c0171e98>] sys_rename+0x28/0x30
[<c01029ce>] sysenter_past_esp+0x5f/0x99
=======================
On 6/25/07, Johannes Weiner <[email protected]> wrote:
> [...]
> this is what just hit the ring buffer when I was surfing with elinks on a
> brand-new -rc6.
Johannes:
This is a known bogus warning. You can safely ignore it.
David, Ingo:
[ Ok, so we know that XFS wants to lock inodes in ascending inode number
order and not strictly the parent-first-child-second order that rest of the fs/
code does, so that makes it difficult to teach lockdep about this kind of lock
ordering ... ]
However, this (bogus) warning still causes way too much noise on the lists
(2-3 or more every week?) and most users wouldn't understand how or why
this warning is bogus, so would get unnecessarily disturbed about it.
Could there be a way to whitelist such "known bogus cases" in lockdep
and stop it from complaining?
Satyam
* Satyam Sharma <[email protected]> wrote:
> [ Ok, so we know that XFS wants to lock inodes in ascending inode
> number order and not strictly the parent-first-child-second order that
> rest of the fs/ code does, so that makes it difficult to teach lockdep
> about this kind of lock ordering ... ]
hm, there's a recent API addition to lockdep that should help with this:
have you looked into the patch below, could it be used to solve the XFS
problem? Basically when you are one step deeper into a recursive locking
scenario you can use lock_set_subclass() on the held lock to reset it
one level higher.
this preserves lockdep checking but allows arbitrary deep locking.
Ingo
---------------------->
Subject: [patch] lockdep: lock_set_subclass - reset a held lock's subclass
From: Peter Zijlstra <[email protected]>
this can be used to reset a held lock's subclass, for arbitrary-depth
iterated data structures such as trees or lists which have per-node
locks.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/lockdep.h | 4 ++
kernel/lockdep.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
Index: linux/include/linux/lockdep.h
===================================================================
--- linux.orig/include/linux/lockdep.h
+++ linux/include/linux/lockdep.h
@@ -243,6 +243,9 @@ extern void lock_acquire(struct lockdep_
extern void lock_release(struct lockdep_map *lock, int nested,
unsigned long ip);
+extern void lock_set_subclass(struct lockdep_map *lock, unsigned int subclass,
+ unsigned long ip);
+
# define INIT_LOCKDEP .lockdep_recursion = 0,
#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
@@ -259,6 +262,7 @@ static inline void lockdep_on(void)
# define lock_acquire(l, s, t, r, c, i) do { } while (0)
# define lock_release(l, n, i) do { } while (0)
+# define lock_set_subclass(l, s, i) do { } while (0)
# define lockdep_init() do { } while (0)
# define lockdep_info() do { } while (0)
# define lockdep_init_map(lock, name, key, sub) do { (void)(key); } while (0)
Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -2278,6 +2278,55 @@ static int check_unlock(struct task_stru
return 1;
}
+static int
+__lock_set_subclass(struct lockdep_map *lock,
+ unsigned int subclass, unsigned long ip)
+{
+ struct task_struct *curr = current;
+ struct held_lock *hlock, *prev_hlock;
+ struct lock_class *class;
+ unsigned int depth;
+ int i;
+
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return 0;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ goto found_it;
+ prev_hlock = hlock;
+ }
+ return print_unlock_inbalance_bug(curr, lock, ip);
+
+found_it:
+ class = register_lock_class(lock, subclass, 0);
+ hlock->class = class;
+
+ curr->lockdep_depth = i;
+ curr->curr_chain_key = hlock->prev_chain_key;
+
+ for (; i < depth; i++) {
+ hlock = curr->held_locks + i;
+ if (!__lock_acquire(hlock->instance,
+ hlock->class->subclass, hlock->trylock,
+ hlock->read, hlock->check, hlock->hardirqs_off,
+ hlock->acquire_ip))
+ return 0;
+ }
+
+ if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth))
+ return 0;
+ return 1;
+}
+
/*
* Remove the lock to the list of currently held locks in a
* potentially non-nested (out of order) manner. This is a
@@ -2473,6 +2522,26 @@ void lock_release(struct lockdep_map *lo
EXPORT_SYMBOL_GPL(lock_release);
+void
+lock_set_subclass(struct lockdep_map *lock,
+ unsigned int subclass, unsigned long ip)
+{
+ unsigned long flags;
+
+ if (unlikely(current->lockdep_recursion))
+ return;
+
+ raw_local_irq_save(flags);
+ current->lockdep_recursion = 1;
+ check_flags(flags);
+ if (__lock_set_subclass(lock, subclass, ip))
+ check_chain_key(current);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+
+EXPORT_SYMBOL_GPL(lock_set_subclass);
+
/*
* Used by the testsuite, sanitize the validator state
* after a simulated failure:
On Mon, Jun 25, 2007 at 11:01:11PM +0200, Ingo Molnar wrote:
>
> * Satyam Sharma <[email protected]> wrote:
>
> > [ Ok, so we know that XFS wants to lock inodes in ascending inode
> > number order and not strictly the parent-first-child-second order that
> > rest of the fs/ code does, so that makes it difficult to teach lockdep
> > about this kind of lock ordering ... ]
It does both - parent-first/child-second and ascending inode # order,
which is where the problem is. standing alone, these seem fine, but
they don't appear to work when the child has a lower inode number
than the parent.
> hm, there's a recent API addition to lockdep that should help with this:
> have you looked into the patch below, could it be used to solve the XFS
> problem? Basically when you are one step deeper into a recursive locking
> scenario you can use lock_set_subclass() on the held lock to reset it
> one level higher.
Peter Zijlstra already pointed me at that patch, Ingo ;). I'm looking
into it at the moment. Not being a lockdep expert, it's taking me a
little time to understand exactly what is needed here.
At first I wasn't sure that this would work, but now I've seen the
patch that used it, I suspect that it can be used. That patch
(http://lkml.org/lkml/2007/1/28/88) has three cases enumerated
(prev,cur,next) to match the three node types in the list and that
the "next" got set back to "cur" via lock_set_subclass() when
walking the list so that lockdep always sees cur -> next
transistions when walking the list. Have I read this correctly?
Reading this, it seems to me that the xfs_lock_inodes() code needs
to set the lock subclass of the first inode to "parent" (and lock it
as a parent) and then as it walks across the remining inodes it sets
the previous inode to a parent and locks the current inode as a
child.
It seems that I'd also need to make sure that this is done on the
normal parent/child lock ordering as well.
Does this make any sense? lockdep is not something I've paid much
attention to up to this point (someone else did the current notation
and now on leave for a couple more months), so I'm trying to pick up
the pieces as quickly as possible...
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
On 26-06-2007 04:16, David Chinner wrote:
> On Mon, Jun 25, 2007 at 11:01:11PM +0200, Ingo Molnar wrote:
>> * Satyam Sharma <[email protected]> wrote:
>>
>>> [ Ok, so we know that XFS wants to lock inodes in ascending inode
>>> number order and not strictly the parent-first-child-second order that
>>> rest of the fs/ code does, so that makes it difficult to teach lockdep
>>> about this kind of lock ordering ... ]
>
> It does both - parent-first/child-second and ascending inode # order,
> which is where the problem is. standing alone, these seem fine, but
> they don't appear to work when the child has a lower inode number
> than the parent.
...
>From xfs_inode.h:
/*
* Flags for lockdep annotations.
*
* XFS_I[O]LOCK_PARENT - for operations that require locking two inodes
* (ie directory operations that require locking a directory inode and
* an entry inode). The first inode gets locked with this flag so it
* gets a lockdep subclass of 1 and the second lock will have a lockdep
* subclass of 0.
*
* XFS_I[O]LOCK_INUMORDER - for locking several inodes at the some time
* with xfs_lock_inodes(). This flag is used as the starting subclass
* and each subsequent lock acquired will increment the subclass by one.
* So the first lock acquired will have a lockdep subclass of 2, the
* second lock will have a lockdep subclass of 3, and so on.
*/
I don't know xfs code, and probably miss something, but it seems
there could be some inconsistency: lockdep warning shows mr_lock/1
taken both before and after mr_lock (i.e. /0). According to the
above comment there should be always 1 before 0...
Cheers,
Jarek P.
On Tue, Jun 26, 2007 at 11:35:20AM +0200, Jarek Poplawski wrote:
> On 26-06-2007 04:16, David Chinner wrote:
> > It does both - parent-first/child-second and ascending inode # order,
> > which is where the problem is. standing alone, these seem fine, but
> > they don't appear to work when the child has a lower inode number
> > than the parent.
> ...
>
> >From xfs_inode.h:
>
> /*
> * Flags for lockdep annotations.
> *
> * XFS_I[O]LOCK_PARENT - for operations that require locking two inodes
> * (ie directory operations that require locking a directory inode and
> * an entry inode). The first inode gets locked with this flag so it
> * gets a lockdep subclass of 1 and the second lock will have a lockdep
> * subclass of 0.
> *
> * XFS_I[O]LOCK_INUMORDER - for locking several inodes at the some time
> * with xfs_lock_inodes(). This flag is used as the starting subclass
> * and each subsequent lock acquired will increment the subclass by one.
> * So the first lock acquired will have a lockdep subclass of 2, the
> * second lock will have a lockdep subclass of 3, and so on.
> */
>
> I don't know xfs code, and probably miss something, but it seems
> there could be some inconsistency: lockdep warning shows mr_lock/1
> taken both before and after mr_lock (i.e. /0). According to the
> above comment there should be always 1 before 0...
That just fired some rusty neurons.
#define XFS_IOLOCK_SHIFT 16
#define XFS_IOLOCK_PARENT (1 << XFS_IOLOCK_SHIFT)
#define XFS_IOLOCK_INUMORDER (2 << XFS_IOLOCK_SHIFT)
#define XFS_ILOCK_SHIFT 24
#define XFS_ILOCK_PARENT (1 << XFS_ILOCK_SHIFT)
#define XFS_ILOCK_INUMORDER (2 << XFS_ILOCK_SHIFT)
So, in a lock_mode parameter, the upper 8 bits are for the ILOCK lockdep
subclass, and the 16..23 bits are for the IOLOCK lockdep subclass.
Where do we add them?
static inline int
xfs_lock_inumorder(int lock_mode, int subclass)
{
if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
lock_mode |= (subclass + XFS_IOLOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
lock_mode |= (subclass + XFS_ILOCK_INUMORDER) << XFS_ILOCK_SHIFT;
return lock_mode;
}
OH, look at those nice overflow bugs in that in that code. We shift
the XFS_IOLOCK_INUMORDER and XFS_ILOCK_INUMORDER bits out the far
side of the lock_mode variable result in lock subclasses of 0-3 instead
of 2-5....
Bugger, eh?
Patch below should fix this (untested).
Jarek - thanks for pointing what I should have seen earlier.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_inode.h | 15 +++++++++------
fs/xfs/xfs_vnodeops.c | 4 ++--
2 files changed, 11 insertions(+), 8 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c 2007-06-25 13:56:20.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-06-26 22:46:55.412060598 +1000
@@ -2256,9 +2256,9 @@ static inline int
xfs_lock_inumorder(int lock_mode, int subclass)
{
if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- lock_mode |= (subclass + XFS_IOLOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
- lock_mode |= (subclass + XFS_ILOCK_INUMORDER) << XFS_ILOCK_SHIFT;
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
return lock_mode;
}
Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.h 2007-06-20 17:59:35.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_inode.h 2007-06-26 22:46:50.104749916 +1000
@@ -386,19 +386,22 @@ xfs_iflags_test(xfs_inode_t *ip, unsigne
* gets a lockdep subclass of 1 and the second lock will have a lockdep
* subclass of 0.
*
- * XFS_I[O]LOCK_INUMORDER - for locking several inodes at the some time
+ * XFS_LOCK_INUMORDER - for locking several inodes at the some time
* with xfs_lock_inodes(). This flag is used as the starting subclass
* and each subsequent lock acquired will increment the subclass by one.
* So the first lock acquired will have a lockdep subclass of 2, the
- * second lock will have a lockdep subclass of 3, and so on.
+ * second lock will have a lockdep subclass of 3, and so on. It is
+ * the responsibility of the class builder to shift this to the correct
+ * portion of the lock_mode lockdep mask.
*/
+#define XFS_LOCK_PARENT 1
+#define XFS_LOCK_INUMORDER 2
+
#define XFS_IOLOCK_SHIFT 16
-#define XFS_IOLOCK_PARENT (1 << XFS_IOLOCK_SHIFT)
-#define XFS_IOLOCK_INUMORDER (2 << XFS_IOLOCK_SHIFT)
+#define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT)
#define XFS_ILOCK_SHIFT 24
-#define XFS_ILOCK_PARENT (1 << XFS_ILOCK_SHIFT)
-#define XFS_ILOCK_INUMORDER (2 << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT)
#define XFS_IOLOCK_DEP_MASK 0x00ff0000
#define XFS_ILOCK_DEP_MASK 0xff000000
Patch looks good, Dave.
(though, I stuffed up reviewing that bit of code previously:-)
Oh, previous typo: s/inodes at the some time/inodes at the same time/
--Tim
David Chinner wrote:
> On Tue, Jun 26, 2007 at 11:35:20AM +0200, Jarek Poplawski wrote:
>> On 26-06-2007 04:16, David Chinner wrote:
>>> It does both - parent-first/child-second and ascending inode # order,
>>> which is where the problem is. standing alone, these seem fine, but
>>> they don't appear to work when the child has a lower inode number
>>> than the parent.
>> ...
>>
>> >From xfs_inode.h:
>>
>> /*
>> * Flags for lockdep annotations.
>> *
>> * XFS_I[O]LOCK_PARENT - for operations that require locking two inodes
>> * (ie directory operations that require locking a directory inode and
>> * an entry inode). The first inode gets locked with this flag so it
>> * gets a lockdep subclass of 1 and the second lock will have a lockdep
>> * subclass of 0.
>> *
>> * XFS_I[O]LOCK_INUMORDER - for locking several inodes at the some time
>> * with xfs_lock_inodes(). This flag is used as the starting subclass
>> * and each subsequent lock acquired will increment the subclass by one.
>> * So the first lock acquired will have a lockdep subclass of 2, the
>> * second lock will have a lockdep subclass of 3, and so on.
>> */
>>
>> I don't know xfs code, and probably miss something, but it seems
>> there could be some inconsistency: lockdep warning shows mr_lock/1
>> taken both before and after mr_lock (i.e. /0). According to the
>> above comment there should be always 1 before 0...
>
> That just fired some rusty neurons.
>
> #define XFS_IOLOCK_SHIFT 16
> #define XFS_IOLOCK_PARENT (1 << XFS_IOLOCK_SHIFT)
> #define XFS_IOLOCK_INUMORDER (2 << XFS_IOLOCK_SHIFT)
>
> #define XFS_ILOCK_SHIFT 24
> #define XFS_ILOCK_PARENT (1 << XFS_ILOCK_SHIFT)
> #define XFS_ILOCK_INUMORDER (2 << XFS_ILOCK_SHIFT)
>
> So, in a lock_mode parameter, the upper 8 bits are for the ILOCK lockdep
> subclass, and the 16..23 bits are for the IOLOCK lockdep subclass.
>
> Where do we add them?
>
> static inline int
> xfs_lock_inumorder(int lock_mode, int subclass)
> {
> if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
> lock_mode |= (subclass + XFS_IOLOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
> if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
> lock_mode |= (subclass + XFS_ILOCK_INUMORDER) << XFS_ILOCK_SHIFT;
>
> return lock_mode;
> }
>
>
> OH, look at those nice overflow bugs in that in that code. We shift
> the XFS_IOLOCK_INUMORDER and XFS_ILOCK_INUMORDER bits out the far
> side of the lock_mode variable result in lock subclasses of 0-3 instead
> of 2-5....
>
> Bugger, eh?
>
> Patch below should fix this (untested).
>
> Jarek - thanks for pointing what I should have seen earlier.
>
> Cheers,
>
> Dave.
> Patch below should fix this (untested).
Just tested 2.6.22-rc6: message is gone when patch is applied. But
deleting some directories in /var/tmp (which lives on xfs) I got:
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Thomas
--
keep mailinglists in english, feel free to send PM in german