LinuxLists.cc - [RFC][PATCH 7/7] lockdep: spin_lock_nest

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

> NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.

A) It is probably a good idea to put this in a comment somewhere near
where spin_lock_nest_lock() is declared.

B) It is probably a good idea to write that comment in such a way that
dumb people like me understand what the limit is. The sentence I
quoted above is too telegraphic for me to get. Is the point that no
more than 48 spinlocks can be held at once, even if the inner locks
are protected by some top level lock? Or do you mean something else?

- R.

2008-08-04 14:20:00

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 07:07 -0700, Roland Dreier wrote:
> > NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
>
> A) It is probably a good idea to put this in a comment somewhere near
> where spin_lock_nest_lock() is declared.

Its not particular to this annotation - its true for anything lockdep.

> B) It is probably a good idea to write that comment in such a way that
> dumb people like me understand what the limit is. The sentence I
> quoted above is too telegraphic for me to get. Is the point that no
> more than 48 spinlocks can be held at once, even if the inner locks
> are protected by some top level lock? Or do you mean something else?

No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything
covered by lockdep) held by any one code-path; including nested
interrupt contexts.

2008-08-04 14:26:37

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

> No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything
> covered by lockdep) held by any one code-path; including nested
> interrupt contexts.

Does that mean that something like the new mm_take_all_locks() operation
is going to explode if someone tries to use it with lockdep on?

- R.

2008-08-04 14:32:53

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 07:26 -0700, Roland Dreier wrote:
> > No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything
> > covered by lockdep) held by any one code-path; including nested
> > interrupt contexts.
>
> Does that mean that something like the new mm_take_all_locks() operation
> is going to explode if someone tries to use it with lockdep on?

Gah - yes, clearly nobody tried this.. :-/

Just looking at the code it will not only run into this limit, but it
would warn about recursion on the second file/anon vma due to utter lack
of annotation.

Why are people still developing without lockdep?

/me sad

2008-08-04 14:55:24

by Dave Jones

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, Aug 04, 2008 at 04:32:12PM +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-04 at 07:26 -0700, Roland Dreier wrote:
> > > No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything
> > > covered by lockdep) held by any one code-path; including nested
> > > interrupt contexts.
> >
> > Does that mean that something like the new mm_take_all_locks() operation
> > is going to explode if someone tries to use it with lockdep on?
>
> Gah - yes, clearly nobody tried this.. :-/
>
> Just looking at the code it will not only run into this limit, but it
> would warn about recursion on the second file/anon vma due to utter lack
> of annotation.
>
> Why are people still developing without lockdep?

More puzzling, is why hasn't this triggered in the Fedora rawhide kernels,
which do have lockdep enabled.

Dave

--
http://www.codemonkey.org.uk

2008-08-04 14:56:18

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 10:53 -0400, Dave Jones wrote:
> On Mon, Aug 04, 2008 at 04:32:12PM +0200, Peter Zijlstra wrote:
> > On Mon, 2008-08-04 at 07:26 -0700, Roland Dreier wrote:
> > > > No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything
> > > > covered by lockdep) held by any one code-path; including nested
> > > > interrupt contexts.
> > >
> > > Does that mean that something like the new mm_take_all_locks() operation
> > > is going to explode if someone tries to use it with lockdep on?
> >
> > Gah - yes, clearly nobody tried this.. :-/
> >
> > Just looking at the code it will not only run into this limit, but it
> > would warn about recursion on the second file/anon vma due to utter lack
> > of annotation.
> >
> > Why are people still developing without lockdep?
>
> More puzzling, is why hasn't this triggered in the Fedora rawhide kernels,
> which do have lockdep enabled.

My guess is that the kvm thing attaches before there are any vma, and
leaves after all the vma are gone. So it would never actually trigger.

2008-08-04 16:27:18

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, Aug 04, 2008 at 04:56:03PM +0200, Peter Zijlstra wrote:
> My guess is that the kvm thing attaches before there are any vma, and
> leaves after all the vma are gone. So it would never actually trigger.

Yes, lockdep seems to be fine with kvm in kernel mainline in my
current and past testing. Andrew asked me to check this long ago.

vmx ~ # zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y

CPA self-test:
4k 26112 large 1997 gb 0 x 2652[ffff880000000000-ffff8800bffff000]
miss 262144
4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000]
miss 262144
4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000]
miss 262144
ok.
loaded kvm module ()
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
kvm: emulating exchange as write
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0

I can't see lockdep errors in dmesg starting one more multiple VM in a
loop (all run on a quadcore).

GRU is likely the same.

The only real place where lockdep is unusable in my experience is
preempt-RT, it grinds it to an halt during boot on 8-way before
reaching the shell.

2008-08-04 16:39:24

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 18:26 +0200, Andrea Arcangeli wrote:
> On Mon, Aug 04, 2008 at 04:56:03PM +0200, Peter Zijlstra wrote:
> > My guess is that the kvm thing attaches before there are any vma, and
> > leaves after all the vma are gone. So it would never actually trigger.
>
> Yes, lockdep seems to be fine with kvm in kernel mainline in my
> current and past testing. Andrew asked me to check this long ago.
>
> vmx ~ # zgrep LOCKDEP /proc/config.gz
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_LOCKDEP=y

You also need CONFIG_PROVE_LOCKING, but I'll assume that's set too.

> CPA self-test:
> 4k 26112 large 1997 gb 0 x 2652[ffff880000000000-ffff8800bffff000]
> miss 262144
> 4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000]
> miss 262144
> 4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000]
> miss 262144
> ok.
> loaded kvm module ()
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> Ignoring de-assert INIT to vcpu 0
> kvm: emulating exchange as write
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> Ignoring de-assert INIT to vcpu 0
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> Ignoring de-assert INIT to vcpu 0
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> apic write: bad size=1 fee00030
> Ignoring de-assert INIT to vcpu 0
> Ignoring de-assert INIT to vcpu 0
> Ignoring de-assert INIT to vcpu 0
>
> I can't see lockdep errors in dmesg starting one more multiple VM in a
> loop (all run on a quadcore).

Dave Jones just handed me:

https://bugzilla.redhat.com/show_bug.cgi?id=457779

> GRU is likely the same.
>
> The only real place where lockdep is unusable in my experience is
> preempt-RT, it grinds it to an halt during boot on 8-way before
> reaching the shell.

David Miller just did a patch that might fix that.

2008-08-04 17:27:42

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, Aug 04, 2008 at 06:38:55PM +0200, Peter Zijlstra wrote:
> You also need CONFIG_PROVE_LOCKING, but I'll assume that's set too.

Sorry, that wasn't the problem, but my current testing passed because
of another error... to test this report I rebuilt a kvm configured for
rhel not for mainline so it wasn't the right test...

When "long ago" I tested that this was working fine (actually when
Andrew asked me), I guess lockdep was working because the
implementation with the vmalloc array was slightly different,
otherwise I don't know. I'm fairly certain that it worked fine at some
point, and I didn't expect the refactoring to generate false positives.

> Dave Jones just handed me:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=457779

I can reproduce this now yes after a 'make sync'.

Obviously this is a bug in lockdep that it trips over this otherwise
if lockdep was right the kernel should deadlock while this is just a
false positive and everything runs fine.

I assume it can't understand the spinlock address is different (I
think it uses the address as key only for static locks), so I wonder
if you could call print_deadlock_bug() from the failure path of the
spinlock to solve this?

Do things like double_rq_lock works just because rq1 and rq2 don't
have the same name like in my case where all locks are called "mapping"->?

> David Miller just did a patch that might fix that.

Woow cool, after 11 months I lost any hope that lockdep could ever
work in that environment... Was it an actual bug or is this some way
to lower the complexity of the graph build?

2008-08-04 17:47:23

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

From: Andrea Arcangeli <[email protected]>

Lockdep can't recognize if spinlocks are at a different address. So
trylock avoids lockdep to generate false positives. After lockdep will
be fixed this change can and should be reverted.

Signed-off-by: Andrea Arcangeli <[email protected]>
---

On Mon, Aug 04, 2008 at 07:27:28PM +0200, Andrea Arcangeli wrote:
> I can reproduce this now yes after a 'make sync'.

(btw it's not like I forgot to sync, but I sync against the wrong
source tree previously because it was in the bash history, it's a bit
complicate to explain)

> I assume it can't understand the spinlock address is different (I
> think it uses the address as key only for static locks), so I wonder
> if you could call print_deadlock_bug() from the failure path of the
> spinlock to solve this?

In the meantime (as I doubt lockdep will get fixed any time soon) this
will workaround it.

diff -r 3469dce61df1 mm/mmap.c
--- a/mm/mmap.c Tue Jul 29 20:01:28 2008 +0200
+++ b/mm/mmap.c Mon Aug 04 19:41:53 2008 +0200
@@ -2279,8 +2279,13 @@ static void vm_lock_anon_vma(struct anon
/*
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
+ *
+ * spin_lock would confuse lockdep who can't
+ * differentiate between the 'mapping' always changing
+ * address.
*/
- spin_lock(&anon_vma->lock);
+ while (!spin_trylock(&anon_vma->lock))
+ cpu_relax();
/*
* We can safely modify head.next after taking the
* anon_vma->lock. If some other vma in this mm shares
@@ -2310,7 +2315,13 @@ static void vm_lock_mapping(struct addre
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- spin_lock(&mapping->i_mmap_lock);
+ /*
+ * spin_lock would confuse lockdep who can't
+ * differentiate between the 'mapping' always changing
+ * address.
+ */
+ while (!spin_trylock(&mapping->i_mmap_lock))
+ cpu_relax();
}
}

2008-08-04 17:57:43

[permalink] [raw]

Subject: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

From: Andrea Arcangeli <[email protected]>

Lockdep can't recognize if spinlocks are at a different address. So
using trylock in a loop is one way to avoid lockdep to generate false
positives. After lockdep will be fixed this change can and should be
reverted.

Signed-off-by: Andrea Arcangeli <[email protected]>
---

Resubmit because I didn't update the subject and so I improved the
comments too. This is mostly to show it's a bug in lockdep and it can
be trivially worked around without having to fix lockdep for real, any
superior solution to this hack is more than welcome and recommended.

diff -r 3469dce61df1 mm/mmap.c
--- a/mm/mmap.c Tue Jul 29 20:01:28 2008 +0200
+++ b/mm/mmap.c Mon Aug 04 19:54:27 2008 +0200
@@ -2279,8 +2279,12 @@ static void vm_lock_anon_vma(struct anon
/*
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
+ *
+ * spin_lock would confuse lockdep who doesn't notice
+ * the 'anon_vma' always changing address.
*/
- spin_lock(&anon_vma->lock);
+ while (!spin_trylock(&anon_vma->lock))
+ cpu_relax();
/*
* We can safely modify head.next after taking the
* anon_vma->lock. If some other vma in this mm shares
@@ -2310,7 +2314,12 @@ static void vm_lock_mapping(struct addre
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- spin_lock(&mapping->i_mmap_lock);
+ /*
+ * spin_lock would confuse lockdep who doesn't notice
+ * the 'mapping' always changing address.
+ */
+ while (!spin_trylock(&mapping->i_mmap_lock))
+ cpu_relax();
}
}

2008-08-04 18:06:36

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Peter Zijlstra wrote:
> Expose the new lock protection lock.
>
> This can be used to annotate places where we take multiple locks of the
> same class and avoid deadlocks by always taking another (top-level) lock
> first.
>

OK, so the expected usage is:

spin_lock(&outer_lock);
/* take in any order */
spin_lock_nest_lock(&inner[0], &outer_lock);
spin_lock_nest_lock(&inner[2], &outer_lock);
spin_lock_nest_lock(&inner[1], &outer_lock);
...

?

And it's OK to

1. take inner locks one at a time without holding the outer lock
2. use plain spin_lock on inner locks when you're taking them one at
a time, and
3. release the outer lock before releasing the inner locks

but it's not OK to try to use different outer locks for a given inner lock.

Yes?

J

> NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> include/linux/lockdep.h | 2 ++
> include/linux/spinlock.h | 6 ++++++
> kernel/spinlock.c | 11 +++++++++++
> 3 files changed, 19 insertions(+)
>
> Index: linux-2.6/include/linux/lockdep.h
> ===================================================================
> --- linux-2.6.orig/include/linux/lockdep.h
> +++ linux-2.6/include/linux/lockdep.h
> @@ -410,8 +410,10 @@ static inline void print_irqtrace_events
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> # ifdef CONFIG_PROVE_LOCKING
> # define spin_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, NULL, i)
> +# define spin_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 2, n, i)
> # else
> # define spin_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 1, NULL, i)
> +# define spin_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 1, NULL, i)
> # endif
> # define spin_release(l, n, i) lock_release(l, n, i)
> #else
> Index: linux-2.6/include/linux/spinlock.h
> ===================================================================
> --- linux-2.6.orig/include/linux/spinlock.h
> +++ linux-2.6/include/linux/spinlock.h
> @@ -183,8 +183,14 @@ do { \
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> # define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
> +# define spin_lock_nest_lock(lock, nest_lock) \
> + do { \
> + typecheck(struct lockdep_map *, &nest_lock->dep_map); \
> + _spin_lock_nest_lock(lock, &nest_lock->dep_map); \
> + } while (0)
> #else
> # define spin_lock_nested(lock, subclass) _spin_lock(lock)
> +# define spin_lock_nest_lock(lock, nest_lock) _spin_lock(lock)
> #endif
>
> #define write_lock(lock) _write_lock(lock)
> Index: linux-2.6/kernel/spinlock.c
> ===================================================================
> --- linux-2.6.orig/kernel/spinlock.c
> +++ linux-2.6/kernel/spinlock.c
> @@ -292,6 +292,7 @@ void __lockfunc _spin_lock_nested(spinlo
> }
>
> EXPORT_SYMBOL(_spin_lock_nested);
> +
> unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> {
> unsigned long flags;
> @@ -314,6 +315,16 @@ unsigned long __lockfunc _spin_lock_irqs
>
> EXPORT_SYMBOL(_spin_lock_irqsave_nested);
>
> +void __lockfunc _spin_lock_nest_lock(spinlock_t *lock,
> + struct lockdep_map *nest_lock)
> +{
> + preempt_disable();
> + spin_acquire_nest(&lock->dep_map, 0, 0, nest_lock, _RET_IP_);
> + LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
> +}
> +
> +EXPORT_SYMBOL(_spin_lock_nest_lock)
> +
> #endif
>
> void __lockfunc _spin_unlock(spinlock_t *lock)
>
>

2008-08-04 18:49:34

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 19:27 +0200, Andrea Arcangeli wrote:
> On Mon, Aug 04, 2008 at 06:38:55PM +0200, Peter Zijlstra wrote:
> > You also need CONFIG_PROVE_LOCKING, but I'll assume that's set too.
>
> Sorry, that wasn't the problem, but my current testing passed because
> of another error... to test this report I rebuilt a kvm configured for
> rhel not for mainline so it wasn't the right test...
>
> When "long ago" I tested that this was working fine (actually when
> Andrew asked me), I guess lockdep was working because the
> implementation with the vmalloc array was slightly different,
> otherwise I don't know. I'm fairly certain that it worked fine at some
> point, and I didn't expect the refactoring to generate false positives.
>
> > Dave Jones just handed me:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=457779
>
> I can reproduce this now yes after a 'make sync'.
>
> Obviously this is a bug in lockdep that it trips over this otherwise
> if lockdep was right the kernel should deadlock while this is just a
> false positive and everything runs fine.
>
> I assume it can't understand the spinlock address is different (I
> think it uses the address as key only for static locks), so I wonder
> if you could call print_deadlock_bug() from the failure path of the
> spinlock to solve this?
>
> Do things like double_rq_lock works just because rq1 and rq2 don't
> have the same name like in my case where all locks are called "mapping"->?

Lockdep doesn't look at individual lock instances, but instead works on
lock classes. The rq locks used to have a separate class each, but the
third patch in this series changes that.

The advantage of not working on lock instances is that the dependency
graph becomes much smaller, and it will report problems much earlier,
the down-side is that you will have to augment the information a little.

Normally locks are grouped in classes based on their init site; eg.
every instance initialized by a particular spin_lock_init() will belong
to the same class.

Besides this grouping, you can use things like spin_lock_nested(), which
can be used to annotate simple recursion (up to 8 sub-classes).

Then there is the possibility to explicitly create classes and assign
locks to classes using lockdep_set_class().

Either way - there is a limit of MAX_LOCK_DEPTH (48) locks that can be
tracked per task (including nesting interrupts).

Making that dynamic is rather challenging - as it would involve
allocating memory while potentially holding allocator locks.

> > David Miller just did a patch that might fix that.
>
> Woow cool, after 11 months I lost any hope that lockdep could ever
> work in that environment... Was it an actual bug or is this some way
> to lower the complexity of the graph build?

He reduced the search complexity of the graph, look at the first patch
in this series.

2008-08-04 18:49:54

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 2008-08-04 at 19:57 +0200, Andrea Arcangeli wrote:
> From: Andrea Arcangeli <[email protected]>
>
> Lockdep can't recognize if spinlocks are at a different address. So
> using trylock in a loop is one way to avoid lockdep to generate false
> positives. After lockdep will be fixed this change can and should be
> reverted.
>
> Signed-off-by: Andrea Arcangeli <[email protected]>

NAK, come-on, you didn't even bother to look at the available
annotations..

> ---
>
> Resubmit because I didn't update the subject and so I improved the
> comments too. This is mostly to show it's a bug in lockdep and it can
> be trivially worked around without having to fix lockdep for real, any
> superior solution to this hack is more than welcome and recommended.
>
> diff -r 3469dce61df1 mm/mmap.c
> --- a/mm/mmap.c Tue Jul 29 20:01:28 2008 +0200
> +++ b/mm/mmap.c Mon Aug 04 19:54:27 2008 +0200
> @@ -2279,8 +2279,12 @@ static void vm_lock_anon_vma(struct anon
> /*
> * The LSB of head.next can't change from under us
> * because we hold the mm_all_locks_mutex.
> + *
> + * spin_lock would confuse lockdep who doesn't notice
> + * the 'anon_vma' always changing address.
> */
> - spin_lock(&anon_vma->lock);
> + while (!spin_trylock(&anon_vma->lock))
> + cpu_relax();
> /*
> * We can safely modify head.next after taking the
> * anon_vma->lock. If some other vma in this mm shares
> @@ -2310,7 +2314,12 @@ static void vm_lock_mapping(struct addre
> */
> if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
> BUG();
> - spin_lock(&mapping->i_mmap_lock);
> + /*
> + * spin_lock would confuse lockdep who doesn't notice
> + * the 'mapping' always changing address.
> + */
> + while (!spin_trylock(&mapping->i_mmap_lock))
> + cpu_relax();
> }
> }
>

2008-08-04 18:54:55

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 11:06 -0700, Jeremy Fitzhardinge wrote:
> Peter Zijlstra wrote:
> > Expose the new lock protection lock.
> >
> > This can be used to annotate places where we take multiple locks of the
> > same class and avoid deadlocks by always taking another (top-level) lock
> > first.
> >
>
> OK, so the expected usage is:
>
> spin_lock(&outer_lock);
> /* take in any order */
> spin_lock_nest_lock(&inner[0], &outer_lock);
> spin_lock_nest_lock(&inner[2], &outer_lock);
> spin_lock_nest_lock(&inner[1], &outer_lock);
> ...
>
> ?

Yes (there it no requirement that the outer lock is a spinlock_t, just
that it has a ->dep_map member - so: mutex, rwsem and spinlock will do).

> And it's OK to
>
> 1. take inner locks one at a time without holding the outer lock

Yes

> 2. use plain spin_lock on inner locks when you're taking them one at
> a time, and

Yes

> 3. release the outer lock before releasing the inner locks

Only if you then release the inner locks in the reverse order you took
them - the nested release code (releasing a lock that is not on the top
of the stack) basically pops and pushes all the locks, the push will
fail if the outer lock is released.

> but it's not OK to try to use different outer locks for a given inner lock.

It doesn't validate this part - as with most lockdep annotations you can
annotate away real deadlocks.

2008-08-04 18:56:56

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

> NAK, come-on, you didn't even bother to look at the available
> annotations..

Agree that this trylock-in-a-loop is a horrible hack that should never
be merged.

However I wonder what you think the right way to annotate the
potentially unbounded number of locks taken in mm_take_all_locks() is?

- R.

2008-08-04 19:06:08

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 2008-08-04 at 11:56 -0700, Roland Dreier wrote:
> > NAK, come-on, you didn't even bother to look at the available
> > annotations..
>
> Agree that this trylock-in-a-loop is a horrible hack that should never
> be merged.
>
> However I wonder what you think the right way to annotate the
> potentially unbounded number of locks taken in mm_take_all_locks() is?

Good question, one which I don't currently have an answer for - see my
other mail why getting rid of the MAX_LOCK_DEPTH limitation is hard.

I'll give it another serious go tomorrow (if nothing else preempts me).

2008-08-04 19:27:00

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Peter Zijlstra wrote:
>> 3. release the outer lock before releasing the inner locks
>>
>
> Only if you then release the inner locks in the reverse order you took
> them - the nested release code (releasing a lock that is not on the top
> of the stack) basically pops and pushes all the locks, the push will
> fail if the outer lock is released.
>

OK. I don't actually need to do this, but I was asking for
completeness. But to clarify, you only need to do the reverse unlock if
you do it after unlocking the outer lock? If you're still holding the
outer lock, you can unlock in any order?

>> but it's not OK to try to use different outer locks for a given inner lock.
>>
>
> It doesn't validate this part - as with most lockdep annotations you can
> annotate away real deadlocks.
>

Hm, OK.

J

2008-08-04 19:32:23

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 4 Aug 2008, Jeremy Fitzhardinge wrote:
>
> OK. I don't actually need to do this, but I was asking for completeness. But
> to clarify, you only need to do the reverse unlock if you do it after
> unlocking the outer lock? If you're still holding the outer lock, you can
> unlock in any order?

Release order should always be totally irrelevant, whether you hold outer
locks or not. Only the order of _getting_ locks matter.

And yes, if there is an outer lock, even the order of getting locks is
irrelevant, as long as anybody who gets more than one inner lock always
holds the outer one.

Linus

2008-08-04 19:39:17

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, 2008-08-04 at 12:31 -0700, Linus Torvalds wrote:
>
> On Mon, 4 Aug 2008, Jeremy Fitzhardinge wrote:
> >
> > OK. I don't actually need to do this, but I was asking for completeness. But
> > to clarify, you only need to do the reverse unlock if you do it after
> > unlocking the outer lock? If you're still holding the outer lock, you can
> > unlock in any order?
>
> Release order should always be totally irrelevant, whether you hold outer
> locks or not. Only the order of _getting_ locks matter.
>
> And yes, if there is an outer lock, even the order of getting locks is
> irrelevant, as long as anybody who gets more than one inner lock always
> holds the outer one.

I agree, its just non-trivial to convince lockdep of this. I can give it
a go though.

2008-08-04 20:16:19

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 08:48:59PM +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-04 at 19:57 +0200, Andrea Arcangeli wrote:
> > From: Andrea Arcangeli <[email protected]>
> >
> > Lockdep can't recognize if spinlocks are at a different address. So
> > using trylock in a loop is one way to avoid lockdep to generate false
> > positives. After lockdep will be fixed this change can and should be
> > reverted.
> >
> > Signed-off-by: Andrea Arcangeli <[email protected]>
>
> NAK, come-on, you didn't even bother to look at the available
> annotations..

Let's say when I hear prove-locking my instinct tells me to walk away
as fast as I can. I'll try to explain why I prefer the trylock loop
(which btw is the only one that will hide the lockdep false positives
here and I welcome you to fix it in another way, well another way that
comes to mind is to call __raw_spin_lock which I didn't do because I
don't like those lowlevel details in common code).

So about prove-locking:

1) in production is disabled so when I get bugreports I've to grab
locking deadlock information as usual (sysrq+t/p or preferably
lkcd/kdump)

2) while coding it's useless as well because I don't need this thing
to debug and fix any deadlocks

3) this only finds bugs after the system is hung and I can fix it by
other means then

Despite having used this for a while, it never happened to me that
this has been useful as single time, I only run into false positives
and systems grinding to an halt so not allowing debug kernels to run
which reduced even more the ability to debug the kernel. I'm not aware
of anybody else who was debugging that fixed bugs thanks to this
code. So in my experience this only has hurt me and I don't want it
ever enabled in anything I deal with.

If only this had a slight chance to find bugs that don't easily
trigger at runtime it'd be an entirely different matter.

To be clear: my criticism is only to prove-locking, for example I find
very useful that there are other things like spinlock debug options
that warns if you call GFP with GFP_KERNEL while you're in atomic
context (which unfortunately only works with preempt enabled for no
good reason because the atomic context is maintained by non-preempt
too but anyway). The spinlock sleep debug effectively find bugs that
would otherwise go unnoticed. Let's say I'm only interested in stuff
that shows bugs that would otherwise go unnoticed, everything else I
can debug it by myself without prove-locking printk in the logs.

So this can only help if you aren't capable of running and decoding
sysrq+p/t. And it can't help in production either because it has to be
disabled there (and production would be the only place where
explaining how to grab a sysrq+p/t is a problem, and normally it's
simpler to grab kdump/lkcd).

Furthermore I don't like to slowdown (up to grinding system to an halt
before these fixes, but still not so fast) for this so unnecessary
feature while debugging. This also explains why after checking that
the patch was ok when Andrew asked it, I disabled prove-locking pretty
quickly.

So perhaps I misunderstand how lockdep works, in which case you're
welcome to correct me, otherwise I'd really like to know if anybody
really felt some bug couldn't be fixed in equal time without it (and I
mean excluding production where obviously this is useless as it can't
be possibly enabled). Understanding what are the implications of the
deadlock is sure slower than comparing the disassembly of the
sysrq+p/t (which 99% of has to be done anyway because the report comes
from a production system, if the bug was reproducible it would have
been tripped before it hit production).

I see it as a didactical theoretical exercise, and I hope this
explains why I prefer the trylock loop any day compared to any more
complex prove-locking specific API polluting the common code. And if
it was for me I'd remove prove-locking from the kernel the moment it
started to pollute the common code for zero debugging advantages.

2008-08-04 20:17:11

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Linus Torvalds wrote:
> On Mon, 4 Aug 2008, Jeremy Fitzhardinge wrote:
>
>> OK. I don't actually need to do this, but I was asking for completeness. But
>> to clarify, you only need to do the reverse unlock if you do it after
>> unlocking the outer lock? If you're still holding the outer lock, you can
>> unlock in any order?
>>
>
> Release order should always be totally irrelevant, whether you hold outer
> locks or not. Only the order of _getting_ locks matter.
>

Right. But Peter said lockdep was fussy about it for some reason. If
the point of the exercise is to eliminate spurious warnings, it would be
nice to avoid new ones in the process...

J

2008-08-04 20:37:32

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 2008-08-04 at 22:15 +0200, Andrea Arcangeli wrote:
> On Mon, Aug 04, 2008 at 08:48:59PM +0200, Peter Zijlstra wrote:
> > On Mon, 2008-08-04 at 19:57 +0200, Andrea Arcangeli wrote:
> > > From: Andrea Arcangeli <[email protected]>
> > >
> > > Lockdep can't recognize if spinlocks are at a different address. So
> > > using trylock in a loop is one way to avoid lockdep to generate false
> > > positives. After lockdep will be fixed this change can and should be
> > > reverted.
> > >
> > > Signed-off-by: Andrea Arcangeli <[email protected]>
> >
> > NAK, come-on, you didn't even bother to look at the available
> > annotations..
>
> Let's say when I hear prove-locking my instinct tells me to walk away
> as fast as I can. I'll try to explain why I prefer the trylock loop
> (which btw is the only one that will hide the lockdep false positives
> here and I welcome you to fix it in another way, well another way that
> comes to mind is to call __raw_spin_lock which I didn't do because I
> don't like those lowlevel details in common code).
>
> So about prove-locking:
>
> 1) in production is disabled so when I get bugreports I've to grab
> locking deadlock information as usual (sysrq+t/p or preferably
> lkcd/kdump)
>
> 2) while coding it's useless as well because I don't need this thing
> to debug and fix any deadlocks
>
> 3) this only finds bugs after the system is hung and I can fix it by
> other means then

You're so wrong it not even funny. It reports about deadlocks before
they happen. All it needs is to observe a lock order violation and it
will report it. In order for the dead-lock to happen, you need to
actually hit the violation concurrently with the normal order.

IOW, lockdep can even spot deadlocks on a non-preempt single cpu setup
where they can never actually happen.

Furthermore, it does more than the simple lock deadlocks, it also takes
IRQ state into account. So it can tell about hard or soft irq recursion
deadlocks.

Having lockdep on while developing saves a lot of trouble - in fact it
_has_ caught many real bugs before they could be introduced to mainline,
ask Arjan who has supervised driver development.

Not only that, it caught plenty of real bugs in mainline as well as -rt.
These days it appears to not catch many because the tree is in such good
shape, but that is fully thanks to lockdep.

That is not to say it's perferct - lockdep certainly does have it
limitations. But your portrail is very in-accurate.

2008-08-04 21:10:15

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 10:37:22PM +0200, Peter Zijlstra wrote:
> You're so wrong it not even funny. It reports about deadlocks before
> they happen. All it needs is to observe a lock order violation and it

Now tell me how it helps to report them... It tells me the system has
crashed and where, it's not like I couldn't figure it out by myself
but just noticing nothing works and all cpus are spinning in some
spinlock slow path and pressing sysrq+t/p.

> will report it. In order for the dead-lock to happen, you need to
> actually hit the violation concurrently with the normal order.

The point is that this is a runtime evaluation of lock orders, if
runtime isn't the lucky one that reproduces the deadlock, it'll find
nothing at all.

> IOW, lockdep can even spot deadlocks on a non-preempt single cpu setup
> where they can never actually happen.

Ask the preempt-RT folks, advertising infinite-cpu-smp on UP. No need
of lockdep to reproduce deadlocks that couldn't be found without it in
UP. Infact lockdep only avoids to optimize away the spinlock, you
don't need preempt-RT to reproduce at zero-runtime-cost in a equally
easily debuggable way whatever the lockdep can reproduced on a UP
compile. You've only to run a SMP kernel on UP, big deal that lockdep
is solving to prevent you to run a smp kernel in up...

> Furthermore, it does more than the simple lock deadlocks, it also takes
> IRQ state into account. So it can tell about hard or soft irq recursion
> deadlocks.

Ok this gets just a bit more interesting but my understanding is that
again this only "reports" stuff that crashes hard and is trivially
debuggable by other means.

> Having lockdep on while developing saves a lot of trouble - in fact it
> _has_ caught many real bugs before they could be introduced to mainline,
> ask Arjan who has supervised driver development.

Are you saying those bugs weren't trivially debuggable anyway by
kernel-hackers with a simple sysrq+p/t or kgdb stack trace?

If yes, I can only imagine that nobody takes care of setting up
debugging options that actually are _zero_ runtime cost and don't
pollute the kernel common code.

> Not only that, it caught plenty of real bugs in mainline as well as -rt.
> These days it appears to not catch many because the tree is in such good
> shape, but that is fully thanks to lockdep.
>
> That is not to say it's perferct - lockdep certainly does have it
> limitations. But your portrail is very in-accurate.

I've an hard time to believe you, the only thing I can agree is that
if you don't want to run smp kernel on UP (or even better preempt-RT
on UP) you can run lockdep to find recursion.

smp-preempt-RT on UP is a real feature that is useful debugging,
because it makes bugs visible that weren't, lockdep is useless as far
as I can tell.

2008-08-04 21:14:59

by Pekka Enberg

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

Hi Andrea,

On Mon, Aug 04, 2008 at 10:37:22PM +0200, Peter Zijlstra wrote:
>> You're so wrong it not even funny. It reports about deadlocks before
>> they happen. All it needs is to observe a lock order violation and it

On Tue, Aug 5, 2008 at 12:09 AM, Andrea Arcangeli <[email protected]> wrote:
> Now tell me how it helps to report them... It tells me the system has
> crashed and where, it's not like I couldn't figure it out by myself
> but just noticing nothing works and all cpus are spinning in some
> spinlock slow path and pressing sysrq+t/p.

Sometimes it's hard to actually trigger a deadlock condition during
development, especially if you're not developing on a big iron
machine.

2008-08-04 21:29:56

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 4 Aug 2008 23:09:54 +0200
Andrea Arcangeli <[email protected]> wrote:

> On Mon, Aug 04, 2008 at 10:37:22PM +0200, Peter Zijlstra wrote:
> > You're so wrong it not even funny. It reports about deadlocks before
> > they happen. All it needs is to observe a lock order violation and
> > it
>
> Now tell me how it helps to report them... It tells me the system has
> crashed and where,

I think you totally misunderstand things then.

Lockdep will report a problem if it *ever* sees a BA order after it has
seen a BA order. They don't have to happen at the same time. Or even
within hours of eachother.

They MIGHT happen... in a narrow time window, when you have a deadlock.
But lockdep will warn you about the order violation without actually
having to dealock... because the AB is likely to be done already most
of the time when the BA happens.

2008-08-04 21:32:22

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Tue, Aug 05, 2008 at 12:14:48AM +0300, Pekka Enberg wrote:
> Sometimes it's hard to actually trigger a deadlock condition during
> development, especially if you're not developing on a big iron
> machine.

My understand is that lockdep can't do static analysis of code, it's
not a checker. It can only evaluate the locks in the order the cpu
takes them. And if the CPU never takes them in the wrong order because
only your unlikely to happen slow path takes them in the wrong order
while the other cpus hold them in the opposite order, lockdep can't
say nothing at all, how could it? If it would say something when the
cpu doesn't deadlock a moment later, then it would be a false positive
99% of the time.

So now we learn lockdep also lead developers to think if their code
doesn't generate lockdep errors it will be safe in production, so
again more damage thanks to lockdep.

Lockdep is a printk that reports that you were incredibly lucky, that
you got the lock inversion during development even if there was only
once chance in a thousand, and that you can celebrate. But I know I
can figure it out by myself when the system has crashed without
something eating all cpu and polluting kernel common code to tell me.

In other words I imagine lockdep as something like this:

char *ptr = NULL
BUG_ON(!ptr); /* this is lockdep */
*ptr = 100;

Luckily nobody dreams to add BUG_ON that checks for null pointer to
dereference, as there's the mmu for that.

The same way for the null pointer dereference, there are zerocost
means to debug all sort of kernel crashing deadlocks.

Now perhaps I misunderstood lockdep entirely but I doubt as this thing
is driven by calls made by the spin_lock functions, and it can't know
anything about the spin_lock that haven't been run by the cpu yet.

2008-08-04 21:33:22

by David Miller

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

From: Peter Zijlstra <[email protected]>
Date: Mon, 04 Aug 2008 18:38:55 +0200

> On Mon, 2008-08-04 at 18:26 +0200, Andrea Arcangeli wrote:
> > The only real place where lockdep is unusable in my experience is
> > preempt-RT, it grinds it to an halt during boot on 8-way before
> > reaching the shell.
>
> David Miller just did a patch that might fix that.

Peter, your patches are just as effective at making all of my 64-cpu,
128-cpu, etc. machines usable with lockdep.

2008-08-04 21:42:38

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 4 Aug 2008 23:30:18 +0200
Andrea Arcangeli <[email protected]> wrote:

> Now perhaps I misunderstood lockdep entirely

You did ;)

At runtime lockdep will "learn" the lock ordering rules, building a
graph in memory. If it ever sees the thus-learnt rules violated, it
will warn.

Simple example:

static DEFINE_MUTEX(a);
static DEFINE_MUTEX(b);

void f1(void)
{
mutex_lock(a);
mutex_lock(b);
}

void f2(void)
{
mutex_lock(b);
mutex_lock(a);
}

void doit(void)
{
f1();
f2();
}

lockdep will warn here about the ranking violation. As soon as it
occurs. Even though the system never deadlocked.

It's very very clever and powerful.

2008-08-04 21:44:57

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, 4 Aug 2008 23:30:18 +0200
Andrea Arcangeli <[email protected]> wrote:

> On Tue, Aug 05, 2008 at 12:14:48AM +0300, Pekka Enberg wrote:
> > Sometimes it's hard to actually trigger a deadlock condition during
> > development, especially if you're not developing on a big iron
> > machine.
>
> My understand is that lockdep can't do static analysis of code, it's
> not a checker. It can only evaluate the locks in the order the cpu
> takes them. And if the CPU never takes them in the wrong order because
> only your unlikely to happen slow path takes them in the wrong order

yes lockdep will only complain WHEN you take them in the wrong order

But you claimed you would for sure be in a deadlock at that point which
is generally not correct.

> So now we learn lockdep also lead developers to think if their code
> doesn't generate lockdep errors it will be safe in production, so
> again more damage thanks to lockdep.

this comment totally puzzles me... rather than calling you naive...
where was this said or even implied????

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-08-04 21:56:38

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 02:27:52PM -0700, Arjan van de Ven wrote:
> I think you totally misunderstand things then.
>
> Lockdep will report a problem if it *ever* sees a BA order after it has
> seen a BA order. They don't have to happen at the same time. Or even
> within hours of eachother.

For me the false positive was generated by the recursive spinlock
feature, not the deadlock inversion feature. If you limit my comments
to the recursive spinlock feature perhaps you'll be more likely to
agree with them.

> They MIGHT happen... in a narrow time window, when you have a deadlock.
> But lockdep will warn you about the order violation without actually
> having to dealock... because the AB is likely to be done already most
> of the time when the BA happens.

So this is about the lock inversion feature, I admit I didn't realize
it has memory that locks have been taken in AB order and spawns the
first time locks have been taken in BA order. But this will lead to
more false positives because there's nothing wrong to do AB BA if
there's C lock taken before them!

Perhaps enforcing lockdep all over the kernel is worth it just for
this AB BA thing in case it doesn't only spawn false positives, but
still I can't like something that is as inaccurate and prone for
errors as lockdep and spreading all over the place to try to reduce
the false positives it emits. I can't like that but that's just
me... others clearly love it.

2008-08-04 21:57:48

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

From: Andrea Arcangeli <[email protected]>
Date: Mon, 4 Aug 2008 23:09:54 +0200

> On Mon, Aug 04, 2008 at 10:37:22PM +0200, Peter Zijlstra wrote:
> > You're so wrong it not even funny. It reports about deadlocks before
> > they happen. All it needs is to observe a lock order violation and it
>
> Now tell me how it helps to report them... It tells me the system has
> crashed and where, it's not like I couldn't figure it out by myself
> but just noticing nothing works and all cpus are spinning in some
> spinlock slow path and pressing sysrq+t/p.

It tells you about deadlock scenerios that you haven't even encoutered
yet. It shows bugs for parallel code path sequences that have not
even occurred yet.

But I think you're beyond the point of being able to see why lockdep
is valuable. So why don't we just move on.

2008-08-04 22:13:28

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 02:41:14PM -0700, Andrew Morton wrote:
> lockdep will warn here about the ranking violation. As soon as it
> occurs. Even though the system never deadlocked.

Yes I wasn't aware of the AB BA feature that doesn't require the
inversion to happen on both cpus at the same time, despite having
spent weeks when it was kernel crashing systems at boot (which shows
it's not immediately easy to understand how that thing works by
reading the code).

static DEFINE_MUTEX(a);
static DEFINE_MUTEX(b);
static DEFINE_MUTEX(c);

void f1(void)
{
mutex_lock(c);
mutex_lock(a);
mutex_lock(b);
mutex_unlock(c);
mutex_unlock(a);
mutex_unlock(b);
}

void f2(void)
{
mutex_lock(c);
mutex_lock(b);
mutex_lock(a);
mutex_unlock(c);
mutex_unlock(b);
mutex_unlock(a);
}

void f3(void)
{
mutex_lock(a);
mutex_lock(b);
mutex_unlock(a);
mutex_unlock(b);
}

void f4(void)
{
mutex_lock(b);
mutex_lock(a);
mutex_unlock(b);
mutex_unlock(a);
}

void doit(void)
{
f1();
f2();
stop_machine(f3);
stop_machine(f4);
}

> It's very very clever and powerful.

I'm curious, does it also handle the above? I mean not a big deal to
refactor the code to shutdown lockdep warnings, but it doesn't look
that clever to me. It seems just clever the minimum enough to be
useful with its AB BA memory.

Now I see lockdep has a good point to exist, for the lock inversion
"memory" but this is by no mean a feature I can remotely like in
general given how inaccurate and prone for false positives it is. It
looks more a necessary evil to deal with, like I tried to do with my
patch (even before understanding it was more useful than what I
thought initially).

2008-08-04 22:33:28

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 02:42:28PM -0700, Arjan van de Ven wrote:
> yes lockdep will only complain WHEN you take them in the wrong order
>
> But you claimed you would for sure be in a deadlock at that point which
> is generally not correct.

I already said I didn't know about that despite having spent a fair
amount of time trying to understand why lockdep crashes systems at
boot about an year ago. I admit I didn't understand much about it and
reducing its computation time didn't look feasible, perhaps my fault,
and I'm glad if Peter found a way to make it boot after 1 year.

> this comment totally puzzles me... rather than calling you naive...
> where was this said or even implied????

It was just a logical conclusion of the statement that lockdep will
warn of the following classes of locking bugs: "lock inversion
scenarios"... There's no warning anywhere that it might not find them
at all. Or point me to where it is warned you still have to read the
code and verify it yourself, or you still risk to AB BA in
production. If it was warned I wouldn't have mentioned it, people seem
to talk like if lockdep is a checker doing static analyses of all
paths when it can't.

Partly of what I said before is true, even if I didn't understand the
actual details of the AB BA memory it has by reading the code: the BA
may happen only when system is OOM etc... so even lockdep memory may
never find it. So my warning that code might AB BA deadlock even if
lockdep doesn't warn sounds fair enough and without it I'm still
afraid it can lead to developers think everything is ok with regard to
AB BA. In any case (even if everyone already understand lockdep better
than I did before this discussion), even if I'm wrong a sign of
warning that lockdep isn't enough, can't hurt.

2008-08-04 23:39:32

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Tue, 5 Aug 2008 00:30:11 +0200
Andrea Arcangeli <[email protected]> wrote:

> On Mon, Aug 04, 2008 at 02:42:28PM -0700, Arjan van de Ven wrote:
> > yes lockdep will only complain WHEN you take them in the wrong order
> >
> > But you claimed you would for sure be in a deadlock at that point
> > which is generally not correct.
>
> I already said I didn't know about that despite having spent a fair
> amount of time trying to understand why lockdep crashes systems at
> boot about an year ago. I admit I didn't understand much about it and
> reducing its computation time didn't look feasible, perhaps my fault,
> and I'm glad if Peter found a way to make it boot after 1 year.

interesting; lockdep has been working for the last.. 2 1/2 years at
least, and I don't remember seeing bugreports against it from you that
would describe it as totally non-functional.

Oh well.. seems you're rather preoccupied about it; that's ok, you're
entitled to your opinion even if I don't agree with it ;-)

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-08-05 00:47:40

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 04:38:16PM -0700, Arjan van de Ven wrote:
> interesting; lockdep has been working for the last.. 2 1/2 years at
> least, and I don't remember seeing bugreports against it from you that
> would describe it as totally non-functional.

I reported it to Peter. If you see David's email, I guess it can be
implied that I wasn't the only one aware that prove-locking made
certain systems non functional, I thought it was widespread knowledge
maybe not.

It's amazing that things seem to have improved on that side, it surely
gives me more confidence in prove-locking!

> Oh well.. seems you're rather preoccupied about it; that's ok, you're
> entitled to your opinion even if I don't agree with it ;-)

So let me understand better: your opinion is that all of lockdep is
useful, not just the AB BA detection?

By reading the source again after 11 months to me it still looks
check_deadlock() only has knowledge of the current context. It loops
over the task struct checking all the locks of the current task!
Combine the great feature that check_deadlock provides, with crashing
at boot, and I hope that better explains my feeling about
lockdep-prove-locking.

This check_deadlock() thing is the real core of my dislike of
prove-locking! The check_noncircular part I totally agree it's useful
now that I see it works differently than check_deadlock (when I read
it last time I thought it worked the same as check_deadlock).

check_noncircular being useful doesn't automatically make
check_deadlock useful.

And incidentally it's exactly this check_deadlock part that is
trapping on my code and that is now requiring silly changes to the
common code (the ones I did) or to make the common code even more
complex (what Peter is planning I guess).

2008-08-05 02:05:50

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

> The point is that this is a runtime evaluation of lock orders, if
> runtime isn't the lucky one that reproduces the deadlock, it'll find
> nothing at all.

I think the point you miss is that lockdep can report a potential
deadlock, even if the deadlock does not actually occur. For example
suppose there is an AB-BA deadlock somewhere. For this to actually
trigger, we have to have one CPU running the AB code path at exactly the
moment another CPU runs the BA code path, with the right timing so one
CPU holds A and tries to grab B while the other CPU already holds B.

With lockdep, we just have to have the AB code path run once at any
point, and then the BA code path run at any later time (even days after
the AB code path has released all the locks). And then we get a
warning dump that explains the exact potential deadlock.

- R.

2008-08-05 02:18:46

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Mon, Aug 04, 2008 at 07:00:03PM -0700, Roland Dreier wrote:
> > The point is that this is a runtime evaluation of lock orders, if
> > runtime isn't the lucky one that reproduces the deadlock, it'll find
> > nothing at all.
>
> I think the point you miss is that lockdep can report a potential
> deadlock, even if the deadlock does not actually occur. For example
> suppose there is an AB-BA deadlock somewhere. For this to actually
> trigger, we have to have one CPU running the AB code path at exactly the
> moment another CPU runs the BA code path, with the right timing so one
> CPU holds A and tries to grab B while the other CPU already holds B.
>
> With lockdep, we just have to have the AB code path run once at any
> point, and then the BA code path run at any later time (even days after
> the AB code path has released all the locks). And then we get a
> warning dump that explains the exact potential deadlock.

Thanks a lot for the detailed explanation of check_noncircular. I
agree check_noncircular is surely a good argument not to get rid of
prove-locking as a whole. But check_noncircular is also a red-herring
in this context. It's not check_noncircular trapping here,
check_deadlock traps with false positives instead. The question is
what are those false positives buying us? To avoid a developer to
press sysrq+p or break on kgdb?

Let's focus on check_deadlock->print_deadlock_bug and somebody who's
not beyond the point please explain what print_deadlock_bug reports
that does not actually occur and why it's a good idea to change the
common code to accommodate for its false positives instead of getting
rid of it for good.

2008-08-05 12:02:25

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

> Let's focus on check_deadlock->print_deadlock_bug and somebody who's
> not beyond the point please explain what print_deadlock_bug reports
> that does not actually occur and why it's a good idea to change the
> common code to accommodate for its false positives instead of getting
> rid of it for good.

check_deadlock operates on classes of locks, so it can warn about
potential deadlocks, eg if we have

foo(obj1, obj2)
{
lock(obj1);
lock(obj2);
...

then foo(obj, obj); is a deadlock but lockdep can warn about foo(obj,
different_obj) without triggering the deadlock in reality. Of course
this leads to false positives, and we sometimes have to change correct
code to help lockdep, but usually such rewriting leads to simpler
clearer better locking anyway.

- R.

2008-08-05 12:20:43

[permalink] [raw]

Subject: Re: [PATCH] workaround minor lockdep bug triggered by mm_take_all_locks

On Tue, Aug 05, 2008 at 05:02:07AM -0700, Roland Dreier wrote:
> check_deadlock operates on classes of locks, so it can warn about
> potential deadlocks, eg if we have
>
> foo(obj1, obj2)
> {
> lock(obj1);
> lock(obj2);
> ...
>
> then foo(obj, obj); is a deadlock but lockdep can warn about foo(obj,
> different_obj) without triggering the deadlock in reality. Of course
> this leads to false positives, and we sometimes have to change correct
> code to help lockdep, but usually such rewriting leads to simpler
> clearer better locking anyway.

It surely doesn't lead to simpler and clearer better locking in the
case we're discussing here, and I don't know of other cases where it
leads to better locking but feel free to make examples.

2008-08-07 12:34:55

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Please fold (or tell me to provide a new patch)

Signed-off-by: Peter Zijlstra <[email protected]>
---
diff -u linux-2.6/include/linux/spinlock.h linux-2.6/include/linux/spinlock.h
--- linux-2.6/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -185,8 +185,8 @@
# define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
# define spin_lock_nest_lock(lock, nest_lock) \
do { \
- typecheck(struct lockdep_map *, &nest_lock->dep_map); \
- _spin_lock_nest_lock(lock, &nest_lock->dep_map); \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map);\
+ _spin_lock_nest_lock(lock, &(nest_lock)->dep_map); \
} while (0)
#else
# define spin_lock_nested(lock, subclass) _spin_lock(lock)
diff -u linux-2.6/kernel/spinlock.c linux-2.6/kernel/spinlock.c
--- linux-2.6/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -323,7 +323,7 @@
LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
}

-EXPORT_SYMBOL(_spin_lock_nest_lock)
+EXPORT_SYMBOL(_spin_lock_nest_lock);

#endif

--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -22,6 +22,8 @@ int in_lock_functions(unsigned long addr
void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass)
__acquires(lock);
+void __lockfunc _spin_lock_nest_lock(spinlock_t *lock, struct lockdep_map *map)
+ __acquires(lock);
void __lockfunc _read_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _write_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _spin_lock_bh(spinlock_t *lock) __acquires(lock);

2008-10-08 15:27:20

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Mon, Aug 04, 2008 at 12:31:22PM -0700, Linus Torvalds wrote:
>
>
> On Mon, 4 Aug 2008, Jeremy Fitzhardinge wrote:
> >
> > OK. I don't actually need to do this, but I was asking for completeness. But
> > to clarify, you only need to do the reverse unlock if you do it after
> > unlocking the outer lock? If you're still holding the outer lock, you can
> > unlock in any order?
>
> Release order should always be totally irrelevant, whether you hold outer
> locks or not. Only the order of _getting_ locks matter.

Technically, you are 100% correct.

>
> And yes, if there is an outer lock, even the order of getting locks is
> irrelevant, as long as anybody who gets more than one inner lock always
> holds the outer one.

But I need to disagree on a programming practice style. Unlocking locks
in a non nested order is just bad programming practice. Unless there is
a good reason to do so, one should never release locks in a non reverse
order they were taken.

This can be a source of bugs, where people might notice an outer lock
being released and think the inner locks were too.

Lately the kernel has been going through a lot of clean ups that have
been making the kernel a much more maintainable beast. I feel we should
enforce the rule of unlocking order (again, unless there is a good
reason not to). Not for a technical reason, but just for a more
maintainable one.

-- Steve

2008-10-08 15:45:28

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Wed, 8 Oct 2008, Steven Rostedt wrote:
> >
> > And yes, if there is an outer lock, even the order of getting locks is
> > irrelevant, as long as anybody who gets more than one inner lock always
> > holds the outer one.
>
> But I need to disagree on a programming practice style. Unlocking locks
> in a non nested order is just bad programming practice.

No it is not.

Unlocking locks in non-nested order can sometimes be very much the rigth
thing to do, and thinking otherwise is (a) naive and (b) can generate
totally unnecessary and pointless bugs.

The thing is, sometimes you have to do it, and imposing totally made-up
rules ("unlocks have to nest") just confuses everybody.

The FACT is, that unlocks do not have to nest cleanly. That's a rock solid
*FACT*. The locking order matters, and the unlocking order does not.

And if you cannot accept that as a fact, and you then say "unlock order
should matter just to keep things nice and clean", then you end up being
screwed and/or confused when you can't hold to the unlock order.

There are many perfectly valid reasons not to unlock in reverse order.
Don't create make-believe rules that break those reasons for no gain.

For example:
- let's say that you have a singly-linked list of objects.
- you need to lock all objects, do something, and then unlock all
objects.
- the *only* sane way to do that is to just traverse the list twice.
- that means that you will unlock the objects in the same order you
locked them, _not_ in reverse ("nested") order.
- if you force a rule of "unlocks must be nested", then

YOU ARE A F*CKING MORON.

It's that simple. Don't do made-up rules that have no technical reason for
them.

Lock ordering matters. Unlock ordering does not. It really is that simple.
Don't confuse the issue by claiming anything else.

Linus

2008-10-08 15:52:35

by Nick Piggin

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Thursday 09 October 2008 02:27, Steven Rostedt wrote:
> On Mon, Aug 04, 2008 at 12:31:22PM -0700, Linus Torvalds wrote:
> > On Mon, 4 Aug 2008, Jeremy Fitzhardinge wrote:
> > > OK. I don't actually need to do this, but I was asking for
> > > completeness. But to clarify, you only need to do the reverse unlock
> > > if you do it after unlocking the outer lock? If you're still holding
> > > the outer lock, you can unlock in any order?
> >
> > Release order should always be totally irrelevant, whether you hold outer
> > locks or not. Only the order of _getting_ locks matter.
>
> Technically, you are 100% correct.
>
> > And yes, if there is an outer lock, even the order of getting locks is
> > irrelevant, as long as anybody who gets more than one inner lock always
> > holds the outer one.
>
> But I need to disagree on a programming practice style. Unlocking locks
> in a non nested order is just bad programming practice. Unless there is
> a good reason to do so, one should never release locks in a non reverse
> order they were taken.

An outer one might be more likely to be contended, so you might want
to release it asap.

Other times, you have lock A and lock B held (like scheduler rqs).
You can say unlock(A); unlock(B);
or if (A < B) unlock(B); unlock(A); if (A > B) unlock (B);

> This can be a source of bugs, where people might notice an outer lock
> being released and think the inner locks were too.
>
> Lately the kernel has been going through a lot of clean ups that have
> been making the kernel a much more maintainable beast. I feel we should
> enforce the rule of unlocking order (again, unless there is a good
> reason not to). Not for a technical reason, but just for a more
> maintainable one.

I don't really think it would make things more maintainable, FWIW.

2008-10-08 16:04:15

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Linus Torvalds wrote:
> On Wed, 8 Oct 2008, Steven Rostedt wrote:
>
>>> And yes, if there is an outer lock, even the order of getting locks is
>>> irrelevant, as long as anybody who gets more than one inner lock always
>>> holds the outer one.
>>>
>> But I need to disagree on a programming practice style. Unlocking locks
>> in a non nested order is just bad programming practice.
>>
>
> No it is not.
>
> Unlocking locks in non-nested order can sometimes be very much the rigth
> thing to do, and thinking otherwise is (a) naive and (b) can generate
> totally unnecessary and pointless bugs.
>
> The thing is, sometimes you have to do it, and imposing totally made-up
> rules ("unlocks have to nest") just confuses everybody.
>
> The FACT is, that unlocks do not have to nest cleanly. That's a rock solid
> *FACT*. The locking order matters, and the unlocking order does not.
>
> And if you cannot accept that as a fact, and you then say "unlock order
> should matter just to keep things nice and clean", then you end up being
> screwed and/or confused when you can't hold to the unlock order.
>
> There are many perfectly valid reasons not to unlock in reverse order.
> Don't create make-believe rules that break those reasons for no gain.
>

Unfortunately, you cut out my comment that I stated "unless there is a
good reason not to",
which the below example is a good reason ;-)
> For example:
> - let's say that you have a singly-linked list of objects.
> - you need to lock all objects, do something, and then unlock all
> objects.
> - the *only* sane way to do that is to just traverse the list twice.
> - that means that you will unlock the objects in the same order you
> locked them, _not_ in reverse ("nested") order.
> - if you force a rule of "unlocks must be nested", then
>
> YOU ARE A F*CKING MORON.
>
> It's that simple. Don't do made-up rules that have no technical reason for
> them.
>
> Lock ordering matters. Unlock ordering does not. It really is that simple.
> Don't confuse the issue by claiming anything else.
>

Yes, I totally agree that there are good reasons not to unlock in
reverse order, and the example
you gave happens to be one of them.

I just find that seeing something like:

lock(A);
lock(B);

[do something]

unlock(A);
unlock(B);

just seems to be sloppy.

I wont harp on this, it only came up in conversation in which someone
pointed out your
post.

-- Steve

2008-10-08 16:20:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

On Wed, 8 Oct 2008, Steven Rostedt wrote:
>
> Unfortunately, you cut out my comment that I stated "unless there is a good
> reason not to", which the below example is a good reason ;-)

Well, teh thing is, as long as you think nesting is good, you end up being
very confused when it isn't.

> I just find that seeing something like:
>
> lock(A);
> lock(B);
>
> [do something]
>
> unlock(A);
> unlock(B);
>
> just seems to be sloppy.

Of course you'll often get nesting almost by mistake.

For example, any normal locking that is hierarchical tends to nest
naturally, especially if you end up using lots of small functions (which
you should). Or simply due to error handling.

So in that sense, nesting may be a "natural" thing, but partly exactly
_because_ it is fairly natural, we should not try to make it even more
than that, because then when things don't nest (and it does happen), if
the "things should nest" camp gets too strong, bugs happen.

They happen because somebody looks at the non-nesting case, and thinks
it's sloppy, and tries to fix it. And in the process just complicates
matters - and very likely introduces bugs.

And THAT is why people shouldn't think that locks "should" nest. Because
they shouldn't. They often do, but we're better off making people very
aware of the fact that 'often' isn't 'good design', it's just a matter of
'it happens in practice' rather than 'it should be that way'.

I know for a fact that some people thought unlocking in non-nested order
was a bug. And I believe that belief is a dangerous one.

Linus

2008-10-08 16:55:10

[permalink] [raw]

Subject: Re: [RFC][PATCH 7/7] lockdep: spin_lock_nest_lock()

Linus Torvalds wrote:
>
>
> I know for a fact that some people thought unlocking in non-nested order
> was a bug. And I believe that belief is a dangerous one.
>

Ah, OK. You are fighting against nesting nazis, fair enough.

I have written a bit of code where nesting was not possible (similar to
your example, but I call those traversal locking not nesting). I just
find that
the locks should be nested when the nesting is natural. Breaking the nesting
on natural nesting locks is a bug, IMHO. But as you know, there are several
programmers out there that can not determine the difference between natural
nesting locks and non nesting locks.

By adding such a rule, those that can not tell the difference will be
making a
lot of needless noise, hence, it is best not to make any such rule.

Lesson learned. I'll now go back to debugging my code.

-- Steve

2008-10-08 17:19:32