2007-06-28 09:38:45

by Johannes Berg

Subject: [RFC/PATCH] debug workqueue deadlocks with lockdep

This patch adds a fake lock to each workqueue in order to debug things
where you have something like

my_function() -> lock(); ...; flush_workqueue(); ...
vs
run_workqueue() -> my_work() -> ...; lock(); ...

which can obviously deadlock if my_work() is already scheduled when
my_function() is invoked (but has not taken the lock yet).

Signed-off-by: Johannes Berg <[email protected]>
---
I'm not sure whether this is correct on SMP: might lockdep then complain
that the fake lock is taken at the same time from multiple CPUs? On my
quad G5, however, I haven't had it print out anything but the trace
below, which is legitimate.

This serves as a sort of regression test for an actual deadlock in
mac80211 that I found by running into it; apparently it's really hard to
trigger. I then wanted to see whether I could add lockdep support to
debug cases like it.
Here's the output with my patch:

[ 171.522236] =======================================================
[ 171.522248] [ INFO: possible circular locking dependency detected ]
[ 171.522257] 2.6.22-rc6 #163
[ 171.522266] -------------------------------------------------------
[ 171.522275] khubd/1922 is trying to acquire lock:
[ 171.522283] (khelper){--..}, at: [<c000000000060c8c>] .flush_workqueue+0x48/0xf4
[ 171.522318]
[ 171.522320] but task is already holding lock:
[ 171.522330] (rtnl_mutex){--..}, at: [<c0000000003a4784>] .mutex_lock+0x3c/0x58
[ 171.522366]
[ 171.522368] which lock already depends on the new lock.
[ 171.522371]
[ 171.522383]
[ 171.522386] the existing dependency chain (in reverse order) is:
[ 171.522396]
[ 171.522398] -> #1 (rtnl_mutex){--..}:
[ 171.522424] [<c000000000072434>] .__lock_acquire+0xb8c/0xd68
[ 171.522476] [<c0000000000726b0>] .lock_acquire+0xa0/0xe8
[ 171.522527] [<c0000000003a44b0>] .__mutex_lock_slowpath+0x138/0x3d0
[ 171.522579] [<c0000000003a4784>] .mutex_lock+0x3c/0x58
[ 171.522629] [<c000000000327208>] .rtnl_lock+0x24/0x40
[ 171.522677] [<c000000000328fb4>] .linkwatch_event+0x20/0x70
[ 171.522725] [<c00000000005fe38>] .run_workqueue+0x114/0x22c
[ 171.522774] [<c0000000000612f8>] .worker_thread+0x11c/0x140
[ 171.522822] [<c000000000066230>] .kthread+0x84/0xd4
[ 171.522872] [<c0000000000235ec>] .kernel_thread+0x4c/0x68
[ 171.522922]
[ 171.522924] -> #0 (khelper){--..}:
[ 171.522947] [<c000000000072324>] .__lock_acquire+0xa7c/0xd68
[ 171.523000] [<c0000000000726b0>] .lock_acquire+0xa0/0xe8
[ 171.523051] [<c000000000060cbc>] .flush_workqueue+0x78/0xf4
[ 171.523102] [<d000000000498374>] .ieee80211_stop+0x1fc/0x3a4 [mac80211]
[ 171.523176] [<c00000000031b8bc>] .dev_close+0xb8/0xfc
[ 171.523222] [<c00000000031b9c0>] .unregister_netdevice+0xc0/0x254
[ 171.523275] [<d0000000004a828c>] .__ieee80211_if_del+0x34/0x50 [mac80211]
[ 171.523352] [<d000000000497654>] .ieee80211_unregister_hw+0xf8/0x2d8 [mac80211]
[ 171.523421] [<d00000000045cfa8>] .disconnect+0x3c/0x98 [zd1211rw_mac80211]
[ 171.523483] [<d0000000000ff650>] .usb_unbind_interface+0x6c/0xcc [usbcore]
[ 171.523555] [<c00000000027bcdc>] .__device_release_driver+0xcc/0x110
[ 171.523605] [<c00000000027c430>] .device_release_driver+0x70/0xbc
[ 171.523655] [<c00000000027b270>] .bus_remove_device+0xa0/0xcc
[ 171.523708] [<c0000000002782f0>] .device_del+0x2f4/0x3d4
[ 171.523758] [<d0000000000fbd68>] .usb_disable_device+0xa0/0x150 [usbcore]
[ 171.523833] [<d0000000000f6eac>] .usb_disconnect+0xfc/0x1a4 [usbcore]
[ 171.523907] [<d0000000000f77d4>] .hub_thread+0x3d8/0xc70 [usbcore]
[ 171.523984] [<c000000000066230>] .kthread+0x84/0xd4
[ 171.524039] [<c0000000000235ec>] .kernel_thread+0x4c/0x68
[ 171.524093]
[ 171.524096] other info that might help us debug this:
[ 171.524099]
[ 171.524109] 1 lock held by khubd/1922:
[ 171.524117] #0: (rtnl_mutex){--..}, at: [<c0000000003a4784>] .mutex_lock+0x3c/0x58
[ 171.524155]
[ 171.524157] stack backtrace:
[ 171.524165] Call Trace:
[ 171.524174] [c0000000086af130] [c00000000000f624] .show_stack+0x70/0x1bc (unreliable)
[ 171.524203] [c0000000086af1e0] [c00000000000f790] .dump_stack+0x20/0x34
[ 171.524228] [c0000000086af260] [c000000000070310] .print_circular_bug_tail+0x84/0xa8
[ 171.524256] [c0000000086af330] [c000000000072324] .__lock_acquire+0xa7c/0xd68
[ 171.524281] [c0000000086af420] [c0000000000726b0] .lock_acquire+0xa0/0xe8
[ 171.524307] [c0000000086af4e0] [c000000000060cbc] .flush_workqueue+0x78/0xf4
[ 171.524332] [c0000000086af580] [d000000000498374] .ieee80211_stop+0x1fc/0x3a4 [mac80211]
[ 171.524376] [c0000000086af640] [c00000000031b8bc] .dev_close+0xb8/0xfc
[ 171.524401] [c0000000086af6d0] [c00000000031b9c0] .unregister_netdevice+0xc0/0x254
[ 171.524425] [c0000000086af760] [d0000000004a828c] .__ieee80211_if_del+0x34/0x50 [mac80211]
[ 171.524475] [c0000000086af7f0] [d000000000497654] .ieee80211_unregister_hw+0xf8/0x2d8 [mac80211]
[ 171.524522] [c0000000086af8c0] [d00000000045cfa8] .disconnect+0x3c/0x98 [zd1211rw_mac80211]
[ 171.524558] [c0000000086af960] [d0000000000ff650] .usb_unbind_interface+0x6c/0xcc [usbcore]
[ 171.524611] [c0000000086afa00] [c00000000027bcdc] .__device_release_driver+0xcc/0x110
[ 171.524635] [c0000000086afa90] [c00000000027c430] .device_release_driver+0x70/0xbc
[ 171.524662] [c0000000086afb20] [c00000000027b270] .bus_remove_device+0xa0/0xcc
[ 171.524687] [c0000000086afbb0] [c0000000002782f0] .device_del+0x2f4/0x3d4
[ 171.524713] [c0000000086afc50] [d0000000000fbd68] .usb_disable_device+0xa0/0x150 [usbcore]
[ 171.524763] [c0000000086afcf0] [d0000000000f6eac] .usb_disconnect+0xfc/0x1a4 [usbcore]
[ 171.524811] [c0000000086afdb0] [d0000000000f77d4] .hub_thread+0x3d8/0xc70 [usbcore]
[ 171.524860] [c0000000086aff00] [c000000000066230] .kthread+0x84/0xd4
[ 171.524887] [c0000000086aff90] [c0000000000235ec] .kernel_thread+0x4c/0x68

[actually from a previous version of the patch]

In the actual case I experienced, dependency #1 came not from
linkwatch_event() but from something else in mac80211; both take
rtnl_lock(), though.

And finally, the patch:

---
kernel/workqueue.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

--- linux-2.6-git.orig/kernel/workqueue.c 2007-06-27 19:10:35.183199046 +0200
+++ linux-2.6-git/kernel/workqueue.c 2007-06-27 20:30:00.896667168 +0200
@@ -61,8 +61,15 @@ struct workqueue_struct {
const char *name;
int singlethread;
int freezeable; /* Freeze threads during suspend */
+#ifdef CONFIG_LOCKDEP
+ struct lockdep_map lockdep_map;
+#endif
};

+#ifdef CONFIG_LOCKDEP
+static struct lock_class_key wq_lockdep_key;
+#endif
+
/* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
threads to each one as cpus come/go. */
static DEFINE_MUTEX(workqueue_mutex);
@@ -257,7 +264,9 @@ static void run_workqueue(struct cpu_wor

BUG_ON(get_wq_data(work) != cwq);
work_clear_pending(work);
+ lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
f(work);
+ lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);

if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
@@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
} else {
struct wq_barrier barr;

+ lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+ lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
+
active = 0;
spin_lock_irq(&cwq->lock);
if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
@@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
int cpu;

might_sleep();
+ lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+ lock_release(&wq->lockdep_map, 0, _THIS_IP_);
for_each_cpu_mask(cpu, *cpu_map)
flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
}
@@ -700,6 +714,9 @@ struct workqueue_struct *__create_workqu
}

wq->name = name;
+#ifdef CONFIG_LOCKDEP
+ lockdep_init_map(&wq->lockdep_map, name, &wq_lockdep_key, 0);
+#endif
wq->singlethread = singlethread;
wq->freezeable = freezeable;
INIT_LIST_HEAD(&wq->list);




2007-06-28 16:33:40

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Wed, 2007-06-27 at 20:40 +0200, Johannes Berg wrote:

> --- linux-2.6-git.orig/kernel/workqueue.c 2007-06-27 19:10:35.183199046 +0200
> +++ linux-2.6-git/kernel/workqueue.c 2007-06-27 20:30:00.896667168 +0200
> @@ -61,8 +61,15 @@ struct workqueue_struct {
> const char *name;
> int singlethread;
> int freezeable; /* Freeze threads during suspend */
> +#ifdef CONFIG_LOCKDEP
> + struct lockdep_map lockdep_map;
> +#endif
> };
>
> +#ifdef CONFIG_LOCKDEP
> +static struct lock_class_key wq_lockdep_key;
> +#endif

I think I misunderstood how lockdep works, because the output was
actually wrong. Should the key be part of workqueue_struct as well?

johannes


Attachments:
signature.asc (190.00 B)
This is a digitally signed message part

2007-06-30 08:06:26

by Ingo Molnar

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> No, that's not right either, but Arjan just helped me a bit with how
> lockdep works and I think I have the right idea now. Ignore this for
> now, I'll send a new patch in a few days.

ok. But in general, this is a very nice idea!

I've Cc:ed Oleg. Oleg, what do you think? I think we should keep all
the workqueue APIs specified in a form that makes them lockdep-coverable,
as Johannes did. This debug mechanism could have helped with the
recent DVB lockup that Thomas Sattler reported.

Ingo


2007-06-28 17:33:06

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Thu, 2007-06-28 at 18:33 +0200, Johannes Berg wrote:

> I think I misunderstood how lockdep works because the output was
> actually wrong, should the key be part of the workqueue_struct as well?

No, that's not right either, but Arjan just helped me a bit with how
lockdep works and I think I have the right idea now. Ignore this for
now, I'll send a new patch in a few days.

johannes



2007-06-30 11:46:07

by Oleg Nesterov

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On 06/30, Ingo Molnar wrote:
>
> On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> > No, that's not right either, but Arjan just helped me a bit with how
> > lockdep works and I think I have the right idea now. Ignore this for
> > now, I'll send a new patch in a few days.
>
> ok. But in general, this is a very nice idea!
>
> i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all
> the workqueue APIs specified in a form that makes them lockdep coverable
> like Johannes did. This debug mechanism could have helped with the
> recent DVB lockup that Thomas Sattler reported.

I think this idea is great!

Johannes, could you change wait_on_work() as well? Most users of
flush_workqueue() should be converted to use it.

> @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
> } else {
> struct wq_barrier barr;
>
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> +
> active = 0;
> spin_lock_irq(&cwq->lock);

I am not sure why you skip the "if (cwq->thread == current)" case; it
can deadlock in the same way.

But, perhaps we should not change flush_cpu_workqueue(). If we detect the
deadlock, we will have num_online_cpus() reports, yes?

And,

> if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> int cpu;
>
> might_sleep();
> + lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_release(&wq->lockdep_map, 0, _THIS_IP_);

One of the two callers was already modified. Perhaps it is better to add
lock_acquire() to the second caller, cleanup_workqueue_thread(), but
skip flush_cpu_workqueue()?

Oleg.


2007-07-04 13:57:09

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Wed, 2007-07-04 at 16:52 +0400, Oleg Nesterov wrote:

> Yes. And no other work (except a barrier) can run before the caller of
> wait_on_work() is woken.

Alright, thanks. Yes, then what you proposed makes a lot of sense, I'll
implement it.

> Aha, now I see where I was confused. Yes, we can't avoid the false positives
> with flush_workqueue().
>
> I hope this won't be a problem, because almost every usage of flush_workqueue()
> is pointless nowadays. So even if we have a false positive, it probably
> means the code needs cleanups anyway.
>
> But see below,

[...]

> If you are going to do this, may I suggest you to make 2 separate patches?
> Exactly because we can't avoid the false positives with flush_workqueue(),
> it would be nice if we have an option to revert the 2-nd patch if there are
> too many false positives (I hope this won't happen).

I've run this patch on my system for a few days now and have seen
exactly one warning; however, it's *not* actually a *false* positive.
It's a true positive: it really is possible to deadlock if the system is
loaded and more than one work item is stuck on the workqueue for some
reason. Say work A takes L1 and work B runs without locks, and then you
flush the workqueue under L1; you'll get a real deadlock once both A and
B are actually scheduled to run!

johannes



2007-07-04 12:22:11

by Ingo Molnar

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep


* Johannes Berg <[email protected]> wrote:

> > > @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
> > >
> > > BUG_ON(get_wq_data(work) != cwq);
> > > work_clear_pending(work);
> > > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > > f(work);
> > > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > ^^^
> > Isn't it better to call lock_release() with nested == 1 ?
>
> Not sure, Ingo?

well, in this case the lock/unlock should nest perfectly (i.e. it should
always be balanced perfectly), so indeed calling with nested==1 leads to
stricter checking.

non-nested unlocks occur when people do stuff like:

spin_lock(&lock1);
spin_lock(&lock2);
spin_unlock(&lock1);
spin_unlock(&lock2);

the first unlock is not 'nested perfectly'. Now for the workqueue
dep_map this shouldn't be a legal combination, right?

Ingo

2007-07-04 13:59:20

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Wed, 2007-07-04 at 14:21 +0200, Ingo Molnar wrote:

> well, in this case the lock/unlock should nest perfectly (i.e. it should
> always be balanced perfectly), so indeed calling with nested==1 leads to
> stricter checking.
>
> non-nested unlocks occur when people do stuff like:
>
> spin_lock(&lock1);
> spin_lock(&lock2);
> spin_unlock(&lock1);
> spin_unlock(&lock2);
>
> the first unlock is not 'nested perfectly'. Now for the workqueue
> dep_map this shouldn't be a legal combination, right?

I don't think so, will change to use nested==1.

johannes



2007-07-05 08:43:39

by Ingo Molnar

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep


* Oleg Nesterov <[email protected]> wrote:

> > > Isn't it better to call lock_release() with nested == 1 ?
> >
> > Not sure, Ingo?
>
> Ingo, could you also explain the meaning of the "nested" parameter? It
> looks like it is simply unneeded: lock_release_nested() does a quick check
> and uses lock_release_non_nested() when hlock is not on top of the stack.

Hm, I forgot about that. We basically turned off all bad-nesting
warnings in lockdep due to false positives.

The workqueue dependencies should all nest perfectly, so please just use
nested==1, even though it's a dummy right now. We might want to turn
unlock-nest checking back on in the future, or remove that parameter
altogether.

Ingo

2007-07-02 08:37:31

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:
> On 06/30, Ingo Molnar wrote:
> >
> > On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> > > No, that's not right either, but Arjan just helped me a bit with how
> > > lockdep works and I think I have the right idea now. Ignore this for
> > > now, I'll send a new patch in a few days.
> >
> > ok. But in general, this is a very nice idea!
> >
> > i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all
> > the workqueue APIs specified in a form that makes them lockdep coverable
> > like Johannes did. This debug mechanism could have helped with the
> > recent DVB lockup that Thomas Sattler reported.
>
> I think this idea is great!

:)

> Johannes, could you change wait_on_work() as well? Most users of
> flush_workqueue() should be converted to use it.

Sure. The case I hit used flush_workqueue(), but I'll look into
wait_on_work() and maybe convert the place where I had this.

> > @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
> > } else {
> > struct wq_barrier barr;
> >
> > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > +
> > active = 0;
> > spin_lock_irq(&cwq->lock);
>
> I am not sure why you skip "if (cwq->thread == current)" case, it can
> deadlock in the same way.

I wasn't sure what would happen with recursion.

> But, perhaps we should not change flush_cpu_workqueue(). If we detect the
> deadlock, we will have num_online_cpus() reports, yes?

I'm not sure what you're thinking of here. I initially had it in
flush_workqueue() but then moved it into flush_cpu_workqueue(), though I
have to admit I've already forgotten why.

> And,
>
> > if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> > @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> > int cpu;
> >
> > might_sleep();
> > + lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > + lock_release(&wq->lockdep_map, 0, _THIS_IP_);
>
> one of the 2 callers was already modified. Perhaps it is better to add
> lock_acquire() into the second caller, cleanup_workqueue_thread(), but
> skip flush_cpu_workqueue() ?

I'll have to look at the code.

The problem I discussed with Arjan is that with this patch all
workqueues end up in the same lock class, which is wrong. I'll have to
modify the workqueue creation API by introducing some macros for the
lockdep case that pass the current code location in as the workqueue's
class. I'll try to do that later today.

johannes



2007-07-04 12:51:23

by Oleg Nesterov

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On 07/04, Johannes Berg wrote:
>
> On Tue, 2007-07-03 at 21:31 +0400, Oleg Nesterov wrote:
>
> > If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
> > under L1, regardless of which other work_structs this workqueue has,
> > before or after A.
>
> Ah, cancel_work_sync() waits only for it if A is currently running?

Yes. And no other work (except a barrier) can run before the caller of
wait_on_work() is woken.

> > Now we have a false positive if some time we queue B into that workqueue,
> > and this is not good.
>
> Right. I was thinking of the flush_workqueue case where any of A or B
> matters.

Aha, now I see where I was confused. Yes, we can't avoid the false positives
with flush_workqueue().

I hope this won't be a problem, because almost every usage of flush_workqueue()
is pointless nowadays. So even if we have a false positive, it probably
means the code needs cleanups anyway.

But see below,

> > We can avoid this problem if we put lockdep_map into work_struct, so
> > that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
> > takes wq->lockdep_map.
>
> Yeah, and then we'll take both wq->lockdep_map and the
> work_struct->lockdep_map when running that work. That should work, I'll
> give it a go later.

If you are going to do this, may I suggest you make two separate
patches? Exactly because we can't avoid the false positives with
flush_workqueue(), it would be nice to have the option to revert the
second patch if there are too many false positives (I hope this won't
happen).

(please ignore if this is not suitable for you).

> > > @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
> > >
> > > BUG_ON(get_wq_data(work) != cwq);
> > > work_clear_pending(work);
> > > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > > f(work);
> > > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > ^^^
> > Isn't it better to call lock_release() with nested == 1 ?
>
> Not sure, Ingo?

Ingo, could you also explain the meaning of the "nested" parameter? It
looks like it is simply unneeded: lock_release_nested() does a quick
check and uses lock_release_non_nested() when hlock is not on top of the
stack.

Oleg.


2007-07-04 11:52:30

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

Oleg,

Thanks for your comments. Shows how little I really understand the
workqueue API :)

On Tue, 2007-07-03 at 21:31 +0400, Oleg Nesterov wrote:

> > I think this could lead to false positives, but then I think we
> > shouldn't care about those. Let me explain. The thing is that with this
> > it can happen that the code using the workqueue somehow obeys an
> > ordering in the work it queues, so as far as I can tell it'd be fine to
> > have two work items A and B where only B takes a lock L1, and then have
> > a wait_on_work(A) under L1, if and only if B was always queued right
> > after A (or something like that).
>
> Not sure I understand. Yes, we can have false positives, but I think the
> ordering in the workqueue doesn't matter.
>
> If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
> under L1, regardless of which other work_structs this workqueue has,
> before or after A.

Ah, cancel_work_sync() waits only for it if A is currently running?

> Now we have a false positive if some time we queue B into that workqueue,
> and this is not good.

Right. I was thinking of the flush_workqueue case where any of A or B
matters.

> We can avoid this problem if we put lockdep_map into work_struct, so
> that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
> takes wq->lockdep_map.

Yeah, and then we'll take both wq->lockdep_map and the
work_struct->lockdep_map when running that work. That should work, I'll
give it a go later.
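Roughly, the idea under discussion (a sketch in kernel style, not an actual posted patch — it assumes a lockdep_map field has been added to struct work_struct) would have run_workqueue() take both maps around the work function:

```c
/* sketch only: run_workqueue() "locks" both the workqueue's map and the
 * individual work item's map, so wait_on_work() can depend on just the
 * latter while flush_workqueue() depends on the former */
lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
lock_acquire(&work->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
f(work);
lock_release(&work->lockdep_map, 1, _THIS_IP_);
lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);
```

wait_on_work() would then acquire/release only work->lockdep_map, avoiding the false positive from unrelated work items on the same queue.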

> But probably we don't need this right now, at least until we really
> have a lot of false positives while converting from flush_workqueue()
> to cancel_work_sync/cancel_delayed_work_sync.

I didn't really think about those yet, but I think you're right.

> > @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
> >
> > BUG_ON(get_wq_data(work) != cwq);
> > work_clear_pending(work);
> > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > f(work);
> > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> ^^^
> Isn't it better to call lock_release() with nested == 1 ?

Not sure, Ingo?

> > +#define create_workqueue(name) \
> > +({ \
> > + static struct lock_class_key __key; \
> > + struct workqueue_struct *__wq; \
> > + \
> > + __wq = __create_workqueue((name), 0, 0, &__key); \
> > + __wq; \
> > +})
>
> Why do we need __wq ?

No particular reason, I think; I copied some other code that does it
that way.

> +#define create_workqueue(name) \
> ({ \
> static struct lock_class_key __key; \
> __create_workqueue((name), 0, 0, &__key); \
> })
>
> Actually, I'd suggest to rename __create_workqueue() to __create_workqueue_key(),
> and then you need the only change in linux/workqueue.h
>
> - extern struct workqueue_struct *__create_workqueue(...);
> + extern struct workqueue_struct *__create_workqueue_key(..., key);
> + #define __create_workqueue(...) \
> + static struct lock_class_key __key; \
> + __create_workqueue_key(..., key); \
>
> but this is a matter of taste.

Sure, that works. I also thought about compiling the argument out
completely for the no-lockdep case; would you prefer that?

> Btw, I think your patch found a real bug in net/mac80211/, cool!

Actually, I found the bug by experiencing it and analysing the stack
traces; then I thought that it should be possible to add lockdep support
for that to avoid regressing :)

johannes



2007-07-02 14:34:11

by Johannes Berg

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:

[reordered a bit]

> And,
>
> > if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> > @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> > int cpu;
> >
> > might_sleep();
> > + lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > + lock_release(&wq->lockdep_map, 0, _THIS_IP_);
>
> one of the 2 callers was already modified. Perhaps it is better to add
> lock_acquire() into the second caller, cleanup_workqueue_thread(), but
> skip flush_cpu_workqueue() ?

I see what you mean now; that hunk wasn't supposed to be in my patch. I
wanted to move the code, not copy it.

> Johannes, could you change wait_on_work() as well? Most users of
> flush_workqueue() should be converted to use it.

I think this could lead to false positives, but then I think we
shouldn't care about those. Let me explain: it can happen that the code
using the workqueue somehow obeys an ordering in the work it queues. As
far as I can tell, it would be fine to have two work items A and B where
only B takes a lock L1, and then to do wait_on_work(A) under L1, if and
only if B was always queued right after A (or something like that).
However, since such an invariant is rather likely to be broken by
subsequent changes to the code, I think code like that should probably
use two workqueues instead; and I doubt it ever happens anyway, since in
most cases people could just put both works into one function. Hence I
included it in my patch below.

> But, perhaps we should not change flush_cpu_workqueue(). If we detect the
> deadlock, we will have num_online_cpus() reports, yes?

The idea here was that it would cover cleanup_workqueue_thread() as
well, but I can equally well just put it there, as below.

Arjan, thanks for your help; this patch seems to work as expected,
without the false positive of the previous dump I showed.

Here's my example rechecked with this new patch:

[ 174.639892] =======================================================
[ 174.639909] [ INFO: possible circular locking dependency detected ]
[ 174.639916] 2.6.22-rc6 #168
[ 174.639924] -------------------------------------------------------
[ 174.639933] khubd/1927 is trying to acquire lock:
[ 174.639940] (zd1211rw_mac80211#2){--..}, at: [<c000000000060d04>] .flush_workqueue+0x48/0xf4
[ 174.639971]
[ 174.639974] but task is already holding lock:
[ 174.639982] (rtnl_mutex){--..}, at: [<c0000000003a47fc>] .mutex_lock+0x3c/0x58
[ 174.640016]
[ 174.640018] which lock already depends on the new lock.
[ 174.640021]
[ 174.640030]
[ 174.640032] the existing dependency chain (in reverse order) is:
[ 174.640040]
[ 174.640042] -> #1 (rtnl_mutex){--..}:
[ 174.640065] [<c0000000000724ac>] .__lock_acquire+0xb8c/0xd68
[ 174.640121] [<c000000000072728>] .lock_acquire+0xa0/0xe8
[ 174.640169] [<c0000000003a4528>] .__mutex_lock_slowpath+0x138/0x3d0
[ 174.640216] [<c0000000003a47fc>] .mutex_lock+0x3c/0x58
[ 174.640262] [<c000000000327284>] .rtnl_lock+0x24/0x40
[ 174.640314] [<d0000000004a91b8>] .ieee80211_sta_work+0xab4/0x1028 [mac80211]
[ 174.640401] [<c00000000005fe20>] .run_workqueue+0x114/0x22c
[ 174.640452] [<c000000000061370>] .worker_thread+0x11c/0x140
[ 174.640500] [<c0000000000662a8>] .kthread+0x84/0xd4
[ 174.640549] [<c0000000000235e8>] .kernel_thread+0x4c/0x68
[ 174.640602]
[ 174.640604] -> #0 (zd1211rw_mac80211#2){--..}:
[ 174.640634] [<c00000000007239c>] .__lock_acquire+0xa7c/0xd68
[ 174.640683] [<c000000000072728>] .lock_acquire+0xa0/0xe8
[ 174.640732] [<c000000000060d34>] .flush_workqueue+0x78/0xf4
[ 174.640769] [<d00000000049a330>] .ieee80211_stop+0x1b4/0x35c [mac80211]
[ 174.640839] [<c00000000031b938>] .dev_close+0xb8/0xfc
[ 174.640891] [<c00000000031ba3c>] .unregister_netdevice+0xc0/0x254
[ 174.640941] [<d0000000004aa234>] .__ieee80211_if_del+0x34/0x50 [mac80211]
[ 174.641018] [<d000000000499658>] .ieee80211_unregister_hw+0xf8/0x2d8 [mac80211]
[ 174.641089] [<d00000000045efb0>] .disconnect+0x3c/0x98 [zd1211rw_mac80211]
[ 174.641154] [<d0000000000a9650>] .usb_unbind_interface+0x6c/0xcc [usbcore]
[ 174.641248] [<c00000000027bd54>] .__device_release_driver+0xcc/0x110
[ 174.641299] [<c00000000027c4a8>] .device_release_driver+0x70/0xbc
[ 174.641351] [<c00000000027b2e8>] .bus_remove_device+0xa0/0xcc
[ 174.641400] [<c000000000278368>] .device_del+0x2f4/0x3d4
[ 174.641451] [<d0000000000a5d68>] .usb_disable_device+0xa0/0x150 [usbcore]
[ 174.641525] [<d0000000000a0eac>] .usb_disconnect+0xfc/0x1a4 [usbcore]
[ 174.641599] [<d0000000000a17d4>] .hub_thread+0x3d8/0xc70 [usbcore]
[ 174.641673] [<c0000000000662a8>] .kthread+0x84/0xd4
[ 174.641721] [<c0000000000235e8>] .kernel_thread+0x4c/0x68
[ 174.641767]
[ 174.641769] other info that might help us debug this:
[ 174.641772]
[ 174.641782] 1 lock held by khubd/1927:
[ 174.641791] #0: (rtnl_mutex){--..}, at: [<c0000000003a47fc>] .mutex_lock+0x3c/0x58
[ 174.641827]
[ 174.641830] stack backtrace:
[ 174.641838] Call Trace:
[ 174.641848] [c00000007ebab130] [c00000000000f620] .show_stack+0x70/0x1bc (unreliable)
[ 174.641879] [c00000007ebab1e0] [c00000000000f78c] .dump_stack+0x20/0x34
[ 174.641903] [c00000007ebab260] [c000000000070388] .print_circular_bug_tail+0x84/0xa8
[ 174.641930] [c00000007ebab330] [c00000000007239c] .__lock_acquire+0xa7c/0xd68
[ 174.641955] [c00000007ebab420] [c000000000072728] .lock_acquire+0xa0/0xe8
[ 174.641979] [c00000007ebab4e0] [c000000000060d34] .flush_workqueue+0x78/0xf4
[ 174.642002] [c00000007ebab580] [d00000000049a330] .ieee80211_stop+0x1b4/0x35c [mac80211]
[ 174.642047] [c00000007ebab640] [c00000000031b938] .dev_close+0xb8/0xfc
[ 174.642071] [c00000007ebab6d0] [c00000000031ba3c] .unregister_netdevice+0xc0/0x254
[ 174.642096] [c00000007ebab760] [d0000000004aa234] .__ieee80211_if_del+0x34/0x50 [mac80211]
[ 174.642147] [c00000007ebab7f0] [d000000000499658] .ieee80211_unregister_hw+0xf8/0x2d8 [mac80211]
[ 174.642193] [c00000007ebab8c0] [d00000000045efb0] .disconnect+0x3c/0x98 [zd1211rw_mac80211]
[ 174.642227] [c00000007ebab960] [d0000000000a9650] .usb_unbind_interface+0x6c/0xcc [usbcore]
[ 174.642277] [c00000007ebaba00] [c00000000027bd54] .__device_release_driver+0xcc/0x110
[ 174.642302] [c00000007ebaba90] [c00000000027c4a8] .device_release_driver+0x70/0xbc
[ 174.642327] [c00000007ebabb20] [c00000000027b2e8] .bus_remove_device+0xa0/0xcc
[ 174.642351] [c00000007ebabbb0] [c000000000278368] .device_del+0x2f4/0x3d4
[ 174.642375] [c00000007ebabc50] [d0000000000a5d68] .usb_disable_device+0xa0/0x150 [usbcore]
[ 174.642422] [c00000007ebabcf0] [d0000000000a0eac] .usb_disconnect+0xfc/0x1a4 [usbcore]
[ 174.642468] [c00000007ebabdb0] [d0000000000a17d4] .hub_thread+0x3d8/0xc70 [usbcore]
[ 174.642517] [c00000007ebabf00] [c0000000000662a8] .kthread+0x84/0xd4
[ 174.642541] [c00000007ebabf90] [c0000000000235e8] .kernel_thread+0x4c/0x68


I haven't had it report anything else, but I don't hook up many devices
to that particular machine.

---
include/linux/workqueue.h | 41 ++++++++++++++++++++++++++++++++++++-----
kernel/workqueue.c | 17 ++++++++++++++++-
2 files changed, 52 insertions(+), 6 deletions(-)

--- linux-2.6-git.orig/kernel/workqueue.c 2007-07-02 14:18:44.331427320 +0200
+++ linux-2.6-git/kernel/workqueue.c 2007-07-02 14:37:41.441427320 +0200
@@ -61,6 +61,9 @@ struct workqueue_struct {
 	const char *name;
 	int singlethread;
 	int freezeable;		/* Freeze threads during suspend */
+#ifdef CONFIG_LOCKDEP
+	struct lockdep_map lockdep_map;
+#endif
 };
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
@@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
 
 		BUG_ON(get_wq_data(work) != cwq);
 		work_clear_pending(work);
+		lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
 		f(work);
+		lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
 
 		if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
 			printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
@@ -376,6 +381,8 @@ void fastcall flush_workqueue(struct wor
 	int cpu;
 
 	might_sleep();
+	lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+	lock_release(&wq->lockdep_map, 0, _THIS_IP_);
 	for_each_cpu_mask(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
 }
@@ -453,6 +460,9 @@ static void wait_on_work(struct work_str
 	wq = cwq->wq;
 	cpu_map = wq_cpu_map(wq);
 
+	lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+	lock_release(&wq->lockdep_map, 0, _THIS_IP_);
+
 	for_each_cpu_mask(cpu, *cpu_map)
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
@@ -683,7 +693,8 @@ static void start_workqueue_thread(struc
 }
 
 struct workqueue_struct *__create_workqueue(const char *name,
-					    int singlethread, int freezeable)
+					    int singlethread, int freezeable,
+					    struct lock_class_key *key)
 {
 	struct workqueue_struct *wq;
 	struct cpu_workqueue_struct *cwq;
@@ -700,6 +711,7 @@ struct workqueue_struct *__create_workqu
 	}
 
 	wq->name = name;
+	lockdep_init_map(&wq->lockdep_map, name, key, 0);
 	wq->singlethread = singlethread;
 	wq->freezeable = freezeable;
 	INIT_LIST_HEAD(&wq->list);
@@ -739,6 +751,9 @@ static void cleanup_workqueue_thread(str
 	if (cwq->thread == NULL)
 		return;
 
+	lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+	lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
+
 	/*
	 * If the caller is CPU_DEAD the single flush_cpu_workqueue()
	 * is not enough, a concurrent flush_workqueue() can insert a
--- linux-2.6-git.orig/include/linux/workqueue.h 2007-07-02 14:20:28.207427320 +0200
+++ linux-2.6-git/include/linux/workqueue.h 2007-07-02 14:25:35.158427320 +0200
@@ -119,11 +119,42 @@ struct execute_work {
 
 
 extern struct workqueue_struct *__create_workqueue(const char *name,
-						   int singlethread,
-						   int freezeable);
-#define create_workqueue(name) __create_workqueue((name), 0, 0)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0)
+						   int singlethread,
+						   int freezeable,
+						   struct lock_class_key *key);
+#ifdef CONFIG_LOCKDEP
+#define create_workqueue(name)						\
+({									\
+	static struct lock_class_key __key;				\
+	struct workqueue_struct *__wq;					\
+									\
+	__wq = __create_workqueue((name), 0, 0, &__key);		\
+	__wq;								\
+})
+#define create_freezeable_workqueue(name)				\
+({									\
+	static struct lock_class_key __key;				\
+	struct workqueue_struct *__wq;					\
+									\
+	__wq = __create_workqueue((name), 1, 1, &__key);		\
+	__wq;								\
+})
+#define create_singlethread_workqueue(name)				\
+({									\
+	static struct lock_class_key __key;				\
+	struct workqueue_struct *__wq;					\
+									\
+	__wq = __create_workqueue((name), 1, 0, &__key);		\
+	__wq;								\
+})
+#else
+#define create_workqueue(name)						\
+	__create_workqueue((name), 0, 0, NULL)
+#define create_freezeable_workqueue(name)				\
+	__create_workqueue((name), 1, 1, NULL)
+#define create_singlethread_workqueue(name)				\
+	__create_workqueue((name), 1, 0, NULL)
+#endif
 
 extern void destroy_workqueue(struct workqueue_struct *wq);




2007-07-04 12:25:16

by Ingo Molnar

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep


* Johannes Berg <[email protected]> wrote:

> > > +#define create_workqueue(name) \
> > > +({ \
> > > + static struct lock_class_key __key; \
> > > + struct workqueue_struct *__wq; \
> > > + \
> > > + __wq = __create_workqueue((name), 0, 0, &__key); \
> > > + __wq; \
> > > +})
> >
> > Why do we need __wq ?
>
> No particular reason I think, I copied some other code doing it that
> way.

yep, should be fine doing this:

#define create_workqueue(name) \
({ \
static struct lock_class_key __key; \
\
__create_workqueue((name), 0, 0, &__key); \
})

(and the return value of __create_workqueue() will be the 'return value'
of the macro as well.)

> > + extern struct workqueue_struct *__create_workqueue_key(..., key);
> > + #define __create_workqueue(...) \
> > + static struct lock_class_key __key; \
> > + __create_workqueue_key(..., key); \
> >
> > but this is a matter of taste.

the above macro should at minimum be encapsulated with
do { ... } while (0) so that __create_workqueue() is a single C
statement.

Ingo

2007-07-03 19:55:24

by Ingo Molnar

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep


* Oleg Nesterov <[email protected]> wrote:

> Now we have a false positive if some time we queue B into that
> workqueue, and this is not good.
>
> We can avoid this problem if we put lockdep_map into work_struct, so
> that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
> takes wq->lockdep_map.
>
> But probably we don't need this right now, at least until we really
> have a lot of false positives while converting from flush_workqueue()
> to cancel_work_sync/cancel_delayed_work_sync.

would be nice to map the real dependencies. Adding another lockdep_map
is really low-cost and the lack of false positives is reassuring.

Ingo

2007-07-03 17:31:12

by Oleg Nesterov

Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On 07/02, Johannes Berg wrote:
>
> On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:
>
> > Johannes, could you change wait_on_work() as well? Most users of
> > flush_workqueue() should be converted to use it.

(to avoid a possible confusion: I meant, most users of flush_workqueue()
should be converted to use cancel_work_sync/cancel_delayed_work_sync
which in turn use wait_on_work()).

> I think this could lead to false positives, but then I think we
> shouldn't care about those. Let me explain. The code using the
> workqueue might obey an ordering in the work it queues, so as far as
> I can tell it'd be fine to have two work items A and B where only B
> takes a lock L1, and then do wait_on_work(A) under L1, if and only if
> B was always queued right after A (or something like that).

Not sure I understand. Yes, we can have false positives, but I think the
ordering in the workqueue doesn't matter.

If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
under L1, regardless of which other work_structs this workqueue has,
before or after A.

Now we have a false positive if some time we queue B into that workqueue,
and this is not good.

We can avoid this problem if we put lockdep_map into work_struct, so
that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
takes wq->lockdep_map.

But probably we don't need this right now, at least until we really
have a lot of false positives while converting from flush_workqueue()
to cancel_work_sync/cancel_delayed_work_sync.

> However, since this invariant is
> rather likely to be broken by subsequent changes to the code, I think
> such code should probably use two workqueues instead, and I doubt it
> ever happens anyway since then people could in most cases just put both
> works into one function.

Well, I am not sure, think about the shared workqueues like keventd_wq...

> Hence I included it in my patch below.

a couple of minor nits below,

> @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
>
> BUG_ON(get_wq_data(work) != cwq);
> work_clear_pending(work);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> f(work);
> + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
^^^
Isn't it better to call lock_release() with nested == 1 ?

> +#define create_workqueue(name) \
> +({ \
> + static struct lock_class_key __key; \
> + struct workqueue_struct *__wq; \
> + \
> + __wq = __create_workqueue((name), 0, 0, &__key); \
> + __wq; \
> +})

Why do we need __wq ?

+#define create_workqueue(name) \
({ \
static struct lock_class_key __key; \
__create_workqueue((name), 0, 0, &__key); \
})

Actually, I'd suggest renaming __create_workqueue() to __create_workqueue_key();
then the only change needed in linux/workqueue.h is:

- extern struct workqueue_struct *__create_workqueue(...);
+ extern struct workqueue_struct *__create_workqueue_key(..., key);
+ #define __create_workqueue(...) \
+ static struct lock_class_key __key; \
+ __create_workqueue_key(..., key); \

but this is a matter of taste.

Btw, I think your patch found a real bug in net/mac80211/, cool!

Oleg.