2013-07-16 14:44:36

by Srivatsa S. Bhat

Subject: workqueue, pci: INFO: possible recursive locking detected

Hi,

I have been seeing this warning every time during boot. I haven't
spent time digging through it though... Please let me know if
any machine-specific info is needed.

Regards,
Srivatsa S. Bhat


----------------------------------------------------

=============================================
[ INFO: possible recursive locking detected ]
3.11.0-rc1-lockdep-fix-a #6 Not tainted
---------------------------------------------
kworker/0:1/142 is trying to acquire lock:
((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0

but task is already holding lock:
((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock((&wfc.work));
lock((&wfc.work));

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/0:1/142:
#0: (events){.+.+.+}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
#1: ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
#2: (&__lockdep_no_validate__){......}, at: [<ffffffff8139a3ba>] device_attach+0x2a/0xc0

stack backtrace:
CPU: 0 PID: 142 Comm: kworker/0:1 Not tainted 3.11.0-rc1-lockdep-fix-a #6
Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: events work_for_cpu_fn
ffff881036fecd88 ffff881036fef678 ffffffff8161a919 0000000000000003
ffff881036fec400 ffff881036fef6a8 ffffffff810c2234 ffff881036fec400
ffff881036fecd88 ffff881036fec400 0000000000000000 ffff881036fef708
Call Trace:
[<ffffffff8161a919>] dump_stack+0x59/0x80
[<ffffffff810c2234>] print_deadlock_bug+0xf4/0x100
[<ffffffff810c3d14>] validate_chain+0x504/0x750
[<ffffffff810c426d>] __lock_acquire+0x30d/0x580
[<ffffffff810c4577>] lock_acquire+0x97/0x170
[<ffffffff81077100>] ? start_flush_work+0x220/0x220
[<ffffffff81077148>] flush_work+0x48/0xb0
[<ffffffff81077100>] ? start_flush_work+0x220/0x220
[<ffffffff810c2c10>] ? mark_held_locks+0x80/0x130
[<ffffffff81074cfb>] ? queue_work_on+0x4b/0xa0
[<ffffffff810c2f85>] ? trace_hardirqs_on_caller+0x105/0x1d0
[<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81077320>] work_on_cpu+0x80/0x90
[<ffffffff8106f950>] ? wqattrs_hash+0x190/0x190
[<ffffffff812d37b0>] ? pci_pm_prepare+0x60/0x60
[<ffffffff812a0059>] ? cpumask_next_and+0x29/0x50
[<ffffffff812d38da>] __pci_device_probe+0x9a/0xe0
[<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff812d4be2>] ? pci_dev_get+0x22/0x30
[<ffffffff812d4c2a>] pci_device_probe+0x3a/0x60
[<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8139a4bc>] really_probe+0x6c/0x320
[<ffffffff8139a7b7>] driver_probe_device+0x47/0xa0
[<ffffffff8139a8c0>] ? __driver_attach+0xb0/0xb0
[<ffffffff8139a913>] __device_attach+0x53/0x60
[<ffffffff81398404>] bus_for_each_drv+0x74/0xa0
[<ffffffff8139a430>] device_attach+0xa0/0xc0
[<ffffffff812cb2d9>] pci_bus_add_device+0x39/0x60
[<ffffffff812eec21>] virtfn_add+0x251/0x3e0
[<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff812ef29f>] sriov_enable+0x22f/0x3d0
[<ffffffff812ef48d>] pci_enable_sriov+0x4d/0x60
[<ffffffffa0143045>] be_vf_setup+0x175/0x410 [be2net]
[<ffffffffa01493ca>] be_setup+0x37a/0x4b0 [be2net]
[<ffffffffa0149ac0>] be_probe+0x5c0/0x820 [be2net]
[<ffffffff812d37fe>] local_pci_probe+0x4e/0x90
[<ffffffff8106f968>] work_for_cpu_fn+0x18/0x30
[<ffffffff81075e4a>] process_one_work+0x1da/0x610
[<ffffffff81075dd9>] ? process_one_work+0x169/0x610
[<ffffffff8107650c>] worker_thread+0x28c/0x3a0
[<ffffffff81076280>] ? process_one_work+0x610/0x610
[<ffffffff8107da3e>] kthread+0xee/0x100
[<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8162a71c>] ret_from_fork+0x7c/0xb0
[<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70


2013-07-17 10:03:30

by Lai Jiangshan

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
> Hi,
>
> I have been seeing this warning every time during boot. I haven't
> spent time digging through it though... Please let me know if
> any machine-specific info is needed.
>
> Regards,
> Srivatsa S. Bhat
>
>
> ----------------------------------------------------
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock((&wfc.work));
> lock((&wfc.work));


Hi, Srivatsa

This is a false negative; the two "wfc"s are different, they are
both on the stack. flush_work() can't deadlock in such a case:

void foo(void *)
{
	...
	if (xxx)
		work_on_cpu(..., foo, ...);
	...
}

bar()
{
	work_on_cpu(..., foo, ...);
}

The complaint is caused by work_on_cpu() using a static lock_class_key.
We should fix work_on_cpu().
(But the caller should also be careful: the foo()/local_pci_probe() here is re-entered.)
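
(For reference, the underlying mechanism -- a simplified sketch of the
3.11-era __INIT_WORK() from include/linux/workqueue.h, abridged rather
than verbatim -- is that the initializer allocates one static key per
call site:

#ifdef CONFIG_LOCKDEP
#define __INIT_WORK(_work, _func, _onstack)				\
	do {								\
		static struct lock_class_key __key;			\
									\
		__init_work((_work), _onstack);				\
		(_work)->data = (atomic_long_t) WORK_DATA_INIT();	\
		/* one static key per call site: every work item	\
		 * initialized here lands in the same lockdep class */	\
		lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \
		INIT_LIST_HEAD(&(_work)->entry);			\
		PREPARE_WORK((_work), (_func));				\
	} while (0)
#endif

work_on_cpu() has a single INIT_WORK_ONSTACK() call site, so every
on-stack wfc.work ever created there shares one class, and a nested
flush_work() looks recursive to lockdep.)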

But I can't find an elegant fix.

long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
	schedule_work_on(cpu, &wfc.work);
	flush_work(&wfc.work);
	return wfc.ret;
}


Any thoughts, Tejun?

thanks,
Lai

>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by kworker/0:1/142:
> #0: (events){.+.+.+}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
> #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
> #2: (&__lockdep_no_validate__){......}, at: [<ffffffff8139a3ba>] device_attach+0x2a/0xc0
>
> stack backtrace:
> CPU: 0 PID: 142 Comm: kworker/0:1 Not tainted 3.11.0-rc1-lockdep-fix-a #6
> Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
> ffff881036fecd88 ffff881036fef678 ffffffff8161a919 0000000000000003
> ffff881036fec400 ffff881036fef6a8 ffffffff810c2234 ffff881036fec400
> ffff881036fecd88 ffff881036fec400 0000000000000000 ffff881036fef708
> Call Trace:
> [<ffffffff8161a919>] dump_stack+0x59/0x80
> [<ffffffff810c2234>] print_deadlock_bug+0xf4/0x100
> [<ffffffff810c3d14>] validate_chain+0x504/0x750
> [<ffffffff810c426d>] __lock_acquire+0x30d/0x580
> [<ffffffff810c4577>] lock_acquire+0x97/0x170
> [<ffffffff81077100>] ? start_flush_work+0x220/0x220
> [<ffffffff81077148>] flush_work+0x48/0xb0
> [<ffffffff81077100>] ? start_flush_work+0x220/0x220
> [<ffffffff810c2c10>] ? mark_held_locks+0x80/0x130
> [<ffffffff81074cfb>] ? queue_work_on+0x4b/0xa0
> [<ffffffff810c2f85>] ? trace_hardirqs_on_caller+0x105/0x1d0
> [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff81077320>] work_on_cpu+0x80/0x90
> [<ffffffff8106f950>] ? wqattrs_hash+0x190/0x190
> [<ffffffff812d37b0>] ? pci_pm_prepare+0x60/0x60
> [<ffffffff812a0059>] ? cpumask_next_and+0x29/0x50
> [<ffffffff812d38da>] __pci_device_probe+0x9a/0xe0
> [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff812d4be2>] ? pci_dev_get+0x22/0x30
> [<ffffffff812d4c2a>] pci_device_probe+0x3a/0x60
> [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff8139a4bc>] really_probe+0x6c/0x320
> [<ffffffff8139a7b7>] driver_probe_device+0x47/0xa0
> [<ffffffff8139a8c0>] ? __driver_attach+0xb0/0xb0
> [<ffffffff8139a913>] __device_attach+0x53/0x60
> [<ffffffff81398404>] bus_for_each_drv+0x74/0xa0
> [<ffffffff8139a430>] device_attach+0xa0/0xc0
> [<ffffffff812cb2d9>] pci_bus_add_device+0x39/0x60
> [<ffffffff812eec21>] virtfn_add+0x251/0x3e0
> [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff812ef29f>] sriov_enable+0x22f/0x3d0
> [<ffffffff812ef48d>] pci_enable_sriov+0x4d/0x60
> [<ffffffffa0143045>] be_vf_setup+0x175/0x410 [be2net]
> [<ffffffffa01493ca>] be_setup+0x37a/0x4b0 [be2net]
> [<ffffffffa0149ac0>] be_probe+0x5c0/0x820 [be2net]
> [<ffffffff812d37fe>] local_pci_probe+0x4e/0x90
> [<ffffffff8106f968>] work_for_cpu_fn+0x18/0x30
> [<ffffffff81075e4a>] process_one_work+0x1da/0x610
> [<ffffffff81075dd9>] ? process_one_work+0x169/0x610
> [<ffffffff8107650c>] worker_thread+0x28c/0x3a0
> [<ffffffff81076280>] ? process_one_work+0x610/0x610
> [<ffffffff8107da3e>] kthread+0xee/0x100
> [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
> [<ffffffff8162a71c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
>
>

2013-07-18 20:27:35

by Srivatsa S. Bhat

Subject: Re: workqueue, pci: INFO: possible recursive locking detected


On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> I have been seeing this warning every time during boot. I haven't
>> spent time digging through it though... Please let me know if
>> any machine-specific info is needed.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>
>> ----------------------------------------------------
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock((&wfc.work));
>> lock((&wfc.work));
>
>


Hi Lai,

Thanks for taking a look into this!

>
> This is a false negative;

I believe you meant false-positive...

> the two "wfc"s are different, they are
> both on the stack. flush_work() can't deadlock in such a case:
>
> void foo(void *)
> {
> 	...
> 	if (xxx)
> 		work_on_cpu(..., foo, ...);
> 	...
> }
>
> bar()
> {
> 	work_on_cpu(..., foo, ...);
> }
>
> The complaint is caused by work_on_cpu() using a static lock_class_key.
> We should fix work_on_cpu().
> (But the caller should also be careful: the foo()/local_pci_probe() here is re-entered.)
>
> But I can't find an elegant fix.
>
> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
> {
> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;
> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
> 	schedule_work_on(cpu, &wfc.work);
> 	flush_work(&wfc.work);
> 	return wfc.ret;
> }
>

Unfortunately that didn't seem to fix it... I applied the patch
shown below, and I got the same old warning.

---

kernel/workqueue.c | 6 ++++++
1 file changed, 6 insertions(+)


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..07d9a67 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
 	schedule_work_on(cpu, &wfc.work);
 	flush_work(&wfc.work);
 	return wfc.ret;



Warning:
--------

wmi: Mapper loaded
be2net 0000:11:00.0: irq 102 for MSI/MSI-X
be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
be2net 0000:11:00.0: created 1 TX queue(s)
pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000

=============================================
[ INFO: possible recursive locking detected ]
3.11.0-rc1-wq-fix #10 Not tainted
---------------------------------------------
kworker/0:1/126 is trying to acquire lock:
(&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0

but task is already holding lock:
(&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&wfc.work);
lock(&wfc.work);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/0:1/126:
#0: (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
#1: (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
#2: (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0

stack backtrace:
CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: events work_for_cpu_fn
ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
Call Trace:
[<ffffffff81619059>] dump_stack+0x59/0x80
[<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
[<ffffffff810c3104>] validate_chain+0x504/0x750
[<ffffffff810c365d>] __lock_acquire+0x30d/0x580
[<ffffffff810c3967>] lock_acquire+0x97/0x170
[<ffffffff810770f0>] ? start_flush_work+0x220/0x220
[<ffffffff81077138>] flush_work+0x48/0xb0
[<ffffffff810770f0>] ? start_flush_work+0x220/0x220
[<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
[<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
[<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
[<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81077334>] work_on_cpu+0xa4/0xc0
[<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
[<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
[<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
[<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
[<ffffffff812d334a>] pci_device_probe+0x3a/0x60
[<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff81398bdc>] really_probe+0x6c/0x320
[<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
[<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
[<ffffffff81399033>] __device_attach+0x53/0x60
[<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
[<ffffffff81398b50>] device_attach+0xa0/0xc0
[<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
[<ffffffff812ed341>] virtfn_add+0x251/0x3e0
[<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
[<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
[<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
[<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
[<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
[<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
[<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
[<ffffffff81075e3a>] process_one_work+0x1da/0x610
[<ffffffff81075dc9>] ? process_one_work+0x169/0x610
[<ffffffff810764fc>] worker_thread+0x28c/0x3a0
[<ffffffff81076270>] ? process_one_work+0x610/0x610
[<ffffffff8107da5e>] kthread+0xee/0x100
[<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
[<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
[<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
be2net 0000:11:04.0: enabling device (0040 -> 0042)
be2net 0000:11:04.0: Could not use PCIe error reporting
be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1


Regards,
Srivatsa S. Bhat

2013-07-19 02:01:02

by Lai Jiangshan

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>
> On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
>> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>>> Hi,
>>>
>>> I have been seeing this warning every time during boot. I haven't
>>> spent time digging through it though... Please let me know if
>>> any machine-specific info is needed.
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>>
>>> ----------------------------------------------------
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>>
>>> CPU0
>>> ----
>>> lock((&wfc.work));
>>> lock((&wfc.work));
>>
>>
>
>
> Hi Lai,
>
> Thanks for taking a look into this!
>
>>
>> This is a false negative;
>
> I believe you meant false-positive...
>
>> the two "wfc"s are different, they are
>> both on the stack. flush_work() can't deadlock in such a case:
>>
>> void foo(void *)
>> {
>> 	...
>> 	if (xxx)
>> 		work_on_cpu(..., foo, ...);
>> 	...
>> }
>>
>> bar()
>> {
>> 	work_on_cpu(..., foo, ...);
>> }
>>
>> The complaint is caused by work_on_cpu() using a static lock_class_key.
>> We should fix work_on_cpu().
>> (But the caller should also be careful: the foo()/local_pci_probe() here is re-entered.)
>>
>> But I can't find an elegant fix.
>>
>> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>> {
>> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>> +#else
>> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>> 	schedule_work_on(cpu, &wfc.work);
>> 	flush_work(&wfc.work);
>> 	return wfc.ret;
>> }
>>
>
> Unfortunately that didn't seem to fix it... I applied the patch
> shown below, and I got the same old warning.
>
> ---
>
> kernel/workqueue.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..07d9a67 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;

Sorry, this "static" should be removed.

Thanks,
Lai


> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
>
>
>
> Warning:
> --------
>
> wmi: Mapper loaded
> be2net 0000:11:00.0: irq 102 for MSI/MSI-X
> be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
> be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
> be2net 0000:11:00.0: created 1 TX queue(s)
> pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-wq-fix #10 Not tainted
> ---------------------------------------------
> kworker/0:1/126 is trying to acquire lock:
> (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0
>
> but task is already holding lock:
> (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&wfc.work);
> lock(&wfc.work);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by kworker/0:1/126:
> #0: (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
> #1: (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
> #2: (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0
>
> stack backtrace:
> CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
> Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
> ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
> ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
> ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
> Call Trace:
> [<ffffffff81619059>] dump_stack+0x59/0x80
> [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
> [<ffffffff810c3104>] validate_chain+0x504/0x750
> [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
> [<ffffffff810c3967>] lock_acquire+0x97/0x170
> [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
> [<ffffffff81077138>] flush_work+0x48/0xb0
> [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
> [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
> [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
> [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
> [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
> [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
> [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
> [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
> [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
> [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
> [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff81398bdc>] really_probe+0x6c/0x320
> [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
> [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
> [<ffffffff81399033>] __device_attach+0x53/0x60
> [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
> [<ffffffff81398b50>] device_attach+0xa0/0xc0
> [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
> [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
> [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
> [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
> [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
> [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
> [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
> [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
> [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
> [<ffffffff81075e3a>] process_one_work+0x1da/0x610
> [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
> [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
> [<ffffffff81076270>] ? process_one_work+0x610/0x610
> [<ffffffff8107da5e>] kthread+0xee/0x100
> [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
> [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
> be2net 0000:11:04.0: enabling device (0040 -> 0042)
> be2net 0000:11:04.0: Could not use PCIe error reporting
> be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
> be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1
>
>
> Regards,
> Srivatsa S. Bhat
>
>

2013-07-19 09:01:11

by Srivatsa S. Bhat

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>
>> ---
>>
>> kernel/workqueue.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..07d9a67 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  {
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
>
> Sorry, this "static" should be removed.
>

That didn't help either :-( Because it makes lockdep unhappy,
since the key isn't persistent.
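
(For reference, the check that fires here sits in lockdep_init_map();
roughly, from the 3.11-era kernel/lockdep.c -- paraphrased, not verbatim:

	/*
	 * The lock-class key must be persistent: lockdep hashes classes
	 * by the key's address and keeps that reference around forever,
	 * so an on-stack key becomes dangling once the frame unwinds.
	 */
	if (!static_obj(key)) {
		printk("BUG: key %p not in .data!\n", key);
		DEBUG_LOCKS_WARN_ON(1);
		return;
	}

which is exactly the "BUG: key ... not in .data!" flood below.)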

This is the patch I used:

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..7967e3b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
+#ifdef CONFIG_LOCKDEP
+	struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
 	schedule_work_on(cpu, &wfc.work);
 	flush_work(&wfc.work);
 	return wfc.ret;


And here are the new warnings:


Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
BUG: key ffff881039557b98 not in .data!
------------[ cut here ]------------
WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in:
CPU: 8 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1-wq-fix3-a #12
Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
0000000000000bab ffff881039557a58 ffffffff81619069 ffffffff819f6dce
ffff881039557aa8 ffff881039557a98 ffffffff8104e6ac ffff881039557b98
ffff881039557b58 0000000000000000 ffff881039557b58 0000000000000000
Call Trace:
[<ffffffff81619069>] dump_stack+0x59/0x80
[<ffffffff8104e6ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8104e796>] warn_slowpath_fmt+0x46/0x50
[<ffffffff810bfbb8>] lockdep_init_map+0x168/0x170
[<ffffffff81077326>] work_on_cpu+0x96/0xd0
[<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
[<ffffffff812d1ee0>] ? pci_pm_prepare+0x60/0x60
[<ffffffff812d200a>] __pci_device_probe+0x9a/0xe0
[<ffffffff8161efa0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff812d3312>] ? pci_dev_get+0x22/0x30
[<ffffffff812d335a>] pci_device_probe+0x3a/0x60
[<ffffffff8161efa0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff81398bec>] really_probe+0x6c/0x320
[<ffffffff81398ee7>] driver_probe_device+0x47/0xa0
[<ffffffff81398feb>] __driver_attach+0xab/0xb0
[<ffffffff81398f40>] ? driver_probe_device+0xa0/0xa0
[<ffffffff81398f40>] ? driver_probe_device+0xa0/0xa0
[<ffffffff81396bf4>] bus_for_each_dev+0x94/0xb0
[<ffffffff813988ae>] driver_attach+0x1e/0x20
[<ffffffff81398258>] bus_add_driver+0x208/0x290
[<ffffffff81399597>] driver_register+0x77/0x160
[<ffffffff81d66b9e>] ? dmi_pcie_pme_disable_msi+0x21/0x21
[<ffffffff812d3464>] __pci_register_driver+0x64/0x70
[<ffffffff81d66c07>] pcie_portdrv_init+0x69/0x7a
[<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
[<ffffffff8107b0d6>] ? parse_args+0x156/0x2f0
[<ffffffff81d32b80>] ? kernel_init_freeable+0x28a/0x28a
[<ffffffff81d328d8>] do_basic_setup+0x9d/0xbb
[<ffffffff81d32b80>] ? kernel_init_freeable+0x28a/0x28a
[<ffffffff81d32b00>] kernel_init_freeable+0x20a/0x28a
[<ffffffff81611c40>] ? rest_init+0x180/0x180
[<ffffffff81611c4e>] kernel_init+0xe/0xf0
[<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
[<ffffffff81611c40>] ? rest_init+0x180/0x180
---[ end trace 41047d25544a3721 ]---
pcieport 0000:00:01.0: irq 88 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:02.0: irq 89 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:02.2: irq 90 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:03.0: irq 91 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:11.0: irq 92 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:1c.0: irq 93 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:1c.4: irq 94 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:00.0: irq 95 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:01.0: irq 96 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:01.1: irq 97 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:02.0: irq 98 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:02.2: irq 99 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:03.0: irq 100 for MSI/MSI-X
pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded

<many more instances of "BUG: key ... not in .data!">

Regards,
Srivatsa S. Bhat

2013-07-22 11:48:51

by Lai Jiangshan

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>
>>> ---
>>>
>>> kernel/workqueue.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..07d9a67 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>  {
>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>> +#ifdef CONFIG_LOCKDEP
>>> +	static struct lock_class_key __key;
>>
>> Sorry, this "static" should be removed.
>>
>
> That didn't help either :-( Because it makes lockdep unhappy,
> since the key isn't persistent.
>
> This is the patch I used:
>
> ---
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..7967e3b 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
> +#ifdef CONFIG_LOCKDEP
> +	struct lock_class_key __key;
> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
>
>
> And here are the new warnings:
>
>
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
> io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> BUG: key ffff881039557b98 not in .data!
> ------------[ cut here ]------------
> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()

Sorry again.

From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Mon, 22 Jul 2013 19:08:51 +0800
Subject: [PATCH] workqueue: allow the function passed to work_on_cpu() to
 call work_on_cpu()

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock((&wfc.work));
> lock((&wfc.work));
>
> *** DEADLOCK ***

It is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on the stack, so flush_work() can't deadlock.

To fix this, we need to avoid the lockdep checking in this case.
But we don't want to change flush_work(), so we use a
completion instead of flush_work() in work_on_cpu().

Reported-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..b021a45 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4731,6 +4731,7 @@ struct work_for_cpu {
 	long (*fn)(void *);
 	void *arg;
 	long ret;
+	struct completion done;
 };
 
 static void work_for_cpu_fn(struct work_struct *work)
@@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
 	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
 
 	wfc->ret = wfc->fn(wfc->arg);
+	complete(&wfc->done);
 }
 
 /**
@@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	init_completion(&wfc.done);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+	wait_for_completion(&wfc.done);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
--
1.7.4.4

2013-07-22 15:40:47

by Srivatsa S. Bhat

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>
>>>> ---
>>>>
>>>> kernel/workqueue.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index f02c4a4..07d9a67 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>  {
>>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>
>>>> +#ifdef CONFIG_LOCKDEP
>>>> +	static struct lock_class_key __key;
>>>
>>> Sorry, this "static" should be removed.
>>>
>>
>> That didn't help either :-( Because it makes lockdep unhappy,
>> since the key isn't persistent.
>>
>> This is the patch I used:
>>
>> ---
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..7967e3b 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  {
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>> +#else
>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>>  	schedule_work_on(cpu, &wfc.work);
>>  	flush_work(&wfc.work);
>>  	return wfc.ret;
>>
>>
>> And here are the new warnings:
>>
>>
>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>> io scheduler noop registered
>> io scheduler deadline registered
>> io scheduler cfq registered (default)
>> BUG: key ffff881039557b98 not in .data!
>> ------------[ cut here ]------------
>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>
> Sorry again.
>
> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <[email protected]>
> Date: Mon, 22 Jul 2013 19:08:51 +0800
> Subject: [PATCH] workqueue: allow the function passed to work_on_cpu() to
>  call work_on_cpu()
>
> If @fn calls work_on_cpu() again, lockdep will complain:
>
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock((&wfc.work));
>> lock((&wfc.work));
>>
>> *** DEADLOCK ***
>
> It is a false-positive lockdep report. In this situation,
> the two "wfc"s of the two work_on_cpu() calls are different;
> they are both on the stack, so flush_work() can't deadlock.
>
> To fix this, we need to avoid the lockdep checking in this case.
> But we don't want to change flush_work(), so we use a
> completion instead of flush_work() in work_on_cpu().
>
> Reported-by: Srivatsa S. Bhat <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---

That worked, thanks a lot!

Tested-by: Srivatsa S. Bhat <[email protected]>

Regards,
Srivatsa S. Bhat

> kernel/workqueue.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..b021a45 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>  	long (*fn)(void *);
>  	void *arg;
>  	long ret;
> +	struct completion done;
>  };
>
>  static void work_for_cpu_fn(struct work_struct *work)
> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>
>  	wfc->ret = wfc->fn(wfc->arg);
> +	complete(&wfc->done);
>  }
>
>  /**
> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	init_completion(&wfc.done);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +	wait_for_completion(&wfc.done);
>  	return wfc.ret;
>  }
>  EXPORT_SYMBOL_GPL(work_on_cpu);
>

2013-07-22 21:32:38

by Tejun Heo

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On Mon, Jul 22, 2013 at 07:52:34PM +0800, Lai Jiangshan wrote:
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..b021a45 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>  	long (*fn)(void *);
>  	void *arg;
>  	long ret;
> +	struct completion done;
>  };
>
>  static void work_for_cpu_fn(struct work_struct *work)
> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>
>  	wfc->ret = wfc->fn(wfc->arg);
> +	complete(&wfc->done);
>  }
>
>  /**
> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	init_completion(&wfc.done);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +	wait_for_completion(&wfc.done);

Hmmm... it's kinda nasty. Given how infrequently work_on_cpu() users
nest, I think it'd be cleaner to have work_on_cpu_nested() which takes
@subclass. It requires extra work on the caller's part but I think
that actually is useful as nested work_on_cpu()s are pretty weird
things.
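
For illustration, the caller side might then look something like this
(hypothetical sketch -- work_on_cpu_nested() doesn't exist, so the exact
signature is an assumption; SINGLE_DEPTH_NESTING is the usual lockdep
subclass for one level of nesting, and local_pci_probe()/&ddi are the
arguments pci_call_probe() already passes today):

	/* outer probe path: a normal call, subclass 0 */
	ret = work_on_cpu(cpu, local_pci_probe, &ddi);

	/*
	 * a probe that knows it can run inside another work_on_cpu()
	 * -- e.g. the VF probe triggered from the PF's probe -- would
	 * announce the nesting explicitly:
	 */
	ret = work_on_cpu_nested(cpu, local_pci_probe, &ddi,
				 SINGLE_DEPTH_NESTING);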

Thanks.

--
tejun

2013-07-22 21:38:45

by Bjorn Helgaas

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

[+cc Alex, Yinghai, linux-pci]

On Mon, Jul 22, 2013 at 9:37 AM, Srivatsa S. Bhat
<[email protected]> wrote:
> On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
>> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>>
>>>>> ---
>>>>>
>>>>> kernel/workqueue.c | 6 ++++++
>>>>> 1 file changed, 6 insertions(+)
>>>>>
>>>>>
>>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>>> index f02c4a4..07d9a67 100644
>>>>> --- a/kernel/workqueue.c
>>>>> +++ b/kernel/workqueue.c
>>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>>  {
>>>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>>
>>>>> +#ifdef CONFIG_LOCKDEP
>>>>> +	static struct lock_class_key __key;
>>>>
>>>> Sorry, this "static" should be removed.
>>>>
>>>
>>> That didn't help either :-( Because it makes lockdep unhappy,
>>> since the key isn't persistent.
>>>
>>> This is the patch I used:
>>>
>>> ---
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..7967e3b 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>  {
>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>> +#ifdef CONFIG_LOCKDEP
>>> +	struct lock_class_key __key;
>>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>>> +#else
>>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +#endif
>>>  	schedule_work_on(cpu, &wfc.work);
>>>  	flush_work(&wfc.work);
>>>  	return wfc.ret;
>>>
>>>
>>> And here are the new warnings:
>>>
>>>
>>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>>> io scheduler noop registered
>>> io scheduler deadline registered
>>> io scheduler cfq registered (default)
>>> BUG: key ffff881039557b98 not in .data!
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>>
>> Sorry again.
>>
>> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <[email protected]>
>> Date: Mon, 22 Jul 2013 19:08:51 +0800
>> Subject: [PATCH] workqueue: allow the function passed to work_on_cpu() to
>>  call work_on_cpu()
>>
>> If @fn calls work_on_cpu() again, lockdep will complain:
>>
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>>
>>> CPU0
>>> ----
>>> lock((&wfc.work));
>>> lock((&wfc.work));
>>>
>>> *** DEADLOCK ***
>>
>> It is a false-positive lockdep report. In this situation,
>> the two "wfc"s of the two work_on_cpu() calls are different;
>> they are both on the stack, so flush_work() can't deadlock.
>>
>> To fix this, we need to avoid the lockdep checking in this case.
>> But we don't want to change flush_work(), so we use a
>> completion instead of flush_work() in work_on_cpu().
>>
>> Reported-by: Srivatsa S. Bhat <[email protected]>
>> Signed-off-by: Lai Jiangshan <[email protected]>
>> ---
>
> That worked, thanks a lot!
>
> Tested-by: Srivatsa S. Bhat <[email protected]>
>
> Regards,
> Srivatsa S. Bhat
>
>> kernel/workqueue.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..b021a45 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>  	long (*fn)(void *);
>>  	void *arg;
>>  	long ret;
>> +	struct completion done;
>>  };
>>
>>  static void work_for_cpu_fn(struct work_struct *work)
>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>
>>  	wfc->ret = wfc->fn(wfc->arg);
>> +	complete(&wfc->done);
>>  }
>>
>>  /**
>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	init_completion(&wfc.done);
>>  	schedule_work_on(cpu, &wfc.work);
>> -	flush_work(&wfc.work);
>> +	wait_for_completion(&wfc.done);
>>  	return wfc.ret;
>>  }
>>  EXPORT_SYMBOL_GPL(work_on_cpu);
>>
>

Isn't this for the same issue Alex and others have been working on?

It doesn't feel like we have consensus on how this should be fixed.
You're proposing a change to work_on_cpu(), Alex proposed a change to
pci_call_probe() [1], Yinghai proposed some changes to the PCI core
SR-IOV code and several drivers [2].

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

2013-07-22 22:06:28

by Yinghai Lu

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On Mon, Jul 22, 2013 at 2:38 PM, Bjorn Helgaas <[email protected]> wrote:
> [+cc Alex, Yinghai, linux-pci]

>
> Isn't this for the same issue Alex and others have been working on?

yes.

>
> It doesn't feel like we have consensus on how this should be fixed.
> You're proposing a change to work_on_cpu(), Alex proposed a change to
> pci_call_probe() [1], Yinghai proposed some changes to the PCI core
> SR-IOV code and several drivers [2].
>
> [1] https://lkml.kernel.org/r/[email protected]
> [2] https://lkml.kernel.org/r/[email protected]

Actually, I have used Alex's patch plus the attached -v2 of sriov_attach
in my local testing for a while. (-v2: remove the device_schedule_callback
hacking.)

Also, someone commented that the numa_node_id() in Alex's patch should not
be used, as the caller could be preempted.

Thanks

Yinghai


Attachments:
fix_racing_removing_6_6.patch (14.30 kB)

2013-07-22 22:33:05

by Duyck, Alexander H

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/22/2013 02:38 PM, Bjorn Helgaas wrote:
> [+cc Alex, Yinghai, linux-pci]
>
> On Mon, Jul 22, 2013 at 9:37 AM, Srivatsa S. Bhat
> <[email protected]> wrote:
>> On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
>>> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>>>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>>> ---
>>>>>>
>>>>>> kernel/workqueue.c | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>>
>>>>>>
>>>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>>>> index f02c4a4..07d9a67 100644
>>>>>> --- a/kernel/workqueue.c
>>>>>> +++ b/kernel/workqueue.c
>>>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>>>  {
>>>>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>>>
>>>>>> +#ifdef CONFIG_LOCKDEP
>>>>>> +	static struct lock_class_key __key;
>>>>> Sorry, this "static" should be removed.
>>>>>
>>>> That didn't help either :-( Because it makes lockdep unhappy,
>>>> since the key isn't persistent.
>>>>
>>>> This is the patch I used:
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index f02c4a4..7967e3b 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>  {
>>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>
>>>> +#ifdef CONFIG_LOCKDEP
>>>> +	struct lock_class_key __key;
>>>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>>>> +#else
>>>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +#endif
>>>>  	schedule_work_on(cpu, &wfc.work);
>>>>  	flush_work(&wfc.work);
>>>>  	return wfc.ret;
>>>>
>>>>
>>>> And here are the new warnings:
>>>>
>>>>
>>>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>>>> io scheduler noop registered
>>>> io scheduler deadline registered
>>>> io scheduler cfq registered (default)
>>>> BUG: key ffff881039557b98 not in .data!
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>>> Sorry again.
>>>
>>> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <[email protected]>
>>> Date: Mon, 22 Jul 2013 19:08:51 +0800
>>> Subject: [PATCH] workqueue: allow the function passed to work_on_cpu() to
>>>  call work_on_cpu()
>>>
>>> If @fn calls work_on_cpu() again, lockdep will complain:
>>>
>>>> [ INFO: possible recursive locking detected ]
>>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>>> ---------------------------------------------
>>>> kworker/0:1/142 is trying to acquire lock:
>>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>>
>>>> but task is already holding lock:
>>>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>>
>>>> other info that might help us debug this:
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0
>>>> ----
>>>> lock((&wfc.work));
>>>> lock((&wfc.work));
>>>>
>>>> *** DEADLOCK ***
>>> It is a false-positive lockdep report. In this situation,
>>> the two "wfc"s of the two work_on_cpu() calls are different;
>>> they are both on the stack, so flush_work() can't deadlock.
>>>
>>> To fix this, we need to avoid the lockdep checking in this case.
>>> But we don't want to change flush_work(), so we use a
>>> completion instead of flush_work() in work_on_cpu().
>>>
>>> Reported-by: Srivatsa S. Bhat <[email protected]>
>>> Signed-off-by: Lai Jiangshan <[email protected]>
>>> ---
>> That worked, thanks a lot!
>>
>> Tested-by: Srivatsa S. Bhat <[email protected]>
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>> kernel/workqueue.c | 5 ++++-
>>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..b021a45 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>>  	long (*fn)(void *);
>>>  	void *arg;
>>>  	long ret;
>>> +	struct completion done;
>>>  };
>>>
>>>  static void work_for_cpu_fn(struct work_struct *work)
>>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>>
>>>  	wfc->ret = wfc->fn(wfc->arg);
>>> +	complete(&wfc->done);
>>>  }
>>>
>>>  /**
>>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +	init_completion(&wfc.done);
>>>  	schedule_work_on(cpu, &wfc.work);
>>> -	flush_work(&wfc.work);
>>> +	wait_for_completion(&wfc.done);
>>>  	return wfc.ret;
>>>  }
>>>  EXPORT_SYMBOL_GPL(work_on_cpu);
>>>
> Isn't this for the same issue Alex and others have been working on?
>
> It doesn't feel like we have consensus on how this should be fixed.
> You're proposing a change to work_on_cpu(), Alex proposed a change to
> pci_call_probe() [1], Yinghai proposed some changes to the PCI core
> SR-IOV code and several drivers [2].
>
> [1] https://lkml.kernel.org/r/[email protected]
> [2] https://lkml.kernel.org/r/[email protected]

The solution I proposed was flawed due to possible preemption issues.
If this solution resolves the issue then I am fine with it as long as it
doesn't introduce any new issues.

Thanks,

Alex

2013-07-23 01:19:34

by Lai Jiangshan

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/23/2013 05:32 AM, Tejun Heo wrote:
> On Mon, Jul 22, 2013 at 07:52:34PM +0800, Lai Jiangshan wrote:
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..b021a45 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>  	long (*fn)(void *);
>>  	void *arg;
>>  	long ret;
>> +	struct completion done;
>>  };
>>
>>  static void work_for_cpu_fn(struct work_struct *work)
>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>
>>  	wfc->ret = wfc->fn(wfc->arg);
>> +	complete(&wfc->done);
>>  }
>>
>>  /**
>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	init_completion(&wfc.done);
>>  	schedule_work_on(cpu, &wfc.work);
>> -	flush_work(&wfc.work);
>> +	wait_for_completion(&wfc.done);
>
> Hmmm... it's kinda nasty. Given how infrequently work_on_cpu() users
> nest, I think it'd be cleaner to have work_on_cpu_nested() which takes
> @subclass. It requires extra work on the caller's part but I think
> that actually is useful as nested work_on_cpu()s are pretty weird
> things.
>

The problem is that the users may not know that their work_on_cpu() calls
nest, especially when the work_on_cpu()s are in different subsystems, the
call depth is deep enough, and the nested work_on_cpu() depends on some
conditions.

I prefer to change the caller instead of introducing work_on_cpu_nested(),
and I can accept changing only the caller rather than work_on_cpu(), since
only one nested-call case has been found.

But I'm thinking: since nested work_on_cpu() calls don't have any real
problem, why shouldn't workqueue.c offer a friendlier API/behavior?

Thanks,
Lai

2013-07-23 14:38:49

by Tejun Heo

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

Hey, Lai.

On Tue, Jul 23, 2013 at 09:23:14AM +0800, Lai Jiangshan wrote:
> The problem is that the users may not know that their work_on_cpu() calls
> nest, especially when the work_on_cpu()s are in different subsystems, the
> call depth is deep enough, and the nested work_on_cpu() depends on some
> conditions.

Yeah, that's a possibility. Not sure how much it'd actually matter,
though, given that this is the only instance we have and we've had the
lockdep annotation for years.

> I prefer to change the caller instead of introducing work_on_cpu_nested(),
> and I can accept changing only the caller rather than work_on_cpu(), since
> only one nested-call case has been found.
>
> But I'm thinking: since nested work_on_cpu() calls don't have any real
> problem, why shouldn't workqueue.c offer a friendlier API/behavior?

If we want to solve it from the workqueue side, let's please do it by
introducing an internal flush_work() variant which skips the lockdep
annotation. I'd really like to avoid using a completion here. It's
nasty, as it depends solely on the fact that completion doesn't have
lockdep annotation yet. Let's do it explicitly.

Thanks.

--
tejun

2013-07-24 11:13:32

by Lai Jiangshan

Subject: Re: workqueue, pci: INFO: possible recursive locking detected

On 07/23/2013 10:38 PM, Tejun Heo wrote:
> Hey, Lai.
>
> On Tue, Jul 23, 2013 at 09:23:14AM +0800, Lai Jiangshan wrote:
>> The problem is that the users may not know that their work_on_cpu() calls
>> nest, especially when the work_on_cpu()s are in different subsystems, the
>> call depth is deep enough, and the nested work_on_cpu() depends on some
>> conditions.
>
> Yeah, that's a possibility. Not sure how much it'd actually matter,
> though, given that this is the only instance we have and we've had the
> lockdep annotation for years.
>
>> I prefer to change the caller instead of introducing work_on_cpu_nested(),
>> and I can accept changing only the caller rather than work_on_cpu(), since
>> only one nested-call case has been found.
>>
>> But I'm thinking: since nested work_on_cpu() calls don't have any real
>> problem, why shouldn't workqueue.c offer a friendlier API/behavior?
>
> If we want to solve it from the workqueue side, let's please do it by
> introducing an internal flush_work() variant which skips the lockdep
> annotation. I'd really like to avoid using a completion here. It's
> nasty, as it depends solely on the fact that completion doesn't have
> lockdep annotation yet. Let's do it explicitly.
>
> Thanks.
>

From 269bf1a2f47f04e0daf429c2cdf4052b4e8fb309 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Wed, 24 Jul 2013 18:21:50 +0800
Subject: [PATCH] workqueue: allow the function passed to work_on_cpu() to
 call work_on_cpu()

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock((&wfc.work));
> lock((&wfc.work));
>
> *** DEADLOCK ***

It is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on the stack, so flush_work() can't deadlock.

To fix this, we need to avoid the lockdep checking in this case,
so we introduce an internal __flush_work() which skips the lockdep annotation.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 29 +++++++++++++++++++----------
1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..53df707 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2817,6 +2817,19 @@ already_gone:
 	return false;
 }
 
+static bool __flush_work(struct work_struct *work)
+{
+	struct wq_barrier barr;
+
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2830,18 +2843,10 @@ already_gone:
  */
 bool flush_work(struct work_struct *work)
 {
-	struct wq_barrier barr;
-
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
+	return __flush_work(work);
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4756,7 +4761,11 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+	/*
+	 * Flushing the work can't lead to a deadlock; use __flush_work()
+	 * to avoid the lockdep complaint for nested work_on_cpu()s.
+	 */
+	__flush_work(&wfc.work);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
--
1.7.4.4

2013-07-24 16:25:49

by Tejun Heo

Subject: [PATCH] workqueue: allow work_on_cpu() to be called recursively

Applied to wq/for-3.11-fixes with comment and subject tweaks.

Thanks!

---------- 8< ------------

From c2fda509667b0fda4372a237f5a59ea4570b1627 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Wed, 24 Jul 2013 18:31:42 +0800

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock((&wfc.work));
> lock((&wfc.work));
>
> *** DEADLOCK ***

It is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on the stack, so flush_work() can't deadlock.

To fix this, we need to avoid the lockdep checking in this case,
so we introduce an internal __flush_work() which skips the lockdep annotation.

tj: Minor comment adjustment.

Signed-off-by: Lai Jiangshan <[email protected]>
Reported-by: "Srivatsa S. Bhat" <[email protected]>
Reported-by: Alexander Duyck <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..55f5f0a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2817,6 +2817,19 @@ already_gone:
 	return false;
 }
 
+static bool __flush_work(struct work_struct *work)
+{
+	struct wq_barrier barr;
+
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2830,18 +2843,10 @@ already_gone:
  */
 bool flush_work(struct work_struct *work)
 {
-	struct wq_barrier barr;
-
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
+	return __flush_work(work);
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4756,7 +4761,14 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+
+	/*
+	 * The work item is on-stack and can't lead to deadlock through
+	 * flushing. Use __flush_work() to avoid spurious lockdep warnings
+	 * when work_on_cpu()s are nested.
+	 */
+	__flush_work(&wfc.work);
+
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
--
1.8.3.1

2013-07-27 17:15:30

by Srivatsa S. Bhat

Subject: Re: [PATCH] workqueue: allow work_on_cpu() to be called recursively

On 07/24/2013 09:55 PM, Tejun Heo wrote:
> Applied to wq/for-3.11-fixes with comment and subject tweaks.
>
> Thanks!
>
> ---------- 8< ------------
>
> From c2fda509667b0fda4372a237f5a59ea4570b1627 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <[email protected]>
> Date: Wed, 24 Jul 2013 18:31:42 +0800
>
> If @fn calls work_on_cpu() again, lockdep will complain:
>
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>> ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock((&wfc.work));
>> lock((&wfc.work));
>>
>> *** DEADLOCK ***
>
> It is a false-positive lockdep report. In this situation,
> the two "wfc"s of the two work_on_cpu() calls are different;
> they are both on the stack, so flush_work() can't deadlock.
>
> To fix this, we need to avoid the lockdep checking in this case,
> so we introduce an internal __flush_work() which skips the lockdep annotation.
>
> tj: Minor comment adjustment.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> Reported-by: "Srivatsa S. Bhat" <[email protected]>
> Reported-by: Alexander Duyck <[email protected]>
> Signed-off-by: Tejun Heo <[email protected]>
> ---

This version works as well; it fixes the issue I was facing.
Thank you!

FWIW:
Tested-by: Srivatsa S. Bhat <[email protected]>

Regards,
Srivatsa S. Bhat

> kernel/workqueue.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..55f5f0a 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2817,6 +2817,19 @@ already_gone:
>  	return false;
>  }
>
> +static bool __flush_work(struct work_struct *work)
> +{
> +	struct wq_barrier barr;
> +
> +	if (start_flush_work(work, &barr)) {
> +		wait_for_completion(&barr.done);
> +		destroy_work_on_stack(&barr.work);
> +		return true;
> +	} else {
> +		return false;
> +	}
> +}
> +
>  /**
>   * flush_work - wait for a work to finish executing the last queueing instance
>   * @work: the work to flush
> @@ -2830,18 +2843,10 @@ already_gone:
>   */
>  bool flush_work(struct work_struct *work)
>  {
> -	struct wq_barrier barr;
> -
>  	lock_map_acquire(&work->lockdep_map);
>  	lock_map_release(&work->lockdep_map);
>
> -	if (start_flush_work(work, &barr)) {
> -		wait_for_completion(&barr.done);
> -		destroy_work_on_stack(&barr.work);
> -		return true;
> -	} else {
> -		return false;
> -	}
> +	return __flush_work(work);
>  }
>  EXPORT_SYMBOL_GPL(flush_work);
>
> @@ -4756,7 +4761,14 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +
> +	/*
> +	 * The work item is on-stack and can't lead to deadlock through
> +	 * flushing. Use __flush_work() to avoid spurious lockdep warnings
> +	 * when work_on_cpu()s are nested.
> +	 */
> +	__flush_work(&wfc.work);
> +
>  	return wfc.ret;
>  }
>  EXPORT_SYMBOL_GPL(work_on_cpu);
>