by Johannes Berg

Subject: Re: [PATCH v5 09/13] PCI: Introduce /sys/bus/pci/devices/.../remove

On Tue, 2009-03-24 at 03:46 -0700, Andrew Morton wrote:

> But I don't think we've seen a coherent description of what's actually
> _wrong_ with the current code. flush_cpu_workqueue() has been handling
> this case for many years with no problems reported as far as I know.
>
> So what has caused this sudden flurry of reports? Did something change in
> lockdep? What is this
>
> [ 537.380128] (events){--..}, at: [<ffffffff80257fc0>] flush_workqueue+0x0/0xa0
> [ 537.380128]
> [ 537.380128] but task is already holding lock:
> [ 537.380128] (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>
> supposed to mean? "events" isn't a lock - it's the name of a kernel
> thread, isn't it? If this is supposed to be deadlockable then how?

events is indeed the name of the schedule_work workqueue's thread -- I just
used that as the lock name for lack of a better one.
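To make the report concrete, here is a toy C model in which each workqueue is treated as a pseudo-lock named after the queue. It is not lockdep's actual implementation. The trace shows "events" held by run_workqueue and requested again by flush_workqueue. In the model, the worker "acquires" the queue's pseudo-lock while running an item, and a flush acquires it again. A work item that flushes its own queue therefore shows up as the same lock class taken twice, which is roughly the condition behind "possible recursive locking detected". All names (struct held_stack, toy_acquire(), work_flushes_queue()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model, not kernel code: a workqueue acts as a pseudo-lock whose
 * class is the queue's name, like "events" in the report above. */
#define MAX_HELD 8

struct held_stack {
	const char *names[MAX_HELD];
	int depth;
};

/* "Acquire" pseudo-lock @name. Returns false, recording nothing, if a
 * lock of the same class is already held (the recursive case). */
static bool toy_acquire(struct held_stack *hs, const char *name)
{
	for (int i = 0; i < hs->depth; i++)
		if (strcmp(hs->names[i], name) == 0)
			return false;	/* same class already held */
	assert(hs->depth < MAX_HELD);
	hs->names[hs->depth++] = name;
	return true;
}

static void toy_release(struct held_stack *hs)
{
	assert(hs->depth > 0);
	hs->depth--;
}

/* A work item running on queue @running_on flushes queue @flushed.
 * Returns true if the flush would be flagged as recursive. */
static bool work_flushes_queue(const char *running_on, const char *flushed)
{
	struct held_stack hs = { .depth = 0 };
	bool flagged;

	toy_acquire(&hs, running_on);		/* run_workqueue() */
	flagged = !toy_acquire(&hs, flushed);	/* flush_workqueue() */
	if (!flagged)
		toy_release(&hs);		/* flush finished */
	toy_release(&hs);			/* work item done */
	return flagged;
}
```

In this model, flushing "events" from an item that runs on a different queue (for example a dedicated "sysfsd" queue, as in the test patch later in the thread) is not flagged.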

> Because I don't immediately see what's wrong with e1000_remove() calling
> flush_work(). It's undesirable, and we can perhaps improve it via some
> means, but where is the bug?

There is no bug -- it's a false positive in a way. I've pointed this out
in the original thread, see
http://thread.gmane.org/gmane.linux.kernel/550877/focus=550932

johannes

2009-03-24 13:22:44

On Tue, 2009-03-24 at 11:23 -0600, Alex Chiang wrote:

> > There is no bug -- it's a false positive in a way. I've pointed this out
> > in the original thread, see
> > http://thread.gmane.org/gmane.linux.kernel/550877/focus=550932
>
> I'm actually a bit confused now.

Sorry.

> Peter explained why flushing a workqueue from the same queue is
> bad, and in general I agree, but what do you mean by "false
> positive"?

Well, even though flushing a workqueue from within itself is generally
bad, what lockdep actually reports is bogus -- it's reporting nested locking.

> By the way, this scenario:
>
> code path 1:
> my_function() -> lock(L1); ...; flush_workqueue(); ...
>
> code path 2:
> run_workqueue() -> my_work() -> ...; lock(L1); ...
>
> is _not_ what is happening here.

Indeed.
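For contrast, the scenario Alex rules out is the classic one that lock-order checking is designed to catch. The sketch below is a toy lock-order graph with hypothetical names. Real lockdep tracks lock classes and far more state. Path 1 records L1 -> workqueue. Path 2 would add workqueue -> L1. A reachability check shows that the second edge closes a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy lock-order graph; the node ids are hypothetical. */
enum { L1 = 0, WQ = 1, NLOCKS = 2 };

/* edge[a][b]: lock b was acquired while lock a was held. */
static bool edge[NLOCKS][NLOCKS];

static void add_dep(int held, int acquired)
{
	edge[held][acquired] = true;
}

/* Depth-first search: can @to be reached from @from along recorded edges? */
static bool reachable(int from, int to, bool seen[NLOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NLOCKS; n++)
		if (edge[from][n] && !seen[n] && reachable(n, to, seen))
			return true;
	return false;
}

/* Acquiring @acquired while holding @held closes a cycle if @held is
 * already reachable from @acquired. */
static bool would_deadlock(int held, int acquired)
{
	bool seen[NLOCKS] = { false };

	return reachable(acquired, held, seen);
}

/* Path 1: my_function() holds L1 and flushes the queue (L1 -> WQ).
 * Path 2: the worker "holds" WQ while my_work() takes L1 (WQ -> L1). */
static bool classic_scenario_deadlocks(bool path1_holds_l1)
{
	memset(edge, 0, sizeof(edge));
	if (path1_holds_l1)
		add_dep(L1, WQ);
	return would_deadlock(WQ, L1);
}
```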

> So what you really have going on is:
>
> sysfs callback -> add remove callback to global workqueue
> remove callback fires off (pci_remove_bus_device) and we do...
> device_unregister
> driver's ->remove method called
> driver's ->remove method calls flush_scheduled_work
>
> Yes, after reading the thread I agree that generically calling
> flush_workqueue in the middle of run_workqueue is bad, but the
> lockdep warning that Kenji showed us really won't deadlock.

That is exactly what I meant by "false positive".
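Below is a toy single-threaded model of the call chain Alex describes. It assumes, following Andrew's remark about flush_cpu_workqueue(), that a flush issued from the worker itself runs the pending items by hand instead of sleeping. It shows why the nesting is not, by itself, a deadlock in this model. It does not model the scenario Lai Jiangshan raised. All identifiers are hypothetical and are not the kernel's workqueue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8

struct toy_wq;
typedef void (*toy_fn)(struct toy_wq *wq);

/* Toy single-threaded work queue; not the kernel's workqueue API. */
struct toy_wq {
	toy_fn items[QLEN];
	size_t head, tail;	/* pending items are items[head..tail) */
	bool in_worker;		/* true while an item is executing */
	int ran;		/* number of items executed so far */
};

static bool saw_self_flush;	/* set when a flush comes from inside an item */

static void toy_queue(struct toy_wq *wq, toy_fn fn)
{
	assert(wq->tail < QLEN);
	wq->items[wq->tail++] = fn;
}

/* Execute pending items in FIFO order, as the worker thread would. */
static void toy_run(struct toy_wq *wq)
{
	while (wq->head < wq->tail) {
		toy_fn fn = wq->items[wq->head++];
		bool outer = wq->in_worker;

		wq->in_worker = true;
		fn(wq);
		wq->ran++;
		wq->in_worker = outer;
	}
}

/* Flush: ensure everything queued so far has run. From inside an item,
 * sleeping until the queue is idle would wait on the caller itself, so
 * (by assumption) the pending items are run by hand. In this
 * single-threaded toy, both cases reduce to draining directly. */
static void toy_flush(struct toy_wq *wq)
{
	if (wq->in_worker)
		saw_self_flush = true;
	toy_run(wq);
}

/* Stand-in for some other work queued by the driver. */
static void driver_work(struct toy_wq *wq)
{
	(void)wq;
}

/* Stand-in for the PCI remove callback: it ends up in the driver's
 * ->remove method calling the equivalent of flush_scheduled_work(). */
static void remove_callback(struct toy_wq *wq)
{
	toy_flush(wq);
}

/* Queue the remove callback plus one driver item, then run the queue.
 * Returns how many items executed. */
static int run_remove_scenario(void)
{
	struct toy_wq wq = { .head = 0 };

	saw_self_flush = false;
	toy_queue(&wq, remove_callback);
	toy_queue(&wq, driver_work);
	toy_run(&wq);
	return wq.ran;
}
```

Both items complete, and the self-flush is observed rather than hanging.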

> This is because pci_remove_bus_device() will not acquire any lock
> L1 that an individual device driver will attempt to acquire in
> the remove path. If that were the case, we would deadlock every
> time you rmmod'ed a device driver's module or every time you shut
> your machine down.
>
> I think from my end, there are 2 things I need to do:
>
> a) make sysfs_schedule_callback() use its own work queue
> instead of global work queue, because too many drivers
> call flush_scheduled_work in their remove path
>
> b) give sysfs attributes the ability to commit suicide
>
> (a) is short term work, 2.6.30 timeframe, since it doesn't
> involve any large conceptual changes.
>
> (b) is picking up Tejun Heo's existing work, but that was a bit
> controversial last time, and I'm not sure it will make it during
> this merge window.
>
> Question for the lockdep folks though -- given what I described,
> do you agree that the warning we saw was a false positive? Or am
> I off in left field?

I don't think we're sure yet -- it seems Lai Jiangshan described a
scenario in which flushing from within the work actually _can_ deadlock.

johannes

2009-03-25 05:06:49

by Kenji Kaneshige

Subject: Re: [PATCH v5 09/13] PCI: Introduce /sys/bus/pci/devices/.../remove

Alex Chiang wrote:
> * Kenji Kaneshige <[email protected]>:
>> I still have the following kernel error messages in testing with your
>> latest set of patches (Jesse's linux-next). The test case is removing
>> e1000e device or its parent bridge by "echo 1 > /sys/bus/pci/devices/
>> .../remove".
>>
>> [ 537.379995] =============================================
>> [ 537.380124] [ INFO: possible recursive locking detected ]
>> [ 537.380128] 2.6.29-rc8-kk #1
>> [ 537.380128] ---------------------------------------------
>> [ 537.380128] events/4/56 is trying to acquire lock:
>> [ 537.380128] (events){--..}, at: [<ffffffff80257fc0>] flush_workqueue+0x0/0xa0
>> [ 537.380128]
>> [ 537.380128] but task is already holding lock:
>> [ 537.380128] (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [ 537.380128]
>> [ 537.380128] other info that might help us debug this:
>> [ 537.380128] 3 locks held by events/4/56:
>> [ 537.380128] #0: (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [ 537.380128] #1: (&ss->work){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [ 537.380128] #2: (pci_remove_rescan_mutex){--..}, at: [<ffffffff803c10d1>] remove_callback+0x21/0x40
>
> I still cannot reproduce this lockdep issue, even using your
> .config with an e1000e device on an x86_64 kernel. :(
>
> I tried removing the endpoint, an intermediate bridge device, and
> the parent bus. I don't know what I'm doing wrong...
>

I don't know either...
It reproduces 100% of the time in my environment. The steps are
just to boot the system and remove the device.

> Can you please try this patch though, and see if it fixes the
> warning? It applies on top of my other sysfs patch that
> introduces a mutex in sysfs_schedule_callback.

Anyway, I confirmed the kernel error messages were gone with
the patch against sysfs. Note that I used the following patch
I made for testing instead since your patch could not be
applied to Jesse's linux-next.

Thanks,
Kenji Kaneshige

fs/sysfs/file.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

Index: linux-next-20090323/fs/sysfs/file.c
===================================================================
--- linux-next-20090323.orig/fs/sysfs/file.c	2009-03-25 12:09:37.000000000 +0900
+++ linux-next-20090323/fs/sysfs/file.c	2009-03-25 13:40:10.000000000 +0900
@@ -677,6 +677,7 @@
 	kfree(ss);
 }
 
+static struct workqueue_struct *sysfsd_wq;
 /**
  * sysfs_schedule_callback - helper to schedule a callback for a kobject
  * @kobj: object we're acting for.
@@ -704,6 +705,17 @@
 
 	if (!try_module_get(owner))
 		return -ENODEV;
+
+	if (!sysfsd_wq) {
+		sysfsd_wq = create_workqueue("sysfsd");
+		if (!sysfsd_wq) {
+			printk(KERN_ERR
+			       "%s: Could not create workqueue\n", __func__);
+			WARN_ON(1);
+			return -ENOMEM;
+		}
+	}
+
 	ss = kmalloc(sizeof(*ss), GFP_KERNEL);
 	if (!ss) {
 		module_put(owner);
@@ -715,7 +727,7 @@
 	ss->data = data;
 	ss->owner = owner;
 	INIT_WORK(&ss->work, sysfs_schedule_callback_work);
-	schedule_work(&ss->work);
+	queue_work(sysfsd_wq, &ss->work);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sysfs_schedule_callback);

2009-03-25 05:22:30

by Alex Chiang

Subject: Re: [PATCH v5 09/13] PCI: Introduce /sys/bus/pci/devices/.../remove

* Kenji Kaneshige <[email protected]>:
> Anyway, I confirmed the kernel error messages were gone with
> the patch against sysfs. Note that I used the following patch
> I made for testing instead since your patch could not be
> applied to Jesse's linux-next.

Great, thank you for testing Kenji-san.

/ac

2009-03-25 05:39:24

by Kenji Kaneshige

Subject: Re: [PATCH v5 09/13] PCI: Introduce /sys/bus/pci/devices/.../remove

Alex Chiang wrote:
> Great, thank you for testing Kenji-san.
>

You're welcome.

Just to be clear, my patch is only for testing, and it is very buggy
(no destroy operation, a missing module_put() in the error path,
and so on). Please treat it as a test-only patch.
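For illustration only, here is a userspace sketch of the two fixes Kenji names: dropping the module reference when queue creation fails, and providing a teardown. The counter and the toy_* names are hypothetical stand-ins for try_module_get()/module_put() and create_workqueue()/destroy_workqueue(). The sketch is single-threaded, so it does not serialize the lazy creation. A kernel version would need that, or could create the queue once at init time instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not kernel APIs. */
struct toy_queue {
	int pending;
};

static int module_refs;		/* try_module_get()/module_put() analogue */
static bool fail_create;	/* force queue creation to fail in tests */
static struct toy_queue *shared_q;

static struct toy_queue *toy_create_queue(void)
{
	return fail_create ? NULL : calloc(1, sizeof(struct toy_queue));
}

/* Take a reference, create the shared queue on first use, and drop the
 * reference again if creation fails. Single-threaded sketch: a real
 * version must serialize the lazy creation. Returns 0 or -1. */
static int schedule_with_lazy_queue(void)
{
	module_refs++;
	if (!shared_q) {
		shared_q = toy_create_queue();
		if (!shared_q) {
			module_refs--;	/* the missing module_put() */
			return -1;
		}
	}
	shared_q->pending++;
	return 0;
}

/* Teardown counterpart: the missing destroy operation. */
static void destroy_lazy_queue(void)
{
	free(shared_q);
	shared_q = NULL;
}
```

On success the reference is kept, presumably to be released once the scheduled callback has run.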

Thanks,
Kenji Kaneshige