2012-10-24 06:07:52

by Tang Chen

[permalink] [raw]
Subject: [PATCH v2 0/2] ACPI: container hot remove support.

Hi,

The container hotplug handler container_notify_cb() didn't implement
the hot-remove functionality. So, these 2 patches implement it like
the following way:

patch 1. Do not use kacpid_wq/kacpid_notify_wq to handle container hotplug event,
use kacpi_hotplug_wq instead to avoid deadlock.
Doing this is to reuse acpi_bus_hot_remove_device() in container
hot-remove handling.

patch 2. Introduce a new function container_device_remove() to handle
ACPI_NOTIFY_EJECT_REQUEST event for container.


change log v1 -> v2:

1. In patch1: Based on the lastest for-pci-split-pci-root-hp-2 branch from Lu Yinghai,
use alloc_acpi_hp_work() to add container hotplug work into kacpi_hotplug_wq.

2. In patch2: Allocate ej_event after container is stopped, so that we don't need to
kfree the ej_event if stopping container failed.


This is based on Lu Yinghai's job.
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-split-pci-root-hp-2

Tang Chen (2):
Use kacpi_hotplug_wq to handle container hotplug event.
Improve container_notify_cb() to support container hot-remove.

drivers/acpi/container.c | 75 +++++++++++++++++++++++++++++++++++++++------
1 files changed, 65 insertions(+), 10 deletions(-)


2012-10-24 06:07:10

by Tang Chen

[permalink] [raw]
Subject: [PATCH v2 1/2] Use kacpi_hotplug_wq to handle container hotplug event.

As the comments in __acpi_os_execute() said:

We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
because the hotplug code may call driver .remove() functions,
which invoke flush_scheduled_work/acpi_os_wait_events_complete
to flush these workqueues.

we should keep the hotplug code in kacpi_hotplug_wq.

But we have the following call series in kernel now:
acpi_ev_queue_notify_request()
|--> acpi_os_execute()
|--> __acpi_os_execute(type, function, context, 0)

The last parameter 0 makes the container_notify_cb() executed in
kacpi_notify_wq or kacpid_wq. So, we need to put the real hotplug code
into kacpi_hotplug_wq.

Signed-off-by: Tang Chen <[email protected]>
---
drivers/acpi/container.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
index 69e2d6b..d300e03 100644
--- a/drivers/acpi/container.c
+++ b/drivers/acpi/container.c
@@ -35,6 +35,7 @@
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>
#include <acpi/container.h>
+#include <acpi/acpiosxf.h>

#define PREFIX "ACPI: "

@@ -165,14 +166,21 @@ static int container_device_add(struct acpi_device **device, acpi_handle handle)
return result;
}

-static void container_notify_cb(acpi_handle handle, u32 type, void *context)
+static void __container_notify_cb(struct work_struct *work)
{
struct acpi_device *device = NULL;
int result;
int present;
acpi_status status;
+ struct acpi_hp_work *hp_work;
+ acpi_handle handle;
+ u32 type;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */

+ hp_work = container_of(work, struct acpi_hp_work, work);
+ handle = hp_work->handle;
+ type = hp_work->type;
+
switch (type) {
case ACPI_NOTIFY_BUS_CHECK:
/* Fall through */
@@ -224,6 +232,13 @@ static void container_notify_cb(acpi_handle handle, u32 type, void *context)
return;
}

+static void container_notify_cb(acpi_handle handle, u32 type,
+ void *context)
+{
+ alloc_acpi_hp_work(handle, type, context,
+ __container_notify_cb);
+}
+
static acpi_status
container_walk_namespace_cb(acpi_handle handle,
u32 lvl, void *context, void **rv)
--
1.7.1

2012-10-24 06:07:14

by Tang Chen

[permalink] [raw]
Subject: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

This patch introduces a new function container_device_remove() to do
the container hot-remove job. It works like the following:

1. call acpi_bus_trim(device, 0) to stop the container device.
2. generate the KOBJ_OFFLINE uevent. (Did I do this correct ?)
3. call acpi_bus_hot_remove_device(), which will call acpi_bus_trim(device, 1)
to remove the container.

This patch is based on Lu Yinghai's work.
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-split-pci-root-hp-2

Signed-off-by: Tang Chen <[email protected]>
---
drivers/acpi/container.c | 58 ++++++++++++++++++++++++++++++++++++++-------
1 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
index d300e03..9676578 100644
--- a/drivers/acpi/container.c
+++ b/drivers/acpi/container.c
@@ -166,6 +166,35 @@ static int container_device_add(struct acpi_device **device, acpi_handle handle)
return result;
}

+static int container_device_remove(struct acpi_device *device)
+{
+ int ret;
+ struct acpi_eject_event *ej_event;
+
+ /* stop container device at first */
+ ret = acpi_bus_trim(device, 0);
+ printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
+ if (ret)
+ return ret;
+
+ /* event originated from ACPI eject notification */
+ device->flags.eject_pending = 1;
+
+ /* send the uevent before remove the device */
+ kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
+
+ ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
+ if (!ej_event)
+ return -ENOMEM;
+
+ ej_event->device = device;
+ ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
+
+ acpi_bus_hot_remove_device(ej_event);
+
+ return 0;
+}
+
static void __container_notify_cb(struct work_struct *work)
{
struct acpi_device *device = NULL;
@@ -181,6 +210,9 @@ static void __container_notify_cb(struct work_struct *work)
handle = hp_work->handle;
type = hp_work->type;

+ present = is_device_present(handle);
+ status = acpi_bus_get_device(handle, &device);
+
switch (type) {
case ACPI_NOTIFY_BUS_CHECK:
/* Fall through */
@@ -189,13 +221,14 @@ static void __container_notify_cb(struct work_struct *work)
(type == ACPI_NOTIFY_BUS_CHECK) ?
"ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK");

- present = is_device_present(handle);
- status = acpi_bus_get_device(handle, &device);
if (!present) {
if (ACPI_SUCCESS(status)) {
/* device exist and this is a remove request */
- device->flags.eject_pending = 1;
- kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
+ result = container_device_remove(device);
+ if (result) {
+ printk(KERN_WARNING "Failed to remove container\n");
+ break;
+ }
return;
}
break;
@@ -215,12 +248,19 @@ static void __container_notify_cb(struct work_struct *work)
break;

case ACPI_NOTIFY_EJECT_REQUEST:
- if (!acpi_bus_get_device(handle, &device) && device) {
- device->flags.eject_pending = 1;
- kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
- return;
+ printk(KERN_WARNING "Container driver received %s event\n",
+ "ACPI_NOTIFY_EJECT_REQUEST");
+
+ if (!present || ACPI_FAILURE(status) || !device)
+ break;
+
+ result = container_device_remove(device);
+ if (result) {
+ printk(KERN_WARNING "Failed to remove container\n");
+ break;
}
- break;
+
+ return;

default:
/* non-hotplug event; possibly handled by other handler */
--
1.7.1

2012-10-24 06:55:20

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] Use kacpi_hotplug_wq to handle container hotplug event.

Hi Tang,

2012/10/24 15:05, Tang Chen wrote:
> As the comments in __acpi_os_execute() said:
>
> We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
> because the hotplug code may call driver .remove() functions,
> which invoke flush_scheduled_work/acpi_os_wait_events_complete
> to flush these workqueues.
>
> we should keep the hotplug code in kacpi_hotplug_wq.
>
> But we have the following call series in kernel now:
> acpi_ev_queue_notify_request()
> |--> acpi_os_execute()
> |--> __acpi_os_execute(type, function, context, 0)
>
> The last parameter 0 makes the container_notify_cb() executed in
> kacpi_notify_wq or kacpid_wq. So, we need to put the real hotplug code
> into kacpi_hotplug_wq.

I cannot understand the purpose of the patch.
Is the patch a bug fix patch? If yes, what problem happens?

Thanks,
Yasuaki Ishimatsu

>
> Signed-off-by: Tang Chen <[email protected]>
> ---
> drivers/acpi/container.c | 17 ++++++++++++++++-
> 1 files changed, 16 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
> index 69e2d6b..d300e03 100644
> --- a/drivers/acpi/container.c
> +++ b/drivers/acpi/container.c
> @@ -35,6 +35,7 @@
> #include <acpi/acpi_bus.h>
> #include <acpi/acpi_drivers.h>
> #include <acpi/container.h>
> +#include <acpi/acpiosxf.h>
>
> #define PREFIX "ACPI: "
>
> @@ -165,14 +166,21 @@ static int container_device_add(struct acpi_device **device, acpi_handle handle)
> return result;
> }
>
> -static void container_notify_cb(acpi_handle handle, u32 type, void *context)
> +static void __container_notify_cb(struct work_struct *work)
> {
> struct acpi_device *device = NULL;
> int result;
> int present;
> acpi_status status;
> + struct acpi_hp_work *hp_work;
> + acpi_handle handle;
> + u32 type;
> u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
>
> + hp_work = container_of(work, struct acpi_hp_work, work);
> + handle = hp_work->handle;
> + type = hp_work->type;
> +
> switch (type) {
> case ACPI_NOTIFY_BUS_CHECK:
> /* Fall through */
> @@ -224,6 +232,13 @@ static void container_notify_cb(acpi_handle handle, u32 type, void *context)
> return;
> }
>
> +static void container_notify_cb(acpi_handle handle, u32 type,
> + void *context)
> +{
> + alloc_acpi_hp_work(handle, type, context,
> + __container_notify_cb);
> +}
> +
> static acpi_status
> container_walk_namespace_cb(acpi_handle handle,
> u32 lvl, void *context, void **rv)
>

2012-10-24 07:44:14

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] Use kacpi_hotplug_wq to handle container hotplug event.

Hi Ishimatsu-san:

By the way, if you want to reproduce this problem, just
modify my patch1 to call __container_notify_cb() directly
in container_notify_cb(). And apply my patch2.

Then, you add a container, and remove it.
The deadlock will be triggered.

And this patch is based on Lu Yinghai's tree:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-pci-split-pci-root-hp-2


On 10/24/2012 02:54 PM, Yasuaki Ishimatsu wrote:
> Hi Tang,
>
> 2012/10/24 15:05, Tang Chen wrote:
>> As the comments in __acpi_os_execute() said:
>>
>> We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
>> because the hotplug code may call driver .remove() functions,
>> which invoke flush_scheduled_work/acpi_os_wait_events_complete
>> to flush these workqueues.
>>
>> we should keep the hotplug code in kacpi_hotplug_wq.
>>
>> But we have the following call series in kernel now:
>> acpi_ev_queue_notify_request()
>> |--> acpi_os_execute()
>> |--> __acpi_os_execute(type, function, context, 0)
>>
>> The last parameter 0 makes the container_notify_cb() executed in
>> kacpi_notify_wq or kacpid_wq. So, we need to put the real hotplug code
>> into kacpi_hotplug_wq.
>
> I cannot understand the purpose of the patch.
> Is the patch a bug fix patch? If yes, what problem happens?
>
> Thanks,
> Yasuaki Ishimatsu
>
>>
>> Signed-off-by: Tang Chen<[email protected]>
>> ---
>> drivers/acpi/container.c | 17 ++++++++++++++++-
>> 1 files changed, 16 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
>> index 69e2d6b..d300e03 100644
>> --- a/drivers/acpi/container.c
>> +++ b/drivers/acpi/container.c
>> @@ -35,6 +35,7 @@
>> #include<acpi/acpi_bus.h>
>> #include<acpi/acpi_drivers.h>
>> #include<acpi/container.h>
>> +#include<acpi/acpiosxf.h>
>>
>> #define PREFIX "ACPI: "
>>
>> @@ -165,14 +166,21 @@ static int container_device_add(struct acpi_device **device, acpi_handle handle)
>> return result;
>> }
>>
>> -static void container_notify_cb(acpi_handle handle, u32 type, void *context)
>> +static void __container_notify_cb(struct work_struct *work)
>> {
>> struct acpi_device *device = NULL;
>> int result;
>> int present;
>> acpi_status status;
>> + struct acpi_hp_work *hp_work;
>> + acpi_handle handle;
>> + u32 type;
>> u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
>>
>> + hp_work = container_of(work, struct acpi_hp_work, work);
>> + handle = hp_work->handle;
>> + type = hp_work->type;
>> +
>> switch (type) {
>> case ACPI_NOTIFY_BUS_CHECK:
>> /* Fall through */
>> @@ -224,6 +232,13 @@ static void container_notify_cb(acpi_handle handle, u32 type, void *context)
>> return;
>> }
>>
>> +static void container_notify_cb(acpi_handle handle, u32 type,
>> + void *context)
>> +{
>> + alloc_acpi_hp_work(handle, type, context,
>> + __container_notify_cb);
>> +}
>> +
>> static acpi_status
>> container_walk_namespace_cb(acpi_handle handle,
>> u32 lvl, void *context, void **rv)
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-24 07:48:06

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] Use kacpi_hotplug_wq to handle container hotplug event.

On 10/24/2012 02:54 PM, Yasuaki Ishimatsu wrote:
> Hi Tang,
>
> 2012/10/24 15:05, Tang Chen wrote:
>> As the comments in __acpi_os_execute() said:
>>
>> We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
>> because the hotplug code may call driver .remove() functions,
>> which invoke flush_scheduled_work/acpi_os_wait_events_complete
>> to flush these workqueues.
>>
>> we should keep the hotplug code in kacpi_hotplug_wq.
>>
>> But we have the following call series in kernel now:
>> acpi_ev_queue_notify_request()
>> |--> acpi_os_execute()
>> |--> __acpi_os_execute(type, function, context, 0)
>>
>> The last parameter 0 makes the container_notify_cb() executed in
>> kacpi_notify_wq or kacpid_wq. So, we need to put the real hotplug code
>> into kacpi_hotplug_wq.
>
> I cannot understand the purpose of the patch.
> Is the patch a bug fix patch? If yes, what problem happens?

Hi Yasuaki-san,

Actually, it is a problem. But container hot-remove was not implemented
in container_notify_cb(), so this problem would never be triggered. So I
cannot say it is a bug in kernel.

The problem is here:

acpi_pci_root_remove() will finally call acpi_os_wait_events_complete():

void acpi_os_wait_events_complete(void)
{
flush_workqueue(kacpid_wq);
flush_workqueue(kacpi_notify_wq);
}

which means it will flush kacpid_wq and kacpi_notify_wq. So the current
work should not be in these 2 workqueue, otherwise it will cause
deadlock: the worker will wait for itself to complete.

But unfortunately, in the beginning, we have:

acpi_ev_queue_notify_request()
|--> acpi_os_execute()
|--> __acpi_os_execute(type, function, context, 0)

Please refer to the code, you will see the last parameter 0 will make
the hotplug call serial in kacpid_wq or kacpi_notify_wq. And it is hard
coded in kernel. I don't know why and I don't how to fix it.

So I made this patch, and want to see what you guys think about it. :)


The deadlock call trace is like below:


[ 302.383606] =============================================
[ 302.448094] [ INFO: possible recursive locking detected ]
[ 302.512578] 3.6.0-rc5-luyh-hostbridge-hotplug+ #13 Not tainted
[ 302.582252] ---------------------------------------------
[ 302.646736] kworker/0:2/1412 is trying to acquire lock:
[ 302.709143] (kacpi_notify){++++.+}, at: [<ffffffff81091300>]
flush_workqueue+0x0/0x5c0
[ 302.805222]
[ 302.805222] but task is already holding lock:
[ 302.874898] (kacpi_notify){++++.+}, at: [<ffffffff81090528>]
process_one_work+0x1b8/0x680
[ 302.974083]
[ 302.974083] other info that might help us debug this:
[ 303.052067] Possible unsafe locking scenario:
[ 303.052067]
[ 303.122785] CPU0
[ 303.151965] ----
[ 303.181150] lock(kacpi_notify);
[ 303.220935] lock(kacpi_notify);
[ 303.260721]
[ 303.260721] *** DEADLOCK ***
[ 303.260721]
[ 303.331434] May be due to missing lock nesting notation
[ 303.331434]
[ 303.412529] 4 locks held by kworker/0:2/1412:
[ 303.464553] #0: (kacpi_notify){++++.+}, at: [<ffffffff81090528>]
process_one_work+0x1b8/0x680
[ 303.569042] #1: ((&dpc->work)#2){+.+.+.}, at: [<ffffffff81090528>]
process_one_work+0x1b8/0x680
[ 303.675718] #2: (&__lockdep_no_validate__){......}, at:
[<ffffffff8143cca7>] device_release_driver+0x27/0x50
[ 303.795782] #3: (pci_acpi_pm_notify_mtx){+.+.+.}, at:
[<ffffffff81385443>] remove_pm_notifier+0x33/0x90
[ 303.910662]
[ 303.910662] stack backtrace:
[ 303.962687] Pid: 1412, comm: kworker/0:2 Not tainted
3.6.0-rc5-luyh-hostbridge-hotplug+ #13
[ 304.062470] Call Trace:
[ 304.091666] [<ffffffff810da704>] print_deadlock_bug+0xf4/0x100
[ 304.162384] [<ffffffff810dc6a9>] validate_chain+0x549/0x7e0
[ 304.229987] [<ffffffff810dcc36>] __lock_acquire+0x2f6/0x4f0
[ 304.297587] [<ffffffff810dba65>] ? debug_check_no_locks_freed+0xa5/0xf0
[ 304.377650] [<ffffffff810dcecd>] lock_acquire+0x9d/0x190
[ 304.442141] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
[ 304.523242] [<ffffffff810d8759>] ? lockdep_init_map+0x59/0x150
[ 304.593963] [<ffffffff810914af>] flush_workqueue+0x1af/0x5c0
[ 304.662605] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
[ 304.743713] [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 304.805084] [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 304.866457] [<ffffffff810db925>] ? trace_hardirqs_on_caller+0x105/0x190
[ 304.946515] [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 305.007891] [<ffffffff81385443>] ? remove_pm_notifier+0x33/0x90
[ 305.079649] [<ffffffff813854e0>] ?
pci_acpi_remove_bus_pm_notifier+0x20/0x20
[ 305.164905] [<ffffffff813a340e>] acpi_os_wait_events_complete+0x21/0x23
[ 305.244970] [<ffffffff813b7b3c>] acpi_remove_notify_handler+0x47/0x183
[ 305.323994] [<ffffffff813854e0>] ?
pci_acpi_remove_bus_pm_notifier+0x20/0x20
[ 305.409251] [<ffffffff81385481>] remove_pm_notifier+0x71/0x90
[ 305.478931] [<ffffffff813854d5>]
pci_acpi_remove_bus_pm_notifier+0x15/0x20
[ 305.562111] [<ffffffff813aac25>] acpi_pci_root_remove+0x83/0xec
[ 305.633869] [<ffffffff813a69fc>] acpi_device_remove+0x90/0xb2
[ 305.703548] [<ffffffff8143cb2c>] __device_release_driver+0x7c/0xf0
[ 305.778422] [<ffffffff8143ccaf>] device_release_driver+0x2f/0x50
[ 305.851219] [<ffffffff813a79b5>] acpi_bus_remove+0x32/0x4d
[ 305.917785] [<ffffffff813a7a57>] acpi_bus_trim+0x87/0xee
[ 305.982276] [<ffffffff813d888c>] container_notify_cb+0x1c5/0x221
[ 306.055075] [<ffffffff813b6386>] acpi_ev_notify_dispatch+0x41/0x5f
[ 306.129951] [<ffffffff813a3437>] acpi_os_execute_deferred+0x27/0x34
[ 306.205861] [<ffffffff81090589>] process_one_work+0x219/0x680
[ 306.275543] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
[ 306.347299] [<ffffffff813a3410>] ?
acpi_os_wait_events_complete+0x23/0x23
[ 306.429436] [<ffffffff810923be>] worker_thread+0x12e/0x320
[ 306.496001] [<ffffffff81092290>] ? manage_workers+0x110/0x110
[ 306.565680] [<ffffffff81098396>] kthread+0xc6/0xd0
[ 306.623937] [<ffffffff8167c3c4>] kernel_thread_helper+0x4/0x10
[ 306.694656] [<ffffffff81671e30>] ? retint_restore_args+0x13/0x13
[ 306.767451] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
[ 306.842323] [<ffffffff8167c3c0>] ? gs_change+0x13/0x13



[ 519.588281] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 519.667375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 519.761130] kworker/0:0 D ffff8801bdcb7b60 5608 4 2
0x00000000
[ 519.846044] ffff8801bdcb7a50 0000000000000046 ffff8801bdcb7a50
ffff8801bdcb7fd8
[ 519.935363] ffff8801bdcb6000 ffff8801bdcb6010 ffff8801bdcb6000
ffff8801bdcb6000
[ 520.024650] ffff8801bdcb7fd8 ffff8801bdcb6000 ffffffff81c13420
ffff8801bde18000
[ 520.114003] Call Trace:
[ 520.143402] [<ffffffff8166ff49>] schedule+0x29/0x70
[ 520.202939] [<ffffffff8166dd55>] schedule_timeout+0x235/0x2d0
[ 520.272834] [<ffffffff810d9377>] ? __lock_acquired+0x347/0x380
[ 520.343789] [<ffffffff8166fd0f>] ? wait_for_common+0x4f/0x180
[ 520.413583] [<ffffffff8166fde3>] ? wait_for_common+0x123/0x180
[ 520.484526] [<ffffffff8166fdeb>] wait_for_common+0x12b/0x180
[ 520.553422] [<ffffffff810b0b60>] ? try_to_wake_up+0x2f0/0x2f0
[ 520.623342] [<ffffffff810db9bd>] ? trace_hardirqs_on+0xd/0x10
[ 520.693270] [<ffffffff8166ff1d>] wait_for_completion+0x1d/0x20
[ 520.764219] [<ffffffff81091587>] flush_workqueue+0x287/0x5c0
[ 520.833087] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
[ 520.914421] [<ffffffff813a340e>] acpi_os_wait_events_complete+0x21/0x23
[ 520.994730] [<ffffffff813a3430>] acpi_os_execute_deferred+0x20/0x34
[ 521.070882] [<ffffffff81090589>] process_one_work+0x219/0x680
[ 521.140790] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
[ 521.212780] [<ffffffff810922e9>] ? worker_thread+0x59/0x320
[ 521.280609] [<ffffffff813a3410>] ?
acpi_os_wait_events_complete+0x23/0x23
[ 521.362994] [<ffffffff810923be>] worker_thread+0x12e/0x320
[ 521.429815] [<ffffffff81092290>] ? manage_workers+0x110/0x110
[ 521.499739] [<ffffffff81098396>] kthread+0xc6/0xd0
[ 521.558261] [<ffffffff8167c3c4>] kernel_thread_helper+0x4/0x10
[ 521.629217] [<ffffffff81671e30>] ? retint_restore_args+0x13/0x13
[ 521.702220] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
[ 521.777303] [<ffffffff8167c3c0>] ? gs_change+0x13/0x13
[ 521.839913] INFO: lockdep is turned off.









2012-10-24 08:09:42

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] Use kacpi_hotplug_wq to handle container hotplug event.

Hi Tang,

2012/10/24 16:24, Tang Chen wrote:
> On 10/24/2012 02:54 PM, Yasuaki Ishimatsu wrote:
>> Hi Tang,
>>
>> 2012/10/24 15:05, Tang Chen wrote:
>>> As the comments in __acpi_os_execute() said:
>>>
>>> We can't run hotplug code in keventd_wq/kacpid_wq/kacpid_notify_wq
>>> because the hotplug code may call driver .remove() functions,
>>> which invoke flush_scheduled_work/acpi_os_wait_events_complete
>>> to flush these workqueues.
>>>
>>> we should keep the hotplug code in kacpi_hotplug_wq.
>>>
>>> But we have the following call series in kernel now:
>>> acpi_ev_queue_notify_request()
>>> |--> acpi_os_execute()
>>> |--> __acpi_os_execute(type, function, context, 0)
>>>
>>> The last parameter 0 makes the container_notify_cb() executed in
>>> kacpi_notify_wq or kacpid_wq. So, we need to put the real hotplug code
>>> into kacpi_hotplug_wq.
>>
>> I cannot understand the purpose of the patch.
>> Is the patch a bug fix patch? If yes, what problem happens?
>
> Hi Yasuaki-san,
>
> Actually, it is a problem. But container hot-remove was not implemented
> in container_notify_cb(), so this problem would never be triggered. So I
> cannot say it is a bug in kernel.
>
> The problem is here:
>
> acpi_pci_root_remove() will finally call acpi_os_wait_events_complete():
>
> void acpi_os_wait_events_complete(void)
> {
> flush_workqueue(kacpid_wq);
> flush_workqueue(kacpi_notify_wq);
> }
>
> which means it will flush kacpid_wq and kacpi_notify_wq. So the current
> work should not be in these 2 workqueue, otherwise it will cause
> deadlock: the worker will wait for itself to complete.
>
> But unfortunately, in the beginning, we have:
>
> acpi_ev_queue_notify_request()
> |--> acpi_os_execute()
> |--> __acpi_os_execute(type, function, context, 0)
>
> Please refer to the code, you will see the last parameter 0 will make
> the hotplug call serial in kacpid_wq or kacpi_notify_wq. And it is hard
> coded in kernel. I don't know why and I don't how to fix it.
>
> So I made this patch, and want to see what you guys think about it. :)
>
>
> The deadlock call trace is like below:
>
>
> [ 302.383606] =============================================
> [ 302.448094] [ INFO: possible recursive locking detected ]
> [ 302.512578] 3.6.0-rc5-luyh-hostbridge-hotplug+ #13 Not tainted
> [ 302.582252] ---------------------------------------------
> [ 302.646736] kworker/0:2/1412 is trying to acquire lock:
> [ 302.709143] (kacpi_notify){++++.+}, at: [<ffffffff81091300>]
> flush_workqueue+0x0/0x5c0
> [ 302.805222]
> [ 302.805222] but task is already holding lock:
> [ 302.874898] (kacpi_notify){++++.+}, at: [<ffffffff81090528>]
> process_one_work+0x1b8/0x680
> [ 302.974083]
> [ 302.974083] other info that might help us debug this:
> [ 303.052067] Possible unsafe locking scenario:
> [ 303.052067]
> [ 303.122785] CPU0
> [ 303.151965] ----
> [ 303.181150] lock(kacpi_notify);
> [ 303.220935] lock(kacpi_notify);
> [ 303.260721]
> [ 303.260721] *** DEADLOCK ***
> [ 303.260721]
> [ 303.331434] May be due to missing lock nesting notation
> [ 303.331434]
> [ 303.412529] 4 locks held by kworker/0:2/1412:
> [ 303.464553] #0: (kacpi_notify){++++.+}, at: [<ffffffff81090528>]
> process_one_work+0x1b8/0x680
> [ 303.569042] #1: ((&dpc->work)#2){+.+.+.}, at: [<ffffffff81090528>]
> process_one_work+0x1b8/0x680
> [ 303.675718] #2: (&__lockdep_no_validate__){......}, at:
> [<ffffffff8143cca7>] device_release_driver+0x27/0x50
> [ 303.795782] #3: (pci_acpi_pm_notify_mtx){+.+.+.}, at:
> [<ffffffff81385443>] remove_pm_notifier+0x33/0x90
> [ 303.910662]
> [ 303.910662] stack backtrace:
> [ 303.962687] Pid: 1412, comm: kworker/0:2 Not tainted
> 3.6.0-rc5-luyh-hostbridge-hotplug+ #13
> [ 304.062470] Call Trace:
> [ 304.091666] [<ffffffff810da704>] print_deadlock_bug+0xf4/0x100
> [ 304.162384] [<ffffffff810dc6a9>] validate_chain+0x549/0x7e0
> [ 304.229987] [<ffffffff810dcc36>] __lock_acquire+0x2f6/0x4f0
> [ 304.297587] [<ffffffff810dba65>] ? debug_check_no_locks_freed+0xa5/0xf0
> [ 304.377650] [<ffffffff810dcecd>] lock_acquire+0x9d/0x190
> [ 304.442141] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
> [ 304.523242] [<ffffffff810d8759>] ? lockdep_init_map+0x59/0x150
> [ 304.593963] [<ffffffff810914af>] flush_workqueue+0x1af/0x5c0
> [ 304.662605] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
> [ 304.743713] [<ffffffff810a6ab8>] ? complete+0x28/0x60
> [ 304.805084] [<ffffffff810a6ab8>] ? complete+0x28/0x60
> [ 304.866457] [<ffffffff810db925>] ? trace_hardirqs_on_caller+0x105/0x190
> [ 304.946515] [<ffffffff810a6ab8>] ? complete+0x28/0x60
> [ 305.007891] [<ffffffff81385443>] ? remove_pm_notifier+0x33/0x90
> [ 305.079649] [<ffffffff813854e0>] ?
> pci_acpi_remove_bus_pm_notifier+0x20/0x20
> [ 305.164905] [<ffffffff813a340e>] acpi_os_wait_events_complete+0x21/0x23
> [ 305.244970] [<ffffffff813b7b3c>] acpi_remove_notify_handler+0x47/0x183
> [ 305.323994] [<ffffffff813854e0>] ?
> pci_acpi_remove_bus_pm_notifier+0x20/0x20
> [ 305.409251] [<ffffffff81385481>] remove_pm_notifier+0x71/0x90
> [ 305.478931] [<ffffffff813854d5>]
> pci_acpi_remove_bus_pm_notifier+0x15/0x20
> [ 305.562111] [<ffffffff813aac25>] acpi_pci_root_remove+0x83/0xec
> [ 305.633869] [<ffffffff813a69fc>] acpi_device_remove+0x90/0xb2
> [ 305.703548] [<ffffffff8143cb2c>] __device_release_driver+0x7c/0xf0
> [ 305.778422] [<ffffffff8143ccaf>] device_release_driver+0x2f/0x50
> [ 305.851219] [<ffffffff813a79b5>] acpi_bus_remove+0x32/0x4d
> [ 305.917785] [<ffffffff813a7a57>] acpi_bus_trim+0x87/0xee
> [ 305.982276] [<ffffffff813d888c>] container_notify_cb+0x1c5/0x221
> [ 306.055075] [<ffffffff813b6386>] acpi_ev_notify_dispatch+0x41/0x5f
> [ 306.129951] [<ffffffff813a3437>] acpi_os_execute_deferred+0x27/0x34
> [ 306.205861] [<ffffffff81090589>] process_one_work+0x219/0x680
> [ 306.275543] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
> [ 306.347299] [<ffffffff813a3410>] ?
> acpi_os_wait_events_complete+0x23/0x23
> [ 306.429436] [<ffffffff810923be>] worker_thread+0x12e/0x320
> [ 306.496001] [<ffffffff81092290>] ? manage_workers+0x110/0x110
> [ 306.565680] [<ffffffff81098396>] kthread+0xc6/0xd0
> [ 306.623937] [<ffffffff8167c3c4>] kernel_thread_helper+0x4/0x10
> [ 306.694656] [<ffffffff81671e30>] ? retint_restore_args+0x13/0x13
> [ 306.767451] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
> [ 306.842323] [<ffffffff8167c3c0>] ? gs_change+0x13/0x13
>
>
>
> [ 519.588281] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
> [ 519.667375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 519.761130] kworker/0:0 D ffff8801bdcb7b60 5608 4 2
> 0x00000000
> [ 519.846044] ffff8801bdcb7a50 0000000000000046 ffff8801bdcb7a50
> ffff8801bdcb7fd8
> [ 519.935363] ffff8801bdcb6000 ffff8801bdcb6010 ffff8801bdcb6000
> ffff8801bdcb6000
> [ 520.024650] ffff8801bdcb7fd8 ffff8801bdcb6000 ffffffff81c13420
> ffff8801bde18000
> [ 520.114003] Call Trace:
> [ 520.143402] [<ffffffff8166ff49>] schedule+0x29/0x70
> [ 520.202939] [<ffffffff8166dd55>] schedule_timeout+0x235/0x2d0
> [ 520.272834] [<ffffffff810d9377>] ? __lock_acquired+0x347/0x380
> [ 520.343789] [<ffffffff8166fd0f>] ? wait_for_common+0x4f/0x180
> [ 520.413583] [<ffffffff8166fde3>] ? wait_for_common+0x123/0x180
> [ 520.484526] [<ffffffff8166fdeb>] wait_for_common+0x12b/0x180
> [ 520.553422] [<ffffffff810b0b60>] ? try_to_wake_up+0x2f0/0x2f0
> [ 520.623342] [<ffffffff810db9bd>] ? trace_hardirqs_on+0xd/0x10
> [ 520.693270] [<ffffffff8166ff1d>] wait_for_completion+0x1d/0x20
> [ 520.764219] [<ffffffff81091587>] flush_workqueue+0x287/0x5c0
> [ 520.833087] [<ffffffff81091300>] ? flush_workqueue_prep_cwqs+0x260/0x260
> [ 520.914421] [<ffffffff813a340e>] acpi_os_wait_events_complete+0x21/0x23
> [ 520.994730] [<ffffffff813a3430>] acpi_os_execute_deferred+0x20/0x34
> [ 521.070882] [<ffffffff81090589>] process_one_work+0x219/0x680
> [ 521.140790] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
> [ 521.212780] [<ffffffff810922e9>] ? worker_thread+0x59/0x320
> [ 521.280609] [<ffffffff813a3410>] ?
> acpi_os_wait_events_complete+0x23/0x23
> [ 521.362994] [<ffffffff810923be>] worker_thread+0x12e/0x320
> [ 521.429815] [<ffffffff81092290>] ? manage_workers+0x110/0x110
> [ 521.499739] [<ffffffff81098396>] kthread+0xc6/0xd0
> [ 521.558261] [<ffffffff8167c3c4>] kernel_thread_helper+0x4/0x10
> [ 521.629217] [<ffffffff81671e30>] ? retint_restore_args+0x13/0x13
> [ 521.702220] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
> [ 521.777303] [<ffffffff8167c3c0>] ? gs_change+0x13/0x13
> [ 521.839913] INFO: lockdep is turned off.
>

Thank you for your explanation.
I understang the purpose of the patch.

Thanks,
Yasuaki Ishimatsu

2012-10-24 17:22:11

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
> This patch introduces a new function container_device_remove() to do
> the container hot-remove job. It works like the following:
>
> 1. call acpi_bus_trim(device, 0) to stop the container device.
> 2. generate the KOBJ_OFFLINE uevent. (Did I do this correct ?)
> 3. call acpi_bus_hot_remove_device(), which will call acpi_bus_trim(device, 1)
> to remove the container.
>
> This patch is based on Lu Yinghai's work.
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-split-pci-root-hp-2
>
> Signed-off-by: Tang Chen <[email protected]>
> ---
> drivers/acpi/container.c | 58 ++++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 49 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
> index d300e03..9676578 100644
> --- a/drivers/acpi/container.c
> +++ b/drivers/acpi/container.c
> @@ -166,6 +166,35 @@ static int container_device_add(struct acpi_device **device, acpi_handle handle)
> return result;
> }
>
> +static int container_device_remove(struct acpi_device *device)
> +{
> + int ret;
> + struct acpi_eject_event *ej_event;
> +
> + /* stop container device at first */
> + ret = acpi_bus_trim(device, 0);

Hi Tang,

Why do you need to call acpi_bus_trim(device,0) to stop the container
device first?

> + printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);

Do you need this message in the normal case? If so, I'd suggest to use
pr_debug().

> + if (ret)
> + return ret;
> +
> + /* event originated from ACPI eject notification */
> + device->flags.eject_pending = 1;

You do not need to set the eject_pending flag when the handler calls
acpi_bus_hot_remove_device(). It was set before because the handler did
not initiate the hot-remove operation.

> +
> + /* send the uevent before remove the device */
> + kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
> +
> + ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
> + if (!ej_event)
> + return -ENOMEM;
> +
> + ej_event->device = device;
> + ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
> +
> + acpi_bus_hot_remove_device(ej_event);
> +
> + return 0;
> +}
> +
> static void __container_notify_cb(struct work_struct *work)
> {
> struct acpi_device *device = NULL;
> @@ -181,6 +210,9 @@ static void __container_notify_cb(struct work_struct *work)
> handle = hp_work->handle;
> type = hp_work->type;
>
> + present = is_device_present(handle);
> + status = acpi_bus_get_device(handle, &device);
> +
> switch (type) {
> case ACPI_NOTIFY_BUS_CHECK:
> /* Fall through */
> @@ -189,13 +221,14 @@ static void __container_notify_cb(struct work_struct *work)
> (type == ACPI_NOTIFY_BUS_CHECK) ?
> "ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK");
>
> - present = is_device_present(handle);
> - status = acpi_bus_get_device(handle, &device);
> if (!present) {
> if (ACPI_SUCCESS(status)) {
> /* device exist and this is a remove request */
> - device->flags.eject_pending = 1;
> - kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
> + result = container_device_remove(device);
> + if (result) {
> + printk(KERN_WARNING "Failed to remove container\n");
> + break;
> + }
> return;
> }
> break;
> @@ -215,12 +248,19 @@ static void __container_notify_cb(struct work_struct *work)
> break;
>
> case ACPI_NOTIFY_EJECT_REQUEST:
> - if (!acpi_bus_get_device(handle, &device) && device) {
> - device->flags.eject_pending = 1;
> - kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
> - return;
> + printk(KERN_WARNING "Container driver received %s event\n",
> + "ACPI_NOTIFY_EJECT_REQUEST");

Same as other comment. Suggest to use pr_debug().

> +
> + if (!present || ACPI_FAILURE(status) || !device)
> + break;
> +
> + result = container_device_remove(device);
> + if (result) {
> + printk(KERN_WARNING "Failed to remove container\n");

Please use pr_warn().

Thanks,
-Toshi

> + break;
> }
> - break;
> +
> + return;
>
> default:
> /* non-hotplug event; possibly handled by other handler */

2012-10-24 18:02:30

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On Wed, Oct 24, 2012 at 11:14 AM, Toshi Kani <[email protected]> wrote:
> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:

>> + result = container_device_remove(device);
>> + if (result) {
>> + printk(KERN_WARNING "Failed to remove container\n");
>
> Please use pr_warn().

I think you should use dev_warn() and similar when possible. In a
year or two, the text "Failed to remove container" all by itself in a
dmesg log is not going to mean anything to anybody except you, and it
doesn't give any clue where to start looking for issues.

I also have a larger question. I'm not sure if
drivers/acpi/container.c does anything important in the first place.
The code in it looks an awful lot like the code in
drivers/acpi/processor_driver.c, drivers/acpi/acpi_memhotplug.c, and
drivers/pci/hotplug/acpiphp_glue.c.

To me, it looks like container.c (as well as the other places I
mentioned) is an attempt to compensate for the lack of hotplug support
in the ACPI core. If the ACPI core had generic hotplug support, e.g.,
if it handled BUS_CHECK, DEVICE_CHECK, and EJECT_REQUEST notifications
and turned those into the appropriate calls to driver .add()/.start()
and .remove() methods, would we need container.c at all?

Bjorn

2012-10-25 00:54:30

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On 2012-10-25 2:02, Bjorn Helgaas wrote:
> On Wed, Oct 24, 2012 at 11:14 AM, Toshi Kani <[email protected]> wrote:
>> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
>
>>> + result = container_device_remove(device);
>>> + if (result) {
>>> + printk(KERN_WARNING "Failed to remove container\n");
>>
>> Please use pr_warn().
>
> I think you should use dev_warn() and similar when possible. In a
> year or two, the text "Failed to remove container" all by itself in a
> dmesg log is not going to mean anything to anybody except you, and it
> doesn't give any clue where to start looking for issues.
>
> I also have a larger question. I'm not sure if
> drivers/acpi/container.c does anything important in the first place.
> The code in it looks an awful lot like the code in
> drivers/acpi/processor_driver.c, drivers/acpi/acpi_memhotplug.c, and
> drivers/pci/hotplug/acpiphp_glue.c.
>
> To me, it looks like container.c (as well as the other places I
> mentioned) is an attempt to compensate for the lack of hotplug support
> in the ACPI core. If the ACPI core had generic hotplug support, e.g.,
> if it handled BUS_CHECK, DEVICE_CHECK, and EJECT_REQUEST notifications
> and turned those into the appropriate calls to driver .add()/.start()
> and .remove() methods, would we need container.c at all?
Hi Bjorn,
It's true that container driver is just for hotplug. We are working
on a hotplug framework which will consolidate all hotplug logic into one
driver.
--Gerry

>
> Bjorn
>
> .
>

2012-10-25 01:32:58

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

Hi Toshi,

On 10/25/2012 01:14 AM, Toshi Kani wrote:
> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
>> +static int container_device_remove(struct acpi_device *device)
>> +{
>> + int ret;
>> + struct acpi_eject_event *ej_event;
>> +
>> + /* stop container device at first */
>> + ret = acpi_bus_trim(device, 0);
>
> Hi Tang,
>
> Why do you need to call acpi_bus_trim(device,0) to stop the container
> device first?

This issue was introduced by Lu Yinghai, I think he could give a better
answer than me. :)
Please refer to the following url:

http://www.spinics.net/lists/linux-pci/msg17667.html

However, this is not applied into the pci tree yet.

>
>> + printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
>
> Do you need this message in the normal case? If so, I'd suggest to use
> pr_debug().
>
>> + if (ret)
>> + return ret;
>> +
>> + /* event originated from ACPI eject notification */
>> + device->flags.eject_pending = 1;
>
> You do not need to set the eject_pending flag when the handler calls
> acpi_bus_hot_remove_device(). It was set before because the handler did
> not initiate the hot-remove operation.

I just set it to keep the logic the same as before.
And thanks for telling me this. :)

>
...
>> + printk(KERN_WARNING "Container driver received %s event\n",
>> + "ACPI_NOTIFY_EJECT_REQUEST");
>
> Same as other comment. Suggest to use pr_debug().

OK. :)

>
>> +
>> + if (!present || ACPI_FAILURE(status) || !device)
>> + break;
>> +
>> + result = container_device_remove(device);
>> + if (result) {
>> + printk(KERN_WARNING "Failed to remove container\n");
>
> Please use pr_warn().
>
> Thanks,
> -Toshi
>
>> + break;
>> }
>> - break;
>> +
>> + return;
>>
>> default:
>> /* non-hotplug event; possibly handled by other handler */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2012-10-25 01:47:45

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On 2012-10-25 9:31, Tang Chen wrote:
> Hi Toshi,
>
> On 10/25/2012 01:14 AM, Toshi Kani wrote:
>> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
>>> +static int container_device_remove(struct acpi_device *device)
>>> +{
>>> + int ret;
>>> + struct acpi_eject_event *ej_event;
>>> +
>>> + /* stop container device at first */
>>> + ret = acpi_bus_trim(device, 0);
>>
>> Hi Tang,
>>
>> Why do you need to call acpi_bus_trim(device,0) to stop the container
>> device first?
>
> This issue was introduced by Lu Yinghai, I think he could give a better
> answer than me. :)
> Please refer to the following url:
>
> http://www.spinics.net/lists/linux-pci/msg17667.html
>
> However, this is not applied into the pci tree yet.
We have worked out a patch set to clean up the logic for PCI/ACPI binding
relationship. It updates PCI/ACPI binding relationship by registering bus
notification onto pci_bus_type instead of hooking into the ACPI/glue.c.

To accommodate that patch set, the ACPI device destroy process has been
split into two steps:
1) acpi_bus_trim(device,0) to unbind ACPI drivers
2) acpi_bus_trim(device,1) to destroy ACPI devices

>
>>
>>> + printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
>>
>> Do you need this message in the normal case? If so, I'd suggest to use
>> pr_debug().
>>
>>> + if (ret)
>>> + return ret;
>>> +
>>> + /* event originated from ACPI eject notification */
>>> + device->flags.eject_pending = 1;
>>
>> You do not need to set the eject_pending flag when the handler calls
>> acpi_bus_hot_remove_device(). It was set before because the handler did
>> not initiate the hot-remove operation.
>
> I just set it to keep the logic the same as before.
> And thanks for telling me this. :)
>
>>
> ...
>>> + printk(KERN_WARNING "Container driver received %s event\n",
>>> + "ACPI_NOTIFY_EJECT_REQUEST");
>>
>> Same as other comment. Suggest to use pr_debug().
>
> OK. :)
>
>>
>>> +
>>> + if (!present || ACPI_FAILURE(status) || !device)
>>> + break;
>>> +
>>> + result = container_device_remove(device);
>>> + if (result) {
>>> + printk(KERN_WARNING "Failed to remove container\n");
>>
>> Please use pr_warn().
>>
>> Thanks,
>> -Toshi
>>
>>> + break;
>>> }
>>> - break;
>>> +
>>> + return;
>>>
>>> default:
>>> /* non-hotplug event; possibly handled by other handler */
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
> .
>

2012-10-25 17:28:23

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On Thu, 2012-10-25 at 09:47 +0800, Jiang Liu wrote:
> On 2012-10-25 9:31, Tang Chen wrote:
> > Hi Toshi,
> >
> > On 10/25/2012 01:14 AM, Toshi Kani wrote:
> >> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
> >>> +static int container_device_remove(struct acpi_device *device)
> >>> +{
> >>> + int ret;
> >>> + struct acpi_eject_event *ej_event;
> >>> +
> >>> + /* stop container device at first */
> >>> + ret = acpi_bus_trim(device, 0);
> >>
> >> Hi Tang,
> >>
> >> Why do you need to call acpi_bus_trim(device,0) to stop the container
> >> device first?
> >
> > This issue was introduced by Lu Yinghai, I think he could give a better
> > answer than me. :)
> > Please refer to the following url:
> >
> > http://www.spinics.net/lists/linux-pci/msg17667.html
> >
> > However, this is not applied into the pci tree yet.
> We have worked out a patch set to clean up the logic for PCI/ACPI binding
> relationship. It updates PCI/ACPI binding relationship by registering bus
> notification onto pci_bus_type instead of hooking into the ACPI/glue.c.

Thanks for the info and pointer. Tang, I'd suggest you add such info to
the comment so that others know that this step is needed for removing
PCI bridges. It helps us to know where to look at...

> To accommodate that patch set, the ACPI device destroy process has been
> split into two steps:
> 1) acpi_bus_trim(device,0) to unbind ACPI drivers

Does this step also detach PCI drivers from PCI cards as well?

Thanks,
-Toshi


> 2) acpi_bus_trim(device,1) to destroy ACPI devices
>
> >
> >>
> >>> + printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
> >>
> >> Do you need this message in the normal case? If so, I'd suggest to use
> >> pr_debug().
> >>
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + /* event originated from ACPI eject notification */
> >>> + device->flags.eject_pending = 1;
> >>
> >> You do not need to set the eject_pending flag when the handler calls
> >> acpi_bus_hot_remove_device(). It was set before because the handler did
> >> not initiate the hot-remove operation.
> >
> > I just set it to keep the logic the same as before.
> > And thanks for telling me this. :)
> >
> >>
> > ...
> >>> + printk(KERN_WARNING "Container driver received %s event\n",
> >>> + "ACPI_NOTIFY_EJECT_REQUEST");
> >>
> >> Same as other comment. Suggest to use pr_debug().
> >
> > OK. :)
> >
> >>
> >>> +
> >>> + if (!present || ACPI_FAILURE(status) || !device)
> >>> + break;
> >>> +
> >>> + result = container_device_remove(device);
> >>> + if (result) {
> >>> + printk(KERN_WARNING "Failed to remove container\n");
> >>
> >> Please use pr_warn().
> >>
> >> Thanks,
> >> -Toshi
> >>
> >>> + break;
> >>> }
> >>> - break;
> >>> +
> >>> + return;
> >>>
> >>> default:
> >>> /* non-hotplug event; possibly handled by other handler */
> >>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> > .
> >
>
>

2012-10-26 05:44:32

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

Hi Toshi,

On 10/26/2012 01:20 AM, Toshi Kani wrote:
...
>>>> Why do you need to call acpi_bus_trim(device,0) to stop the container
>>>> device first?
>>>
>>> This issue was introduced by Lu Yinghai, I think he could give a better
>>> answer than me. :)
>>> Please refer to the following url:
>>>
>>> http://www.spinics.net/lists/linux-pci/msg17667.html
>>>
>>> However, this is not applied into the pci tree yet.
>> We have worked out a patch set to clean up the logic for PCI/ACPI binding
>> relationship. It updates PCI/ACPI binding relationship by registering bus
>> notification onto pci_bus_type instead of hooking into the ACPI/glue.c.
>
> Thanks for the info and pointer. Tang, I'd suggest you add such info to
> the comment so that others know that this step is needed for removing
> PCI bridges. It helps us to know where to look at...

OK, I'll add it in the next version. :)

>
>> To accommodate that patch set, the ACPI device destroy process has been
>> split into two steps:
>> 1) acpi_bus_trim(device,0) to unbind ACPI drivers
>
> Does this step also detach PCI drivers from PCI cards as well?

Yes, it calls device_release_driver() to release the device driver.

device_release_driver()
|->__device_release_driver()
|->dev->driver = NULL;

Thanks. :)

>
> Thanks,
> -Toshi
>
>

2012-10-26 20:09:52

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

On Fri, 2012-10-26 at 13:43 +0800, Tang Chen wrote:
> Hi Toshi,
>
> On 10/26/2012 01:20 AM, Toshi Kani wrote:
> ...
> >>>> Why do you need to call acpi_bus_trim(device,0) to stop the container
> >>>> device first?
> >>>
> >>> This issue was introduced by Lu Yinghai, I think he could give a better
> >>> answer than me. :)
> >>> Please refer to the following url:
> >>>
> >>> http://www.spinics.net/lists/linux-pci/msg17667.html
> >>>
> >>> However, this is not applied into the pci tree yet.
> >> We have worked out a patch set to clean up the logic for PCI/ACPI binding
> >> relationship. It updates PCI/ACPI binding relationship by registering bus
> >> notification onto pci_bus_type instead of hooking into the ACPI/glue.c.
> >
> > Thanks for the info and pointer. Tang, I'd suggest you add such info to
> > the comment so that others know that this step is needed for removing
> > PCI bridges. It helps us to know where to look at...
>
> OK, I'll add it in the next version. :)
>
> >
> >> To accommodate that patch set, the ACPI device destroy process has been
> >> split into two steps:
> >> 1) acpi_bus_trim(device,0) to unbind ACPI drivers
> >
> > Does this step also detach PCI drivers from PCI cards as well?
>
> Yes, it calls device_release_driver() to release the device driver.
>
> device_release_driver()
> |->__device_release_driver()
> |->dev->driver = NULL;

I see. Thanks for the info.
-Toshi


>
> Thanks. :)
>
> >
> > Thanks,
> > -Toshi
> >
> >
>