2013-06-14 19:34:24

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

Alexander E. Patrakov <[email protected]> reports two bugs related to
dock station support on Sony VAIO VPCZ23A4R. Actually there are at least
four bugs related to Sony VAIO VPCZ23A4R dock support.
1) can't correctly detect hotplug slot for dock state
2) resource leak on undocking
3) resource allocation failure for dock devices
4) one bug in intel_snd_hda driver

Hi Alexander, could you please help to test the whole patchset?
Really get sleepy, it's already 3 oclock now, so please forgive me
if there any stupid mistakes.

Jiang Liu (4):
ACPI, DOCK: initialize dock subsystem before scanning PCI root buses
ACPI, DOCK: resolve possible deadlock scenarios
PCI, ACPI: fix device destroying order issue when handling dock
notification
ACPIPHP: fix bug 56531 Sony VAIO VPCZ23A4R: can't assign mem/io
after docking

drivers/acpi/dock.c | 116 +++++++++++++++++++++----------------
drivers/acpi/internal.h | 5 ++
drivers/acpi/scan.c | 1 +
drivers/pci/hotplug/acpiphp_glue.c | 54 ++++++++++++-----
drivers/pci/pci.h | 5 ++
drivers/pci/setup-bus.c | 8 +--
include/acpi/acpi_drivers.h | 2 +
7 files changed, 121 insertions(+), 70 deletions(-)

--
1.8.1.2


2013-06-14 19:34:36

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: [BUGFIX v2 1/4] ACPI, DOCK: initialize dock subsystem before scanning PCI root buses

Changeset "3b63aaa70e1 PCI: acpiphp: Do not use ACPI PCI subdriver
mechanism" causes a regression which breaks ACPI dock support,
please refer to https://bugzilla.kernel.org/show_bug.cgi?id=59501

The root cause is that changeset 3b63aaa70e1 changed the relative
initialization order of ACPI dock subsystem and acpiphp driver,
and acpiphp driver has dependency on ACPI dock subsystem's
initialization result, so that acpiphp can't correctly detect ACPI
dock stations now.

On the other hand, ACPI dock is a built-in driver, so we could
explicitly initialize it before the acpiphp driver is used.

Signed-off-by: Jiang Liu <[email protected]>
Reported-by: Alexander E. Patrakov <[email protected]>
Tested-by: Alexander E. Patrakov <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # 3.9+
---
drivers/acpi/dock.c | 7 +------
drivers/acpi/internal.h | 5 +++++
drivers/acpi/scan.c | 1 +
3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
index 4fdea38..02b0563 100644
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -1033,7 +1033,7 @@ find_dock_and_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
return AE_OK;
}

-static int __init dock_init(void)
+int __init acpi_dock_init(void)
{
if (acpi_disabled)
return 0;
@@ -1062,9 +1062,4 @@ static void __exit dock_exit(void)
dock_remove(dock_station);
}

-/*
- * Must be called before drivers of devices in dock, otherwise we can't know
- * which devices are in a dock
- */
-subsys_initcall(dock_init);
module_exit(dock_exit);
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 297cbf4..c610a76 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -40,6 +40,11 @@ void acpi_container_init(void);
#else
static inline void acpi_container_init(void) {}
#endif
+#ifdef CONFIG_ACPI_DOCK
+void acpi_dock_init(void);
+#else
+static inline void acpi_dock_init(void) {}
+#endif
#ifdef CONFIG_ACPI_HOTPLUG_MEMORY
void acpi_memory_hotplug_init(void);
#else
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 44225cb..4148163 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -2045,6 +2045,7 @@ int __init acpi_scan_init(void)
acpi_lpss_init();
acpi_container_init();
acpi_memory_hotplug_init();
+ acpi_dock_init();

mutex_lock(&acpi_scan_lock);
/*
--
1.8.1.2

2013-06-14 19:34:57

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

This is a preparation for next patch to avoid breaking bisecting.
If next patch is applied without this one, it will cause deadlock
as below:

Case 1:
[ 31.015593] Possible unsafe locking scenario:

[ 31.018350] CPU0 CPU1
[ 31.019691] ---- ----
[ 31.021002] lock(&dock_station->hp_lock);
[ 31.022327] lock(&slot->crit_sect);
[ 31.023650] lock(&dock_station->hp_lock);
[ 31.025010] lock(&slot->crit_sect);
[ 31.026342]

Case 2:
hotplug_dock_devices()
mutex_lock(&ds->hp_lock)
dd->ops->handler()
register_hotplug_dock_device()
mutex_lock(&ds->hp_lock)
[ 34.316570] [ INFO: possible recursive locking detected ]
[ 34.316573] 3.10.0-rc4 #6 Tainted: G C
[ 34.316575] ---------------------------------------------
[ 34.316577] kworker/0:0/4 is trying to acquire lock:
[ 34.316579] (&dock_station->hp_lock){+.+.+.}, at:
[<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
[ 34.316588]
but task is already holding lock:
[ 34.316590] (&dock_station->hp_lock){+.+.+.}, at:
[<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
[ 34.316595]
other info that might help us debug this:
[ 34.316597] Possible unsafe locking scenario:

[ 34.316599] CPU0
[ 34.316601] ----
[ 34.316602] lock(&dock_station->hp_lock);
[ 34.316605] lock(&dock_station->hp_lock);
[ 34.316608]
*** DEADLOCK ***

So fix this deadlock by not taking ds->hp_lock in function
register_hotplug_dock_device(). This patch also fixes a possible
race conditions in function dock_event() because previously it
accesses ds->hotplug_devices list without holding ds->hp_lock.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Yijing Wang <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # 3.8+
---
drivers/acpi/dock.c | 109 ++++++++++++++++++++++---------------
drivers/pci/hotplug/acpiphp_glue.c | 15 +++++
include/acpi/acpi_drivers.h | 2 +
3 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
index 02b0563..602bce5 100644
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -66,7 +66,7 @@ struct dock_station {
spinlock_t dd_lock;
struct mutex hp_lock;
struct list_head dependent_devices;
- struct list_head hotplug_devices;
+ struct klist hotplug_devices;

struct list_head sibling;
struct platform_device *dock_device;
@@ -76,12 +76,18 @@ static int dock_station_count;

struct dock_dependent_device {
struct list_head list;
- struct list_head hotplug_list;
+ acpi_handle handle;
+};
+
+struct dock_hotplug_info {
+ struct klist_node node;
acpi_handle handle;
const struct acpi_dock_ops *ops;
void *context;
};

+#define node_to_info(n) container_of((n), struct dock_hotplug_info, node)
+
#define DOCK_DOCKING 0x00000001
#define DOCK_UNDOCKING 0x00000002
#define DOCK_IS_DOCK 0x00000010
@@ -111,7 +117,6 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)

dd->handle = handle;
INIT_LIST_HEAD(&dd->list);
- INIT_LIST_HEAD(&dd->hotplug_list);

spin_lock(&ds->dd_lock);
list_add_tail(&dd->list, &ds->dependent_devices);
@@ -120,36 +125,19 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)
return 0;
}

-/**
- * dock_add_hotplug_device - associate a hotplug handler with the dock station
- * @ds: The dock station
- * @dd: The dependent device struct
- *
- * Add the dependent device to the dock's hotplug device list
- */
-static void
-dock_add_hotplug_device(struct dock_station *ds,
- struct dock_dependent_device *dd)
+static void hotplug_info_get(struct klist_node *node)
{
- mutex_lock(&ds->hp_lock);
- list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
- mutex_unlock(&ds->hp_lock);
+ struct dock_hotplug_info *info = node_to_info(node);
+
+ info->ops->get(info->context);
}

-/**
- * dock_del_hotplug_device - remove a hotplug handler from the dock station
- * @ds: The dock station
- * @dd: the dependent device struct
- *
- * Delete the dependent device from the dock's hotplug device list
- */
-static void
-dock_del_hotplug_device(struct dock_station *ds,
- struct dock_dependent_device *dd)
+static void hotplug_info_put(struct klist_node *node)
{
- mutex_lock(&ds->hp_lock);
- list_del(&dd->hotplug_list);
- mutex_unlock(&ds->hp_lock);
+ struct dock_hotplug_info *info = node_to_info(node);
+
+ info->ops->put(info->context);
+ kfree(info);
}

/**
@@ -354,15 +342,22 @@ static void dock_remove_acpi_device(acpi_handle handle)
static void hotplug_dock_devices(struct dock_station *ds, u32 event)
{
struct dock_dependent_device *dd;
+ struct klist_iter iter;
+ struct klist_node *node;
+ struct dock_hotplug_info *info;

mutex_lock(&ds->hp_lock);

/*
* First call driver specific hotplug functions
*/
- list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
- if (dd->ops && dd->ops->handler)
- dd->ops->handler(dd->handle, event, dd->context);
+ klist_iter_init(&ds->hotplug_devices, &iter);
+ while ((node = klist_next(&iter))) {
+ info = node_to_info(node);
+ if (info->ops && info->ops->handler)
+ info->ops->handler(info->handle, event, info->context);
+ }
+ klist_iter_exit(&iter);

/*
* Now make sure that an acpi_device is created for each
@@ -384,7 +379,9 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
struct device *dev = &ds->dock_device->dev;
char event_string[13];
char *envp[] = { event_string, NULL };
- struct dock_dependent_device *dd;
+ struct klist_iter iter;
+ struct klist_node *node;
+ struct dock_hotplug_info *info;

if (num == UNDOCK_EVENT)
sprintf(event_string, "EVENT=undock");
@@ -398,9 +395,13 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
if (num == DOCK_EVENT)
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);

- list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
- if (dd->ops && dd->ops->uevent)
- dd->ops->uevent(dd->handle, event, dd->context);
+ klist_iter_init(&ds->hotplug_devices, &iter);
+ while ((node = klist_next(&iter))) {
+ info = node_to_info(node);
+ if (info->ops && info->ops->handler)
+ info->ops->handler(info->handle, event, info->context);
+ }
+ klist_iter_exit(&iter);

if (num != DOCK_EVENT)
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
@@ -580,12 +581,16 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
void *context)
{
struct dock_dependent_device *dd;
+ struct dock_hotplug_info *info;
struct dock_station *dock_station;
int ret = -EINVAL;

if (!dock_station_count)
return -ENODEV;

+ if (ops == NULL || ops->get == NULL || ops->put == NULL)
+ return -EINVAL;
+
/*
* make sure this handle is for a device dependent on the dock,
* this would include the dock station itself
@@ -598,9 +603,18 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
*/
dd = find_dock_dependent_device(dock_station, handle);
if (dd) {
- dd->ops = ops;
- dd->context = context;
- dock_add_hotplug_device(dock_station, dd);
+ info = kzalloc(sizeof(*info), GFP_KERNEL);
+ if (!info) {
+ unregister_hotplug_dock_device(handle);
+ ret = -ENOMEM;
+ break;
+ }
+
+ info->ops = ops;
+ info->context = context;
+ info->handle = dd->handle;
+ klist_add_tail(&info->node,
+ &dock_station->hotplug_devices);
ret = 0;
}
}
@@ -615,16 +629,22 @@ EXPORT_SYMBOL_GPL(register_hotplug_dock_device);
*/
void unregister_hotplug_dock_device(acpi_handle handle)
{
- struct dock_dependent_device *dd;
struct dock_station *dock_station;
+ struct klist_iter iter;
+ struct klist_node *node;
+ struct dock_hotplug_info *info;

if (!dock_station_count)
return;

list_for_each_entry(dock_station, &dock_stations, sibling) {
- dd = find_dock_dependent_device(dock_station, handle);
- if (dd)
- dock_del_hotplug_device(dock_station, dd);
+ klist_iter_init(&dock_station->hotplug_devices, &iter);
+ while ((node = klist_next(&iter))) {
+ info = node_to_info(node);
+ if (info->handle == handle)
+ klist_del(&info->node);
+ }
+ klist_iter_exit(&iter);
}
}
EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
@@ -951,7 +971,8 @@ static int __init dock_add(acpi_handle handle)
mutex_init(&dock_station->hp_lock);
spin_lock_init(&dock_station->dd_lock);
INIT_LIST_HEAD(&dock_station->sibling);
- INIT_LIST_HEAD(&dock_station->hotplug_devices);
+ klist_init(&dock_station->hotplug_devices,
+ hotplug_info_get, hotplug_info_put);
ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
INIT_LIST_HEAD(&dock_station->dependent_devices);

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 716aa93..5d696f5 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -145,9 +145,24 @@ static int post_dock_fixups(struct notifier_block *nb, unsigned long val,
return NOTIFY_OK;
}

+static void acpiphp_dock_get(void *data)
+{
+ struct acpiphp_func *func = data;
+
+ get_bridge(func->slot->bridge);
+}
+
+static void acpiphp_dock_put(void *data)
+{
+ struct acpiphp_func *func = data;
+
+ put_bridge(func->slot->bridge);
+}

static const struct acpi_dock_ops acpiphp_dock_ops = {
.handler = handle_hotplug_event_func,
+ .get = acpiphp_dock_get,
+ .put = acpiphp_dock_put,
};

/* Check whether the PCI device is managed by native PCIe hotplug driver */
diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
index e6168a2..8fcc9ac 100644
--- a/include/acpi/acpi_drivers.h
+++ b/include/acpi/acpi_drivers.h
@@ -115,6 +115,8 @@ void pci_acpi_crs_quirks(void);
struct acpi_dock_ops {
acpi_notify_handler handler;
acpi_notify_handler uevent;
+ void (*get)(void *data);
+ void (*put)(void *data);
};

#if defined(CONFIG_ACPI_DOCK) || defined(CONFIG_ACPI_DOCK_MODULE)
--
1.8.1.2

2013-06-14 19:36:49

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: [BUGFIX v2 3/4] PCI, ACPI: fix device destroying order issue when handling dock notification

Current ACPI glue logic expects that physical devices are destroyed
before destroying companion ACPI devices, otherwise it will break the
ACPI unbind logic and cause following warning messages:
[ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt
[ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
[ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
[ 180.013656] port1: Oops, 'acpi_handle' corrupt
Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
for full log message.

Above warning messages are caused by following scenario:
1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
3) hotplug_dock_devices() first invokes registered hotplug callbacks to
destroy physical devices, then destroys all affected ACPI devices.
Everything seems perfect until now. But the acpiphp dock notification
handler will queue another task (T2) onto kacpi_hotplug_wq to really
destroy affected physical devices.
4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
been destroyed.
5) kacpi_hotplug_wq handles T2, which destroys all affected physical
devices.

So it breaks ACPI glue logic's expection because ACPI devices are destroyed
in step 3 and physical devices are destroyed in step 5.

Signed-off-by: Jiang Liu <[email protected]>
Reported-by: Alexander E. Patrakov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # 3.8+
---
drivers/pci/hotplug/acpiphp_glue.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 5d696f5..a65203b 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -61,6 +61,8 @@ static DEFINE_MUTEX(bridge_mutex);
static void handle_hotplug_event_bridge (acpi_handle, u32, void *);
static void acpiphp_sanitize_bus(struct pci_bus *bus);
static void acpiphp_set_hpp_values(struct pci_bus *bus);
+static void _handle_hotplug_event_func(acpi_handle handle, u32 type,
+ void *context);
static void handle_hotplug_event_func(acpi_handle handle, u32 type, void *context);
static void free_bridge(struct kref *kref);

@@ -160,7 +162,7 @@ static void acpiphp_dock_put(void *data)
}

static const struct acpi_dock_ops acpiphp_dock_ops = {
- .handler = handle_hotplug_event_func,
+ .handler = _handle_hotplug_event_func,
.get = acpiphp_dock_get,
.put = acpiphp_dock_put,
};
@@ -1080,22 +1082,13 @@ static void handle_hotplug_event_bridge(acpi_handle handle, u32 type,
alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_bridge);
}

-static void _handle_hotplug_event_func(struct work_struct *work)
+static void _handle_hotplug_event_func(acpi_handle handle, u32 type,
+ void *context)
{
- struct acpiphp_func *func;
+ struct acpiphp_func *func = context;
char objname[64];
struct acpi_buffer buffer = { .length = sizeof(objname),
.pointer = objname };
- struct acpi_hp_work *hp_work;
- acpi_handle handle;
- u32 type;
-
- hp_work = container_of(work, struct acpi_hp_work, work);
- handle = hp_work->handle;
- type = hp_work->type;
- func = (struct acpiphp_func *)hp_work->context;
-
- acpi_scan_lock_acquire();

acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);

@@ -1128,7 +1121,18 @@ static void _handle_hotplug_event_func(struct work_struct *work)
warn("notify_handler: unknown event type 0x%x for %s\n", type, objname);
break;
}
+}
+
+static void _handle_hotplug_event_cb(struct work_struct *work)
+{
+ struct acpiphp_func *func;
+ struct acpi_hp_work *hp_work;

+ hp_work = container_of(work, struct acpi_hp_work, work);
+ func = (struct acpiphp_func *)hp_work->context;
+ acpi_scan_lock_acquire();
+ _handle_hotplug_event_func(hp_work->handle, hp_work->type,
+ hp_work->context);
acpi_scan_lock_release();
kfree(hp_work); /* allocated in handle_hotplug_event_func */
put_bridge(func->slot->bridge);
@@ -1156,7 +1160,7 @@ static void handle_hotplug_event_func(acpi_handle handle, u32 type,
* don't deadlock on hotplug actions.
*/
get_bridge(func->slot->bridge);
- alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_func);
+ alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_cb);
}

/*
--
1.8.1.2

2013-06-14 19:36:55

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: [BUGFIX v2 4/4] ACPIPHP: fix bug 56531 Sony VAIO VPCZ23A4R: can't assign mem/io after docking

Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=56531 for
more information.

This issue is caused by differences in PCI resource assignment between
boot time and runtime hotplug. On x86 platforms, OS respects PCI
resource assignment from BIOS and only reassign resources for unassigned
BARs at boot time. But with acpiphp, it ignores BIOS resource assignment
and reassign all resources by itself.

If we have enough resources, reassigning all PCI resources should work
too, but it may fail if we are under resource constraints. On the other
handle, current PCI MMIO alignment algorithm may waste huge MMIO address
space if we have some PCI devices with huge MMIO BARs.

On this Sony laptop, BIOS allocates limited MMIO resources for the dock
station and the dock station has a gfx which has a 256MB MMIO BAR.
So current acpiphp driver fails to allocate resources for most devices
on the dock station.

So change acpiphp driver to follow boot time behavior to respect BIOS
resource assignment.

Signed-off-by: Jiang Liu <[email protected]>
Reported-by: Alexander E. Patrakov <[email protected]>
Tested-by: Alexander E. Patrakov <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]>
---
drivers/pci/hotplug/acpiphp_glue.c | 7 +++++--
drivers/pci/pci.h | 5 +++++
drivers/pci/setup-bus.c | 8 ++++----
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index a65203b..f4a53e9 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -687,6 +687,7 @@ static int __ref enable_device(struct acpiphp_slot *slot)
struct pci_bus *bus = slot->bridge->pci_bus;
struct acpiphp_func *func;
int num, max, pass;
+ LIST_HEAD(add_list);

if (slot->flags & SLOT_ENABLED)
goto err_exit;
@@ -711,13 +712,15 @@ static int __ref enable_device(struct acpiphp_slot *slot)
max = pci_scan_bridge(bus, dev, max, pass);
if (pass && dev->subordinate) {
check_hotplug_bridge(slot, dev);
- pci_bus_size_bridges(dev->subordinate);
+ pcibios_resource_survey_bus(dev->subordinate);
+ __pci_bus_size_bridges(dev->subordinate,
+ &add_list);
}
}
}
}

- pci_bus_assign_resources(bus);
+ __pci_bus_assign_resources(bus, &add_list, NULL);
acpiphp_sanitize_bus(bus);
acpiphp_set_hpp_values(bus);
acpiphp_set_acpi_region(slot);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 68678ed..d1182c4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -202,6 +202,11 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
struct resource *res, unsigned int reg);
int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type);
void pci_configure_ari(struct pci_dev *dev);
+void __ref __pci_bus_size_bridges(struct pci_bus *bus,
+ struct list_head *realloc_head);
+void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
+ struct list_head *realloc_head,
+ struct list_head *fail_head);

/**
* pci_ari_enabled - query ARI forwarding status
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 16abaaa..d254e23 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1044,7 +1044,7 @@ handle_done:
;
}

-static void __ref __pci_bus_size_bridges(struct pci_bus *bus,
+void __ref __pci_bus_size_bridges(struct pci_bus *bus,
struct list_head *realloc_head)
{
struct pci_dev *dev;
@@ -1115,9 +1115,9 @@ void __ref pci_bus_size_bridges(struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_bus_size_bridges);

-static void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
- struct list_head *realloc_head,
- struct list_head *fail_head)
+void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
+ struct list_head *realloc_head,
+ struct list_head *fail_head)
{
struct pci_bus *b;
struct pci_dev *dev;
--
1.8.1.2

2013-06-14 21:04:02

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUGFIX v2 4/4] ACPIPHP: fix bug 56531 Sony VAIO VPCZ23A4R: can't assign mem/io after docking

On Fri, Jun 14, 2013 at 12:28 PM, Jiang Liu <[email protected]> wrote:
> Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=56531 for
> more information.
>
> This issue is caused by differences in PCI resource assignment between
> boot time and runtime hotplug. On x86 platforms, OS respects PCI
> resource assignment from BIOS and only reassign resources for unassigned
> BARs at boot time. But with acpiphp, it ignores BIOS resource assignment
> and reassign all resources by itself.
>
> If we have enough resources, reassigning all PCI resources should work
> too, but it may fail if we are under resource constraints. On the other
> handle, current PCI MMIO alignment algorithm may waste huge MMIO address
> space if we have some PCI devices with huge MMIO BARs.
>
> On this Sony laptop, BIOS allocates limited MMIO resources for the dock
> station and the dock station has a gfx which has a 256MB MMIO BAR.
> So current acpiphp driver fails to allocate resources for most devices
> on the dock station.
>
> So change acpiphp driver to follow boot time behavior to respect BIOS
> resource assignment.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Reported-by: Alexander E. Patrakov <[email protected]>
> Tested-by: Alexander E. Patrakov <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: <[email protected]>

Acked-by: Yinghai Lu <[email protected]>

> ---
> drivers/pci/hotplug/acpiphp_glue.c | 7 +++++--
> drivers/pci/pci.h | 5 +++++
> drivers/pci/setup-bus.c | 8 ++++----
> 3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index a65203b..f4a53e9 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -687,6 +687,7 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> struct pci_bus *bus = slot->bridge->pci_bus;
> struct acpiphp_func *func;
> int num, max, pass;
> + LIST_HEAD(add_list);
>
> if (slot->flags & SLOT_ENABLED)
> goto err_exit;
> @@ -711,13 +712,15 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> max = pci_scan_bridge(bus, dev, max, pass);
> if (pass && dev->subordinate) {
> check_hotplug_bridge(slot, dev);
> - pci_bus_size_bridges(dev->subordinate);
> + pcibios_resource_survey_bus(dev->subordinate);
> + __pci_bus_size_bridges(dev->subordinate,
> + &add_list);
> }
> }
> }
> }
>
> - pci_bus_assign_resources(bus);
> + __pci_bus_assign_resources(bus, &add_list, NULL);
> acpiphp_sanitize_bus(bus);
> acpiphp_set_hpp_values(bus);
> acpiphp_set_acpi_region(slot);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 68678ed..d1182c4 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -202,6 +202,11 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> struct resource *res, unsigned int reg);
> int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type);
> void pci_configure_ari(struct pci_dev *dev);
> +void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> + struct list_head *realloc_head);
> +void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> + struct list_head *realloc_head,
> + struct list_head *fail_head);
>
> /**
> * pci_ari_enabled - query ARI forwarding status
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 16abaaa..d254e23 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1044,7 +1044,7 @@ handle_done:
> ;
> }
>
> -static void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> +void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> @@ -1115,9 +1115,9 @@ void __ref pci_bus_size_bridges(struct pci_bus *bus)
> }
> EXPORT_SYMBOL(pci_bus_size_bridges);
>
> -static void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> - struct list_head *realloc_head,
> - struct list_head *fail_head)
> +void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> + struct list_head *realloc_head,
> + struct list_head *fail_head)
> {
> struct pci_bus *b;
> struct pci_dev *dev;
> --
> 1.8.1.2
>

2013-06-14 22:11:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote:
> This is a preparation for next patch to avoid breaking bisecting.
> If next patch is applied without this one, it will cause deadlock
> as below:
>
> Case 1:
> [ 31.015593] Possible unsafe locking scenario:
>
> [ 31.018350] CPU0 CPU1
> [ 31.019691] ---- ----
> [ 31.021002] lock(&dock_station->hp_lock);
> [ 31.022327] lock(&slot->crit_sect);
> [ 31.023650] lock(&dock_station->hp_lock);
> [ 31.025010] lock(&slot->crit_sect);
> [ 31.026342]
>
> Case 2:
> hotplug_dock_devices()
> mutex_lock(&ds->hp_lock)
> dd->ops->handler()
> register_hotplug_dock_device()
> mutex_lock(&ds->hp_lock)
> [ 34.316570] [ INFO: possible recursive locking detected ]
> [ 34.316573] 3.10.0-rc4 #6 Tainted: G C
> [ 34.316575] ---------------------------------------------
> [ 34.316577] kworker/0:0/4 is trying to acquire lock:
> [ 34.316579] (&dock_station->hp_lock){+.+.+.}, at:
> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> [ 34.316588]
> but task is already holding lock:
> [ 34.316590] (&dock_station->hp_lock){+.+.+.}, at:
> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> [ 34.316595]
> other info that might help us debug this:
> [ 34.316597] Possible unsafe locking scenario:
>
> [ 34.316599] CPU0
> [ 34.316601] ----
> [ 34.316602] lock(&dock_station->hp_lock);
> [ 34.316605] lock(&dock_station->hp_lock);
> [ 34.316608]
> *** DEADLOCK ***
>
> So fix this deadlock by not taking ds->hp_lock in function
> register_hotplug_dock_device(). This patch also fixes a possible
> race conditions in function dock_event() because previously it
> accesses ds->hotplug_devices list without holding ds->hp_lock.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Yinghai Lu <[email protected]>
> Cc: Yijing Wang <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: <[email protected]> # 3.8+
> ---
> drivers/acpi/dock.c | 109 ++++++++++++++++++++++---------------
> drivers/pci/hotplug/acpiphp_glue.c | 15 +++++
> include/acpi/acpi_drivers.h | 2 +
> 3 files changed, 82 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> index 02b0563..602bce5 100644
> --- a/drivers/acpi/dock.c
> +++ b/drivers/acpi/dock.c
> @@ -66,7 +66,7 @@ struct dock_station {
> spinlock_t dd_lock;
> struct mutex hp_lock;
> struct list_head dependent_devices;
> - struct list_head hotplug_devices;
> + struct klist hotplug_devices;
>
> struct list_head sibling;
> struct platform_device *dock_device;
> @@ -76,12 +76,18 @@ static int dock_station_count;
>
> struct dock_dependent_device {
> struct list_head list;
> - struct list_head hotplug_list;
> + acpi_handle handle;
> +};
> +
> +struct dock_hotplug_info {
> + struct klist_node node;
> acpi_handle handle;
> const struct acpi_dock_ops *ops;
> void *context;
> };

Can we please relax a bit and possibly take a step back?

So since your last reply to me wasn't particularly helpful, I went through the
code in dock.c and acpiphp_glue.c and I simply think that the whole
hotplug_list thing is simply redundant.

It looks like instead of using it (or the klist in this patch), we can add a
"hotlpug_device" flag to dock_dependent_device and set that flag instead of
adding dd to hotplug_devices or clear it instead of removing dd from that list.

That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
any more and perhaps we could make the code simpler instead of making it more
complex.

How does that sound?

Rafael


> +#define node_to_info(n) container_of((n), struct dock_hotplug_info, node)
> +
> #define DOCK_DOCKING 0x00000001
> #define DOCK_UNDOCKING 0x00000002
> #define DOCK_IS_DOCK 0x00000010
> @@ -111,7 +117,6 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)
>
> dd->handle = handle;
> INIT_LIST_HEAD(&dd->list);
> - INIT_LIST_HEAD(&dd->hotplug_list);
>
> spin_lock(&ds->dd_lock);
> list_add_tail(&dd->list, &ds->dependent_devices);
> @@ -120,36 +125,19 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)
> return 0;
> }
>
> -/**
> - * dock_add_hotplug_device - associate a hotplug handler with the dock station
> - * @ds: The dock station
> - * @dd: The dependent device struct
> - *
> - * Add the dependent device to the dock's hotplug device list
> - */
> -static void
> -dock_add_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> +static void hotplug_info_get(struct klist_node *node)
> {
> - mutex_lock(&ds->hp_lock);
> - list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
> - mutex_unlock(&ds->hp_lock);
> + struct dock_hotplug_info *info = node_to_info(node);
> +
> + info->ops->get(info->context);
> }
>
> -/**
> - * dock_del_hotplug_device - remove a hotplug handler from the dock station
> - * @ds: The dock station
> - * @dd: the dependent device struct
> - *
> - * Delete the dependent device from the dock's hotplug device list
> - */
> -static void
> -dock_del_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> +static void hotplug_info_put(struct klist_node *node)
> {
> - mutex_lock(&ds->hp_lock);
> - list_del(&dd->hotplug_list);
> - mutex_unlock(&ds->hp_lock);
> + struct dock_hotplug_info *info = node_to_info(node);
> +
> + info->ops->put(info->context);
> + kfree(info);
> }
>
> /**
> @@ -354,15 +342,22 @@ static void dock_remove_acpi_device(acpi_handle handle)
> static void hotplug_dock_devices(struct dock_station *ds, u32 event)
> {
> struct dock_dependent_device *dd;
> + struct klist_iter iter;
> + struct klist_node *node;
> + struct dock_hotplug_info *info;
>
> mutex_lock(&ds->hp_lock);
>
> /*
> * First call driver specific hotplug functions
> */
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> - if (dd->ops && dd->ops->handler)
> - dd->ops->handler(dd->handle, event, dd->context);
> + klist_iter_init(&ds->hotplug_devices, &iter);
> + while ((node = klist_next(&iter))) {
> + info = node_to_info(node);
> + if (info->ops && info->ops->handler)
> + info->ops->handler(info->handle, event, info->context);
> + }
> + klist_iter_exit(&iter);
>
> /*
> * Now make sure that an acpi_device is created for each
> @@ -384,7 +379,9 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
> struct device *dev = &ds->dock_device->dev;
> char event_string[13];
> char *envp[] = { event_string, NULL };
> - struct dock_dependent_device *dd;
> + struct klist_iter iter;
> + struct klist_node *node;
> + struct dock_hotplug_info *info;
>
> if (num == UNDOCK_EVENT)
> sprintf(event_string, "EVENT=undock");
> @@ -398,9 +395,13 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
> if (num == DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> - if (dd->ops && dd->ops->uevent)
> - dd->ops->uevent(dd->handle, event, dd->context);
> + klist_iter_init(&ds->hotplug_devices, &iter);
> + while ((node = klist_next(&iter))) {
> + info = node_to_info(node);
> + if (info->ops && info->ops->handler)
> + info->ops->handler(info->handle, event, info->context);
> + }
> + klist_iter_exit(&iter);
>
> if (num != DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> @@ -580,12 +581,16 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
> void *context)
> {
> struct dock_dependent_device *dd;
> + struct dock_hotplug_info *info;
> struct dock_station *dock_station;
> int ret = -EINVAL;
>
> if (!dock_station_count)
> return -ENODEV;
>
> + if (ops == NULL || ops->get == NULL || ops->put == NULL)
> + return -EINVAL;
> +
> /*
> * make sure this handle is for a device dependent on the dock,
> * this would include the dock station itself
> @@ -598,9 +603,18 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
> */
> dd = find_dock_dependent_device(dock_station, handle);
> if (dd) {
> - dd->ops = ops;
> - dd->context = context;
> - dock_add_hotplug_device(dock_station, dd);
> + info = kzalloc(sizeof(*info), GFP_KERNEL);
> + if (!info) {
> + unregister_hotplug_dock_device(handle);
> + ret = -ENOMEM;
> + break;
> + }
> +
> + info->ops = ops;
> + info->context = context;
> + info->handle = dd->handle;
> + klist_add_tail(&info->node,
> + &dock_station->hotplug_devices);
> ret = 0;
> }
> }
> @@ -615,16 +629,22 @@ EXPORT_SYMBOL_GPL(register_hotplug_dock_device);
> */
> void unregister_hotplug_dock_device(acpi_handle handle)
> {
> - struct dock_dependent_device *dd;
> struct dock_station *dock_station;
> + struct klist_iter iter;
> + struct klist_node *node;
> + struct dock_hotplug_info *info;
>
> if (!dock_station_count)
> return;
>
> list_for_each_entry(dock_station, &dock_stations, sibling) {
> - dd = find_dock_dependent_device(dock_station, handle);
> - if (dd)
> - dock_del_hotplug_device(dock_station, dd);
> + klist_iter_init(&dock_station->hotplug_devices, &iter);
> + while ((node = klist_next(&iter))) {
> + info = node_to_info(node);
> + if (info->handle == handle)
> + klist_del(&info->node);
> + }
> + klist_iter_exit(&iter);
> }
> }
> EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
> @@ -951,7 +971,8 @@ static int __init dock_add(acpi_handle handle)
> mutex_init(&dock_station->hp_lock);
> spin_lock_init(&dock_station->dd_lock);
> INIT_LIST_HEAD(&dock_station->sibling);
> - INIT_LIST_HEAD(&dock_station->hotplug_devices);
> + klist_init(&dock_station->hotplug_devices,
> + hotplug_info_get, hotplug_info_put);
> ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
> INIT_LIST_HEAD(&dock_station->dependent_devices);
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 716aa93..5d696f5 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -145,9 +145,24 @@ static int post_dock_fixups(struct notifier_block *nb, unsigned long val,
> return NOTIFY_OK;
> }
>
> +static void acpiphp_dock_get(void *data)
> +{
> + struct acpiphp_func *func = data;
> +
> + get_bridge(func->slot->bridge);
> +}
> +
> +static void acpiphp_dock_put(void *data)
> +{
> + struct acpiphp_func *func = data;
> +
> + put_bridge(func->slot->bridge);
> +}
>
> static const struct acpi_dock_ops acpiphp_dock_ops = {
> .handler = handle_hotplug_event_func,
> + .get = acpiphp_dock_get,
> + .put = acpiphp_dock_put,
> };
>
> /* Check whether the PCI device is managed by native PCIe hotplug driver */
> diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
> index e6168a2..8fcc9ac 100644
> --- a/include/acpi/acpi_drivers.h
> +++ b/include/acpi/acpi_drivers.h
> @@ -115,6 +115,8 @@ void pci_acpi_crs_quirks(void);
> struct acpi_dock_ops {
> acpi_notify_handler handler;
> acpi_notify_handler uevent;
> + void (*get)(void *data);
> + void (*put)(void *data);
> };
>
> #if defined(CONFIG_ACPI_DOCK) || defined(CONFIG_ACPI_DOCK_MODULE)
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-15 01:44:40

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote:
>> This is a preparation for next patch to avoid breaking bisecting.
>> If next patch is applied without this one, it will cause deadlock
>> as below:
>>
>> Case 1:
>> [ 31.015593] Possible unsafe locking scenario:
>>
>> [ 31.018350] CPU0 CPU1
>> [ 31.019691] ---- ----
>> [ 31.021002] lock(&dock_station->hp_lock);
>> [ 31.022327] lock(&slot->crit_sect);
>> [ 31.023650] lock(&dock_station->hp_lock);
>> [ 31.025010] lock(&slot->crit_sect);
>> [ 31.026342]
>>
>> Case 2:
>> hotplug_dock_devices()
>> mutex_lock(&ds->hp_lock)
>> dd->ops->handler()
>> register_hotplug_dock_device()
>> mutex_lock(&ds->hp_lock)
>> [ 34.316570] [ INFO: possible recursive locking detected ]
>> [ 34.316573] 3.10.0-rc4 #6 Tainted: G C
>> [ 34.316575] ---------------------------------------------
>> [ 34.316577] kworker/0:0/4 is trying to acquire lock:
>> [ 34.316579] (&dock_station->hp_lock){+.+.+.}, at:
>> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
>> [ 34.316588]
>> but task is already holding lock:
>> [ 34.316590] (&dock_station->hp_lock){+.+.+.}, at:
>> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
>> [ 34.316595]
>> other info that might help us debug this:
>> [ 34.316597] Possible unsafe locking scenario:
>>
>> [ 34.316599] CPU0
>> [ 34.316601] ----
>> [ 34.316602] lock(&dock_station->hp_lock);
>> [ 34.316605] lock(&dock_station->hp_lock);
>> [ 34.316608]
>> *** DEADLOCK ***
>>
>> So fix this deadlock by not taking ds->hp_lock in function
>> register_hotplug_dock_device(). This patch also fixes a possible
>> race conditions in function dock_event() because previously it
>> accesses ds->hotplug_devices list without holding ds->hp_lock.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: "Rafael J. Wysocki" <[email protected]>
>> Cc: Bjorn Helgaas <[email protected]>
>> Cc: Yinghai Lu <[email protected]>
>> Cc: Yijing Wang <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: <[email protected]> # 3.8+
>> ---
>> drivers/acpi/dock.c | 109 ++++++++++++++++++++++---------------
>> drivers/pci/hotplug/acpiphp_glue.c | 15 +++++
>> include/acpi/acpi_drivers.h | 2 +
>> 3 files changed, 82 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
>> index 02b0563..602bce5 100644
>> --- a/drivers/acpi/dock.c
>> +++ b/drivers/acpi/dock.c
>> @@ -66,7 +66,7 @@ struct dock_station {
>> spinlock_t dd_lock;
>> struct mutex hp_lock;
>> struct list_head dependent_devices;
>> - struct list_head hotplug_devices;
>> + struct klist hotplug_devices;
>>
>> struct list_head sibling;
>> struct platform_device *dock_device;
>> @@ -76,12 +76,18 @@ static int dock_station_count;
>>
>> struct dock_dependent_device {
>> struct list_head list;
>> - struct list_head hotplug_list;
>> + acpi_handle handle;
>> +};
>> +
>> +struct dock_hotplug_info {
>> + struct klist_node node;
>> acpi_handle handle;
>> const struct acpi_dock_ops *ops;
>> void *context;
>> };
>
> Can we please relax a bit and possibly take a step back?
>
> So since your last reply to me wasn't particularly helpful, I went through the
> code in dock.c and acpiphp_glue.c and I simply think that the whole
> hotplug_list thing is simply redundant.
>
> It looks like instead of using it (or the klist in this patch), we can add a
> "hotlpug_device" flag to dock_dependent_device and set that flag instead of
> adding dd to hotplug_devices or clear it instead of removing dd from that list.
>
> That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
> any more and perhaps we could make the code simpler instead of making it more
> complex.
>
> How does that sound?
>
> Rafael
Hi Rafael,
Thanks for comments! It would be great if we could kill the
hotplug_devices
list so thing gets simple. But there are still some special cases:(

As you have mentioned, ds->hp_lock is used to make both addition and
removal
of hotplug devices wait for us to complete walking ds->hotplug_devices.
So it acts as two roles:
1) protect the hotplug_devices list,
2) serialize unregister_hotplug_dock_device() and
hotplug_dock_devices() so
the dock driver doesn't access registered handler and associated data
structure
once returing from unregister_hotplug_dock_device().

If we simply use a flag to mark presence of registered callback, we
can't achieve
the second goal. Take the sony laptop as an example. It has several PCI
hotplug
slot associated with the dock station:
[ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB
[ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM0
[ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM1
[ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2
[ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
[ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
[ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
[ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
[ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB

So it still has some race windows if we undock the station while
repeatedly rescanning/removing
the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces. For
example, thread 1 is
handling undocking event, walking the dependent device list and
invoking registered callback
handler with associated data. While that, thread 2 may step in to
unregister the callback for
\_SB_.PCI0.RP07.LPMB.LPM0. Then thread 1 may cause access-after-free
issue.

The klist patch solves this issue by adding a "put" callback method to
explicitly notify
dock client that the dock core has done with previously registered
handler and associated
data.

>
>
>> +#define node_to_info(n) container_of((n), struct dock_hotplug_info, node)
>> +
>> #define DOCK_DOCKING 0x00000001
>> #define DOCK_UNDOCKING 0x00000002
>> #define DOCK_IS_DOCK 0x00000010
>> @@ -111,7 +117,6 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)
>>
>> dd->handle = handle;
>> INIT_LIST_HEAD(&dd->list);
>> - INIT_LIST_HEAD(&dd->hotplug_list);
>>
>> spin_lock(&ds->dd_lock);
>> list_add_tail(&dd->list, &ds->dependent_devices);
>> @@ -120,36 +125,19 @@ add_dock_dependent_device(struct dock_station *ds, acpi_handle handle)
>> return 0;
>> }
>>
>> -/**
>> - * dock_add_hotplug_device - associate a hotplug handler with the dock station
>> - * @ds: The dock station
>> - * @dd: The dependent device struct
>> - *
>> - * Add the dependent device to the dock's hotplug device list
>> - */
>> -static void
>> -dock_add_hotplug_device(struct dock_station *ds,
>> - struct dock_dependent_device *dd)
>> +static void hotplug_info_get(struct klist_node *node)
>> {
>> - mutex_lock(&ds->hp_lock);
>> - list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
>> - mutex_unlock(&ds->hp_lock);
>> + struct dock_hotplug_info *info = node_to_info(node);
>> +
>> + info->ops->get(info->context);
>> }
>>
>> -/**
>> - * dock_del_hotplug_device - remove a hotplug handler from the dock station
>> - * @ds: The dock station
>> - * @dd: the dependent device struct
>> - *
>> - * Delete the dependent device from the dock's hotplug device list
>> - */
>> -static void
>> -dock_del_hotplug_device(struct dock_station *ds,
>> - struct dock_dependent_device *dd)
>> +static void hotplug_info_put(struct klist_node *node)
>> {
>> - mutex_lock(&ds->hp_lock);
>> - list_del(&dd->hotplug_list);
>> - mutex_unlock(&ds->hp_lock);
>> + struct dock_hotplug_info *info = node_to_info(node);
>> +
>> + info->ops->put(info->context);
>> + kfree(info);
>> }
>>
>> /**
>> @@ -354,15 +342,22 @@ static void dock_remove_acpi_device(acpi_handle handle)
>> static void hotplug_dock_devices(struct dock_station *ds, u32 event)
>> {
>> struct dock_dependent_device *dd;
>> + struct klist_iter iter;
>> + struct klist_node *node;
>> + struct dock_hotplug_info *info;
>>
>> mutex_lock(&ds->hp_lock);
>>
>> /*
>> * First call driver specific hotplug functions
>> */
>> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
>> - if (dd->ops && dd->ops->handler)
>> - dd->ops->handler(dd->handle, event, dd->context);
>> + klist_iter_init(&ds->hotplug_devices, &iter);
>> + while ((node = klist_next(&iter))) {
>> + info = node_to_info(node);
>> + if (info->ops && info->ops->handler)
>> + info->ops->handler(info->handle, event, info->context);
>> + }
>> + klist_iter_exit(&iter);
>>
>> /*
>> * Now make sure that an acpi_device is created for each
>> @@ -384,7 +379,9 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
>> struct device *dev = &ds->dock_device->dev;
>> char event_string[13];
>> char *envp[] = { event_string, NULL };
>> - struct dock_dependent_device *dd;
>> + struct klist_iter iter;
>> + struct klist_node *node;
>> + struct dock_hotplug_info *info;
>>
>> if (num == UNDOCK_EVENT)
>> sprintf(event_string, "EVENT=undock");
>> @@ -398,9 +395,13 @@ static void dock_event(struct dock_station *ds, u32 event, int num)
>> if (num == DOCK_EVENT)
>> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>
>> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
>> - if (dd->ops && dd->ops->uevent)
>> - dd->ops->uevent(dd->handle, event, dd->context);
>> + klist_iter_init(&ds->hotplug_devices, &iter);
>> + while ((node = klist_next(&iter))) {
>> + info = node_to_info(node);
>> + if (info->ops && info->ops->handler)
>> + info->ops->handler(info->handle, event, info->context);
>> + }
>> + klist_iter_exit(&iter);
>>
>> if (num != DOCK_EVENT)
>> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>> @@ -580,12 +581,16 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
>> void *context)
>> {
>> struct dock_dependent_device *dd;
>> + struct dock_hotplug_info *info;
>> struct dock_station *dock_station;
>> int ret = -EINVAL;
>>
>> if (!dock_station_count)
>> return -ENODEV;
>>
>> + if (ops == NULL || ops->get == NULL || ops->put == NULL)
>> + return -EINVAL;
>> +
>> /*
>> * make sure this handle is for a device dependent on the dock,
>> * this would include the dock station itself
>> @@ -598,9 +603,18 @@ register_hotplug_dock_device(acpi_handle handle, const struct acpi_dock_ops *ops
>> */
>> dd = find_dock_dependent_device(dock_station, handle);
>> if (dd) {
>> - dd->ops = ops;
>> - dd->context = context;
>> - dock_add_hotplug_device(dock_station, dd);
>> + info = kzalloc(sizeof(*info), GFP_KERNEL);
>> + if (!info) {
>> + unregister_hotplug_dock_device(handle);
>> + ret = -ENOMEM;
>> + break;
>> + }
>> +
>> + info->ops = ops;
>> + info->context = context;
>> + info->handle = dd->handle;
>> + klist_add_tail(&info->node,
>> + &dock_station->hotplug_devices);
>> ret = 0;
>> }
>> }
>> @@ -615,16 +629,22 @@ EXPORT_SYMBOL_GPL(register_hotplug_dock_device);
>> */
>> void unregister_hotplug_dock_device(acpi_handle handle)
>> {
>> - struct dock_dependent_device *dd;
>> struct dock_station *dock_station;
>> + struct klist_iter iter;
>> + struct klist_node *node;
>> + struct dock_hotplug_info *info;
>>
>> if (!dock_station_count)
>> return;
>>
>> list_for_each_entry(dock_station, &dock_stations, sibling) {
>> - dd = find_dock_dependent_device(dock_station, handle);
>> - if (dd)
>> - dock_del_hotplug_device(dock_station, dd);
>> + klist_iter_init(&dock_station->hotplug_devices, &iter);
>> + while ((node = klist_next(&iter))) {
>> + info = node_to_info(node);
>> + if (info->handle == handle)
>> + klist_del(&info->node);
>> + }
>> + klist_iter_exit(&iter);
>> }
>> }
>> EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
>> @@ -951,7 +971,8 @@ static int __init dock_add(acpi_handle handle)
>> mutex_init(&dock_station->hp_lock);
>> spin_lock_init(&dock_station->dd_lock);
>> INIT_LIST_HEAD(&dock_station->sibling);
>> - INIT_LIST_HEAD(&dock_station->hotplug_devices);
>> + klist_init(&dock_station->hotplug_devices,
>> + hotplug_info_get, hotplug_info_put);
>> ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
>> INIT_LIST_HEAD(&dock_station->dependent_devices);
>>
>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>> index 716aa93..5d696f5 100644
>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>> @@ -145,9 +145,24 @@ static int post_dock_fixups(struct notifier_block *nb, unsigned long val,
>> return NOTIFY_OK;
>> }
>>
>> +static void acpiphp_dock_get(void *data)
>> +{
>> + struct acpiphp_func *func = data;
>> +
>> + get_bridge(func->slot->bridge);
>> +}
>> +
>> +static void acpiphp_dock_put(void *data)
>> +{
>> + struct acpiphp_func *func = data;
>> +
>> + put_bridge(func->slot->bridge);
>> +}
>>
>> static const struct acpi_dock_ops acpiphp_dock_ops = {
>> .handler = handle_hotplug_event_func,
>> + .get = acpiphp_dock_get,
>> + .put = acpiphp_dock_put,
>> };
>>
>> /* Check whether the PCI device is managed by native PCIe hotplug driver */
>> diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
>> index e6168a2..8fcc9ac 100644
>> --- a/include/acpi/acpi_drivers.h
>> +++ b/include/acpi/acpi_drivers.h
>> @@ -115,6 +115,8 @@ void pci_acpi_crs_quirks(void);
>> struct acpi_dock_ops {
>> acpi_notify_handler handler;
>> acpi_notify_handler uevent;
>> + void (*get)(void *data);
>> + void (*put)(void *data);
>> };
>>
>> #if defined(CONFIG_ACPI_DOCK) || defined(CONFIG_ACPI_DOCK_MODULE)
>>

2013-06-15 06:42:55

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/15 Jiang Liu <[email protected]>:
> Alexander E. Patrakov <[email protected]> reports two bugs related to
> dock station support on Sony VAIO VPCZ23A4R. Actually there are at least
> four bugs related to Sony VAIO VPCZ23A4R dock support.
> 1) can't correctly detect hotplug slot for dock state
> 2) resource leak on undocking
> 3) resource allocation failure for dock devices
> 4) one bug in intel_snd_hda driver
5) one or two bugs in radeon driver

As for the subject - only a hotplug-related part of bug 59581 is fixed.

> Hi Alexander, could you please help to test the whole patchset?
> Really get sleepy, it's already 3 oclock now, so please forgive me
> if there any stupid mistakes.

I have tested this patchset on top of 3.10-rc5 using the following two
testcases.

1. Boot docked with radeon blacklisted. Switch to text-based virtual
console, undock, dock, modprobe radeon, rmmod radeon, undock, dock,
modprobe radeon again. Tested in this sequence because Alex Deucher
told me in bug 59681 to rmmod radeon before undocking.

2. Boot undocked, with radeon not blacklisted. Stay in X. Dock,
undock, dock, undock, dock.

Both cases work, and exhibit similar backtraces in dmesg. So I am
attaching a dmesg only from the first testcase. Please look for "INFO:
trying to register non-static key" and for "*ERROR* Memory manager not
clean. Delaying takedown". I understand that the second trace is
unrelated to this patchset, but would like you to fix the first.

Note that the snd_hda_intel bug somehow didn't manifest itself today,
probably because the TV that is connected to the HDMI output of the
radeon card was off and because Xorg never really tried to use the
card.

--
Alexander E. Patrakov


Attachments:
dmesg.txt (98.78 kB)

2013-06-15 06:50:39

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUGFIX v2 3/4] PCI, ACPI: fix device destroying order issue when handling dock notification

On Fri, Jun 14, 2013 at 12:28 PM, Jiang Liu <[email protected]> wrote:
> Current ACPI glue logic expects that physical devices are destroyed
> before destroying companion ACPI devices, otherwise it will break the
> ACPI unbind logic and cause following warning messages:
> [ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt
> [ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
> [ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
> [ 180.013656] port1: Oops, 'acpi_handle' corrupt
> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
> for full log message.
>
> Above warning messages are caused by following scenario:
> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
> ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
> destroy physical devices, then destroys all affected ACPI devices.
> Everything seems perfect until now. But the acpiphp dock notification
> handler will queue another task (T2) onto kacpi_hotplug_wq to really
> destroy affected physical devices.
> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
> been destroyed.
> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
> devices.
>
> So it breaks ACPI glue logic's expection because ACPI devices are destroyed
> in step 3 and physical devices are destroyed in step 5.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Reported-by: Alexander E. Patrakov <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Yinghai Lu <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected] # 3.8+

Acked-by: Yinghai Lu <[email protected]>

> ---
> drivers/pci/hotplug/acpiphp_glue.c | 32 ++++++++++++++++++--------------
> 1 file changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 5d696f5..a65203b 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -61,6 +61,8 @@ static DEFINE_MUTEX(bridge_mutex);
> static void handle_hotplug_event_bridge (acpi_handle, u32, void *);
> static void acpiphp_sanitize_bus(struct pci_bus *bus);
> static void acpiphp_set_hpp_values(struct pci_bus *bus);
> +static void _handle_hotplug_event_func(acpi_handle handle, u32 type,
> + void *context);
> static void handle_hotplug_event_func(acpi_handle handle, u32 type, void *context);
> static void free_bridge(struct kref *kref);
>
> @@ -160,7 +162,7 @@ static void acpiphp_dock_put(void *data)
> }
>
> static const struct acpi_dock_ops acpiphp_dock_ops = {
> - .handler = handle_hotplug_event_func,
> + .handler = _handle_hotplug_event_func,
> .get = acpiphp_dock_get,
> .put = acpiphp_dock_put,
> };
> @@ -1080,22 +1082,13 @@ static void handle_hotplug_event_bridge(acpi_handle handle, u32 type,
> alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_bridge);
> }
>
> -static void _handle_hotplug_event_func(struct work_struct *work)
> +static void _handle_hotplug_event_func(acpi_handle handle, u32 type,
> + void *context)
> {
> - struct acpiphp_func *func;
> + struct acpiphp_func *func = context;
> char objname[64];
> struct acpi_buffer buffer = { .length = sizeof(objname),
> .pointer = objname };
> - struct acpi_hp_work *hp_work;
> - acpi_handle handle;
> - u32 type;
> -
> - hp_work = container_of(work, struct acpi_hp_work, work);
> - handle = hp_work->handle;
> - type = hp_work->type;
> - func = (struct acpiphp_func *)hp_work->context;
> -
> - acpi_scan_lock_acquire();
>
> acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
>
> @@ -1128,7 +1121,18 @@ static void _handle_hotplug_event_func(struct work_struct *work)
> warn("notify_handler: unknown event type 0x%x for %s\n", type, objname);
> break;
> }
> +}
> +
> +static void _handle_hotplug_event_cb(struct work_struct *work)
> +{
> + struct acpiphp_func *func;
> + struct acpi_hp_work *hp_work;
>
> + hp_work = container_of(work, struct acpi_hp_work, work);
> + func = (struct acpiphp_func *)hp_work->context;
> + acpi_scan_lock_acquire();
> + _handle_hotplug_event_func(hp_work->handle, hp_work->type,
> + hp_work->context);
> acpi_scan_lock_release();
> kfree(hp_work); /* allocated in handle_hotplug_event_func */
> put_bridge(func->slot->bridge);
> @@ -1156,7 +1160,7 @@ static void handle_hotplug_event_func(acpi_handle handle, u32 type,
> * don't deadlock on hotplug actions.
> */
> get_bridge(func->slot->bridge);
> - alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_func);
> + alloc_acpi_hp_work(handle, type, context, _handle_hotplug_event_cb);
> }
>
> /*
> --
> 1.8.1.2
>

2013-06-15 06:51:40

by Yinghai Lu

[permalink] [raw]
Subject: Re: [BUGFIX v2 1/4] ACPI, DOCK: initialize dock subsystem before scanning PCI root buses

On Fri, Jun 14, 2013 at 12:27 PM, Jiang Liu <[email protected]> wrote:
> Changeset "3b63aaa70e1 PCI: acpiphp: Do not use ACPI PCI subdriver
> mechanism" causes a regression which breaks ACPI dock support,
> please refer to https://bugzilla.kernel.org/show_bug.cgi?id=59501
>
> The root cause is that changeset 3b63aaa70e1 changed the relative
> initialization order of ACPI dock subsystem and acpiphp driver,
> and acpiphp driver has dependency on ACPI dock subsystem's
> initialization result, so that acpiphp can't correctly detect ACPI
> dock stations now.
>
> On the other hand, ACPI dock is a built-in driver, so we could
> explicitly initialize it before the acpiphp driver is used.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Reported-by: Alexander E. Patrakov <[email protected]>
> Tested-by: Alexander E. Patrakov <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: <[email protected]> # 3.9+
> ---
> drivers/acpi/dock.c | 7 +------
> drivers/acpi/internal.h | 5 +++++
> drivers/acpi/scan.c | 1 +
> 3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> index 4fdea38..02b0563 100644
> --- a/drivers/acpi/dock.c
> +++ b/drivers/acpi/dock.c
> @@ -1033,7 +1033,7 @@ find_dock_and_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
> return AE_OK;
> }
>
> -static int __init dock_init(void)
> +int __init acpi_dock_init(void)
> {
> if (acpi_disabled)
> return 0;
> @@ -1062,9 +1062,4 @@ static void __exit dock_exit(void)
> dock_remove(dock_station);
> }
>
> -/*
> - * Must be called before drivers of devices in dock, otherwise we can't know
> - * which devices are in a dock
> - */
> -subsys_initcall(dock_init);
> module_exit(dock_exit);

why not remove dock_exit?

> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 297cbf4..c610a76 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -40,6 +40,11 @@ void acpi_container_init(void);
> #else
> static inline void acpi_container_init(void) {}
> #endif
> +#ifdef CONFIG_ACPI_DOCK
> +void acpi_dock_init(void);
> +#else
> +static inline void acpi_dock_init(void) {}
> +#endif
> #ifdef CONFIG_ACPI_HOTPLUG_MEMORY
> void acpi_memory_hotplug_init(void);
> #else
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 44225cb..4148163 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -2045,6 +2045,7 @@ int __init acpi_scan_init(void)
> acpi_lpss_init();
> acpi_container_init();
> acpi_memory_hotplug_init();
> + acpi_dock_init();
>
> mutex_lock(&acpi_scan_lock);
> /*
> --
> 1.8.1.2
>

2013-06-15 07:25:51

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/15 Alexander E. Patrakov <[email protected]>:
> Note that the snd_hda_intel bug somehow didn't manifest itself today,
> probably because the TV that is connected to the HDMI output of the
> radeon card was off and because Xorg never really tried to use the
> card.

Well, it did. The sympthoms are the same as before: incomplete
undocking. The 16:00.1 device (HD audio controller on the radeon card)
has disappeared from lspci output, while 16:00.0 (the radeon card
itself) didn't, and the "Docked" LED didn't turn off. Fixed up by this
command:

fuser -k /dev/snd/* ; rmmod snd-hda-intel

thus confirming again that the extra references that acpiphp waits to
go away are from open fds.

--
Alexander E. Patrakov

2013-06-15 10:05:41

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 1/4] ACPI, DOCK: initialize dock subsystem before scanning PCI root buses

On 06/15/2013 02:51 PM, Yinghai Lu wrote:
> On Fri, Jun 14, 2013 at 12:27 PM, Jiang Liu <[email protected]> wrote:
>> Changeset "3b63aaa70e1 PCI: acpiphp: Do not use ACPI PCI subdriver
>> mechanism" causes a regression which breaks ACPI dock support,
>> please refer to https://bugzilla.kernel.org/show_bug.cgi?id=59501
>>
>> The root cause is that changeset 3b63aaa70e1 changed the relative
>> initialization order of ACPI dock subsystem and acpiphp driver,
>> and acpiphp driver has dependency on ACPI dock subsystem's
>> initialization result, so that acpiphp can't correctly detect ACPI
>> dock stations now.
>>
>> On the other hand, ACPI dock is a built-in driver, so we could
>> explicitly initialize it before the acpiphp driver is used.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Reported-by: Alexander E. Patrakov <[email protected]>
>> Tested-by: Alexander E. Patrakov <[email protected]>
>> Cc: "Rafael J. Wysocki" <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: <[email protected]> # 3.9+
>> ---
>> drivers/acpi/dock.c | 7 +------
>> drivers/acpi/internal.h | 5 +++++
>> drivers/acpi/scan.c | 1 +
>> 3 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
>> index 4fdea38..02b0563 100644
>> --- a/drivers/acpi/dock.c
>> +++ b/drivers/acpi/dock.c
>> @@ -1033,7 +1033,7 @@ find_dock_and_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
>> return AE_OK;
>> }
>>
>> -static int __init dock_init(void)
>> +int __init acpi_dock_init(void)
>> {
>> if (acpi_disabled)
>> return 0;
>> @@ -1062,9 +1062,4 @@ static void __exit dock_exit(void)
>> dock_remove(dock_station);
>> }
>>
>> -/*
>> - * Must be called before drivers of devices in dock, otherwise we can't know
>> - * which devices are in a dock
>> - */
>> -subsys_initcall(dock_init);
>> module_exit(dock_exit);
>
> why not remove dock_exit?
I have a pending patchset to clean up dock driver, Rafael suggested to
focus on bugfix first, so I will send out the clean up patchset later.

>
>> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
>> index 297cbf4..c610a76 100644
>> --- a/drivers/acpi/internal.h
>> +++ b/drivers/acpi/internal.h
>> @@ -40,6 +40,11 @@ void acpi_container_init(void);
>> #else
>> static inline void acpi_container_init(void) {}
>> #endif
>> +#ifdef CONFIG_ACPI_DOCK
>> +void acpi_dock_init(void);
>> +#else
>> +static inline void acpi_dock_init(void) {}
>> +#endif
>> #ifdef CONFIG_ACPI_HOTPLUG_MEMORY
>> void acpi_memory_hotplug_init(void);
>> #else
>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index 44225cb..4148163 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -2045,6 +2045,7 @@ int __init acpi_scan_init(void)
>> acpi_lpss_init();
>> acpi_container_init();
>> acpi_memory_hotplug_init();
>> + acpi_dock_init();
>>
>> mutex_lock(&acpi_scan_lock);
>> /*
>> --
>> 1.8.1.2
>>

2013-06-15 19:54:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 1/4] ACPI, DOCK: initialize dock subsystem before scanning PCI root buses

On Saturday, June 15, 2013 06:05:31 PM Jiang Liu wrote:
> On 06/15/2013 02:51 PM, Yinghai Lu wrote:
> > On Fri, Jun 14, 2013 at 12:27 PM, Jiang Liu <[email protected]> wrote:
> >> Changeset "3b63aaa70e1 PCI: acpiphp: Do not use ACPI PCI subdriver
> >> mechanism" causes a regression which breaks ACPI dock support,
> >> please refer to https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >>
> >> The root cause is that changeset 3b63aaa70e1 changed the relative
> >> initialization order of ACPI dock subsystem and acpiphp driver,
> >> and acpiphp driver has dependency on ACPI dock subsystem's
> >> initialization result, so that acpiphp can't correctly detect ACPI
> >> dock stations now.
> >>
> >> On the other hand, ACPI dock is a built-in driver, so we could
> >> explicitly initialize it before the acpiphp driver is used.
> >>
> >> Signed-off-by: Jiang Liu <[email protected]>
> >> Reported-by: Alexander E. Patrakov <[email protected]>
> >> Tested-by: Alexander E. Patrakov <[email protected]>
> >> Cc: "Rafael J. Wysocki" <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Cc: <[email protected]> # 3.9+
> >> ---
> >> drivers/acpi/dock.c | 7 +------
> >> drivers/acpi/internal.h | 5 +++++
> >> drivers/acpi/scan.c | 1 +
> >> 3 files changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> >> index 4fdea38..02b0563 100644
> >> --- a/drivers/acpi/dock.c
> >> +++ b/drivers/acpi/dock.c
> >> @@ -1033,7 +1033,7 @@ find_dock_and_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
> >> return AE_OK;
> >> }
> >>
> >> -static int __init dock_init(void)
> >> +int __init acpi_dock_init(void)
> >> {
> >> if (acpi_disabled)
> >> return 0;
> >> @@ -1062,9 +1062,4 @@ static void __exit dock_exit(void)
> >> dock_remove(dock_station);
> >> }
> >>
> >> -/*
> >> - * Must be called before drivers of devices in dock, otherwise we can't know
> >> - * which devices are in a dock
> >> - */
> >> -subsys_initcall(dock_init);
> >> module_exit(dock_exit);
> >
> > why not remove dock_exit?
> I have a pending patchset to clean up dock driver, Rafael suggested to
> focus on bugfix first, so I will send out the clean up patchset later.

Well, you can remove dock_exit() in the same patch. This is kind of not
directly related to the fix, but since you're changing it from modular to
non-modular, it'd be prudent to remove all of the "modular" code in the
process.

Thanks,
Rafael


> >> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> >> index 297cbf4..c610a76 100644
> >> --- a/drivers/acpi/internal.h
> >> +++ b/drivers/acpi/internal.h
> >> @@ -40,6 +40,11 @@ void acpi_container_init(void);
> >> #else
> >> static inline void acpi_container_init(void) {}
> >> #endif
> >> +#ifdef CONFIG_ACPI_DOCK
> >> +void acpi_dock_init(void);
> >> +#else
> >> +static inline void acpi_dock_init(void) {}
> >> +#endif
> >> #ifdef CONFIG_ACPI_HOTPLUG_MEMORY
> >> void acpi_memory_hotplug_init(void);
> >> #else
> >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> >> index 44225cb..4148163 100644
> >> --- a/drivers/acpi/scan.c
> >> +++ b/drivers/acpi/scan.c
> >> @@ -2045,6 +2045,7 @@ int __init acpi_scan_init(void)
> >> acpi_lpss_init();
> >> acpi_container_init();
> >> acpi_memory_hotplug_init();
> >> + acpi_dock_init();
> >>
> >> mutex_lock(&acpi_scan_lock);
> >> /*
> >> --
> >> 1.8.1.2
> >>
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-15 20:08:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote:
> > On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote:
> >> This is a preparation for next patch to avoid breaking bisecting.
> >> If next patch is applied without this one, it will cause deadlock
> >> as below:
> >>
> >> Case 1:
> >> [ 31.015593] Possible unsafe locking scenario:
> >>
> >> [ 31.018350] CPU0 CPU1
> >> [ 31.019691] ---- ----
> >> [ 31.021002] lock(&dock_station->hp_lock);
> >> [ 31.022327] lock(&slot->crit_sect);
> >> [ 31.023650] lock(&dock_station->hp_lock);
> >> [ 31.025010] lock(&slot->crit_sect);
> >> [ 31.026342]
> >>
> >> Case 2:
> >> hotplug_dock_devices()
> >> mutex_lock(&ds->hp_lock)
> >> dd->ops->handler()
> >> register_hotplug_dock_device()
> >> mutex_lock(&ds->hp_lock)
> >> [ 34.316570] [ INFO: possible recursive locking detected ]
> >> [ 34.316573] 3.10.0-rc4 #6 Tainted: G C
> >> [ 34.316575] ---------------------------------------------
> >> [ 34.316577] kworker/0:0/4 is trying to acquire lock:
> >> [ 34.316579] (&dock_station->hp_lock){+.+.+.}, at:
> >> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> >> [ 34.316588]
> >> but task is already holding lock:
> >> [ 34.316590] (&dock_station->hp_lock){+.+.+.}, at:
> >> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> >> [ 34.316595]
> >> other info that might help us debug this:
> >> [ 34.316597] Possible unsafe locking scenario:
> >>
> >> [ 34.316599] CPU0
> >> [ 34.316601] ----
> >> [ 34.316602] lock(&dock_station->hp_lock);
> >> [ 34.316605] lock(&dock_station->hp_lock);
> >> [ 34.316608]
> >> *** DEADLOCK ***
> >>
> >> So fix this deadlock by not taking ds->hp_lock in function
> >> register_hotplug_dock_device(). This patch also fixes a possible
> >> race conditions in function dock_event() because previously it
> >> accesses ds->hotplug_devices list without holding ds->hp_lock.
> >>
> >> Signed-off-by: Jiang Liu <[email protected]>
> >> Cc: Len Brown <[email protected]>
> >> Cc: "Rafael J. Wysocki" <[email protected]>
> >> Cc: Bjorn Helgaas <[email protected]>
> >> Cc: Yinghai Lu <[email protected]>
> >> Cc: Yijing Wang <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Cc: <[email protected]> # 3.8+
> >> ---
> >> drivers/acpi/dock.c | 109 ++++++++++++++++++++++---------------
> >> drivers/pci/hotplug/acpiphp_glue.c | 15 +++++
> >> include/acpi/acpi_drivers.h | 2 +
> >> 3 files changed, 82 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> >> index 02b0563..602bce5 100644
> >> --- a/drivers/acpi/dock.c
> >> +++ b/drivers/acpi/dock.c
> >> @@ -66,7 +66,7 @@ struct dock_station {
> >> spinlock_t dd_lock;
> >> struct mutex hp_lock;
> >> struct list_head dependent_devices;
> >> - struct list_head hotplug_devices;
> >> + struct klist hotplug_devices;
> >>
> >> struct list_head sibling;
> >> struct platform_device *dock_device;
> >> @@ -76,12 +76,18 @@ static int dock_station_count;
> >>
> >> struct dock_dependent_device {
> >> struct list_head list;
> >> - struct list_head hotplug_list;
> >> + acpi_handle handle;
> >> +};
> >> +
> >> +struct dock_hotplug_info {
> >> + struct klist_node node;
> >> acpi_handle handle;
> >> const struct acpi_dock_ops *ops;
> >> void *context;
> >> };
> >
> > Can we please relax a bit and possibly take a step back?
> >
> > So since your last reply to me wasn't particularly helpful, I went through the
> > code in dock.c and acpiphp_glue.c and I simply think that the whole
> > hotplug_list thing is simply redundant.
> >
> > It looks like instead of using it (or the klist in this patch), we can add a
> > "hotlpug_device" flag to dock_dependent_device and set that flag instead of
> > adding dd to hotplug_devices or clear it instead of removing dd from that list.
> >
> > That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
> > any more and perhaps we could make the code simpler instead of making it more
> > complex.
> >
> > How does that sound?
> >
> > Rafael
> Hi Rafael,
> Thanks for comments! It would be great if we could kill the
> hotplug_devices
> list so thing gets simple. But there are still some special cases:(
>
> As you have mentioned, ds->hp_lock is used to make both addition and
> removal
> of hotplug devices wait for us to complete walking ds->hotplug_devices.
> So it acts as two roles:
> 1) protect the hotplug_devices list,
> 2) serialize unregister_hotplug_dock_device() and
> hotplug_dock_devices() so
> the dock driver doesn't access registered handler and associated data
> structure
> once returing from unregister_hotplug_dock_device().

When it returns from unregister_hotplug_dock_device(), nothing prevents it
from accessing whatever it wants, because ds->hp_lock is not used outside
of the add/del and hotplug_dock_devices(). So, the actual role of
ds->hp_lock (not the one that it is supposed to play, but the real one)
is to prevent addition/deletion from happening when hotplug_dock_devices()
is running. [Yes, it does protect the list, but since the list is in fact
unnecessary, that doesn't matter.]

> If we simply use a flag to mark presence of registered callback, we
> can't achieve the second goal.

I don't mean using the flag *alone*.

> Take the sony laptop as an example. It has several PCI
> hotplug
> slot associated with the dock station:
> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB
> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM0
> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM1
> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2
> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
>
> So it still has some race windows if we undock the station while
> repeatedly rescanning/removing
> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces. For
> example, thread 1 is
> handling undocking event, walking the dependent device list and
> invoking registered callback
> handler with associated data. While that, thread 2 may step in to
> unregister the callback for
> \_SB_.PCI0.RP07.LPMB.LPM0. Then thread 1 may cause access-after-free
> issue.

That should be handled in acpiphp_glue instead of dock. What you're trying to
do is to make dock work around synchronization issues in the acpiphp driver.

> The klist patch solves this issue by adding a "put" callback method to
> explicitly notify
> dock client that the dock core has done with previously registered
> handler and associated
> data.

Honestly, don't you think this is overly compilcated?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-15 21:11:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> > On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote:
> > > On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote:
> > >> This is a preparation for next patch to avoid breaking bisecting.
> > >> If next patch is applied without this one, it will cause deadlock
> > >> as below:
> > >>
> > >> Case 1:
> > >> [ 31.015593] Possible unsafe locking scenario:
> > >>
> > >> [ 31.018350] CPU0 CPU1
> > >> [ 31.019691] ---- ----
> > >> [ 31.021002] lock(&dock_station->hp_lock);
> > >> [ 31.022327] lock(&slot->crit_sect);
> > >> [ 31.023650] lock(&dock_station->hp_lock);
> > >> [ 31.025010] lock(&slot->crit_sect);
> > >> [ 31.026342]
> > >>
> > >> Case 2:
> > >> hotplug_dock_devices()
> > >> mutex_lock(&ds->hp_lock)
> > >> dd->ops->handler()
> > >> register_hotplug_dock_device()
> > >> mutex_lock(&ds->hp_lock)
> > >> [ 34.316570] [ INFO: possible recursive locking detected ]
> > >> [ 34.316573] 3.10.0-rc4 #6 Tainted: G C
> > >> [ 34.316575] ---------------------------------------------
> > >> [ 34.316577] kworker/0:0/4 is trying to acquire lock:
> > >> [ 34.316579] (&dock_station->hp_lock){+.+.+.}, at:
> > >> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> > >> [ 34.316588]
> > >> but task is already holding lock:
> > >> [ 34.316590] (&dock_station->hp_lock){+.+.+.}, at:
> > >> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> > >> [ 34.316595]
> > >> other info that might help us debug this:
> > >> [ 34.316597] Possible unsafe locking scenario:
> > >>
> > >> [ 34.316599] CPU0
> > >> [ 34.316601] ----
> > >> [ 34.316602] lock(&dock_station->hp_lock);
> > >> [ 34.316605] lock(&dock_station->hp_lock);
> > >> [ 34.316608]
> > >> *** DEADLOCK ***
> > >>
> > >> So fix this deadlock by not taking ds->hp_lock in function
> > >> register_hotplug_dock_device(). This patch also fixes a possible
> > >> race conditions in function dock_event() because previously it
> > >> accesses ds->hotplug_devices list without holding ds->hp_lock.
> > >>
> > >> Signed-off-by: Jiang Liu <[email protected]>
> > >> Cc: Len Brown <[email protected]>
> > >> Cc: "Rafael J. Wysocki" <[email protected]>
> > >> Cc: Bjorn Helgaas <[email protected]>
> > >> Cc: Yinghai Lu <[email protected]>
> > >> Cc: Yijing Wang <[email protected]>
> > >> Cc: [email protected]
> > >> Cc: [email protected]
> > >> Cc: [email protected]
> > >> Cc: <[email protected]> # 3.8+
> > >> ---
> > >> drivers/acpi/dock.c | 109 ++++++++++++++++++++++---------------
> > >> drivers/pci/hotplug/acpiphp_glue.c | 15 +++++
> > >> include/acpi/acpi_drivers.h | 2 +
> > >> 3 files changed, 82 insertions(+), 44 deletions(-)
> > >>
> > >> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> > >> index 02b0563..602bce5 100644
> > >> --- a/drivers/acpi/dock.c
> > >> +++ b/drivers/acpi/dock.c
> > >> @@ -66,7 +66,7 @@ struct dock_station {
> > >> spinlock_t dd_lock;
> > >> struct mutex hp_lock;
> > >> struct list_head dependent_devices;
> > >> - struct list_head hotplug_devices;
> > >> + struct klist hotplug_devices;
> > >>
> > >> struct list_head sibling;
> > >> struct platform_device *dock_device;
> > >> @@ -76,12 +76,18 @@ static int dock_station_count;
> > >>
> > >> struct dock_dependent_device {
> > >> struct list_head list;
> > >> - struct list_head hotplug_list;
> > >> + acpi_handle handle;
> > >> +};
> > >> +
> > >> +struct dock_hotplug_info {
> > >> + struct klist_node node;
> > >> acpi_handle handle;
> > >> const struct acpi_dock_ops *ops;
> > >> void *context;
> > >> };
> > >
> > > Can we please relax a bit and possibly take a step back?
> > >
> > > So since your last reply to me wasn't particularly helpful, I went through the
> > > code in dock.c and acpiphp_glue.c and I simply think that the whole
> > > hotplug_list thing is simply redundant.
> > >
> > > It looks like instead of using it (or the klist in this patch), we can add a
> > > "hotlpug_device" flag to dock_dependent_device and set that flag instead of
> > > adding dd to hotplug_devices or clear it instead of removing dd from that list.
> > >
> > > That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
> > > any more and perhaps we could make the code simpler instead of making it more
> > > complex.
> > >
> > > How does that sound?
> > >
> > > Rafael
> > Hi Rafael,
> > Thanks for comments! It would be great if we could kill the
> > hotplug_devices
> > list so thing gets simple. But there are still some special cases:(
> >
> > As you have mentioned, ds->hp_lock is used to make both addition and
> > removal
> > of hotplug devices wait for us to complete walking ds->hotplug_devices.
> > So it acts as two roles:
> > 1) protect the hotplug_devices list,
> > 2) serialize unregister_hotplug_dock_device() and
> > hotplug_dock_devices() so
> > the dock driver doesn't access registered handler and associated data
> > structure
> > once returing from unregister_hotplug_dock_device().
>
> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> from accessing whatever it wants, because ds->hp_lock is not used outside
> of the add/del and hotplug_dock_devices(). So, the actual role of
> ds->hp_lock (not the one that it is supposed to play, but the real one)
> is to prevent addition/deletion from happening when hotplug_dock_devices()
> is running. [Yes, it does protect the list, but since the list is in fact
> unnecessary, that doesn't matter.]
>
> > If we simply use a flag to mark presence of registered callback, we
> > can't achieve the second goal.
>
> I don't mean using the flag *alone*.
>
> > Take the sony laptop as an example. It has several PCI
> > hotplug
> > slot associated with the dock station:
> > [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB
> > [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM0
> > [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM1
> > [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2
> > [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> > [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> > [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> > [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> > [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
> >
> > So it still has some race windows if we undock the station while
> > repeatedly rescanning/removing
> > the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.

Which sysfs interfaces do you mean, by the way?

If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
should always be run under acpi_scan_lock too. It isn't at the moment,
because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
bug (so I'm going to send a patch to fix it in a while).

With that bug fixed, the possible race between acpi_eject_store() and
hotplug_dock_devices() should be prevented from happening, so perhaps we're
worrying about something that cannot happen?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-15 22:45:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Saturday, June 15, 2013 11:20:40 PM Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:

[...]

>
> Which sysfs interfaces do you mean, by the way?
>
> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> should always be run under acpi_scan_lock too. It isn't at the moment,
> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> bug (so I'm going to send a patch to fix it in a while).
>
> With that bug fixed, the possible race between acpi_eject_store() and
> hotplug_dock_devices() should be prevented from happening, so perhaps we're
> worrying about something that cannot happen?

So here's a question: What particular races are possible if we remove
ds->hp_lock entirely without doing anything else just yet? I mean, how to
*trigger* them from the start to the end and not how they can possibly happen
but never do, because there's no way they can be actually triggered?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-16 16:27:23

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On 06/16/2013 04:17 AM, Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
>> On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote:
[...]
>>> Can we please relax a bit and possibly take a step back?
>>>
>>> So since your last reply to me wasn't particularly helpful, I went through the
>>> code in dock.c and acpiphp_glue.c and I simply think that the whole
>>> hotplug_list thing is simply redundant.
>>>
>>> It looks like instead of using it (or the klist in this patch), we can add a
>>> "hotlpug_device" flag to dock_dependent_device and set that flag instead of
>>> adding dd to hotplug_devices or clear it instead of removing dd from that list.
>>>
>>> That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
>>> any more and perhaps we could make the code simpler instead of making it more
>>> complex.
>>>
>>> How does that sound?
>>>
>>> Rafael
>> Hi Rafael,
>> Thanks for comments! It would be great if we could kill the
>> hotplug_devices
>> list so thing gets simple. But there are still some special cases:(
>>
>> As you have mentioned, ds->hp_lock is used to make both addition and
>> removal
>> of hotplug devices wait for us to complete walking ds->hotplug_devices.
>> So it acts as two roles:
>> 1) protect the hotplug_devices list,
>> 2) serialize unregister_hotplug_dock_device() and
>> hotplug_dock_devices() so
>> the dock driver doesn't access registered handler and associated data
>> structure
>> once returing from unregister_hotplug_dock_device().
>
> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> from accessing whatever it wants, because ds->hp_lock is not used outside
> of the add/del and hotplug_dock_devices(). So, the actual role of
> ds->hp_lock (not the one that it is supposed to play, but the real one)
> is to prevent addition/deletion from happening when hotplug_dock_devices()
> is running. [Yes, it does protect the list, but since the list is in fact
> unnecessary, that doesn't matter.]
Hi Rafael,
With current implementation function dock_add_hotplug_device(),
dock_del_hotplug_device(), hotplug_dock_devices() and dock_event()
access hotplug_devices list, registered callback(ops) and associated
data(context).

Caller may free the associated data(context) once returned from function
unregister_hotplug_dock_device(), so the dock core shouldn't access the
data(context) any more after returning from
unregister_hotplug_dock_device(). With current implementation, this is
guaranteed by acquiring ds->hp_lock in function
dock_del_hotplug_device(). But there's another issue with current
implementation here, we should use ds->hp_lock to protect dock_event() too.

>
>> If we simply use a flag to mark presence of registered callback, we
>> can't achieve the second goal.
>
> I don't mean using the flag *alone*.
>
>> Take the sony laptop as an example. It has several PCI
>> hotplug
>> slot associated with the dock station:
>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB
>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Btus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
>>
>> So it still has some race windows if we undock the station while
>> repeatedly rescanning/removing
>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces. For
>> example, thread 1 is
>> handling undocking event, walking the dependent device list and
>> invoking registered callback
>> handler with associated data. While that, thread 2 may step in to
>> unregister the callback for
>> \_SB_.PCI0.RP07.LPMB.LPM0. Then thread 1 may cause access-after-free
>> issue.
>
> That should be handled in acpiphp_glue instead of dock. What you're trying to
> do is to make dock work around synchronization issues in the acpiphp driver.
I'm not sure whether we could solve this issue from acpiphp_glue side.
If dock driver can't guarantee that it doesn't access registered
handler(ops) and associated data(context) after returning from
unregister_hotplug_dock_device(), it will be hard for acpiphp_glue to
manage the data structure(context). So I introduced a "put" method into
the acpi_dock_ops to help manage lifecycle of "context".

>
>> The klist patch solves this issue by adding a "put" callback method to
>> explicitly notify
>> dock client that the dock core has done with previously registered
>> handler and associated
>> data.
>
> Honestly, don't you think this is overly compilcated?
Yeah, I must admire that it's really a little over complicated, but I
can't find a simpler solution here:(

>
> Rafael
>
>

2013-06-16 17:02:02

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On 06/16/2013 05:20 AM, Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
>> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
[...]
>> When it returns from unregister_hotplug_dock_device(), nothing prevents it
>> from accessing whatever it wants, because ds->hp_lock is not used outside
>> of the add/del and hotplug_dock_devices(). So, the actual role of
>> ds->hp_lock (not the one that it is supposed to play, but the real one)
>> is to prevent addition/deletion from happening when hotplug_dock_devices()
>> is running. [Yes, it does protect the list, but since the list is in fact
>> unnecessary, that doesn't matter.]
>>
>>> If we simply use a flag to mark presence of registered callback, we
>>> can't achieve the second goal.
>>
>> I don't mean using the flag *alone*.
>>
>>> Take the sony laptop as an example. It has several PCI
>>> hotplug
>>> slot associated with the dock station:
>>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB
>>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
>>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
>>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
>>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
>>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
>>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
>>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
>>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
>>>
>>> So it still has some race windows if we undock the station while
>>> repeatedly rescanning/removing
>>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.
>
> Which sysfs interfaces do you mean, by the way?
>
> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> should always be run under acpi_scan_lock too. It isn't at the moment,
> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> bug (so I'm going to send a patch to fix it in a while).
>
> With that bug fixed, the possible race between acpi_eject_store() and
> hotplug_dock_devices() should be prevented from happening, so perhaps we're
> worrying about something that cannot happen?
Hi Rafael,
I mean the "remove" method of each PCI device, and the "power" method
of PCI hotplug slot here.
These methods may be used to remove P2P bridges with associated ACPIPHP
hotplug slots, which in turn will cause invoking of
unregister_hotplug_dock_device().
So theoretical we may trigger the bug by undocking while repeatedly
adding/removing P2P bridges with ACPIPHP hotplug slot through PCI
"rescan" and "remove" sysfs interface,
Regards!
Gerry
>
> Rafael
>
>

2013-06-16 17:12:10

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On 06/16/2013 06:54 AM, Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 11:20:40 PM Rafael J. Wysocki wrote:
>> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
>
> [...]
>
>>
>> Which sysfs interfaces do you mean, by the way?
>>
>> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
>> should always be run under acpi_scan_lock too. It isn't at the moment,
>> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
>> bug (so I'm going to send a patch to fix it in a while).
>>
>> With that bug fixed, the possible race between acpi_eject_store() and
>> hotplug_dock_devices() should be prevented from happening, so perhaps we're
>> worrying about something that cannot happen?
>
> So here's a question: What particular races are possible if we remove
> ds->hp_lock entirely without doing anything else just yet? I mean, how to
> *trigger* them from the start to the end and not how they can possibly happen
> but never do, because there's no way they can be actually triggered?
Hi Rafael,
I have no really platform which triggers this bug, but I may imagine
a possible platform if it's valid for explanation.
Let's think about a laptop dock station with a thunderbolt
controller built-in. The dock station is managed by dock driver and
acpiphp driver. And the PCIe hierarchy managed by the thunderbolt
controller may be managed by dock driver and ACPIPHP driver too.
So it may trigger the issue by pressing the dock button and unplugging
thunderbolt cable concurrently.
But after all, this is all by imagination:). We may need to find a
simple and quick solution for 3.10 and the stable trees and enhance the
solution later to avoid introducing new bugs while fixing a bug.
Regards!
Gerry
>
> Rafael
>
>

2013-06-16 17:34:07

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On 06/15/2013 02:42 PM, Alexander E. Patrakov wrote:
> 2013/6/15 Jiang Liu <[email protected]>:
>> Alexander E. Patrakov <[email protected]> reports two bugs related to
>> dock station support on Sony VAIO VPCZ23A4R. Actually there are at least
>> four bugs related to Sony VAIO VPCZ23A4R dock support.
>> 1) can't correctly detect hotplug slot for dock state
>> 2) resource leak on undocking
>> 3) resource allocation failure for dock devices
>> 4) one bug in intel_snd_hda driver
> 5) one or two bugs in radeon driver
>
> As for the subject - only a hotplug-related part of bug 59581 is fixed.
>
>> Hi Alexander, could you please help to test the whole patchset?
>> Really get sleepy, it's already 3 oclock now, so please forgive me
>> if there any stupid mistakes.
>
> I have tested this patchset on top of 3.10-rc5 using the following two
> testcases.
>
> 1. Boot docked with radeon blacklisted. Switch to text-based virtual
> console, undock, dock, modprobe radeon, rmmod radeon, undock, dock,
> modprobe radeon again. Tested in this sequence because Alex Deucher
> told me in bug 59681 to rmmod radeon before undocking.
>
> 2. Boot undocked, with radeon not blacklisted. Stay in X. Dock,
> undock, dock, undock, dock.
>
> Both cases work, and exhibit similar backtraces in dmesg. So I am
> attaching a dmesg only from the first testcase. Please look for "INFO:
> trying to register non-static key" and for "*ERROR* Memory manager not
> clean. Delaying takedown". I understand that the second trace is
> unrelated to this patchset, but would like you to fix the first.
Hi Alexander,
Great thanks for your testing!
I have investigated the first issue but still have no idea about it.
According to source code, everything should be OK.
The r8169 driver has invoked INIT_WORK(&tp->wk.work, rtl_task) to
initialize the work item, so it shouldn't trigger the warning.
Have you found this issue in other test cases?
Regards!
Gerry

>
> Note that the snd_hda_intel bug somehow didn't manifest itself today,
> probably because the TV that is connected to the HDMI output of the
> radeon card was off and because Xorg never really tried to use the
> card.
>
> --
> Alexander E. Patrakov
>

2013-06-17 03:27:57

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/16 Jiang Liu <[email protected]>:
> On 06/15/2013 02:42 PM, Alexander E. Patrakov wrote:

>> Both cases work, and exhibit similar backtraces in dmesg. So I am
>> attaching a dmesg only from the first testcase. Please look for "INFO:
>> trying to register non-static key" and for "*ERROR* Memory manager not
>> clean. Delaying takedown". I understand that the second trace is
>> unrelated to this patchset, but would like you to fix the first.
> Hi Alexander,
> Great thanks for your testing!
> I have investigated the first issue but still have no idea about it.
> According to source code, everything should be OK.
> The r8169 driver has invoked INIT_WORK(&tp->wk.work, rtl_task) to
> initialize the work item, so it shouldn't trigger the warning.
> Have you found this issue in other test cases?

I am not 100% sure that the static key backtrace is the same in all
cases. However, I do remember that it does appear both in the
non-hotplug and hotplug cases.

I will retest this after work, i.e. in ~12 hours.

--
Alexander E. Patrakov

2013-06-17 11:29:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Monday, June 17, 2013 01:01:51 AM Jiang Liu wrote:
> On 06/16/2013 05:20 AM, Rafael J. Wysocki wrote:
> > On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> >> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> [...]
> >> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> >> from accessing whatever it wants, because ds->hp_lock is not used outside
> >> of the add/del and hotplug_dock_devices(). So, the actual role of
> >> ds->hp_lock (not the one that it is supposed to play, but the real one)
> >> is to prevent addition/deletion from happening when hotplug_dock_devices()
> >> is running. [Yes, it does protect the list, but since the list is in fact
> >> unnecessary, that doesn't matter.]
> >>
> >>> If we simply use a flag to mark presence of registered callback, we
> >>> can't achieve the second goal.
> >>
> >> I don't mean using the flag *alone*.
> >>
> >>> Take the sony laptop as an example. It has several PCI
> >>> hotplug
> >>> slot associated with the dock station:
> >>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB
> >>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
> >>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
> >>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
> >>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> >>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> >>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> >>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> >>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
> >>>
> >>> So it still has some race windows if we undock the station while
> >>> repeatedly rescanning/removing
> >>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.
> >
> > Which sysfs interfaces do you mean, by the way?
> >
> > If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> > should always be run under acpi_scan_lock too. It isn't at the moment,
> > because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> > bug (so I'm going to send a patch to fix it in a while).
> >
> > With that bug fixed, the possible race between acpi_eject_store() and
> > hotplug_dock_devices() should be prevented from happening, so perhaps we're
> > worrying about something that cannot happen?
> Hi Rafael,
> I mean the "remove" method of each PCI device, and the "power" method
> of PCI hotplug slot here.
> These methods may be used to remove P2P bridges with associated ACPIPHP
> hotplug slots, which in turn will cause invoking of
> unregister_hotplug_dock_device().
> So theoretical we may trigger the bug by undocking while repeatedly
> adding/removing P2P bridges with ACPIPHP hotplug slot through PCI
> "rescan" and "remove" sysfs interface,

Why don't we make these things take acpi_scan_lock upfront, then?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-17 11:31:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Monday, June 17, 2013 01:12:00 AM Jiang Liu wrote:
> On 06/16/2013 06:54 AM, Rafael J. Wysocki wrote:
> > On Saturday, June 15, 2013 11:20:40 PM Rafael J. Wysocki wrote:
> >> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> >
> > [...]
> >
> >>
> >> Which sysfs interfaces do you mean, by the way?
> >>
> >> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> >> should always be run under acpi_scan_lock too. It isn't at the moment,
> >> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> >> bug (so I'm going to send a patch to fix it in a while).
> >>
> >> With that bug fixed, the possible race between acpi_eject_store() and
> >> hotplug_dock_devices() should be prevented from happening, so perhaps we're
> >> worrying about something that cannot happen?
> >
> > So here's a question: What particular races are possible if we remove
> > ds->hp_lock entirely without doing anything else just yet? I mean, how to
> > *trigger* them from the start to the end and not how they can possibly happen
> > but never do, because there's no way they can be actually triggered?
> Hi Rafael,
> I have no really platform which triggers this bug, but I may imagine
> a possible platform if it's valid for explanation.
> Let's think about a laptop dock station with a thunderbolt
> controller built-in. The dock station is managed by dock driver and
> acpiphp driver. And the PCIe hierarchy managed by the thunderbolt
> controller may be managed by dock driver and ACPIPHP driver too.
> So it may trigger the issue by pressing the dock button and unplugging
> thunderbolt cable concurrently.
> But after all, this is all by imagination:). We may need to find a
> simple and quick solution for 3.10 and the stable trees and enhance the
> solution later to avoid introducing new bugs while fixing a bug.

Well, if that particular bug is not fixed in 3,10, it won't be a tragedy,
and I *really* don't want that code to get substantially more complex than
it is now.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-17 11:48:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 4/4] ACPIPHP: fix bug 56531 Sony VAIO VPCZ23A4R: can't assign mem/io after docking

On Saturday, June 15, 2013 03:28:01 AM Jiang Liu wrote:
> Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=56531 for
> more information.
>
> This issue is caused by differences in PCI resource assignment between
> boot time and runtime hotplug. On x86 platforms, OS respects PCI
> resource assignment from BIOS and only reassign resources for unassigned
> BARs at boot time. But with acpiphp, it ignores BIOS resource assignment
> and reassign all resources by itself.
>
> If we have enough resources, reassigning all PCI resources should work
> too, but it may fail if we are under resource constraints. On the other
> handle, current PCI MMIO alignment algorithm may waste huge MMIO address
> space if we have some PCI devices with huge MMIO BARs.
>
> On this Sony laptop, BIOS allocates limited MMIO resources for the dock
> station and the dock station has a gfx which has a 256MB MMIO BAR.
> So current acpiphp driver fails to allocate resources for most devices
> on the dock station.
>
> So change acpiphp driver to follow boot time behavior to respect BIOS
> resource assignment.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Reported-by: Alexander E. Patrakov <[email protected]>
> Tested-by: Alexander E. Patrakov <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: <[email protected]>

The code is fine as far as I'm concerned, but I have one request. ->

> ---
> drivers/pci/hotplug/acpiphp_glue.c | 7 +++++--
> drivers/pci/pci.h | 5 +++++
> drivers/pci/setup-bus.c | 8 ++++----
> 3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index a65203b..f4a53e9 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -687,6 +687,7 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> struct pci_bus *bus = slot->bridge->pci_bus;
> struct acpiphp_func *func;
> int num, max, pass;
> + LIST_HEAD(add_list);
>
> if (slot->flags & SLOT_ENABLED)
> goto err_exit;
> @@ -711,13 +712,15 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> max = pci_scan_bridge(bus, dev, max, pass);
> if (pass && dev->subordinate) {
> check_hotplug_bridge(slot, dev);
> - pci_bus_size_bridges(dev->subordinate);

Please add a comment explaining why we need to do this here.

> + pcibios_resource_survey_bus(dev->subordinate);
> + __pci_bus_size_bridges(dev->subordinate,
> + &add_list);
> }
> }
> }
> }
>
> - pci_bus_assign_resources(bus);
> + __pci_bus_assign_resources(bus, &add_list, NULL);
> acpiphp_sanitize_bus(bus);
> acpiphp_set_hpp_values(bus);
> acpiphp_set_acpi_region(slot);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 68678ed..d1182c4 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -202,6 +202,11 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> struct resource *res, unsigned int reg);
> int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type);
> void pci_configure_ari(struct pci_dev *dev);
> +void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> + struct list_head *realloc_head);
> +void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> + struct list_head *realloc_head,
> + struct list_head *fail_head);
>
> /**
> * pci_ari_enabled - query ARI forwarding status
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 16abaaa..d254e23 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1044,7 +1044,7 @@ handle_done:
> ;
> }
>
> -static void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> +void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> @@ -1115,9 +1115,9 @@ void __ref pci_bus_size_bridges(struct pci_bus *bus)
> }
> EXPORT_SYMBOL(pci_bus_size_bridges);
>
> -static void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> - struct list_head *realloc_head,
> - struct list_head *fail_head)
> +void __ref __pci_bus_assign_resources(const struct pci_bus *bus,
> + struct list_head *realloc_head,
> + struct list_head *fail_head)
> {
> struct pci_bus *b;
> struct pci_dev *dev;

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-17 12:45:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Monday, June 17, 2013 01:39:04 PM Rafael J. Wysocki wrote:
> On Monday, June 17, 2013 01:01:51 AM Jiang Liu wrote:
> > On 06/16/2013 05:20 AM, Rafael J. Wysocki wrote:
> > > On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> > >> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> > [...]
> > >> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> > >> from accessing whatever it wants, because ds->hp_lock is not used outside
> > >> of the add/del and hotplug_dock_devices(). So, the actual role of
> > >> ds->hp_lock (not the one that it is supposed to play, but the real one)
> > >> is to prevent addition/deletion from happening when hotplug_dock_devices()
> > >> is running. [Yes, it does protect the list, but since the list is in fact
> > >> unnecessary, that doesn't matter.]
> > >>
> > >>> If we simply use a flag to mark presence of registered callback, we
> > >>> can't achieve the second goal.
> > >>
> > >> I don't mean using the flag *alone*.
> > >>
> > >>> Take the sony laptop as an example. It has several PCI
> > >>> hotplug
> > >>> slot associated with the dock station:
> > >>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB
> > >>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
> > >>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
> > >>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
> > >>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> > >>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> > >>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> > >>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> > >>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
> > >>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
> > >>>
> > >>> So it still has some race windows if we undock the station while
> > >>> repeatedly rescanning/removing
> > >>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.
> > >
> > > Which sysfs interfaces do you mean, by the way?
> > >
> > > If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> > > should always be run under acpi_scan_lock too. It isn't at the moment,
> > > because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> > > bug (so I'm going to send a patch to fix it in a while).
> > >
> > > With that bug fixed, the possible race between acpi_eject_store() and
> > > hotplug_dock_devices() should be prevented from happening, so perhaps we're
> > > worrying about something that cannot happen?
> > Hi Rafael,
> > I mean the "remove" method of each PCI device, and the "power" method
> > of PCI hotplug slot here.
> > These methods may be used to remove P2P bridges with associated ACPIPHP
> > hotplug slots, which in turn will cause invoking of
> > unregister_hotplug_dock_device().
> > So theoretical we may trigger the bug by undocking while repeatedly
> > adding/removing P2P bridges with ACPIPHP hotplug slot through PCI
> > "rescan" and "remove" sysfs interface,
>
> Why don't we make these things take acpi_scan_lock upfront, then?

Or perhaps (and maybe better) why don't we replace ds->hp_lock by another
lock that will be acquired upper in the call chain so that
dock_add_hotplug_device(), dock_del_hotplug_device(), hotplug_dock_devices()
and dock_event() are all guaranteed to be called under that lock?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-17 17:07:11

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/17 Alexander E. Patrakov <[email protected]>:
> I am not 100% sure that the static key backtrace is the same in all
> cases. However, I do remember that it does appear both in the
> non-hotplug and hotplug cases.
>
> I will retest this after work, i.e. in ~12 hours.

Tried again, failed to reproduce this warning. Sorry :(

--
Alexander E. Patrakov

2013-06-18 15:13:21

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On 06/18/2013 01:07 AM, Alexander E. Patrakov wrote:
> 2013/6/17 Alexander E. Patrakov <[email protected]>:
>> I am not 100% sure that the static key backtrace is the same in all
>> cases. However, I do remember that it does appear both in the
>> non-hotplug and hotplug cases.
>>
>> I will retest this after work, i.e. in ~12 hours.
>
> Tried again, failed to reproduce this warning. Sorry :(
Sounds like another memory corruption caused by race conditions:(

>
> --
> Alexander E. Patrakov
>

2013-06-18 15:36:59

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On 06/17/2013 07:39 PM, Rafael J. Wysocki wrote:
> On Monday, June 17, 2013 01:01:51 AM Jiang Liu wrote:
>> On 06/16/2013 05:20 AM, Rafael J. Wysocki wrote:
>>> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
>>>> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
>> [...]
>>>> When it returns from unregister_hotplug_dock_device(), nothing prevents it
>>>> from accessing whatever it wants, because ds->hp_lock is not used outside
>>>> of the add/del and hotplug_dock_devices(). So, the actual role of
>>>> ds->hp_lock (not the one that it is supposed to play, but the real one)
>>>> is to prevent addition/deletion from happening when hotplug_dock_devices()
>>>> is running. [Yes, it does protect the list, but since the list is in fact
>>>> unnecessary, that doesn't matter.]
>>>>
>>>>> If we simply use a flag to mark presence of registered callback, we
>>>>> can't achieve the second goal.
>>>>
>>>> I don't mean using the flag *alone*.
>>>>
>>>>> Take the sony laptop as an example. It has several PCI
>>>>> hotplug
>>>>> slot associated with the dock station:
>>>>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB
>>>>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
>>>>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
>>>>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
>>>>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
>>>>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
>>>>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
>>>>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
>>>>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
>>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
>>>>>
>>>>> So it still has some race windows if we undock the station while
>>>>> repeatedly rescanning/removing
>>>>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.
>>>
>>> Which sysfs interfaces do you mean, by the way?
>>>
>>> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
>>> should always be run under acpi_scan_lock too. It isn't at the moment,t
>>> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
>>> bug (so I'm going to send a patch to fix it in a while).
>>>
>>> With that bug fixed, the possible race between acpi_eject_store() and
>>> hotplug_dock_devices() should be prevented from happening, so perhaps we're
>>> worrying about something that cannot happen?
>> Hi Rafael,
>> I mean the "remove" method of each PCI device, and the "power" method
>> of PCI hotplug slot here.
>> These methods may be used to remove P2P bridges with associated ACPIPHP
>> hotplug slots, which in turn will cause invoking of
>> unregister_hotplug_dock_device().
>> So theoretical we may trigger the bug by undocking while repeatedly
>> adding/removing P2P bridges with ACPIPHP hotplug slot through PCI
>> "rescan" and "remove" sysfs interface,
>
> Why don't we make these things take acpi_scan_lock upfront, then?
Hi Rafael,
Seems we can't rely on acpi_scan_lock here, it may cause another
deadlock scenario:
1) thread 1 acquired the acpi_scan_lock and tries to destroy all sysfs
interfaces for PCI devices.
2) thread 2 opens a PCI sysfs which then tries to acquire the
acpi_scan_lock.
Regards!
Gerry

>
> Rafael
>
>

2013-06-18 16:03:45

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On 06/17/2013 07:40 PM, Rafael J. Wysocki wrote:
> On Monday, June 17, 2013 01:12:00 AM Jiang Liu wrote:
>> On 06/16/2013 06:54 AM, Rafael J. Wysocki wrote:
>>> On Saturday, June 15, 2013 11:20:40 PM Rafael J. Wysocki wrote:
>>>> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
>>>
>>> [...]
>>>
>>>>
>>>> Which sysfs interfaces do you mean, by the way?
>>>>
>>>> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
>>>> should always be run under acpi_scan_lock too. It isn't at the moment,
>>>> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
>>>> bug (so I'm going to send a patch to fix it in a while).
>>>>
>>>> With that bug fixed, the possible race between acpi_eject_store() and
>>>> hotplug_dock_devices() should be prevented from happening, so perhaps we're
>>>> worrying about something that cannot happen?
>>>
>>> So here's a question: What particular races are possible if we remove
>>> ds->hp_lock entirely without doing anything else just yet? I mean, how to
>>> *trigger* them from the start to the end and not how they can possibly happen
>>> but never do, because there's no way they can be actually triggered?
>> Hi Rafael,
>> I have no really platform which triggers this bug, but I may imagine
>> a possible platform if it's valid for explanation.
>> Let's think about a laptop dock station with a thunderbolt
>> controller built-in. The dock station is managed by dock driver and
>> acpiphp driver. And the PCIe hierarchy managed by the thunderbolt
>> controller may be managed by dock driver and ACPIPHP driver too.
>> So it may trigger the issue by pressing the dock button and unplugging
>> thunderbolt cable concurrently.
>> But after all, this is all by imagination:). We may need to find a
>> simple and quick solution for 3.10 and the stable trees and enhance the
>> solution later to avoid introducing new bugs while fixing a bug.
>
> Well, if that particular bug is not fixed in 3,10, it won't be a tragedy,
> and I *really* don't want that code to get substantially more complex than
> it is now.
Hi Rafael,
I hope I could help to simplify the implementation too, but failed
until now:(. The PCI hotplug core has the same re-entrance issue too,
and I have struggled with that issue about one year now:(
I have tried another solution by removing ds->hp_lock and
hotplug_devices list, please refer to the attachment. It could be used
to solve the deadlock issue in acpiphp, but may not be used to support
the coming fix for an ATA driver
regression(https://bugzilla.kernel.org/show_bug.cgi?id=59871).
The time window for 3.10 is closing, it would be great if we could
reach a quick solution here.
Regards!
Gerry

>
> Thanks,
> Rafael
>
>


Attachments:
0001-.patch (9.33 kB)

2013-06-18 21:03:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Tuesday, June 18, 2013 11:36:50 PM Jiang Liu wrote:
> On 06/17/2013 07:39 PM, Rafael J. Wysocki wrote:
> > On Monday, June 17, 2013 01:01:51 AM Jiang Liu wrote:
> >> On 06/16/2013 05:20 AM, Rafael J. Wysocki wrote:
> >>> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> >>>> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> >> [...]
> >>>> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> >>>> from accessing whatever it wants, because ds->hp_lock is not used outside
> >>>> of the add/del and hotplug_dock_devices(). So, the actual role of
> >>>> ds->hp_lock (not the one that it is supposed to play, but the real one)
> >>>> is to prevent addition/deletion from happening when hotplug_dock_devices()
> >>>> is running. [Yes, it does protect the list, but since the list is in fact
> >>>> unnecessary, that doesn't matter.]
> >>>>
> >>>>> If we simply use a flag to mark presence of registered callback, we
> >>>>> can't achieve the second goal.
> >>>>
> >>>> I don't mean using the flag *alone*.
> >>>>
> >>>>> Take the sony laptop as an example. It has several PCI
> >>>>> hotplug
> >>>>> slot associated with the dock station:
> >>>>> [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB
> >>>>> [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM0
> >>>>> [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM1
> >>>>> [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2
> >>>>> [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> >>>>> [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> >>>>> [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> >>>>> [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> >>>>> [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check
> >>>>> notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
> >>>>>
> >>>>> So it still has some race windows if we undock the station while
> >>>>> repeatedly rescanning/removing
> >>>>> the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.
> >>>
> >>> Which sysfs interfaces do you mean, by the way?
> >>>
> >>> If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
> >>> should always be run under acpi_scan_lock too. It isn't at the moment,t
> >>> because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
> >>> bug (so I'm going to send a patch to fix it in a while).
> >>>
> >>> With that bug fixed, the possible race between acpi_eject_store() and
> >>> hotplug_dock_devices() should be prevented from happening, so perhaps we're
> >>> worrying about something that cannot happen?
> >> Hi Rafael,
> >> I mean the "remove" method of each PCI device, and the "power" method
> >> of PCI hotplug slot here.
> >> These methods may be used to remove P2P bridges with associated ACPIPHP
> >> hotplug slots, which in turn will cause invoking of
> >> unregister_hotplug_dock_device().
> >> So theoretical we may trigger the bug by undocking while repeatedly
> >> adding/removing P2P bridges with ACPIPHP hotplug slot through PCI
> >> "rescan" and "remove" sysfs interface,
> >
> > Why don't we make these things take acpi_scan_lock upfront, then?
> Hi Rafael,
> Seems we can't rely on acpi_scan_lock here, it may cause another
> deadlock scenario:
> 1) thread 1 acquired the acpi_scan_lock and tries to destroy all sysfs
> interfaces for PCI devices.
> 2) thread 2 opens a PCI sysfs which then tries to acquire the
> acpi_scan_lock.

Well, maybe, but you didn't explain how this was going to happen. What code
paths are involved, etc.

Quite frankly, I've already run out of patience, sorry about that. It looks
like I need to go through the code and understand all of these problems myself.
Yes, it will take time.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-18 21:16:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Wednesday, June 19, 2013 12:03:37 AM Jiang Liu wrote:
> On 06/17/2013 07:40 PM, Rafael J. Wysocki wrote:
> > On Monday, June 17, 2013 01:12:00 AM Jiang Liu wrote:
> >> On 06/16/2013 06:54 AM, Rafael J. Wysocki wrote:
> >>> On Saturday, June 15, 2013 11:20:40 PM Rafael J. Wysocki wrote:
> >>>> On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:

[...]

> Hi Rafael,

Hi,

> I hope I could help to simplify the implementation too, but failed
> until now:(. The PCI hotplug core has the same re-entrance issue too,
> and I have struggled with that issue about one year now:(

This sounds like a design issue and we're not likely to fix design issues
in 2 weeks, with all due respect to everyone involved. Even if someone has
an "Eureka!" moment and comes up with a really clever way to fix that issue,
we still need time to prepare patches, review them, test them etc.

> I have tried another solution by removing ds->hp_lock and
> hotplug_devices list, please refer to the attachment. It could be used
> to solve the deadlock issue in acpiphp, but may not be used to support
> the coming fix for an ATA driver
> regression(https://bugzilla.kernel.org/show_bug.cgi?id=59871).

Please stop generating patches in a hurry. That's not really useful.

Even if you think you know what you're doing, someone else has to understand
that too and be able to review your patches. Honestly, my experience with
that code is kind of limited and I need more time.

> The time window for 3.10 is closing, it would be great if we could
> reach a quick solution here.

That is clearly impossible.

My suggestion would be to apply the patches that everyone is reasonably
comfortable with at the moment and stop worrying about "time windows", because
in fact there are none. We're talking about fixes here and we can do -stable
backports once we have a solid solution, but what matters is that this solution
has to be acceptable to *all* of us.

So please stop making it sound like the 3.10 release is a hard deadline or
something. It is not.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-18 21:25:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Saturday, June 15, 2013 01:25:48 PM Alexander E. Patrakov wrote:
> 2013/6/15 Alexander E. Patrakov <[email protected]>:
> > Note that the snd_hda_intel bug somehow didn't manifest itself today,
> > probably because the TV that is connected to the HDMI output of the
> > radeon card was off and because Xorg never really tried to use the
> > card.
>
> Well, it did. The sympthoms are the same as before: incomplete
> undocking. The 16:00.1 device (HD audio controller on the radeon card)
> has disappeared from lspci output, while 16:00.0 (the radeon card
> itself) didn't, and the "Docked" LED didn't turn off. Fixed up by this
> command:
>
> fuser -k /dev/snd/* ; rmmod snd-hda-intel
>
> thus confirming again that the extra references that acpiphp waits to
> go away are from open fds.

OK, let's try to untangle this a bit.

If you applyt patches [1/4] and [4/4] from the $subject series only, what
does remain unfixed?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-19 05:18:44

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/19 Rafael J. Wysocki <[email protected]>:
> OK, let's try to untangle this a bit.
>
> If you applyt patches [1/4] and [4/4] from the $subject series only, what
> does remain unfixed?

[not tested, can do so in 12 hours if needed]

I think there will be problems on undocking and/or on the second
docking, as described in comments #6 - #8 of
https://bugzilla.kernel.org/show_bug.cgi?id=59501

--
Alexander E. Patrakov

2013-06-20 18:57:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> > OK, let's try to untangle this a bit.
> >
> > If you applyt patches [1/4] and [4/4] from the $subject series only, what
> > does remain unfixed?
>
> [not tested, can do so in 12 hours if needed]
>
> I think there will be problems on undocking and/or on the second
> docking, as described in comments #6 - #8 of
> https://bugzilla.kernel.org/show_bug.cgi?id=59501

OK, I think I have something that might work. It may not solve all problems,
but maybe it helps a bit. Unfortunately, I can't really test it, so please do
if you can.

Please apply [1/4] and [4/4] and the one below and see what happens.

Thanks,
Rafael


---
Rationale:
acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
after removing the companion PCI devices, so the dock station code
doesn't need to trim them separately for the dependent devices handled
by acpiphp.

Moreover, acpiphp_glue.c is the only user of
[un]register_hotplug_dock_device(), so *all* devices on the
ds->hotplug_devices list are handled by acpiphp and ops is set for all
of them.

This means that (1) the ds->hotplug_devices list is not necessary (we
can always walk ds->dependent_devices instead and look for those that
have dd->ops set) and (2) we don't need to call
dock_remove_acpi_device(dd->handle) on eject for any of those devices,
because dd->ops->handler() is going to take care of the ACPI device
objects trimming for them anyway.

Taking the above into account make the following changes:
(1) Drop hotplug_devices from struct dock_station.
(2) Drop dock_{add|del}_hotplug_device()
(3) Make [un]register_hotplug_dock_device() [un]set 'ops' and
'context' for the given device under ds->hp_lock.
(4) Add hot_remove_dock_devices() that walks ds->dependent_devices and
either calls dd->ops->handler(), if present, or trims the underlying
ACPI device object, otherwise.
(5) Replace hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST) calls
with hot_remove_dock_devices(ds).
(6) Rename hotplug_dock_devices() to hot_add_dock_devices() and make
it only handle bus check and device check requests. Make it walk
ds->dependent_devices instead of ds->hotplug devices.
(7) Make dock_event() walk ds->dependent_devices (instead of
ds->hotplug devices) under ds->hp_lock.
---
drivers/acpi/dock.c | 111 ++++++++++++++++++++++++----------------------------
1 file changed, 53 insertions(+), 58 deletions(-)

Index: linux-pm/drivers/acpi/dock.c
===================================================================
--- linux-pm.orig/drivers/acpi/dock.c
+++ linux-pm/drivers/acpi/dock.c
@@ -66,7 +66,6 @@ struct dock_station {
spinlock_t dd_lock;
struct mutex hp_lock;
struct list_head dependent_devices;
- struct list_head hotplug_devices;

struct list_head sibling;
struct platform_device *dock_device;
@@ -121,38 +120,6 @@ add_dock_dependent_device(struct dock_st
}

/**
- * dock_add_hotplug_device - associate a hotplug handler with the dock station
- * @ds: The dock station
- * @dd: The dependent device struct
- *
- * Add the dependent device to the dock's hotplug device list
- */
-static void
-dock_add_hotplug_device(struct dock_station *ds,
- struct dock_dependent_device *dd)
-{
- mutex_lock(&ds->hp_lock);
- list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
- mutex_unlock(&ds->hp_lock);
-}
-
-/**
- * dock_del_hotplug_device - remove a hotplug handler from the dock station
- * @ds: The dock station
- * @dd: the dependent device struct
- *
- * Delete the dependent device from the dock's hotplug device list
- */
-static void
-dock_del_hotplug_device(struct dock_station *ds,
- struct dock_dependent_device *dd)
-{
- mutex_lock(&ds->hp_lock);
- list_del(&dd->hotplug_list);
- mutex_unlock(&ds->hp_lock);
-}
-
-/**
* find_dock_dependent_device - get a device dependent on this dock
* @ds: the dock station
* @handle: the acpi_handle of the device we want
@@ -342,40 +309,60 @@ static void dock_remove_acpi_device(acpi
}

/**
- * hotplug_dock_devices - insert or remove devices on the dock station
- * @ds: the dock station
- * @event: either bus check or eject request
+ * hot_remove_dock_devices - Remove devices on a dock station.
+ * @ds: Dock station to remove devices for.
+ *
+ * For each device depending on @ds, if a dock event handler is registered,
+ * call it for the device, or trim the underlying ACPI device object otherwise.
+ *
+ * Dock event handlers are responsible for trimming the underlying ACPI device
+ * objects if present.
+ */
+static void hot_remove_dock_devices(struct dock_station *ds)
+{
+ struct dock_dependent_device *dd;
+
+ mutex_lock(&ds->hp_lock);
+
+ list_for_each_entry(dd, &ds->dependent_devices, list) {
+ if (dd->ops && dd->ops->handler)
+ dd->ops->handler(dd->handle, ACPI_NOTIFY_EJECT_REQUEST,
+ dd->context);
+ else
+ dock_remove_acpi_device(dd->handle);
+ }
+
+ mutex_unlock(&ds->hp_lock);
+}
+
+/**
+ * hot_add_dock_devices - Insert devices on a dock station.
+ * @ds: Dock station to insert devices for.
*
* Some devices on the dock station need to have drivers called
* to perform hotplug operations after a dock event has occurred.
* Traverse the list of dock devices that have registered a
* hotplug handler, and call the handler.
*/
-static void hotplug_dock_devices(struct dock_station *ds, u32 event)
+static void hot_add_dock_devices(struct dock_station *ds, u32 event)
{
struct dock_dependent_device *dd;

mutex_lock(&ds->hp_lock);

- /*
- * First call driver specific hotplug functions
- */
- list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
+ /* First call driver specific hotplug event handlers. */
+ list_for_each_entry(dd, &ds->dependent_devices, list)
if (dd->ops && dd->ops->handler)
dd->ops->handler(dd->handle, event, dd->context);

/*
- * Now make sure that an acpi_device is created for each
- * dependent device, or removed if this is an eject request.
- * This will cause acpi_drivers to be stopped/started if they
- * exist
+ * Now make sure that an acpi_device is created for each dependent
+ * device. That will cause scan handlers to attach to devices objects
+ * or acpi_drivers to be started if they exist.
*/
- list_for_each_entry(dd, &ds->dependent_devices, list) {
- if (event == ACPI_NOTIFY_EJECT_REQUEST)
- dock_remove_acpi_device(dd->handle);
- else
- dock_create_acpi_device(dd->handle);
- }
+ list_for_each_entry(dd, &ds->dependent_devices, list)
+ dock_create_acpi_device(dd->handle);
+
mutex_unlock(&ds->hp_lock);
}

@@ -398,10 +385,14 @@ static void dock_event(struct dock_stati
if (num == DOCK_EVENT)
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);

- list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
+ mutex_lock(&ds->hp_lock);
+
+ list_for_each_entry(dd, &ds->dependent_devices, list)
if (dd->ops && dd->ops->uevent)
dd->ops->uevent(dd->handle, event, dd->context);

+ mutex_unlock(&ds->hp_lock);
+
if (num != DOCK_EVENT)
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
}
@@ -598,9 +589,10 @@ register_hotplug_dock_device(acpi_handle
*/
dd = find_dock_dependent_device(dock_station, handle);
if (dd) {
+ mutex_lock(&dock_station->hp_lock);
dd->ops = ops;
dd->context = context;
- dock_add_hotplug_device(dock_station, dd);
+ mutex_unlock(&dock_station->hp_lock);
ret = 0;
}
}
@@ -623,8 +615,12 @@ void unregister_hotplug_dock_device(acpi

list_for_each_entry(dock_station, &dock_stations, sibling) {
dd = find_dock_dependent_device(dock_station, handle);
- if (dd)
- dock_del_hotplug_device(dock_station, dd);
+ if (dd) {
+ mutex_lock(&dock_station->hp_lock);
+ dd->ops = NULL;
+ dd->context = NULL;
+ mutex_unlock(&dock_station->hp_lock);
+ }
}
}
EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
@@ -649,7 +645,7 @@ static int handle_eject_request(struct d
*/
dock_event(ds, event, UNDOCK_EVENT);

- hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST);
+ hot_remove_dock_devices(ds);
undock(ds);
dock_lock(ds, 0);
eject_dock(ds);
@@ -708,7 +704,7 @@ static void dock_notify(acpi_handle hand
}
atomic_notifier_call_chain(&dock_notifier_list,
event, NULL);
- hotplug_dock_devices(ds, event);
+ hot_add_dock_devices(ds, event);
complete_dock(ds);
dock_event(ds, event, DOCK_EVENT);
dock_lock(ds, 1);
@@ -953,7 +949,6 @@ static int __init dock_add(acpi_handle h
mutex_init(&dock_station->hp_lock);
spin_lock_init(&dock_station->dd_lock);
INIT_LIST_HEAD(&dock_station->sibling);
- INIT_LIST_HEAD(&dock_station->hotplug_devices);
ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
INIT_LIST_HEAD(&dock_station->dependent_devices);




--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-21 04:36:14

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/21 Rafael J. Wysocki <[email protected]>:
> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
>> > OK, let's try to untangle this a bit.
>> >
>> > If you applyt patches [1/4] and [4/4] from the $subject series only, what
>> > does remain unfixed?
>>
>> [not tested, can do so in 12 hours if needed]
>>
>> I think there will be problems on undocking and/or on the second
>> docking, as described in comments #6 - #8 of
>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
>
> OK, I think I have something that might work. It may not solve all problems,
> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> if you can.
>
> Please apply [1/4] and [4/4] and the one below and see what happens.

Tested on top of 3.10-rc6.

Attached dmesg output for the following testcase: boot undocked, dock,
undock, dock.

The initial dock went OK. The subsequent undock resulted in the blue
led on the dock cable turning off quickly, but in PCI devices slowly,
one-by-one, disappearing from the bus. Also, there were "acpi_handle
corrupt" messages in dmesg. The subsequent dock resulted in no devices
added to the bus. So - your patch is not a good replacement for
patches 2 and 3 in the original series.

>
> Thanks,
> Rafael
>
>
> ---
> Rationale:
> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> after removing the companion PCI devices, so the dock station code
> doesn't need to trim them separately for the dependent devices handled
> by acpiphp.
>
> Moreover, acpiphp_glue.c is the only user of
> [un]register_hotplug_dock_device(), so *all* devices on the
> ds->hotplug_devices list are handled by acpiphp and ops is set for all
> of them.
>
> This means that (1) the ds->hotplug_devices list is not necessary (we
> can always walk ds->dependent_devices instead and look for those that
> have dd->ops set) and (2) we don't need to call
> dock_remove_acpi_device(dd->handle) on eject for any of those devices,
> because dd->ops->handler() is going to take care of the ACPI device
> objects trimming for them anyway.
>
> Taking the above into account make the following changes:
> (1) Drop hotplug_devices from struct dock_station.
> (2) Drop dock_{add|del}_hotplug_device()
> (3) Make [un]register_hotplug_dock_device() [un]set 'ops' and
> 'context' for the given device under ds->hp_lock.
> (4) Add hot_remove_dock_devices() that walks ds->dependent_devices and
> either calls dd->ops->handler(), if present, or trims the underlying
> ACPI device object, otherwise.
> (5) Replace hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST) calls
> with hot_remove_dock_devices(ds).
> (6) Rename hotplug_dock_devices() to hot_add_dock_devices() and make
> it only handle bus check and device check requests. Make it walk
> ds->dependent_devices instead of ds->hotplug devices.
> (7) Make dock_event() walk ds->dependent_devices (instead of
> ds->hotplug devices) under ds->hp_lock.
> ---
> drivers/acpi/dock.c | 111 ++++++++++++++++++++++++----------------------------
> 1 file changed, 53 insertions(+), 58 deletions(-)
>
> Index: linux-pm/drivers/acpi/dock.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/dock.c
> +++ linux-pm/drivers/acpi/dock.c
> @@ -66,7 +66,6 @@ struct dock_station {
> spinlock_t dd_lock;
> struct mutex hp_lock;
> struct list_head dependent_devices;
> - struct list_head hotplug_devices;
>
> struct list_head sibling;
> struct platform_device *dock_device;
> @@ -121,38 +120,6 @@ add_dock_dependent_device(struct dock_st
> }
>
> /**
> - * dock_add_hotplug_device - associate a hotplug handler with the dock station
> - * @ds: The dock station
> - * @dd: The dependent device struct
> - *
> - * Add the dependent device to the dock's hotplug device list
> - */
> -static void
> -dock_add_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> -{
> - mutex_lock(&ds->hp_lock);
> - list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
> - mutex_unlock(&ds->hp_lock);
> -}
> -
> -/**
> - * dock_del_hotplug_device - remove a hotplug handler from the dock station
> - * @ds: The dock station
> - * @dd: the dependent device struct
> - *
> - * Delete the dependent device from the dock's hotplug device list
> - */
> -static void
> -dock_del_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> -{
> - mutex_lock(&ds->hp_lock);
> - list_del(&dd->hotplug_list);
> - mutex_unlock(&ds->hp_lock);
> -}
> -
> -/**
> * find_dock_dependent_device - get a device dependent on this dock
> * @ds: the dock station
> * @handle: the acpi_handle of the device we want
> @@ -342,40 +309,60 @@ static void dock_remove_acpi_device(acpi
> }
>
> /**
> - * hotplug_dock_devices - insert or remove devices on the dock station
> - * @ds: the dock station
> - * @event: either bus check or eject request
> + * hot_remove_dock_devices - Remove devices on a dock station.
> + * @ds: Dock station to remove devices for.
> + *
> + * For each device depending on @ds, if a dock event handler is registered,
> + * call it for the device, or trim the underlying ACPI device object otherwise.
> + *
> + * Dock event handlers are responsible for trimming the underlying ACPI device
> + * objects if present.
> + */
> +static void hot_remove_dock_devices(struct dock_station *ds)
> +{
> + struct dock_dependent_device *dd;
> +
> + mutex_lock(&ds->hp_lock);
> +
> + list_for_each_entry(dd, &ds->dependent_devices, list) {
> + if (dd->ops && dd->ops->handler)
> + dd->ops->handler(dd->handle, ACPI_NOTIFY_EJECT_REQUEST,
> + dd->context);
> + else
> + dock_remove_acpi_device(dd->handle);
> + }
> +
> + mutex_unlock(&ds->hp_lock);
> +}
> +
> +/**
> + * hot_add_dock_devices - Insert devices on a dock station.
> + * @ds: Dock station to insert devices for.
> *
> * Some devices on the dock station need to have drivers called
> * to perform hotplug operations after a dock event has occurred.
> * Traverse the list of dock devices that have registered a
> * hotplug handler, and call the handler.
> */
> -static void hotplug_dock_devices(struct dock_station *ds, u32 event)
> +static void hot_add_dock_devices(struct dock_station *ds, u32 event)
> {
> struct dock_dependent_device *dd;
>
> mutex_lock(&ds->hp_lock);
>
> - /*
> - * First call driver specific hotplug functions
> - */
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> + /* First call driver specific hotplug event handlers. */
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> if (dd->ops && dd->ops->handler)
> dd->ops->handler(dd->handle, event, dd->context);
>
> /*
> - * Now make sure that an acpi_device is created for each
> - * dependent device, or removed if this is an eject request.
> - * This will cause acpi_drivers to be stopped/started if they
> - * exist
> + * Now make sure that an acpi_device is created for each dependent
> + * device. That will cause scan handlers to attach to devices objects
> + * or acpi_drivers to be started if they exist.
> */
> - list_for_each_entry(dd, &ds->dependent_devices, list) {
> - if (event == ACPI_NOTIFY_EJECT_REQUEST)
> - dock_remove_acpi_device(dd->handle);
> - else
> - dock_create_acpi_device(dd->handle);
> - }
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> + dock_create_acpi_device(dd->handle);
> +
> mutex_unlock(&ds->hp_lock);
> }
>
> @@ -398,10 +385,14 @@ static void dock_event(struct dock_stati
> if (num == DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> + mutex_lock(&ds->hp_lock);
> +
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> if (dd->ops && dd->ops->uevent)
> dd->ops->uevent(dd->handle, event, dd->context);
>
> + mutex_unlock(&ds->hp_lock);
> +
> if (num != DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> }
> @@ -598,9 +589,10 @@ register_hotplug_dock_device(acpi_handle
> */
> dd = find_dock_dependent_device(dock_station, handle);
> if (dd) {
> + mutex_lock(&dock_station->hp_lock);
> dd->ops = ops;
> dd->context = context;
> - dock_add_hotplug_device(dock_station, dd);
> + mutex_unlock(&dock_station->hp_lock);
> ret = 0;
> }
> }
> @@ -623,8 +615,12 @@ void unregister_hotplug_dock_device(acpi
>
> list_for_each_entry(dock_station, &dock_stations, sibling) {
> dd = find_dock_dependent_device(dock_station, handle);
> - if (dd)
> - dock_del_hotplug_device(dock_station, dd);
> + if (dd) {
> + mutex_lock(&dock_station->hp_lock);
> + dd->ops = NULL;
> + dd->context = NULL;
> + mutex_unlock(&dock_station->hp_lock);
> + }
> }
> }
> EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
> @@ -649,7 +645,7 @@ static int handle_eject_request(struct d
> */
> dock_event(ds, event, UNDOCK_EVENT);
>
> - hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST);
> + hot_remove_dock_devices(ds);
> undock(ds);
> dock_lock(ds, 0);
> eject_dock(ds);
> @@ -708,7 +704,7 @@ static void dock_notify(acpi_handle hand
> }
> atomic_notifier_call_chain(&dock_notifier_list,
> event, NULL);
> - hotplug_dock_devices(ds, event);
> + hot_add_dock_devices(ds, event);
> complete_dock(ds);
> dock_event(ds, event, DOCK_EVENT);
> dock_lock(ds, 1);
> @@ -953,7 +949,6 @@ static int __init dock_add(acpi_handle h
> mutex_init(&dock_station->hp_lock);
> spin_lock_init(&dock_station->dd_lock);
> INIT_LIST_HEAD(&dock_station->sibling);
> - INIT_LIST_HEAD(&dock_station->hotplug_devices);
> ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
> INIT_LIST_HEAD(&dock_station->dependent_devices);
>
>
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.



--
Alexander E. Patrakov


Attachments:
dmesg.txt (93.66 kB)

2013-06-21 04:37:52

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/21 Alexander E. Patrakov <[email protected]>:
> The initial dock went OK. The subsequent undock resulted in the blue
> led on the dock cable turning off quickly, but in PCI devices slowly,
> one-by-one, disappearing from the bus. Also, there were "acpi_handle
> corrupt" messages in dmesg. The subsequent dock resulted in no devices
> added to the bus. So - your patch is not a good replacement for
> patches 2 and 3 in the original series.

In this state, echo 1 > /sys/bus/pci/rescan resulted in a kernel BUG,
see the attached dmesg.

--
Alexander E. Patrakov


Attachments:
dmesg-1.txt (111.76 kB)

2013-06-21 12:38:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Friday, June 21, 2013 10:36:05 AM Alexander E. Patrakov wrote:
> 2013/6/21 Rafael J. Wysocki <[email protected]>:
> > On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> >> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> >> > OK, let's try to untangle this a bit.
> >> >
> >> > If you applyt patches [1/4] and [4/4] from the $subject series only, what
> >> > does remain unfixed?
> >>
> >> [not tested, can do so in 12 hours if needed]
> >>
> >> I think there will be problems on undocking and/or on the second
> >> docking, as described in comments #6 - #8 of
> >> https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >
> > OK, I think I have something that might work. It may not solve all problems,
> > but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> > if you can.
> >
> > Please apply [1/4] and [4/4] and the one below and see what happens.
>
> Tested on top of 3.10-rc6.
>
> Attached dmesg output for the following testcase: boot undocked, dock,
> undock, dock.
>
> The initial dock went OK. The subsequent undock resulted in the blue
> led on the dock cable turning off quickly, but in PCI devices slowly,
> one-by-one, disappearing from the bus. Also, there were "acpi_handle
> corrupt" messages in dmesg. The subsequent dock resulted in no devices
> added to the bus. So - your patch is not a good replacement for
> patches 2 and 3 in the original series.

Well, this particular patch maybe not, but I like the direction better.

Here's the relevant piece of your dmesg:

[ 43.635516] ACPI: \_SB_.DOCK: undocking
[ 44.108267] acpiphp_glue: _handle_hotplug_event_func: Device eject notify on \_SB_.PCI0.RP07.LPMB
[ 44.110349] xhci_hcd 0000:1b:00.0: remove, state 4
[ 44.110497] usb usb6: USB disconnect, device number 1
[ 44.112203] port1: Oops, 'acpi_handle' corrupt
[ 44.112242] port2: Oops, 'acpi_handle' corrupt

What happens here is that USB ports are in the list of dock dependent devices,
but they don't have the ops pointer set, so dock_remove_acpi_device() will be
called for them *before* the usual USB teardown is triggered from the
acpiphp_glue code for their parent. I *think* the solution may be to simply
skip those things from ds->dependent_list (as they depend on something already
in that list anyway).

I'll attach an updated patch with that change to the bug entry.

[ 44.112332] xHCI xhci_drop_endpoint called for root hub
[ 44.112334] xHCI xhci_check_bandwidth called for root hub
[ 44.114313] usb usb6: Oops, 'acpi_handle' corrupt

Same thing again.

[ 44.114349] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds.
[ 44.114351] xhci_hcd 0000:1b:00.0: USB bus 6 deregistered
[ 44.114360] xhci_hcd 0000:1b:00.0: remove, state 4
[ 44.114392] usb usb5: USB disconnect, device number 1
[ 44.114394] usb 5-1: USB disconnect, device number 2
[ 49.120740] xhci_hcd 0000:1b:00.0: Timeout while waiting for configure endpoint command
[ 49.120750] xhci_hcd 0000:1b:00.0: Stopped the command ring failed, maybe the host is dead
[ 49.120757] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds.
[ 49.120759] xhci_hcd 0000:1b:00.0: Abort command ring failed
[ 49.120760] xhci_hcd 0000:1b:00.0: HC died; cleaning up
[ 49.122075] xHCI xhci_drop_endpoint called for root hub
[ 49.122076] xHCI xhci_check_bandwidth called for root hub
[ 49.123825] usb usb5: Oops, 'acpi_handle' corrupt

Same thing again.

[ 49.123906] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds.
[ 49.124208] xhci_hcd 0000:1b:00.0: USB bus 5 deregistered
[ 52.150756] ata8.00: disabled
[ 52.172219] r8169 0000:19:00.0 enp25s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 52.182516] r8169 0000:19:00.0 enp25s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[ 52.188939] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25).
[ 52.189542] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25).
[ 52.190191] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25).
[ 52.190746] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25).
[ 52.191299] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25).
[ 52.212009] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100).
[ 52.222185] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100).
[ 52.232357] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100).
[ 52.242529] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100).
[ 52.260158] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=none,decodes=io+mem:owns=none
[ 52.265799] [drm] radeon: finishing device.
[ 52.265804] [drm] Disabling audio support
[ 52.267668] radeon 0000:16:00.0: ffff880251e28400 unpin not necessary
[ 52.268033] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[ 52.268164] [TTM] Finalizing pool allocator
[ 52.268340] [TTM] Finalizing DMA pool allocator

The thing below looks like a radeon problem?

[ 52.268389] ------------[ cut here ]------------
[ 52.268395] WARNING: at drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:533 ttm_dma_free_pool+0x101/0x110 [ttm]()
[ 52.268396] Modules linked in: ata_generic pata_acpi pata_marvell radeon ttm zram(C) rfcomm bnep rtsx_pci_ms rtsx_pci_sdmmc mmc_core memstick iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm joydev pcspkr i2c_i801 uvcvideo qcserial usb_wwan videobuf2_vmalloc videobuf2_memops videobuf2_core usbserial btusb videodev bluetooth arc4 media iwldvm mac80211 snd_hda_codec_hdmi iwlwifi r8169 rtsx_pci mii cfg80211 snd_hda_codec_realtek i915 sony_laptop rfkill intel_agp snd_hda_intel intel_gtt snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm snd_page_alloc drm snd_timer agpgart lpc_ich snd mfd_core sha256_ssse3 sha256_generic dm_crypt raid0 md_mod crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd xhci_hcd ehci_pci ehci_hcd dm_mirror
[ 52.268444] dm_region_hash dm_log dm_mod [last unloaded: microcode]
[ 52.268449] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G C 3.10.0-rc6-rafael #1
[ 52.268450] Hardware name: Sony Corporation VPCZ23A4R/VAIO, BIOS R1013H5 05/21/2012
[ 52.268455] Workqueue: kacpi_hotplug _handle_hotplug_event_func
[ 52.268456] ffffffffa081f268 ffff8802540d38b8 ffffffff8165af98 ffff8802540d38f8
[ 52.268460] ffffffff8103c8cb ffff8802540d38c8 ffff880251f53700 ffff8802513e23c0
[ 52.268463] 0000000000000008 ffff880253d02530 ffff880254079730 ffff8802540d3908
[ 52.268466] Call Trace:
[ 52.268470] [<ffffffff8165af98>] dump_stack+0x19/0x1b
[ 52.268474] [<ffffffff8103c8cb>] warn_slowpath_common+0x6b/0xa0
[ 52.268476] [<ffffffff8103c915>] warn_slowpath_null+0x15/0x20
[ 52.268480] [<ffffffffa081cd01>] ttm_dma_free_pool+0x101/0x110 [ttm]
[ 52.268484] [<ffffffffa081dee5>] ttm_dma_page_alloc_fini+0x85/0xcc [ttm]
[ 52.268487] [<ffffffffa0813455>] ttm_mem_global_release+0x25/0xa0 [ttm]
[ 52.268497] [<ffffffffa08494ed>] radeon_ttm_mem_global_release+0xd/0x10 [radeon]
[ 52.268507] [<ffffffffa0176213>] drm_global_item_unref+0x63/0x90 [drm]
[ 52.268514] [<ffffffffa084a721>] radeon_ttm_fini+0xd1/0xe0 [radeon]
[ 52.268522] [<ffffffffa084b129>] radeon_bo_fini+0x9/0x10 [radeon]
[ 52.268531] [<ffffffffa0893bf1>] evergreen_fini+0x91/0xc0 [radeon]
[ 52.268537] [<ffffffffa08336ea>] radeon_device_fini+0x3a/0xf0 [radeon]
[ 52.268543] [<ffffffffa08353d1>] radeon_driver_unload_kms+0x41/0x70 [radeon]
[ 52.268550] [<ffffffffa016524e>] drm_put_dev+0x6e/0x210 [drm]
[ 52.268555] [<ffffffffa0832068>] radeon_pci_remove+0x18/0x20 [radeon]
[ 52.268557] [<ffffffff81385831>] pci_device_remove+0x41/0xc0
[ 52.268561] [<ffffffff8143bfd7>] __device_release_driver+0x77/0xe0
[ 52.268563] [<ffffffff8143c069>] device_release_driver+0x29/0x40
[ 52.268566] [<ffffffff8143ba71>] bus_remove_device+0xf1/0x140
[ 52.268568] [<ffffffff8143915d>] device_del+0x11d/0x1b0
[ 52.268572] [<ffffffff8138082c>] pci_stop_bus_device+0x9c/0xb0
[ 52.268574] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0
[ 52.268576] [<ffffffff813a11e5>] ? acpiphp_disable_slot+0x35/0x140
[ 52.268579] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0
[ 52.268581] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0
[ 52.268584] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0
[ 52.268586] [<ffffffff81380991>] pci_stop_and_remove_bus_device+0x11/0x20
[ 52.268588] [<ffffffff813a1236>] acpiphp_disable_slot+0x86/0x140
[ 52.268591] [<ffffffff813a1622>] _handle_hotplug_event_func+0x102/0x1e0
[ 52.268594] [<ffffffff8105c2c2>] process_one_work+0x1c2/0x560
[ 52.268597] [<ffffffff8105c257>] ? process_one_work+0x157/0x560
[ 52.268599] [<ffffffff8105d1d6>] worker_thread+0x116/0x370
[ 52.268601] [<ffffffff8105d0c0>] ? manage_workers.isra.20+0x2d0/0x2d0
[ 52.268604] [<ffffffff81063a36>] kthread+0xd6/0xe0
[ 52.268606] [<ffffffff816611ab>] ? _raw_spin_unlock_irq+0x2b/0x60
[ 52.268609] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70
[ 52.268612] [<ffffffff8166856c>] ret_from_fork+0x7c/0xb0
[ 52.268615] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70
[ 52.268669] ---[ end trace 46b4e977738a3df6 ]---
[ 52.269258] [TTM] Zone kernel: Used memory at exit: 13 kiB
[ 52.269266] [TTM] Zone dma32: Used memory at exit: 9 kiB
[ 52.269295] [drm] radeon: ttm finalized

But it looks like the removal actually succeeded.

[ 52.273103] pci_bus 0000:0b: busn_res: [bus 0b] is released
[ 52.273732] pci_bus 0000:0c: busn_res: [bus 0c] is released
[ 52.273816] pci_bus 0000:16: busn_res: [bus 16] is released

Is the re-dock attempt included? It doesn't seem to leave any trace ...

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-21 12:56:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Friday, June 21, 2013 10:37:47 AM Alexander E. Patrakov wrote:
> 2013/6/21 Alexander E. Patrakov <[email protected]>:
> > The initial dock went OK. The subsequent undock resulted in the blue
> > led on the dock cable turning off quickly, but in PCI devices slowly,
> > one-by-one, disappearing from the bus. Also, there were "acpi_handle
> > corrupt" messages in dmesg. The subsequent dock resulted in no devices
> > added to the bus. So - your patch is not a good replacement for
> > patches 2 and 3 in the original series.
>
> In this state, echo 1 > /sys/bus/pci/rescan resulted in a kernel BUG,
> see the attached dmesg.

This one is much more interesting than the previous one.

So first of all we have a deadlock related to the
flush_workqueue(kacpi_notify_wq) in acpi_os_wait_events_complete(). I beleve
it is related to the removal of a notify handler in cleanup_bridge().

[ 242.063120] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 242.063124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.063126] kworker/0:0 D ffff8802540adf40 0 4 2 0x00000000
[ 242.063134] Workqueue: kacpi_hotplug _handle_hotplug_event_func
[ 242.063136] ffff8802540d3828 0000000000000046 ffff8802540ae6d8 4ba25e032c97a1cb
[ 242.063139] ffff8802540d3738 ffff8802540d3fd8 ffff8802540d3fd8 0000000000004000
[ 242.063142] ffffffff81c11440 ffff8802540adf40 ffff8802540d3758 ffffffff8109cadb
[ 242.063145] Call Trace:
[ 242.063150] [<ffffffff8109cadb>] ? add_lock_to_list.isra.20.constprop.37+0x7b/0xc0
[ 242.063152] [<ffffffff8109fcc5>] ? __lock_acquire+0x1265/0x1ee0
[ 242.063156] [<ffffffff8101020a>] ? save_stack_trace+0x2a/0x50
[ 242.063159] [<ffffffff8165f764>] schedule+0x24/0x70
[ 242.063163] [<ffffffff8165cab5>] schedule_timeout+0x205/0x260
[ 242.063165] [<ffffffff810a1841>] ? mark_held_locks+0x61/0x150
[ 242.063169] [<ffffffff81059deb>] ? flush_workqueue_prep_pwqs+0x6b/0x1b0
[ 242.063171] [<ffffffff816611ab>] ? _raw_spin_unlock_irq+0x2b/0x60
[ 242.063174] [<ffffffff810a1a35>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 242.063176] [<ffffffff8165ea74>] wait_for_common+0x114/0x170
[ 242.063179] [<ffffffff8165e195>] ? __mutex_unlock_slowpath+0xf5/0x1a0
[ 242.063182] [<ffffffff810742a0>] ? try_to_wake_up+0x310/0x310
[ 242.063185] [<ffffffff8165eae8>] wait_for_completion+0x18/0x20
[ 242.063187] [<ffffffff8105abca>] flush_workqueue+0x1ca/0x6c0
[ 242.063190] [<ffffffff8105aa00>] ? wq_cpumask_show+0x80/0x80
[ 242.063193] [<ffffffff814380a5>] ? dev_vprintk_emit+0x55/0x70
[ 242.063196] [<ffffffff813a0260>] ? handle_hotplug_event_func+0x80/0x80
[ 242.063199] [<ffffffff813bfd67>] acpi_os_wait_events_complete+0x1c/0x1e
[ 242.063202] [<ffffffff813d31e2>] acpi_remove_notify_handler+0x40/0x178
[ 242.063204] [<ffffffff813a0ef8>] acpiphp_remove_slots+0x88/0x200
[ 242.063207] [<ffffffff813a5c15>] acpi_pci_remove_bus+0x15/0x20
[ 242.063210] [<ffffffff815335d9>] pcibios_remove_bus+0x9/0x10
[ 242.063215] [<ffffffff81380881>] pci_remove_bus+0x41/0x60
[ 242.063217] [<ffffffff813808eb>] pci_remove_bus_device+0x4b/0xe0
[ 242.063219] [<ffffffff813808db>] pci_remove_bus_device+0x3b/0xe0
[ 242.063222] [<ffffffff813808db>] pci_remove_bus_device+0x3b/0xe0
[ 242.063224] [<ffffffff813808db>] pci_remove_bus_device+0x3b/0xe0
[ 242.063226] [<ffffffff81380999>] pci_stop_and_remove_bus_device+0x19/0x20
[ 242.063229] [<ffffffff813a1236>] acpiphp_disable_slot+0x86/0x140
[ 242.063231] [<ffffffff813a1622>] _handle_hotplug_event_func+0x102/0x1e0
[ 242.063234] [<ffffffff8105c2c2>] process_one_work+0x1c2/0x560
[ 242.063236] [<ffffffff8105c257>] ? process_one_work+0x157/0x560
[ 242.063238] [<ffffffff8105d1d6>] worker_thread+0x116/0x370
[ 242.063241] [<ffffffff8105d0c0>] ? manage_workers.isra.20+0x2d0/0x2d0
[ 242.063243] [<ffffffff81063a36>] kthread+0xd6/0xe0
[ 242.063245] [<ffffffff816611ab>] ? _raw_spin_unlock_irq+0x2b/0x60
[ 242.063248] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70
[ 242.063251] [<ffffffff8166856c>] ret_from_fork+0x7c/0xb0
[ 242.063253] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70
[ 242.063255] 4 locks held by kworker/0:0/4:
[ 242.063256] #0: (kacpi_hotplug){.+.+.+}, at: [<ffffffff8105c257>] process_one_work+0x157/0x560
[ 242.063262] #1: ((&hp_work->work)){+.+.+.}, at: [<ffffffff8105c257>] process_one_work+0x157/0x560
[ 242.063266] #2: (acpi_scan_lock){+.+.+.}, at: [<ffffffff813c3ac6>] acpi_scan_lock_acquire+0x12/0x14
[ 242.063272] #3: (&slot->crit_sect){+.+.+.}, at: [<ffffffff813a11d0>] acpiphp_disable_slot+0x20/0x140

And that is the source of the subsequent re-dock failure (the event cannot be
processed due to the workqueue being stuck). I don't know why this happens at
the moment, but I'm going to figure out.

And the BUG is because we pass a NULL pointer to sysfs_create_bin_file()
somewhere in pci_create_sysfs_dev_files(), but again I don't know why exactly
at the moment.

Let's communicate further in bug #59501.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-21 13:02:40

by Alexander E. Patrakov

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013/6/21 Rafael J. Wysocki <[email protected]>:
> The thing below looks like a radeon problem?
>
> [ 52.268389] ------------[ cut here ]------------
> [ 52.268395] WARNING: at drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:533 ttm_dma_free_pool+0x101/0x110 [ttm]()

Yes. Let's ignore it for now, as it is a separate bug, likely a
special case of https://bugzilla.kernel.org/show_bug.cgi?id=59681

> But it looks like the removal actually succeeded.

It did.

> [ 52.273103] pci_bus 0000:0b: busn_res: [bus 0b] is released
> [ 52.273732] pci_bus 0000:0c: busn_res: [bus 0c] is released
> [ 52.273816] pci_bus 0000:16: busn_res: [bus 16] is released
>
> Is the re-dock attempt included? It doesn't seem to leave any trace ...

It is, and indeed it left no traces. See the followup when I attached
the result of manually rescanning the bus - you have already commented
on the deadlock there.

--
Alexander E. Patrakov

2013-06-21 16:54:33

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
>>> OK, let's try to untangle this a bit.
>>>
>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
>>> does remain unfixed?
>>
>> [not tested, can do so in 12 hours if needed]
>>
>> I think there will be problems on undocking and/or on the second
>> docking, as described in comments #6 - #8 of
>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
>
> OK, I think I have something that might work. It may not solve all problems,
> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> if you can.
>
> Please apply [1/4] and [4/4] and the one below and see what happens.
>
> Thanks,
> Rafael
>
>
> ---
> Rationale:
> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> after removing the companion PCI devices, so the dock station code
> doesn't need to trim them separately for the dependent devices handled
> by acpiphp.
>
> Moreover, acpiphp_glue.c is the only user of
> [un]register_hotplug_dock_device(), so *all* devices on the
> ds->hotplug_devices list are handled by acpiphp and ops is set for all
> of them.
Hi Rafael,
There's an ongoing patch to fix a disk bay hotplug regression, which
may add a second caller of register_hotplug_device(). Please refer to
bug 59871, and the proposed patch is at:
https://bugzilla.kernel.org/attachment.cgi?id=105581

>
> This means that (1) the ds->hotplug_devices list is not necessary (we
> can always walk ds->dependent_devices instead and look for those that
> have dd->ops set) and (2) we don't need to call
> dock_remove_acpi_device(dd->handle) on eject for any of those devices,
> because dd->ops->handler() is going to take care of the ACPI device
> objects trimming for them anyway.
>
> Taking the above into account make the following changes:
> (1) Drop hotplug_devices from struct dock_station.
> (2) Drop dock_{add|del}_hotplug_device()
> (3) Make [un]register_hotplug_dock_device() [un]set 'ops' and
> 'context' for the given device under ds->hp_lock.
> (4) Add hot_remove_dock_devices() that walks ds->dependent_devices and
> either calls dd->ops->handler(), if present, or trims the underlying
> ACPI device object, otherwise.
> (5) Replace hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST) calls
> with hot_remove_dock_devices(ds).
> (6) Rename hotplug_dock_devices() to hot_add_dock_devices() and make
> it only handle bus check and device check requests. Make it walk
> ds->dependent_devices instead of ds->hotplug devices.
> (7) Make dock_event() walk ds->dependent_devices (instead of
> ds->hotplug devices) under ds->hp_lock.
> ---
> drivers/acpi/dock.c | 111 ++++++++++++++++++++++++----------------------------
> 1 file changed, 53 insertions(+), 58 deletions(-)
>
> Index: linux-pm/drivers/acpi/dock.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/dock.c
> +++ linux-pm/drivers/acpi/dock.c
> @@ -66,7 +66,6 @@ struct dock_station {
> spinlock_t dd_lock;
> struct mutex hp_lock;
> struct list_head dependent_devices;
> - struct list_head hotplug_devices;
>
> struct list_head sibling;
> struct platform_device *dock_device;
> @@ -121,38 +120,6 @@ add_dock_dependent_device(struct dock_st
> }
>
> /**
> - * dock_add_hotplug_device - associate a hotplug handler with the dock station
> - * @ds: The dock station
> - * @dd: The dependent device struct
> - *
> - * Add the dependent device to the dock's hotplug device list
> - */
> -static void
> -dock_add_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> -{
> - mutex_lock(&ds->hp_lock);
> - list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
> - mutex_unlock(&ds->hp_lock);
> -}
> -
> -/**
> - * dock_del_hotplug_device - remove a hotplug handler from the dock station
> - * @ds: The dock station
> - * @dd: the dependent device struct
> - *
> - * Delete the dependent device from the dock's hotplug device list
> - */
> -static void
> -dock_del_hotplug_device(struct dock_station *ds,
> - struct dock_dependent_device *dd)
> -{
> - mutex_lock(&ds->hp_lock);
> - list_del(&dd->hotplug_list);
> - mutex_unlock(&ds->hp_lock);
> -}
> -
> -/**
> * find_dock_dependent_device - get a device dependent on this dock
> * @ds: the dock station
> * @handle: the acpi_handle of the device we want
> @@ -342,40 +309,60 @@ static void dock_remove_acpi_device(acpi
> }
>
> /**
> - * hotplug_dock_devices - insert or remove devices on the dock station
> - * @ds: the dock station
> - * @event: either bus check or eject request
> + * hot_remove_dock_devices - Remove devices on a dock station.
> + * @ds: Dock station to remove devices for.
> + *
> + * For each device depending on @ds, if a dock event handler is registered,
> + * call it for the device, or trim the underlying ACPI device object otherwise.
> + *
> + * Dock event handlers are responsible for trimming the underlying ACPI device
> + * objects if present.
> + */
> +static void hot_remove_dock_devices(struct dock_station *ds)
> +{
> + struct dock_dependent_device *dd;
> +
> + mutex_lock(&ds->hp_lock);
> +
> + list_for_each_entry(dd, &ds->dependent_devices, list) {
> + if (dd->ops && dd->ops->handler)
> + dd->ops->handler(dd->handle, ACPI_NOTIFY_EJECT_REQUEST,
> + dd->context);
> + else
> + dock_remove_acpi_device(dd->handle);
> + }
The proposed patch for bug 59871 may not be safe with above changes
because the ACPI ATA hotplug handler may not remove ACPI devices as
acpiphp driver does.

On the other hand, the above change does get rid of the warning message
"Oops, 'acpi_handle' corrupt", but it may hide the real issue. With
current implementation, devices on the dock station are stopped and
removed after invoking ACPI _DCK method, which seems a little dangerous.
I think ACPI _DCK method should be called to power off the dock station
after stopping all affected devices.

Regards!
Gerry

> +
> + mutex_unlock(&ds->hp_lock);
> +}
> +
> +/**
> + * hot_add_dock_devices - Insert devices on a dock station.
> + * @ds: Dock station to insert devices for.
> *
> * Some devices on the dock station need to have drivers called
> * to perform hotplug operations after a dock event has occurred.
> * Traverse the list of dock devices that have registered a
> * hotplug handler, and call the handler.
> */
> -static void hotplug_dock_devices(struct dock_station *ds, u32 event)
> +static void hot_add_dock_devices(struct dock_station *ds, u32 event)
> {
> struct dock_dependent_device *dd;
>
> mutex_lock(&ds->hp_lock);
>
> - /*
> - * First call driver specific hotplug functions
> - */
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> + /* First call driver specific hotplug event handlers. */
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> if (dd->ops && dd->ops->handler)
> dd->ops->handler(dd->handle, event, dd->context);
>
> /*
> - * Now make sure that an acpi_device is created for each
> - * dependent device, or removed if this is an eject request.
> - * This will cause acpi_drivers to be stopped/started if they
> - * exist
> + * Now make sure that an acpi_device is created for each dependent
> + * device. That will cause scan handlers to attach to devices objects
> + * or acpi_drivers to be started if they exist.
> */
> - list_for_each_entry(dd, &ds->dependent_devices, list) {
> - if (event == ACPI_NOTIFY_EJECT_REQUEST)
> - dock_remove_acpi_device(dd->handle);
> - else
> - dock_create_acpi_device(dd->handle);
> - }
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> + dock_create_acpi_device(dd->handle);
> +
> mutex_unlock(&ds->hp_lock);
> }
>
> @@ -398,10 +385,14 @@ static void dock_event(struct dock_stati
> if (num == DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>
> - list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list)
> + mutex_lock(&ds->hp_lock);
> +
> + list_for_each_entry(dd, &ds->dependent_devices, list)
> if (dd->ops && dd->ops->uevent)
> dd->ops->uevent(dd->handle, event, dd->context);
>
> + mutex_unlock(&ds->hp_lock);
> +
> if (num != DOCK_EVENT)
> kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> }
> @@ -598,9 +589,10 @@ register_hotplug_dock_device(acpi_handle
> */
> dd = find_dock_dependent_device(dock_station, handle);
> if (dd) {
> + mutex_lock(&dock_station->hp_lock);
> dd->ops = ops;
> dd->context = context;
> - dock_add_hotplug_device(dock_station, dd);
> + mutex_unlock(&dock_station->hp_lock);
> ret = 0;
> }
> }
> @@ -623,8 +615,12 @@ void unregister_hotplug_dock_device(acpi
>
> list_for_each_entry(dock_station, &dock_stations, sibling) {
> dd = find_dock_dependent_device(dock_station, handle);
> - if (dd)
> - dock_del_hotplug_device(dock_station, dd);
> + if (dd) {
> + mutex_lock(&dock_station->hp_lock);
> + dd->ops = NULL;
> + dd->context = NULL;
> + mutex_unlock(&dock_station->hp_lock);
> + }
> }
> }
> EXPORT_SYMBOL_GPL(unregister_hotplug_dock_device);
> @@ -649,7 +645,7 @@ static int handle_eject_request(struct d
> */
> dock_event(ds, event, UNDOCK_EVENT);
>
> - hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST);
> + hot_remove_dock_devices(ds);
> undock(ds);
> dock_lock(ds, 0);
> eject_dock(ds);
> @@ -708,7 +704,7 @@ static void dock_notify(acpi_handle hand
> }
> atomic_notifier_call_chain(&dock_notifier_list,
> event, NULL);
> - hotplug_dock_devices(ds, event);
> + hot_add_dock_devices(ds, event);
> complete_dock(ds);
> dock_event(ds, event, DOCK_EVENT);
> dock_lock(ds, 1);
> @@ -953,7 +949,6 @@ static int __init dock_add(acpi_handle h
> mutex_init(&dock_station->hp_lock);
> spin_lock_init(&dock_station->dd_lock);
> INIT_LIST_HEAD(&dock_station->sibling);
> - INIT_LIST_HEAD(&dock_station->hotplug_devices);
> ATOMIC_INIT_NOTIFIER_HEAD(&dock_notifier_list);
> INIT_LIST_HEAD(&dock_station->dependent_devices);
>
>
>
>

2013-06-22 00:04:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
> > On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> >> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> >>> OK, let's try to untangle this a bit.
> >>>
> >>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
> >>> does remain unfixed?
> >>
> >> [not tested, can do so in 12 hours if needed]
> >>
> >> I think there will be problems on undocking and/or on the second
> >> docking, as described in comments #6 - #8 of
> >> https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >
> > OK, I think I have something that might work. It may not solve all problems,
> > but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> > if you can.
> >
> > Please apply [1/4] and [4/4] and the one below and see what happens.
> >
> > Thanks,
> > Rafael
> >
> >
> > ---
> > Rationale:
> > acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> > after removing the companion PCI devices, so the dock station code
> > doesn't need to trim them separately for the dependent devices handled
> > by acpiphp.
> >
> > Moreover, acpiphp_glue.c is the only user of
> > [un]register_hotplug_dock_device(), so *all* devices on the
> > ds->hotplug_devices list are handled by acpiphp and ops is set for all
> > of them.
> Hi Rafael,
> There's an ongoing patch to fix a disk bay hotplug regression, which
> may add a second caller of register_hotplug_device(). Please refer to
> bug 59871, and the proposed patch is at:
> https://bugzilla.kernel.org/attachment.cgi?id=105581
>
> >
> > This means that (1) the ds->hotplug_devices list is not necessary (we
> > can always walk ds->dependent_devices instead and look for those that
> > have dd->ops set) and (2) we don't need to call
> > dock_remove_acpi_device(dd->handle) on eject for any of those devices,
> > because dd->ops->handler() is going to take care of the ACPI device
> > objects trimming for them anyway.
> >
> > Taking the above into account make the following changes:
> > (1) Drop hotplug_devices from struct dock_station.
> > (2) Drop dock_{add|del}_hotplug_device()
> > (3) Make [un]register_hotplug_dock_device() [un]set 'ops' and
> > 'context' for the given device under ds->hp_lock.
> > (4) Add hot_remove_dock_devices() that walks ds->dependent_devices and
> > either calls dd->ops->handler(), if present, or trims the underlying
> > ACPI device object, otherwise.
> > (5) Replace hotplug_dock_devices(ds, ACPI_NOTIFY_EJECT_REQUEST) calls
> > with hot_remove_dock_devices(ds).
> > (6) Rename hotplug_dock_devices() to hot_add_dock_devices() and make
> > it only handle bus check and device check requests. Make it walk
> > ds->dependent_devices instead of ds->hotplug devices.
> > (7) Make dock_event() walk ds->dependent_devices (instead of
> > ds->hotplug devices) under ds->hp_lock.
> > ---
> > drivers/acpi/dock.c | 111 ++++++++++++++++++++++++----------------------------
> > 1 file changed, 53 insertions(+), 58 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/dock.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/dock.c
> > +++ linux-pm/drivers/acpi/dock.c
> > @@ -66,7 +66,6 @@ struct dock_station {
> > spinlock_t dd_lock;
> > struct mutex hp_lock;
> > struct list_head dependent_devices;
> > - struct list_head hotplug_devices;
> >
> > struct list_head sibling;
> > struct platform_device *dock_device;
> > @@ -121,38 +120,6 @@ add_dock_dependent_device(struct dock_st
> > }
> >
> > /**
> > - * dock_add_hotplug_device - associate a hotplug handler with the dock station
> > - * @ds: The dock station
> > - * @dd: The dependent device struct
> > - *
> > - * Add the dependent device to the dock's hotplug device list
> > - */
> > -static void
> > -dock_add_hotplug_device(struct dock_station *ds,
> > - struct dock_dependent_device *dd)
> > -{
> > - mutex_lock(&ds->hp_lock);
> > - list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
> > - mutex_unlock(&ds->hp_lock);
> > -}
> > -
> > -/**
> > - * dock_del_hotplug_device - remove a hotplug handler from the dock station
> > - * @ds: The dock station
> > - * @dd: the dependent device struct
> > - *
> > - * Delete the dependent device from the dock's hotplug device list
> > - */
> > -static void
> > -dock_del_hotplug_device(struct dock_station *ds,
> > - struct dock_dependent_device *dd)
> > -{
> > - mutex_lock(&ds->hp_lock);
> > - list_del(&dd->hotplug_list);
> > - mutex_unlock(&ds->hp_lock);
> > -}
> > -
> > -/**
> > * find_dock_dependent_device - get a device dependent on this dock
> > * @ds: the dock station
> > * @handle: the acpi_handle of the device we want
> > @@ -342,40 +309,60 @@ static void dock_remove_acpi_device(acpi
> > }
> >
> > /**
> > - * hotplug_dock_devices - insert or remove devices on the dock station
> > - * @ds: the dock station
> > - * @event: either bus check or eject request
> > + * hot_remove_dock_devices - Remove devices on a dock station.
> > + * @ds: Dock station to remove devices for.
> > + *
> > + * For each device depending on @ds, if a dock event handler is registered,
> > + * call it for the device, or trim the underlying ACPI device object otherwise.
> > + *
> > + * Dock event handlers are responsible for trimming the underlying ACPI device
> > + * objects if present.
> > + */
> > +static void hot_remove_dock_devices(struct dock_station *ds)
> > +{
> > + struct dock_dependent_device *dd;
> > +
> > + mutex_lock(&ds->hp_lock);
> > +
> > + list_for_each_entry(dd, &ds->dependent_devices, list) {
> > + if (dd->ops && dd->ops->handler)
> > + dd->ops->handler(dd->handle, ACPI_NOTIFY_EJECT_REQUEST,
> > + dd->context);
> > + else
> > + dock_remove_acpi_device(dd->handle);
> > + }
> The proposed patch for bug 59871 may not be safe with above changes
> because the ACPI ATA hotplug handler may not remove ACPI devices as
> acpiphp driver does.

Well, since there's not ATA hotplug handler in 3.10-rc6, as far as I can say,
I suppose that you're talking about a new patch scheduled for 3.11. If so,
can you please give me a pointer to that patch, possibly the tree it is
queued on etc.?

> On the other hand, the above change does get rid of the warning message
> "Oops, 'acpi_handle' corrupt", but it may hide the real issue. With
> current implementation, devices on the dock station are stopped and
> removed after invoking ACPI _DCK method, which seems a little dangerous.

Yes, that's something I actually overlooked. In fact, the execution of
_DCK has to wait for all of the asynchronous work items spawned by
hotplug_dock_devices() to complete, so we *have* *to* run the acpiphp
stuff (the internals of _handle_hotplug_event_func() basically)
synchronously from the dock context.

> I think ACPI _DCK method should be called to power off the dock station
> after stopping all affected devices.

That's correct. Not really to power off, but to disconnect ("isolate from
connector" as the spec puts that), although from the kernel's point of view
the result is pretty much the same - the devices are gone.

I have attached a new patch for Alexander to try at
https://bugzilla.kernel.org/show_bug.cgi?id=59501#c25

It kind of combines your patches [2/4] and [3/4] from the $subject series with
my last patch. The most obvious difference is that it doesn't use klists. :-)

Seriously, I really think we don't need a separate "hotplug devices" list
for docking stations, one "dependent devices" list should be sufficient for
everything (it doesn't change anyway after the initialization), but I stole
your idea with the "get" and "put" routines (I called them "init" and
"release"). However, I didn't add them to struct acpi_dock_ops, but modified
register_hotplug_dock_device() to pass them directly instead (I believe this is
less prone to errors that way, because the callers of
register_hotplug_dock_device() cannot really overlook them).

We still may need to modify find_dock_devices() on top of that.

To summarize, we have multiple different problems in that code.

First, there is the ordering issue of the dock initialization versus PCI
enumeration, so basically dock_init() has to run before the main ACPI namespace
scan. This is addressed by your patch [1/4] in the $subject series, but I'm
not 100% happy with that approach (I believe we need it as a stopgap fix for
now, though), because it means we have to carry out a full namespace walk
(possibly several of them even) before we even start to create struct
acpi_device objects and that doesn't sound quite right.

Second, there is the resources allocation issue addressed by your patch [4/4]
from the $subject series. I believe that this patch is correct and Yinghai
seems to agree with me.

Next, there is the problem with asynchronous handling of dock events by acpiphp
that we're trying to solve at the moment and a couple of things are clear to
me here:

1. Clearly, the dock driver assumes that dd->ops->handler() will always be
synchronous, because otherwise there would have been a completion mechanism
preventing _DCK from being executed before all of the handlers return.
Since there's no such thing, we're dealing with a genuine acpiphp bug
in its implementation of the dock handler and I'm not sure how it's ever
been supposed to work.

2. Whoever wrote find_dock_devices() didn't seem to know how
acpi_bus_scan() worked, or that function would have been arranged
differently and (moreover) the whay acpi_bus_scan() works has changed
since then, so the assumptions made there may not be valid any more.
Moreover, it looks like ds->dependent_list should be walked in the
reverse order during removal.

3. When we remove dock_{add|del}_hotplug_device(), this way or another,
ds->hp_lock will be pointless, because its only user is
hotplug_dock_devices() and it is always called under acpi_scan_lock.

Finally, there seem to be problems with PCI device drivers' .remove() callbacks
not working correctly in some cases and causing problems to happen.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-22 02:47:37

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On 06/22/2013 08:13 AM, Rafael J. Wysocki wrote:
> On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
>> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
>>> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
>>>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
>>>>> OK, let's try to untangle this a bit.
>>>>>
>>>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
>>>>> does remain unfixed?
>>>>
>>>> [not tested, can do so in 12 hours if needed]
>>>>
>>>> I think there will be problems on undocking and/or on the second
>>>> docking, as described in comments #6 - #8 of
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
>>>
>>> OK, I think I have something that might work. It may not solve all problems,
>>> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
>>> if you can.
>>>
>>> Please apply [1/4] and [4/4] and the one below and see what happens.
>>>
>>> Thanks,
>>> Rafael
>>>
>>>
>>> ---
>>> Rationale:
>>> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
>>> after removing the companion PCI devices, so the dock station code
>>> doesn't need to trim them separately for the dependent devices handled
>>> by acpiphp.
>>>
>>> Moreover, acpiphp_glue.c is the only user of
>>> [un]register_hotplug_dock_device(), so *all* devices on the
>>> ds->hotplug_devices list are handled by acpiphp and ops is set for all
>>> of them.
>> Hi Rafael,
>> There's an ongoing patch to fix a disk bay hotplug regression, which
>> may add a second caller of register_hotplug_device(). Please refer to
>> bug 59871, and the proposed patch is at:
>> https://bugzilla.kernel.org/attachment.cgi?id=105581
Hi Rafael,
You can find the proposed patch for ATA Bay hotplug at
https://bugzilla.kernel.org/attachment.cgi?id=105581
Regards!
Gerry

>>
>> The proposed patch for bug 59871 may not be safe with above changes
>> because the ACPI ATA hotplug handler may not remove ACPI devices as
>> acpiphp driver does.
>
> Well, since there's not ATA hotplug handler in 3.10-rc6, as far as I can say,
> I suppose that you're talking about a new patch scheduled for 3.11. If so,u
> can you please give me a pointer to that patch, possibly the tree it is
> queued on etc.?
[...]
>
> Thanks,
> Rafael
>
>

2013-06-22 19:50:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Saturday, June 22, 2013 10:47:24 AM Jiang Liu wrote:
> On 06/22/2013 08:13 AM, Rafael J. Wysocki wrote:
> > On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
> >> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
> >>> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> >>>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> >>>>> OK, let's try to untangle this a bit.
> >>>>>
> >>>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
> >>>>> does remain unfixed?
> >>>>
> >>>> [not tested, can do so in 12 hours if needed]
> >>>>
> >>>> I think there will be problems on undocking and/or on the second
> >>>> docking, as described in comments #6 - #8 of
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >>>
> >>> OK, I think I have something that might work. It may not solve all problems,
> >>> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> >>> if you can.
> >>>
> >>> Please apply [1/4] and [4/4] and the one below and see what happens.
> >>>
> >>> Thanks,
> >>> Rafael
> >>>
> >>>
> >>> ---
> >>> Rationale:
> >>> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> >>> after removing the companion PCI devices, so the dock station code
> >>> doesn't need to trim them separately for the dependent devices handled
> >>> by acpiphp.
> >>>
> >>> Moreover, acpiphp_glue.c is the only user of
> >>> [un]register_hotplug_dock_device(), so *all* devices on the
> >>> ds->hotplug_devices list are handled by acpiphp and ops is set for all
> >>> of them.
> >> Hi Rafael,
> >> There's an ongoing patch to fix a disk bay hotplug regression, which
> >> may add a second caller of register_hotplug_device(). Please refer to
> >> bug 59871, and the proposed patch is at:
> >> https://bugzilla.kernel.org/attachment.cgi?id=105581
> Hi Rafael,
> You can find the proposed patch for ATA Bay hotplug at
> https://bugzilla.kernel.org/attachment.cgi?id=105581

Well, maybe it'll have to be reworked slightly.

Which BZ entry is that?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-23 15:57:58

by Jiang Liu

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On 06/23/2013 03:59 AM, Rafael J. Wysocki wrote:
> On Saturday, June 22, 2013 10:47:24 AM Jiang Liu wrote:
>> On 06/22/2013 08:13 AM, Rafael J. Wysocki wrote:
>>> On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
>>>> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
>>>>> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
>>>>>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
>>>>>>> OK, let's try to untangle this a bit.
>>>>>>>
>>>>>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
>>>>>>> does remain unfixed?
>>>>>>
>>>>>> [not tested, can do so in 12 hours if needed]
>>>>>>
>>>>>> I think there will be problems on undocking and/or on the second
>>>>>> docking, as described in comments #6 - #8 of
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
>>>>>
>>>>> OK, I think I have something that might work. It may not solve all problems,
>>>>> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
>>>>> if you can.
>>>>>
>>>>> Please apply [1/4] and [4/4] and the one below and see what happens.
>>>>>
>>>>> Thanks,
>>>>> Rafael
>>>>>
>>>>>
>>>>> ---
>>>>> Rationale:
>>>>> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
>>>>> after removing the companion PCI devices, so the dock station code
>>>>> doesn't need to trim them separately for the dependent devices handled
>>>>> by acpiphp.
>>>>>
>>>>> Moreover, acpiphp_glue.c is the only user of
>>>>> [un]register_hotplug_dock_device(), so *all* devices on the
>>>>> ds->hotplug_devices list are handled by acpiphp and ops is set for all
>>>>> of them.
>>>> Hi Rafael,
>>>> There's an ongoing patch to fix a disk bay hotplug regression, which
>>>> may add a second caller of register_hotplug_device(). Please refer to
>>>> bug 59871, and the proposed patch is at:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=105581
>> Hi Rafael,
>> You can find the proposed patch for ATA Bay hotplug at
>> https://bugzilla.kernel.org/attachment.cgi?id=105581
>
> Well, maybe it'll have to be reworked slightly.
>
> Which BZ entry is that?
Hi Rafael,
It's bug 59871, please refer to:
https://bugzilla.kernel.org/show_bug.cgi?id=59871
Regards!
Gerry

>
> Rafael
>
>

2013-06-23 21:41:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Sunday, June 23, 2013 11:57:46 PM Jiang Liu wrote:
> On 06/23/2013 03:59 AM, Rafael J. Wysocki wrote:
> > On Saturday, June 22, 2013 10:47:24 AM Jiang Liu wrote:
> >> On 06/22/2013 08:13 AM, Rafael J. Wysocki wrote:
> >>> On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
> >>>> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
> >>>>> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> >>>>>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> >>>>>>> OK, let's try to untangle this a bit.
> >>>>>>>
> >>>>>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
> >>>>>>> does remain unfixed?
> >>>>>>
> >>>>>> [not tested, can do so in 12 hours if needed]
> >>>>>>
> >>>>>> I think there will be problems on undocking and/or on the second
> >>>>>> docking, as described in comments #6 - #8 of
> >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >>>>>
> >>>>> OK, I think I have something that might work. It may not solve all problems,
> >>>>> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> >>>>> if you can.
> >>>>>
> >>>>> Please apply [1/4] and [4/4] and the one below and see what happens.
> >>>>>
> >>>>> Thanks,
> >>>>> Rafael
> >>>>>
> >>>>>
> >>>>> ---
> >>>>> Rationale:
> >>>>> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> >>>>> after removing the companion PCI devices, so the dock station code
> >>>>> doesn't need to trim them separately for the dependent devices handled
> >>>>> by acpiphp.
> >>>>>
> >>>>> Moreover, acpiphp_glue.c is the only user of
> >>>>> [un]register_hotplug_dock_device(), so *all* devices on the
> >>>>> ds->hotplug_devices list are handled by acpiphp and ops is set for all
> >>>>> of them.
> >>>> Hi Rafael,
> >>>> There's an ongoing patch to fix a disk bay hotplug regression, which
> >>>> may add a second caller of register_hotplug_device(). Please refer to
> >>>> bug 59871, and the proposed patch is at:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=105581
> >> Hi Rafael,
> >> You can find the proposed patch for ATA Bay hotplug at
> >> https://bugzilla.kernel.org/attachment.cgi?id=105581
> >
> > Well, maybe it'll have to be reworked slightly.
> >
> > Which BZ entry is that?
> Hi Rafael,
> It's bug 59871, please refer to:
> https://bugzilla.kernel.org/show_bug.cgi?id=59871

OK, thanks!

I suppose it would be most convenient to simply add that patch to this
patchset.

Thanks,
Rafael

2013-06-23 21:43:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

On Sunday, June 23, 2013 11:57:46 PM Jiang Liu wrote:
> On 06/23/2013 03:59 AM, Rafael J. Wysocki wrote:
> > On Saturday, June 22, 2013 10:47:24 AM Jiang Liu wrote:
> >> On 06/22/2013 08:13 AM, Rafael J. Wysocki wrote:
> >>> On Saturday, June 22, 2013 12:54:21 AM Jiang Liu wrote:
> >>>> On 06/21/2013 03:06 AM, Rafael J. Wysocki wrote:
> >>>>> On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote:
> >>>>>> 2013/6/19 Rafael J. Wysocki <[email protected]>:
> >>>>>>> OK, let's try to untangle this a bit.
> >>>>>>>
> >>>>>>> If you applyt patches [1/4] and [4/4] from the $subject series only, what
> >>>>>>> does remain unfixed?
> >>>>>>
> >>>>>> [not tested, can do so in 12 hours if needed]
> >>>>>>
> >>>>>> I think there will be problems on undocking and/or on the second
> >>>>>> docking, as described in comments #6 - #8 of
> >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=59501
> >>>>>
> >>>>> OK, I think I have something that might work. It may not solve all problems,
> >>>>> but maybe it helps a bit. Unfortunately, I can't really test it, so please do
> >>>>> if you can.
> >>>>>
> >>>>> Please apply [1/4] and [4/4] and the one below and see what happens.
> >>>>>
> >>>>> Thanks,
> >>>>> Rafael
> >>>>>
> >>>>>
> >>>>> ---
> >>>>> Rationale:
> >>>>> acpiphp_glue.c:disable_device() trims the underlying ACPI device objects
> >>>>> after removing the companion PCI devices, so the dock station code
> >>>>> doesn't need to trim them separately for the dependent devices handled
> >>>>> by acpiphp.
> >>>>>
> >>>>> Moreover, acpiphp_glue.c is the only user of
> >>>>> [un]register_hotplug_dock_device(), so *all* devices on the
> >>>>> ds->hotplug_devices list are handled by acpiphp and ops is set for all
> >>>>> of them.
> >>>> Hi Rafael,
> >>>> There's an ongoing patch to fix a disk bay hotplug regression, which
> >>>> may add a second caller of register_hotplug_device(). Please refer to
> >>>> bug 59871, and the proposed patch is at:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=105581
> >> Hi Rafael,
> >> You can find the proposed patch for ATA Bay hotplug at
> >> https://bugzilla.kernel.org/attachment.cgi?id=105581
> >
> > Well, maybe it'll have to be reworked slightly.
> >
> > Which BZ entry is that?
> Hi Rafael,
> It's bug 59871, please refer to:
> https://bugzilla.kernel.org/show_bug.cgi?id=59871

OK, thanks!

I suppose it would be most convenient to simply add that patch to this
patchset.

Thanks,
Rafael