2014-04-10 09:18:47

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH] Suppress a device hot remove related lockdep warning

I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ======================================================
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: G W
[84154.204080] -------------------------------------------------------
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810664a7>] cpu_maps_update_begin+0x17/0x20
[84154.213203]
[84154.213203] but task is already holding lock:
[84154.213203] (s_active#79){++++.+}, at: [<ffffffff81256e14>] kernfs_fop_write+0xe4/0x190
[84154.213203]
[84154.213203] which lock already depends on the new lock.
[84154.213203]
[84154.213203]
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203]
-> #3 (s_active#79){++++.+}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff812552e0>] __kernfs_remove+0x250/0x310
[84154.213203] [<ffffffff81256470>] kernfs_remove_by_name_ns+0x50/0xc0
[84154.213203] [<ffffffff81257af5>] sysfs_remove_file_ns+0x15/0x20
[84154.213203] [<ffffffff813df339>] device_remove_file+0x19/0x20
[84154.213203] [<ffffffff813dff33>] device_remove_attrs+0x33/0x80
[84154.213203] [<ffffffff813e00a7>] device_del+0x127/0x1c0
[84154.213203] [<ffffffff813e0162>] device_unregister+0x22/0x60
[84154.213203] [<ffffffff813e66de>] unregister_cpu+0x1e/0x40
[84154.213203] [<ffffffff81009a33>] arch_unregister_cpu+0x23/0x30
[84154.213203] [<ffffffff8136f619>] acpi_processor_remove+0x8d/0xb2
[84154.213203] [<ffffffff8136cfff>] acpi_bus_trim+0x5a/0x8d
[84154.213203] [<ffffffff8136ec3b>] acpi_device_hotplug+0x1a8/0x3ec
[84154.213203] [<ffffffff81369002>] acpi_hotplug_work_fn+0x1f/0x2b
[84154.213203] [<ffffffff8108754b>] process_one_work+0x1eb/0x6b0
[84154.213203] [<ffffffff81087e9b>] worker_thread+0x11b/0x370
[84154.213203] [<ffffffff81090324>] kthread+0xe4/0x100
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810665bf>] cpu_hotplug_begin+0x4f/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #1 (cpu_hotplug.lock){++++++}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff810665b1>] cpu_hotplug_begin+0x41/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203] [<ffffffff810c397a>] __lock_acquire+0x1f2a/0x1f60
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810664a7>] cpu_maps_update_begin+0x17/0x20
[84154.213203] [<ffffffff815b582d>] cpu_down+0x1d/0x50
[84154.213203] [<ffffffff813e63b4>] cpu_subsys_offline+0x14/0x20
[84154.213203] [<ffffffff813e145d>] device_offline+0xad/0xd0
[84154.213203] [<ffffffff813e1562>] online_store+0x42/0x80
[84154.213203] [<ffffffff813deab8>] dev_attr_store+0x18/0x30
[84154.213203] [<ffffffff81257bb9>] sysfs_kf_write+0x49/0x60
[84154.213203] [<ffffffff81256e39>] kernfs_fop_write+0x109/0x190
[84154.213203] [<ffffffff811d15be>] vfs_write+0xbe/0x1c0
[84154.213203] [<ffffffff811d1a52>] SyS_write+0x52/0xb0
[84154.213203] [<ffffffff815d3162>] tracesys+0xd0/0xd5
[84154.213203]
[84154.213203] other info that might help us debug this:
[84154.213203]
[84154.213203] Chain exists of:
cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203] Possible unsafe locking scenario:
[84154.213203]
[84154.213203] CPU0 CPU1
[84154.213203] ---- ----
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_hotplug.lock#2);
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_add_remove_lock);
[84154.213203]
[84154.213203] *** DEADLOCK ***
.............

The deadlock itself seems already fixed in commit 5e33bc41.

This patch uses DEVICE_ATTR_IGNORE_LOCKDEP for "online" attr to suppress
this lockdep warning. But I'm a little afraid it might also hide
(future) potential real dead lock scenarios?

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..98e3e09 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -452,7 +452,9 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
unlock_device_hotplug();
return ret < 0 ? ret : count;
}
-static DEVICE_ATTR_RW(online);
+
+static DEVICE_ATTR_IGNORE_LOCKDEP(online, (S_IWUSR | S_IRUGO),
+ online_show, online_store);

int device_add_groups(struct device *dev, const struct attribute_group **groups)
{




2014-04-10 13:31:22

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH] Suppress a device hot remove related lockdep warning

Hello,

On Thu, Apr 10, 2014 at 05:18:34PM +0800, Li Zhong wrote:
> I noticed following lockdep warning when trying acpi hot-remove cpus:
>
> [84154.204080] ======================================================
> [84154.204080] [ INFO: possible circular locking dependency detected ]
> [84154.204080] 3.14.0-next-20140408+ #24 Tainted: G W
> [84154.204080] -------------------------------------------------------
> [84154.204080] bash/777 is trying to acquire lock:
> [84154.204080] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810664a7>] cpu_maps_update_begin+0x17/0x20
> [84154.213203]
> [84154.213203] but task is already holding lock:
> [84154.213203] (s_active#79){++++.+}, at: [<ffffffff81256e14>] kernfs_fop_write+0xe4/0x190
> [84154.213203]
> [84154.213203] which lock already depends on the new lock.
> [84154.213203]
> [84154.213203]
> [84154.213203] the existing dependency chain (in reverse order) is:
> [84154.213203]
...
> [84154.213203] Possible unsafe locking scenario:
> [84154.213203]
> [84154.213203] CPU0 CPU1
> [84154.213203] ---- ----
> [84154.213203] lock(s_active#79);
> [84154.213203] lock(cpu_hotplug.lock#2);
> [84154.213203] lock(s_active#79);
> [84154.213203] lock(cpu_add_remove_lock);
> [84154.213203]
> [84154.213203] *** DEADLOCK ***
> .............
>
> The deadlock itself seems already fixed in commit 5e33bc41.
>
> This patch uses DEVICE_ATTR_IGNORE_LOCKDEP for "online" attr to suppress
> this lockdep warning. But I'm a little afraid it might also hide
> (future) potential real dead lock scenarios?

I suppose this is happening because offlining a cpu involves removing
some sysfs files? If the file isn't trying to remove itself, it
should be okay but I think it could be better to use
kernfs_break_active_protection() from online callback so that the
operation is actually outside the dependency chain rather than just
suppressing lockdep.

Thanks.

--
tejun

2014-04-11 04:10:57

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH v2] Use kernfs_break_active_protection() for device online store callbacks

I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ======================================================
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: G W
[84154.204080] -------------------------------------------------------
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810664a7>]
cpu_maps_update_begin+0x17/0x20
[84154.213203]
[84154.213203] but task is already holding lock:
[84154.213203] (s_active#79){++++.+}, at: [<ffffffff81256e14>]
kernfs_fop_write+0xe4/0x190
[84154.213203]
[84154.213203] which lock already depends on the new lock.
[84154.213203]
[84154.213203]
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203]
-> #3 (s_active#79){++++.+}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff812552e0>] __kernfs_remove+0x250/0x310
[84154.213203] [<ffffffff81256470>] kernfs_remove_by_name_ns
+0x50/0xc0
[84154.213203] [<ffffffff81257af5>] sysfs_remove_file_ns
+0x15/0x20
[84154.213203] [<ffffffff813df339>] device_remove_file+0x19/0x20
[84154.213203] [<ffffffff813dff33>] device_remove_attrs+0x33/0x80
[84154.213203] [<ffffffff813e00a7>] device_del+0x127/0x1c0
[84154.213203] [<ffffffff813e0162>] device_unregister+0x22/0x60
[84154.213203] [<ffffffff813e66de>] unregister_cpu+0x1e/0x40
[84154.213203] [<ffffffff81009a33>] arch_unregister_cpu+0x23/0x30
[84154.213203] [<ffffffff8136f619>] acpi_processor_remove
+0x8d/0xb2
[84154.213203] [<ffffffff8136cfff>] acpi_bus_trim+0x5a/0x8d
[84154.213203] [<ffffffff8136ec3b>] acpi_device_hotplug
+0x1a8/0x3ec
[84154.213203] [<ffffffff81369002>] acpi_hotplug_work_fn
+0x1f/0x2b
[84154.213203] [<ffffffff8108754b>] process_one_work+0x1eb/0x6b0
[84154.213203] [<ffffffff81087e9b>] worker_thread+0x11b/0x370
[84154.213203] [<ffffffff81090324>] kthread+0xe4/0x100
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810665bf>] cpu_hotplug_begin+0x4f/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable
+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #1 (cpu_hotplug.lock){++++++}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff810665b1>] cpu_hotplug_begin+0x41/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable
+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203] [<ffffffff810c397a>] __lock_acquire+0x1f2a/0x1f60
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810664a7>] cpu_maps_update_begin
+0x17/0x20
[84154.213203] [<ffffffff815b582d>] cpu_down+0x1d/0x50
[84154.213203] [<ffffffff813e63b4>] cpu_subsys_offline+0x14/0x20
[84154.213203] [<ffffffff813e145d>] device_offline+0xad/0xd0
[84154.213203] [<ffffffff813e1562>] online_store+0x42/0x80
[84154.213203] [<ffffffff813deab8>] dev_attr_store+0x18/0x30
[84154.213203] [<ffffffff81257bb9>] sysfs_kf_write+0x49/0x60
[84154.213203] [<ffffffff81256e39>] kernfs_fop_write+0x109/0x190
[84154.213203] [<ffffffff811d15be>] vfs_write+0xbe/0x1c0
[84154.213203] [<ffffffff811d1a52>] SyS_write+0x52/0xb0
[84154.213203] [<ffffffff815d3162>] tracesys+0xd0/0xd5
[84154.213203]
[84154.213203] other info that might help us debug this:
[84154.213203]
[84154.213203] Chain exists of:
cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203] Possible unsafe locking scenario:
[84154.213203]
[84154.213203] CPU0 CPU1
[84154.213203] ---- ----
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_hotplug.lock#2);
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_add_remove_lock);
[84154.213203]
[84154.213203] *** DEADLOCK ***
.............

The deadlock itself seems already fixed in commit 5e33bc41.

As Tejun suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations.

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/core.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..2b9f68e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
{
bool val;
int ret;
+ struct kernfs_node *kn;

ret = strtobool(buf, &val);
if (ret < 0)
@@ -448,7 +449,15 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
if (ret)
return ret;

+ kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+ if (WARN_ON_ONCE(!kn))
+ goto out;
+
+ kernfs_break_active_protection(kn);
+
ret = val ? device_online(dev) : device_offline(dev);
+ kernfs_unbreak_active_protection(kn);
+out:
unlock_device_hotplug();
return ret < 0 ? ret : count;
}

2014-04-11 10:27:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v2] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Fri, Apr 11, 2014 at 12:10:45PM +0800, Li Zhong wrote:
> @@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> {
> bool val;
> int ret;
> + struct kernfs_node *kn;
>
> ret = strtobool(buf, &val);
> if (ret < 0)
> @@ -448,7 +449,15 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> if (ret)
> return ret;
>
> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);

Wouldn't find_and_get need to be paired with put?

> + if (WARN_ON_ONCE(!kn))
> + goto out;
> +
> + kernfs_break_active_protection(kn);
> +
> ret = val ? device_online(dev) : device_offline(dev);

With active protection protection @dev may go away at any time. There
should be some protection / synchronization to prevent that, no?

> + kernfs_unbreak_active_protection(kn);
> +out:
> unlock_device_hotplug();
> return ret < 0 ? ret : count;
> }

Thanks.

--
tejun

2014-04-14 07:47:40

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

I noticed following lockdep warning when trying acpi hot-remove cpus:

[84154.204080] ======================================================
[84154.204080] [ INFO: possible circular locking dependency detected ]
[84154.204080] 3.14.0-next-20140408+ #24 Tainted: G W
[84154.204080] -------------------------------------------------------
[84154.204080] bash/777 is trying to acquire lock:
[84154.204080] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810664a7>]
cpu_maps_update_begin+0x17/0x20
[84154.213203]
[84154.213203] but task is already holding lock:
[84154.213203] (s_active#79){++++.+}, at: [<ffffffff81256e14>]
kernfs_fop_write+0xe4/0x190
[84154.213203]
[84154.213203] which lock already depends on the new lock.
[84154.213203]
[84154.213203]
[84154.213203] the existing dependency chain (in reverse order) is:
[84154.213203]
-> #3 (s_active#79){++++.+}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff812552e0>] __kernfs_remove+0x250/0x310
[84154.213203] [<ffffffff81256470>] kernfs_remove_by_name_ns
+0x50/0xc0
[84154.213203] [<ffffffff81257af5>] sysfs_remove_file_ns
+0x15/0x20
[84154.213203] [<ffffffff813df339>] device_remove_file+0x19/0x20
[84154.213203] [<ffffffff813dff33>] device_remove_attrs+0x33/0x80
[84154.213203] [<ffffffff813e00a7>] device_del+0x127/0x1c0
[84154.213203] [<ffffffff813e0162>] device_unregister+0x22/0x60
[84154.213203] [<ffffffff813e66de>] unregister_cpu+0x1e/0x40
[84154.213203] [<ffffffff81009a33>] arch_unregister_cpu+0x23/0x30
[84154.213203] [<ffffffff8136f619>] acpi_processor_remove
+0x8d/0xb2
[84154.213203] [<ffffffff8136cfff>] acpi_bus_trim+0x5a/0x8d
[84154.213203] [<ffffffff8136ec3b>] acpi_device_hotplug
+0x1a8/0x3ec
[84154.213203] [<ffffffff81369002>] acpi_hotplug_work_fn
+0x1f/0x2b
[84154.213203] [<ffffffff8108754b>] process_one_work+0x1eb/0x6b0
[84154.213203] [<ffffffff81087e9b>] worker_thread+0x11b/0x370
[84154.213203] [<ffffffff81090324>] kthread+0xe4/0x100
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #2 (cpu_hotplug.lock#2){+.+.+.}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810665bf>] cpu_hotplug_begin+0x4f/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable
+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #1 (cpu_hotplug.lock){++++++}:
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff810665b1>] cpu_hotplug_begin+0x41/0x80
[84154.213203] [<ffffffff8106666f>] _cpu_up+0x3f/0x160
[84154.213203] [<ffffffff810667f9>] cpu_up+0x69/0x80
[84154.213203] [<ffffffff81b18f14>] smp_init+0x60/0x8c
[84154.213203] [<ffffffff81b00fd8>] kernel_init_freeable
+0x126/0x23b
[84154.213203] [<ffffffff815b4a3e>] kernel_init+0xe/0xf0
[84154.213203] [<ffffffff815d2f2c>] ret_from_fork+0x7c/0xb0
[84154.213203]
-> #0 (cpu_add_remove_lock){+.+.+.}:
[84154.213203] [<ffffffff810c397a>] __lock_acquire+0x1f2a/0x1f60
[84154.213203] [<ffffffff810c408b>] lock_acquire+0x9b/0x1d0
[84154.213203] [<ffffffff815c7700>] mutex_lock_nested+0x50/0x3c0
[84154.213203] [<ffffffff810664a7>] cpu_maps_update_begin
+0x17/0x20
[84154.213203] [<ffffffff815b582d>] cpu_down+0x1d/0x50
[84154.213203] [<ffffffff813e63b4>] cpu_subsys_offline+0x14/0x20
[84154.213203] [<ffffffff813e145d>] device_offline+0xad/0xd0
[84154.213203] [<ffffffff813e1562>] online_store+0x42/0x80
[84154.213203] [<ffffffff813deab8>] dev_attr_store+0x18/0x30
[84154.213203] [<ffffffff81257bb9>] sysfs_kf_write+0x49/0x60
[84154.213203] [<ffffffff81256e39>] kernfs_fop_write+0x109/0x190
[84154.213203] [<ffffffff811d15be>] vfs_write+0xbe/0x1c0
[84154.213203] [<ffffffff811d1a52>] SyS_write+0x52/0xb0
[84154.213203] [<ffffffff815d3162>] tracesys+0xd0/0xd5
[84154.213203]
[84154.213203] other info that might help us debug this:
[84154.213203]
[84154.213203] Chain exists of:
cpu_add_remove_lock --> cpu_hotplug.lock#2 --> s_active#79

[84154.213203] Possible unsafe locking scenario:
[84154.213203]
[84154.213203] CPU0 CPU1
[84154.213203] ---- ----
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_hotplug.lock#2);
[84154.213203] lock(s_active#79);
[84154.213203] lock(cpu_add_remove_lock);
[84154.213203]
[84154.213203] *** DEADLOCK ***
.............

The deadlock itself seems already fixed in commit 5e33bc41.

As Tejun suggested, to avoid this lockdep warning,
kernfs_break_active_protection() is used before online/offline callbacks
to take the s_active lock out of the dependency chain during
online/offline operations.

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..b313ad7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
{
bool val;
int ret;
+ struct kernfs_node *kn;

ret = strtobool(buf, &val);
if (ret < 0)
@@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
if (ret)
return ret;

+ kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+ if (WARN_ON_ONCE(!kn))
+ goto out;
+
+ get_device(dev);
+ kernfs_break_active_protection(kn);
ret = val ? device_online(dev) : device_offline(dev);
+ kernfs_unbreak_active_protection(kn);
+ put_device(dev);
+
+ kernfs_put(kn);
+
+out:
unlock_device_hotplug();
return ret < 0 ? ret : count;
}


2014-04-14 20:13:24

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Mon, Apr 14, 2014 at 03:47:29PM +0800, Li Zhong wrote:
> @@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> {
> bool val;
> int ret;
> + struct kernfs_node *kn;
>
> ret = strtobool(buf, &val);
> if (ret < 0)
> @@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> if (ret)
> return ret;
>
> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> + if (WARN_ON_ONCE(!kn))
> + goto out;
> +
> + get_device(dev);
> + kernfs_break_active_protection(kn);
> ret = val ? device_online(dev) : device_offline(dev);
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> +
> + kernfs_put(kn);
> +
> +out:
> unlock_device_hotplug();
> return ret < 0 ? ret : count;
> }

Can you please add comment explainin why this is being down? As it
currently stands, it's quite a bit of complexity without any
indication what's going on why. Also, if device_hotplug is locked, is
it really necessary to get @dev? Can it go away inbetween? The code
snippet looks weird because getting @dev indicates that the device
might go away without doing it but then it proceeds to invoke
device_{on|off}line() which probably isn't safe to invoke on a removed
device.

Thanks.

--
tejun

2014-04-15 02:44:48

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

On Mon, 2014-04-14 at 16:13 -0400, Tejun Heo wrote:
> Hello,
>
> On Mon, Apr 14, 2014 at 03:47:29PM +0800, Li Zhong wrote:
> > @@ -439,6 +439,7 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> > {
> > bool val;
> > int ret;
> > + struct kernfs_node *kn;
> >
> > ret = strtobool(buf, &val);
> > if (ret < 0)
> > @@ -448,7 +449,19 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
> > if (ret)
> > return ret;
> >
> > + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > + if (WARN_ON_ONCE(!kn))
> > + goto out;
> > +
> > + get_device(dev);
> > + kernfs_break_active_protection(kn);
> > ret = val ? device_online(dev) : device_offline(dev);
> > + kernfs_unbreak_active_protection(kn);
> > + put_device(dev);
> > +
> > + kernfs_put(kn);
> > +
> > +out:
> > unlock_device_hotplug();
> > return ret < 0 ? ret : count;
> > }
>
> Can you please add comment explainin why this is being down? As it
> currently stands, it's quite a bit of complexity without any
> indication what's going on why. Also, if device_hotplug is locked, is
> it really necessary to get @dev? Can it go away inbetween? The code
> snippet looks weird because getting @dev indicates that the device
> might go away without doing it but then it proceeds to invoke
> device_{on|off}line() which probably isn't safe to invoke on a removed
> device.

Hi Tejun,

I tried to write some draft words here... (I'm not good at writing
it...) Could you please help to have a review and comment? thanks.

/ *
* This process might deadlock with another process trying to
* remove this device:
* This process holding the s_active of "online" attribute, and tries
* to online/offline the device with some locks protecting hotplug.
* Device removing process holding some locks protecting hotplug, and
* tries to remove the "online" attribute, waiting for the s_active to
* be released.
*
* The deadlock described above should be solved with
* lock_device_hotplug_sysfs(). We temporarily drop the active
* protection here to avoid some lockdep warnings.
*
* If device_hotplug_lock is forgotten to be used when removing
* device(possibly some very simple device even don't need this lock?),
* @dev could go away any time after dropping the active protection.
* So increase its ref count before dropping active protection.
* Though invoking device_{on|off}line() on a removed device seems
* unreasonable, it should be less disastrous than playing with freed
* @dev. Also, we might be able to have some mechanism abort
* device_{on|off}line() if @dev already removed.
*/

Thanks, Zhong
>
> Thanks.
>

2014-04-15 14:50:24

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Tue, Apr 15, 2014 at 10:44:37AM +0800, Li Zhong wrote:
> / *
> * This process might deadlock with another process trying to
> * remove this device:
> * This process holding the s_active of "online" attribute, and tries
> * to online/offline the device with some locks protecting hotplug.
> * Device removing process holding some locks protecting hotplug, and
> * tries to remove the "online" attribute, waiting for the s_active to
> * be released.
> *
> * The deadlock described above should be solved with
> * lock_device_hotplug_sysfs(). We temporarily drop the active
> * protection here to avoid some lockdep warnings.
> *
> * If device_hotplug_lock is forgotten to be used when removing
> * device(possibly some very simple device even don't need this lock?),
> * @dev could go away any time after dropping the active protection.
> * So increase its ref count before dropping active protection.
> * Though invoking device_{on|off}line() on a removed device seems
> * unreasonable, it should be less disastrous than playing with freed
> * @dev. Also, we might be able to have some mechanism abort
> * device_{on|off}line() if @dev already removed.
> */

Hmmm... I'm not sure I fully understand the problem. Does the code
ever try to remove "online" while holding cpu_add_remove_lock and,
when written 0, online knob grabs cpu_add_remove_lock? If so, that is
an actually possible deadlock, no?

Thanks.

--
tejun

2014-04-16 01:41:51

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

On Tue, 2014-04-15 at 10:50 -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 15, 2014 at 10:44:37AM +0800, Li Zhong wrote:
> > / *
> > * This process might deadlock with another process trying to
> > * remove this device:
> > * This process holding the s_active of "online" attribute, and tries
> > * to online/offline the device with some locks protecting hotplug.
> > * Device removing process holding some locks protecting hotplug, and
> > * tries to remove the "online" attribute, waiting for the s_active to
> > * be released.
> > *
> > * The deadlock described above should be solved with
> > * lock_device_hotplug_sysfs(). We temporarily drop the active
> > * protection here to avoid some lockdep warnings.
> > *
> > * If device_hotplug_lock is forgotten to be used when removing
> > * device(possibly some very simple device even don't need this lock?),
> > * @dev could go away any time after dropping the active protection.
> > * So increase its ref count before dropping active protection.
> > * Though invoking device_{on|off}line() on a removed device seems
> > * unreasonable, it should be less disastrous than playing with freed
> > * @dev. Also, we might be able to have some mechanism abort
> > * device_{on|off}line() if @dev already removed.
> > */
>
> Hmmm... I'm not sure I fully understand the problem. Does the code
> ever try to remove "online" while holding cpu_add_remove_lock and,
> when written 0, online knob grabs cpu_add_remove_lock?

Yes.

In acpi_processor_remove(), cpu_maps_update_begin() is called to hold
cpu_add_remove_lock, and then arch_unregister_cpu calls
unregister_cpu(), which will try to remove dir cpu1 including "online".

while written 0 to online, cpu_down() will also try to grab
cpu_add_remove_lock with cpu_maps_update_begin().

> If so, that is
> an actually possible deadlock, no?

Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
lock_device_hotplug_sysfs() to return a restart syscall error if not
able to try lock the device_hotplug_lock. That also requires the device
removing code path to take the device_hotplug_lock.

Thanks, Zhong

>
> Thanks.
>

2014-04-16 15:17:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Wed, Apr 16, 2014 at 09:41:40AM +0800, Li Zhong wrote:
> > If so, that is
> > an actually possible deadlock, no?
>
> Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
> lock_device_hotplug_sysfs() to return a restart syscall error if not
> able to try lock the device_hotplug_lock. That also requires the device
> removing code path to take the device_hotplug_lock.

But that patch only takes out device_hotplug_lock out of the
dependency graph and does nothing for cpu_add_remove_lock. It seems
to be that there still is a deadlock condition involving s_active and
cpu_add_remove_lock. Am I missing something here?

Now that kernfs has a proper mechanism to deal with it, wouldn't it
make more sense to replace 5e33bc41 with prper s_active protection
breaking?

Thanks.

--
tejun

2014-04-17 03:06:04

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

On Wed, 2014-04-16 at 11:17 -0400, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 16, 2014 at 09:41:40AM +0800, Li Zhong wrote:
> > > If so, that is
> > > an actually possible deadlock, no?
> >
> > Yes, but it seems to me that it is solved in commit 5e33bc41, which uses
> > lock_device_hotplug_sysfs() to return a restart syscall error if not
> > able to try lock the device_hotplug_lock. That also requires the device
> > removing code path to take the device_hotplug_lock.
>
> But that patch only takes out device_hotplug_lock out of the
> dependency graph and does nothing for cpu_add_remove_lock. It seems
> to be that there still is a deadlock condition involving s_active and
> cpu_add_remove_lock. Am I missing something here?

It seems to me cpu_add_remove_lock is always taken after
device_hotplug_lock.

So if cpu_add_remove_lock has been acquired by device removing process,
then it means the other online/offline process couldn't successfully try
lock device_hotplug_lock, and will release s_active with a restart
syscall error;

if cpu_add_remove_lock has been acquired by online/offline process, then
it should already hold device_hotlug_lock, and keeps the device removing
process waiting at device_hotplug_lock. So online/offline process could
release the lock, and finally release s_active soon.

But after some further thinking, I seem to understand your point.
s_active has lock order problem with the other series of hotplug related
locks, so it's better to take s_active out of the dependency chain,
rather than the first of the other series of locks? like you suggested
below.

>
> Now that kernfs has a proper mechanism to deal with it, wouldn't it
> make more sense to replace 5e33bc41 with prper s_active protection
> breaking?

I'll try this way and send you the code for review.

Thanks,
Zhong

>
> Thanks.
>

2014-04-17 06:50:58

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism
to break active protection.

The problem here is the order of s_active, and series of hotplug related
lock.

This patch takes s_active out of the lock dependency graph, so it won't
dead lock with any hotplug realted locks.

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/core.c | 37 ++++++++++++++++++++++++++++++++++---
drivers/base/memory.c | 18 +++++++++++++++---
2 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..1b96cb9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -439,17 +439,48 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
{
bool val;
int ret;
+ struct kernfs_node *kn;

ret = strtobool(buf, &val);
if (ret < 0)
return ret;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+ if (WARN_ON_ONCE(!kn))
+ return -ENODEV;
+
+ /*
+ * Break active protection here to avoid deadlocks with device
+ * removing process, which tries to remove sysfs entries including this
+ * "online" attribute while holding some hotplug related locks.
+ *
+ * @dev needs to be protected here, or it could go away any time after
+ * dropping active protection. But it is still unreasonable/unsafe to
+ * online/offline a device after it being removed. Fortunately, there
+ * are some checks in online/offline knobs. Like cpu, it checks cpu
+ * present/online mask before doing the real work.
+ */
+
+ get_device(dev);
+ kernfs_break_active_protection(kn);
+
+ lock_device_hotplug();
+
+ /*
+ * If we assume device_hotplug_lock must be acquired before removing
+ * device, we may try to find a way to check whether the device has
+ * been removed here, so we don't call device_{on|off}line against
+ * removed device.
+ */

ret = val ? device_online(dev) : device_offline(dev);
unlock_device_hotplug();
+
+ kernfs_unbreak_active_protection(kn);
+ put_device(dev);
+
+ kernfs_put(kn);
+
return ret < 0 ? ret : count;
}
static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..0d2f3a5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
{
struct memory_block *mem = to_memory_block(dev);
int ret, online_type;
+ struct kernfs_node *kn;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+ if (WARN_ON_ONCE(!kn))
+ return -ENODEV;
+
+ /* refer to comments in online_store() for more information */
+ get_device(dev);
+ kernfs_break_active_protection(kn);
+
+ lock_device_hotplug();

if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
online_type = ONLINE_KERNEL;
@@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
err:
unlock_device_hotplug();

+ kernfs_unbreak_active_protection(kn);
+ put_device(dev);
+
+ kernfs_put(kn);
+
if (ret)
return ret;
return count;

2014-04-17 15:12:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v3] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Thu, Apr 17, 2014 at 11:05:53AM +0800, Li Zhong wrote:
> It seems to me cpu_add_remove_lock is always taken after
> device_hotplug_lock.
>
> So if cpu_add_remove_lock has been acquired by device removing process,
> then it means the other online/offline process couldn't successfully try
> lock device_hotplug_lock, and will release s_active with a restart
> syscall error;
>
> if cpu_add_remove_lock has been acquired by online/offline process, then
> it should already hold device_hotlug_lock, and keeps the device removing
> process waiting at device_hotplug_lock. So online/offline process could
> release the lock, and finally release s_active soon.

I see. That's kinda nasty tho and lockdep of course doesn't know
about it and generates spurious warnings.

> But after some further thinking, I seem to understand your point.
> s_active has lock order problem with the other series of hotplug related
> locks, so it's better to take s_active out of the dependency chain,
> rather than the first of the other series of locks? like you suggested
> below.

Yeah, I think that'd be the right thing to do and we can revert the
previous convolution.

Thanks.

--
tejun

2014-04-17 15:17:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
> This patch tries to solve the device hot remove locking issues in a
> different way from commit 5e33bc41, as kernfs already has a mechanism
> to break active protection.
>
> The problem here is the order of s_active, and series of hotplug related
> lock.

It prolly deservse more detailed explanation of the deadlock along
with how 5e33bc41 ("$SUBJ") tried to solve it. The active protetion
is there to keep the file alive by blocking deletion while operations
are on-going in the file. This blocking creates a dependency loop
when an operation running off a sysfs knob ends up grabbing a lock
which may be held while removing the said sysfs knob.

> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> + if (WARN_ON_ONCE(!kn))
> + return -ENODEV;
> +
> + /*
> + * Break active protection here to avoid deadlocks with device
> + * removing process, which tries to remove sysfs entries including this
> + * "online" attribute while holding some hotplug related locks.
> + *
> + * @dev needs to be protected here, or it could go away any time after
> + * dropping active protection. But it is still unreasonable/unsafe to
> + * online/offline a device after it being removed. Fortunately, there

I think this is something driver layer proper should provide
synchronization for. It shouldn't be difficult to synchronize this
function against device_del(), right? And, please note that @dev is
guaranteed to have not been removed (at least hasn't gone through attr
removal) upto this point.

> + * are some checks in online/offline knobs. Like cpu, it checks cpu
> + * present/online mask before doing the real work.
> + */
> +
> + get_device(dev);
> + kernfs_break_active_protection(kn);
> +
> + lock_device_hotplug();
> +
> + /*
> + * If we assume device_hotplug_lock must be acquired before removing
> + * device, we may try to find a way to check whether the device has
> + * been removed here, so we don't call device_{on|off}line against
> + * removed device.
> + */

Yeah, let's please fix this.

> ret = val ? device_online(dev) : device_offline(dev);
> unlock_device_hotplug();
> +
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> +
> + kernfs_put(kn);
> +
> return ret < 0 ? ret : count;
> }
> static DEVICE_ATTR_RW(online);
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index bece691..0d2f3a5 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
> {
> struct memory_block *mem = to_memory_block(dev);
> int ret, online_type;
> + struct kernfs_node *kn;
>
> - ret = lock_device_hotplug_sysfs();
> - if (ret)
> - return ret;
> + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> + if (WARN_ON_ONCE(!kn))
> + return -ENODEV;
> +
> + /* refer to comments in online_store() for more information */
> + get_device(dev);
> + kernfs_break_active_protection(kn);
> +
> + lock_device_hotplug();
>
> if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
> online_type = ONLINE_KERNEL;
> @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
> err:
> unlock_device_hotplug();
>
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> +
> + kernfs_put(kn);

There are other users of lock_device_hotplug_sysfs(). We probably
want to audit them and convert them too, preferably with helper
routines so that they don't end up duplicating the complexity?

Thanks.

--
tejun

2014-04-18 08:34:24

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks

On Thu, 2014-04-17 at 11:17 -0400, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
> > This patch tries to solve the device hot remove locking issues in a
> > different way from commit 5e33bc41, as kernfs already has a mechanism
> > to break active protection.
> >
> > The problem here is the order of s_active, and series of hotplug related
> > lock.
>
> It prolly deservse more detailed explanation of the deadlock along
> with how 5e33bc41 ("$SUBJ") tried to solve it. The active protetion
> is there to keep the file alive by blocking deletion while operations
> are on-going in the file. This blocking creates a dependency loop
> when an operation running off a sysfs knob ends up grabbing a lock
> which may be held while removing the said sysfs knob.

OK, I'll try to add these and something more in next version.

>
> > + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > + if (WARN_ON_ONCE(!kn))
> > + return -ENODEV;
> > +
> > + /*
> > + * Break active protection here to avoid deadlocks with device
> > + * removing process, which tries to remove sysfs entries including this
> > + * "online" attribute while holding some hotplug related locks.
> > + *
> > + * @dev needs to be protected here, or it could go away any time after
> > + * dropping active protection. But it is still unreasonable/unsafe to
> > + * online/offline a device after it being removed. Fortunately, there
>
> I think this is something driver layer proper should provide
> synchronization for. It shouldn't be difficult to synchronize this
> function against device_del(), right? And, please note that @dev is
> guaranteed to have not been removed (at least hasn't gone through attr
> removal) upto this point.

Ok, I think what we need here is the check below, after getting
device_hotplug_lock, and abort this function if device already removed.
We should allow device_del() to remove the device in the other process,
which is why we are breaking the active protection.

>
> > + * are some checks in online/offline knobs. Like cpu, it checks cpu
> > + * present/online mask before doing the real work.
> > + */
> > +
> > + get_device(dev);
> > + kernfs_break_active_protection(kn);
> > +
> > + lock_device_hotplug();
> > +
> > + /*
> > + * If we assume device_hotplug_lock must be acquired before removing
> > + * device, we may try to find a way to check whether the device has
> > + * been removed here, so we don't call device_{on|off}line against
> > + * removed device.
> > + */
>
> Yeah, let's please fix this.

OK, I guess we can check whether dev->kobj.sd becomes NULL. If so, it
means the device has already been deleted by device_del().

>
> > ret = val ? device_online(dev) : device_offline(dev);
> > unlock_device_hotplug();
> > +
> > + kernfs_unbreak_active_protection(kn);
> > + put_device(dev);
> > +
> > + kernfs_put(kn);
> > +
> > return ret < 0 ? ret : count;
> > }
> > static DEVICE_ATTR_RW(online);
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index bece691..0d2f3a5 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
> > {
> > struct memory_block *mem = to_memory_block(dev);
> > int ret, online_type;
> > + struct kernfs_node *kn;
> >
> > - ret = lock_device_hotplug_sysfs();
> > - if (ret)
> > - return ret;
> > + kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > + if (WARN_ON_ONCE(!kn))
> > + return -ENODEV;
> > +
> > + /* refer to comments in online_store() for more information */
> > + get_device(dev);
> > + kernfs_break_active_protection(kn);
> > +
> > + lock_device_hotplug();
> >
> > if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
> > online_type = ONLINE_KERNEL;
> > @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
> > err:
> > unlock_device_hotplug();
> >
> > + kernfs_unbreak_active_protection(kn);
> > + put_device(dev);
> > +
> > + kernfs_put(kn);
>
> There are other users of lock_device_hotplug_sysfs(). We probably
> want to audit them and convert them too, preferably with helper
> routines so that they don't end up duplicating the complexity?

I see, I guess I could keep lock_device_hotplug_sysfs(), just replace it
with the implementation here; and provide a new
unlock_device_hotplug_sysfs(), which will do the unlock, unbreak, and
puts.

I'll try to get the code ready sometime next week for your review.

Thanks, Zhong

>
> Thanks.
>

2014-04-21 09:21:12

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

While auditing the usage of lock_device_hotplug_sysfs() for implementing
it in another way in following patch, it seems to me that the code here
is to add/remove device, and the files probe/release for cpu bus
themselves won't be removed.

So it seems to me there is no s_active related deadlock here, and we
could just use lock_device_hotplug().

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/cpu.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9483225 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -88,9 +88,7 @@ static ssize_t cpu_probe_store(struct device *dev,
ssize_t cnt;
int ret;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ lock_device_hotplug();

cnt = arch_cpu_probe(buf, count);

@@ -106,9 +104,7 @@ static ssize_t cpu_release_store(struct device *dev,
ssize_t cnt;
int ret;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ lock_device_hotplug();

cnt = arch_cpu_release(buf, count);


2014-04-21 09:24:04

by Li Zhong

[permalink] [raw]
Subject: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

This patch tries to solve the device hot remove locking issues in a
different way from commit 5e33bc41, as kernfs already has a mechanism
to break active protection.

The active protection is there to keep the file alive by blocking
deletion while operations are on-going in the file. This blocking
creates a dependency loop when an operation running off a sysfs knob
ends up grabbing a lock which may be held while removing the said sysfs
knob.

So the problem here is the order of s_active, and the series of hotplug
related locks.

commit 5e33bc41 solves it by taking out the first of the series of
hoplug related locks, device_hotplug_lock, with a try lock. And if that
try lock fails, it returns a restart syscall error, and drops s_active
temporarily to allow the other process to remove the sysfs knob.

This doesn't help with lockdep warnings reported against s_active and
other hotplug related locks in the series.

This patch instead tries to take s_active out of the lock dependency
graph using the active protection breaking mechanism.

lock_device_hotplug_sysfs() function name is kept here, two more
arguments are added, dev, attr, to help find the kernfs_node
corresponding to the sysfs knob (which should always be able to be
found, WARN if not). The found kernfs_node is recorded as the return
value, to be released in unlock_device_hotplug_sysfs().

As we break the active protection here, the device removing process
might remove the sysfs entries after that point. So after we grab the
device_hotplug_lock, we need check whether that really happens. If so,
we should abort and not invoke the online/offline callbacks anymore. In
this case, NULL is returned to the caller, so it could return with
ENODEV.

Signed-off-by: Li Zhong <[email protected]>
---
drivers/base/core.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------
drivers/base/memory.c | 9 ++++----
include/linux/device.h | 5 +++-
3 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0dd6528..4d37a2b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -61,14 +61,55 @@ void unlock_device_hotplug(void)
mutex_unlock(&device_hotplug_lock);
}

-int lock_device_hotplug_sysfs(void)
+struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
+ struct device_attribute *attr)
{
- if (mutex_trylock(&device_hotplug_lock))
- return 0;
+ struct kernfs_node *kn;
+
+ kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
+
+ if (WARN_ON_ONCE(!kn))
+ return NULL;
+
+ /*
+ * Break active protection here to avoid deadlocks with device
+ * removing process, which tries to remove sysfs entries including this
+ * "online" attribute while holding some hotplug related locks.
+ *
+ * @dev needs to be protected here, or it could go away any time after
+ * dropping active protection.
+ */
+
+ get_device(dev);
+ kernfs_break_active_protection(kn);
+
+ lock_device_hotplug();
+
+ /*
+ * We assume device_hotplug_lock must be acquired before removing
+ * device, we can check here whether the device has been removed, so
+ * we don't call device_{on|off}line against removed device.
+ */

- /* Avoid busy looping (5 ms of sleep should do). */
- msleep(5);
- return restart_syscall();
+ if (!dev->kobj.sd) {
+ /* device_del() already called on @dev, we should also abort */
+ unlock_device_hotplug();
+ kernfs_unbreak_active_protection(kn);
+ put_device(dev);
+ kernfs_put(kn);
+ return NULL;
+ }
+
+ return kn;
+}
+
+void unlock_device_hotplug_sysfs(struct device *dev,
+ struct kernfs_node *kn)
+{
+ unlock_device_hotplug();
+ kernfs_unbreak_active_protection(kn);
+ put_device(dev);
+ kernfs_put(kn);
}

#ifdef CONFIG_BLOCK
@@ -439,17 +480,19 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr,
{
bool val;
int ret;
+ struct kernfs_node *kn;

ret = strtobool(buf, &val);
if (ret < 0)
return ret;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ kn = lock_device_hotplug_sysfs(dev, attr);
+ if (!kn)
+ return -ENODEV;

ret = val ? device_online(dev) : device_offline(dev);
- unlock_device_hotplug();
+ unlock_device_hotplug_sysfs(dev, kn);
+
return ret < 0 ? ret : count;
}
static DEVICE_ATTR_RW(online);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..c2b66d4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -320,10 +320,11 @@ store_mem_state(struct device *dev,
{
struct memory_block *mem = to_memory_block(dev);
int ret, online_type;
+ struct kernfs_node *kn;

- ret = lock_device_hotplug_sysfs();
- if (ret)
- return ret;
+ kn = lock_device_hotplug_sysfs(dev, attr);
+ if (!kn)
+ return -ENODEV;

if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
online_type = ONLINE_KERNEL;
@@ -360,7 +361,7 @@ store_mem_state(struct device *dev,
}

err:
- unlock_device_hotplug();
+ unlock_device_hotplug_sysfs(dev, kn);

if (ret)
return ret;
diff --git a/include/linux/device.h b/include/linux/device.h
index 233bbbe..1ffd5ea 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -923,7 +923,10 @@ static inline bool device_supports_offline(struct device *dev)

extern void lock_device_hotplug(void);
extern void unlock_device_hotplug(void);
-extern int lock_device_hotplug_sysfs(void);
+extern struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
+ struct device_attribute *attr);
+extern void unlock_device_hotplug_sysfs(struct device *dev,
+ struct kernfs_node *kn);
extern int device_offline(struct device *dev);
extern int device_online(struct device *dev);
/*

2014-04-21 22:38:11

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

Hello,

On Mon, Apr 21, 2014 at 05:20:59PM +0800, Li Zhong wrote:
> While auditing the usage of lock_device_hotplug_sysfs() for implementing
> it in another way in following patch, it seems to me that the code here
> is to add/remove device, and the files probe/release for cpu bus
> themselves won't be removed.
>
> So it seems to me there is no s_active related deadlock here, and we
> could just use lock_device_hotplug().

It may still cause issue if offlining ends up removing sysfs files or
gets involved with the same lock used during cpu hot[un]plug
operations (e.g. offlining racing against cpu hotunplug) and offlining
a cpu does remove files. Has this change been tested?

Thanks.

--
tejun

2014-04-21 22:46:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:

Proper /** function comment would be nice.

> +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> + struct device_attribute *attr)

I can see why you did this but let's please not require the user of
this function to see how the thing is working internally. Let's
return int and keep track of (or look up again) the kernfs_node
internally.

> {
...
> + /*
> + * We assume device_hotplug_lock must be acquired before removing

Is this assumption true? If so, can we add lockdep assertions in
places to verify and enforce this? If not, aren't we just feeling
good when the reality is broken?

...

Function comment please.
> +void unlock_device_hotplug_sysfs(struct device *dev,
> + struct kernfs_node *kn)
> +{
> + unlock_device_hotplug();
> + kernfs_unbreak_active_protection(kn);
> + put_device(dev);
> + kernfs_put(kn);
> }

Thanks.

--
tejun

2014-04-22 02:29:49

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Mon, 2014-04-21 at 18:38 -0400, Tejun Heo wrote:
> Hello,
>
> On Mon, Apr 21, 2014 at 05:20:59PM +0800, Li Zhong wrote:
> > While auditing the usage of lock_device_hotplug_sysfs() for implementing
> > it in another way in following patch, it seems to me that the code here
> > is to add/remove device, and the files probe/release for cpu bus
> > themselves won't be removed.
> >
> > So it seems to me there is no s_active related deadlock here, and we
> > could just use lock_device_hotplug().
>
> It may still cause issue if offlining ends up removing sysfs files or
> gets involved with the same lock used during cpu hot[un]plug
> operations (e.g. offlining racing against cpu hotunplug) and offlining
> a cpu does remove files. Has this change been tested?

The probe/release files are attribute files for cpu subsys, looks like
following in sysfs dirs

$ cd /sys/devices/system/cpu/
$ ls -l
total 0
drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
......
drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
-rw-------. 1 root root 65536 Apr 21 00:28 dscr_default
-r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
-r--r--r--. 1 root root 65536 Apr 21 00:28 offline
-r--r--r--. 1 root root 65536 Sep 4 2014 online
-r--r--r--. 1 root root 65536 Apr 21 00:28 possible
drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
-r--r--r--. 1 root root 65536 Apr 17 20:46 present
--w-------. 1 root root 65536 Apr 21 00:28 probe <-----
--w-------. 1 root root 65536 Apr 21 00:28 release <-----
-rw-------. 1 root root 65536 Apr 21 00:28 subcores_per_core
-rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent

>From the code, it seems cpu subsys won't be unregistered, and it doesn't
make sense to remove all the cpus in the system.

Thanks, Zhong

>
> Thanks.
>

2014-04-22 03:34:53

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> Hello,
>
> On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
>
> Proper /** function comment would be nice.

Ok, will try to write some in next version.

>
> > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > + struct device_attribute *attr)
>
> I can see why you did this but let's please not require the user of
> this function to see how the thing is working internally. Let's
> return int and keep track of (or look up again) the kernfs_node
> internally.

Ok, it also makes the prototype of lock and unlock look more consistent
and comfortable.

>
> > {
> ...
> > + /*
> > + * We assume device_hotplug_lock must be acquired before removing
>
> Is this assumption true? If so, can we add lockdep assertions in
> places to verify and enforce this? If not, aren't we just feeling
> good when the reality is broken?

It seems not true ... I think there are devices that don't have the
online/offline concept, we just need to add it, remove it, like ethernet
cards.

Maybe we could change the comments above, like:
/* We assume device_hotplug_lock must be acquired before
* removing devices, which have online/offline sysfs knob,
* and some locks are needed to serialize the online/offline
* callbacks and device removing. ...
?

And we could add lockdep assertions in cpu and memory related code? e.g.
remove_memory(), unregister_cpu()

Currently, remove_memory() has comments for the function:

* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/

maybe it could be removed with the lockdep assertion.

> ...
>
> Function comment please.

OK.

Thanks, Zhong

> > +void unlock_device_hotplug_sysfs(struct device *dev,
> > + struct kernfs_node *kn)
> > +{
> > + unlock_device_hotplug();
> > + kernfs_unbreak_active_protection(kn);
> > + put_device(dev);
> > + kernfs_put(kn);
> > }
>
> Thanks.
>

2014-04-22 09:55:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> >
> > Proper /** function comment would be nice.
>
> Ok, will try to write some in next version.
>
> >
> > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > + struct device_attribute *attr)
> >
> > I can see why you did this but let's please not require the user of
> > this function to see how the thing is working internally. Let's
> > return int and keep track of (or look up again) the kernfs_node
> > internally.
>
> Ok, it also makes the prototype of lock and unlock look more consistent
> and comfortable.
>
> >
> > > {
> > ...
> > > + /*
> > > + * We assume device_hotplug_lock must be acquired before removing
> >
> > Is this assumption true? If so, can we add lockdep assertions in
> > places to verify and enforce this? If not, aren't we just feeling
> > good when the reality is broken?
>
> It seems not true ... I think there are devices that don't have the
> online/offline concept, we just need to add it, remove it, like ethernet
> cards.

Well, I haven't been following this closely (I was travelling, sorry), but
there certainly are devices without online/offline. That currently is only
present for CPUs, memory blocks and ACPI containers (if I remember correctly).

>
> Maybe we could change the comments above, like:
> /* We assume device_hotplug_lock must be acquired before
> * removing devices, which have online/offline sysfs knob,
> * and some locks are needed to serialize the online/offline
> * callbacks and device removing. ...
> ?

Lockdep assertions would be better than this in my opinion.

>
> And we could add lockdep assertions in cpu and memory related code? e.g.
> remove_memory(), unregister_cpu()
>
> Currently, remove_memory() has comments for the function:
>
> * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> * and online/offline operations before this call, as required by
> * try_offline_node().
> */
>
> maybe it could be removed with the lockdep assertion.

No, please don't remove it. It is there to explain where the locking requirement
comes from.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-04-22 20:40:15

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

Hello,

On Tue, Apr 22, 2014 at 10:29:37AM +0800, Li Zhong wrote:
> The probe/release files are attribute files for cpu subsys, looks like
> following in sysfs dirs
>
> $ cd /sys/devices/system/cpu/
> $ ls -l
> total 0
> drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
> drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
> drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
> ......
> drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
> drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
> -rw-------. 1 root root 65536 Apr 21 00:28 dscr_default
> -r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
> -r--r--r--. 1 root root 65536 Apr 21 00:28 offline
> -r--r--r--. 1 root root 65536 Sep 4 2014 online
> -r--r--r--. 1 root root 65536 Apr 21 00:28 possible
> drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
> -r--r--r--. 1 root root 65536 Apr 17 20:46 present
> --w-------. 1 root root 65536 Apr 21 00:28 probe <-----
> --w-------. 1 root root 65536 Apr 21 00:28 release <-----
> -rw-------. 1 root root 65536 Apr 21 00:28 subcores_per_core
> -rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent
>
> From the code, it seems cpu subsys won't be unregistered, and it doesn't
> make sense to remove all the cpus in the system.

I don't think I'm following you. Are you saying that no files which
are protected under the hotplug lock that cpu subsys uses are removed
during offlining?

Thanks.

--
tejun

2014-04-22 20:45:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

Hello,

On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > Is this assumption true? If so, can we add lockdep assertions in
> > places to verify and enforce this? If not, aren't we just feeling
> > good when the reality is broken?
>
> It seems not true ... I think there are devices that don't have the
> online/offline concept, we just need to add it, remove it, like ethernet
> cards.
>
> Maybe we could change the comments above, like:
> /* We assume device_hotplug_lock must be acquired before
> * removing devices, which have online/offline sysfs knob,
> * and some locks are needed to serialize the online/offline
> * callbacks and device removing. ...
> ?
>
> And we could add lockdep assertions in cpu and memory related code? e.g.
> remove_memory(), unregister_cpu()
>
> Currently, remove_memory() has comments for the function:
>
> * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> * and online/offline operations before this call, as required by
> * try_offline_node().
> */
>
> maybe it could be removed with the lockdep assertion.

I'm confused about the overall locking scheme. What's the role of
device_hotplug_lock? Is that solely to prevent the sysfs deadlock
issue? Or does it serve other synchronization purposes depending on
the specific subsystem? If the former, the lock no longer needs to
exist. The only thing necessary would be synchronization between
device_del() deleting the sysfs file and the unbreak helper invoking
device-specific callback. If the latter, we probably should change
that. Sharing hotplug lock across multiple subsystems through driver
core sounds like a pretty bad idea.

Thanks.

--
tejun

2014-04-22 22:21:42

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On 4/22/2014 10:44 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
>>> Is this assumption true? If so, can we add lockdep assertions in
>>> places to verify and enforce this? If not, aren't we just feeling
>>> good when the reality is broken?
>> It seems not true ... I think there are devices that don't have the
>> online/offline concept, we just need to add it, remove it, like ethernet
>> cards.
>>
>> Maybe we could change the comments above, like:
>> /* We assume device_hotplug_lock must be acquired before
>> * removing devices, which have online/offline sysfs knob,
>> * and some locks are needed to serialize the online/offline
>> * callbacks and device removing. ...
>> ?
>>
>> And we could add lockdep assertions in cpu and memory related code? e.g.
>> remove_memory(), unregister_cpu()
>>
>> Currently, remove_memory() has comments for the function:
>>
>> * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
>> * and online/offline operations before this call, as required by
>> * try_offline_node().
>> */
>>
>> maybe it could be removed with the lockdep assertion.
> I'm confused about the overall locking scheme. What's the role of
> device_hotplug_lock? Is that solely to prevent the sysfs deadlock
> issue? Or does it serve other synchronization purposes depending on
> the specific subsystem? If the former, the lock no longer needs to
> exist. The only thing necessary would be synchronization between
> device_del() deleting the sysfs file and the unbreak helper invoking
> device-specific callback. If the latter, we probably should change
> that. Sharing hotplug lock across multiple subsystems through driver
> core sounds like a pretty bad idea.

Can you please elaborate a bit?

It is there to protect hotplug operations involving multiple devices (in
different subsystems) from racing with each other. Why exactly is it bad?

Rafael

2014-04-23 01:50:45

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> > On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > >
> > > Proper /** function comment would be nice.
> >
> > Ok, will try to write some in next version.
> >
> > >
> > > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > > + struct device_attribute *attr)
> > >
> > > I can see why you did this but let's please not require the user of
> > > this function to see how the thing is working internally. Let's
> > > return int and keep track of (or look up again) the kernfs_node
> > > internally.
> >
> > Ok, it also makes the prototype of lock and unlock look more consistent
> > and comfortable.
> >
> > >
> > > > {
> > > ...
> > > > + /*
> > > > + * We assume device_hotplug_lock must be acquired before removing
> > >
> > > Is this assumption true? If so, can we add lockdep assertions in
> > > places to verify and enforce this? If not, aren't we just feeling
> > > good when the reality is broken?
> >
> > It seems not true ... I think there are devices that don't have the
> > online/offline concept, we just need to add it, remove it, like ethernet
> > cards.
>
> Well, I haven't been following this closely (I was travelling, sorry), but
> there certainly are devices without online/offline. That currently is only
> present for CPUs, memory blocks and ACPI containers (if I remember correctly).
>
> >
> > Maybe we could change the comments above, like:
> > /* We assume device_hotplug_lock must be acquired before
> > * removing devices, which have online/offline sysfs knob,
> > * and some locks are needed to serialize the online/offline
> > * callbacks and device removing. ...
> > ?
>
> Lockdep assertions would be better than this in my opinion.

This is talking about the lock required in the other process, the device
removing process, e.g. that in remove_memory() below. So I guess no
lockdep assertions needed here. Or I misunderstand your point?

> >
> > And we could add lockdep assertions in cpu and memory related code? e.g.
> > remove_memory(), unregister_cpu()
> >
> > Currently, remove_memory() has comments for the function:
> >
> > * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > * and online/offline operations before this call, as required by
> > * try_offline_node().
> > */
> >
> > maybe it could be removed with the lockdep assertion.
>
> No, please don't remove it. It is there to explain where the locking requirement
> comes from.

OK, I see. I think I'll just add lockdep assertions, and keep the
comments there.

Thanks, Zhong

>

2014-04-23 02:00:37

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Tue, 2014-04-22 at 16:40 -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 22, 2014 at 10:29:37AM +0800, Li Zhong wrote:
> > The probe/release files are attribute files for cpu subsys, looks like
> > following in sysfs dirs
> >
> > $ cd /sys/devices/system/cpu/
> > $ ls -l
> > total 0
> > drwxr-xr-x. 7 root root 0 Apr 17 19:00 cpu0
> > drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu1
> > drwxr-xr-x. 4 root root 0 Apr 17 19:00 cpu10
> > ......
> > drwxr-xr-x. 3 root root 0 Apr 20 08:00 cpufreq
> > drwxr-xr-x. 2 root root 0 Apr 20 08:00 cpuidle
> > -rw-------. 1 root root 65536 Apr 21 00:28 dscr_default
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 kernel_max
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 offline
> > -r--r--r--. 1 root root 65536 Sep 4 2014 online
> > -r--r--r--. 1 root root 65536 Apr 21 00:28 possible
> > drwxr-xr-x. 2 root root 0 Apr 20 08:00 power
> > -r--r--r--. 1 root root 65536 Apr 17 20:46 present
> > --w-------. 1 root root 65536 Apr 21 00:28 probe <-----
> > --w-------. 1 root root 65536 Apr 21 00:28 release <-----
> > -rw-------. 1 root root 65536 Apr 21 00:28 subcores_per_core
> > -rw-r--r--. 1 root root 65536 Apr 21 00:28 uevent
> >
> > From the code, it seems cpu subsys won't be unregistered, and it doesn't
> > make sense to remove all the cpus in the system.
>
> I don't think I'm following you. Are you saying that no files which
> are protected under the hotplug lock that cpu subsys uses are removed
> during offlining?

Sorry, Maybe I didn't say it clearly.

There are files under cpu#, e.g.

$ cd /sys/devices/system/cpu/cpu0
$ ls
cache dscr node1 pmc1 pmc5 smt_snooze_delay
uevent
cpuidle mmcr0 online pmc2 pmc6 spurr
crash_notes mmcr1 physical_id pmc3 power subsystem
crash_notes_size mmcra pir pmc4 purr topology

If you remove cpu0, then the cpu0 directory will be removed, together
with the "online" file in the directory, while some other process might
be writing 0 or 1 to it.

But here, for the probe/release, take "release" for example, if user
writes something that stands for cpu0 to it, the cpu0 will be removed,
and the cpu0 directory and the files under it will be removed. But
"release" itself is not removed.

They are attributes of cpu_subsys, not of some specific cpus.

Hopes the above makes things a bit clearer.

Thanks, Zhong

>
> Thanks.
>

2014-04-23 05:03:52

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > > Is this assumption true? If so, can we add lockdep assertions in
> > > places to verify and enforce this? If not, aren't we just feeling
> > > good when the reality is broken?
> >
> > It seems not true ... I think there are devices that don't have the
> > online/offline concept, we just need to add it, remove it, like ethernet
> > cards.
> >
> > Maybe we could change the comments above, like:
> > /* We assume device_hotplug_lock must be acquired before
> > * removing devices, which have online/offline sysfs knob,
> > * and some locks are needed to serialize the online/offline
> > * callbacks and device removing. ...
> > ?
> >
> > And we could add lockdep assertions in cpu and memory related code? e.g.
> > remove_memory(), unregister_cpu()
> >
> > Currently, remove_memory() has comments for the function:
> >
> > * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > * and online/offline operations before this call, as required by
> > * try_offline_node().
> > */
> >
> > maybe it could be removed with the lockdep assertion.
>
> I'm confused about the overall locking scheme. What's the role of
> device_hotplug_lock? Is that solely to prevent the sysfs deadlock
> issue? Or does it serve other synchronization purposes depending on
> the specific subsystem? If the former, the lock no longer needs to
> exist. The only thing necessary would be synchronization between
> device_del() deleting the sysfs file and the unbreak helper invoking
> device-specific callback. If the latter, we probably should change
> that. Sharing hotplug lock across multiple subsystems through driver
> core sounds like a pretty bad idea.

I think it's the latter. I think device_{on|off}line is better to be
done in some sort of lock which prevents the device from being removed,
including some preparation work that needs be done before device_del().

Thanks, Zhong

>
> Thanks.
>

2014-04-23 10:38:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Wednesday, April 23, 2014 09:50:32 AM Li Zhong wrote:
> On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> > > On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > > > Hello,
> > > >
> > > > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > > >
> > > > Proper /** function comment would be nice.
> > >
> > > Ok, will try to write some in next version.
> > >
> > > >
> > > > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > > > + struct device_attribute *attr)
> > > >
> > > > I can see why you did this but let's please not require the user of
> > > > this function to see how the thing is working internally. Let's
> > > > return int and keep track of (or look up again) the kernfs_node
> > > > internally.
> > >
> > > Ok, it also makes the prototype of lock and unlock look more consistent
> > > and comfortable.
> > >
> > > >
> > > > > {
> > > > ...
> > > > > + /*
> > > > > + * We assume device_hotplug_lock must be acquired before removing
> > > >
> > > > Is this assumption true? If so, can we add lockdep assertions in
> > > > places to verify and enforce this? If not, aren't we just feeling
> > > > good when the reality is broken?
> > >
> > > It seems not true ... I think there are devices that don't have the
> > > online/offline concept, we just need to add it, remove it, like ethernet
> > > cards.
> >
> > Well, I haven't been following this closely (I was travelling, sorry), but
> > there certainly are devices without online/offline. That currently is only
> > present for CPUs, memory blocks and ACPI containers (if I remember correctly).
> >
> > >
> > > Maybe we could change the comments above, like:
> > > /* We assume device_hotplug_lock must be acquired before
> > > * removing devices, which have online/offline sysfs knob,
> > > * and some locks are needed to serialize the online/offline
> > > * callbacks and device removing. ...
> > > ?
> >
> > Lockdep assertions would be better than this in my opinion.
>
> This is talking about the lock required in the other process, the device
> removing process, e.g. that in remove_memory() below. So I guess no
> lockdep assertions needed here. Or I misunderstand your point?

I mean if you assume certain lock to be held somewhere, it is better to use
lockdep annotations to express that assumption, because that will cause users
to *see* the problem when it happens.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-04-23 10:42:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Wednesday, April 23, 2014 01:03:42 PM Li Zhong wrote:
> On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > > > Is this assumption true? If so, can we add lockdep assertions in
> > > > places to verify and enforce this? If not, aren't we just feeling
> > > > good when the reality is broken?
> > >
> > > It seems not true ... I think there are devices that don't have the
> > > online/offline concept, we just need to add it, remove it, like ethernet
> > > cards.
> > >
> > > Maybe we could change the comments above, like:
> > > /* We assume device_hotplug_lock must be acquired before
> > > * removing devices, which have online/offline sysfs knob,
> > > * and some locks are needed to serialize the online/offline
> > > * callbacks and device removing. ...
> > > ?
> > >
> > > And we could add lockdep assertions in cpu and memory related code? e.g.
> > > remove_memory(), unregister_cpu()
> > >
> > > Currently, remove_memory() has comments for the function:
> > >
> > > * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > > * and online/offline operations before this call, as required by
> > > * try_offline_node().
> > > */
> > >
> > > maybe it could be removed with the lockdep assertion.
> >
> > I'm confused about the overall locking scheme. What's the role of
> > device_hotplug_lock? Is that solely to prevent the sysfs deadlock
> > issue? Or does it serve other synchronization purposes depending on
> > the specific subsystem? If the former, the lock no longer needs to
> > exist. The only thing necessary would be synchronization between
> > device_del() deleting the sysfs file and the unbreak helper invoking
> > device-specific callback. If the latter, we probably should change
> > that. Sharing hotplug lock across multiple subsystems through driver
> > core sounds like a pretty bad idea.
>
> I think it's the latter.

Actually, no, this is not the case if I understand you correctly.

> I think device_{on|off}line is better to be
> done in some sort of lock which prevents the device from being removed,
> including some preparation work that needs be done before device_del().

Quite frankly, you should be confident that you understand the code you're
trying to modify or please don't touch it.

I'll have a deeper look at this issue later today or tomorrow and will get
back to you then.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-04-23 14:23:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

Hello, Rafael.

On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> Can you please elaborate a bit?

Because it can get involved in larger locking dependency issues by
joining dependency graphs of two otherwise largely disjoint
subsystems. It has potential to create possible deadlocks which don't
need to exist.

> It is there to protect hotplug operations involving multiple devices
> (in different subsystems) from racing with each other. Why exactly
> is it bad?

But why would different subsystems, say cpu and memory, use the same
lock? Wouldn't those subsystems already have proper locking inside
their own subsystems? Why add this additional global lock across
multiple subsystems?

Thanks.

--
tejun

2014-04-23 14:39:47

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Wed, Apr 23, 2014 at 10:00:26AM +0800, Li Zhong wrote:
> If you remove cpu0, then the cpu0 directory will be removed, together
> with the "online" file in the directory, while some other process might
> be writing 0 or 1 to it.
>
> But here, for the probe/release, take "release" for example, if user
> writes something that stands for cpu0 to it, the cpu0 will be removed,
> and the cpu0 directory and the files under it will be removed. But
> "release" itself is not removed.
>
> They are attributes of cpu_subsys, not of some specific cpus.

OIC, so the file itself which triggers removal doesn't get removed.
Hmmm... but can't you still fall into deadlock? If on/offline takes
the same lock under active protection which is also taken while
removing the cpu files, it doesn't really matter whether the release
file itself is removed in the process or not. You can still have ABBA
deadlock. What am I missing here?

Thanks.

--
tejun

2014-04-23 16:12:39

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On 4/23/2014 4:23 PM, Tejun Heo wrote:
> Hello, Rafael.

Hi,

> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
>> Can you please elaborate a bit?
> Because it can get involved in larger locking dependency issues by
> joining dependency graphs of two otherwise largely disjoint
> subsystems. It has potential to create possible deadlocks which don't
> need to exist.

Well, I do my best not to add unnecessary locks if that's what you mean.

>> It is there to protect hotplug operations involving multiple devices
>> (in different subsystems) from racing with each other. Why exactly
>> is it bad?
> But why would different subsystems, say cpu and memory, use the same
> lock? Wouldn't those subsystems already have proper locking inside
> their own subsystems?

That locking is not sufficient.

> Why add this additional global lock across multiple subsystems?

That basically is because of how eject works when it is triggered via ACPI.

It is signaled for a device at the top of a subtree. It may be a
container of some sort and the eject involves everything below that
device in the ACPI namespace. That may involve multiple subsystem
(CPUs, memory, PCI host bridge, etc.).

We do that in two steps, offline (which can fail) and eject proper
(which can't fail and makes all of the involved devices go away). All
that has to be done in one go with respect to the sysfs-triggered
offline/online and that's why the lock is there.

Thanks,
Rafael

2014-04-23 16:52:58

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

Hello, Rafael.

On Wed, Apr 23, 2014 at 06:12:01PM +0200, Rafael J. Wysocki wrote:
> >Why add this additional global lock across multiple subsystems?
>
> That basically is because of how eject works when it is triggered via ACPI.
>
> It is signaled for a device at the top of a subtree. It may be a
> container of some sort and the eject involves everything below that
> device in the ACPI namespace. That may involve multiple subsystem
> (CPUs, memory, PCI host bridge, etc.).
>
> We do that in two steps, offline (which can fail) and eject proper
> (which can't fail and makes all of the involved devices go away).
> All that has to be done in one go with respect to the
> sysfs-triggered offline/online and that's why the lock is there.

Ah, okay, so it's actually synchronzing across multiple subsystems. I
think it definitely calls for more documentation. The requirement is
pretty unusual after all.

Thanks!

--
tejun

2014-04-24 01:13:14

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Wed, 2014-04-23 at 12:54 +0200, Rafael J. Wysocki wrote:
> On Wednesday, April 23, 2014 09:50:32 AM Li Zhong wrote:
> > On Tue, 2014-04-22 at 12:11 +0200, Rafael J. Wysocki wrote:
> > > On Tuesday, April 22, 2014 11:34:39 AM Li Zhong wrote:
> > > > On Mon, 2014-04-21 at 18:46 -0400, Tejun Heo wrote:
> > > > > Hello,
> > > > >
> > > > > On Mon, Apr 21, 2014 at 05:23:50PM +0800, Li Zhong wrote:
> > > > >
> > > > > Proper /** function comment would be nice.
> > > >
> > > > Ok, will try to write some in next version.
> > > >
> > > > >
> > > > > > +struct kernfs_node *lock_device_hotplug_sysfs(struct device *dev,
> > > > > > + struct device_attribute *attr)
> > > > >
> > > > > I can see why you did this but let's please not require the user of
> > > > > this function to see how the thing is working internally. Let's
> > > > > return int and keep track of (or look up again) the kernfs_node
> > > > > internally.
> > > >
> > > > Ok, it also makes the prototype of lock and unlock look more consistent
> > > > and comfortable.
> > > >
> > > > >
> > > > > > {
> > > > > ...
> > > > > > + /*
> > > > > > + * We assume device_hotplug_lock must be acquired before removing
> > > > >
> > > > > Is this assumption true? If so, can we add lockdep assertions in
> > > > > places to verify and enforce this? If not, aren't we just feeling
> > > > > good when the reality is broken?
> > > >
> > > > It seems not true ... I think there are devices that don't have the
> > > > online/offline concept, we just need to add it, remove it, like ethernet
> > > > cards.
> > >
> > > Well, I haven't been following this closely (I was travelling, sorry), but
> > > there certainly are devices without online/offline. That currently is only
> > > present for CPUs, memory blocks and ACPI containers (if I remember correctly).
> > >
> > > >
> > > > Maybe we could change the comments above, like:
> > > > /* We assume device_hotplug_lock must be acquired before
> > > > * removing devices, which have online/offline sysfs knob,
> > > > * and some locks are needed to serialize the online/offline
> > > > * callbacks and device removing. ...
> > > > ?
> > >
> > > Lockdep assertions would be better than this in my opinion.
> >
> > This is talking about the lock required in the other process, the device
> > removing process, e.g. that in remove_memory() below. So I guess no
> > lockdep assertions needed here. Or I misunderstand your point?
>
> I mean if you assume certain lock to be held somewhere, it is better to use
> lockdep annotations to express that assumption, because that will cause users
> to *see* the problem when it happens.

OK, I see, I think you were suggesting the same thing as Tejun, just I
misunderstood it.

Thanks, Zhong

>
> Thanks!
>

2014-04-24 01:33:29

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Wed, 2014-04-23 at 12:58 +0200, Rafael J. Wysocki wrote:
> On Wednesday, April 23, 2014 01:03:42 PM Li Zhong wrote:
> > On Tue, 2014-04-22 at 16:44 -0400, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Tue, Apr 22, 2014 at 11:34:39AM +0800, Li Zhong wrote:
> > > > > Is this assumption true? If so, can we add lockdep assertions in
> > > > > places to verify and enforce this? If not, aren't we just feeling
> > > > > good when the reality is broken?
> > > >
> > > > It seems not true ... I think there are devices that don't have the
> > > > online/offline concept, we just need to add it, remove it, like ethernet
> > > > cards.
> > > >
> > > > Maybe we could change the comments above, like:
> > > > /* We assume device_hotplug_lock must be acquired before
> > > > * removing devices, which have online/offline sysfs knob,
> > > > * and some locks are needed to serialize the online/offline
> > > > * callbacks and device removing. ...
> > > > ?
> > > >
> > > > And we could add lockdep assertions in cpu and memory related code? e.g.
> > > > remove_memory(), unregister_cpu()
> > > >
> > > > Currently, remove_memory() has comments for the function:
> > > >
> > > > * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> > > > * and online/offline operations before this call, as required by
> > > > * try_offline_node().
> > > > */
> > > >
> > > > maybe it could be removed with the lockdep assertion.
> > >
> > > I'm confused about the overall locking scheme. What's the role of
> > > device_hotplug_lock? Is that solely to prevent the sysfs deadlock
> > > issue? Or does it serve other synchronization purposes depending on
> > > the specific subsystem? If the former, the lock no longer needs to
> > > exist. The only thing necessary would be synchronization between
> > > device_del() deleting the sysfs file and the unbreak helper invoking
> > > device-specific callback. If the latter, we probably should change
> > > that. Sharing hotplug lock across multiple subsystems through driver
> > > core sounds like a pretty bad idea.
> >
> > I think it's the latter.
>
> Actually, no, this is not the case if I understand you correctly.

Oh, Sorry, I didn't read carefully. Yes, it's not specific subsystem.
After seeing your reply, I understand it is for protecting device hot
remove involving multiple subsystems.

>
> > I think device_{on|off}line is better to be
> > done in some sort of lock which prevents the device from being removed,
> > including some preparation work that needs be done before device_del().
>
> Quite frankly, you should be confident that you understand the code you're
> trying to modify or please don't touch it.
>
> I'll have a deeper look at this issue later today or tomorrow and will get
> back to you then.

Ok, thank you,
Zhong

>
> Thanks!
>

2014-04-24 08:37:37

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
> On Wed, Apr 23, 2014 at 10:00:26AM +0800, Li Zhong wrote:
> > If you remove cpu0, then the cpu0 directory will be removed, together
> > with the "online" file in the directory, while some other process might
> > be writing 0 or 1 to it.
> >
> > But here, for the probe/release, take "release" for example, if user
> > writes something that stands for cpu0 to it, the cpu0 will be removed,
> > and the cpu0 directory and the files under it will be removed. But
> > "release" itself is not removed.
> >
> > They are attributes of cpu_subsys, not of some specific cpus.
>
> OIC, so the file itself which triggers removal doesn't get removed.
> Hmmm... but can't you still fall into deadlock? If on/offline takes
> the same lock under active protection which is also taken while
> removing the cpu files, it doesn't really matter whether the release
> file itself is removed in the process or not. You can still have ABBA
> deadlock. What am I missing here?

After thinking it harder, I still couldn't see ABBA here ...

the active protection taken here is for "probe/release" which will not
be waited for removing something like "online" under cpu#? Or my
assumption that s_active for different files are different locks are
completely wrong? Or I missed something else?

Thanks, Zhong
>
> Thanks.
>

2014-04-24 09:00:11

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> > Hello, Rafael.
>
> Hi,
>
> > On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >> Can you please elaborate a bit?
> > Because it can get involved in larger locking dependency issues by
> > joining dependency graphs of two otherwise largely disjoint
> > subsystems. It has potential to create possible deadlocks which don't
> > need to exist.
>
> Well, I do my best not to add unnecessary locks if that's what you mean.
>
> >> It is there to protect hotplug operations involving multiple devices
> >> (in different subsystems) from racing with each other. Why exactly
> >> is it bad?
> > But why would different subsystems, say cpu and memory, use the same
> > lock? Wouldn't those subsystems already have proper locking inside
> > their own subsystems?
>
> That locking is not sufficient.
>
> > Why add this additional global lock across multiple subsystems?
>
> That basically is because of how eject works when it is triggered via ACPI.
>
> It is signaled for a device at the top of a subtree. It may be a
> container of some sort and the eject involves everything below that
> device in the ACPI namespace. That may involve multiple subsystem
> (CPUs, memory, PCI host bridge, etc.).
>
> We do that in two steps, offline (which can fail) and eject proper
> (which can't fail and makes all of the involved devices go away). All
> that has to be done in one go with respect to the sysfs-triggered
> offline/online and that's why the lock is there.

Thank you for the education. It's more clear to me now why we need this
lock.

I still have some small questions about when this lock is needed:

I could understand sysfs-triggered online is not acceptable when
removing devices in multiple subsystems. But maybe concurrent offline
and remove(with proper per subsystem locks) seems not harmful?

And if we just want to remove some devices in a specific subsystem, e.g.
like writing /cpu/release, if it just wants to offline and remove some
cpus, then maybe we don't require the device_hotplug_lock to be taken?

Thanks, Zhong
>
> Thanks,
> Rafael
>

2014-04-24 10:02:29

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On 4/24/2014 10:59 AM, Li Zhong wrote:
> On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
>> On 4/23/2014 4:23 PM, Tejun Heo wrote:
>>> Hello, Rafael.
>> Hi,
>>
>>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
>>>> Can you please elaborate a bit?
>>> Because it can get involved in larger locking dependency issues by
>>> joining dependency graphs of two otherwise largely disjoint
>>> subsystems. It has potential to create possible deadlocks which don't
>>> need to exist.
>> Well, I do my best not to add unnecessary locks if that's what you mean.
>>
>>>> It is there to protect hotplug operations involving multiple devices
>>>> (in different subsystems) from racing with each other. Why exactly
>>>> is it bad?
>>> But why would different subsystems, say cpu and memory, use the same
>>> lock? Wouldn't those subsystems already have proper locking inside
>>> their own subsystems?
>> That locking is not sufficient.
>>
>>> Why add this additional global lock across multiple subsystems?
>> That basically is because of how eject works when it is triggered via ACPI.
>>
>> It is signaled for a device at the top of a subtree. It may be a
>> container of some sort and the eject involves everything below that
>> device in the ACPI namespace. That may involve multiple subsystem
>> (CPUs, memory, PCI host bridge, etc.).
>>
>> We do that in two steps, offline (which can fail) and eject proper
>> (which can't fail and makes all of the involved devices go away). All
>> that has to be done in one go with respect to the sysfs-triggered
>> offline/online and that's why the lock is there.
> Thank you for the education. It's more clear to me now why we need this
> lock.
>
> I still have some small questions about when this lock is needed:
>
> I could understand sysfs-triggered online is not acceptable when
> removing devices in multiple subsystems. But maybe concurrent offline
> and remove(with proper per subsystem locks) seems not harmful?
>
> And if we just want to remove some devices in a specific subsystem, e.g.
> like writing /cpu/release, if it just wants to offline and remove some
> cpus, then maybe we don't require the device_hotplug_lock to be taken?

No and no.

If the offline phase fails for any device in the subtree, we roll back
the operation
and online devices that have already been offlined by it. Also the ACPI
hot-addition
needs to acquire device_hotplug_lock so that it doesn't race with ejects
and so
that lock needs to be taken by sysfs-triggered offline too.

Thanks,
Rafael

2014-04-24 14:32:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

Hello,

On Thu, Apr 24, 2014 at 04:37:23PM +0800, Li Zhong wrote:
> On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
> After thinking it harder, I still couldn't see ABBA here ...
>
> the active protection taken here is for "probe/release" which will not
> be waited for removing something like "online" under cpu#? Or my
> assumption that s_active for different files are different locks are
> completely wrong? Or I missed something else?

I'm probably confused about the locking. I was thinking a scenario
like the following.

A. CPU on/offline

grabs s_active protection of online node
grabs cpu subsys mutex
perform on/offline
releases cpu subsys mutex
releases s_active protection of online node

B. CPU release

grabs s_active protection of release node
grabs cpu subsys mutex
performs removal of the CPU
removes the online node
releases cpu subsys mutex
releases s_active protection of release node

A nests cpu subsys mutex under s_active of the online node. B nests
s_active of the online node under the cpu subsys mutex. What am I
missing?

Thanks.

--
tejun

2014-04-25 01:47:06

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
> On 4/24/2014 10:59 AM, Li Zhong wrote:
> > On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> >> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> >>> Hello, Rafael.
> >> Hi,
> >>
> >>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >>>> Can you please elaborate a bit?
> >>> Because it can get involved in larger locking dependency issues by
> >>> joining dependency graphs of two otherwise largely disjoint
> >>> subsystems. It has potential to create possible deadlocks which don't
> >>> need to exist.
> >> Well, I do my best not to add unnecessary locks if that's what you mean.
> >>
> >>>> It is there to protect hotplug operations involving multiple devices
> >>>> (in different subsystems) from racing with each other. Why exactly
> >>>> is it bad?
> >>> But why would different subsystems, say cpu and memory, use the same
> >>> lock? Wouldn't those subsystems already have proper locking inside
> >>> their own subsystems?
> >> That locking is not sufficient.
> >>
> >>> Why add this additional global lock across multiple subsystems?
> >> That basically is because of how eject works when it is triggered via ACPI.
> >>
> >> It is signaled for a device at the top of a subtree. It may be a
> >> container of some sort and the eject involves everything below that
> >> device in the ACPI namespace. That may involve multiple subsystem
> >> (CPUs, memory, PCI host bridge, etc.).
> >>
> >> We do that in two steps, offline (which can fail) and eject proper
> >> (which can't fail and makes all of the involved devices go away). All
> >> that has to be done in one go with respect to the sysfs-triggered
> >> offline/online and that's why the lock is there.
> > Thank you for the education. It's more clear to me now why we need this
> > lock.
> >
> > I still have some small questions about when this lock is needed:
> >
> > I could understand sysfs-triggered online is not acceptable when
> > removing devices in multiple subsystems. But maybe concurrent offline
> > and remove(with proper per subsystem locks) seems not harmful?
> >
> > And if we just want to remove some devices in a specific subsystem, e.g.
> > like writing /cpu/release, if it just wants to offline and remove some
> > cpus, then maybe we don't require the device_hotplug_lock to be taken?
>
> No and no.
>
> If the offline phase fails for any device in the subtree, we roll back
> the operation
> and online devices that have already been offlined by it. Also the ACPI
> hot-addition
> needs to acquire device_hotplug_lock so that it doesn't race with ejects
> and so
> that lock needs to be taken by sysfs-triggered offline too.

I can understand that hot-addition needs the device_hotplug lock, but
still not very clear about the offline.

I guess your are describing following scenario:

user A: (trying remove cpu 1 and memory 1-10)

lock_device_hotplug
offline cpu with cpu locks -- successful
offline memories with memory locks -- failed, e.g. for memory8
online cpu and memory with their locks
unlock_device_hotplug

user B: (trying offline cpu 1)

offline cpu with cpu locks

But I don't see any problem for user B not taking the device_hotplug
lock. The result may be different for user B to take or not take the
lock. But I think it could be seen as concurrent online/offline for cpu1
under cpu hotplug locks, which just depends on which is executed last?

Or did I miss something here?

Thanks, Zhong

> Thanks,
> Rafael
>

2014-04-25 01:56:20

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Thu, 2014-04-24 at 10:32 -0400, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 24, 2014 at 04:37:23PM +0800, Li Zhong wrote:
> > On Wed, 2014-04-23 at 10:39 -0400, Tejun Heo wrote:
> > After thinking it harder, I still couldn't see ABBA here ...
> >
> > the active protection taken here is for "probe/release" which will not
> > be waited for removing something like "online" under cpu#? Or my
> > assumption that s_active for different files are different locks are
> > completely wrong? Or I missed something else?
>
> I'm probably confused about the locking. I was thinking a scenario
> like the following.
>
> A. CPU on/offline
>
> grabs s_active protection of online node
> grabs cpu subsys mutex
> perform on/offline
> releases cpu subsys mutex
> releases s_active protection of online node

so the chain here is s_active(online) -> cpu_subsys_mutex

>
> B. CPU release
>
> grabs s_active protection of release node
> grabs cpu subsys mutex
> performs removal of the CPU
> removes the online node
> releases cpu subsys mutex
> releases s_active protection of release node

and the chain here is s_active(release) -> cpu_subsys_mutex ->
s_active(online)

>
> A nests cpu subsys mutex under s_active of the online node. B nests
> s_active of the online node under the cpu subsys mutex. What am I
> missing?

>From the above two chain, I think the problem is cpu_subsys_mutex and
s_active(online), which is the deadlock we are trying to solve in patch
#2. I seems to me s_active(release) here doesn't have lock issues?

Thanks, Zhong

>
> Thanks.
>

2014-04-25 12:28:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Fri, Apr 25, 2014 at 09:56:10AM +0800, Li Zhong wrote:
> > A nests cpu subsys mutex under s_active of the online node. B nests
> > s_active of the online node under the cpu subsys mutex. What am I
> > missing?
>
> From the above two chain, I think the problem is cpu_subsys_mutex and
> s_active(online), which is the deadlock we are trying to solve in patch

Yeap, that one was what I was thinking about.

> #2. I seems to me s_active(release) here doesn't have lock issues?

And your patch doesn't change the situation for online. Alright,
thanks a lot for the explanation.

--
tejun

2014-04-25 12:47:17

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On 4/25/2014 3:46 AM, Li Zhong wrote:
> On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
>> On 4/24/2014 10:59 AM, Li Zhong wrote:
>>> On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
>>>> On 4/23/2014 4:23 PM, Tejun Heo wrote:
>>>>> Hello, Rafael.
>>>> Hi,
>>>>
>>>>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
>>>>>> Can you please elaborate a bit?
>>>>> Because it can get involved in larger locking dependency issues by
>>>>> joining dependency graphs of two otherwise largely disjoint
>>>>> subsystems. It has potential to create possible deadlocks which don't
>>>>> need to exist.
>>>> Well, I do my best not to add unnecessary locks if that's what you mean.
>>>>
>>>>>> It is there to protect hotplug operations involving multiple devices
>>>>>> (in different subsystems) from racing with each other. Why exactly
>>>>>> is it bad?
>>>>> But why would different subsystems, say cpu and memory, use the same
>>>>> lock? Wouldn't those subsystems already have proper locking inside
>>>>> their own subsystems?
>>>> That locking is not sufficient.
>>>>
>>>>> Why add this additional global lock across multiple subsystems?
>>>> That basically is because of how eject works when it is triggered via ACPI.
>>>>
>>>> It is signaled for a device at the top of a subtree. It may be a
>>>> container of some sort and the eject involves everything below that
>>>> device in the ACPI namespace. That may involve multiple subsystem
>>>> (CPUs, memory, PCI host bridge, etc.).
>>>>
>>>> We do that in two steps, offline (which can fail) and eject proper
>>>> (which can't fail and makes all of the involved devices go away). All
>>>> that has to be done in one go with respect to the sysfs-triggered
>>>> offline/online and that's why the lock is there.
>>> Thank you for the education. It's more clear to me now why we need this
>>> lock.
>>>
>>> I still have some small questions about when this lock is needed:
>>>
>>> I could understand sysfs-triggered online is not acceptable when
>>> removing devices in multiple subsystems. But maybe concurrent offline
>>> and remove(with proper per subsystem locks) seems not harmful?
>>>
>>> And if we just want to remove some devices in a specific subsystem, e.g.
>>> like writing /cpu/release, if it just wants to offline and remove some
>>> cpus, then maybe we don't require the device_hotplug_lock to be taken?
>> No and no.
>>
>> If the offline phase fails for any device in the subtree, we roll back
>> the operation
>> and online devices that have already been offlined by it. Also the ACPI
>> hot-addition
>> needs to acquire device_hotplug_lock so that it doesn't race with ejects
>> and so
>> that lock needs to be taken by sysfs-triggered offline too.
> I can understand that hot-addition needs the device_hotplug lock, but
> still not very clear about the offline.
>
> I guess your are describing following scenario:
>
> user A: (trying remove cpu 1 and memory 1-10)
>
> lock_device_hotplug
> offline cpu with cpu locks -- successful
> offline memories with memory locks -- failed, e.g. for memory8
> online cpu and memory with their locks
> unlock_device_hotplug

What about if all is successful and CPU1 is gone before
device_hotplug_lock is released?

> user B: (trying offline cpu 1)
>
> offline cpu with cpu locks
>
> But I don't see any problem for user B not taking the device_hotplug
> lock. The result may be different for user B to take or not take the
> lock. But I think it could be seen as concurrent online/offline for cpu1
> under cpu hotplug locks, which just depends on which is executed last?
>
> Or did I miss something here?

Yes, you could do offline in parallel with user A without taking
device_hotplug_lock, but the result may be surprising to user B then.

With device _hotplug_lock user B will always see CPU1 off line (or gone)
after his offline in this scenario, while without taking the lock user B
may sometimes see CPU1 on line after his offline. I don't think that's
a good thing.

Rafael

2014-04-28 00:51:48

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/2] Use lock_device_hotplug() in cpu_probe_store() and cpu_release_store()

On Fri, 2014-04-25 at 08:28 -0400, Tejun Heo wrote:
> On Fri, Apr 25, 2014 at 09:56:10AM +0800, Li Zhong wrote:
> > > A nests cpu subsys mutex under s_active of the online node. B nests
> > > s_active of the online node under the cpu subsys mutex. What am I
> > > missing?
> >
> > From the above two chain, I think the problem is cpu_subsys_mutex and
> > s_active(online), which is the deadlock we are trying to solve in patch
>
> Yeap, that one was what I was thinking about.
>
> > #2. I seems to me s_active(release) here doesn't have lock issues?
>
> And your patch doesn't change the situation for online.

I'm a little confused here...

To clarify for this patch(#1):

Currently(before patch#2), the s_active(online) and cpu subsys mutex
deadlock is solved by lock_device_hotplug_sysfs(). (lockdep warnings are
still there, so we want to change the implementation of
lock_device_hotplug_sysfs() in patch #2 to take out of s_active_online,
instead of device_hotplug_lock, which is grabbed before cpu subsys
mutex).

But here, for the probe/release, I think we really don't need to use
this _sysfs() version.

So it is replaced with lock_device_hotplug() directly to avoid two
unnecessary conversions of lock_device_hotplug_sysfs(), that otherwise
need to be done in patch #2.

Thanks, Zhong

> Alright,
> thanks a lot for the explanation.
>

2014-04-28 01:49:33

by Li Zhong

[permalink] [raw]
Subject: Re: [RFC PATCH v5 2/2] Use kernfs_break_active_protection() for device online store callbacks

On Fri, 2014-04-25 at 14:47 +0200, Rafael J. Wysocki wrote:
> On 4/25/2014 3:46 AM, Li Zhong wrote:
> > On Thu, 2014-04-24 at 12:02 +0200, Rafael J. Wysocki wrote:
> >> On 4/24/2014 10:59 AM, Li Zhong wrote:
> >>> On Wed, 2014-04-23 at 18:12 +0200, Rafael J. Wysocki wrote:
> >>>> On 4/23/2014 4:23 PM, Tejun Heo wrote:
> >>>>> Hello, Rafael.
> >>>> Hi,
> >>>>
> >>>>> On Wed, Apr 23, 2014 at 12:21:33AM +0200, Rafael J. Wysocki wrote:
> >>>>>> Can you please elaborate a bit?
> >>>>> Because it can get involved in larger locking dependency issues by
> >>>>> joining dependency graphs of two otherwise largely disjoint
> >>>>> subsystems. It has potential to create possible deadlocks which don't
> >>>>> need to exist.
> >>>> Well, I do my best not to add unnecessary locks if that's what you mean.
> >>>>
> >>>>>> It is there to protect hotplug operations involving multiple devices
> >>>>>> (in different subsystems) from racing with each other. Why exactly
> >>>>>> is it bad?
> >>>>> But why would different subsystems, say cpu and memory, use the same
> >>>>> lock? Wouldn't those subsystems already have proper locking inside
> >>>>> their own subsystems?
> >>>> That locking is not sufficient.
> >>>>
> >>>>> Why add this additional global lock across multiple subsystems?
> >>>> That basically is because of how eject works when it is triggered via ACPI.
> >>>>
> >>>> It is signaled for a device at the top of a subtree. It may be a
> >>>> container of some sort and the eject involves everything below that
> >>>> device in the ACPI namespace. That may involve multiple subsystem
> >>>> (CPUs, memory, PCI host bridge, etc.).
> >>>>
> >>>> We do that in two steps, offline (which can fail) and eject proper
> >>>> (which can't fail and makes all of the involved devices go away). All
> >>>> that has to be done in one go with respect to the sysfs-triggered
> >>>> offline/online and that's why the lock is there.
> >>> Thank you for the education. It's more clear to me now why we need this
> >>> lock.
> >>>
> >>> I still have some small questions about when this lock is needed:
> >>>
> >>> I could understand sysfs-triggered online is not acceptable when
> >>> removing devices in multiple subsystems. But maybe concurrent offline
> >>> and remove(with proper per subsystem locks) seems not harmful?
> >>>
> >>> And if we just want to remove some devices in a specific subsystem, e.g.
> >>> like writing /cpu/release, if it just wants to offline and remove some
> >>> cpus, then maybe we don't require the device_hotplug_lock to be taken?
> >> No and no.
> >>
> >> If the offline phase fails for any device in the subtree, we roll back
> >> the operation
> >> and online devices that have already been offlined by it. Also the ACPI
> >> hot-addition
> >> needs to acquire device_hotplug_lock so that it doesn't race with ejects
> >> and so
> >> that lock needs to be taken by sysfs-triggered offline too.
> > I can understand that hot-addition needs the device_hotplug lock, but
> > still not very clear about the offline.
> >
> > I guess your are describing following scenario:
> >
> > user A: (trying remove cpu 1 and memory 1-10)
> >
> > lock_device_hotplug
> > offline cpu with cpu locks -- successful
> > offline memories with memory locks -- failed, e.g. for memory8
> > online cpu and memory with their locks
> > unlock_device_hotplug
>
> What about if all is successful and CPU1 is gone before
> device_hotplug_lock is released?

You mean user B will try to offline an already removed cpu1? But I think
the cpu subsys locks should be able to handle such situation?

>
> > user B: (trying offline cpu 1)
> >
> > offline cpu with cpu locks
> >
> > But I don't see any problem for user B not taking the device_hotplug
> > lock. The result may be different for user B to take or not take the
> > lock. But I think it could be seen as concurrent online/offline for cpu1
> > under cpu hotplug locks, which just depends on which is executed last?
> >
> > Or did I miss something here?
>
> Yes, you could do offline in parallel with user A without taking
> device_hotplug_lock, but the result may be surprising to user B then.
>
> With device _hotplug_lock user B will always see CPU1 off line (or gone)
> after his offline in this scenario, while without taking the lock user B
> may sometimes see CPU1 on line after his offline. I don't think that's
> a good thing.

That seems complicated after some more thinking.

I think I missed something when describing the steps for A. I think the
initial online/offline state needs to be recorded by offline operations
in A, so the rollback could be done based on the initial state.

If adding the above, then
1) B offline cpu 1 before A offline cpu 1
A could see the initial state of cpu1 as offline, and the rollback would
not put cpu1 online again.

In the code, I think the check is done at
if (!cpu_online(cpu))
return -EINVAL;
So the pn->put_online is kept as false.

So the result is cpu1 offline.

2) B offline cpu 1 after A offline cpu1
then the rollback would online cpu1

2.1) B offline cpu1 after A rollback
The result is cpu1 offline, good.

2.2) B offline cpu1 before A rollback

B would see a -EINVAL error, and the result is cpu1 online.

I guess this is the case you mentioned.

I agree it is not a good thing, though B still gets some sort of errors
while do the offlining.

I think now I get some better understandings of the lock, will try to
give an updated version of the patches some time later.

Thanks, Zhong

>
> Rafael
>