2018-08-17 08:00:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock(),
and the approach to fix it fixed one inversion, but dropped the
mem_hotplug_lock on another instance (.online).

As far as I understand from the code and from b93e0f329e24 ("mm,
memory_hotplug: get rid of zonelists_mutex"), mem_hotplug_lock is required
because we assume that
"both memory online and offline are fully serialized."
and this is not the case if we only hold the device_lock().

I propose the general rules:

1. add_memory/add_memory_resource() must only be called with
device_hotplug_lock. For now only done in ACPI code.
2. remove_memory() must only be called with device_hotplug_lock. This is
already documented and true in ACPI code.
3. device_online()/device_offline() must only be called with
device_hotplug_lock. This is already documented and true for now in core
code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
online_pages/offline_pages. For now this is only true for the first two
instances.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural. Second patch could
maybe split up.

But let's first hear if this is actually a problem and if there migh be
alternatives (or cleanups). Only tested with DIMM-based hotplug.

David Hildenbrand (1):
mm/memory_hotplug: fix online/offline_pages called w.o.
mem_hotplug_lock

Vitaly Kuznetsov (1):
drivers/base: export lock_device_hotplug/unlock_device_hotplug

arch/powerpc/platforms/powernv/memtrace.c | 3 ++
drivers/acpi/acpi_memhotplug.c | 1 +
drivers/base/core.c | 2 ++
drivers/base/memory.c | 18 +++++-----
drivers/hv/hv_balloon.c | 4 +++
drivers/s390/char/sclp_cmd.c | 3 ++
drivers/xen/balloon.c | 3 ++
mm/memory_hotplug.c | 42 ++++++++++++++++++-----
8 files changed, 57 insertions(+), 19 deletions(-)

--
2.17.1



2018-08-17 08:00:33

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

From: Vitaly Kuznetsov <[email protected]>

Well require to call add_memory()/add_memory_resource() with
device_hotplug_lock held, to avoid a lock inversion. Allow external modules
(e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
lock device hotplug.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
[modify patch description]
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/base/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 04bbcd779e11..9010b9e942b5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -700,11 +700,13 @@ void lock_device_hotplug(void)
{
mutex_lock(&device_hotplug_lock);
}
+EXPORT_SYMBOL_GPL(lock_device_hotplug);

void unlock_device_hotplug(void)
{
mutex_unlock(&device_hotplug_lock);
}
+EXPORT_SYMBOL_GPL(unlock_device_hotplug);

int lock_device_hotplug_sysfs(void)
{
--
2.17.1


2018-08-17 08:00:39

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order. Looking at 1., this order is not always
satisfied when calling device_online() - essentially we simply don't take
one of both locks anymore - and fixing this would require us to
take the mem_hotplug_lock in core driver code (online_store()), which
sounds wrong.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
mem_hotplug_begin() and the calls device_online() - which takes
device_lock() - .online does no longer call mem_hotplug_begin(), so
effectively calls online_pages() without mem_hotplug_lock. onlining/
offlining of pages is no longer serialised across different devices.

2. device_online() should be called under device_hotplug_lock, however
onlining memory during add_memory() does not take care of that. (I
didn't follow how strictly this is needed, but there seems to be a
reason because it is documented at device_online() and
device_offline()).

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
(and device_online()) without locks. This was introduced after
30467e0b3be. And skimming over the code, I assume it could need some
more care in regards to locking.

ACPI code already holds the device_hotplug_lock, and as we are
effectively hotplugging memory block devices, requiring to hold that
lock does not sound too wrong, although not chosen in [1], as
"I don't think resolving a locking dependency is appropriate by
just serializing them with another lock."
I think this is the cleanest solution.

Requiring add_memory()/add_memory_resource() to be called under
device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
online_pages/offline_pages() fixes 1. and 3.

Fixup all callers of add_memory/add_memory_resource to hold the lock if
not already done.

So this is essentially a revert of 30467e0b3be, implementation of what
was suggested in [1] by Vitaly, applied to the current tree.

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 3 ++
drivers/acpi/acpi_memhotplug.c | 1 +
drivers/base/memory.c | 18 +++++-----
drivers/hv/hv_balloon.c | 4 +++
drivers/s390/char/sclp_cmd.c | 3 ++
drivers/xen/balloon.c | 3 ++
mm/memory_hotplug.c | 42 ++++++++++++++++++-----
7 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..4c2737a33020 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -206,6 +206,8 @@ static int memtrace_online(void)
int i, ret = 0;
struct memtrace_entry *ent;

+ /* add_memory() requires device_hotplug_lock */
+ lock_device_hotplug();
for (i = memtrace_array_nr - 1; i >= 0; i--) {
ent = &memtrace_array[i];

@@ -244,6 +246,7 @@ static int memtrace_online(void)
pr_info("Added trace memory back to node %d\n", ent->nid);
ent->size = ent->start = ent->nid = -1;
}
+ unlock_device_hotplug();
if (ret)
return ret;

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..e7a4c7900967 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);

+ /* we already hold the device_hotplug lock at this point */
result = add_memory(node, info->start_addr, info->length);

/*
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..f60507f994df 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -341,19 +341,11 @@ store_mem_state(struct device *dev,
goto err;
}

- /*
- * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
- * the correct memory block to online before doing device_online(dev),
- * which will take dev->mutex. Take the lock early to prevent an
- * inversion, memory_subsys_online() callbacks will be implemented by
- * assuming it's already protected.
- */
- mem_hotplug_begin();
-
switch (online_type) {
case MMOP_ONLINE_KERNEL:
case MMOP_ONLINE_MOVABLE:
case MMOP_ONLINE_KEEP:
+ /* mem->online_type is protected by device_hotplug_lock */
mem->online_type = online_type;
ret = device_online(&mem->dev);
break;
@@ -364,7 +356,6 @@ store_mem_state(struct device *dev,
ret = -EINVAL; /* should never happen */
}

- mem_hotplug_done();
err:
unlock_device_hotplug();

@@ -522,6 +513,12 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
return -EINVAL;

nid = memory_add_physaddr_to_nid(phys_addr);
+
+ /* add_memory() requires the device_hotplug_lock */
+ ret = lock_device_hotplug_sysfs();
+ if (ret)
+ return ret;
+
ret = add_memory(nid, phys_addr,
MIN_MEMORY_BLOCK_SIZE * sections_per_block);

@@ -530,6 +527,7 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,

ret = count;
out:
+ unlock_device_hotplug();
return ret;
}

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b1b788082793..c6d0b654f109 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -735,8 +735,12 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
dm_device.ha_waiting = !memhp_auto_online;

nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
+
+ /* add_memory() requires the device_hotplug lock */
+ lock_device_hotplug();
ret = add_memory(nid, PFN_PHYS((start_pfn)),
(HA_CHUNK << PAGE_SHIFT));
+ unlock_device_hotplug();

if (ret) {
pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index d7686a68c093..cd0cdab8fcfb 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -405,8 +405,11 @@ static void __init add_memory_merged(u16 rn)
align_to_block_size(&start, &size, block_size);
if (!size)
goto skip_add;
+ /* add_memory() requires the device_hotplug lock */
+ lock_device_hotplug();
for (addr = start; addr < start + size; addr += block_size)
add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+ unlock_device_hotplug();
skip_add:
first_rn = rn;
num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 559e77a20a4d..df1ced4c0692 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,8 +395,11 @@ static enum bp_state reserve_additional_memory(void)
* callers drop the mutex before trying again.
*/
mutex_unlock(&balloon_mutex);
+ /* add_memory_resource() requires the device_hotplug lock */
+ lock_device_hotplug();
rc = add_memory_resource(nid, resource->start, resource_size(resource),
memhp_auto_online);
+ unlock_device_hotplug();
mutex_lock(&balloon_mutex);

if (rc) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a2726920ed2..492e558f791b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -885,7 +885,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
return zone;
}

-/* Must be protected by mem_hotplug_begin() or a device_lock */
int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
{
unsigned long flags;
@@ -897,6 +896,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
struct memory_notify arg;
struct memory_block *mem;

+ mem_hotplug_begin();
+
/*
* We can't use pfn_to_nid() because nid might be stored in struct page
* which is not yet initialized. Instead, we find nid from memory block.
@@ -961,6 +962,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ

if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
+ mem_hotplug_done();
return 0;

failed_addition:
@@ -968,6 +970,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
+ mem_hotplug_done();
return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1115,7 +1118,18 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}

-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * Requires device_hotplug_lock:
+ *
+ * add_memory_resource() will first take the mem_hotplug_lock and then
+ * device_lock() the created memory block device (via bus_probe_device()).
+ * However, device_online() calls device_lock() and then takes the
+ * mem_hotplug lock (via online_pages()). The device_hotplug_lock makes
+ * sure this lock inversion can never happen - and also device_online()
+ * needs it.
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
int __ref add_memory_resource(int nid, u64 start, u64 size, bool online)
{
bool new_node = false;
@@ -1163,25 +1177,26 @@ int __ref add_memory_resource(int nid, u64 start, u64 size, bool online)
/* create new memmap entry */
firmware_map_add_hotplug(start, start + size, "System RAM");

+ /* device_online() will take the lock when calling online_pages() */
+ mem_hotplug_done();
+
/* online pages if requested */
if (online)
walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
NULL, online_memory_block);

- goto out;
-
+ return ret;
error:
/* rollback pgdat allocation and others */
if (new_node)
rollback_node_hotadd(nid);
memblock_remove(start, size);
-
-out:
mem_hotplug_done();
return ret;
}
EXPORT_SYMBOL_GPL(add_memory_resource);

+/* requires device_hotplug_lock, see add_memory_resource() */
int __ref add_memory(int nid, u64 start, u64 size)
{
struct resource *res;
@@ -1605,10 +1620,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
return -EINVAL;
if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
return -EINVAL;
+
+ mem_hotplug_begin();
+
/* This makes hotplug much easier...and readable.
we assume this for now. .*/
- if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+ if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+ &valid_end)) {
+ mem_hotplug_done();
return -EINVAL;
+ }

zone = page_zone(pfn_to_page(valid_start));
node = zone_to_nid(zone);
@@ -1617,8 +1638,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE, true);
- if (ret)
+ if (ret) {
+ mem_hotplug_done();
return ret;
+ }

arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
@@ -1689,6 +1712,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
writeback_set_ratelimit();

memory_notify(MEM_OFFLINE, &arg);
+ mem_hotplug_done();
return 0;

failed_removal:
@@ -1698,10 +1722,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
memory_notify(MEM_CANCEL_OFFLINE, &arg);
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ mem_hotplug_done();
return ret;
}

-/* Must be protected by mem_hotplug_begin() or a device_lock */
int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
{
return __offline_pages(start_pfn, start_pfn + nr_pages);
--
2.17.1


2018-08-17 08:22:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <[email protected]> wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
> mem_hotplug_begin() and the calls device_online() - which takes
> device_lock() - .online does no longer call mem_hotplug_begin(), so
> effectively calls online_pages() without mem_hotplug_lock. onlining/
> offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
> onlining memory during add_memory() does not take care of that. (I
> didn't follow how strictly this is needed, but there seems to be a
> reason because it is documented at device_online() and
> device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
> (and device_online()) without locks. This was introduced after
> 30467e0b3be. And skimming over the code, I assume it could need some
> more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
> just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/powerpc/platforms/powernv/memtrace.c | 3 ++
> drivers/acpi/acpi_memhotplug.c | 1 +
> drivers/base/memory.c | 18 +++++-----
> drivers/hv/hv_balloon.c | 4 +++
> drivers/s390/char/sclp_cmd.c | 3 ++
> drivers/xen/balloon.c | 3 ++
> mm/memory_hotplug.c | 42 ++++++++++++++++++-----
> 7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> + /* add_memory() requires device_hotplug_lock */
> + lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = &memtrace_array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> + unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> + /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what group of people
is represented by that and the lock is actually called
device_hotplug_lock).

Otherwise the approach is fine by me.

BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
hotplug is because it generally needs to be synchronized with respect
CPU hot-remove and similar. I believe that this may be the case in
non-ACPI setups as well.

Thanks,
Rafael

2018-08-17 08:29:18

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <[email protected]> wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>> mem_hotplug_begin() and the calls device_online() - which takes
>> device_lock() - .online does no longer call mem_hotplug_begin(), so
>> effectively calls online_pages() without mem_hotplug_lock. onlining/
>> offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>> onlining memory during add_memory() does not take care of that. (I
>> didn't follow how strictly this is needed, but there seems to be a
>> reason because it is documented at device_online() and
>> device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>> (and device_online()) without locks. This was introduced after
>> 30467e0b3be. And skimming over the code, I assume it could need some
>> more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>> "I don't think resolving a locking dependency is appropriate by
>> just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>> 2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> arch/powerpc/platforms/powernv/memtrace.c | 3 ++
>> drivers/acpi/acpi_memhotplug.c | 1 +
>> drivers/base/memory.c | 18 +++++-----
>> drivers/hv/hv_balloon.c | 4 +++
>> drivers/s390/char/sclp_cmd.c | 3 ++
>> drivers/xen/balloon.c | 3 ++
>> mm/memory_hotplug.c | 42 ++++++++++++++++++-----
>> 7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>> int i, ret = 0;
>> struct memtrace_entry *ent;
>>
>> + /* add_memory() requires device_hotplug_lock */
>> + lock_device_hotplug();
>> for (i = memtrace_array_nr - 1; i >= 0; i--) {
>> ent = &memtrace_array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>> pr_info("Added trace memory back to node %d\n", ent->nid);
>> ent->size = ent->start = ent->nid = -1;
>> }
>> + unlock_device_hotplug();
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>> if (node < 0)
>> node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> + /* we already hold the device_hotplug lock at this point */
>> result = add_memory(node, info->start_addr, info->length);
>>
>> /*
>
> A very minor nit here: I would say "device_hotplug_lock is already
> held at this point" in the comment (I sort of don't like to say "we"
> in code comments as it is not particularly clear what group of people
> is represented by that and the lock is actually called
> device_hotplug_lock).

Easy to fix, thanks!

>
> Otherwise the approach is fine by me.
>
> BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
> hotplug is because it generally needs to be synchronized with respect
> CPU hot-remove and similar. I believe that this may be the case in
> non-ACPI setups as well.

Yes, and that lock is the reason why we didn't have real problems with
ACPI memory hotplug in this respect so far. (as user triggered
online/offline also takes that lock already)

>
> Thanks,
> Rafael
>


--

Thanks,

David / dhildenb

2018-08-17 08:43:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> From: Vitaly Kuznetsov <[email protected]>
>
> Well require to call add_memory()/add_memory_resource() with
> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> lock device hotplug.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> [modify patch description]
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> drivers/base/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 04bbcd779e11..9010b9e942b5 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> {
> mutex_lock(&device_hotplug_lock);
> }
> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>
> void unlock_device_hotplug(void)
> {
> mutex_unlock(&device_hotplug_lock);
> }
> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);

If these are going to be "global" symbols, let's properly name them.
device_hotplug_lock/unlock would be better. But I am _really_ nervous
about letting stuff outside of the driver core mess with this, as people
better know what they are doing.

Can't we just "lock" the memory stuff instead? Why does the entirety of
the driver core need to be messed with here?

thanks,

greg k-h

2018-08-17 08:58:49

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>> From: Vitaly Kuznetsov <[email protected]>
>>
>> Well require to call add_memory()/add_memory_resource() with
>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>> lock device hotplug.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> [modify patch description]
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> drivers/base/core.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 04bbcd779e11..9010b9e942b5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>> {
>> mutex_lock(&device_hotplug_lock);
>> }
>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>
>> void unlock_device_hotplug(void)
>> {
>> mutex_unlock(&device_hotplug_lock);
>> }
>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better. But I am _really_ nervous
> about letting stuff outside of the driver core mess with this, as people
> better know what they are doing.

The only "problem" is that we have kernel modules (for paravirtualized
devices) that call add_memory(). This is Hyper-V right now, but we might
have other ones in the future. Without them we would not have to export
it. We might also get kernel modules that want to call remove_memory() -
which will require the device_hotplug_lock as of now.

What we could do is

a) add_memory() -> _add_memory() and don't export it
b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
We export that one.
c) Use add_memory() in external modules only

Similar wrapper would be needed e.g. for remove_memory() later on.

>
> Can't we just "lock" the memory stuff instead? Why does the entirety of
> the driver core need to be messed with here?

The main problem is that add_memory() will first take the
mem_hotplug_lock, to later on create and attach the memory block devices
(to a bus without any drivers) via bus_probe_device(). Here, we will
take the device_lock() of these devices.

Setting a device online from user space (.online = true) will first take
the device_hotplug_lock, then the device_lock(), and _right now_ not the
mem_hotplug_lock (see cover letter/patch 2).

We have to take mem_hotplug_lock here, but doing it down in e.g.
online_pages() will right now create the possibility for a lock
inversion. So one alternative is to take the mem_hotplug_lock in core.c
before calling device_online()/device_offline(). But this feels very wrong.

Thanks!

>
> thanks,
>
> greg k-h
>


--

Thanks,

David / dhildenb

2018-08-17 08:58:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 10:41 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> > From: Vitaly Kuznetsov <[email protected]>
> >
> > Well require to call add_memory()/add_memory_resource() with
> > device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> > (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> > lock device hotplug.
> >
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > [modify patch description]
> > Signed-off-by: David Hildenbrand <[email protected]>
> > ---
> > drivers/base/core.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 04bbcd779e11..9010b9e942b5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> > {
> > mutex_lock(&device_hotplug_lock);
> > }
> > +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >
> > void unlock_device_hotplug(void)
> > {
> > mutex_unlock(&device_hotplug_lock);
> > }
> > +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.

Well, device_hotplug_lock is the name of the lock itself. :-)

> But I am _really_ nervous about letting stuff outside of the driver core mess
> with this, as people better know what they are doing.
>
> Can't we just "lock" the memory stuff instead? Why does the entirety of
> the driver core need to be messed with here?

Because, in general, memory hotplug and hotplug of devices are not
independent. CPUs and memory may only be possible to take away
together and that may be the case for other devices too (say, it
wouldn't be a good idea to access a memory block that has just gone
away from a device, for DMA and the like). That's why
device_hotplug_lock was introduced in the first place.

That said I agree that exporting this to drivers doesn't feel particularly safe.

2018-08-17 09:04:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <[email protected]> wrote:
>
> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> > On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >> From: Vitaly Kuznetsov <[email protected]>
> >>
> >> Well require to call add_memory()/add_memory_resource() with
> >> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >> lock device hotplug.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> >> [modify patch description]
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >> ---
> >> drivers/base/core.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 04bbcd779e11..9010b9e942b5 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >> {
> >> mutex_lock(&device_hotplug_lock);
> >> }
> >> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>
> >> void unlock_device_hotplug(void)
> >> {
> >> mutex_unlock(&device_hotplug_lock);
> >> }
> >> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >
> > If these are going to be "global" symbols, let's properly name them.
> > device_hotplug_lock/unlock would be better. But I am _really_ nervous
> > about letting stuff outside of the driver core mess with this, as people
> > better know what they are doing.
>
> The only "problem" is that we have kernel modules (for paravirtualized
> devices) that call add_memory(). This is Hyper-V right now, but we might
> have other ones in the future. Without them we would not have to export
> it. We might also get kernel modules that want to call remove_memory() -
> which will require the device_hotplug_lock as of now.
>
> What we could do is
>
> a) add_memory() -> _add_memory() and don't export it
> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> We export that one.
> c) Use add_memory() in external modules only
>
> Similar wrapper would be needed e.g. for remove_memory() later on.

That would be safer IMO, as it would prevent developers from using
add_memory() without the lock, say.

If the lock is always going to be required for add_memory(), make it
hard (or event impossible) to use the latter without it.

2018-08-17 09:42:49

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <[email protected]> wrote:
>>
>> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
>>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>>>> From: Vitaly Kuznetsov <[email protected]>
>>>>
>>>> Well require to call add_memory()/add_memory_resource() with
>>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>>>> lock device hotplug.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>>> [modify patch description]
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>> ---
>>>> drivers/base/core.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>> index 04bbcd779e11..9010b9e942b5 100644
>>>> --- a/drivers/base/core.c
>>>> +++ b/drivers/base/core.c
>>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>>> {
>>>> mutex_lock(&device_hotplug_lock);
>>>> }
>>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>>>
>>>> void unlock_device_hotplug(void)
>>>> {
>>>> mutex_unlock(&device_hotplug_lock);
>>>> }
>>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>>>
>>> If these are going to be "global" symbols, let's properly name them.
>>> device_hotplug_lock/unlock would be better. But I am _really_ nervous
>>> about letting stuff outside of the driver core mess with this, as people
>>> better know what they are doing.
>>
>> The only "problem" is that we have kernel modules (for paravirtualized
>> devices) that call add_memory(). This is Hyper-V right now, but we might
>> have other ones in the future. Without them we would not have to export
>> it. We might also get kernel modules that want to call remove_memory() -
>> which will require the device_hotplug_lock as of now.
>>
>> What we could do is
>>
>> a) add_memory() -> _add_memory() and don't export it
>> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
>> We export that one.
>> c) Use add_memory() in external modules only
>>
>> Similar wrapper would be needed e.g. for remove_memory() later on.
>
> That would be safer IMO, as it would prevent developers from using
> add_memory() without the lock, say.
>
> If the lock is always going to be required for add_memory(), make it
> hard (or event impossible) to use the latter without it.
>

If there are no objections, I'll go into that direction. But I'll wait
for more comments regarding the general concept first.

Thanks!

--

Thanks,

David / dhildenb

2018-08-17 10:07:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> > On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <[email protected]> wrote:
> >>
> >> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> >>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >>>> From: Vitaly Kuznetsov <[email protected]>
> >>>>
> >>>> Well require to call add_memory()/add_memory_resource() with
> >>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >>>> lock device hotplug.
> >>>>
> >>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> >>>> [modify patch description]
> >>>> Signed-off-by: David Hildenbrand <[email protected]>
> >>>> ---
> >>>> drivers/base/core.c | 2 ++
> >>>> 1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>> index 04bbcd779e11..9010b9e942b5 100644
> >>>> --- a/drivers/base/core.c
> >>>> +++ b/drivers/base/core.c
> >>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>>> {
> >>>> mutex_lock(&device_hotplug_lock);
> >>>> }
> >>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>>>
> >>>> void unlock_device_hotplug(void)
> >>>> {
> >>>> mutex_unlock(&device_hotplug_lock);
> >>>> }
> >>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >>>
> >>> If these are going to be "global" symbols, let's properly name them.
> >>> device_hotplug_lock/unlock would be better. But I am _really_ nervous
> >>> about letting stuff outside of the driver core mess with this, as people
> >>> better know what they are doing.
> >>
> >> The only "problem" is that we have kernel modules (for paravirtualized
> >> devices) that call add_memory(). This is Hyper-V right now, but we might
> >> have other ones in the future. Without them we would not have to export
> >> it. We might also get kernel modules that want to call remove_memory() -
> >> which will require the device_hotplug_lock as of now.
> >>
> >> What we could do is
> >>
> >> a) add_memory() -> _add_memory() and don't export it
> >> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> >> We export that one.
> >> c) Use add_memory() in external modules only
> >>
> >> Similar wrapper would be needed e.g. for remove_memory() later on.
> >
> > That would be safer IMO, as it would prevent developers from using
> > add_memory() without the lock, say.
> >
> > If the lock is always going to be required for add_memory(), make it
> > hard (or event impossible) to use the latter without it.
> >
>
> If there are no objections, I'll go into that direction. But I'll wait
> for more comments regarding the general concept first.

It is the middle of the merge window, and maintainers are really busy
right now. I doubt you will get many review comments just yet...

2018-08-17 11:06:25

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On 17.08.2018 12:06, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
>> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
>>> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <[email protected]> wrote:
>>>>
>>>> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
>>>>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>>>>>> From: Vitaly Kuznetsov <[email protected]>
>>>>>>
>>>>>> Well require to call add_memory()/add_memory_resource() with
>>>>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>>>>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>>>>>> lock device hotplug.
>>>>>>
>>>>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>>>>> [modify patch description]
>>>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>>>> ---
>>>>>> drivers/base/core.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>> index 04bbcd779e11..9010b9e942b5 100644
>>>>>> --- a/drivers/base/core.c
>>>>>> +++ b/drivers/base/core.c
>>>>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>>>>> {
>>>>>> mutex_lock(&device_hotplug_lock);
>>>>>> }
>>>>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>>>>>
>>>>>> void unlock_device_hotplug(void)
>>>>>> {
>>>>>> mutex_unlock(&device_hotplug_lock);
>>>>>> }
>>>>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>>>>>
>>>>> If these are going to be "global" symbols, let's properly name them.
>>>>> device_hotplug_lock/unlock would be better. But I am _really_ nervous
>>>>> about letting stuff outside of the driver core mess with this, as people
>>>>> better know what they are doing.
>>>>
>>>> The only "problem" is that we have kernel modules (for paravirtualized
>>>> devices) that call add_memory(). This is Hyper-V right now, but we might
>>>> have other ones in the future. Without them we would not have to export
>>>> it. We might also get kernel modules that want to call remove_memory() -
>>>> which will require the device_hotplug_lock as of now.
>>>>
>>>> What we could do is
>>>>
>>>> a) add_memory() -> _add_memory() and don't export it
>>>> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
>>>> We export that one.
>>>> c) Use add_memory() in external modules only
>>>>
>>>> Similar wrapper would be needed e.g. for remove_memory() later on.
>>>
>>> That would be safer IMO, as it would prevent developers from using
>>> add_memory() without the lock, say.
>>>
>>> If the lock is always going to be required for add_memory(), make it
>>> hard (or event impossible) to use the latter without it.
>>>
>>
>> If there are no objections, I'll go into that direction. But I'll wait
>> for more comments regarding the general concept first.
>
> It is the middle of the merge window, and maintainers are really busy
> right now. I doubt you will get many review comments just yet...
>

This has been broken since 2015, so I guess it can wait a bit :)

--

Thanks,

David / dhildenb

2018-08-17 11:30:29

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
> >> If there are no objections, I'll go into that direction. But I'll wait
> >> for more comments regarding the general concept first.
> >
> > It is the middle of the merge window, and maintainers are really busy
> > right now. I doubt you will get many review comments just yet...
> >
>
> This has been broken since 2015, so I guess it can wait a bit :)

I hope you figured out what needs to be locked why. Your patch description
seems to be "only" about locking order ;)

I tried to figure out and document that partially with 55adc1d05dca ("mm:
add private lock to serialize memory hotplug operations"), and that wasn't
easy to figure out. I was especially concerned about sprinkling
lock/unlock_device_hotplug() calls, which has the potential to make it the
next BKL thing.


2018-08-17 11:58:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On 17.08.2018 13:28, Heiko Carstens wrote:
> On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
>>>> If there are no objections, I'll go into that direction. But I'll wait
>>>> for more comments regarding the general concept first.
>>>
>>> It is the middle of the merge window, and maintainers are really busy
>>> right now. I doubt you will get many review comments just yet...
>>>
>>
>> This has been broken since 2015, so I guess it can wait a bit :)
>
> I hope you figured out what needs to be locked why. Your patch description
> seems to be "only" about locking order ;)

Well I hope so, too ... but there is a reason for the RFC mark ;) There
is definitely a lot of magic in the current code. And that's why it is
also not that obvious that locking is wrong.

To avoid/fix the locking order problem was the motivation for the
original patch that dropped mem_hotplug_lock on one path. So I focused
on that in my description.

>
> I tried to figure out and document that partially with 55adc1d05dca ("mm:
> add private lock to serialize memory hotplug operations"), and that wasn't
> easy to figure out. I was especially concerned about sprinkling

Haven't seen that so far as that was reworked by 3f906ba23689
("mm/memory-hotplug: switch locking to a percpu rwsem"). Thanks for the
pointer. There is a long history to all this.

> lock/unlock_device_hotplug() calls, which has the potential to make it the
> next BKL thing.

Well, the thing with memory hotplug and device_hotplug_lock is that

a) ACPI already holds it while adding/removing memory via add_memory()
b) we hold it during online/offline of memory (via sysfs calls to
device_online()/device_offline())

So it is already pretty much involved in all memory hotplug/unplug
activities on x86 (except paravirt). And as far as I understand, there
are good reasons to hold the lock in core.c and ACPI. (as mentioned by
Rafael)

The exceptions are add_memory() called on s390x, hyper-v, xen and ppc
(including manual probing). And device_online()/device_offline() called
from the kernel.

Holding device_hotplug_lock when adding/removing memory from the system
doesn't sound too wrong (especially as devices are created/removed). At
least that way (documenting and following the rules in the patch
description) we might at least get locking right.


I am very open to other suggestions (but as noted by Greg, many
maintainers might be busy by know).

E.g. When adding the memory block devices, we know that there won't be a
driver to attach to (as there are no drivers for the "memory" subsystem)
- the bus_probe_device() function that takes the device_lock() could
pretty much be avoided for that case. But burying such special cases
down in core driver code definitely won't make locking related to memory
hotplug easier.

Thanks for having a look!

--

Thanks,

David / dhildenb

2018-08-17 17:06:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

On Fri, Aug 17, 2018 at 01:56:35PM +0200, David Hildenbrand wrote:
> E.g. When adding the memory block devices, we know that there won't be a
> driver to attach to (as there are no drivers for the "memory" subsystem)
> - the bus_probe_device() function that takes the device_lock() could
> pretty much be avoided for that case. But burying such special cases
> down in core driver code definitely won't make locking related to memory
> hotplug easier.

You don't have to have a driver for a device if you don't want to, or
you can just have a default one for all memory devices if you somehow
need it. No reason to not do this if it makes things easier for you.

thanks,

greg k-h