2018-09-27 09:27:58

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock

@Andrew, Only patch #5 changed (see change notes below). Thanks!


Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
device_hotplug_lock. This is already documented and true for now in core
code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.


v2 -> v3:
- Take device_hotplug_lock outside of loop in patch #5
- Added Ack to patch #5

v1 -> v2:
- Upstream changes in powerpc/powernv code required modifications to
patch #1, #4 and #5.
- Minor patch description changes.
- Added more locking details in patch #6.
- Added rb's

RFCv2 -> v1:
- Dropped an unnecessary _ref from remove_memory() in patch #1
- Minor patch description fixes.
- Added rb's

RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledg

David Hildenbrand (6):
mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
mm/memory_hotplug: make add_memory() take the device_hotplug_lock
mm/memory_hotplug: fix online/offline_pages called w.o.
mem_hotplug_lock
powerpc/powernv: hold device_hotplug_lock when calling device_online()
powerpc/powernv: hold device_hotplug_lock when calling
memtrace_offline_pages()
memory-hotplug.txt: Add some details about locking internals

Documentation/memory-hotplug.txt | 42 ++++++++++++-
arch/powerpc/platforms/powernv/memtrace.c | 8 ++-
.../platforms/pseries/hotplug-memory.c | 8 +--
drivers/acpi/acpi_memhotplug.c | 4 +-
drivers/base/memory.c | 22 +++----
drivers/xen/balloon.c | 3 +
include/linux/memory_hotplug.h | 4 +-
mm/memory_hotplug.c | 59 +++++++++++++++----
8 files changed, 114 insertions(+), 36 deletions(-)

--
2.17.1



2018-09-27 09:26:59

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are not other users in the tree.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: John Allen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 2 +-
arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
drivers/acpi/acpi_memhotplug.c | 2 +-
include/linux/memory_hotplug.h | 3 ++-
mm/memory_hotplug.c | 9 ++++++++-
5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index a29fdf8a2e56..773623f6bfb1 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -121,7 +121,7 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
lock_device_hotplug();
end_pfn = base_pfn + nr_pages;
for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) {
- remove_memory(nid, pfn << PAGE_SHIFT, bytes);
+ __remove_memory(nid, pfn << PAGE_SHIFT, bytes);
}
unlock_device_hotplug();
return base_pfn << PAGE_SHIFT;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 9a15d39995e5..dd0264c43f3e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -305,7 +305,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
nid = memory_add_physaddr_to_nid(base);

for (i = 0; i < sections_per_block; i++) {
- remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+ __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
base += MIN_MEMORY_BLOCK_SIZE;
}

@@ -394,7 +394,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
block_sz = pseries_memory_block_size();
nid = memory_add_physaddr_to_nid(lmb->base_addr);

- remove_memory(nid, lmb->base_addr, block_sz);
+ __remove_memory(nid, lmb->base_addr, block_sz);

/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);
@@ -681,7 +681,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)

rc = dlpar_online_lmb(lmb);
if (rc) {
- remove_memory(nid, lmb->base_addr, block_sz);
+ __remove_memory(nid, lmb->base_addr, block_sz);
invalidate_lmb_associativity_index(lmb);
} else {
lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
nid = memory_add_physaddr_to_nid(info->start_addr);

acpi_unbind_memory_blocks(info);
- remove_memory(nid, info->start_addr, info->length);
+ __remove_memory(nid, info->start_addr, info->length);
list_del(&info->list);
kfree(info);
}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
extern void try_offline_node(int nid);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);

#else
static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
}

static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
#endif /* CONFIG_MEMORY_HOTREMOVE */

extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern bool is_memblock_offlined(struct memory_block *mem);
-extern void remove_memory(int nid, u64 start, u64 size);
extern int sparse_add_one_section(struct pglist_data *pgdat,
unsigned long start_pfn, struct vmem_altmap *altmap);
extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2698664bfd54..f6dbd5d8fffd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1839,7 +1839,7 @@ EXPORT_SYMBOL(try_offline_node);
* and online/offline operations before this call, as required by
* try_offline_node().
*/
-void __ref remove_memory(int nid, u64 start, u64 size)
+void __ref __remove_memory(int nid, u64 start, u64 size)
{
int ret;

@@ -1868,5 +1868,12 @@ void __ref remove_memory(int nid, u64 start, u64 size)

mem_hotplug_done();
}
+
+void remove_memory(int nid, u64 start, u64 size)
+{
+ lock_device_hotplug();
+ __remove_memory(nid, start, size);
+ unlock_device_hotplug();
+}
EXPORT_SYMBOL_GPL(remove_memory);
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.17.1


2018-09-27 09:27:22

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
mem_hotplug_begin() and the calls device_online() - which takes
device_lock() - .online does no longer call mem_hotplug_begin(), so
effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
without locks. This was introduced after 30467e0b3be. And skimming over
the code, I assume it could need some more care in regards to locking
(e.g. device_online() called without device_hotplug_lock. This will
be addressed in the following patches.

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/base/memory.c | 13 +------------
mm/memory_hotplug.c | 28 ++++++++++++++++++++--------
2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 40cac122ec73..0e5985682642 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
/*
* MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
* OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
*/
static int
memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
if (mem->online_type < 0)
mem->online_type = MMOP_ONLINE_KEEP;

- /* Already under protection of mem_hotplug_begin() */
ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);

/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
goto err;
}

- /*
- * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
- * the correct memory block to online before doing device_online(dev),
- * which will take dev->mutex. Take the lock early to prevent an
- * inversion, memory_subsys_online() callbacks will be implemented by
- * assuming it's already protected.
- */
- mem_hotplug_begin();
-
switch (online_type) {
case MMOP_ONLINE_KERNEL:
case MMOP_ONLINE_MOVABLE:
case MMOP_ONLINE_KEEP:
+ /* mem->online_type is protected by device_hotplug_lock */
mem->online_type = online_type;
ret = device_online(&mem->dev);
break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
ret = -EINVAL; /* should never happen */
}

- mem_hotplug_done();
err:
unlock_device_hotplug();

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index affb03e0dfef..d4c7e42e46f3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -860,7 +860,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
return zone;
}

-/* Must be protected by mem_hotplug_begin() or a device_lock */
int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
{
unsigned long flags;
@@ -872,6 +871,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
struct memory_notify arg;
struct memory_block *mem;

+ mem_hotplug_begin();
+
/*
* We can't use pfn_to_nid() because nid might be stored in struct page
* which is not yet initialized. Instead, we find nid from memory block.
@@ -936,6 +937,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ

if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
+ mem_hotplug_done();
return 0;

failed_addition:
@@ -943,6 +945,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
+ mem_hotplug_done();
return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1147,20 +1150,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
/* create new memmap entry */
firmware_map_add_hotplug(start, start + size, "System RAM");

+ /* device_online() will take the lock when calling online_pages() */
+ mem_hotplug_done();
+
/* online pages if requested */
if (online)
walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
NULL, online_memory_block);

- goto out;
-
+ return ret;
error:
/* rollback pgdat allocation and others */
if (new_node)
rollback_node_hotadd(nid);
memblock_remove(start, size);
-
-out:
mem_hotplug_done();
return ret;
}
@@ -1588,10 +1591,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
return -EINVAL;
if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
return -EINVAL;
+
+ mem_hotplug_begin();
+
/* This makes hotplug much easier...and readable.
we assume this for now. .*/
- if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+ if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+ &valid_end)) {
+ mem_hotplug_done();
return -EINVAL;
+ }

zone = page_zone(pfn_to_page(valid_start));
node = zone_to_nid(zone);
@@ -1600,8 +1609,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE, true);
- if (ret)
+ if (ret) {
+ mem_hotplug_done();
return ret;
+ }

arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
@@ -1672,6 +1683,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
writeback_set_ratelimit();

memory_notify(MEM_OFFLINE, &arg);
+ mem_hotplug_done();
return 0;

failed_removal:
@@ -1681,10 +1693,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
memory_notify(MEM_CANCEL_OFFLINE, &arg);
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ mem_hotplug_done();
return ret;
}

-/* Must be protected by mem_hotplug_begin() or a device_lock */
int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
{
return __offline_pages(start_pfn, start_pfn + nr_pages);
--
2.17.1


2018-09-27 09:27:37

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 5/6] powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages()

Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via
sysfs concurrently.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Michael Neuling <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index fdd48f1a39f7..84d038ed3882 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -70,6 +70,7 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
return 0;
}

+/* called with device_hotplug_lock held */
static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
{
u64 end_pfn = start_pfn + nr_pages - 1;
@@ -110,6 +111,7 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
/* Trace memory needs to be aligned to the size */
end_pfn = round_down(end_pfn - nr_pages, nr_pages);

+ lock_device_hotplug();
for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
/*
@@ -118,7 +120,6 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
* we never try to remove memory that spans two iomem
* resources.
*/
- lock_device_hotplug();
end_pfn = base_pfn + nr_pages;
for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) {
__remove_memory(nid, pfn << PAGE_SHIFT, bytes);
@@ -127,6 +128,7 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
return base_pfn << PAGE_SHIFT;
}
}
+ unlock_device_hotplug();

return 0;
}
--
2.17.1


2018-09-27 09:28:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory
to synchronize against online/offline request (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: John Allen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
.../platforms/pseries/hotplug-memory.c | 2 +-
drivers/acpi/acpi_memhotplug.c | 2 +-
drivers/base/memory.c | 9 ++++++--
drivers/xen/balloon.c | 3 +++
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 22 ++++++++++++++++---
6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index dd0264c43f3e..d26a771d985e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -673,7 +673,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);

/* Add the memory */
- rc = add_memory(nid, lmb->base_addr, block_sz);
+ rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);

- result = add_memory(node, info->start_addr, info->length);
+ result = __add_memory(node, info->start_addr, info->length);

/*
* If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 817320c7c4c1..40cac122ec73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

+ ret = lock_device_hotplug_sysfs();
+ if (ret)
+ goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
- ret = add_memory(nid, phys_addr,
- MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+ ret = __add_memory(nid, phys_addr,
+ MIN_MEMORY_BLOCK_SIZE * sections_per_block);

if (ret)
goto out;

ret = count;
out:
+ unlock_device_hotplug();
return ret;
}

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index a3f5cbfcd4a1..fdfc64f5acea 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
* callers drop the mutex before trying again.
*/
mutex_unlock(&balloon_mutex);
+ /* add_memory_resource() requires the device_hotplug lock */
+ lock_device_hotplug();
rc = add_memory_resource(nid, resource, memhp_auto_online);
+ unlock_device_hotplug();
mutex_lock(&balloon_mutex);

if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
extern void __ref free_area_init_core_hotplug(int nid);
extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
void *arg, int (*func)(struct memory_block *, void *));
+extern int __add_memory(int nid, u64 start, u64 size);
extern int add_memory(int nid, u64 start, u64 size);
extern int add_memory_resource(int nid, struct resource *resource, bool online);
extern int arch_add_memory(int nid, u64 start, u64 size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f6dbd5d8fffd..affb03e0dfef 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1090,7 +1090,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}

-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations (triggered e.g. by sysfs).
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
int __ref add_memory_resource(int nid, struct resource *res, bool online)
{
u64 start, size;
@@ -1159,9 +1164,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
mem_hotplug_done();
return ret;
}
-EXPORT_SYMBOL_GPL(add_memory_resource);

-int __ref add_memory(int nid, u64 start, u64 size)
+/* requires device_hotplug_lock, see add_memory_resource() */
+int __ref __add_memory(int nid, u64 start, u64 size)
{
struct resource *res;
int ret;
@@ -1175,6 +1180,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
release_memory_resource(res);
return ret;
}
+
+int add_memory(int nid, u64 start, u64 size)
+{
+ int rc;
+
+ lock_device_hotplug();
+ rc = __add_memory(nid, start, size);
+ unlock_device_hotplug();
+
+ return rc;
+}
EXPORT_SYMBOL_GPL(add_memory);

#ifdef CONFIG_MEMORY_HOTREMOVE
--
2.17.1


2018-09-27 09:28:36

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 6/6] memory-hotplug.txt: Add some details about locking internals

Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all play together with
requests to online/offline memory from user space.

Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/memory-hotplug.txt | 42 +++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..ce4faa5530fa 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
==============

:Created: Jul 28 2007
-:Updated: Add description of notifier of memory hotplug: Oct 11 2007
+:Updated: Add some details about locking internals: Aug 20 2018

This document is about memory hotplug including how-to-use and current status.
Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,46 @@ further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

+
+Locking Internals
+=================
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+ block devices can only be accessed (.online/.state attributes) by user
+ space once memory has been fully added. And when removing memory, we
+ know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+ mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+ the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock in
+write mode to serialise memory hotplug (e.g. access to global/zone
+variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
+mode allows for a quite efficient get_online_mems/put_online_mems
+implementation, so code accessing memory can protect from that memory
+vanishing.
+
+
Future Work
===========

--
2.17.1


2018-09-27 09:29:02

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()

device_online() should be called with device_hotplug_lock() held.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Michael Neuling <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Rashmica Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 773623f6bfb1..fdd48f1a39f7 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -242,9 +242,11 @@ static int memtrace_online(void)
* we need to online the memory ourselves.
*/
if (!memhp_auto_online) {
+ lock_device_hotplug();
walk_memory_range(PFN_DOWN(ent->start),
PFN_UP(ent->start + ent->size - 1),
NULL, online_mem_block);
+ unlock_device_hotplug();
}

/*
--
2.17.1


2018-10-05 07:09:46

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v3 6/6] memory-hotplug.txt: Add some details about locking internals

On Thu, Sep 27, 2018 at 11:25:54AM +0200, David Hildenbrand wrote:
> Cc: Jonathan Corbet <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Reviewed-by: Pavel Tatashin <[email protected]>
> Reviewed-by: Rashmica Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>

--
Oscar Salvador
SUSE L3

2018-10-05 07:09:51

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

On Thu, Sep 27, 2018 at 11:25:51AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin <[email protected]>
> Reviewed-by: Rashmica Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>
--
Oscar Salvador
SUSE L3

2018-10-05 07:17:11

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

On Thu, Sep 27, 2018 at 11:25:49AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin <[email protected]>
> Reviewed-by: Rafael J. Wysocki <[email protected]>
> Reviewed-by: Rashmica Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>

--
Oscar Salvador
SUSE L3

2018-10-05 07:17:42

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

On Thu, Sep 27, 2018 at 11:25:50AM +0200, David Hildenbrand wrote:
> Reviewed-by: Pavel Tatashin <[email protected]>
> Reviewed-by: Rafael J. Wysocki <[email protected]>
> Reviewed-by: Rashmica Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>

--
Oscar Salvador
SUSE L3