This is the same approach as in the first RFC, but this time without
exporting device_hotplug_lock (requested by Greg) and with some more
details and documentation regarding locking. Tested only on x86 so far.
--------------------------------------------------------------------------
Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.
While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.
E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.
Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.
Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.
I propose the general rules (documentation added in patch 6):
1. add_memory/add_memory_resource() must only be called with
device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
device_hotplug_lock. This is already documented and true for now in core
code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
online_pages/offline_pages.
To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.
RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledge
David Hildenbrand (6):
mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
mm/memory_hotplug: make add_memory() take the device_hotplug_lock
mm/memory_hotplug: fix online/offline_pages called w.o.
mem_hotplug_lock
powerpc/powernv: hold device_hotplug_lock when calling device_online()
powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
memory-hotplug.txt: Add some details about locking internals
Documentation/memory-hotplug.txt | 39 +++++++++++-
arch/powerpc/platforms/powernv/memtrace.c | 14 +++--
.../platforms/pseries/hotplug-memory.c | 8 +--
drivers/acpi/acpi_memhotplug.c | 4 +-
drivers/base/memory.c | 22 +++----
drivers/xen/balloon.c | 3 +
include/linux/memory_hotplug.h | 4 +-
mm/memory_hotplug.c | 59 +++++++++++++++----
8 files changed, 115 insertions(+), 38 deletions(-)
--
2.17.1
Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via
sysfs concurrently.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Michael Neuling <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index ef7181d4fe68..473e59842ec5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
{
u64 end_pfn = start_pfn + nr_pages - 1;
+ lock_device_hotplug();
+
if (walk_memory_range(start_pfn, end_pfn, NULL,
- check_memblock_online))
+ check_memblock_online)) {
+ unlock_device_hotplug();
return false;
+ }
walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
change_memblock_state);
@@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
if (offline_pages(start_pfn, nr_pages)) {
walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
change_memblock_state);
+ unlock_device_hotplug();
return false;
}
walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
change_memblock_state);
- remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+ __remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+ unlock_device_hotplug();
return true;
}
--
2.17.1
Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all play together with
requests to online/offline memory from user space.
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/memory-hotplug.txt | 39 +++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..03aaad7d7373 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
==============
:Created: Jul 28 2007
-:Updated: Add description of notifier of memory hotplug: Oct 11 2007
+:Updated: Add some details about locking internals: Aug 20 2018
This document is about memory hotplug including how-to-use and current status.
Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,43 @@ further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
+
+Locking Internals
+=================
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+ block devices can only be accessed (.online/.state attributes) by user
+ space once memory has been fully added. And when removing memory, we
+ know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+ mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+ the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock to
+serialise memory hotplug (e.g. access to global/zone variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows
+for a quite efficient get_online_mems/put_online_mems implementation.
+
+
Future Work
===========
--
2.17.1
remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.
The lock is already held in
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
So, let's use the locked variant.
The lock is not held (but taken in)
arch/powerpc/platforms/powernv/memtrace.c
So let's keep using the (now) locked variant.
Apart from that, there are not other users in the tree.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: John Allen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 2 --
arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
drivers/acpi/acpi_memhotplug.c | 2 +-
include/linux/memory_hotplug.h | 3 ++-
mm/memory_hotplug.c | 9 ++++++++-
5 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..8f1cd4f3bfd5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
change_memblock_state);
- lock_device_hotplug();
remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
- unlock_device_hotplug();
return true;
}
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..b3f54466e25f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
nid = memory_add_physaddr_to_nid(base);
for (i = 0; i < sections_per_block; i++) {
- remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+ __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
base += MIN_MEMORY_BLOCK_SIZE;
}
@@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
block_sz = pseries_memory_block_size();
nid = memory_add_physaddr_to_nid(lmb->base_addr);
- remove_memory(nid, lmb->base_addr, block_sz);
+ __remove_memory(nid, lmb->base_addr, block_sz);
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);
@@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
rc = dlpar_online_lmb(lmb);
if (rc) {
- remove_memory(nid, lmb->base_addr, block_sz);
+ __remove_memory(nid, lmb->base_addr, block_sz);
dlpar_remove_device_tree_lmb(lmb);
} else {
lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
nid = memory_add_physaddr_to_nid(info->start_addr);
acpi_unbind_memory_blocks(info);
- remove_memory(nid, info->start_addr, info->length);
+ __remove_memory(nid, info->start_addr, info->length);
list_del(&info->list);
kfree(info);
}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
extern void try_offline_node(int nid);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);
#else
static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
}
static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
#endif /* CONFIG_MEMORY_HOTREMOVE */
extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages, struct vmem_altmap *altmap);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern bool is_memblock_offlined(struct memory_block *mem);
-extern void remove_memory(int nid, u64 start, u64 size);
extern int sparse_add_one_section(struct pglist_data *pgdat,
unsigned long start_pfn, struct vmem_altmap *altmap);
extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9eea6e809a4e..898e13d4d87d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1872,7 +1872,7 @@ EXPORT_SYMBOL(try_offline_node);
* and online/offline operations before this call, as required by
* try_offline_node().
*/
-void __ref remove_memory(int nid, u64 start, u64 size)
+void __ref __remove_memory(int nid, u64 start, u64 size)
{
int ret;
@@ -1901,5 +1901,12 @@ void __ref remove_memory(int nid, u64 start, u64 size)
mem_hotplug_done();
}
+
+void __ref remove_memory(int nid, u64 start, u64 size)
+{
+ lock_device_hotplug();
+ __remove_memory(nid, start, size);
+ unlock_device_hotplug();
+}
EXPORT_SYMBOL_GPL(remove_memory);
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.17.1
There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock
While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.
In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.
The problems I spotted related to this:
1. Memory block device attributes: While .state first calls
mem_hotplug_begin() and the calls device_online() - which takes
device_lock() - .online does no longer call mem_hotplug_begin(), so
effectively calls online_pages() without mem_hotplug_lock.
2. device_online() should be called under device_hotplug_lock, however
onlining memory during add_memory() does not take care of that.
In addition, I think there is also something wrong about the locking in
3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
without locks. This was introduced after 30467e0b3be. And skimming over
the code, I assume it could need some more care in regards to locking
(e.g. device_online() called without device_hotplug_lock - but I'll
not touch that for now).
Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()
We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().
Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).
[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
2015-February/065324.html
This patch is partly based on a patch by Vitaly Kuznetsov.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/base/memory.c | 13 +------------
mm/memory_hotplug.c | 28 ++++++++++++++++++++--------
2 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5b0375be7f65..04be13539eb8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
/*
* MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
* OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
*/
static int
memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
if (mem->online_type < 0)
mem->online_type = MMOP_ONLINE_KEEP;
- /* Already under protection of mem_hotplug_begin() */
ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
goto err;
}
- /*
- * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
- * the correct memory block to online before doing device_online(dev),
- * which will take dev->mutex. Take the lock early to prevent an
- * inversion, memory_subsys_online() callbacks will be implemented by
- * assuming it's already protected.
- */
- mem_hotplug_begin();
-
switch (online_type) {
case MMOP_ONLINE_KERNEL:
case MMOP_ONLINE_MOVABLE:
case MMOP_ONLINE_KEEP:
+ /* mem->online_type is protected by device_hotplug_lock */
mem->online_type = online_type;
ret = device_online(&mem->dev);
break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
ret = -EINVAL; /* should never happen */
}
- mem_hotplug_done();
err:
unlock_device_hotplug();
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e2b5c751e3ea..a2c6c87d83f3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
return zone;
}
-/* Must be protected by mem_hotplug_begin() or a device_lock */
int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
{
unsigned long flags;
@@ -893,6 +892,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
struct memory_notify arg;
struct memory_block *mem;
+ mem_hotplug_begin();
+
/*
* We can't use pfn_to_nid() because nid might be stored in struct page
* which is not yet initialized. Instead, we find nid from memory block.
@@ -957,6 +958,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
+ mem_hotplug_done();
return 0;
failed_addition:
@@ -964,6 +966,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
+ mem_hotplug_done();
return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1168,20 +1171,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
/* create new memmap entry */
firmware_map_add_hotplug(start, start + size, "System RAM");
+ /* device_online() will take the lock when calling online_pages() */
+ mem_hotplug_done();
+
/* online pages if requested */
if (online)
walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
NULL, online_memory_block);
- goto out;
-
+ return ret;
error:
/* rollback pgdat allocation and others */
if (new_node)
rollback_node_hotadd(nid);
memblock_remove(start, size);
-
-out:
mem_hotplug_done();
return ret;
}
@@ -1621,10 +1624,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
return -EINVAL;
if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
return -EINVAL;
+
+ mem_hotplug_begin();
+
/* This makes hotplug much easier...and readable.
we assume this for now. .*/
- if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+ if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+ &valid_end)) {
+ mem_hotplug_done();
return -EINVAL;
+ }
zone = page_zone(pfn_to_page(valid_start));
node = zone_to_nid(zone);
@@ -1633,8 +1642,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE, true);
- if (ret)
+ if (ret) {
+ mem_hotplug_done();
return ret;
+ }
arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
@@ -1705,6 +1716,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
writeback_set_ratelimit();
memory_notify(MEM_OFFLINE, &arg);
+ mem_hotplug_done();
return 0;
failed_removal:
@@ -1714,10 +1726,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
memory_notify(MEM_CANCEL_OFFLINE, &arg);
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ mem_hotplug_done();
return ret;
}
-/* Must be protected by mem_hotplug_begin() or a device_lock */
int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
{
return __offline_pages(start_pfn, start_pfn + nr_pages);
--
2.17.1
device_online() should be called with device_hotplug_lock() held.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Michael Neuling <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/platforms/powernv/memtrace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 8f1cd4f3bfd5..ef7181d4fe68 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -229,9 +229,11 @@ static int memtrace_online(void)
* we need to online the memory ourselves.
*/
if (!memhp_auto_online) {
+ lock_device_hotplug();
walk_memory_range(PFN_DOWN(ent->start),
PFN_UP(ent->start + ent->size - 1),
NULL, online_mem_block);
+ unlock_device_hotplug();
}
/*
--
2.17.1
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.
In general, we should hold the device_hotplug_lock when adding memory
to synchronize against online/offline request (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.
Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.
The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: John Allen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
.../platforms/pseries/hotplug-memory.c | 2 +-
drivers/acpi/acpi_memhotplug.c | 2 +-
drivers/base/memory.c | 9 ++++++--
drivers/xen/balloon.c | 3 +++
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 22 ++++++++++++++++---
6 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b3f54466e25f..2e6f41dc103a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
/* Add the memory */
- rc = add_memory(nid, lmb->base_addr, block_sz);
+ rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
- result = add_memory(node, info->start_addr, info->length);
+ result = __add_memory(node, info->start_addr, info->length);
/*
* If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..5b0375be7f65 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,15 +521,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
+ ret = lock_device_hotplug_sysfs();
+ if (ret)
+ goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
- ret = add_memory(nid, phys_addr,
- MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+ ret = __add_memory(nid, phys_addr,
+ MIN_MEMORY_BLOCK_SIZE * sections_per_block);
if (ret)
goto out;
ret = count;
out:
+ unlock_device_hotplug();
return ret;
}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
* callers drop the mutex before trying again.
*/
mutex_unlock(&balloon_mutex);
+ /* add_memory_resource() requires the device_hotplug lock */
+ lock_device_hotplug();
rc = add_memory_resource(nid, resource, memhp_auto_online);
+ unlock_device_hotplug();
mutex_lock(&balloon_mutex);
if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
extern void __ref free_area_init_core_hotplug(int nid);
extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
void *arg, int (*func)(struct memory_block *, void *));
+extern int __add_memory(int nid, u64 start, u64 size);
extern int add_memory(int nid, u64 start, u64 size);
extern int add_memory_resource(int nid, struct resource *resource, bool online);
extern int arch_add_memory(int nid, u64 start, u64 size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e13d4d87d..e2b5c751e3ea 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
return device_online(&mem->dev);
}
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations (triggered e.g. by sysfs).
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
int __ref add_memory_resource(int nid, struct resource *res, bool online)
{
u64 start, size;
@@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
mem_hotplug_done();
return ret;
}
-EXPORT_SYMBOL_GPL(add_memory_resource);
-int __ref add_memory(int nid, u64 start, u64 size)
+/* requires device_hotplug_lock, see add_memory_resource() */
+int __ref __add_memory(int nid, u64 start, u64 size)
{
struct resource *res;
int ret;
@@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
release_memory_resource(res);
return ret;
}
+
+int add_memory(int nid, u64 start, u64 size)
+{
+ int rc;
+
+ lock_device_hotplug();
+ rc = __add_memory(nid, start, size);
+ unlock_device_hotplug();
+
+ return rc;
+}
EXPORT_SYMBOL_GPL(add_memory);
#ifdef CONFIG_MEMORY_HOTREMOVE
--
2.17.1
On 21.08.2018 12:44, David Hildenbrand wrote:
> This is the same approach as in the first RFC, but this time without
> exporting device_hotplug_lock (requested by Greg) and with some more
> details and documentation regarding locking. Tested only on x86 so far.
>
I'll be on vacation for two weeks starting on Saturday. If there are no
comments I'll send this as !RFC once I return.
Thanks!
> --------------------------------------------------------------------------
>
> Reading through the code and studying how mem_hotplug_lock is to be used,
> I noticed that there are two places where we can end up calling
> device_online()/device_offline() - online_pages()/offline_pages() without
> the mem_hotplug_lock. And there are other places where we call
> device_online()/device_offline() without the device_hotplug_lock.
>
> While e.g.
> echo "online" > /sys/devices/system/memory/memory9/state
> is fine, e.g.
> echo 1 > /sys/devices/system/memory/memory9/online
> Will not take the mem_hotplug_lock. However the device_lock() and
> device_hotplug_lock.
>
> E.g. via memory_probe_store(), we can end up calling
> add_memory()->online_pages() without the device_hotplug_lock. So we can
> have concurrent callers in online_pages(). We e.g. touch in online_pages()
> basically unprotected zone->present_pages then.
>
> Looks like there is a longer history to that (see Patch #2 for details),
> and fixing it to work the way it was intended is not really possible. We
> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
> sounds wrong.
>
> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
> More details can be found in patch 3 and patch 6.
>
> I propose the general rules (documentation added in patch 6):
>
> 1. add_memory/add_memory_resource() must only be called with
> device_hotplug_lock.
> 2. remove_memory() must only be called with device_hotplug_lock. This is
> already documented and holds for all callers.
> 3. device_online()/device_offline() must only be called with
> device_hotplug_lock. This is already documented and true for now in core
> code. Other callers (related to memory hotplug) have to be fixed up.
> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
> online_pages/offline_pages.
>
> To me, this looks way cleaner than what we have right now (and easier to
> verify). And looking at the documentation of remove_memory, using
> lock_device_hotplug also for add_memory() feels natural.
>
>
> RFC -> RFCv2:
> - Don't export device_hotplug_lock, provide proper remove_memory/add_memory
> wrappers.
> - Split up the patches a bit.
> - Try to improve powernv memtrace locking
> - Add some documentation for locking that matches my knowledge
>
> David Hildenbrand (6):
> mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
> mm/memory_hotplug: make add_memory() take the device_hotplug_lock
> mm/memory_hotplug: fix online/offline_pages called w.o.
> mem_hotplug_lock
> powerpc/powernv: hold device_hotplug_lock when calling device_online()
> powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
> memory-hotplug.txt: Add some details about locking internals
>
> Documentation/memory-hotplug.txt | 39 +++++++++++-
> arch/powerpc/platforms/powernv/memtrace.c | 14 +++--
> .../platforms/pseries/hotplug-memory.c | 8 +--
> drivers/acpi/acpi_memhotplug.c | 4 +-
> drivers/base/memory.c | 22 +++----
> drivers/xen/balloon.c | 3 +
> include/linux/memory_hotplug.h | 4 +-
> mm/memory_hotplug.c | 59 +++++++++++++++----
> 8 files changed, 115 insertions(+), 38 deletions(-)
>
--
Thanks,
David / dhildenb
On 8/30/18 8:31 AM, David Hildenbrand wrote:
> On 21.08.2018 12:44, David Hildenbrand wrote:
>> This is the same approach as in the first RFC, but this time without
>> exporting device_hotplug_lock (requested by Greg) and with some more
>> details and documentation regarding locking. Tested only on x86 so far.
>>
>
> I'll be on vacation for two weeks starting on Saturday. If there are no
> comments I'll send this as !RFC once I return.
>
I am studying this series, will send my comments later today.
Pavel
> +
> +void __ref remove_memory(int nid, u64 start, u64 size)
Remove __ref, otherwise looks good:
Reviewed-by: Pavel Tatashin <[email protected]>
> +{
> + lock_device_hotplug();
> + __remove_memory(nid, start, size);
> + unlock_device_hotplug();
> +}
> EXPORT_SYMBOL_GPL(remove_memory);
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> add_memory() currently does not take the device_hotplug_lock, however
> is aleady called under the lock from
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and similar.
>
> In general, we should hold the device_hotplug_lock when adding memory
> to synchronize against online/offline request (e.g. from user space) -
> which already resulted in lock inversions due to device_lock() and
> mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
> hot-add deadlock"). add_memory()/add_memory_resource() will create memory
> block devices, so this really feels like the right thing to do.
>
> Holding the device_hotplug_lock makes sure that a memory block device
> can really only be accessed (e.g. via .online/.state) from user space,
> once the memory has been fully added to the system.
>
> The lock is not held yet in
> drivers/xen/balloon.c
> arch/powerpc/platforms/powernv/memtrace.c
> drivers/s390/char/sclp_cmd.c
> drivers/hv/hv_balloon.c
> So, let's either use the locked variants or take the lock.
>
> Don't export add_memory_resource(), as it once was exported to be used
> by XEN, which is never built as a module. If somebody requires it, we
> also have to export a locked variant (as device_hotplug_lock is never
> exported).
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> device_online() should be called with device_hotplug_lock() held.
>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Rashmica Gupta <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Michael Neuling <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/powerpc/platforms/powernv/memtrace.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 8f1cd4f3bfd5..ef7181d4fe68 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -229,9 +229,11 @@ static int memtrace_online(void)
> * we need to online the memory ourselves.
> */
> if (!memhp_auto_online) {
> + lock_device_hotplug();
> walk_memory_range(PFN_DOWN(ent->start),
> PFN_UP(ent->start + ent->size - 1),
> NULL, online_mem_block);
> + unlock_device_hotplug();
> }
>
> /*
>
Reviewed-by: Pavel Tatashin <[email protected]>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's perform all checking + offlining + removing under
> device_hotplug_lock, so nobody can mess with these devices via
> sysfs concurrently.
>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Rashmica Gupta <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Michael Neuling <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index ef7181d4fe68..473e59842ec5 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
> {
> u64 end_pfn = start_pfn + nr_pages - 1;
>
> + lock_device_hotplug();
> +
> if (walk_memory_range(start_pfn, end_pfn, NULL,
> - check_memblock_online))
> + check_memblock_online)) {
> + unlock_device_hotplug();
> return false;
> + }
>
> walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
> change_memblock_state);
> @@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
> if (offline_pages(start_pfn, nr_pages)) {
> walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
> change_memblock_state);
> + unlock_device_hotplug();
> return false;
> }
>
> walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
> change_memblock_state);
>
> - remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> + __remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
>
> + unlock_device_hotplug();
> return true;
> }
>
>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
> mem_hotplug_begin() and the calls device_online() - which takes
> device_lock() - .online does no longer call mem_hotplug_begin(), so
> effectively calls online_pages() without mem_hotplug_lock.
>
> 2. device_online() should be called under device_hotplug_lock, however
> onlining memory during add_memory() does not take care of that.
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
> without locks. This was introduced after 30467e0b3be. And skimming over
> the code, I assume it could need some more care in regards to locking
> (e.g. device_online() called without device_hotplug_lock - but I'll
> not touch that for now).
>
> Now that we hold the device_hotplug_lock when
> - adding memory (e.g. via add_memory()/add_memory_resource())
> - removing memory (e.g. via remove_memory())
> - device_online()/device_offline()
>
> We can move mem_hotplug_lock usage back into
> online_pages()/offline_pages().
>
> Why is mem_hotplug_lock still needed? Essentially to make
> get_online_mems()/put_online_mems() be very fast (relying on
> device_hotplug_lock would be very slow), and to serialize against
> addition of memory that does not create memory block devices (hmm).
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's document the magic a bit, especially why device_hotplug_lock is
> required when adding/removing memory and how it all play together with
> requests to online/offline memory from user space.
>
On 30.08.2018 21:35, Pasha Tatashin wrote:
>> +
>> +void __ref remove_memory(int nid, u64 start, u64 size)
>
> Remove __ref, otherwise looks good:
Indeed, will do.
Thanks for the review. Will resend in two weeks when I'm back from vacation.
Cheers!
>
> Reviewed-by: Pavel Tatashin <[email protected]>
>
>> +{
>> + lock_device_hotplug();
>> + __remove_memory(nid, start, size);
>> + unlock_device_hotplug();
>> +}
>> EXPORT_SYMBOL_GPL(remove_memory);
>> #endif /* CONFIG_MEMORY_HOTREMOVE */
--
Thanks,
David / dhildenb
On Tue, Aug 21, 2018 at 12:44:12PM +0200, David Hildenbrand wrote:
> This is the same approach as in the first RFC, but this time without
> exporting device_hotplug_lock (requested by Greg) and with some more
> details and documentation regarding locking. Tested only on x86 so far.
Hi David,
I would like to review this but I am on vacation, so I will not be able to get to it
soon.
I plan to do it once I am back.
Thanks
--
Oscar Salvador
SUSE L3
On 31.08.2018 22:54, Oscar Salvador wrote:
> On Tue, Aug 21, 2018 at 12:44:12PM +0200, David Hildenbrand wrote:
>> This is the same approach as in the first RFC, but this time without
>> exporting device_hotplug_lock (requested by Greg) and with some more
>> details and documentation regarding locking. Tested only on x86 so far.
>
> Hi David,
>
> I would like to review this but I am on vacation, so I will not be able to get to it
> soon.
> I plan to do it once I am back.
Sure, I won't be resending within next two weeks either way, as I am
also on vacation.
Have a nice vacation!
>
> Thanks
>
--
Thanks,
David / dhildenb
Am 03.09.18 um 02:36 schrieb Rashmica:
> Hi David,
>
>
> On 21/08/18 20:44, David Hildenbrand wrote:
>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>
> Do you mean "onlining of memory from user space first took a),
> followed by b)"?
Very right, thanks.
>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>> mem_hotplug_begin() and the calls device_online() - which takes
>> device_lock() - .online does no longer call mem_hotplug_begin(), so
>> effectively calls online_pages() without mem_hotplug_lock.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>> onlining memory during add_memory() does not take care of that.
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>> without locks. This was introduced after 30467e0b3be. And skimming over
>> the code, I assume it could need some more care in regards to locking
>> (e.g. device_online() called without device_hotplug_lock - but I'll
>> not touch that for now).
>
> Can you mention that you fixed this in later patches?
Sure!
>
>
> The series looks good to me. Feel free to add my reviewed-by:
>
> Reviewed-by: Rashmica Gupta <[email protected]>
>
Thanks, r-b only for this patch or all of the series?
--
Thanks,
David / dhildenb
On Mon, 2018-09-17 at 09:32 +0200, David Hildenbrand wrote:
> Am 03.09.18 um 02:36 schrieb Rashmica:
> > Hi David,
> >
> >
> > On 21/08/18 20:44, David Hildenbrand wrote:
> >
> > > There seem to be some problems as result of 30467e0b3be ("mm,
> > > hotplug:
> > > fix concurrent memory hot-add deadlock"), which tried to fix a
> > > possible
> > > lock inversion reported and discussed in [1] due to the two locks
> > > a) device_lock()
> > > b) mem_hotplug_lock
> > >
> > > While add_memory() first takes b), followed by a) during
> > > bus_probe_device(), onlining of memory from user space first took
> > > b),
> > > followed by a), exposing a possible deadlock.
> >
> > Do you mean "onlining of memory from user space first took a),
> > followed by b)"?
>
> Very right, thanks.
>
> >
> > > In [1], and it was decided to not make use of
> > > device_hotplug_lock, but
> > > rather to enforce a locking order.
> > >
> > > The problems I spotted related to this:
> > >
> > > 1. Memory block device attributes: While .state first calls
> > > mem_hotplug_begin() and the calls device_online() - which
> > > takes
> > > device_lock() - .online does no longer call
> > > mem_hotplug_begin(), so
> > > effectively calls online_pages() without mem_hotplug_lock.
> > >
> > > 2. device_online() should be called under device_hotplug_lock,
> > > however
> > > onlining memory during add_memory() does not take care of
> > > that.
> > >
> > > In addition, I think there is also something wrong about the
> > > locking in
> > >
> > > 3. arch/powerpc/platforms/powernv/memtrace.c calls
> > > offline_pages()
> > > without locks. This was introduced after 30467e0b3be. And
> > > skimming over
> > > the code, I assume it could need some more care in regards to
> > > locking
> > > (e.g. device_online() called without device_hotplug_lock - but
> > > I'll
> > > not touch that for now).
> >
> > Can you mention that you fixed this in later patches?
>
> Sure!
>
> >
> >
> > The series looks good to me. Feel free to add my reviewed-by:
> >
> > Reviewed-by: Rashmica Gupta <[email protected]>
> >
>
> Thanks, r-b only for this patch or all of the series?
Sorry, I somehow missed this. To all of the series.
>