2013-07-24 18:34:04

by Nathan Fontenot

Subject: [PATCH 0/8] Correct memory hot add/remove for powerpc

The current implementation of memory hot add and remove for powerpc is broken.
This patch set both corrects this issue and updates the memory hot add and
remove code for powerpc so that it can be done properly in the kernel.

The first two patches update the powerpc hot add and remove code to work with
all of the updates that have gone in to enable memory remove with sparse
vmemmap enabled. With these two patches applied the powerpc code works again,
but it still does part of the work outside the memory hotplug lock.

The remaining patches update the powerpc memory add and remove code so that
all of the work is done in the kernel while holding the memory hotplug lock.
The current powerpc implementation does some of the work in the kernel and
some in userspace. While this code did work at one time, it does part of the
work to add and remove memory without holding the memory hotplug lock. As a
result, memory could be added and removed fast enough to crash the system.

In order to do memory hot remove in the kernel, this patch set introduces
a sysfs release file (/sys/devices/system/memory/release) to which one can
write the physical address of the memory to be removed. Additionally
there is a new set of flags defined for the memory notification chain to
indicate that memory is being hot added or hot removed. This allows any work
that may need to be done prior to or after memory is hot added or removed
to be performed.
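
From userspace, driving such an interface amounts to writing a block-aligned
physical address to the sysfs file. A minimal sketch, assuming the address is
written as a hex string; the helper name write_phys_addr is hypothetical and
not part of this patch set:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a physical address, formatted as hex, to a
 * sysfs-style control file such as the probe or release file.
 * Returns 0 on success, -1 on failure.
 */
int write_phys_addr(const char *path, unsigned long long phys_addr)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	/* kstrtoull() with base 0 accepts the 0x prefix */
	ok = fprintf(f, "0x%llx\n", phys_addr) > 0;

	/* fclose() flushes; a rejected write surfaces here as an error */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass /sys/devices/system/memory/release as the path, after
offlining every memory block in the range.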

The last patches in the patch set update the powerpc code to properly do
memory hot add and remove in the kernel.

Nathan Fontenot
---
Documentation/memory-hotplug.txt | 26 ++++
arch/powerpc/mm/mem.c | 35 +++++-
arch/powerpc/platforms/pseries/hotplug-memory.c | 95 +---------------
drivers/base/memory.c | 81 ++++++++++++--
linux/Documentation/memory-hotplug.txt | 34 ++++-
linux/arch/powerpc/Kconfig | 2
linux/arch/powerpc/mm/init_64.c | 6 +
linux/arch/powerpc/mm/mem.c | 9 +
linux/arch/powerpc/platforms/pseries/dlpar.c | 103 ++++++++++++++++++
linux/arch/powerpc/platforms/pseries/hotplug-memory.c | 60 +---------
linux/arch/x86/Kconfig | 2
linux/drivers/base/memory.c | 20 +--
linux/include/linux/memory.h | 6 +
linux/mm/Kconfig | 2
linux/mm/memory_hotplug.c | 25 +++-
15 files changed, 322 insertions(+), 184 deletions(-)


2013-07-24 18:35:21

by Nathan Fontenot

Subject: [PATCH 1/8] register bootmem pages for powerpc when sparse vmemmap is not defined

Previous commit 46723bfa540... introduced a new config option
HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for powerpc
when sparse vmemmap is not defined.

This patch defines HAVE_BOOTMEM_INFO_NODE for powerpc and adds the call to
register_page_bootmem_info_node. Without this patch we get a BUG_ON for memory
hot remove in put_page_bootmem().

This also adds a stub for register_page_bootmem_memmap to allow powerpc to
build with sparse vmemmap defined.

Signed-off-by: Nathan Fontenot <[email protected]>
---

---
arch/powerpc/mm/init_64.c | 6 ++++++
arch/powerpc/mm/mem.c | 9 +++++++++
mm/Kconfig | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)

Index: linux/arch/powerpc/mm/init_64.c
===================================================================
--- linux.orig/arch/powerpc/mm/init_64.c
+++ linux/arch/powerpc/mm/init_64.c
@@ -300,5 +300,11 @@ void vmemmap_free(unsigned long start, u
{
}

+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+ WARN_ONCE(1, KERN_INFO
+ "Sparse Vmemmap not fully supported for bootmem info nodes\n");
+}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */

Index: linux/arch/powerpc/mm/mem.c
===================================================================
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -297,12 +297,21 @@ void __init paging_init(void)
}
#endif /* ! CONFIG_NEED_MULTIPLE_NODES */

+static void __init register_page_bootmem_info(void)
+{
+ int i;
+
+ for_each_online_node(i)
+ register_page_bootmem_info_node(NODE_DATA(i));
+}
+
void __init mem_init(void)
{
#ifdef CONFIG_SWIOTLB
swiotlb_init(0);
#endif

+ register_page_bootmem_info();
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
set_max_mapnr(max_pfn);
free_all_bootmem();
Index: linux/mm/Kconfig
===================================================================
--- linux.orig/mm/Kconfig
+++ linux/mm/Kconfig
@@ -183,7 +183,7 @@ config MEMORY_HOTPLUG_SPARSE
config MEMORY_HOTREMOVE
bool "Allow for memory hot remove"
select MEMORY_ISOLATION
- select HAVE_BOOTMEM_INFO_NODE if X86_64
+ select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
depends on MIGRATION


2013-07-24 18:36:43

by Nathan Fontenot

Subject: [PATCH 2/8] Mark powerpc memory resources as busy

Memory I/O resources need to be marked as busy or else we cannot remove
them when doing memory hot remove.

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/mm/mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/powerpc/mm/mem.c
===================================================================
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -523,7 +523,7 @@ static int add_system_ram_resources(void
res->name = "System RAM";
res->start = base;
res->end = base + size - 1;
- res->flags = IORESOURCE_MEM;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
WARN_ON(request_resource(&iomem_resource, res) < 0);
}
}

2013-07-24 18:37:57

by Nathan Fontenot

Subject: [PATCH 3/8] Add all memory via sysfs probe interface at once

When doing memory hot add via the 'probe' interface in sysfs we do not
need to loop through and add memory one section at a time. I think this
was originally done for powerpc, but is not needed. This patch removes
the loop and just calls add_memory for all of the memory to be added.
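
The alignment test and the return convention of the probe handler can be
modelled outside the kernel. This is a sketch under assumed values (4 KiB
pages, 16 MiB sections, one section per block); fake_add_memory is a
hypothetical stand-in for add_memory():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT		12		/* assumed 4 KiB pages */
#define PAGES_PER_SECTION	(1UL << 12)	/* assumed 16 MiB sections */

static unsigned long sections_per_block = 1;	/* assumed */

/* Hypothetical stand-in for add_memory(); records what was requested. */
static uint64_t last_start, last_size;

static int fake_add_memory(uint64_t start, uint64_t size)
{
	last_start = start;
	last_size = size;
	return 0;
}

/* Model of a sysfs store handler: negative errno on failure, else count. */
long probe_store_model(uint64_t phys_addr, size_t count)
{
	uint64_t block_size =
		(uint64_t)(PAGES_PER_SECTION * sections_per_block) << PAGE_SHIFT;
	int ret;

	/* the mask test below only works for a power-of-two block size */
	assert(block_size && !(block_size & (block_size - 1)));

	if (phys_addr & (block_size - 1))
		return -EINVAL;

	/* one call for the whole block instead of one call per section */
	ret = fake_add_memory(phys_addr, block_size);
	return ret ? ret : (long)count;
}
```

Returning count on success tells the writer that the whole buffer was
consumed, which is what a sysfs store handler is expected to report.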

Signed-off-by: Nathan Fontenot <[email protected]>
---
drivers/base/memory.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

Index: linux/drivers/base/memory.c
===================================================================
--- linux.orig/drivers/base/memory.c
+++ linux/drivers/base/memory.c
@@ -427,8 +427,8 @@ memory_probe_store(struct device *dev, s
const char *buf, size_t count)
{
u64 phys_addr;
- int nid;
- int i, ret;
+ int nid, ret;
+ unsigned long block_size;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;

phys_addr = simple_strtoull(buf, NULL, 0);
@@ -436,19 +436,11 @@ memory_probe_store(struct device *dev, s
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

- for (i = 0; i < sections_per_block; i++) {
- nid = memory_add_physaddr_to_nid(phys_addr);
- ret = add_memory(nid, phys_addr,
- PAGES_PER_SECTION << PAGE_SHIFT);
- if (ret)
- goto out;
+ block_size = get_memory_block_size();
+ nid = memory_add_physaddr_to_nid(phys_addr);
+ ret = add_memory(nid, phys_addr, block_size);

- phys_addr += MIN_MEMORY_BLOCK_SIZE;
- }
-
- ret = count;
-out:
- return ret;
+ return ret ? ret : count;
}

static DEVICE_ATTR(probe, S_IWUSR, NULL, memory_probe_store);

2013-07-24 18:39:41

by Nathan Fontenot

Subject: [PATCH 4/8] Create a sysfs release file for hot removing memory

Provide a sysfs interface to hot remove memory.

This patch extends the sysfs interface for hot add of memory with a
matching interface for hot remove. Both are controlled by the
ARCH_MEMORY_PROBE config option, currently used by x86 and powerpc, which
this patch renames to CONFIG_ARCH_MEMORY_PROBE_RELEASE to indicate that it
covers the probe and release sysfs interfaces.

Signed-off-by: Nathan Fontenot <[email protected]>
---
Documentation/memory-hotplug.txt | 34 ++++++++++++----
arch/powerpc/Kconfig | 2
arch/x86/Kconfig | 2
drivers/base/memory.c | 81 ++++++++++++++++++++++++++++++++++-----
4 files changed, 100 insertions(+), 19 deletions(-)

Index: linux/drivers/base/memory.c
===================================================================
--- linux.orig/drivers/base/memory.c
+++ linux/drivers/base/memory.c
@@ -129,22 +129,30 @@ static ssize_t show_mem_end_phys_index(s
return sprintf(buf, "%08lx\n", phys_index);
}

+static int is_memblock_removable(unsigned long start_section_nr)
+{
+ unsigned long pfn;
+ int i, ret = 1;
+
+ for (i = 0; i < sections_per_block; i++) {
+ pfn = section_nr_to_pfn(start_section_nr + i);
+ ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+ }
+
+ return ret;
+}
+
/*
* Show whether the section of memory is likely to be hot-removable
*/
static ssize_t show_mem_removable(struct device *dev,
struct device_attribute *attr, char *buf)
{
- unsigned long i, pfn;
- int ret = 1;
+ int ret;
struct memory_block *mem =
container_of(dev, struct memory_block, dev);

- for (i = 0; i < sections_per_block; i++) {
- pfn = section_nr_to_pfn(mem->start_section_nr + i);
- ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
- }
-
+ ret = is_memblock_removable(mem->start_section_nr);
return sprintf(buf, "%d\n", ret);
}

@@ -421,7 +429,7 @@ static DEVICE_ATTR(block_size_bytes, 044
* as well as ppc64 will do all of their discovery in userspace
* and will require this interface.
*/
-#ifdef CONFIG_ARCH_MEMORY_PROBE
+#ifdef CONFIG_ARCH_MEMORY_PROBE_RELEASE
static ssize_t
memory_probe_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
@@ -444,6 +452,60 @@ memory_probe_store(struct device *dev, s
}

static DEVICE_ATTR(probe, S_IWUSR, NULL, memory_probe_store);
+
+static int is_memblock_offline(struct memory_block *mem, void *arg)
+{
+ if (mem->state == MEM_ONLINE)
+ return 1;
+
+ return 0;
+}
+
+static ssize_t
+memory_release_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ u64 phys_addr;
+ int nid, ret = 0;
+ unsigned long block_size, pfn;
+ unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
+
+ lock_device_hotplug();
+
+ ret = kstrtoull(buf, 0, &phys_addr);
+ if (ret)
+ goto out;
+
+ if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ block_size = get_memory_block_size();
+ nid = memory_add_physaddr_to_nid(phys_addr);
+
+ /* Ensure memory is offline and removable before removing it. */
+ ret = walk_memory_range(PFN_DOWN(phys_addr),
+ PFN_UP(phys_addr + block_size - 1), NULL,
+ is_memblock_offline);
+ if (!ret) {
+ pfn = phys_addr >> PAGE_SHIFT;
+ ret = !is_memblock_removable(pfn_to_section_nr(pfn));
+ }
+
+ if (ret) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ remove_memory(nid, phys_addr, block_size);
+
+out:
+ unlock_device_hotplug();
+ return ret ? ret : count;
+}
+
+static DEVICE_ATTR(release, S_IWUSR, NULL, memory_release_store);
#endif

#ifdef CONFIG_MEMORY_FAILURE
@@ -694,8 +756,9 @@ bool is_memblock_offlined(struct memory_
}

static struct attribute *memory_root_attrs[] = {
-#ifdef CONFIG_ARCH_MEMORY_PROBE
+#ifdef CONFIG_ARCH_MEMORY_PROBE_RELEASE
&dev_attr_probe.attr,
+ &dev_attr_release.attr,
#endif

#ifdef CONFIG_MEMORY_FAILURE
Index: linux/arch/powerpc/Kconfig
===================================================================
--- linux.orig/arch/powerpc/Kconfig
+++ linux/arch/powerpc/Kconfig
@@ -438,7 +438,7 @@ config SYS_SUPPORTS_HUGETLBFS

source "mm/Kconfig"

-config ARCH_MEMORY_PROBE
+config ARCH_MEMORY_PROBE_RELEASE
def_bool y
depends on MEMORY_HOTPLUG

Index: linux/arch/x86/Kconfig
===================================================================
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -1343,7 +1343,7 @@ config ARCH_SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SPARSEMEM_ENABLE

-config ARCH_MEMORY_PROBE
+config ARCH_MEMORY_PROBE_RELEASE
def_bool y
depends on X86_64 && MEMORY_HOTPLUG

Index: linux/Documentation/memory-hotplug.txt
===================================================================
--- linux.orig/Documentation/memory-hotplug.txt
+++ linux/Documentation/memory-hotplug.txt
@@ -17,7 +17,9 @@ be changed often.
3. sysfs files for memory hotplug
4. Physical memory hot-add phase
4.1 Hardware(Firmware) Support
- 4.2 Notify memory hot-add event by hand
+ 4.2 Notify memory hot-add and hot-remove event by hand
+ 4.2.1 Probe interface
+ 4.2.2 Release interface
5. Logical Memory hot-add phase
5.1. State of memory
5.2. How to online memory
@@ -69,7 +71,7 @@ management tables, and makes sysfs files

If firmware supports notification of connection of new memory to OS,
this phase is triggered automatically. ACPI can notify this event. If not,
-"probe" operation by system administration is used instead.
+"probe" and "release" operations by system administration are used instead.
(see Section 4.).

Logical Memory Hotplug phase is to change memory state into
@@ -208,20 +210,23 @@ calls hotplug code for all of objects wh
If memory device is found, memory hotplug code will be called.


-4.2 Notify memory hot-add event by hand
+4.2 Notify memory hot-add and hot-remove event by hand
------------
In some environments, especially virtualized environment, firmware will not
notify memory hotplug event to the kernel. For such environment, "probe"
-interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
+and "release" interfaces are supported. These interfaces depend on
+CONFIG_ARCH_MEMORY_PROBE_RELEASE.

-Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
-contain highly architecture codes. Please add config if you need "probe"
-interface.
+Now, CONFIG_ARCH_MEMORY_PROBE_RELEASE is supported by x86 and powerpc, and it
+contains no highly architecture-specific code. Please add the config if you
+need the "probe" and "release" interfaces.

+4.2.1 "probe" interface
+------------
Probe interface is located at
/sys/devices/system/memory/probe

-You can tell the physical address of new memory to the kernel by
+You can tell the kernel the physical address of new memory to hot-add by

% echo start_address_of_new_memory > /sys/devices/system/memory/probe

@@ -230,6 +235,19 @@ memory range is hot-added. In this case,
current implementation). You'll have to online memory by yourself.
Please see "How to online memory" in this text.

+4.2.2 "release" interface
+------------
+Release interface is located at
+/sys/devices/system/memory/release
+
+You can tell the kernel the physical address of memory to hot-remove by
+
+% echo start_address_of_memory > /sys/devices/system/memory/release
+
+Then, [start_address_of_memory, start_address_of_memory + section_size)
+memory range is hot-removed. You will need to ensure all of the memory in
+this range has been offlined prior to using this interface; please see
+"How to offline memory" in this text.


------------------------------

2013-07-24 18:41:24

by Nathan Fontenot

Subject: [PATCH 5/8] Add notifiers for memory hot add/remove

In order to allow architectures or other subsystems to do any needed
work prior to hot adding or hot removing memory the memory notifier
chain should be updated to provide notifications of these events.

This patch adds the notifications for memory hot add and hot remove.
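
The intended event flow for hot add can be sketched as a userspace model of
a notifier chain. The flag values are the ones this patch defines;
chain_register, chain_notify, hot_add_model and the callback type are
hypothetical simplifications, not the kernel's notifier API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MEM_BEING_HOT_ADDED	(1 << 9)
#define MEM_HOT_ADDED		(1 << 10)
#define MEM_CANCEL_HOT_ADD	(1 << 11)

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical callback type: returns 0 or a negative errno. */
typedef int (*mem_cb)(unsigned long action, struct memory_notify *arg);

#define MAX_CALLBACKS 4
static mem_cb chain[MAX_CALLBACKS];
static size_t chain_len;

int chain_register(mem_cb cb)
{
	if (chain_len == MAX_CALLBACKS)
		return -ENOSPC;
	chain[chain_len++] = cb;
	return 0;
}

/* Call every callback in order; stop at the first error. */
static int chain_notify(unsigned long action, struct memory_notify *arg)
{
	for (size_t i = 0; i < chain_len; i++) {
		int ret = chain[i](action, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* do_add is a stand-in for the real hot-add work. */
int hot_add_model(struct memory_notify *arg, int (*do_add)(void))
{
	int ret;

	assert(arg != NULL && do_add != NULL);

	ret = chain_notify(MEM_BEING_HOT_ADDED, arg);
	if (!ret)
		ret = do_add();
	if (ret) {
		/* listeners undo whatever they prepared */
		chain_notify(MEM_CANCEL_HOT_ADD, arg);
		return ret;
	}

	chain_notify(MEM_HOT_ADDED, arg);
	return 0;
}

/* Example listener: records which events it saw, as a bitmask. */
unsigned long seen;

int record_cb(unsigned long action, struct memory_notify *arg)
{
	(void)arg;
	seen |= action;
	return 0;
}

int add_ok(void)   { return 0; }
int add_fail(void) { return -ENOMEM; }
```

In this model a listener always sees either the "added" or the "cancel"
event after a "being added" event, so it can pair setup with teardown.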

Signed-off-by: Nathan Fontenot <[email protected]>
---
Documentation/memory-hotplug.txt | 26 +++++++++++++++++++++++---
include/linux/memory.h | 6 ++++++
mm/memory_hotplug.c | 25 ++++++++++++++++++++++---
3 files changed, 51 insertions(+), 6 deletions(-)

Index: linux/include/linux/memory.h
===================================================================
--- linux.orig/include/linux/memory.h
+++ linux/include/linux/memory.h
@@ -50,6 +50,12 @@ int arch_get_memory_phys_device(unsigned
#define MEM_GOING_ONLINE (1<<3)
#define MEM_CANCEL_ONLINE (1<<4)
#define MEM_CANCEL_OFFLINE (1<<5)
+#define MEM_BEING_HOT_REMOVED (1<<6)
+#define MEM_HOT_REMOVED (1<<7)
+#define MEM_CANCEL_HOT_REMOVE (1<<8)
+#define MEM_BEING_HOT_ADDED (1<<9)
+#define MEM_HOT_ADDED (1<<10)
+#define MEM_CANCEL_HOT_ADD (1<<11)

struct memory_notify {
unsigned long start_pfn;
Index: linux/mm/memory_hotplug.c
===================================================================
--- linux.orig/mm/memory_hotplug.c
+++ linux/mm/memory_hotplug.c
@@ -1073,17 +1073,25 @@ out:
int __ref add_memory(int nid, u64 start, u64 size)
{
pg_data_t *pgdat = NULL;
- bool new_pgdat;
+ bool new_pgdat = false;
bool new_node;
- struct resource *res;
+ struct resource *res = NULL;
+ struct memory_notify arg;
int ret;

lock_memory_hotplug();

+ arg.start_pfn = start >> PAGE_SHIFT;
+ arg.nr_pages = size / PAGE_SIZE;
+ ret = memory_notify(MEM_BEING_HOT_ADDED, &arg);
+ ret = notifier_to_errno(ret);
+ if (ret)
+ goto error;
+
res = register_memory_resource(start, size);
ret = -EEXIST;
if (!res)
- goto out;
+ goto error;

{ /* Stupid hack to suppress address-never-null warning */
void *p = NODE_DATA(nid);
@@ -1119,9 +1127,12 @@ int __ref add_memory(int nid, u64 start,
/* create new memmap entry */
firmware_map_add_hotplug(start, start + size, "System RAM");

+ memory_notify(MEM_HOT_ADDED, &arg);
goto out;

error:
+ memory_notify(MEM_CANCEL_HOT_ADD, &arg);
+
/* rollback pgdat allocation and others */
if (new_pgdat)
rollback_node_hotadd(nid, pgdat);
@@ -1784,10 +1795,15 @@ EXPORT_SYMBOL(try_offline_node);

void __ref remove_memory(int nid, u64 start, u64 size)
{
+ struct memory_notify arg;
int ret;

lock_memory_hotplug();

+ arg.start_pfn = start >> PAGE_SHIFT;
+ arg.nr_pages = size / PAGE_SIZE;
+ memory_notify(MEM_BEING_HOT_REMOVED, &arg);
+
/*
* All memory blocks must be offlined before removing memory. Check
* whether all memory blocks in question are offline and trigger a BUG()
@@ -1796,6 +1812,7 @@ void __ref remove_memory(int nid, u64 st
ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
is_memblock_offlined_cb);
if (ret) {
+ memory_notify(MEM_CANCEL_HOT_REMOVE, &arg);
unlock_memory_hotplug();
BUG();
}
@@ -1807,6 +1824,8 @@ void __ref remove_memory(int nid, u64 st

try_offline_node(nid);

+ memory_notify(MEM_HOT_REMOVED, &arg);
+
unlock_memory_hotplug();
}
EXPORT_SYMBOL_GPL(remove_memory);
Index: linux/Documentation/memory-hotplug.txt
===================================================================
--- linux.orig/Documentation/memory-hotplug.txt
+++ linux/Documentation/memory-hotplug.txt
@@ -371,7 +371,9 @@ Need more implementation yet....
--------------------------------
8. Memory hotplug event notifier
--------------------------------
-Memory hotplug has event notifier. There are 6 types of notification.
+Memory hotplug has event notifier. There are 12 types of notification: the
+first six relate to memory online/offline and the second six relate to memory
+hot add/remove.

MEMORY_GOING_ONLINE
Generated before new memory becomes available in order to be able to
@@ -398,6 +400,24 @@ MEMORY_CANCEL_OFFLINE
MEMORY_OFFLINE
Generated after offlining memory is complete.

+MEMORY_BEING_HOT_REMOVED
+ Generated prior to the process of hot removing memory.
+
+MEMORY_CANCEL_HOT_REMOVE
+ Generated if MEMORY_BEING_HOT_REMOVED fails.
+
+MEMORY_HOT_REMOVED
+ Generated when memory has been successfully hot removed.
+
+MEMORY_BEING_HOT_ADDED
+ Generated prior to the process of hot adding memory.
+
+MEMORY_CANCEL_HOT_ADD
+ Generated if MEMORY_BEING_HOT_ADDED fails.
+
+MEMORY_HOT_ADDED
+ Generated when memory has successfully been hot added.
+
A callback routine can be registered by
hotplug_memory_notifier(callback_func, priority)

@@ -412,8 +432,8 @@ struct memory_notify {
int status_change_nid;
}

-start_pfn is start_pfn of online/offline memory.
-nr_pages is # of pages of online/offline memory.
+start_pfn is start_pfn of online/offline/add/remove memory.
+nr_pages is # of pages of online/offline/add/remove memory.
status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask

2013-07-24 18:44:33

by Nathan Fontenot

Subject: [PATCH 6/8] Update the powerpc arch specific memory add/remove handlers

In order to properly hot add and remove memory for powerpc the arch
specific callouts need to now complete all of the required work to
fully add or remove the memory.

With this update we can also remove the handler for memory node add
because the powerpc arch specific memory add handler will do all the
work needed. We do still need the memory node remove handler: on systems
with memory specified in memory@XXX nodes in the device tree, the removal
of the node is what triggers memory hot remove.

For systems on newer firmware with memory specified in the
ibm,dynamic-reconfiguration-memory node of the device tree this is not an
issue.

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/mm/mem.c | 33 +++++++++++++++++++---
arch/powerpc/platforms/pseries/hotplug-memory.c | 35 ------------------------
2 files changed, 29 insertions(+), 39 deletions(-)

Index: linux/arch/powerpc/mm/mem.c
===================================================================
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -35,6 +35,7 @@
#include <linux/memblock.h>
#include <linux/hugetlb.h>
#include <linux/slab.h>
+#include <linux/vmalloc.h>

#include <asm/pgalloc.h>
#include <asm/prom.h>
@@ -120,17 +121,24 @@ int arch_add_memory(int nid, u64 start,
struct zone *zone;
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+ u64 va_start;
+ int ret;

pgdata = NODE_DATA(nid);

- start = (unsigned long)__va(start);
- if (create_section_mapping(start, start + size))
+ va_start = (unsigned long)__va(start);
+ if (create_section_mapping(va_start, va_start + size))
return -EINVAL;

/* this should work for most non-highmem platforms */
zone = pgdata->node_zones;

- return __add_pages(nid, zone, start_pfn, nr_pages);
+ ret = __add_pages(nid, zone, start_pfn, nr_pages);
+ if (ret)
+ return ret;
+
+ ret = memblock_add(start, size);
+ return ret;
}

#ifdef CONFIG_MEMORY_HOTREMOVE
@@ -138,10 +146,27 @@ int arch_remove_memory(u64 start, u64 si
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+ unsigned long va_addr;
struct zone *zone;
+ int ret;

zone = page_zone(pfn_to_page(start_pfn));
- return __remove_pages(zone, start_pfn, nr_pages);
+ ret = __remove_pages(zone, start_pfn, nr_pages);
+ if (ret)
+ return ret;
+
+ memblock_remove(start, size);
+
+ /* remove htab bolted mappings */
+ va_addr = (unsigned long)__va(start);
+ ret = remove_section_mapping(va_addr, va_addr + size);
+
+ /* Ensure all vmalloc mappings are flushed in case they also
+ * hit that section of memory.
+ */
+ vm_unmap_aliases();
+
+ return ret;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux.orig/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ linux/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -166,38 +166,6 @@ static inline int pseries_remove_memory(
}
#endif /* CONFIG_MEMORY_HOTREMOVE */

-static int pseries_add_memory(struct device_node *np)
-{
- const char *type;
- const unsigned int *regs;
- unsigned long base;
- unsigned int lmb_size;
- int ret = -EINVAL;
-
- /*
- * Check to see if we are actually adding memory
- */
- type = of_get_property(np, "device_type", NULL);
- if (type == NULL || strcmp(type, "memory") != 0)
- return 0;
-
- /*
- * Find the base and size of the memblock
- */
- regs = of_get_property(np, "reg", NULL);
- if (!regs)
- return ret;
-
- base = *(unsigned long *)regs;
- lmb_size = regs[3];
-
- /*
- * Update memory region to represent the memory add
- */
- ret = memblock_add(base, lmb_size);
- return (ret < 0) ? -EINVAL : 0;
-}
-
static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
{
struct of_drconf_cell *new_drmem, *old_drmem;
@@ -251,9 +219,6 @@ static int pseries_memory_notifier(struc
int err = 0;

switch (action) {
- case OF_RECONFIG_ATTACH_NODE:
- err = pseries_add_memory(node);
- break;
case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_memory(node);
break;

2013-07-24 18:45:57

by Nathan Fontenot

Subject: [PATCH 7/8] Add memory hot add/remove notifier handlers for powerpc

Add memory hot add/remove notifier handlers for powerpc/pseries.

This patch allows the powerpc/pseries platforms to perform memory DLPAR
in the kernel. The handlers for add and remove do the work of acquiring
memory from and releasing it to firmware, and of updating the device tree.

This is only used when memory is specified in the
ibm,dynamic-reconfiguration-memory device tree node so the memory notifiers
are registered contingent on its existence.
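
The lookup and flag handling these handlers rely on can be sketched as a
userspace model. The struct layout and the DRCONF_MEM_ASSIGNED value are
assumed to mirror the powerpc definitions; find_cell and mark_released are
hypothetical names. Note the parentheses around the flag test: writing
!flags & MASK would negate first and mask second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRCONF_MEM_ASSIGNED	0x00000008	/* assumed value */

/* Assumed layout of one ibm,dynamic-memory entry. */
struct of_drconf_cell {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t reserved;
	uint32_t aa_index;
	uint32_t flags;
};

/* Return the entry whose base address matches, or NULL. */
struct of_drconf_cell *find_cell(struct of_drconf_cell *cells, size_t n,
				 uint64_t phys_addr)
{
	assert(cells != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cells[i].base_addr == phys_addr)
			return &cells[i];
	}
	return NULL;
}

/*
 * Clear the assigned flag. Returns 1 if the entry changed, 0 if it was
 * already unassigned (nothing to release).
 */
int mark_released(struct of_drconf_cell *cell)
{
	if (!(cell->flags & DRCONF_MEM_ASSIGNED))
		return 0;

	cell->flags &= ~(uint32_t)DRCONF_MEM_ASSIGNED;
	return 1;
}
```

A release handler built this way would only call into firmware when
mark_released() reports a change.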

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 103 +++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)

Index: linux/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- linux.orig/arch/powerpc/platforms/pseries/dlpar.c
+++ linux/arch/powerpc/platforms/pseries/dlpar.c
@@ -15,6 +15,7 @@
#include <linux/notifier.h>
#include <linux/spinlock.h>
#include <linux/cpu.h>
+#include <linux/memory.h>
#include <linux/slab.h>
#include <linux/of.h>
#include "offline_states.h"
@@ -531,11 +532,113 @@ out:
return rc ? rc : count;
}

+static struct of_drconf_cell *dlpar_get_drconf_cell(struct device_node *dn,
+ unsigned long phys_addr)
+{
+ struct of_drconf_cell *drmem;
+ u32 entries;
+ u32 *prop;
+ int i;
+
+ prop = (u32 *)of_get_property(dn, "ibm,dynamic-memory", NULL);
+ /* the caller holds and drops the reference on dn */
+ if (!prop)
+ return NULL;
+
+ entries = *prop++;
+ drmem = (struct of_drconf_cell *)prop;
+
+ for (i = 0; i < entries; i++) {
+ if (drmem[i].base_addr == phys_addr)
+ return &drmem[i];
+ }
+
+ return NULL;
+}
+
+static int dlpar_mem_probe(unsigned long phys_addr)
+{
+ struct device_node *dn;
+ struct of_drconf_cell *drmem;
+ int rc;
+
+ dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+ if (!dn)
+ return -EINVAL;
+
+ drmem = dlpar_get_drconf_cell(dn, phys_addr);
+ of_node_put(dn);
+
+ if (!drmem)
+ return -EINVAL;
+
+ if (drmem->flags & DRCONF_MEM_ASSIGNED)
+ return 0;
+
+ drmem->flags |= DRCONF_MEM_ASSIGNED;
+
+ rc = dlpar_acquire_drc(drmem->drc_index);
+ return rc;
+}
+
+static int dlpar_mem_release(unsigned long phys_addr)
+{
+ struct device_node *dn;
+ struct of_drconf_cell *drmem;
+ int rc;
+
+ dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+ if (!dn)
+ return -EINVAL;
+
+ drmem = dlpar_get_drconf_cell(dn, phys_addr);
+ of_node_put(dn);
+
+ if (!drmem)
+ return -EINVAL;
+
+ if (!(drmem->flags & DRCONF_MEM_ASSIGNED))
+ return 0;
+
+ drmem->flags &= ~DRCONF_MEM_ASSIGNED;
+
+ rc = dlpar_release_drc(drmem->drc_index);
+ return rc;
+}
+
+static int pseries_dlpar_mem_callback(struct notifier_block *nb,
+ unsigned long action, void *hp_arg)
+{
+ struct memory_notify *arg = hp_arg;
+ unsigned long phys_addr = arg->start_pfn << PAGE_SHIFT;
+ int rc = 0;
+
+
+ switch (action) {
+ case MEM_BEING_HOT_ADDED:
+ rc = dlpar_mem_probe(phys_addr);
+ break;
+ case MEM_HOT_REMOVED:
+ rc = dlpar_mem_release(phys_addr);
+ break;
+ }
+
+ return notifier_from_errno(rc);
+}
+
static int __init pseries_dlpar_init(void)
{
+ struct device_node *dn;
+
ppc_md.cpu_probe = dlpar_cpu_probe;
ppc_md.cpu_release = dlpar_cpu_release;

+ dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+ if (dn) {
+ hotplug_memory_notifier(pseries_dlpar_mem_callback, 0);
+ of_node_put(dn);
+ }
+
return 0;
}
machine_device_initcall(pseries, pseries_dlpar_init);

2013-07-24 18:47:19

by Nathan Fontenot

Subject: [PATCH 8/8] Remove no longer needed powerpc memory node update handler

Remove the update_node handler for powerpc/pseries.

Now that we can do memory DLPAR in the kernel we no longer need the OF
update node notifier to update the ibm,dynamic-memory property of the
ibm,dynamic-reconfiguration-memory node. This work is now handled by
the memory notification handlers for powerpc/pseries.

This patch also registers the handler for OF node remove only if we are
not using the ibm,dynamic-reconfiguration-memory device tree layout. That
handler is only needed for handling memory@XXX nodes in the device tree.

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/hotplug-memory.c | 60 +++---------------------
1 file changed, 8 insertions(+), 52 deletions(-)

Index: linux/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux.orig/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ linux/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -166,67 +166,15 @@ static inline int pseries_remove_memory(
}
#endif /* CONFIG_MEMORY_HOTREMOVE */

-static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
-{
- struct of_drconf_cell *new_drmem, *old_drmem;
- unsigned long memblock_size;
- u32 entries;
- u32 *p;
- int i, rc = -EINVAL;
-
- memblock_size = get_memblock_size();
- if (!memblock_size)
- return -EINVAL;
-
- p = (u32 *)of_get_property(pr->dn, "ibm,dynamic-memory", NULL);
- if (!p)
- return -EINVAL;
-
- /* The first int of the property is the number of lmb's described
- * by the property. This is followed by an array of of_drconf_cell
- * entries. Get the niumber of entries and skip to the array of
- * of_drconf_cell's.
- */
- entries = *p++;
- old_drmem = (struct of_drconf_cell *)p;
-
- p = (u32 *)pr->prop->value;
- p++;
- new_drmem = (struct of_drconf_cell *)p;
-
- for (i = 0; i < entries; i++) {
- if ((old_drmem[i].flags & DRCONF_MEM_ASSIGNED) &&
- (!(new_drmem[i].flags & DRCONF_MEM_ASSIGNED))) {
- rc = pseries_remove_memblock(old_drmem[i].base_addr,
- memblock_size);
- break;
- } else if ((!(old_drmem[i].flags & DRCONF_MEM_ASSIGNED)) &&
- (new_drmem[i].flags & DRCONF_MEM_ASSIGNED)) {
- rc = memblock_add(old_drmem[i].base_addr,
- memblock_size);
- rc = (rc < 0) ? -EINVAL : 0;
- break;
- }
- }
-
- return rc;
-}
-
static int pseries_memory_notifier(struct notifier_block *nb,
unsigned long action, void *node)
{
- struct of_prop_reconfig *pr;
int err = 0;

switch (action) {
case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_memory(node);
break;
- case OF_RECONFIG_UPDATE_PROPERTY:
- pr = (struct of_prop_reconfig *)node;
- if (!strcmp(pr->prop->name, "ibm,dynamic-memory"))
- err = pseries_update_drconf_memory(pr);
- break;
}
return notifier_from_errno(err);
}
@@ -237,6 +185,14 @@ static struct notifier_block pseries_mem

static int __init pseries_memory_hotplug_init(void)
{
+ struct device_node *dn;
+
+ dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+ if (dn) {
+ of_node_put(dn);
+ return 0;
+ }
+
if (firmware_has_feature(FW_FEATURE_LPAR))
of_reconfig_notifier_register(&pseries_mem_nb);


2013-08-02 02:27:11

by Michael Ellerman

Subject: Re: [PATCH 1/8] register bootmem pages for powerpc when sparse vmemmap is not defined

On Wed, Jul 24, 2013 at 01:35:11PM -0500, Nathan Fontenot wrote:
> Previous commit 46723bfa540... introduced a new config option
> HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for powerpc
> when sparse vmemmap is not defined.

So that's a bug fix that should go into 3.10 stable?

cheers

2013-08-02 02:28:33

by Michael Ellerman

Subject: Re: [PATCH 2/8] Mark powerpc memory resources as busy

On Wed, Jul 24, 2013 at 01:36:34PM -0500, Nathan Fontenot wrote:
> Memory I/O resources need to be marked as busy or else we cannot remove
> them when doing memory hot remove.

I would have thought it was the opposite?

cheers

2013-08-02 02:33:05

by Michael Ellerman

Subject: Re: [PATCH 3/8] Add all memory via sysfs probe interface at once

On Wed, Jul 24, 2013 at 01:37:47PM -0500, Nathan Fontenot wrote:
> When doing memory hot add via the 'probe' interface in sysfs we do not
> need to loop through and add memory one section at a time. I think this
> was originally done for powerpc, but is not needed. This patch removes
> the loop and just calls add_memory for all of the memory to be added.

Looks like memory hot add is supported on ia64, x86, sh, powerpc and
s390. Have you tested on any?

cheers
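
The change described in the patch text above (one add_memory call for the
whole range instead of a per-section loop) can be sketched with a small
user-space C model. This is only an illustration, not the kernel code:
model_add_memory(), struct model_log, MODEL_SECTION_SIZE and both probe
helpers are hypothetical names, and the 16 MiB section size is an assumed
value chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed section size for this model only (16 MiB). */
#define MODEL_SECTION_SIZE (16ULL << 20)

/* Records what the stand-in add function was asked to do. */
struct model_log {
	unsigned long long calls;  /* number of add calls made */
	unsigned long long bytes;  /* total bytes added */
	unsigned long long next;   /* address just past the last added byte */
};

/* Hypothetical stand-in for add_memory(): only records the request. */
static int model_add_memory(struct model_log *log,
			    unsigned long long start, unsigned long long size)
{
	/* In this model, successive ranges must be contiguous. */
	if (log->calls != 0 && start != log->next)
		return -1;
	log->calls++;
	log->bytes += size;
	log->next = start + size;
	return 0;
}

/* Old style: walk the range and add one section per call. */
static int model_probe_per_section(struct model_log *log,
				   unsigned long long start, size_t nr_sections)
{
	size_t i;

	for (i = 0; i < nr_sections; i++) {
		int rc = model_add_memory(log, start + i * MODEL_SECTION_SIZE,
					  MODEL_SECTION_SIZE);
		if (rc)
			return rc;
	}
	return 0;
}

/* New style: a single call covering the whole range. */
static int model_probe_at_once(struct model_log *log,
			       unsigned long long start, size_t nr_sections)
{
	return model_add_memory(log, start, nr_sections * MODEL_SECTION_SIZE);
}

/* Both styles cover the same range; only the number of calls differs. */
static void model_probe_demo(void)
{
	struct model_log a = {0}, b = {0};

	assert(model_probe_per_section(&a, 0x40000000ULL, 4) == 0);
	assert(model_probe_at_once(&b, 0x40000000ULL, 4) == 0);
	assert(a.bytes == b.bytes && a.next == b.next);
}
```

In the model the two helpers end up covering an identical range. Whether
every architecture's real add path accepts a multi-section range in one
call is exactly the testing question raised in this reply.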

2013-08-02 19:05:11

by Nathan Fontenot

Subject: Re: [PATCH 1/8] register bootmem pages for powerpc when sparse vmemmap is not defined

On 08/01/2013 09:27 PM, Michael Ellerman wrote:
> On Wed, Jul 24, 2013 at 01:35:11PM -0500, Nathan Fontenot wrote:
>> Previous commit 46723bfa540... introduced a new config option
>> HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for powerpc
>> when sparse vmemmap is not defined.
>
> So that's a bug fix that should go into 3.10 stable?
>

Yes, I believe this one as well as patch 2/8 should go into 3.10 stable.

I'll re-send with linux stable added.

-Nathan

2013-08-02 19:06:08

by Nathan Fontenot

Subject: Re: [PATCH 2/8] Mark powerpc memory resources as busy

On 08/01/2013 09:28 PM, Michael Ellerman wrote:
> On Wed, Jul 24, 2013 at 01:36:34PM -0500, Nathan Fontenot wrote:
>> Memory I/O resources need to be marked as busy or else we cannot remove
>> them when doing memory hot remove.
>
> I would have thought it was the opposite?

Me too.

As it turns out the code in kernel/resource.c checks to make sure the
IORESOURCE_BUSY flag is set when trying to release a resource.

-Nathan
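
The behaviour described above, where a region can only be released if it
is flagged busy, can be modelled in a few lines of user-space C. This is a
sketch under stated assumptions, not the code from kernel/resource.c:
struct model_resource, model_release_region() and MODEL_RES_BUSY are
hypothetical names, and the flag value is a placeholder rather than the
kernel's IORESOURCE_BUSY definition.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the busy flag; the value is illustrative only. */
#define MODEL_RES_BUSY 0x1UL

struct model_resource {
	unsigned long start;
	unsigned long end;     /* inclusive end address */
	unsigned long flags;
	int in_use;            /* 1 while the region is registered */
};

/*
 * Hypothetical release helper. A region is released only if it matches
 * the requested range AND carries the busy flag. A matching region
 * without the flag is skipped, so the caller sees a failure.
 */
static int model_release_region(struct model_resource *tbl, size_t n,
				unsigned long start, unsigned long size)
{
	unsigned long end = start + size - 1;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_resource *res = &tbl[i];

		if (!res->in_use || res->start != start || res->end != end)
			continue;
		if (!(res->flags & MODEL_RES_BUSY))
			continue;  /* not busy: never considered for release */
		res->in_use = 0;
		return 0;
	}
	return -1;  /* no matching busy region was found */
}

/* A region without the flag cannot be released; a busy one can. */
static void model_release_demo(void)
{
	struct model_resource tbl[] = {
		{ 0x1000, 0x1fff, 0,              1 },
		{ 0x2000, 0x2fff, MODEL_RES_BUSY, 1 },
	};

	assert(model_release_region(tbl, 2, 0x1000, 0x1000) == -1);
	assert(model_release_region(tbl, 2, 0x2000, 0x1000) == 0);
}
```

In this model, a memory resource registered without the busy flag stays
registered forever, which mirrors why the patch marks the powerpc memory
resources as busy before hot remove can work.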

2013-08-02 19:13:19

by Nathan Fontenot

Subject: Re: [PATCH 3/8] Add all memory via sysfs probe interface at once

On 08/01/2013 09:32 PM, Michael Ellerman wrote:
> On Wed, Jul 24, 2013 at 01:37:47PM -0500, Nathan Fontenot wrote:
>> When doing memory hot add via the 'probe' interface in sysfs we do not
>> need to loop through and add memory one section at a time. I think this
>> was originally done for powerpc, but is not needed. This patch removes
>> the loop and just calls add_memory for all of the memory to be added.
>
> Looks like memory hot add is supported on ia64, x86, sh, powerpc and
> s390. Have you tested on any?

I have tested on powerpc. I would love to say I tested on the other
platforms... but I haven't. I should be able to get an x86 box to test
on but the other architectures may not be possible.

-Nathan

2013-08-05 03:11:16

by Michael Ellerman

Subject: Re: [PATCH 2/8] Mark powerpc memory resources as busy

On Fri, Aug 02, 2013 at 02:05:57PM -0500, Nathan Fontenot wrote:
> On 08/01/2013 09:28 PM, Michael Ellerman wrote:
> > On Wed, Jul 24, 2013 at 01:36:34PM -0500, Nathan Fontenot wrote:
> >> Memory I/O resources need to be marked as busy or else we cannot remove
> >> them when doing memory hot remove.
> >
> > I would have thought it was the opposite?
>
> Me too.
>
> As it turns out the code in kernel/resource.c checks to make sure the
> IORESOURCE_BUSY flag is set when trying to release a resource.

OK, I guess there's probably some sane reason, but it does seem
backward.

cheers

2013-08-05 03:13:29

by Michael Ellerman

Subject: Re: [PATCH 3/8] Add all memory via sysfs probe interface at once

On Fri, Aug 02, 2013 at 02:13:06PM -0500, Nathan Fontenot wrote:
> On 08/01/2013 09:32 PM, Michael Ellerman wrote:
> > On Wed, Jul 24, 2013 at 01:37:47PM -0500, Nathan Fontenot wrote:
> >> When doing memory hot add via the 'probe' interface in sysfs we do not
> >> need to loop through and add memory one section at a time. I think this
> >> was originally done for powerpc, but is not needed. This patch removes
> >> the loop and just calls add_memory for all of the memory to be added.
> >
> > Looks like memory hot add is supported on ia64, x86, sh, powerpc and
> > s390. Have you tested on any?
>
> I have tested on powerpc. I would love to say I tested on the other
> platforms... but I haven't. I should be able to get an x86 box to test
> on but the other architectures may not be possible.

Is the rest of your series dependent on this patch? Or is it sort of
incidental?

If possible it might be worth pulling this one out and sticking it in
linux-next for a cycle to give people a chance to test it. Unless
someone who knows the code well is comfortable with it.

cheers

2013-08-06 20:45:13

by Nathan Fontenot

Subject: Re: [PATCH 3/8] Add all memory via sysfs probe interface at once

On 08/04/2013 10:13 PM, Michael Ellerman wrote:
> On Fri, Aug 02, 2013 at 02:13:06PM -0500, Nathan Fontenot wrote:
>> On 08/01/2013 09:32 PM, Michael Ellerman wrote:
>>> On Wed, Jul 24, 2013 at 01:37:47PM -0500, Nathan Fontenot wrote:
>>>> When doing memory hot add via the 'probe' interface in sysfs we do not
>>>> need to loop through and add memory one section at a time. I think this
>>>> was originally done for powerpc, but is not needed. This patch removes
>>>> the loop and just calls add_memory for all of the memory to be added.
>>>
>>> Looks like memory hot add is supported on ia64, x86, sh, powerpc and
>>> s390. Have you tested on any?
>>
>> I have tested on powerpc. I would love to say I tested on the other
>> platforms... but I haven't. I should be able to get an x86 box to test
>> on but the other architectures may not be possible.
>
> Is the rest of your series dependent on this patch? Or is it sort of
> incidental?
>
> If possible it might be worth pulling this one out and sticking it in
> linux-next for a cycle to give people a chance to test it. Unless
> someone who knows the code well is comfortable with it.
>

I am planning on pulling the first two patches and sending them out
separately from the patch set, since they are really independent of the
rest of the patch series.

The remaining code I will send out for review and inclusion in
linux-next so it can have the proper test time as you mentioned.

-Nathan

2013-08-09 07:16:38

by Benjamin Herrenschmidt

Subject: Re: [PATCH 3/8] Add all memory via sysfs probe interface at once

On Tue, 2013-08-06 at 15:44 -0500, Nathan Fontenot wrote:
> I am planning on pulling the first two patches and sending them out
> separate from the patch set since they are really independent of the
> rest of the patch series.
>
> The remaining code I will send out for review and inclusion in
> linux-next so it can have the proper test time as you mentioned.

Ping ? :-)

Cheers,
Ben.