2009-11-25 07:04:53

by Nathan Fontenot

[permalink] [raw]
Subject: [PATCH v2 0/3] Kernel handling of Dynamic Logical Partitioning

version 2 of the patch set with updates from comments.

The Dynamic Logical Partitioning (DLPAR) capabilities of the powerpc pseries
platform allows for the addition and removal of resources (i.e. cpus,
memory, pci devices) from a partition. The removal of a resource involves
removing the resource's node from the device tree and then returning the
resource to firmware via the rtas set-indicator call. To add a resource, it
is first obtained from firmware via the rtas set-indicator call and then a
new device tree node is created using the ibm,configure-coinnector rtas call
and added to the device tree.

The following set of patches implements the needed infrastructure to have the
kernel handle the DLPAR addition and removal of cpus (other DLPAR'able items to
follow in future patches). The framework for this is to create a set of probe/release
sysfs files that will facilitate arch-specific call-outs to handle addition and
removal of cpus to the system.

-Nathan Fontenot

1/3 - powerpc/pseries kernel DLPAR infrastructure
2/3 - Create probe/release sysfs files and the powerpc handlers
3/3 - powerpc/pseries CPU DLPAR handling

arch/powerpc/Kconfig | 4
arch/powerpc/include/asm/machdep.h | 5
arch/powerpc/include/asm/pSeries_reconfig.h | 1
arch/powerpc/kernel/smp.c | 17
arch/powerpc/kernel/sysfs.c | 19 +
arch/powerpc/platforms/pseries/Makefile | 2
arch/powerpc/platforms/pseries/dlpar.c | 500 +++++++++++++++++++++++++++-
arch/powerpc/platforms/pseries/reconfig.c | 2
drivers/base/cpu.c | 33 +
include/linux/cpu.h | 4
10 files changed, 581 insertions(+), 6 deletions(-)


2009-11-25 07:10:50

by Nathan Fontenot

[permalink] [raw]
Subject: [PATCH v2 1/3] Kernel DLPAR Infrastructure

The Dynamic Logical Partitioning capabilities of the powerpc pseries platform
allows for the addition and removal of resources (i.e. CPU's, memory, and PCI
devices) from a partition. The removal of a resource involves
removing the resource's node from the device tree and then returning the
resource to firmware via the rtas set-indicator call. To add a resource, it
is first obtained from firmware via the rtas set-indicator call and then a
new device tree node is created using the ibm,configure-coinnector rtas call
and added to the device tree.

This patch provides the kernel DLPAR infrastructure in a new filed named
dlpar.c. The functionality provided is for acquiring and releasing a resource
from firmware and the parsing of information returned from the
ibm,configure-connector rtas call. Additionally this exports the pSeries
reconfiguration notifier chain so that it can be invoked when device tree
updates are made.

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/include/asm/pSeries_reconfig.h | 1
arch/powerpc/platforms/pseries/Makefile | 2
arch/powerpc/platforms/pseries/dlpar.c | 344 ++++++++++++++++++++++++++++
arch/powerpc/platforms/pseries/reconfig.c | 2
4 files changed, 347 insertions(+), 2 deletions(-)

Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ powerpc/arch/powerpc/platforms/pseries/dlpar.c 2009-11-25 04:54:13.000000000 -0600
@@ -0,0 +1,344 @@
+/*
+ * Support for dynamic reconfiguration for PCI, Memory, and CPU
+ * Hotplug and Dynamic Logical Partitioning on RPA platforms.
+ *
+ * Copyright (C) 2009 Nathan Fontenot
+ * Copyright (C) 2009 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kref.h>
+#include <linux/notifier.h>
+#include <linux/proc_fs.h>
+#include <linux/spinlock.h>
+#include <linux/cpu.h>
+
+#include <asm/prom.h>
+#include <asm/machdep.h>
+#include <asm/uaccess.h>
+#include <asm/rtas.h>
+#include <asm/pSeries_reconfig.h>
+
+struct cc_workarea {
+ u32 drc_index;
+ u32 zero;
+ u32 name_offset;
+ u32 prop_length;
+ u32 prop_offset;
+};
+
+static void dlpar_free_cc_property(struct property *prop)
+{
+ kfree(prop->name);
+ kfree(prop->value);
+ kfree(prop);
+}
+
+static struct property *dlpar_parse_cc_property(struct cc_workarea *ccwa)
+{
+ struct property *prop;
+ char *name;
+ char *value;
+
+ prop = kzalloc(sizeof(*prop), GFP_KERNEL);
+ if (!prop)
+ return NULL;
+
+ name = (char *)ccwa + ccwa->name_offset;
+ prop->name = kstrdup(name, GFP_KERNEL);
+
+ prop->length = ccwa->prop_length;
+ value = (char *)ccwa + ccwa->prop_offset;
+ prop->value = kzalloc(prop->length, GFP_KERNEL);
+ if (!prop->value) {
+ dlpar_free_cc_property(prop);
+ return NULL;
+ }
+
+ memcpy(prop->value, value, prop->length);
+ return prop;
+}
+
+static struct device_node *dlpar_parse_cc_node(struct cc_workarea *ccwa)
+{
+ struct device_node *dn;
+ char *name;
+
+ dn = kzalloc(sizeof(*dn), GFP_KERNEL);
+ if (!dn)
+ return NULL;
+
+ /* The configure connector reported name does not contain a
+ * preceeding '/', so we allocate a buffer large enough to
+ * prepend this to the full_name.
+ */
+ name = (char *)ccwa + ccwa->name_offset;
+ dn->full_name = kmalloc(strlen(name) + 2, GFP_KERNEL);
+ if (!dn->full_name) {
+ kfree(dn);
+ return NULL;
+ }
+
+ sprintf(dn->full_name, "/%s", name);
+ return dn;
+}
+
+static void dlpar_free_one_cc_node(struct device_node *dn)
+{
+ struct property *prop;
+
+ while (dn->properties) {
+ prop = dn->properties;
+ dn->properties = prop->next;
+ dlpar_free_cc_property(prop);
+ }
+
+ kfree(dn->full_name);
+ kfree(dn);
+}
+
+static void dlpar_free_cc_nodes(struct device_node *dn)
+{
+ if (dn->child)
+ dlpar_free_cc_nodes(dn->child);
+
+ if (dn->sibling)
+ dlpar_free_cc_nodes(dn->sibling);
+
+ dlpar_free_one_cc_node(dn);
+}
+
+#define NEXT_SIBLING 1
+#define NEXT_CHILD 2
+#define NEXT_PROPERTY 3
+#define PREV_PARENT 4
+#define MORE_MEMORY 5
+#define CALL_AGAIN -2
+#define ERR_CFG_USE -9003
+
+struct device_node *dlpar_configure_connector(u32 drc_index)
+{
+ struct device_node *dn;
+ struct device_node *first_dn = NULL;
+ struct device_node *last_dn = NULL;
+ struct property *property;
+ struct property *last_property = NULL;
+ struct cc_workarea *ccwa;
+ int cc_token;
+ int rc;
+
+ cc_token = rtas_token("ibm,configure-connector");
+ if (cc_token == RTAS_UNKNOWN_SERVICE)
+ return NULL;
+
+ spin_lock(&rtas_data_buf_lock);
+ ccwa = (struct cc_workarea *)&rtas_data_buf[0];
+ ccwa->drc_index = drc_index;
+ ccwa->zero = 0;
+
+ rc = rtas_call(cc_token, 2, 1, NULL, rtas_data_buf, NULL);
+ while (rc) {
+ switch (rc) {
+ case NEXT_SIBLING:
+ dn = dlpar_parse_cc_node(ccwa);
+ if (!dn)
+ goto cc_error;
+
+ dn->parent = last_dn->parent;
+ last_dn->sibling = dn;
+ last_dn = dn;
+ break;
+
+ case NEXT_CHILD:
+ dn = dlpar_parse_cc_node(ccwa);
+ if (!dn)
+ goto cc_error;
+
+ if (!first_dn)
+ first_dn = dn;
+ else {
+ dn->parent = last_dn;
+ if (last_dn)
+ last_dn->child = dn;
+ }
+
+ last_dn = dn;
+ break;
+
+ case NEXT_PROPERTY:
+ property = dlpar_parse_cc_property(ccwa);
+ if (!property)
+ goto cc_error;
+
+ if (!last_dn->properties)
+ last_dn->properties = property;
+ else
+ last_property->next = property;
+
+ last_property = property;
+ break;
+
+ case PREV_PARENT:
+ last_dn = last_dn->parent;
+ break;
+
+ case CALL_AGAIN:
+ break;
+
+ case MORE_MEMORY:
+ case ERR_CFG_USE:
+ default:
+ printk(KERN_ERR "Unexpected Error (%d) "
+ "returned from configure-connector\n", rc);
+ goto cc_error;
+ }
+
+ rc = rtas_call(cc_token, 2, 1, NULL, rtas_data_buf, NULL);
+ }
+
+ spin_unlock(&rtas_data_buf_lock);
+ return first_dn;
+
+cc_error:
+ if (first_dn)
+ dlpar_free_cc_nodes(first_dn);
+ spin_unlock(&rtas_data_buf_lock);
+ return NULL;
+}
+
+static struct device_node *derive_parent(const char *path)
+{
+ struct device_node *parent;
+ char *last_slash;
+
+ last_slash = strrchr(path, '/');
+ if (last_slash == path) {
+ parent = of_find_node_by_path("/");
+ } else {
+ char *parent_path;
+ int parent_path_len = last_slash - path + 1;
+ parent_path = kmalloc(parent_path_len, GFP_KERNEL);
+ if (!parent_path)
+ return NULL;
+
+ strlcpy(parent_path, path, parent_path_len);
+ parent = of_find_node_by_path(parent_path);
+ kfree(parent_path);
+ }
+
+ return parent;
+}
+
+int dlpar_attach_node(struct device_node *dn)
+{
+ struct proc_dir_entry *ent;
+ int rc;
+
+ of_node_set_flag(dn, OF_DYNAMIC);
+ kref_init(&dn->kref);
+ dn->parent = derive_parent(dn->full_name);
+ if (!dn->parent)
+ return -ENOMEM;
+
+ rc = blocking_notifier_call_chain(&pSeries_reconfig_chain,
+ PSERIES_RECONFIG_ADD, dn);
+ if (rc == NOTIFY_BAD) {
+ printk(KERN_ERR "Failed to add device node %s\n",
+ dn->full_name);
+ return -ENOMEM; /* For now, safe to assume kmalloc failure */
+ }
+
+ of_attach_node(dn);
+
+#ifdef CONFIG_PROC_DEVICETREE
+ ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
+ if (ent)
+ proc_device_tree_add_node(dn, ent);
+#endif
+
+ of_node_put(dn->parent);
+ return 0;
+}
+
+int dlpar_detach_node(struct device_node *dn)
+{
+ struct device_node *parent = dn->parent;
+ struct property *prop = dn->properties;
+
+#ifdef CONFIG_PROC_DEVICETREE
+ while (prop) {
+ remove_proc_entry(prop->name, dn->pde);
+ prop = prop->next;
+ }
+
+ if (dn->pde)
+ remove_proc_entry(dn->pde->name, parent->pde);
+#endif
+
+ blocking_notifier_call_chain(&pSeries_reconfig_chain,
+ PSERIES_RECONFIG_REMOVE, dn);
+ of_detach_node(dn);
+ of_node_put(dn); /* Must decrement the refcount */
+
+ return 0;
+}
+
+#define DR_ENTITY_SENSE 9003
+#define DR_ENTITY_PRESENT 1
+#define DR_ENTITY_UNUSABLE 2
+#define ALLOCATION_STATE 9003
+#define ALLOC_UNUSABLE 0
+#define ALLOC_USABLE 1
+#define ISOLATION_STATE 9001
+#define ISOLATE 0
+#define UNISOLATE 1
+
+int dlpar_acquire_drc(u32 drc_index)
+{
+ int dr_status, rc;
+
+ rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status,
+ DR_ENTITY_SENSE, drc_index);
+ if (rc || dr_status != DR_ENTITY_UNUSABLE)
+ return -1;
+
+ rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_USABLE);
+ if (rc)
+ return rc;
+
+ rc = rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE);
+ if (rc) {
+ rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE);
+ return rc;
+ }
+
+ return 0;
+}
+
+int dlpar_release_drc(u32 drc_index)
+{
+ int dr_status, rc;
+
+ rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status,
+ DR_ENTITY_SENSE, drc_index);
+ if (rc || dr_status != DR_ENTITY_PRESENT)
+ return -1;
+
+ rc = rtas_set_indicator(ISOLATION_STATE, drc_index, ISOLATE);
+ if (rc)
+ return rc;
+
+ rc = rtas_set_indicator(ALLOCATION_STATE, drc_index, ALLOC_UNUSABLE);
+ if (rc) {
+ rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE);
+ return rc;
+ }
+
+ return 0;
+}
+
+
Index: powerpc/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/Makefile 2009-11-20 17:53:54.000000000 -0600
+++ powerpc/arch/powerpc/platforms/pseries/Makefile 2009-11-20 17:55:52.000000000 -0600
@@ -8,7 +8,7 @@

obj-y := lpar.o hvCall.o nvram.o reconfig.o \
setup.o iommu.o ras.o rtasd.o \
- firmware.o power.o
+ firmware.o power.o dlpar.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_XICS) += xics.o
obj-$(CONFIG_SCANLOG) += scanlog.o
Index: powerpc/arch/powerpc/include/asm/pSeries_reconfig.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/pSeries_reconfig.h 2009-11-20 17:53:54.000000000 -0600
+++ powerpc/arch/powerpc/include/asm/pSeries_reconfig.h 2009-11-20 17:55:52.000000000 -0600
@@ -17,6 +17,7 @@
#ifdef CONFIG_PPC_PSERIES
extern int pSeries_reconfig_notifier_register(struct notifier_block *);
extern void pSeries_reconfig_notifier_unregister(struct notifier_block *);
+extern struct blocking_notifier_head pSeries_reconfig_chain;
#else /* !CONFIG_PPC_PSERIES */
static inline int pSeries_reconfig_notifier_register(struct notifier_block *nb)
{
Index: powerpc/arch/powerpc/platforms/pseries/reconfig.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/reconfig.c 2009-11-20 17:53:54.000000000 -0600
+++ powerpc/arch/powerpc/platforms/pseries/reconfig.c 2009-11-20 17:55:52.000000000 -0600
@@ -96,7 +96,7 @@
return parent;
}

-static BLOCKING_NOTIFIER_HEAD(pSeries_reconfig_chain);
+BLOCKING_NOTIFIER_HEAD(pSeries_reconfig_chain);

int pSeries_reconfig_notifier_register(struct notifier_block *nb)
{

2009-11-25 07:12:26

by Nathan Fontenot

[permalink] [raw]
Subject: [PATCH v2 2/3] sysfs cpu probe/release files

In order to support kernel DLPAR of CPU resources we need to provide an
interface to add (probe) and remove (release) the resource from the system.
This patch Creates new generic probe and release sysfs files to facilitate
cpu probe/release. The probe/release interface provides for allowing each
arch to supply their own routines for implementing the backend of adding
and removing cpus to/from the system.

This also creates the powerpc specific stubs to handle the arch callouts
from writes to the sysfs files.

The creation and use of these files is regulated by the
CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
capability will have the files created.

Signed-off-by: Nathan Fontenot <[email protected]>
---

arch/powerpc/Kconfig | 4 ++++
arch/powerpc/include/asm/machdep.h | 5 +++++
arch/powerpc/kernel/sysfs.c | 19 +++++++++++++++++++
drivers/base/cpu.c | 32 ++++++++++++++++++++++++++++++++
include/linux/cpu.h | 2 ++
5 files changed, 62 insertions(+)

Index: powerpc/drivers/base/cpu.c
===================================================================
--- powerpc.orig/drivers/base/cpu.c 2009-11-25 04:06:10.000000000 -0600
+++ powerpc/drivers/base/cpu.c 2009-11-25 04:11:01.000000000 -0600
@@ -72,6 +72,38 @@
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
}
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static ssize_t cpu_probe_store(struct class *class, const char *buf,
+ size_t count)
+{
+ return arch_cpu_probe(buf, count);
+}
+
+static ssize_t cpu_release_store(struct class *class, const char *buf,
+ size_t count)
+{
+ return arch_cpu_release(buf, count);
+}
+
+static CLASS_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
+static CLASS_ATTR(release, S_IWUSR, NULL, cpu_release_store);
+
+int __init cpu_probe_release_init(void)
+{
+ int rc;
+
+ rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+ &class_attr_probe.attr);
+ if (!rc)
+ rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+ &class_attr_release.attr);
+
+ return rc;
+}
+device_initcall(cpu_probe_release_init);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
#else /* ... !CONFIG_HOTPLUG_CPU */
static inline void register_cpu_control(struct cpu *cpu)
{
Index: powerpc/arch/powerpc/include/asm/machdep.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/machdep.h 2009-11-25 04:06:10.000000000 -0600
+++ powerpc/arch/powerpc/include/asm/machdep.h 2009-11-25 04:11:01.000000000 -0600
@@ -266,6 +266,11 @@
void (*suspend_disable_irqs)(void);
void (*suspend_enable_irqs)(void);
#endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ ssize_t (*cpu_probe)(const char *, size_t);
+ ssize_t (*cpu_release)(const char *, size_t);
+#endif
};

extern void e500_idle(void);
Index: powerpc/arch/powerpc/kernel/sysfs.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/sysfs.c 2009-11-25 04:06:10.000000000 -0600
+++ powerpc/arch/powerpc/kernel/sysfs.c 2009-11-25 04:11:01.000000000 -0600
@@ -461,6 +461,25 @@

cacheinfo_cpu_offline(cpu);
}
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+ if (ppc_md.cpu_probe)
+ return ppc_md.cpu_probe(buf, count);
+
+ return -EINVAL;
+}
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+ if (ppc_md.cpu_release)
+ return ppc_md.cpu_release(buf, count);
+
+ return -EINVAL;
+}
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
#endif /* CONFIG_HOTPLUG_CPU */

static int __cpuinit sysfs_cpu_notify(struct notifier_block *self,
Index: powerpc/arch/powerpc/Kconfig
===================================================================
--- powerpc.orig/arch/powerpc/Kconfig 2009-11-25 04:06:10.000000000 -0600
+++ powerpc/arch/powerpc/Kconfig 2009-11-25 04:11:01.000000000 -0600
@@ -320,6 +320,10 @@

Say N if you are unsure.

+config ARCH_CPU_PROBE_RELEASE
+ def_bool y
+ depends on HOTPLUG_CPU
+
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y

Index: powerpc/include/linux/cpu.h
===================================================================
--- powerpc.orig/include/linux/cpu.h 2009-11-25 04:06:10.000000000 -0600
+++ powerpc/include/linux/cpu.h 2009-11-25 04:11:56.000000000 -0600
@@ -43,6 +43,8 @@

#ifdef CONFIG_HOTPLUG_CPU
extern void unregister_cpu(struct cpu *cpu);
+extern ssize_t arch_cpu_probe(const char *, size_t);
+extern ssize_t arch_cpu_release(const char *, size_t);
#endif
struct notifier_block;

2009-11-25 07:13:32

by Nathan Fontenot

[permalink] [raw]
Subject: [PATCH v2 3/3]CPU DLPAR handling

This patch adds the specific routines to probe and release (add and remove)
cpu resource for the powerpc pseries platform and registers these handlers
with the ppc_md callout structure.

Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 88 +++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)

Index: powerpc/arch/powerpc/platforms/pseries/dlpar.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/dlpar.c 2009-11-25 04:54:13.000000000 -0600
+++ powerpc/arch/powerpc/platforms/pseries/dlpar.c 2009-11-25 04:55:00.000000000 -0600
@@ -341,4 +341,92 @@
return 0;
}

+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE

+static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
+{
+ struct device_node *dn;
+ unsigned long drc_index;
+ char *cpu_name;
+ int rc;
+
+ rc = strict_strtoul(buf, 0, &drc_index);
+ if (rc)
+ return -EINVAL;
+
+ dn = dlpar_configure_connector(drc_index);
+ if (!dn)
+ return -EINVAL;
+
+ /* configure-connector reports cpus as living in the base
+ * directory of the device tree. CPUs actually live in the
+ * cpus directory so we need to fixup the full_name.
+ */
+ cpu_name = kzalloc(strlen(dn->full_name) + strlen("/cpus") + 1,
+ GFP_KERNEL);
+ if (!cpu_name) {
+ dlpar_free_cc_nodes(dn);
+ return -ENOMEM;
+ }
+
+ sprintf(cpu_name, "/cpus%s", dn->full_name);
+ kfree(dn->full_name);
+ dn->full_name = cpu_name;
+
+ rc = dlpar_acquire_drc(drc_index);
+ if (rc) {
+ dlpar_free_cc_nodes(dn);
+ return -EINVAL;
+ }
+
+ rc = dlpar_attach_node(dn);
+ if (rc) {
+ dlpar_release_drc(drc_index);
+ dlpar_free_cc_nodes(dn);
+ }
+
+ return rc ? rc : count;
+}
+
+static ssize_t dlpar_cpu_release(const char *buf, size_t count)
+{
+ struct device_node *dn;
+ const u32 *drc_index;
+ int rc;
+
+ dn = of_find_node_by_path(buf);
+ if (!dn)
+ return -EINVAL;
+
+ drc_index = of_get_property(dn, "ibm,my-drc-index", NULL);
+ if (!drc_index) {
+ of_node_put(dn);
+ return -EINVAL;
+ }
+
+ rc = dlpar_release_drc(*drc_index);
+ if (rc) {
+ of_node_put(dn);
+ return -EINVAL;
+ }
+
+ rc = dlpar_detach_node(dn);
+ if (rc) {
+ dlpar_acquire_drc(*drc_index);
+ return rc;
+ }
+
+ of_node_put(dn);
+ return count;
+}
+
+static int __init pseries_dlpar_init(void)
+{
+ ppc_md.cpu_probe = dlpar_cpu_probe;
+ ppc_md.cpu_release = dlpar_cpu_release;
+
+ return 0;
+}
+machine_device_initcall(pseries, pseries_dlpar_init);
+
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */

2009-11-25 13:06:57

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] sysfs cpu probe/release files

On Wed, Nov 25, 2009 at 01:12:21AM -0600, Nathan Fontenot wrote:
> In order to support kernel DLPAR of CPU resources we need to provide an
> interface to add (probe) and remove (release) the resource from the system.
> This patch Creates new generic probe and release sysfs files to facilitate
> cpu probe/release. The probe/release interface provides for allowing each
> arch to supply their own routines for implementing the backend of adding
> and removing cpus to/from the system.
>
> This also creates the powerpc specific stubs to handle the arch callouts
> from writes to the sysfs files.

Can you document these new sysfs files with some information in the
Documentation/ABI/ directory please?

thanks,

greg k-h

2009-11-26 03:00:58

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] Kernel DLPAR Infrastructure

Nathan Fontenot writes:

> The Dynamic Logical Partitioning capabilities of the powerpc pseries platform
> allows for the addition and removal of resources (i.e. CPU's, memory, and PCI
> devices) from a partition. The removal of a resource involves
> removing the resource's node from the device tree and then returning the
> resource to firmware via the rtas set-indicator call. To add a resource, it
> is first obtained from firmware via the rtas set-indicator call and then a
> new device tree node is created using the ibm,configure-coinnector rtas call
> and added to the device tree.
>
> This patch provides the kernel DLPAR infrastructure in a new filed named
> dlpar.c. The functionality provided is for acquiring and releasing a resource
> from firmware and the parsing of information returned from the
> ibm,configure-connector rtas call. Additionally this exports the pSeries
> reconfiguration notifier chain so that it can be invoked when device tree
> updates are made.
>
> Signed-off-by: Nathan Fontenot <[email protected]>

Acked-by: Paul Mackerras <[email protected]>

2009-11-26 03:00:44

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] sysfs cpu probe/release files

Nathan Fontenot writes:

> In order to support kernel DLPAR of CPU resources we need to provide an
> interface to add (probe) and remove (release) the resource from the system.
> This patch Creates new generic probe and release sysfs files to facilitate
> cpu probe/release. The probe/release interface provides for allowing each
> arch to supply their own routines for implementing the backend of adding
> and removing cpus to/from the system.
>
> This also creates the powerpc specific stubs to handle the arch callouts
> from writes to the sysfs files.
>
> The creation and use of these files is regulated by the
> CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
> capability will have the files created.
>
> Signed-off-by: Nathan Fontenot <[email protected]>

Acked-by: Paul Mackerras <[email protected]>

2009-11-26 03:00:45

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH v2 3/3]CPU DLPAR handling

Nathan Fontenot writes:

> This patch adds the specific routines to probe and release (add and remove)
> cpu resource for the powerpc pseries platform and registers these handlers
> with the ppc_md callout structure.
>
> Signed-off-by: Nathan Fontenot <[email protected]>

Acked-by: Paul Mackerras <[email protected]>

2009-11-26 03:23:28

by Nathan Fontenot

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sysfs cpu probe/release files

Version 3 of this patch is updated with documentation added to
Documentation/ABI. There are no changes to any of the C code from v2
of the patch.

In order to support kernel DLPAR of CPU resources we need to provide an
interface to add (probe) and remove (release) the resource from the system.
This patch Creates new generic probe and release sysfs files to facilitate
cpu probe/release. The probe/release interface provides for allowing each
arch to supply their own routines for implementing the backend of adding
and removing cpus to/from the system.

This also creates the powerpc specific stubs to handle the arch callouts
from writes to the sysfs files.

The creation and use of these files is regulated by the
CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
capability will have the files created.

Signed-off-by: Nathan Fontenot <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 15 +++++++++
arch/powerpc/Kconfig | 4 ++
arch/powerpc/include/asm/machdep.h | 5 +++
arch/powerpc/kernel/sysfs.c | 19 ++++++++++++
drivers/base/cpu.c | 32 +++++++++++++++++++++
include/linux/cpu.h | 2 +
6 files changed, 77 insertions(+)

Index: powerpc/drivers/base/cpu.c
===================================================================
--- powerpc.orig/drivers/base/cpu.c 2009-11-25 04:52:37.000000000 -0600
+++ powerpc/drivers/base/cpu.c 2009-11-25 04:54:25.000000000 -0600
@@ -72,6 +72,38 @@
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
}
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static ssize_t cpu_probe_store(struct class *class, const char *buf,
+ size_t count)
+{
+ return arch_cpu_probe(buf, count);
+}
+
+static ssize_t cpu_release_store(struct class *class, const char *buf,
+ size_t count)
+{
+ return arch_cpu_release(buf, count);
+}
+
+static CLASS_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
+static CLASS_ATTR(release, S_IWUSR, NULL, cpu_release_store);
+
+int __init cpu_probe_release_init(void)
+{
+ int rc;
+
+ rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+ &class_attr_probe.attr);
+ if (!rc)
+ rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj,
+ &class_attr_release.attr);
+
+ return rc;
+}
+device_initcall(cpu_probe_release_init);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
#else /* ... !CONFIG_HOTPLUG_CPU */
static inline void register_cpu_control(struct cpu *cpu)
{
Index: powerpc/arch/powerpc/include/asm/machdep.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/machdep.h 2009-11-25 04:52:37.000000000 -0600
+++ powerpc/arch/powerpc/include/asm/machdep.h 2009-11-25 04:54:25.000000000 -0600
@@ -266,6 +266,11 @@
void (*suspend_disable_irqs)(void);
void (*suspend_enable_irqs)(void);
#endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ ssize_t (*cpu_probe)(const char *, size_t);
+ ssize_t (*cpu_release)(const char *, size_t);
+#endif
};

extern void e500_idle(void);
Index: powerpc/arch/powerpc/kernel/sysfs.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/sysfs.c 2009-11-25 04:52:37.000000000 -0600
+++ powerpc/arch/powerpc/kernel/sysfs.c 2009-11-25 04:54:25.000000000 -0600
@@ -461,6 +461,25 @@

cacheinfo_cpu_offline(cpu);
}
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+ if (ppc_md.cpu_probe)
+ return ppc_md.cpu_probe(buf, count);
+
+ return -EINVAL;
+}
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+ if (ppc_md.cpu_release)
+ return ppc_md.cpu_release(buf, count);
+
+ return -EINVAL;
+}
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
#endif /* CONFIG_HOTPLUG_CPU */

static int __cpuinit sysfs_cpu_notify(struct notifier_block *self,
Index: powerpc/arch/powerpc/Kconfig
===================================================================
--- powerpc.orig/arch/powerpc/Kconfig 2009-11-25 04:52:37.000000000 -0600
+++ powerpc/arch/powerpc/Kconfig 2009-11-25 04:54:25.000000000 -0600
@@ -320,6 +320,10 @@

Say N if you are unsure.

+config ARCH_CPU_PROBE_RELEASE
+ def_bool y
+ depends on HOTPLUG_CPU
+
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y

Index: powerpc/include/linux/cpu.h
===================================================================
--- powerpc.orig/include/linux/cpu.h 2009-11-25 04:52:37.000000000 -0600
+++ powerpc/include/linux/cpu.h 2009-11-25 04:54:25.000000000 -0600
@@ -43,6 +43,8 @@

#ifdef CONFIG_HOTPLUG_CPU
extern void unregister_cpu(struct cpu *cpu);
+extern ssize_t arch_cpu_probe(const char *, size_t);
+extern ssize_t arch_cpu_release(const char *, size_t);
#endif
struct notifier_block;

Index: powerpc/Documentation/ABI/testing/sysfs-devices-system-cpu
===================================================================
--- powerpc.orig/Documentation/ABI/testing/sysfs-devices-system-cpu 2009-11-20 17:53:51.000000000 -0600
+++ powerpc/Documentation/ABI/testing/sysfs-devices-system-cpu 2009-11-26 01:29:25.000000000 -0600
@@ -62,6 +62,21 @@
See Documentation/cputopology.txt for more information.


+What: /sys/devices/system/cpu/probe
+ /sys/devices/system/cpu/release
+Date: November 2009
+Contact: Linux kernel mailing list <[email protected]>
+Description: Dynamic addition and removal of CPU's. This is not hotplug
+ removal, this is meant complete removal/addition of the CPU
+ from the system.
+
+ probe: writes to this file will dynamically add a CPU to the
+ system. Information written to the file to add CPU's is
+ architecture specific.
+
+ release: writes to this file dynamically remove a CPU from
+ the system. Information writtento the file to remove CPU's
+ is architecture specific.

What: /sys/devices/system/cpu/cpu#/node
Date: October 2009

2009-11-27 22:41:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] Kernel handling of Dynamic Logical Partitioning

Nathan Fontenot <[email protected]> writes:

> version 2 of the patch set with updates from comments.
>
> The Dynamic Logical Partitioning (DLPAR) capabilities of the powerpc pseries
> platform allows for the addition and removal of resources (i.e. cpus,
> memory, pci devices) from a partition. The removal of a resource involves
> removing the resource's node from the device tree and then returning the
> resource to firmware via the rtas set-indicator call. To add a resource, it
> is first obtained from firmware via the rtas set-indicator call and then a
> new device tree node is created using the ibm,configure-coinnector rtas call
> and added to the device tree.
>
> The following set of patches implements the needed infrastructure to have the
> kernel handle the DLPAR addition and removal of cpus (other DLPAR'able items to
> follow in future patches). The framework for this is to create a set of
> probe/release sysfs files that will facilitate arch-specific call-outs to handle
> addition and removal of cpus to the system.

How does this differ from /sys/devices/system/cpu/cpuX/online ?

>From the descriptions it appears that we already have what you are trying to
add and you just need to tie it into DLPAR on ppc.

Eric

2009-11-30 20:48:38

by Nathan Fontenot

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] Kernel handling of Dynamic Logical Partitioning

Eric W. Biederman wrote:
> Nathan Fontenot <[email protected]> writes:
>
>> version 2 of the patch set with updates from comments.
>>
>> The Dynamic Logical Partitioning (DLPAR) capabilities of the powerpc pseries
>> platform allows for the addition and removal of resources (i.e. cpus,
>> memory, pci devices) from a partition. The removal of a resource involves
>> removing the resource's node from the device tree and then returning the
>> resource to firmware via the rtas set-indicator call. To add a resource, it
>> is first obtained from firmware via the rtas set-indicator call and then a
>> new device tree node is created using the ibm,configure-coinnector rtas call
>> and added to the device tree.
>>
>> The following set of patches implements the needed infrastructure to have the
>> kernel handle the DLPAR addition and removal of cpus (other DLPAR'able items to
>> follow in future patches). The framework for this is to create a set of
>> probe/release sysfs files that will facilitate arch-specific call-outs to handle
>> addition and removal of cpus to the system.
>
> How does this differ from /sys/devices/system/cpu/cpuX/online ?
>
> From the descriptions it appears that we already have what you are trying to
> add and you just need to tie it into DLPAR on ppc.
>

The DLPAR and hotplug operations need to be kept separate for ppc. One big reason
for this is that DLPAR operations on ppc are driven from the Hardware Management
Console (HMC). It is (almost always) not possible to DLPAR add a resource (i.e.
cpu, memory,pcie device) from the OS. There are HMC <=> firmware interactions
that must occur to make the resource available to the partition.

Second is that there are times when one would want to hotplug add/remove
a cpu but not dlpar add/remove it.

When a resource is DLPAR removed, the OS has no knowledge that the resource exists.
Likewise, this allows the addition of resources from the HMC after the system is booted.
This is a feature of ppc partitions that allows for the dynamic movement of resources
between partitions.

-Nathan Fontenot