Subject: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

Hi,

This is version 5 of the patch series that provides a framework to choose the
state a pseries CPU is put into when it is offlined.

Previous versions can be found here:
Version 4: http://lkml.org/lkml/2009/10/9/59
Version 3: http://lkml.org/lkml/2009/9/15/164
Version 2: http://lkml.org/lkml/2009/8/28/102
Version 1: http://lkml.org/lkml/2009/8/6/236

Changes from the previous version include:
- Rebased against Nathan Fontenot's latest "pseries kernel handling of dynamic
logical partitioning v4" patches found here:
http://lkml.org/lkml/2009/10/21/98

- Added boot-time option to disable putting the offlined vcpus into an
extended H_CEDE state.

- Addressed Ben's comments regarding the if-else sequencing in
pseries_mach_cpu_die().

- Addition of comments for pseries_cpu_die() to distinguish it from
pseries_mach_cpu_die()

Also,

- This approach addresses Peter Z's objections regarding layering
violations. The user simply offlines the cpu and doesn't worry about what
state the CPU should be put into. That part is automatically handled by the
kernel.

- It does not add any new sysfs interface; instead it uses the existing
sysfs interface to offline CPUs.

- On platforms which do not have support for ceding the vcpu with a
latency specifier value, the offlining mechanism defaults to the current
method of calling rtas_stop_self().

The patchset has been tested on the available pseries platforms and works
as expected. I believe that the patch set is ready for inclusion.
---

Gautham R Shenoy (4):
pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
pseries: Add code to online/offline CPUs of a DLPAR node.
pSeries: Add hooks to put the CPU into an appropriate offline state
pSeries: extended_cede_processor() helper function.


Documentation/cpu-hotplug.txt | 6 +
arch/powerpc/include/asm/lppaca.h | 9 +
arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
arch/powerpc/platforms/pseries/offline_states.h | 18 ++
arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
arch/powerpc/platforms/pseries/smp.c | 19 ++
arch/powerpc/xmon/xmon.c | 3
drivers/base/cpu.c | 2
include/linux/cpu.h | 13 ++
10 files changed, 387 insertions(+), 16 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

--
Thanks and Regards
gautham.


Subject: [PATCH v5 1/4] pSeries: extended_cede_processor() helper function.

This patch provides an extended_cede_processor() helper function
which takes the cede latency hint as an argument. This hint is to be passed
on to the hypervisor to cede to the corresponding state on platforms
which support it.

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/include/asm/lppaca.h | 9 ++++++++-
arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 ++++++++++++++++++++++
arch/powerpc/xmon/xmon.c | 3 ++-
3 files changed, 32 insertions(+), 2 deletions(-)
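
For reviewers, a minimal userspace sketch of the save/set/restore pattern that
extended_cede_processor() follows around the cede call. This is an
illustration only, not the kernel code: struct fake_lppaca,
fake_cede_processor() and hint_seen_by_hypervisor are hypothetical stand-ins
for the real lppaca area and the H_CEDE hcall.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the lppaca; only the GPR5 dword is modelled. */
struct fake_lppaca {
	union {
		uint64_t saved_gpr5;
		struct {
			uint8_t cede_latency_hint;
			uint8_t reserved[7];
		} fields;
	} gpr5_dword;
};

static struct fake_lppaca lppaca;

/* Records the hint that was in place at the moment of the (fake) cede. */
static uint8_t hint_seen_by_hypervisor;

/* Stand-in for the H_CEDE hcall: it only samples the current hint. */
static long fake_cede_processor(void)
{
	hint_seen_by_hypervisor = lppaca.gpr5_dword.fields.cede_latency_hint;
	return 0;
}

/* Save the old hint, install the new one, cede, then restore the old hint. */
static long fake_extended_cede_processor(uint8_t latency_hint)
{
	uint8_t old_hint = lppaca.gpr5_dword.fields.cede_latency_hint;
	long rc;

	/* The hint overlays the first byte of the 8-byte GPR5 save slot. */
	assert(sizeof(lppaca.gpr5_dword) == sizeof(uint64_t));

	lppaca.gpr5_dword.fields.cede_latency_hint = latency_hint;
	rc = fake_cede_processor();
	lppaca.gpr5_dword.fields.cede_latency_hint = old_hint;

	return rc;
}
```

The point of the restore step is that a later plain cede from the idle loop
sees whatever hint was configured before, not the one used for offlining.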

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index f78f65c..14b592d 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
// Used to pass parms from the OS to PLIC for SetAsrAndRfid
u64 saved_gpr3; // Saved GPR3 x20-x27
u64 saved_gpr4; // Saved GPR4 x28-x2F
- u64 saved_gpr5; // Saved GPR5 x30-x37
+ union {
+ u64 saved_gpr5; /* Saved GPR5 x30-x37 */
+ struct {
+ u8 cede_latency_hint; /* x30 */
+ u8 reserved[7]; /* x31-x37 */
+ } fields;
+ } gpr5_dword;
+

u8 dtl_enable_mask; // Dispatch Trace Log mask x38-x38
u8 donate_dedicated_cpu; // Donate dedicated CPU cycles x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..0603c91 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,33 @@ static inline long poll_pending(void)
return plpar_hcall_norets(H_POLL_PENDING);
}

+static inline u8 get_cede_latency_hint(void)
+{
+ return get_lppaca()->gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+ get_lppaca()->gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
static inline long cede_processor(void)
{
return plpar_hcall_norets(H_CEDE);
}

+static inline long extended_cede_processor(unsigned long latency_hint)
+{
+ long rc;
+ u8 old_latency_hint = get_cede_latency_hint();
+
+ set_cede_latency_hint(latency_hint);
+ rc = cede_processor();
+ set_cede_latency_hint(old_latency_hint);
+
+ return rc;
+}
+
static inline long vpa_call(unsigned long flags, unsigned long cpu,
unsigned long vpa)
{
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c6f0a71..57124cf 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1623,7 +1623,8 @@ static void super_regs(void)
ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1);
printf(" Saved Gpr3=%.16lx Saved Gpr4=%.16lx \n",
ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4);
- printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5);
+ printf(" Saved Gpr5=%.16lx \n",
+ ptrLpPaca->gpr5_dword.saved_gpr5);
}
#endif

Subject: [PATCH v5 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state

When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with a
latency hint specifier value, a cpu offline operation cedes the vCPU to the
hypervisor while passing a latency hint specifier value, instead of calling
rtas_stop_self(). The hypervisor can use this hint to provide better energy
savings. Also, during the offline operation, control of the vCPU remains with
the LPAR as opposed to being returned to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

The patch also provides a boot-time command line argument to disable/enable
CPU_STATE_INACTIVE.

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Nathan Fontenot <[email protected]>
---
Documentation/cpu-hotplug.txt | 6 +
arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
arch/powerpc/platforms/pseries/offline_states.h | 18 ++
arch/powerpc/platforms/pseries/smp.c | 19 ++
4 files changed, 216 insertions(+), 9 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
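
As a reading aid, a minimal userspace sketch of how the default offline state
is selected from the "cede_offline=" boot option and platform support. It is
not the kernel code: parse_cede_offline() mirrors setup_cede_offline(), and
the firmware_has_cede_latency argument is a hypothetical stand-in for
"parse_cede_parameters() returned 0".

```c
#include <assert.h>
#include <string.h>

enum cpu_state_vals {
	CPU_STATE_OFFLINE,
	CPU_STATE_INACTIVE,
	CPU_STATE_ONLINE
};

/* Enabled unless the option says otherwise, as in the patch. */
static int cede_offline_enabled = 1;

/* Returns 1 if the option value was recognised, 0 otherwise. */
static int parse_cede_offline(const char *str)
{
	assert(str != NULL);

	if (!strcmp(str, "off"))
		cede_offline_enabled = 0;
	else if (!strcmp(str, "on"))
		cede_offline_enabled = 1;
	else
		return 0;
	return 1;
}

/*
 * Extended cede is used only when it is both enabled and supported;
 * otherwise the existing rtas_stop_self() behaviour remains the default.
 */
static enum cpu_state_vals pick_default_offline_state(int firmware_has_cede_latency)
{
	if (cede_offline_enabled && firmware_has_cede_latency)
		return CPU_STATE_INACTIVE;
	return CPU_STATE_OFFLINE;
}
```

An unrecognised option value leaves the previous setting untouched, so a typo
on the command line does not silently disable the feature.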

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9d620c1..4d4a644 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -49,6 +49,12 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
additional_cpus=n (*) Use this to limit hotpluggable cpus. This option sets
cpu_possible_map = cpu_present_map + additional_cpus

+cede_offline={"off","on"} Use this option to disable/enable putting offlined
+ processors to an extended H_CEDE state on
+ supported pseries platforms.
+ If nothing is specified,
+ cede_offline is set to "on".
+
(*) Option valid only for following architectures
- ia64

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index ebff6d9..6ea4698 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
#include <asm/pSeries_reconfig.h>
#include "xics.h"
#include "plpar_wrappers.h"
+#include "offline_states.h"

/* This version can't take the spinlock, because it never returns */
static struct rtas_args rtas_stop_self_args = {
@@ -39,6 +40,55 @@ static struct rtas_args rtas_stop_self_args = {
.rets = &rtas_stop_self_args.args[0],
};

+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+ CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+static int cede_offline_enabled __read_mostly = 1;
+
+/*
+ * Enable/disable cede_offline when available.
+ */
+static int __init setup_cede_offline(char *str)
+{
+ if (!strcmp(str, "off"))
+ cede_offline_enabled = 0;
+ else if (!strcmp(str, "on"))
+ cede_offline_enabled = 1;
+ else
+ return 0;
+ return 1;
+}
+
+__setup("cede_offline=", setup_cede_offline);
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+ return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+ per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+ return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+ per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+ per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
static void rtas_stop_self(void)
{
struct rtas_args *args = &rtas_stop_self_args;
@@ -56,11 +106,61 @@ static void rtas_stop_self(void)

static void pseries_mach_cpu_die(void)
{
+ unsigned int cpu = smp_processor_id();
+ unsigned int hwcpu = hard_smp_processor_id();
+ u8 cede_latency_hint = 0;
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
- unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
- rtas_stop_self();
+
+ if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+ cede_latency_hint = 2;
+
+ get_lppaca()->idle = 1;
+ if (!get_lppaca()->shared_proc)
+ get_lppaca()->donate_dedicated_cpu = 1;
+
+ printk(KERN_INFO
+ "cpu %u (hwid %u) ceding for offline with hint %d\n",
+ cpu, hwcpu, cede_latency_hint);
+ while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ extended_cede_processor(cede_latency_hint);
+ printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+ cpu, hwcpu);
+ printk(KERN_INFO
+ "Decrementer value = %x Timebase value = %llx\n",
+ get_dec(), get_tb());
+ }
+
+ printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+ cpu, hwcpu);
+
+ if (!get_lppaca()->shared_proc)
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+ }
+
+ if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
+ unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+ /*
+ * NOTE: Calling start_secondary() here for now to
+ * start new context.
+ * However, need to do it cleanly by resetting the
+ * stack pointer.
+ */
+ start_secondary();
+
+ } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+ set_cpu_current_state(cpu, CPU_STATE_OFFLINE);
+ unregister_slb_shadow(hard_smp_processor_id(),
+ __pa(get_slb_shadow()));
+ rtas_stop_self();
+ }
+
/* Should never get here... */
BUG();
for(;;);
@@ -106,18 +206,43 @@ static int pseries_cpu_disable(void)
return 0;
}

+/*
+ * pseries_cpu_die: Wait for the cpu to die.
+ * @cpu: logical processor id of the CPU whose death we're awaiting.
+ *
+ * This function is called from the context of the thread which is performing
+ * the cpu-offline. Here we wait for long enough to allow the cpu in question
+ * to self-destroy so that the cpu-offline thread can send the CPU_DEAD
+ * notifications.
+ *
+ * OTOH, pseries_mach_cpu_die() is called by the @cpu when it wants to
+ * self-destruct.
+ */
static void pseries_cpu_die(unsigned int cpu)
{
int tries;
- int cpu_status;
+ int cpu_status = 1;
unsigned int pcpu = get_hard_smp_processor_id(cpu);

- for (tries = 0; tries < 25; tries++) {
- cpu_status = query_cpu_stopped(pcpu);
- if (cpu_status == 0 || cpu_status == -1)
- break;
- cpu_relax();
+ if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ cpu_status = 1;
+ for (tries = 0; tries < 1000; tries++) {
+ if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
+ cpu_status = 0;
+ break;
+ }
+ cpu_relax();
+ }
+ } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+ for (tries = 0; tries < 25; tries++) {
+ cpu_status = query_cpu_stopped(pcpu);
+ if (cpu_status == 0 || cpu_status == -1)
+ break;
+ cpu_relax();
+ }
}
+
if (cpu_status != 0) {
printk("Querying DEAD? cpu %i (%i) shows %i\n",
cpu, pcpu, cpu_status);
@@ -252,10 +377,41 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
};

+#define MAX_CEDE_LATENCY_LEVELS 4
+#define CEDE_LATENCY_PARAM_LENGTH 10
+#define CEDE_LATENCY_PARAM_MAX_LENGTH \
+ (MAX_CEDE_LATENCY_LEVELS * CEDE_LATENCY_PARAM_LENGTH * sizeof(char))
+#define CEDE_LATENCY_TOKEN 45
+
+static char cede_parameters[CEDE_LATENCY_PARAM_MAX_LENGTH];
+
+static int parse_cede_parameters(void)
+{
+ int call_status;
+
+ memset(cede_parameters, 0, CEDE_LATENCY_PARAM_MAX_LENGTH);
+ call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+ NULL,
+ CEDE_LATENCY_TOKEN,
+ __pa(cede_parameters),
+ CEDE_LATENCY_PARAM_MAX_LENGTH);
+
+ if (call_status != 0)
+ printk(KERN_INFO "CEDE_LATENCY: "
+ "%s %s Error calling get-system-parameter(0x%x)\n",
+ __FILE__, __func__, call_status);
+ else
+ printk(KERN_INFO "CEDE_LATENCY: "
+ "get-system-parameter successful.\n");
+
+ return call_status;
+}
+
static int __init pseries_cpu_hotplug_init(void)
{
struct device_node *np;
const char *typep;
+ int cpu;

for_each_node_by_name(np, "interrupt-controller") {
typep = of_get_property(np, "compatible", NULL);
@@ -283,8 +439,16 @@ static int __init pseries_cpu_hotplug_init(void)
smp_ops->cpu_die = pseries_cpu_die;

/* Processors can be added/removed only on LPAR */
- if (firmware_has_feature(FW_FEATURE_LPAR))
+ if (firmware_has_feature(FW_FEATURE_LPAR)) {
pSeries_reconfig_notifier_register(&pseries_smp_nb);
+ cpu_maps_update_begin();
+ if (cede_offline_enabled && parse_cede_parameters() == 0) {
+ default_offline_state = CPU_STATE_INACTIVE;
+ for_each_online_cpu(cpu)
+ set_default_offline_state(cpu);
+ }
+ cpu_maps_update_done();
+ }

return 0;
}
diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h
new file mode 100644
index 0000000..22574e0
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_states.h
@@ -0,0 +1,18 @@
+#ifndef _OFFLINE_STATES_H_
+#define _OFFLINE_STATES_H_
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+ CPU_STATE_OFFLINE,
+ CPU_STATE_INACTIVE,
+ CPU_STATE_ONLINE,
+ CPU_MAX_OFFLINE_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern void set_default_offline_state(int cpu);
+extern int start_secondary(void);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 440000c..8868c01 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
#include "plpar_wrappers.h"
#include "pseries.h"
#include "xics.h"
+#include "offline_states.h"


/*
@@ -84,6 +85,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
/* Fixup atomic count: it exited inside IRQ handler. */
task_thread_info(paca[lcpu].__current)->preempt_count = 0;

+ if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
+ goto out;
+
/*
* If the RTAS start-cpu token does not exist then presume the
* cpu is already spinning.
@@ -98,6 +102,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
return 0;
}

+out:
return 1;
}

@@ -111,12 +116,16 @@ static void __devinit smp_xics_setup_cpu(int cpu)
vpa_init(cpu);

cpu_clear(cpu, of_spin_map);
+ set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+ set_default_offline_state(cpu);

}
#endif /* CONFIG_XICS */

static void __devinit smp_pSeries_kick_cpu(int nr)
{
+ long rc;
+ unsigned long hcpuid;
BUG_ON(nr < 0 || nr >= NR_CPUS);

if (!smp_startup_cpu(nr))
@@ -128,6 +137,16 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
* the processor will continue on to secondary_start
*/
paca[nr].cpu_start = 1;
+
+ set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+ if (get_cpu_current_state(nr) == CPU_STATE_INACTIVE) {
+ hcpuid = get_hard_smp_processor_id(nr);
+ rc = plpar_hcall_norets(H_PROD, hcpuid);
+ if (rc != H_SUCCESS)
+ panic("Error: Prod to wake up processor %d Ret= %ld\n",
+ nr, rc);
+ }
}

static int smp_pSeries_cpu_bootable(unsigned int nr)

Subject: [PATCH v5 3/4] pseries: Add code to online/offline CPUs of a DLPAR node.

Currently, cpu allocation/deallocation on pSeries is a
two-step process from userspace.

- Set the indicators and update the device tree by writing to the sysfs
tunable "probe" during allocation and "release" during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
writing to the sysfs tunable "online".

This patch adds kernel code to online the CPUs soon after they have been
allocated and to offline them just before they are deallocated. This way, the
userspace tool that performs DLPAR operations only has to deal with one set of
sysfs tunables, namely "probe" and "release".

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 101 ++++++++++++++++++++++++++++++++
1 files changed, 101 insertions(+), 0 deletions(-)
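
As a reading aid, a minimal userspace sketch of the per-CPU decision that
offline_node_cpus() makes before a release. It only mirrors the branch
structure of the function; enum offline_action and action_for_release() are
hypothetical names that do not exist in the patch.

```c
#include <assert.h>

enum cpu_state_vals {
	CPU_STATE_OFFLINE,
	CPU_STATE_INACTIVE,
	CPU_STATE_ONLINE
};

/* Hypothetical labels for the three branches taken per CPU. */
enum offline_action {
	ACTION_NONE,		/* already stopped, nothing to do */
	ACTION_CPU_DOWN,	/* running: take it through cpu_down() */
	ACTION_PROD_THEN_WAIT	/* ceded: set preferred state to OFFLINE,
				 * prod the vCPU, wait for it to die */
};

static enum offline_action action_for_release(enum cpu_state_vals current_state)
{
	switch (current_state) {
	case CPU_STATE_OFFLINE:
		return ACTION_NONE;
	case CPU_STATE_ONLINE:
		return ACTION_CPU_DOWN;
	case CPU_STATE_INACTIVE:
		return ACTION_PROD_THEN_WAIT;
	default:
		/* No other state is expected here. */
		assert(0 && "unexpected cpu state");
		return ACTION_NONE;
	}
}
```

The INACTIVE case is the one that is new with this series: a ceded vCPU is
still owned by the LPAR, so it has to be woken and moved to the stopped state
before the node can be released.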

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index ceb6d17..8e04a69 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -19,6 +19,7 @@
#include <linux/sysdev.h>
#include <linux/sysfs.h>
#include <linux/cpu.h>
+#include "offline_states.h"

#include <asm/prom.h>
#include <asm/machdep.h>
@@ -353,6 +354,98 @@ int remove_device_tree_nodes(struct device_node *dn)
return rc;
}

+int online_node_cpus(struct device_node *dn)
+{
+ int rc = 0;
+ unsigned int cpu;
+ int len, nthreads, i;
+ const u32 *intserv;
+
+ intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+ if (!intserv)
+ return -EINVAL;
+
+ nthreads = len / sizeof(u32);
+
+ cpu_maps_update_begin();
+ for (i = 0; i < nthreads; i++) {
+ for_each_present_cpu(cpu) {
+ if (get_hard_smp_processor_id(cpu) != intserv[i])
+ continue;
+ BUG_ON(get_cpu_current_state(cpu)
+ != CPU_STATE_OFFLINE);
+ cpu_maps_update_done();
+ rc = cpu_up(cpu);
+ if (rc)
+ goto out;
+ cpu_maps_update_begin();
+
+ break;
+ }
+ if (cpu == num_possible_cpus())
+ printk(KERN_WARNING "Could not find cpu to online "
+ "with physical id 0x%x\n", intserv[i]);
+ }
+ cpu_maps_update_done();
+
+out:
+ return rc;
+
+}
+
+int offline_node_cpus(struct device_node *dn)
+{
+ int rc = 0;
+ unsigned int cpu;
+ int len, nthreads, i;
+ const u32 *intserv;
+
+ intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+ if (!intserv)
+ return -EINVAL;
+
+ nthreads = len / sizeof(u32);
+
+ cpu_maps_update_begin();
+ for (i = 0; i < nthreads; i++) {
+ for_each_present_cpu(cpu) {
+ if (get_hard_smp_processor_id(cpu) != intserv[i])
+ continue;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
+ break;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
+ cpu_maps_update_done();
+ rc = cpu_down(cpu);
+ if (rc)
+ goto out;
+ cpu_maps_update_begin();
+ break;
+
+ }
+
+ /*
+ * The cpu is in CPU_STATE_INACTIVE.
+ * Upgrade its state to CPU_STATE_OFFLINE.
+ */
+ set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
+ BUG_ON(plpar_hcall_norets(H_PROD, intserv[i])
+ != H_SUCCESS);
+ __cpu_die(cpu);
+ break;
+ }
+ if (cpu == num_possible_cpus())
+ printk(KERN_WARNING "Could not find cpu to offline "
+ "with physical id 0x%x\n", intserv[i]);
+ }
+ cpu_maps_update_done();
+
+out:
+ return rc;
+
+}
+
#define DR_ENTITY_SENSE 9003
#define DR_ENTITY_PRESENT 1
#define DR_ENTITY_UNUSABLE 2
@@ -447,6 +540,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
if (rc)
release_drc(drc_index);

+ rc = online_node_cpus(dn);
+
return rc ? -EINVAL : count;
}

@@ -467,6 +562,11 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
return -EINVAL;
}

+ rc = offline_node_cpus(dn);
+
+ if (rc)
+ goto out;
+
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
@@ -478,6 +578,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
acquire_drc(*drc_index);

of_node_put(dn);
+out:
return rc ? -EINVAL : count;
}

Subject: [PATCH v5 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate

Currently the cpu-allocation/deallocation process comprises two steps:
- Set the indicators and update the device tree with DLPAR node
information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables "probe" during allocation
and "release" during deallocation.

At the same time, userspace can independently online/offline the CPUs of
the system using the sysfs tunable "online".

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the "online" sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the "probe/release" sysfs
tunable with the writes to the "online" sysfs tunable.

This patch employs a mutex to provide this serialization. The lock/unlock
helpers are no-ops on all architectures except PPC_PSERIES.

Signed-off-by: Gautham R Shenoy <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 28 +++++++++++++++++++++++-----
drivers/base/cpu.c | 2 ++
include/linux/cpu.h | 13 +++++++++++++
3 files changed, 38 insertions(+), 5 deletions(-)
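
As a reading aid, a minimal userspace sketch of the serialization this patch
introduces, using a pthread mutex in place of the kernel mutex. It is an
illustration under stated assumptions, not the kernel code: store_online()
and release_cpu() are simplified models of the "online" and "release" write
paths, and cpu_online, in_release and overlap_detected are hypothetical
bookkeeping variables.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t hotplug_mutex = PTHREAD_MUTEX_INITIALIZER;

static int cpu_online = 1;	/* modelled state of one CPU */
static int in_release;		/* nonzero while a release is in flight */
static int overlap_detected;	/* set if "online" ran inside a release */

static void hotplug_driver_lock(void)
{
	pthread_mutex_lock(&hotplug_mutex);
}

static void hotplug_driver_unlock(void)
{
	pthread_mutex_unlock(&hotplug_mutex);
}

/* Model of a write to the "online" tunable. */
static void store_online(int online)
{
	hotplug_driver_lock();
	if (in_release)
		overlap_detected = 1;
	cpu_online = online;
	hotplug_driver_unlock();
}

/* Model of a write to "release": offline the CPU, then update the tree. */
static int release_cpu(void)
{
	int still_offline;

	hotplug_driver_lock();
	assert(!in_release);
	in_release = 1;
	cpu_online = 0;
	/* The device-tree update would happen here, with the lock held. */
	still_offline = (cpu_online == 0);
	in_release = 0;
	hotplug_driver_unlock();

	return still_offline;
}

/* Thread body that keeps trying to bring the CPU back online. */
static void *online_writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000; i++)
		store_online(1);
	return NULL;
}
```

Because both paths take the same lock, a concurrent "online" write can only
run before or after a complete release, never in the middle of one.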

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 8e04a69..b6fc6ab 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -501,6 +501,18 @@ int release_drc(u32 drc_index)
}

#ifdef CONFIG_HOTPLUG_CPU
+static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
+
+void cpu_hotplug_driver_lock(void)
+{
+ mutex_lock(&pseries_cpu_hotplug_mutex);
+}
+
+void cpu_hotplug_driver_unlock(void)
+{
+ mutex_unlock(&pseries_cpu_hotplug_mutex);
+}
+
static ssize_t cpu_probe_store(struct class *class, const char *buf,
size_t count)
{
@@ -509,18 +521,19 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
char *cpu_name;
int rc;

+ cpu_hotplug_driver_lock();
rc = strict_strtoul(buf, 0, &drc_index);
if (rc)
- return -EINVAL;
+ goto out;

rc = acquire_drc(drc_index);
if (rc)
- return -EINVAL;
+ goto out;

dn = configure_connector(drc_index);
if (!dn) {
release_drc(drc_index);
- return -EINVAL;
+ goto out;
}

/* fixup dn name */
@@ -529,7 +542,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
if (!cpu_name) {
free_cc_nodes(dn);
release_drc(drc_index);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out;
}

sprintf(cpu_name, "/cpus/%s", dn->full_name);
@@ -541,6 +555,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
release_drc(drc_index);

rc = online_node_cpus(dn);
+out:
+ cpu_hotplug_driver_unlock();

return rc ? -EINVAL : count;
}
@@ -562,6 +578,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
return -EINVAL;
}

+ cpu_hotplug_driver_lock();
rc = offline_node_cpus(dn);

if (rc)
@@ -570,7 +587,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
- return -EINVAL;
+ goto out;
}

rc = remove_device_tree_nodes(dn);
@@ -579,6 +596,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,

of_node_put(dn);
out:
+ cpu_hotplug_driver_unlock();
return rc ? -EINVAL : count;
}

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..07c3f05 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
ssize_t ret;

+ cpu_hotplug_driver_lock();
switch (buf[0]) {
case '0':
ret = cpu_down(cpu->sysdev.id);
@@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
default:
ret = -EINVAL;
}
+ cpu_hotplug_driver_unlock();

if (ret >= 0)
ret = count;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4753619..b0ad4e1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -115,6 +115,19 @@ extern void put_online_cpus(void);
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
int cpu_down(unsigned int cpu);

+#ifdef CONFIG_PPC_PSERIES
+extern void cpu_hotplug_driver_lock(void);
+extern void cpu_hotplug_driver_unlock(void);
+#else
+static inline void cpu_hotplug_driver_lock(void)
+{
+}
+
+static inline void cpu_hotplug_driver_unlock(void)
+{
+}
+#endif
+
#else /* CONFIG_HOTPLUG_CPU */

#define get_online_cpus() do { } while (0)

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

On Fri, Oct 30, 2009 at 10:52:43AM +0530, Gautham R Shenoy wrote:
> Hi,

Hi Peter,

Did you get a chance to look at this new design?

>
> This is version 5 of patch series that provides a framework to choose the
> state a pseries CPU must be put to when it is offlined.
>
> Previous versions can be found here:
> Version 4: http://lkml.org/lkml/2009/10/9/59
> Version 3: http://lkml.org/lkml/2009/9/15/164
> Version 2: http://lkml.org/lkml/2009/8/28/102
> Version 1: http://lkml.org/lkml/2009/8/6/236
>
> Changes from the previous version include:
> - Rebased against Nathan Fontenot's latest "pseries kernel handling of dynamic
> logical paritioning v4" patches found here:
> http://lkml.org/lkml/2009/10/21/98
>
> - Added boot-time option to disable putting the offlined vcpus into an
> extended H_CEDE state.
>
> - Addressed Ben's comments regarding the if-else sequencing in
> pseries_mach_cpu_die().
>
> - Addition of comments for pseries_cpu_die() to distinguish it from
> pseries_mach_cpu_die()
>
> Also,
>
> - This approach addresses Peter Z's objections regarding layering
> violations. The user simply offlines the cpu and doesn't worry about what
> state the CPU should be put into. That part is automatically handled by the
> kernel.
>
> - It does not add any additional sysfs interface instead uses the existing
> sysfs interface to offline CPUs.
>
> - On platforms which do not have support for ceding the vcpu with a
> latency specifier value, the offlining mechanism defaults to the current
> method of calling rtas_stop_self().
>
> The patchset has been tested on the available pseries platforms and it works
> as per the expectations. I believe that the patch set is ready for inclusion.
> ---
>
> Gautham R Shenoy (4):
> pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
> pseries: Add code to online/offline CPUs of a DLPAR node.
> pSeries: Add hooks to put the CPU into an appropriate offline state
> pSeries: extended_cede_processor() helper function.
>
>
> Documentation/cpu-hotplug.txt | 6 +
> arch/powerpc/include/asm/lppaca.h | 9 +
> arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
> arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
> arch/powerpc/platforms/pseries/offline_states.h | 18 ++
> arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
> arch/powerpc/platforms/pseries/smp.c | 19 ++
> arch/powerpc/xmon/xmon.c | 3
> drivers/base/cpu.c | 2
> include/linux/cpu.h | 13 ++
> 10 files changed, 387 insertions(+), 16 deletions(-)
> create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
>
> --
> Thanks and Regards
> gautham.

--
Thanks and Regards
gautham

2009-11-11 13:28:37

by Peter Zijlstra

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

On Fri, 2009-10-30 at 10:52 +0530, Gautham R Shenoy wrote:

> Gautham R Shenoy (4):
> pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
> pseries: Add code to online/offline CPUs of a DLPAR node.
> pSeries: Add hooks to put the CPU into an appropriate offline state
> pSeries: extended_cede_processor() helper function.
>
>
> Documentation/cpu-hotplug.txt | 6 +
> arch/powerpc/include/asm/lppaca.h | 9 +
> arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
> arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
> arch/powerpc/platforms/pseries/offline_states.h | 18 ++
> arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
> arch/powerpc/platforms/pseries/smp.c | 19 ++
> arch/powerpc/xmon/xmon.c | 3
> drivers/base/cpu.c | 2
> include/linux/cpu.h | 13 ++
> 10 files changed, 387 insertions(+), 16 deletions(-)
> create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

Not quite sure how you solved the DLPAR communication but since pretty
much everything is under arch/powerpc/ I really don't have much to say.

2009-11-11 21:36:04

by Benjamin Herrenschmidt

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

On Wed, 2009-11-11 at 14:25 +0100, Peter Zijlstra wrote:
> On Fri, 2009-10-30 at 10:52 +0530, Gautham R Shenoy wrote:
>
> > Gautham R Shenoy (4):
> > pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
> > pseries: Add code to online/offline CPUs of a DLPAR node.
> > pSeries: Add hooks to put the CPU into an appropriate offline state
> > pSeries: extended_cede_processor() helper function.
> >
> >
> > Documentation/cpu-hotplug.txt | 6 +
> > arch/powerpc/include/asm/lppaca.h | 9 +
> > arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
> > arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
> > arch/powerpc/platforms/pseries/offline_states.h | 18 ++
> > arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
> > arch/powerpc/platforms/pseries/smp.c | 19 ++
> > arch/powerpc/xmon/xmon.c | 3
> > drivers/base/cpu.c | 2
> > include/linux/cpu.h | 13 ++
> > 10 files changed, 387 insertions(+), 16 deletions(-)
> > create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
>
> Not quite sure how you solved the DLPAR communication but since pretty
> much everything is under arch/powerpc/ I really don't have much to say.

All right. I was hoping to have your ack for the drivers/base/cpu.c
change :-)

Cheers,
Ben.

2009-11-11 21:46:06

by Peter Zijlstra

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

On Thu, 2009-11-12 at 08:35 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2009-11-11 at 14:25 +0100, Peter Zijlstra wrote:
> > On Fri, 2009-10-30 at 10:52 +0530, Gautham R Shenoy wrote:
> >
> > > Gautham R Shenoy (4):
> > > pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
> > > pseries: Add code to online/offline CPUs of a DLPAR node.
> > > pSeries: Add hooks to put the CPU into an appropriate offline state
> > > pSeries: extended_cede_processor() helper function.
> > >
> > >
> > > Documentation/cpu-hotplug.txt | 6 +
> > > arch/powerpc/include/asm/lppaca.h | 9 +
> > > arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
> > > arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
> > > arch/powerpc/platforms/pseries/offline_states.h | 18 ++
> > > arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
> > > arch/powerpc/platforms/pseries/smp.c | 19 ++
> > > arch/powerpc/xmon/xmon.c | 3
> > > drivers/base/cpu.c | 2
> > > include/linux/cpu.h | 13 ++
> > > 10 files changed, 387 insertions(+), 16 deletions(-)
> > > create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
> >
> > Not quite sure how you solved the DLPAR communication but since pretty
> > much everything is under arch/powerpc/ I really don't have much to say.
>
> All right. I was hoping to have your ack for the drivers/base/cpu.c
> change :-)

Sure.

Acked-by: Peter Zijlstra <[email protected]>

2009-11-24 03:37:37

by Benjamin Herrenschmidt

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

On Fri, 2009-10-30 at 10:52 +0530, Gautham R Shenoy wrote:
> Hi,
>
> This is version 5 of patch series that provides a framework to choose the
> state a pseries CPU must be put to when it is offlined.
>
> Previous versions can be found here:
> Version 4: http://lkml.org/lkml/2009/10/9/59
> Version 3: http://lkml.org/lkml/2009/9/15/164
> Version 2: http://lkml.org/lkml/2009/8/28/102
> Version 1: http://lkml.org/lkml/2009/8/6/236
>
> Changes from the previous version include:
> - Rebased against Nathan Fontenot's latest "pseries kernel handling of dynamic
> logical partitioning v4" patches found here:
> http://lkml.org/lkml/2009/10/21/98

I can't merge them right now because, afaik, Nathan's patches are still
not quite ready. So either we get a final version of Nathan's patches by
something like tomorrow, or you need to rebase your series on current
-next by the end of the week. Otherwise I'm afraid it's going to miss the
next merge window.

Cheers,
Ben.

> - Added boot-time option to disable putting the offlined vcpus into an
> extended H_CEDE state.
>
> - Addressed Ben's comments regarding the if-else sequencing in
> pseries_mach_cpu_die().
>
> - Addition of comments for pseries_cpu_die() to distinguish it from
> pseries_mach_cpu_die()
>
> Also,
>
> - This approach addresses Peter Z's objections regarding layering
> violations. The user simply offlines the cpu and doesn't worry about what
> state the CPU should be put into. That part is automatically handled by the
> kernel.
>
> - It does not add any additional sysfs interface instead uses the existing
> sysfs interface to offline CPUs.
>
> - On platforms which do not have support for ceding the vcpu with a
> latency specifier value, the offlining mechanism defaults to the current
> method of calling rtas_stop_self().
>
> The patchset has been tested on the available pseries platforms and it works
> as per the expectations. I believe that the patch set is ready for inclusion.
> ---
>
> Gautham R Shenoy (4):
> pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
> pseries: Add code to online/offline CPUs of a DLPAR node.
> pSeries: Add hooks to put the CPU into an appropriate offline state
> pSeries: extended_cede_processor() helper function.
>
>
> Documentation/cpu-hotplug.txt | 6 +
> arch/powerpc/include/asm/lppaca.h | 9 +
> arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++
> arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-
> arch/powerpc/platforms/pseries/offline_states.h | 18 ++
> arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
> arch/powerpc/platforms/pseries/smp.c | 19 ++
> arch/powerpc/xmon/xmon.c | 3
> drivers/base/cpu.c | 2
> include/linux/cpu.h | 13 ++
> 10 files changed, 387 insertions(+), 16 deletions(-)
> create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
>

2009-11-24 05:25:50

by Vaidyanathan Srinivasan

Subject: Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

* Benjamin Herrenschmidt <[email protected]> [2009-11-24 14:35:09]:

> On Fri, 2009-10-30 at 10:52 +0530, Gautham R Shenoy wrote:
> > Hi,
> >
> > This is version 5 of patch series that provides a framework to choose the
> > state a pseries CPU must be put to when it is offlined.
> >
> > Previous versions can be found here:
> > Version 4: http://lkml.org/lkml/2009/10/9/59
> > Version 3: http://lkml.org/lkml/2009/9/15/164
> > Version 2: http://lkml.org/lkml/2009/8/28/102
> > Version 1: http://lkml.org/lkml/2009/8/6/236
> >
> > Changes from the previous version include:
> > - Rebased against Nathan Fontenot's latest "pseries kernel handling of dynamic
> > logical partitioning v4" patches found here:
> > http://lkml.org/lkml/2009/10/21/98
>
> I can't merge them right now because, afaik, Nathan's patches are still
> not quite ready. So either we get a final version of Nathan's patches by
> something like tomorrow, or you need to rebase your series on current
> -next by the end of the week. Otherwise I'm afraid it's going to miss the
> next merge window.

Hi Ben,

I checked with Nathan earlier and he mentioned that he is working on an
update. We can post a rebase on -next tomorrow, but this series depends
on Nathan's patch, so we will work with him on it. This feature is
important for the next merge window.

Thanks,
Vaidy

2009-11-26 05:13:23

by Anton Blanchard

Subject: Re: [PATCH v5 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state


Hi Gautham,

> + call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> + NULL,
> + CEDE_LATENCY_TOKEN,
> + __pa(cede_parameters),
> + CEDE_LATENCY_PARAM_MAX_LENGTH);
> +
> + if (call_status != 0)
> + printk(KERN_INFO "CEDE_LATENCY: \
> + %s %s Error calling get-system-parameter(0x%x)\n",
> + __FILE__, __func__, call_status);
> + else
> + printk(KERN_INFO "CEDE_LATENCY: \
> + get-system-parameter successful.\n");

I'm seeing this on a POWER6 box:


CEDE_LATENCY: arch/powerpc/platforms/pseries/hotplug-cpu.c parse_cede_parameters Error calling get-system-parameter(0xfffffffd)


Seems overly verbose. Do we need to print success/fail on this call? Perhaps
a single summary line instead, either

Cede latency not supported

if anything during initialisation failed, or

Cede latency supported

if it all worked.

Anton
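
The summary-line idea suggested above could look roughly like the following.
This is only a userspace sketch, not code from the patch: the helper names
cede_latency_summary() and report_cede_latency() are hypothetical, printf()
stands in for printk(KERN_INFO ...), and the status value is assumed to be
the return code of the get-system-parameter call, with 0 meaning success.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the posted patch): map the status of the
 * get-system-parameter call to one summary string. Any non-zero status is
 * treated as "not supported", as in the quoted code.
 */
static const char *cede_latency_summary(int call_status)
{
	return call_status == 0 ? "Cede latency supported"
				: "Cede latency not supported";
}

/*
 * Hypothetical helper: emit exactly one line for the whole initialisation.
 * In the kernel this would be a printk(KERN_INFO ...) instead of printf().
 */
static void report_cede_latency(int call_status)
{
	printf("%s\n", cede_latency_summary(call_status));
}
```

With the status from the log above (0xfffffffd, i.e. -3 as a signed 32-bit
value), report_cede_latency(-3) would print only the "not supported" line,
without the file and function names.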