Subject: [PATCH v4 0/4] pseries: Add cede support for cpu-offline

Hi,

This is version 4 of patch series that provides a framework to choose the
state a pseries CPU must be put to when it is offlined.

Previous versions can be found here:
Version 3: http://lkml.org/lkml/2009/9/15/164
Version 2: http://lkml.org/lkml/2009/8/28/102
Version 1: http://lkml.org/lkml/2009/8/6/236

This patch series differs considerably from the previous iterations in that:
- It is based on top of Nathan Fontenot's
"pseries kernel handling of dynamic logical partitioning v2" patches
found here: http://lkml.org/lkml/2009/9/18/173

- It does away with the sysfs tunables "available_hotplug_states" and
"currrent_hotplug_state", thereby refraining from exposing any platform
specific option to the userspace.

- It provides a framework that discovers the support for ceding the vcpu
to the hypervisor with a latency specifier value. When such a support is
present, it would put the offlined CPUs into a ceded state while
setting the cede latency specifier value in the VPA indicating
the latency expectation of the guest OS. Hypervisor may use this for
better energy savings.

- On platforms which don't have support for ceding with a latency specifier
hint, it puts the offlined CPUs into the rtas_stop_self() state.

- Currently, Nathan's patches have provided sysfs interfaces
/sys/devices/system/cpu/{probe, release}
to dynamically allocate and deallocate resources respectively. The current
process of allocating and deallocating CPUs would require two steps,
namely:
- Write to the "probe/release" file
- Write to the "online" file.

This patch-series combines these two operations such that when the user
writes to the "probe", the CPUs would not only get allocated to the
partition but also onlined. Similarly, when the user writes to "release",
the CPUs would be offlined before they are deallocated.

Thus, the user will now have to play with only one set of sysfs tunables
for performing deallocate and the sysfs tunable "online" can be used for
the purpose of deactivating the CPU while still letting the LPAR own it.

- It provides serializations for handling the writes the the "online" file
against the writes to "probe/release" pair so that at any given instant,
the user could either be deallocating the CPUs or deactivating it, but
not both.

With this approach, I hope that the objection regarding the layering
violation that was raised by Peter for previous approaches is
addressed, while also providing clean interfaces for the userspace tools
to distinguish between the deallocation and deactivation without exposing
any platform specific details.

The patchset has been tested on the available pseries platforms and it
works as per the expectations.

Comments are very much appreciated.
---

Gautham R Shenoy (4):
pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
pseries: Add code to online/offline CPUs of a DLPAR node.
pSeries: Add hooks to put the CPU into an appropriate offline state
pSeries: extended_cede_processor() helper function.


arch/powerpc/include/asm/lppaca.h | 9 +
arch/powerpc/platforms/pseries/dlpar.c | 129 ++++++++++++++++++-
arch/powerpc/platforms/pseries/hotplug-cpu.c | 157 ++++++++++++++++++++++-
arch/powerpc/platforms/pseries/offline_states.h | 18 +++
arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 +++
arch/powerpc/platforms/pseries/smp.c | 19 +++
arch/powerpc/xmon/xmon.c | 3
drivers/base/cpu.c | 2
include/linux/cpu.h | 13 ++
9 files changed, 356 insertions(+), 16 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

--
Thanks and Regards
gautham.


Subject: [PATCH v4 1/4] pSeries: extended_cede_processor() helper function.

This patch provides an extended_cede_processor() helper function
which takes the cede latency hint as an argument. This hint is to be passed
on to the hypervisor to cede to the corresponding state on platforms
which support it.

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/include/asm/lppaca.h | 9 ++++++++-
arch/powerpc/platforms/pseries/plpar_wrappers.h | 22 ++++++++++++++++++++++
arch/powerpc/xmon/xmon.c | 3 ++-
3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index f78f65c..14b592d 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
// Used to pass parms from the OS to PLIC for SetAsrAndRfid
u64 saved_gpr3; // Saved GPR3 x20-x27
u64 saved_gpr4; // Saved GPR4 x28-x2F
- u64 saved_gpr5; // Saved GPR5 x30-x37
+ union {
+ u64 saved_gpr5; /* Saved GPR5 x30-x37 */
+ struct {
+ u8 cede_latency_hint; /* x30 */
+ u8 reserved[7]; /* x31-x36 */
+ } fields;
+ } gpr5_dword;
+

u8 dtl_enable_mask; // Dispatch Trace Log mask x38-x38
u8 donate_dedicated_cpu; // Donate dedicated CPU cycles x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..0603c91 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,33 @@ static inline long poll_pending(void)
return plpar_hcall_norets(H_POLL_PENDING);
}

+static inline u8 get_cede_latency_hint(void)
+{
+ return get_lppaca()->gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+ get_lppaca()->gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
static inline long cede_processor(void)
{
return plpar_hcall_norets(H_CEDE);
}

+static inline long extended_cede_processor(unsigned long latency_hint)
+{
+ long rc;
+ u8 old_latency_hint = get_cede_latency_hint();
+
+ set_cede_latency_hint(latency_hint);
+ rc = cede_processor();
+ set_cede_latency_hint(old_latency_hint);
+
+ return rc;
+}
+
static inline long vpa_call(unsigned long flags, unsigned long cpu,
unsigned long vpa)
{
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c6f0a71..57124cf 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1623,7 +1623,8 @@ static void super_regs(void)
ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1);
printf(" Saved Gpr3=%.16lx Saved Gpr4=%.16lx \n",
ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4);
- printf(" Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5);
+ printf(" Saved Gpr5=%.16lx \n",
+ ptrLpPaca->gpr5_dword.saved_gpr5);
}
#endif

Subject: [PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state

When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 157 ++++++++++++++++++++++-
arch/powerpc/platforms/pseries/offline_states.h | 18 +++
arch/powerpc/platforms/pseries/smp.c | 19 +++
3 files changed, 185 insertions(+), 9 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index ebff6d9..8a47db4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
#include <asm/pSeries_reconfig.h>
#include "xics.h"
#include "plpar_wrappers.h"
+#include "offline_states.h"

/* This version can't take the spinlock, because it never returns */
static struct rtas_args rtas_stop_self_args = {
@@ -39,6 +40,37 @@ static struct rtas_args rtas_stop_self_args = {
.rets = &rtas_stop_self_args.args[0],
};

+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+ CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+ return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+ per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+ return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+ per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+ per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
static void rtas_stop_self(void)
{
struct rtas_args *args = &rtas_stop_self_args;
@@ -56,11 +88,66 @@ static void rtas_stop_self(void)

static void pseries_mach_cpu_die(void)
{
+ unsigned int cpu = smp_processor_id();
+ unsigned int hwcpu = hard_smp_processor_id();
+ u8 cede_latency_hint = 0;
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
- unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
- rtas_stop_self();
+
+ if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+ cede_latency_hint = 2;
+
+ get_lppaca()->idle = 1;
+ if (!get_lppaca()->shared_proc)
+ get_lppaca()->donate_dedicated_cpu = 1;
+
+ printk(KERN_INFO
+ "cpu %u (hwid %u) ceding for offline with hint %d\n",
+ cpu, hwcpu, cede_latency_hint);
+ while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ extended_cede_processor(cede_latency_hint);
+ printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+ cpu, hwcpu);
+ printk(KERN_INFO
+ "Decrementer value = %x Timebase value = %llx\n",
+ get_dec(), get_tb());
+ }
+
+ printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+ cpu, hwcpu);
+
+ if (!get_lppaca()->shared_proc)
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+ }
+
+ if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
+ unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+ /*
+ * NOTE: Calling start_secondary() here for now to
+ * start new context.
+ * However, need to do it cleanly by resetting the
+ * stack pointer.
+ */
+ start_secondary();
+ goto out_bug;
+
+ }
+
+ if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+ set_cpu_current_state(cpu, CPU_STATE_OFFLINE);
+ unregister_slb_shadow(hard_smp_processor_id(),
+ __pa(get_slb_shadow()));
+ rtas_stop_self();
+ goto out_bug;
+ }
+
+out_bug:
/* Should never get here... */
BUG();
for(;;);
@@ -109,15 +196,28 @@ static int pseries_cpu_disable(void)
static void pseries_cpu_die(unsigned int cpu)
{
int tries;
- int cpu_status;
+ int cpu_status = 1;
unsigned int pcpu = get_hard_smp_processor_id(cpu);

- for (tries = 0; tries < 25; tries++) {
- cpu_status = query_cpu_stopped(pcpu);
- if (cpu_status == 0 || cpu_status == -1)
- break;
- cpu_relax();
+ if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+ cpu_status = 1;
+ for (tries = 0; tries < 1000; tries++) {
+ if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
+ cpu_status = 0;
+ break;
+ }
+ cpu_relax();
+ }
+ } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+ for (tries = 0; tries < 25; tries++) {
+ cpu_status = query_cpu_stopped(pcpu);
+ if (cpu_status == 0 || cpu_status == -1)
+ break;
+ cpu_relax();
+ }
}
+
if (cpu_status != 0) {
printk("Querying DEAD? cpu %i (%i) shows %i\n",
cpu, pcpu, cpu_status);
@@ -252,10 +352,41 @@ static struct notifier_block pseries_smp_nb = {
.notifier_call = pseries_smp_notifier,
};

+#define MAX_CEDE_LATENCY_LEVELS 4
+#define CEDE_LATENCY_PARAM_LENGTH 10
+#define CEDE_LATENCY_PARAM_MAX_LENGTH \
+ (MAX_CEDE_LATENCY_LEVELS * CEDE_LATENCY_PARAM_LENGTH * sizeof(char))
+#define CEDE_LATENCY_TOKEN 45
+
+static char cede_parameters[CEDE_LATENCY_PARAM_MAX_LENGTH];
+
+static int parse_cede_parameters(void)
+{
+ int call_status;
+
+ memset(cede_parameters, 0, CEDE_LATENCY_PARAM_MAX_LENGTH);
+ call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+ NULL,
+ CEDE_LATENCY_TOKEN,
+ __pa(cede_parameters),
+ CEDE_LATENCY_PARAM_MAX_LENGTH);
+
+ if (call_status != 0)
+ printk(KERN_INFO "CEDE_LATENCY: \
+ %s %s Error calling get-system-parameter(0x%x)\n",
+ __FILE__, __func__, call_status);
+ else
+ printk(KERN_INFO "CEDE_LATENCY: \
+ get-system-parameter successful.\n");
+
+ return call_status;
+}
+
static int __init pseries_cpu_hotplug_init(void)
{
struct device_node *np;
const char *typep;
+ int cpu;

for_each_node_by_name(np, "interrupt-controller") {
typep = of_get_property(np, "compatible", NULL);
@@ -283,8 +414,16 @@ static int __init pseries_cpu_hotplug_init(void)
smp_ops->cpu_die = pseries_cpu_die;

/* Processors can be added/removed only on LPAR */
- if (firmware_has_feature(FW_FEATURE_LPAR))
+ if (firmware_has_feature(FW_FEATURE_LPAR)) {
pSeries_reconfig_notifier_register(&pseries_smp_nb);
+ cpu_maps_update_begin();
+ if (parse_cede_parameters() == 0) {
+ default_offline_state = CPU_STATE_INACTIVE;
+ for_each_online_cpu(cpu)
+ set_default_offline_state(cpu);
+ }
+ cpu_maps_update_done();
+ }

return 0;
}
diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h
new file mode 100644
index 0000000..22574e0
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_states.h
@@ -0,0 +1,18 @@
+#ifndef _OFFLINE_STATES_H_
+#define _OFFLINE_STATES_H_
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+ CPU_STATE_OFFLINE,
+ CPU_STATE_INACTIVE,
+ CPU_STATE_ONLINE,
+ CPU_MAX_OFFLINE_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern void set_default_offline_state(int cpu);
+extern int start_secondary(void);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 440000c..8868c01 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
#include "plpar_wrappers.h"
#include "pseries.h"
#include "xics.h"
+#include "offline_states.h"


/*
@@ -84,6 +85,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
/* Fixup atomic count: it exited inside IRQ handler. */
task_thread_info(paca[lcpu].__current)->preempt_count = 0;

+ if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
+ goto out;
+
/*
* If the RTAS start-cpu token does not exist then presume the
* cpu is already spinning.
@@ -98,6 +102,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
return 0;
}

+out:
return 1;
}

@@ -111,12 +116,16 @@ static void __devinit smp_xics_setup_cpu(int cpu)
vpa_init(cpu);

cpu_clear(cpu, of_spin_map);
+ set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+ set_default_offline_state(cpu);

}
#endif /* CONFIG_XICS */

static void __devinit smp_pSeries_kick_cpu(int nr)
{
+ long rc;
+ unsigned long hcpuid;
BUG_ON(nr < 0 || nr >= NR_CPUS);

if (!smp_startup_cpu(nr))
@@ -128,6 +137,16 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
* the processor will continue on to secondary_start
*/
paca[nr].cpu_start = 1;
+
+ set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+ if (get_cpu_current_state(nr) == CPU_STATE_INACTIVE) {
+ hcpuid = get_hard_smp_processor_id(nr);
+ rc = plpar_hcall_norets(H_PROD, hcpuid);
+ if (rc != H_SUCCESS)
+ panic("Error: Prod to wake up processor %d Ret= %ld\n",
+ nr, rc);
+ }
}

static int smp_pSeries_cpu_bootable(unsigned int nr)

Subject: [PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node.

Currently the cpu-allocation/deallocation on pSeries is a
two step process from the Userspace.

- Set the indicators and update the device tree by writing to the sysfs
tunable "probe" during allocation and "release" during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
writing to the sysfs tunable "online".

This patch adds kernel code to online/offline the CPUs soon_after/just_before
they have been allocated/would_be_deallocated. This way, the userspace tool
that performs DLPAR operations would only have to deal with one set of sysfs
tunables namely "probe" and release".

Signed-off-by: Gautham R Shenoy <[email protected]>
Signed-off-by: Nathan Fontenot <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 103 ++++++++++++++++++++++++++++++++
1 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 84d156b..9752386 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -20,7 +20,7 @@
#include <linux/sysdev.h>
#include <linux/sysfs.h>
#include <linux/cpu.h>
-
+#include "offline_states.h"

#include <asm/prom.h>
#include <asm/machdep.h>
@@ -357,6 +357,98 @@ int remove_device_tree_nodes(struct device_node *dn)
return rc;
}

+int online_node_cpus(struct device_node *dn)
+{
+ int rc = 0;
+ unsigned int cpu;
+ int len, nthreads, i;
+ const u32 *intserv;
+
+ intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+ if (!intserv)
+ return -EINVAL;
+
+ nthreads = len / sizeof(u32);
+
+ cpu_maps_update_begin();
+ for (i = 0; i < nthreads; i++) {
+ for_each_present_cpu(cpu) {
+ if (get_hard_smp_processor_id(cpu) != intserv[i])
+ continue;
+ BUG_ON(get_cpu_current_state(cpu)
+ != CPU_STATE_OFFLINE);
+ cpu_maps_update_done();
+ rc = cpu_up(cpu);
+ if (rc)
+ goto out;
+ cpu_maps_update_begin();
+
+ break;
+ }
+ if (cpu == num_possible_cpus())
+ printk(KERN_WARNING "Could not find cpu to online "
+ "with physical id 0x%x\n", intserv[i]);
+ }
+ cpu_maps_update_done();
+
+out:
+ return rc;
+
+}
+
+int offline_node_cpus(struct device_node *dn)
+{
+ int rc = 0;
+ unsigned int cpu;
+ int len, nthreads, i;
+ const u32 *intserv;
+
+ intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+ if (!intserv)
+ return -EINVAL;
+
+ nthreads = len / sizeof(u32);
+
+ cpu_maps_update_begin();
+ for (i = 0; i < nthreads; i++) {
+ for_each_present_cpu(cpu) {
+ if (get_hard_smp_processor_id(cpu) != intserv[i])
+ continue;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
+ break;
+
+ if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
+ cpu_maps_update_done();
+ rc = cpu_down(cpu);
+ if (rc)
+ goto out;
+ cpu_maps_update_begin();
+ break;
+
+ }
+
+ /*
+ * The cpu is in CPU_STATE_INACTIVE.
+ * Upgrade it's state to CPU_STATE_OFFLINE.
+ */
+ set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
+ BUG_ON(plpar_hcall_norets(H_PROD, intserv[i])
+ != H_SUCCESS);
+ __cpu_die(cpu);
+ break;
+ }
+ if (cpu == num_possible_cpus())
+ printk(KERN_WARNING "Could not find cpu to offline "
+ "with physical id 0x%x\n", intserv[i]);
+ }
+ cpu_maps_update_done();
+
+out:
+ return rc;
+
+}
+
#define DR_ENTITY_SENSE 9003
#define DR_ENTITY_PRESENT 1
#define DR_ENTITY_UNUSABLE 2
@@ -591,6 +683,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
if (rc)
release_drc(drc_index);

+ rc = online_node_cpus(dn);
+
return rc ? rc : count;
}

@@ -611,6 +705,11 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
return -EINVAL;
}

+ rc = offline_node_cpus(dn);
+
+ if (rc)
+ goto out;
+
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
@@ -622,6 +721,8 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
acquire_drc(*drc_index);

of_node_put(dn);
+
+out:
return rc ? rc : count;
}

Subject: [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate

Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables "probe" during allocation
and "release" during deallocation.

At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable "online".

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the "online" sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the "probe/release" sysfs
tunable with the writes to the "online" sysfs tunable.

This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES

Signed-off-by: Gautham R Shenoy <[email protected]>
---
arch/powerpc/platforms/pseries/dlpar.c | 26 ++++++++++++++++++++++----
drivers/base/cpu.c | 2 ++
include/linux/cpu.h | 13 +++++++++++++
3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 9752386..fc261e6 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -644,6 +644,18 @@ static ssize_t memory_release_store(struct class *class, const char *buf,
return rc ? -1 : count;
}

+static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+ mutex_lock(&pseries_cpu_hotplug_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+ mutex_unlock(&pseries_cpu_hotplug_mutex);
+}
+
static ssize_t cpu_probe_store(struct class *class, const char *buf,
size_t count)
{
@@ -656,14 +668,15 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
if (rc)
return -EINVAL;

+ cpu_hotplug_driver_lock();
rc = acquire_drc(drc_index);
if (rc)
- return rc;
+ goto out;

dn = configure_connector(drc_index);
if (!dn) {
release_drc(drc_index);
- return rc;
+ goto out;
}

/* fixup dn name */
@@ -672,7 +685,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
if (!cpu_name) {
free_cc_nodes(dn);
release_drc(drc_index);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out;
}

sprintf(cpu_name, "/cpus/%s", dn->full_name);
@@ -684,6 +698,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
release_drc(drc_index);

rc = online_node_cpus(dn);
+out:
+ cpu_hotplug_driver_unlock();

return rc ? rc : count;
}
@@ -705,6 +721,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
return -EINVAL;
}

+ cpu_hotplug_driver_lock();
rc = offline_node_cpus(dn);

if (rc)
@@ -713,7 +730,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
- return rc;
+ goto out;
}

rc = remove_device_tree_nodes(dn);
@@ -723,6 +740,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
of_node_put(dn);

out:
+ cpu_hotplug_driver_unlock();
return rc ? rc : count;
}

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..07c3f05 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
ssize_t ret;

+ cpu_hotplug_driver_lock();
switch (buf[0]) {
case '0':
ret = cpu_down(cpu->sysdev.id);
@@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
default:
ret = -EINVAL;
}
+ cpu_hotplug_driver_unlock();

if (ret >= 0)
ret = count;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4753619..b0ad4e1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -115,6 +115,19 @@ extern void put_online_cpus(void);
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
int cpu_down(unsigned int cpu);

+#ifdef CONFIG_PPC_PSERIES
+extern void cpu_hotplug_driver_lock(void);
+extern void cpu_hotplug_driver_unlock(void);
+#else
+static inline void cpu_hotplug_driver_lock(void)
+{
+}
+
+static inline void cpu_hotplug_driver_unlock(void)
+{
+}
+#endif
+
#else /* CONFIG_HOTPLUG_CPU */

#define get_online_cpus() do { } while (0)

2009-10-27 03:25:38

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate

On Fri, 2009-10-09 at 14:01 +0530, Gautham R Shenoy wrote:
> Currently the cpu-allocation/deallocation process comprises of two steps:
> - Set the indicators and to update the device tree with DLPAR node
> information.
>
> - Online/offline the allocated/deallocated CPU.
>
> This is achieved by writing to the sysfs tunables "probe" during allocation
> and "release" during deallocation.
>
> At the sametime, the userspace can independently online/offline the CPUs of
> the system using the sysfs tunable "online".
>
> It is quite possible that when a userspace tool offlines a CPU
> for the purpose of deallocation and is in the process of updating the device
> tree, some other userspace tool could bring the CPU back online by writing to
> the "online" sysfs tunable thereby causing the deallocate process to fail.
>
> The solution to this is to serialize writes to the "probe/release" sysfs
> tunable with the writes to the "online" sysfs tunable.
>
> This patch employs a mutex to provide this serialization, which is a no-op on
> all architectures except PPC_PSERIES
>
> Signed-off-by: Gautham R Shenoy <[email protected]>

Peter, did you get a chance to review this one ?

Cheers,
Ben.

> ---
> arch/powerpc/platforms/pseries/dlpar.c | 26 ++++++++++++++++++++++----
> drivers/base/cpu.c | 2 ++
> include/linux/cpu.h | 13 +++++++++++++
> 3 files changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 9752386..fc261e6 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -644,6 +644,18 @@ static ssize_t memory_release_store(struct class *class, const char *buf,
> return rc ? -1 : count;
> }
>
> +static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
> +
> +void cpu_hotplug_driver_lock()
> +{
> + mutex_lock(&pseries_cpu_hotplug_mutex);
> +}
> +
> +void cpu_hotplug_driver_unlock()
> +{
> + mutex_unlock(&pseries_cpu_hotplug_mutex);
> +}
> +
> static ssize_t cpu_probe_store(struct class *class, const char *buf,
> size_t count)
> {
> @@ -656,14 +668,15 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
> if (rc)
> return -EINVAL;
>
> + cpu_hotplug_driver_lock();
> rc = acquire_drc(drc_index);
> if (rc)
> - return rc;
> + goto out;
>
> dn = configure_connector(drc_index);
> if (!dn) {
> release_drc(drc_index);
> - return rc;
> + goto out;
> }
>
> /* fixup dn name */
> @@ -672,7 +685,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
> if (!cpu_name) {
> free_cc_nodes(dn);
> release_drc(drc_index);
> - return -ENOMEM;
> + rc = -ENOMEM;
> + goto out;
> }
>
> sprintf(cpu_name, "/cpus/%s", dn->full_name);
> @@ -684,6 +698,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
> release_drc(drc_index);
>
> rc = online_node_cpus(dn);
> +out:
> + cpu_hotplug_driver_unlock();
>
> return rc ? rc : count;
> }
> @@ -705,6 +721,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
> return -EINVAL;
> }
>
> + cpu_hotplug_driver_lock();
> rc = offline_node_cpus(dn);
>
> if (rc)
> @@ -713,7 +730,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
> rc = release_drc(*drc_index);
> if (rc) {
> of_node_put(dn);
> - return rc;
> + goto out;
> }
>
> rc = remove_device_tree_nodes(dn);
> @@ -723,6 +740,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
> of_node_put(dn);
>
> out:
> + cpu_hotplug_driver_unlock();
> return rc ? rc : count;
> }
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index e62a4cc..07c3f05 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
> struct cpu *cpu = container_of(dev, struct cpu, sysdev);
> ssize_t ret;
>
> + cpu_hotplug_driver_lock();
> switch (buf[0]) {
> case '0':
> ret = cpu_down(cpu->sysdev.id);
> @@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
> default:
> ret = -EINVAL;
> }
> + cpu_hotplug_driver_unlock();
>
> if (ret >= 0)
> ret = count;
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 4753619..b0ad4e1 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -115,6 +115,19 @@ extern void put_online_cpus(void);
> #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
> int cpu_down(unsigned int cpu);
>
> +#ifdef CONFIG_PPC_PSERIES
> +extern void cpu_hotplug_driver_lock(void);
> +extern void cpu_hotplug_driver_unlock(void);
> +#else
> +static inline void cpu_hotplug_driver_lock(void)
> +{
> +}
> +
> +static inline void cpu_hotplug_driver_unlock(void)
> +{
> +}
> +#endif
> +
> #else /* CONFIG_HOTPLUG_CPU */
>
> #define get_online_cpus() do { } while (0)
>
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev