From: "Gautham R. Shenoy" <[email protected]>
Hi,
This is the second version of the patchset containing the fixes to
make CPU-Hotplug working on correctly on POWER9 DD1 systems.
The earlier version of this patchset can be found here:
https://lkml.org/lkml/2017/3/13/46
This patch addresses the feedback provided by Nicholas Piggin. The key
changes from the earlier version are:
- Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.
- Adds a more descriptive warning describing the consequences of no
suitable default/deepest stop states being available.
- Update the commit log for "powernv:Recover correct PACA on wakeup
from a stop on P9 DD1" describing the problem that it is trying to
solve. Also ensure that on P9 DD1, NVGPRs are restored from the
stack on the way out by setting the NAPSTATELOST in paca.
I have retained Nicholas Piggin's Reviewed-by tag for the second patch
in this series (was first patch in the previous series) even though
the file containing the code is different.
There are four patches in the series.
- The first patch moves the platform-idle code invoked when a CPU offlined from powernv/smp.c to powernv/idle.c
- The second adds a fallback mechanism for CPU-Hotplug when no
platform idle state is available.
- The third patch ensures that the kernel doesn't use any stop state
that is not exposed by the firmware.
- The fourth patch adds a recovery framework for correctly recovering
paca pointer of the thread waking up from a stop.
These patches are based on v4.11-rc3.
The patches have been tested with stop1 (ESL=EC=1) as the
deepest-state entered into during CPU-Hotplug.
Gautham R. Shenoy (4):
powernv: Move CPU-Offline idle state invocation from smp.c to idle.c
powernv:smp: Add busy-wait loop as fall back for CPU-Hotplug
powernv:idle: Don't override default/deepest directly in kernel
powernv: Recover correct PACA on wakeup from a stop on P9 DD1
arch/powerpc/include/asm/cpuidle.h | 1 +
arch/powerpc/include/asm/paca.h | 5 ++
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/idle_book3s.S | 49 +++++++++++++++++-
arch/powerpc/platforms/powernv/idle.c | 88 ++++++++++++++++++++++++++------
arch/powerpc/platforms/powernv/powernv.h | 2 -
arch/powerpc/platforms/powernv/smp.c | 18 +------
7 files changed, 129 insertions(+), 35 deletions(-)
--
1.9.4
From: "Gautham R. Shenoy" <[email protected]>
POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the
core. Thus the HSPRG0 of a thread waking up from can contain the paca
pointer of its sibling.
This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
threads's paca pointers. Basically, we should be able to arrive at the
right paca pointer from any of the thread's existing paca pointer.
At bootup, during powernv idle-init, we save the paca address of every
CPU in each one its siblings paca_struct in the slot corresponding to
this CPU's index in the core.
On wakeup from a stop, the thread will determine its index in the core
from the lower 2 bits of the PIR register and recover its PACA pointer
by indexing into the correct slot in the provisioned space in the
current PACA.
Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting the NAPSTATELOST in paca.
[Changelog written with inputs from [email protected]]
Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/include/asm/paca.h | 5 ++++
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/idle_book3s.S | 49 ++++++++++++++++++++++++++++++++++-
arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++
4 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 708c3e5..4405630 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -172,6 +172,11 @@ struct paca_struct {
u8 thread_mask;
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
+ /*
+ * Pointer to an array which contains pointer
+ * to the sibling threads' paca.
+ */
+ struct paca_struct *thread_sibling_pacas[8];
#endif
#ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4367e7d..6ec5016 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -727,6 +727,7 @@ int main(void)
OFFSET(PACA_THREAD_IDLE_STATE, paca_struct, thread_idle_state);
OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
+ OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
#endif
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 9957287..c2f2cfb 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -375,6 +375,46 @@ _GLOBAL(power9_idle_stop)
li r4,1
b pnv_powersave_common
/* No return */
+
+
+/*
+ * On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
+ * HSPRG0 will be set to the HSPRG0 value of one of the
+ * threads in this core. Thus the value we have in r13
+ * may not be this thread's paca pointer.
+ *
+ * Fortunately, the PIR remains invariant. Since this thread's
+ * paca pointer is recorded in all its sibling's paca, we can
+ * correctly recover this thread's paca pointer if we
+ * know the index of this thread in the core.
+ *
+ * This index can be obtained from the lower two bits of the PIR.
+ *
+ * i.e, thread's position in the core = PIR[62:63].
+ * If this value is i, then this thread's paca is
+ * paca->thread_sibling_pacas[i].
+ */
+power9_dd1_recover_paca:
+ mfspr r4, SPRN_PIR
+ clrldi r4, r4, 62
+ /*
+ * Since each entry in thread_sibling_pacas is 8 bytes
+ * we need to left-shift by 3 bits. Thus r4 = i * 8
+ */
+ sldi r4, r4, 3
+ /* Get &paca->thread_sibling_pacas[0] in r5 */
+ addi r5, r13, PACA_SIBLING_PACA_PTRS
+ /* Load paca->thread_sibling_pacas[i] into r13 */
+ ldx r13, r4, r5
+ SET_PACA(r13)
+ /*
+ * Indicate that we have lost NVGPR state
+ * which needs to be restored from the stack.
+ */
+ li r3, 1
+ stb r0,PACA_NAPSTATELOST(r13)
+ blr
+
/*
* Called from reset vector. Check whether we have woken up with
* hypervisor state loss. If yes, restore hypervisor state and return
@@ -385,7 +425,14 @@ _GLOBAL(power9_idle_stop)
*/
_GLOBAL(pnv_restore_hyp_resource)
BEGIN_FTR_SECTION
- ld r2,PACATOC(r13);
+BEGIN_FTR_SECTION_NESTED(70)
+ mflr r6
+ bl power9_dd1_recover_paca
+ mtlr r6
+ ld r2, PACATOC(r13)
+FTR_SECTION_ELSE_NESTED(70)
+ ld r2, PACATOC(r13)
+ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
/*
* POWER ISA 3. Use PSSCR to determine if we
* are waking up from deep idle state
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 63ade78..5790d4c 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -560,6 +560,28 @@ static int __init pnv_init_idle_states(void)
pnv_alloc_idle_core_states();
+ /*
+ * For each CPU, record its PACA address in each of it's
+ * sibling thread's PACA at the slot corresponding to this
+ * CPU's index in the core.
+ */
+ if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+ int cpu;
+
+ pr_info("powernv: idle: Saving PACA pointers of all CPUs in their thread sibling PACA\n");
+ for_each_possible_cpu(cpu) {
+ int base_cpu = cpu_first_thread_sibling(cpu);
+ int idx = cpu_thread_in_core(cpu);
+ int i;
+
+ for (i = 0; i < threads_per_core; i++) {
+ int j = base_cpu + i;
+
+ paca[j].thread_sibling_pacas[idx] = &paca[cpu];
+ }
+ }
+ }
+
if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
ppc_md.power_save = power7_idle;
--
1.9.4
From: "Gautham R. Shenoy" <[email protected]>
Currently, the powernv cpu-offline function assumes that platform idle
states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
available. On POWER8, it picks nap as the default state if other deep
idle states like sleep/winkle are not available and enabled in the
platform.
On POWER9, nap is not available and all idle states are managed by
STOP instruction. The parameters to the idle state are passed through
processor stop status control register (PSSCR). Hence as such
executing STOP would take parameters from current PSSCR. We do not
want to make any assumptions in kernel on what STOP states and PSSCR
features are configured by the platform.
Ideally platform will configure a good set of stop states that can be
used in the kernel. We would like to start with a clean slate, if the
platform choose to not configure any state or there is an error in
platform firmware that lead to no stop states being configured or
allowed to be requested.
This patch adds a fallback method for CPU-Hotplug that is similar to
snooze loop at idle where the threads are left to spin at low priority
and hence reduce the cycles consumed.
This is a safe fallback mechanism in the case when no stop state would
be requested if the platform firmware did not configure them most
likely due to an error condition.
Requesting a stop state when the platform has not configured them or
enabled them would lead to further error conditions which could be
difficult to debug.
[Changelog written with inputs from [email protected]]
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/platforms/powernv/idle.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 419edff..f335e0f 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -283,8 +283,16 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
(idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
srr1 = power7_sleep();
- } else {
+ } else if (idle_states & OPAL_PM_NAP_ENABLED) {
srr1 = power7_nap(1);
+ } else {
+ /* This is the fallback method. We emulate snooze */
+ while (!generic_check_cpu_restart(cpu)) {
+ HMT_low();
+ HMT_very_low();
+ }
+ srr1 = 0;
+ HMT_medium();
}
return srr1;
--
1.9.4
From: "Gautham R. Shenoy" <[email protected]>
Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.
Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/include/asm/cpuidle.h | 1 +
arch/powerpc/platforms/powernv/idle.c | 25 +++++++++++++++++++++++++
arch/powerpc/platforms/powernv/powernv.h | 2 --
arch/powerpc/platforms/powernv/smp.c | 18 ++----------------
4 files changed, 28 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 1557315..4649ca0 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -46,6 +46,7 @@
extern u64 pnv_first_deep_stop_state;
+unsigned long pnv_cpu_offline(unsigned int cpu);
int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
static inline void report_invalid_psscr_val(u64 psscr_val, int err)
{
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 4ee837e..419edff 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -266,6 +266,31 @@ static void power9_idle(void)
u64 pnv_deepest_stop_psscr_mask;
/*
+ * pnv_cpu_offline: A function that puts the CPU into the deepest
+ * available platform idle state on a CPU-Offline.
+ */
+unsigned long pnv_cpu_offline(unsigned int cpu)
+{
+ unsigned long srr1;
+
+ u32 idle_states = pnv_get_supported_cpuidle_states();
+
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
+ pnv_deepest_stop_psscr_mask);
+ } else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
+ srr1 = power7_winkle();
+ } else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
+ (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
+ srr1 = power7_sleep();
+ } else {
+ srr1 = power7_nap(1);
+ }
+
+ return srr1;
+}
+
+/*
* Power ISA 3.0 idle initialization.
*
* POWER ISA 3.0 defines a new SPR Processor stop Status and Control
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6130522..6dbc0a1 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -18,8 +18,6 @@ static inline void pnv_pci_shutdown(void) { }
#endif
extern u32 pnv_get_supported_cpuidle_states(void);
-extern u64 pnv_deepest_stop_psscr_val;
-extern u64 pnv_deepest_stop_psscr_mask;
extern void pnv_lpc_init(void);
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 8b67e1e..914b456 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -35,6 +35,7 @@
#include <asm/dbell.h>
#include <asm/kvm_ppc.h>
#include <asm/ppc-opcode.h>
+#include <asm/cpuidle.h>
#include "powernv.h"
@@ -140,7 +141,6 @@ static void pnv_smp_cpu_kill_self(void)
{
unsigned int cpu;
unsigned long srr1, wmask;
- u32 idle_states;
/* Standard hot unplug procedure */
local_irq_disable();
@@ -155,8 +155,6 @@ static void pnv_smp_cpu_kill_self(void)
if (cpu_has_feature(CPU_FTR_ARCH_207S))
wmask = SRR1_WAKEMASK_P8;
- idle_states = pnv_get_supported_cpuidle_states();
-
/* We don't want to take decrementer interrupts while we are offline,
* so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9)
* enabled as to let IPIs in.
@@ -184,19 +182,7 @@ static void pnv_smp_cpu_kill_self(void)
kvmppc_set_host_ipi(cpu, 0);
ppc64_runlatch_off();
-
- if (cpu_has_feature(CPU_FTR_ARCH_300)) {
- srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
- pnv_deepest_stop_psscr_mask);
- } else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
- srr1 = power7_winkle();
- } else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
- (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
- srr1 = power7_sleep();
- } else {
- srr1 = power7_nap(1);
- }
-
+ srr1 = pnv_cpu_offline(cpu);
ppc64_runlatch_on();
/*
--
1.9.4
From: "Gautham R. Shenoy" <[email protected]>
Currently during idle-init on power9, if we don't find suitable stop
states in the device tree that can be used as the
default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
stop state psscr to be used by power9_idle and deepest stop state
which is used by CPU-Hotplug.
However, if the platform firmware has not configured or enabled a stop
state, the kernel should not make any assumptions and fallback to a
default choice.
If the kernel uses a stop state that is not configured by the platform
firmware, it may lead to further failures which should be avoided.
In this patch, we modify the init code to ensure that the kernel uses
only the stop states exposed by the firmware through the device
tree. When a suitable default stop state isn't found, we disable
ppc_md.power_save for power9. Similarly, when a suitable
deepest_stop_state is not found in the device tree exported by the
firmware, fall back to the default busy-wait loop in the CPU-Hotplug
code.
[Changelog written with inputs from [email protected]]
Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/platforms/powernv/idle.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index f335e0f..63ade78 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -147,7 +147,6 @@ u32 pnv_get_supported_cpuidle_states(void)
}
EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
-
static void pnv_fastsleep_workaround_apply(void *info)
{
@@ -241,8 +240,9 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
* The default stop state that will be used by ppc_md.power_save
* function on platforms that support stop instruction.
*/
-u64 pnv_default_stop_val;
-u64 pnv_default_stop_mask;
+static u64 pnv_default_stop_val;
+static u64 pnv_default_stop_mask;
+static bool default_stop_found;
/*
* Used for ppc_md.power_save which needs a function with no parameters
@@ -262,8 +262,9 @@ static void power9_idle(void)
* psscr value and mask of the deepest stop idle state.
* Used when a cpu is offlined.
*/
-u64 pnv_deepest_stop_psscr_val;
-u64 pnv_deepest_stop_psscr_mask;
+static u64 pnv_deepest_stop_psscr_val;
+static u64 pnv_deepest_stop_psscr_mask;
+static bool deepest_stop_found;
/*
* pnv_cpu_offline: A function that puts the CPU into the deepest
@@ -275,7 +276,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
u32 idle_states = pnv_get_supported_cpuidle_states();
- if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
@@ -385,7 +386,6 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
int rc = 0, i;
- bool default_stop_found = false, deepest_stop_found = false;
psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
@@ -465,21 +465,24 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
}
}
- if (!default_stop_found) {
- pnv_default_stop_val = PSSCR_HV_DEFAULT_VAL;
- pnv_default_stop_mask = PSSCR_HV_DEFAULT_MASK;
- pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n",
+ if (unlikely(!default_stop_found)) {
+ pr_warn("cpuidle-powernv: No suitable default stop state found. Disabling platform idle.\n");
+ } else {
+ ppc_md.power_save = power9_idle;
+ pr_info("cpuidle-powernv: Default stop: psscr = 0x%016llx,mask=0x%016llx\n",
pnv_default_stop_val, pnv_default_stop_mask);
}
- if (!deepest_stop_found) {
- pnv_deepest_stop_psscr_val = PSSCR_HV_DEFAULT_VAL;
- pnv_deepest_stop_psscr_mask = PSSCR_HV_DEFAULT_MASK;
- pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n",
+ if (unlikely(!deepest_stop_found)) {
+ pr_warn("cpuidle-powernv: No suitable stop state for CPU-Hotplug. Offlined CPUs will busy wait");
+ } else {
+ pr_info("cpuidle-powernv: Deepest stop: psscr = 0x%016llx,mask=0x%016llx\n",
pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
}
+ pr_info("cpuidle-powernv: Requested Level (RL) value of first deep stop = 0x%llx\n",
+ pnv_first_deep_stop_state);
out:
kfree(psscr_val);
kfree(psscr_mask);
@@ -559,8 +562,6 @@ static int __init pnv_init_idle_states(void)
if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
ppc_md.power_save = power7_idle;
- else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
- ppc_md.power_save = power9_idle;
out:
return 0;
--
1.9.4
On Mon, 20 Mar 2017 21:24:15 +0530
"Gautham R. Shenoy" <[email protected]> wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
> transitions the CPU to the deepest available platform idle state to a
> new function named pnv_cpu_offline() in powernv/idle.c. The rationale
> behind this code movement is that the data required to determine the
> deepest available platform state resides in powernv/idle.c.
>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
Looks good. As a nitpick, possibly one or two variables in idle.c could
become static (pnv_deepest_stop_psscr_*).
Reviewed-by: Nicholas Piggin <[email protected]>
On Mon, 20 Mar 2017 21:24:17 +0530
"Gautham R. Shenoy" <[email protected]> wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> Currently during idle-init on power9, if we don't find suitable stop
> states in the device tree that can be used as the
> default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
> stop state psscr to be used by power9_idle and deepest stop state
> which is used by CPU-Hotplug.
>
> However, if the platform firmware has not configured or enabled a stop
> state, the kernel should not make any assumptions and fallback to a
> default choice.
>
> If the kernel uses a stop state that is not configured by the platform
> firmware, it may lead to further failures which should be avoided.
>
> In this patch, we modify the init code to ensure that the kernel uses
> only the stop states exposed by the firmware through the device
> tree. When a suitable default stop state isn't found, we disable
> ppc_md.power_save for power9. Similarly, when a suitable
> deepest_stop_state is not found in the device tree exported by the
> firmware, fall back to the default busy-wait loop in the CPU-Hotplug
> code.
>
> [Changelog written with inputs from [email protected]]
>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> arch/powerpc/platforms/powernv/idle.c | 35 ++++++++++++++++++-----------------
> 1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index f335e0f..63ade78 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -147,7 +147,6 @@ u32 pnv_get_supported_cpuidle_states(void)
> }
> EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
>
> -
> static void pnv_fastsleep_workaround_apply(void *info)
>
> {
> @@ -241,8 +240,9 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
> * The default stop state that will be used by ppc_md.power_save
> * function on platforms that support stop instruction.
> */
> -u64 pnv_default_stop_val;
> -u64 pnv_default_stop_mask;
> +static u64 pnv_default_stop_val;
> +static u64 pnv_default_stop_mask;
> +static bool default_stop_found;
>
> /*
> * Used for ppc_md.power_save which needs a function with no parameters
> @@ -262,8 +262,9 @@ static void power9_idle(void)
> * psscr value and mask of the deepest stop idle state.
> * Used when a cpu is offlined.
> */
> -u64 pnv_deepest_stop_psscr_val;
> -u64 pnv_deepest_stop_psscr_mask;
> +static u64 pnv_deepest_stop_psscr_val;
> +static u64 pnv_deepest_stop_psscr_mask;
> +static bool deepest_stop_found;
Aha you have made them static. Nitpick withdrawn :)
The log messages look good now.
Reviewed-by: Nicholas Piggin <[email protected]>
On Mon, 20 Mar 2017 21:24:18 +0530
"Gautham R. Shenoy" <[email protected]> wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
> waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the
> core. Thus the HSPRG0 of a thread waking up from can contain the paca
> pointer of its sibling.
>
> This patch implements a context recovery framework within threads of a
> core, by provisioning space in paca_struct for saving every sibling
> threads's paca pointers. Basically, we should be able to arrive at the
> right paca pointer from any of the thread's existing paca pointer.
>
> At bootup, during powernv idle-init, we save the paca address of every
> CPU in each one its siblings paca_struct in the slot corresponding to
> this CPU's index in the core.
>
> On wakeup from a stop, the thread will determine its index in the core
> from the lower 2 bits of the PIR register and recover its PACA pointer
> by indexing into the correct slot in the provisioned space in the
> current PACA.
>
> Furthermore, ensure that the NVGPRs are restored from the stack on the
> way out by setting the NAPSTATELOST in paca.
Thanks for expanding on this, it makes the patch easier to follow :)
As noted before, I think if we use PACA_EXNMI for system reset, then
*hopefully* there should be minimal races with the initial use of other
thread's PACA at the start of the exception. So I'll work on getting
that in, but it need not prevent this patch from being merged first
IMO.
> [Changelog written with inputs from [email protected]]
>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> arch/powerpc/include/asm/paca.h | 5 ++++
> arch/powerpc/kernel/asm-offsets.c | 1 +
> arch/powerpc/kernel/idle_book3s.S | 49 ++++++++++++++++++++++++++++++++++-
> arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++
> 4 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 708c3e5..4405630 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -172,6 +172,11 @@ struct paca_struct {
> u8 thread_mask;
> /* Mask to denote subcore sibling threads */
> u8 subcore_sibling_mask;
> + /*
> + * Pointer to an array which contains pointer
> + * to the sibling threads' paca.
> + */
> + struct paca_struct *thread_sibling_pacas[8];
Is 8 the right number? I wonder if we have a define for it.
> #endif
>
> #ifdef CONFIG_PPC_BOOK3S_64
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 4367e7d..6ec5016 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -727,6 +727,7 @@ int main(void)
> OFFSET(PACA_THREAD_IDLE_STATE, paca_struct, thread_idle_state);
> OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
> OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
> + OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
> #endif
>
> DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 9957287..c2f2cfb 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -375,6 +375,46 @@ _GLOBAL(power9_idle_stop)
> li r4,1
> b pnv_powersave_common
> /* No return */
> +
> +
> +/*
> + * On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
> + * HSPRG0 will be set to the HSPRG0 value of one of the
> + * threads in this core. Thus the value we have in r13
> + * may not be this thread's paca pointer.
> + *
> + * Fortunately, the PIR remains invariant. Since this thread's
> + * paca pointer is recorded in all its sibling's paca, we can
> + * correctly recover this thread's paca pointer if we
> + * know the index of this thread in the core.
> + *
> + * This index can be obtained from the lower two bits of the PIR.
> + *
> + * i.e, thread's position in the core = PIR[62:63].
> + * If this value is i, then this thread's paca is
> + * paca->thread_sibling_pacas[i].
> + */
> +power9_dd1_recover_paca:
> + mfspr r4, SPRN_PIR
> + clrldi r4, r4, 62
Does SPRN_TIR work?
> + /*
> + * Since each entry in thread_sibling_pacas is 8 bytes
> + * we need to left-shift by 3 bits. Thus r4 = i * 8
> + */
> + sldi r4, r4, 3
> + /* Get &paca->thread_sibling_pacas[0] in r5 */
> + addi r5, r13, PACA_SIBLING_PACA_PTRS
> + /* Load paca->thread_sibling_pacas[i] into r13 */
> + ldx r13, r4, r5
> + SET_PACA(r13)
> + /*
> + * Indicate that we have lost NVGPR state
> + * which needs to be restored from the stack.
> + */
> + li r3, 1
> + stb r0,PACA_NAPSTATELOST(r13)
> + blr
> +
> /*
> * Called from reset vector. Check whether we have woken up with
> * hypervisor state loss. If yes, restore hypervisor state and return
> @@ -385,7 +425,14 @@ _GLOBAL(power9_idle_stop)
> */
> _GLOBAL(pnv_restore_hyp_resource)
> BEGIN_FTR_SECTION
> - ld r2,PACATOC(r13);
> +BEGIN_FTR_SECTION_NESTED(70)
> + mflr r6
> + bl power9_dd1_recover_paca
> + mtlr r6
> + ld r2, PACATOC(r13)
> +FTR_SECTION_ELSE_NESTED(70)
> + ld r2, PACATOC(r13)
> +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
This is quite neat now you've moved it to its own function. Nice.
It will be only a trivial clash with my patches now, I think.
Reviewed-by: Nicholas Piggin <[email protected]>
On Tue, 21 Mar 2017 02:59:46 +1000
Nicholas Piggin <[email protected]> wrote:
> On Mon, 20 Mar 2017 21:24:18 +0530
> This is quite neat now you've moved it to its own function. Nice.
> It will be only a trivial clash with my patches now, I think.
>
> Reviewed-by: Nicholas Piggin <[email protected]>
Hmm... This won't actually work for machine check wakeups either.
We're doing the low level machine check before checking for wakeup,
which complicates things a little bit.
Simplest way to handle that might be to just always immediately
call the restore paca function at machine check entry, and then
just always mark machine checks as not-recoverable for DD1.
Thanks,
Nick
Hi Nick,
On Tue, Mar 21, 2017 at 02:35:17AM +1000, Nicholas Piggin wrote:
> On Mon, 20 Mar 2017 21:24:15 +0530
> "Gautham R. Shenoy" <[email protected]> wrote:
>
> > From: "Gautham R. Shenoy" <[email protected]>
> >
> > Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
> > transitions the CPU to the deepest available platform idle state to a
> > new function named pnv_cpu_offline() in powernv/idle.c. The rationale
> > behind this code movement is that the data required to determine the
> > deepest available platform state resides in powernv/idle.c.
> >
> > Signed-off-by: Gautham R. Shenoy <[email protected]>
>
> Looks good. As a nitpick, possibly one or two variables in idle.c could
> become static (pnv_deepest_stop_psscr_*).
>
I changed that, but it ended up being in the next patch.
> Reviewed-by: Nicholas Piggin <[email protected]>
>
Thanks!
--
Thanks and Regards
gautham.
Hi,
On Tue, Mar 21, 2017 at 02:39:34AM +1000, Nicholas Piggin wrote:
> > @@ -241,8 +240,9 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
> > * The default stop state that will be used by ppc_md.power_save
> > * function on platforms that support stop instruction.
> > */
> > -u64 pnv_default_stop_val;
> > -u64 pnv_default_stop_mask;
> > +static u64 pnv_default_stop_val;
> > +static u64 pnv_default_stop_mask;
> > +static bool default_stop_found;
> >
> > /*
> > * Used for ppc_md.power_save which needs a function with no parameters
> > @@ -262,8 +262,9 @@ static void power9_idle(void)
> > * psscr value and mask of the deepest stop idle state.
> > * Used when a cpu is offlined.
> > */
> > -u64 pnv_deepest_stop_psscr_val;
> > -u64 pnv_deepest_stop_psscr_mask;
> > +static u64 pnv_deepest_stop_psscr_val;
> > +static u64 pnv_deepest_stop_psscr_mask;
> > +static bool deepest_stop_found;
>
> Aha you have made them static. Nitpick withdrawn :)
>
> The log messages look good now.
>
> Reviewed-by: Nicholas Piggin <[email protected]>
>
Thanks!
--
Thanks and Regards
gautham.
On Wed, 22 Mar 2017 11:28:46 +0530
Gautham R Shenoy <[email protected]> wrote:
> On Tue, Mar 21, 2017 at 02:59:46AM +1000, Nicholas Piggin wrote:
> > On Mon, 20 Mar 2017 21:24:18 +0530
> > "Gautham R. Shenoy" <[email protected]> wrote:
> >
> > > From: "Gautham R. Shenoy" <[email protected]>
> > >
> > > POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
> > > waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the
> > > core. Thus the HSPRG0 of a thread waking up from can contain the paca
> > > pointer of its sibling.
> > >
> > > This patch implements a context recovery framework within threads of a
> > > core, by provisioning space in paca_struct for saving every sibling
> > > threads's paca pointers. Basically, we should be able to arrive at the
> > > right paca pointer from any of the thread's existing paca pointer.
> > >
> > > At bootup, during powernv idle-init, we save the paca address of every
> > > CPU in each one its siblings paca_struct in the slot corresponding to
> > > this CPU's index in the core.
> > >
> > > On wakeup from a stop, the thread will determine its index in the core
> > > from the lower 2 bits of the PIR register and recover its PACA pointer
> > > by indexing into the correct slot in the provisioned space in the
> > > current PACA.
> > >
> > > Furthermore, ensure that the NVGPRs are restored from the stack on the
> > > way out by setting the NAPSTATELOST in paca.
> >
> > Thanks for expanding on this, it makes the patch easier to follow :)
> >
> > As noted before, I think if we use PACA_EXNMI for system reset, then
> > *hopefully* there should be minimal races with the initial use of other
> > thread's PACA at the start of the exception. So I'll work on getting
> > that in, but it need not prevent this patch from being merged first
> > IMO.
> >
> > > [Changelog written with inputs from [email protected]]
> > >
> > > Signed-off-by: Gautham R. Shenoy <[email protected]>
> > > ---
> > > arch/powerpc/include/asm/paca.h | 5 ++++
> > > arch/powerpc/kernel/asm-offsets.c | 1 +
> > > arch/powerpc/kernel/idle_book3s.S | 49 ++++++++++++++++++++++++++++++++++-
> > > arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++
> > > 4 files changed, 76 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> > > index 708c3e5..4405630 100644
> > > --- a/arch/powerpc/include/asm/paca.h
> > > +++ b/arch/powerpc/include/asm/paca.h
> > > @@ -172,6 +172,11 @@ struct paca_struct {
> > > u8 thread_mask;
> > > /* Mask to denote subcore sibling threads */
> > > u8 subcore_sibling_mask;
> > > + /*
> > > + * Pointer to an array which contains pointer
> > > + * to the sibling threads' paca.
> > > + */
> > > + struct paca_struct *thread_sibling_pacas[8];
>
> >
> > Is 8 the right number? I wonder if we have a define for it.
>
> Thats the maximum number of threads per core that we have had on POWER
> so far.
>
> Perhaps, I can make this
>
> struct paca_struct **thread_sibling_pacas;
>
> and allocate threads_per_core number of slots in
> pnv_init_idle_states. Sounds ok ?
I guess that would minimise PACA overhead for non-DD1 machines,
so if it's not too much trouble, that might be good.
> > > +power9_dd1_recover_paca:
> > > + mfspr r4, SPRN_PIR
> > > + clrldi r4, r4, 62
> >
> > Does SPRN_TIR work?
>
> I wasn't aware of SPRN_TIR!
>
> I can check this. If my reading of the ISA is correct, TIR should
> contain the thread number which are in the range [0..3].
Yep.
> > Reviewed-by: Nicholas Piggin <[email protected]>
> >
>
> Thanks for reviewing the patch.
No problems. Don't worry about the machine check wakeup for the moment
either. It's more important to just get the normal wakeup fix in I think.
We can revisit what to do there after my machine check patches go in
(idle machine check does not really work right now for POWER9 anyway).
Thanks,
Nick
On Tue, Mar 21, 2017 at 02:59:46AM +1000, Nicholas Piggin wrote:
> On Mon, 20 Mar 2017 21:24:18 +0530
> "Gautham R. Shenoy" <[email protected]> wrote:
>
> > From: "Gautham R. Shenoy" <[email protected]>
> >
> > POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
> > waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the
> > core. Thus the HSPRG0 of a thread waking up from can contain the paca
> > pointer of its sibling.
> >
> > This patch implements a context recovery framework within threads of a
> > core, by provisioning space in paca_struct for saving every sibling
> > threads's paca pointers. Basically, we should be able to arrive at the
> > right paca pointer from any of the thread's existing paca pointer.
> >
> > At bootup, during powernv idle-init, we save the paca address of every
> > CPU in each one its siblings paca_struct in the slot corresponding to
> > this CPU's index in the core.
> >
> > On wakeup from a stop, the thread will determine its index in the core
> > from the lower 2 bits of the PIR register and recover its PACA pointer
> > by indexing into the correct slot in the provisioned space in the
> > current PACA.
> >
> > Furthermore, ensure that the NVGPRs are restored from the stack on the
> > way out by setting the NAPSTATELOST in paca.
>
> Thanks for expanding on this, it makes the patch easier to follow :)
>
> As noted before, I think if we use PACA_EXNMI for system reset, then
> *hopefully* there should be minimal races with the initial use of other
> thread's PACA at the start of the exception. So I'll work on getting
> that in, but it need not prevent this patch from being merged first
> IMO.
>
> > [Changelog written with inputs from [email protected]]
> >
> > Signed-off-by: Gautham R. Shenoy <[email protected]>
> > ---
> > arch/powerpc/include/asm/paca.h | 5 ++++
> > arch/powerpc/kernel/asm-offsets.c | 1 +
> > arch/powerpc/kernel/idle_book3s.S | 49 ++++++++++++++++++++++++++++++++++-
> > arch/powerpc/platforms/powernv/idle.c | 22 ++++++++++++++++
> > 4 files changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> > index 708c3e5..4405630 100644
> > --- a/arch/powerpc/include/asm/paca.h
> > +++ b/arch/powerpc/include/asm/paca.h
> > @@ -172,6 +172,11 @@ struct paca_struct {
> > u8 thread_mask;
> > /* Mask to denote subcore sibling threads */
> > u8 subcore_sibling_mask;
> > + /*
> > + * Pointer to an array which contains pointer
> > + * to the sibling threads' paca.
> > + */
> > + struct paca_struct *thread_sibling_pacas[8];
>
> Is 8 the right number? I wonder if we have a define for it.
Thats the maximum number of threads per core that we have had on POWER
so far.
Perhaps, I can make this
struct paca_struct **thread_sibling_pacas;
and allocate threads_per_core number of slots in
pnv_init_idle_states. Sounds ok ?
>
> > #endif
> >
> > #ifdef CONFIG_PPC_BOOK3S_64
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> > index 4367e7d..6ec5016 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -727,6 +727,7 @@ int main(void)
> > OFFSET(PACA_THREAD_IDLE_STATE, paca_struct, thread_idle_state);
> > OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
> > OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
> > + OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
> > #endif
> >
> > DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> > diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> > index 9957287..c2f2cfb 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -375,6 +375,46 @@ _GLOBAL(power9_idle_stop)
> > li r4,1
> > b pnv_powersave_common
> > /* No return */
> > +
> > +
> > +/*
> > + * On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
> > + * HSPRG0 will be set to the HSPRG0 value of one of the
> > + * threads in this core. Thus the value we have in r13
> > + * may not be this thread's paca pointer.
> > + *
> > + * Fortunately, the PIR remains invariant. Since this thread's
> > + * paca pointer is recorded in all its sibling's paca, we can
> > + * correctly recover this thread's paca pointer if we
> > + * know the index of this thread in the core.
> > + *
> > + * This index can be obtained from the lower two bits of the PIR.
> > + *
> > + * i.e, thread's position in the core = PIR[62:63].
> > + * If this value is i, then this thread's paca is
> > + * paca->thread_sibling_pacas[i].
> > + */
> > +power9_dd1_recover_paca:
> > + mfspr r4, SPRN_PIR
> > + clrldi r4, r4, 62
>
> Does SPRN_TIR work?
I wasn't aware of SPRN_TIR!
I can check this. If my reading of the ISA is correct, TIR should
contain the thread number which are in the range [0..3].
>
> > + /*
> > + * Since each entry in thread_sibling_pacas is 8 bytes
> > + * we need to left-shift by 3 bits. Thus r4 = i * 8
> > + */
> > + sldi r4, r4, 3
> > + /* Get &paca->thread_sibling_pacas[0] in r5 */
> > + addi r5, r13, PACA_SIBLING_PACA_PTRS
> > + /* Load paca->thread_sibling_pacas[i] into r13 */
> > + ldx r13, r4, r5
> > + SET_PACA(r13)
> > + /*
> > + * Indicate that we have lost NVGPR state
> > + * which needs to be restored from the stack.
> > + */
> > + li r3, 1
> > + stb r0,PACA_NAPSTATELOST(r13)
> > + blr
> > +
> > /*
> > * Called from reset vector. Check whether we have woken up with
> > * hypervisor state loss. If yes, restore hypervisor state and return
> > @@ -385,7 +425,14 @@ _GLOBAL(power9_idle_stop)
> > */
> > _GLOBAL(pnv_restore_hyp_resource)
> > BEGIN_FTR_SECTION
> > - ld r2,PACATOC(r13);
> > +BEGIN_FTR_SECTION_NESTED(70)
> > + mflr r6
> > + bl power9_dd1_recover_paca
> > + mtlr r6
> > + ld r2, PACATOC(r13)
> > +FTR_SECTION_ELSE_NESTED(70)
> > + ld r2, PACATOC(r13)
> > +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
>
> This is quite neat now you've moved it to its own function. Nice.
> It will be only a trivial clash with my patches now, I think.
Sure!
>
> Reviewed-by: Nicholas Piggin <[email protected]>
>
Thanks for reviewing the patch.
--
Thanks and Regards
gautham.