2018-10-11 13:42:51

by Akshay Adiga

Subject: [RFC PATCH v2 0/3] New device-tree format and Opal based idle save-restore

Previously, if an older kernel ran on newer firmware, it could enable
all available states irrespective of its capability to handle them.
The new device-tree format adds a compatible string, so that a kernel
enables a stop state only if it is capable of handling that version of
the state.

Older kernels will still see stop0 and stop0_lite in the older format,
which we will deprecate after some time.

1) The idea is to bump up the version string in firmware if we find a bug
or regression in a stop state. A fix will be provided in Linux, which will
then know about the bumped-up version of the stop state, whereas kernels
without the fix will ignore that state.

2) Slowly deprecate the cpuidle/cpuhotplug latency threshold that is
hard-coded into the cpuidle-powernv driver. Instead, use compatible strings
to indicate whether an idle state is suitable for cpuidle or hotplug.

New idle state device tree format :
power-mgt {
	...
	ibm,enabled-stop-levels = <0xec000000>;
	ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
	ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
	ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
	ibm,cpu-idle-state-flags = <0x100000 0x101000>;
	ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
	ibm,idle-states {
		stop4 {
			flags = <0x207000>;
			compatible = "ibm,state-v1",
				     "opal-support";
			type = "cpuidle";
			psscr-mask = <0x0 0x3003ff>;
			handle = <0x102>;
			latency-ns = <0x186a0>;
			residency-ns = <0x989680>;
			psscr = <0x0 0x300374>;
		};
		...
		stop11 {
			...
			compatible = "ibm,state-v1",
				     "opal-support";
			type = "cpuoffline";
			...
		};
	};
};

High-level parsing algorithm :

Say Known version string = "ibm,state-v1"

for each stop state node in device tree:
	if (compatible has known version string)
		kernel takes care of stop-transitions
	else if (compatible has "opal-support")
		OPAL takes care of stop-transitions
	else
		skip all deeper states

When a state has neither a known version nor OPAL support, the hardware
may still end up exiting through that shallower, unsupported state even
when a deeper state is requested. Hence all deeper states are skipped.
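
A minimal C sketch of this per-node decision (handle_state_in_kernel()
and handle_state_in_opal() are hypothetical placeholders here; only
of_device_is_compatible() is an existing OF helper):

static int classify_stop_state(struct device_node *np)
{
	if (of_device_is_compatible(np, "ibm,state-v1"))
		return handle_state_in_kernel(np); /* kernel drives stop-transitions */

	if (of_device_is_compatible(np, "opal-support"))
		return handle_state_in_opal(np);   /* OPAL drives stop-transitions */

	return -ENODEV; /* unsupported: caller skips this and all deeper states */
}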

OPAL support for idle states
----------------------------

With this patch series, all the states that lose hypervisor state
will be handled through an OPAL call.

Patch 3 adds support for saving/restoring SPRs and resyncing the
timebase in OPAL. All the decision making, such as identifying the
first thread in the core and taking locks before restoring, is also
implemented in OPAL.
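
For illustration, a rough sketch of the OPAL-assisted path (hedged: the
exact call sites differ in the series, where the save call is actually
taken inside the isa300_idle_stop_mayloss() stub when its second
argument is set):

static unsigned long stop_deep_via_opal(unsigned long psscr)
{
	unsigned long srr1;

	/* enter stop; with the flag set, the stub branches into opal_cpuidle_save() */
	srr1 = isa300_idle_stop_mayloss(psscr, true);

	/* on wakeup OPAL restores SPRs, handles first-thread/lock logic, resyncs timebase */
	opal_cpuidle_restore(psscr, srr1);

	return srr1;
}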

How does it work?
-------------------

Consider a case where stop4 has a bug. We take the following steps to
mitigate the problem.

1) Change the compatible string for stop4 in OPAL to "ibm,state-v2",
remove "opal-supported", and ship the new firmware.
The kernel ignores stop4 and all deeper states, but the shallower states
remain available, which prevents stop states from being disabled completely.

2) Implement the workaround in OPAL and add back "opal-supported"; ship the
new firmware.
The kernel uses OPAL for stop-transitions, which now has the workaround
implemented. We get stop4 and deeper states working without kernel changes
or backports (and in considerably less time).

3) Implement the workaround in the kernel and add "ibm,state-v2" to its
known versions.
The kernel will now be able to handle stop4 and deeper states (see the
sketch below).
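
As a sketch of step 3, the known_versions[] table from patch 1 simply
grows a new entry (parse_dt_v2 is a hypothetical parser written along
with the kernel-side fix):

static int parse_dt_v2(struct device_node *np);

struct stop_version_t known_versions[] = {
	{ .name = "ibm,state-v1", .parser_fn = parse_dt_v1, },
	{ .name = "ibm,state-v2", .parser_fn = parse_dt_v2, },
};
const int nr_known_versions = 2;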

Changes from v1 :
- Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
idle code in C"
http://patchwork.ozlabs.org/patch/969596/
- All states that lose hypervisor state will be handled by OPAL
- All the decision making, such as identifying the first thread in
the core and taking locks before restoring in such cases, has also
been moved to OPAL


Abhishek Goel (1):
cpuidle/powernv: save-restore sprs in opal

Akshay Adiga (2):
cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1
powernv/cpuidle: Pass pointers instead of values to stop loop

arch/powerpc/include/asm/cpuidle.h | 9 +
arch/powerpc/include/asm/opal-api.h | 4 +-
arch/powerpc/include/asm/opal.h | 3 +
arch/powerpc/include/asm/processor.h | 8 +-
arch/powerpc/kernel/idle_book3s.S | 6 +-
arch/powerpc/platforms/powernv/idle.c | 247 ++++++++++++++----
.../powerpc/platforms/powernv/opal-wrappers.S | 2 +
drivers/cpuidle/cpuidle-powernv.c | 46 ++--
8 files changed, 251 insertions(+), 74 deletions(-)

--
2.17.1



2018-10-11 13:27:35

by Akshay Adiga

Subject: [RFC PATCH v2 1/3] cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1

This patch adds support for a new device-tree format for describing
idle states.

Previously, if an older kernel ran on newer firmware, it could enable
all available states irrespective of its capability to handle them.
The new device-tree format adds a compatible string, so that a kernel
enables a stop state only if it is capable of handling that version of
the state.

Older kernels will still see stop0 and stop0_lite in the older format,
which we will deprecate after some time.

1) The idea is to bump up the version in firmware if we find a bug or
regression in a stop state. A fix will be provided in Linux, which will
then know about the bumped-up version of the stop state, whereas kernels
without the fix will ignore that state.

2) Slowly deprecate the cpuidle/cpuhotplug latency threshold that is
hard-coded into the cpuidle-powernv driver. Instead, use compatible strings
to indicate whether an idle state is suitable for cpuidle or hotplug.

New idle state device tree format :
power-mgt {
	...
	ibm,enabled-stop-levels = <0xec000000>;
	ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
	ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
	ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
	ibm,cpu-idle-state-flags = <0x100000 0x101000>;
	ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
	ibm,idle-states {
		stop4 {
			flags = <0x207000>;
			compatible = "ibm,state-v1",
				     "opal-supported";
			type = "cpuidle";
			psscr-mask = <0x0 0x3003ff>;
			handle = <0x102>;
			latency-ns = <0x186a0>;
			residency-ns = <0x989680>;
			psscr = <0x0 0x300374>;
		};
		...
		stop11 {
			...
			compatible = "ibm,state-v1",
				     "opal-supported";
			type = "cpuoffline";
			...
		};
	};
};

type strings :
	"cpuidle"    : the state should be used by the cpuidle driver
	"cpuoffline" : the state should be used by the hotplug driver

compatible strings :
	"ibm,state-v1"   : the kernel checks whether it knows this version
	"opal-supported" : the kernel can fall back to OPAL for
			   stop-transitions

Signed-off-by: Akshay Adiga <[email protected]>
---

Changes from v1 :
- Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
idle code in C"
- Moved "cpuidle" and "cpuoffline" as seperate property called
"type"


arch/powerpc/include/asm/cpuidle.h | 9 ++
arch/powerpc/platforms/powernv/idle.c | 132 +++++++++++++++++++++++++-
drivers/cpuidle/cpuidle-powernv.c | 31 ++++--
3 files changed, 160 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 9844b3ded187..e920a15e797f 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -70,14 +70,23 @@

#ifndef __ASSEMBLY__

+enum idle_state_type_t {
+ CPUIDLE_TYPE,
+ CPUOFFLINE_TYPE
+};
+
+#define POWERNV_THRESHOLD_LATENCY_NS 200000
+#define PNV_VER_NAME_LEN 32
#define PNV_IDLE_NAME_LEN 16
struct pnv_idle_states_t {
char name[PNV_IDLE_NAME_LEN];
+ char version[PNV_VER_NAME_LEN];
u32 latency_ns;
u32 residency_ns;
u64 psscr_val;
u64 psscr_mask;
u32 flags;
+ enum idle_state_type_t type;
bool valid;
};

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 96186af9e953..755918402591 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -54,6 +54,20 @@ static bool default_stop_found;
static u64 pnv_first_tb_loss_level = MAX_STOP_STATE + 1;
static u64 pnv_first_hv_loss_level = MAX_STOP_STATE + 1;

+
+static int parse_dt_v1(struct device_node *np);
+struct stop_version_t {
+ const char name[PNV_VER_NAME_LEN];
+ int (*parser_fn)(struct device_node *np);
+};
+struct stop_version_t known_versions[] = {
+ {
+ .name = "ibm,state-v1",
+ .parser_fn = parse_dt_v1,
+ }
+ };
+const int nr_known_versions = 1;
+
/*
* psscr value and mask of the deepest stop idle state.
* Used when a cpu is offlined.
@@ -1195,6 +1209,77 @@ static void __init pnv_probe_idle_states(void)
supported_cpuidle_states |= pnv_idle_states[i].flags;
}

+static int parse_dt_v1(struct device_node *dt_node)
+{
+ const char *temp_str;
+ int rc;
+ int i = nr_pnv_idle_states;
+
+ if (!dt_node) {
+ pr_err("Invalid device_node\n");
+ return -EINVAL;
+ }
+
+ rc = of_property_read_string(dt_node, "name", &temp_str);
+ if (rc) {
+ pr_err("error reading names rc= %d\n", rc);
+ return -EINVAL;
+ }
+ strncpy(pnv_idle_states[i].name, temp_str, PNV_IDLE_NAME_LEN);
+ rc = of_property_read_u32(dt_node, "residency-ns",
+ &pnv_idle_states[i].residency_ns);
+ if (rc) {
+ pr_err("error reading residency rc= %d\n", rc);
+ return -EINVAL;
+ }
+ rc = of_property_read_u32(dt_node, "latency-ns",
+ &pnv_idle_states[i].latency_ns);
+ if (rc) {
+ pr_err("error reading latency rc= %d\n", rc);
+ return -EINVAL;
+ }
+ rc = of_property_read_u32(dt_node, "flags",
+ &pnv_idle_states[i].flags);
+ if (rc) {
+ pr_err("error reading flags rc= %d\n", rc);
+ return -EINVAL;
+ }
+
+ /* We are not expecting power8 device-tree in this format */
+ rc = of_property_read_u64(dt_node, "psscr-mask",
+ &pnv_idle_states[i].psscr_mask);
+ if (rc) {
+ pr_err("error reading psscr-mask rc= %d\n", rc);
+ return -EINVAL;
+ }
+ rc = of_property_read_u64(dt_node, "psscr",
+ &pnv_idle_states[i].psscr_val);
+ if (rc) {
+ pr_err("error reading psscr rc= %d\n", rc);
+ return -EINVAL;
+ }
+
+ /*
+ * TODO : save the version strings in data structure
+ */
+ rc = of_property_read_string(dt_node, "type", &temp_str);
+ pr_info("type = %s\n", temp_str);
+ if (rc) {
+ pr_err("error reading type rc= %d\n", rc);
+ return -EINVAL;
+ }
+ if (strcmp(temp_str, "cpuidle") == 0)
+ pnv_idle_states[i].type = CPUIDLE_TYPE;
+ else if (strcmp(temp_str, "cpuoffline") == 0)
+ pnv_idle_states[i].type = CPUOFFLINE_TYPE;
+ else {
+ pr_err("Invalid type skipping %s\n",
+ pnv_idle_states[i].name);
+ return -EINVAL;
+ }
+ return 0;
+
+}
/*
* This function parses device-tree and populates all the information
* into pnv_idle_states structure. It also sets up nr_pnv_idle_states
@@ -1203,8 +1288,9 @@ static void __init pnv_probe_idle_states(void)

static int pnv_parse_cpuidle_dt(void)
{
- struct device_node *np;
+ struct device_node *np, *np1, *dt_node;
int nr_idle_states, i;
+ int additional_states = 0;
int rc = 0;
u32 *temp_u32;
u64 *temp_u64;
@@ -1218,8 +1304,14 @@ static int pnv_parse_cpuidle_dt(void)
nr_idle_states = of_property_count_u32_elems(np,
"ibm,cpu-idle-state-flags");

- pnv_idle_states = kcalloc(nr_idle_states, sizeof(*pnv_idle_states),
- GFP_KERNEL);
+ np1 = of_find_node_by_path("/ibm,opal/power-mgt/ibm,idle-states");
+ if (np1) {
+ for_each_child_of_node(np1, dt_node)
+ additional_states++;
+ }
+ pr_info("states in new format : %d\n", additional_states);
+ pnv_idle_states = kcalloc(nr_idle_states + additional_states,
+ sizeof(*pnv_idle_states), GFP_KERNEL);
temp_u32 = kcalloc(nr_idle_states, sizeof(u32), GFP_KERNEL);
temp_u64 = kcalloc(nr_idle_states, sizeof(u64), GFP_KERNEL);
temp_string = kcalloc(nr_idle_states, sizeof(char *), GFP_KERNEL);
@@ -1298,8 +1390,40 @@ static int pnv_parse_cpuidle_dt(void)
for (i = 0; i < nr_idle_states; i++)
strlcpy(pnv_idle_states[i].name, temp_string[i],
PNV_IDLE_NAME_LEN);
+
+ /* Mark states as CPUIDLE_TYPE /CPUOFFLINE for older version*/
+ for (i = 0; i < nr_idle_states; i++) {
+ if (pnv_idle_states[i].latency_ns > POWERNV_THRESHOLD_LATENCY_NS)
+ pnv_idle_states[i].type = CPUOFFLINE_TYPE;
+ else
+ pnv_idle_states[i].type = CPUIDLE_TYPE;
+ }
nr_pnv_idle_states = nr_idle_states;
- rc = 0;
+ /* Parsing node-based idle states device-tree format */
+ if (!np1) {
+ pr_info("dt does not contain ibm,idle_states");
+ goto out;
+ }
+ /* Parse each child node with appropriate parser_fn */
+ for_each_child_of_node(np1, dt_node) {
+ bool found_known_version = false;
+ /* we don't have state falling back to opal*/
+ for (i = 0; i < nr_known_versions ; i++) {
+ if (of_device_is_compatible(dt_node, known_versions[i].name)) {
+ rc = known_versions[i].parser_fn(dt_node);
+ if (rc) {
+ pr_err("%s could not parse\n", known_versions[i].name);
+ continue;
+ }
+ found_known_version = true;
+ }
+ }
+ if (!found_known_version) {
+ pr_info("Unsupported state, skipping all further state\n");
+ goto out;
+ }
+ nr_pnv_idle_states++;
+ }
out:
kfree(temp_u32);
kfree(temp_u64);
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 84b1ebe212b3..a15514ebd1c3 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -26,7 +26,6 @@
* Expose only those Hardware idle states via the cpuidle framework
* that have latency value below POWERNV_THRESHOLD_LATENCY_NS.
*/
-#define POWERNV_THRESHOLD_LATENCY_NS 200000

static struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
@@ -265,7 +264,7 @@ extern u32 pnv_get_supported_cpuidle_states(void);
static int powernv_add_idle_states(void)
{
int nr_idle_states = 1; /* Snooze */
- int dt_idle_states;
+ int dt_idle_states = 0;
u32 has_stop_states = 0;
int i;
u32 supported_flags = pnv_get_supported_cpuidle_states();
@@ -277,14 +276,19 @@ static int powernv_add_idle_states(void)
goto out;
}

- /* TODO: Count only states which are eligible for cpuidle */
- dt_idle_states = nr_pnv_idle_states;
+ /* Count only cpuidle states*/
+ for (i = 0; i < nr_pnv_idle_states; i++) {
+ if (pnv_idle_states[i].type == CPUIDLE_TYPE)
+ dt_idle_states++;
+ }
+ pr_info("idle states in dt = %d , states with idle flag = %d",
+ nr_pnv_idle_states, dt_idle_states);

/*
* Since snooze is used as first idle state, max idle states allowed is
* CPUIDLE_STATE_MAX -1
*/
- if (nr_pnv_idle_states > CPUIDLE_STATE_MAX - 1) {
+ if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
pr_warn("cpuidle-powernv: discovered idle states more than allowed");
dt_idle_states = CPUIDLE_STATE_MAX - 1;
}
@@ -305,8 +309,15 @@ static int powernv_add_idle_states(void)
* Skip the platform idle state whose flag isn't in
* the supported_cpuidle_states flag mask.
*/
- if ((state->flags & supported_flags) != state->flags)
+ if ((state->flags & supported_flags) != state->flags) {
+ pr_warn("State %d does not have supported flag\n", i);
+ continue;
+ }
+ if (state->type != CPUIDLE_TYPE) {
+ pr_info("State %d is not idletype, it of %d type\n", i,
+ state->type);
continue;
+ }
/*
* If an idle state has exit latency beyond
* POWERNV_THRESHOLD_LATENCY_NS then don't use it
@@ -321,8 +332,10 @@ static int powernv_add_idle_states(void)
exit_latency = DIV_ROUND_UP(state->latency_ns, 1000);
target_residency = DIV_ROUND_UP(state->residency_ns, 1000);

- if (has_stop_states && !(state->valid))
+ if (has_stop_states && !(state->valid)) {
+ pr_warn("State %d is invalid\n", i);
continue;
+ }

if (state->flags & OPAL_PM_TIMEBASE_STOP)
stops_timebase = true;
@@ -360,8 +373,10 @@ static int powernv_add_idle_states(void)
state->psscr_mask);
}
#endif
- else
+ else {
+ pr_warn("cpuidle-powernv : could not add state\n");
continue;
+ }
nr_idle_states++;
}
out:
--
2.17.1


2018-10-11 13:29:32

by Akshay Adiga

Subject: [RFC PATCH v2 3/3] cpuidle/powernv: save-restore sprs in opal

From: Abhishek Goel <[email protected]>

This patch moves the saving and restoring of SPRs for P9 cpuidle
from the kernel to OPAL.

In an attempt to make the powernv idle code backward compatible,
and to some extent forward compatible, add support for pre-stop entry
and post-stop exit actions in OPAL. If a kernel knows about this
OPAL call, then only a firmware update supporting the newer hardware
is required, instead of waiting for kernel updates.

Signed-off-by: Abhishek Goel <[email protected]>
Signed-off-by: Akshay Adiga <[email protected]>
---
Changes from v1 :
- Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
idle code in C"
- Set a global variable "request_opal_call" to indicate that deep
states should make the OPAL call.
- All states that lose hypervisor state will be handled by OPAL
- All the decision making, such as identifying the first thread in
the core and taking locks before restoring in such cases, has also
been moved to OPAL

arch/powerpc/include/asm/opal-api.h | 4 +-
arch/powerpc/include/asm/opal.h | 3 +
arch/powerpc/include/asm/processor.h | 3 +-
arch/powerpc/kernel/idle_book3s.S | 6 +-
arch/powerpc/platforms/powernv/idle.c | 88 +++++++++++++------
.../powerpc/platforms/powernv/opal-wrappers.S | 2 +
6 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 8365353330b4..93ea1f79e295 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -210,7 +210,9 @@
#define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164
#define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
#define OPAL_NX_COPROC_INIT 167
-#define OPAL_LAST 167
+#define OPAL_IDLE_SAVE 170
+#define OPAL_IDLE_RESTORE 171
+#define OPAL_LAST 171

#define QUIESCE_HOLD 1 /* Spin all calls at entry */
#define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ff3866473afe..26995e16171e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -356,6 +356,9 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
extern void opal_shutdown(void);
extern int opal_resync_timebase(void);

+extern int opal_cpuidle_save(u64 psscr);
+extern int opal_cpuidle_restore(u64 psscr, u64 srr1);
+
extern void opal_lpc_init(void);

extern void opal_kmsg_init(void);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 822d3236ad7f..26fa6c1836f4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -510,7 +510,8 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)

/* asm stubs */
extern unsigned long isa300_idle_stop_noloss(unsigned long psscr_val);
-extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val);
+extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val,
+ bool request_opal_call);
extern unsigned long isa206_idle_insn_mayloss(unsigned long type);

extern unsigned long cpuidle_disable;
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index ffdee1ab4388..a2014d152035 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -52,14 +52,16 @@ _GLOBAL(isa300_idle_stop_noloss)
_GLOBAL(isa300_idle_stop_mayloss)
mtspr SPRN_PSSCR,r3
std r1,PACAR1(r13)
- mflr r4
+ mflr r7
mfcr r5
/* use stack red zone rather than a new frame */
addi r6,r1,-INT_FRAME_SIZE
SAVE_GPR(2, r6)
SAVE_NVGPRS(r6)
- std r4,_LINK(r6)
+ std r7,_LINK(r6)
std r5,_CCR(r6)
+ cmpwi r4,0
+ bne opal_cpuidle_save
PPC_STOP
b . /* catch bugs */

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 681a23a066bb..bcfe08022e65 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -171,6 +171,7 @@ static void pnv_fastsleep_workaround_apply(void *info)

static bool power7_fastsleep_workaround_entry = true;
static bool power7_fastsleep_workaround_exit = true;
+static bool request_opal_call = false;

/*
* Used to store fastsleep workaround state
@@ -604,6 +605,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
unsigned long mmcr0 = 0;
struct p9_sprs sprs;
bool sprs_saved = false;
+ bool is_hv_loss = false;

memset(&sprs, 0, sizeof(sprs));

@@ -648,7 +650,9 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
*/
mmcr0 = mfspr(SPRN_MMCR0);
}
- if ((psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level) {
+
+ is_hv_loss = (psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level;
+ if (is_hv_loss && (!request_opal_call)) {
sprs.lpcr = mfspr(SPRN_LPCR);
sprs.hfscr = mfspr(SPRN_HFSCR);
sprs.fscr = mfspr(SPRN_FSCR);
@@ -674,7 +678,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
atomic_start_thread_idle();
}

- srr1 = isa300_idle_stop_mayloss(psscr);
+ srr1 = isa300_idle_stop_mayloss(psscr,
+ is_hv_loss && request_opal_call);

#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
local_paca->requested_psscr = 0;
@@ -685,6 +690,25 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
WARN_ON_ONCE(!srr1);
WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));

+ /*
+ * On POWER9, SRR1 bits do not match exactly as expected.
+ * SRR1_WS_GPRLOSS (10b) can also result in SPR loss, so
+ * always test PSSCR if there is any state loss.
+ */
+ if (likely(((psscr & PSSCR_PLS) >> 60) < pnv_first_hv_loss_level)) {
+ if (sprs_saved)
+ atomic_stop_thread_idle();
+ goto out;
+ }
+
+ if (request_opal_call) {
+ opal_cpuidle_restore(psscr, srr1);
+ goto opal_return;
+ }
+
+ /* HV state loss */
+ BUG_ON(!sprs_saved);
+
if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
unsigned long mmcra;

@@ -712,19 +736,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
if (unlikely((srr1 & SRR1_WAKEMASK_P8) == SRR1_WAKEHMI))
hmi_exception_realmode(NULL);

- /*
- * On POWER9, SRR1 bits do not match exactly as expected.
- * SRR1_WS_GPRLOSS (10b) can also result in SPR loss, so
- * always test PSSCR if there is any state loss.
- */
- if (likely((psscr & PSSCR_RL_MASK) < pnv_first_hv_loss_level)) {
- if (sprs_saved)
- atomic_stop_thread_idle();
- goto out;
- }
-
- /* HV state loss */
- BUG_ON(!sprs_saved);

atomic_lock_thread_idle();

@@ -771,6 +782,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)

mtspr(SPRN_SPRG3, local_paca->sprg_vdso);

+opal_return:
if (!radix_enabled())
__slb_restore_bolted_realmode();

@@ -1284,6 +1296,7 @@ static int pnv_parse_cpuidle_dt(void)
u32 *temp_u32;
u64 *temp_u64;
const char **temp_string;
+ bool fall_back_to_opal = false;

np = of_find_node_by_path("/ibm,opal/power-mgt");
if (!np) {
@@ -1396,23 +1409,48 @@ static int pnv_parse_cpuidle_dt(void)
/* Parse each child node with appropriate parser_fn */
for_each_child_of_node(np1, dt_node) {
bool found_known_version = false;
- /* we don't have state falling back to opal*/
- for (i = 0; i < nr_known_versions ; i++) {
- if (of_device_is_compatible(dt_node, known_versions[i].name)) {
- rc = known_versions[i].parser_fn(dt_node);
+ if (!fall_back_to_opal) {
+ /* we don't have state falling back to opal*/
+ for (i = 0; i < nr_known_versions ; i++) {
+ if (of_device_is_compatible(dt_node, known_versions[i].name)) {
+ rc = known_versions[i].parser_fn(dt_node);
+ if (rc) {
+ pr_err("%s could not parse\n", known_versions[i].name);
+ continue;
+ }
+ found_known_version = true;
+ }
+ }
+ }
+
+ /*
+ * If any previous state falls back to opal_call
+ * then all further states will either call opal_call
+ * or not be included for cpuidle/cpuoffline.
+ *
+ * Moreover, having any intermediate state with no
+ * kernel support or opal support can be potentially
+ * dangerous, as hardware can potentially wake up from
+ * that state. Hence, no further states are added
+ * to cpuidle/cpuoffline.
+ */
+ if (!found_known_version || fall_back_to_opal) {
+ if (of_device_is_compatible(dt_node, "opal-support")) {
+ rc = known_versions[0].parser_fn(dt_node);
if (rc) {
- pr_err("%s could not parse\n", known_versions[i].name);
+ pr_err("%s could not parse\n", "opal-support");
continue;
}
- found_known_version = true;
+ fall_back_to_opal = true;
+ } else {
+ pr_info("Unsupported state, skipping all further state\n");
+ goto out;
}
}
- if (!found_known_version) {
- pr_info("Unsupported state, skipping all further state\n");
- goto out;
- }
nr_pnv_idle_states++;
}
+ if (fall_back_to_opal)
+ request_opal_call = true;
out:
kfree(temp_u32);
kfree(temp_u64);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 251528231a9e..7a039a81a67e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -331,3 +331,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE);
OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_cpuidle_save, OPAL_IDLE_SAVE);
+OPAL_CALL(opal_cpuidle_restore, OPAL_IDLE_RESTORE);
--
2.17.1


2018-10-11 13:43:19

by Akshay Adiga

Subject: [RFC PATCH v2 2/3] powernv/cpuidle: Pass pointers instead of values to stop loop

Pass a pointer to the pnv_idle_states_t entry instead of the psscr value
and mask. This lets us pass more information to the stop loop, which will
help figure out the method used to enter/exit the idle state.

Signed-off-by: Akshay Adiga <[email protected]>

---
Changes from v1 :
- Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
idle code in C"

arch/powerpc/include/asm/processor.h | 5 ++-
arch/powerpc/platforms/powernv/idle.c | 47 ++++++++++-----------------
drivers/cpuidle/cpuidle-powernv.c | 15 +++------
3 files changed, 24 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 936795acba48..822d3236ad7f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -43,6 +43,7 @@
#include <asm/thread_info.h>
#include <asm/ptrace.h>
#include <asm/hw_breakpoint.h>
+#include <asm/cpuidle.h>

/* We do _not_ want to define new machine types at all, those must die
* in favor of using the device-tree
@@ -518,9 +519,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
extern int powersave_nap; /* set if nap mode can be used in idle loop */

extern void power7_idle_type(unsigned long type);
-extern void power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask);
-
+extern void power9_idle_type(struct pnv_idle_states_t *state);
extern void flush_instruction_cache(void);
extern void hard_reset_now(void);
extern void poweroff_now(void);
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 755918402591..681a23a066bb 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -44,8 +44,7 @@ int nr_pnv_idle_states;
* The default stop state that will be used by ppc_md.power_save
* function on platforms that support stop instruction.
*/
-static u64 pnv_default_stop_val;
-static u64 pnv_default_stop_mask;
+struct pnv_idle_states_t *pnv_default_state;
static bool default_stop_found;

/*
@@ -72,9 +71,7 @@ const int nr_known_versions = 1;
* psscr value and mask of the deepest stop idle state.
* Used when a cpu is offlined.
*/
-static u64 pnv_deepest_stop_psscr_val;
-static u64 pnv_deepest_stop_psscr_mask;
-static u64 pnv_deepest_stop_flag;
+static struct pnv_idle_states_t *pnv_deepest_state;
static bool deepest_stop_found;

static unsigned long power7_offline_type;
@@ -96,7 +93,7 @@ static int pnv_save_sprs_for_deep_states(void)
uint64_t hid5_val = mfspr(SPRN_HID5);
uint64_t hmeer_val = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
- uint64_t psscr_val = pnv_deepest_stop_psscr_val;
+ uint64_t psscr_val = pnv_deepest_state->psscr_val;

for_each_present_cpu(cpu) {
uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -820,17 +817,15 @@ static unsigned long power9_offline_stop(unsigned long psscr)
return srr1;
}

-static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask)
+static unsigned long __power9_idle_type(struct pnv_idle_states_t *state)
{
unsigned long psscr;
unsigned long srr1;

if (!prep_irq_for_idle_irqsoff())
return 0;
-
psscr = mfspr(SPRN_PSSCR);
- psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
+ psscr = (psscr & ~state->psscr_mask) | state->psscr_val;

__ppc64_runlatch_off();
srr1 = power9_idle_stop(psscr, true);
@@ -841,12 +836,10 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
return srr1;
}

-void power9_idle_type(unsigned long stop_psscr_val,
- unsigned long stop_psscr_mask)
+void power9_idle_type(struct pnv_idle_states_t *state)
{
unsigned long srr1;
-
- srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+ srr1 = __power9_idle_type(state);
irq_set_pending_from_srr1(srr1);
}

@@ -855,7 +848,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
*/
void power9_idle(void)
{
- power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
+ power9_idle_type(pnv_default_state);
}

#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -974,8 +967,8 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
unsigned long psscr;

psscr = mfspr(SPRN_PSSCR);
- psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
- pnv_deepest_stop_psscr_val;
+ psscr = (psscr & ~pnv_deepest_state->psscr_mask) |
+ pnv_deepest_state->psscr_val;
srr1 = power9_offline_stop(psscr);
} else if (cpu_has_feature(CPU_FTR_ARCH_206) && power7_offline_type) {
srr1 = power7_offline();
@@ -1123,16 +1116,13 @@ static void __init pnv_power9_idle_init(void)

if (max_residency_ns < state->residency_ns) {
max_residency_ns = state->residency_ns;
- pnv_deepest_stop_psscr_val = state->psscr_val;
- pnv_deepest_stop_psscr_mask = state->psscr_mask;
- pnv_deepest_stop_flag = state->flags;
+ pnv_deepest_state = state;
deepest_stop_found = true;
}

if (!default_stop_found &&
(state->flags & OPAL_PM_STOP_INST_FAST)) {
- pnv_default_stop_val = state->psscr_val;
- pnv_default_stop_mask = state->psscr_mask;
+ pnv_default_state = state;
default_stop_found = true;
WARN_ON(state->flags & OPAL_PM_LOSE_FULL_CONTEXT);
}
@@ -1143,15 +1133,15 @@ static void __init pnv_power9_idle_init(void)
} else {
ppc_md.power_save = power9_idle;
pr_info("cpuidle-powernv: Default stop: psscr = 0x%016llx,mask=0x%016llx\n",
- pnv_default_stop_val, pnv_default_stop_mask);
+ pnv_default_state->psscr_val, pnv_default_state->psscr_mask);
}

if (unlikely(!deepest_stop_found)) {
pr_warn("cpuidle-powernv: No suitable stop state for CPU-Hotplug. Offlined CPUs will busy wait");
} else {
pr_info("cpuidle-powernv: Deepest stop: psscr = 0x%016llx,mask=0x%016llx\n",
- pnv_deepest_stop_psscr_val,
- pnv_deepest_stop_psscr_mask);
+ pnv_deepest_state->psscr_val,
+ pnv_deepest_state->psscr_mask);
}

pr_info("cpuidle-powernv: First stop level that may lose SPRs = 0x%lld\n",
@@ -1173,16 +1163,15 @@ static void __init pnv_disable_deep_states(void)
pr_warn("cpuidle-powernv: Idle power-savings, CPU-Hotplug affected\n");

if (cpu_has_feature(CPU_FTR_ARCH_300) &&
- (pnv_deepest_stop_flag & OPAL_PM_LOSE_FULL_CONTEXT)) {
+ (pnv_deepest_state->flags & OPAL_PM_LOSE_FULL_CONTEXT)) {
/*
* Use the default stop state for CPU-Hotplug
* if available.
*/
if (default_stop_found) {
- pnv_deepest_stop_psscr_val = pnv_default_stop_val;
- pnv_deepest_stop_psscr_mask = pnv_default_stop_mask;
+ pnv_deepest_state = pnv_default_state;
pr_warn("cpuidle-powernv: Offlined CPUs will stop with psscr = 0x%016llx\n",
- pnv_deepest_stop_psscr_val);
+ pnv_deepest_state->psscr_val);
} else { /* Fallback to snooze loop for CPU-Hotplug */
deepest_stop_found = false;
pr_warn("cpuidle-powernv: Offlined CPUs will busy wait\n");
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index a15514ebd1c3..5116d5991d30 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -35,13 +35,7 @@ static struct cpuidle_driver powernv_idle_driver = {
static int max_idle_state __read_mostly;
static struct cpuidle_state *cpuidle_state_table __read_mostly;

-struct stop_psscr_table {
- u64 val;
- u64 mask;
-};
-
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
-
+struct pnv_idle_states_t idx_to_state_ptr[CPUIDLE_STATE_MAX] __read_mostly;
static u64 default_snooze_timeout __read_mostly;
static bool snooze_timeout_en __read_mostly;

@@ -143,8 +137,9 @@ static int stop_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- power9_idle_type(stop_psscr_table[index].val,
- stop_psscr_table[index].mask);
+ struct pnv_idle_states_t *state;
+ state = &pnv_idle_states[index];
+ power9_idle_type(state);
return index;
}

@@ -242,8 +237,6 @@ static inline void add_powernv_state(int index, const char *name,
powernv_states[index].exit_latency = exit_latency;
powernv_states[index].enter = idle_fn;
/* For power8 and below psscr_* will be 0 */
- stop_psscr_table[index].val = psscr_val;
- stop_psscr_table[index].mask = psscr_mask;
}

/*
--
2.17.1


2018-10-11 19:57:13

by Frank Rowand

Subject: Re: [RFC PATCH v2 0/3] New device-tree format and Opal based idle save-restore

+ devicetree mail list

On 10/11/18 06:22, Akshay Adiga wrote:
> ...


2018-10-11 19:57:14

by Frank Rowand

Subject: Re: [RFC PATCH v2 1/3] cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1

+ devicetree mail list

On 10/11/18 06:22, Akshay Adiga wrote:
> ...


2018-10-11 19:57:27

by Frank Rowand

Subject: Re: [RFC PATCH v2 2/3] powernv/cpuidle: Pass pointers instead of values to stop loop

+ devicetree mail list

On 10/11/18 06:22, Akshay Adiga wrote:
> Passing pointer to the pnv_idle_state instead of psscr value and mask.
> This helps us to pass more information to the stop loop. This will help to
> figure out the method to enter/exit idle state.
>
> Signed-off-by: Akshay Adiga <[email protected]>
>
> ---
> Changes from v1 :
> - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
> idle code in C"
>
> arch/powerpc/include/asm/processor.h | 5 ++-
> arch/powerpc/platforms/powernv/idle.c | 47 ++++++++++-----------------
> drivers/cpuidle/cpuidle-powernv.c | 15 +++------
> 3 files changed, 24 insertions(+), 43 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 936795acba48..822d3236ad7f 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -43,6 +43,7 @@
> #include <asm/thread_info.h>
> #include <asm/ptrace.h>
> #include <asm/hw_breakpoint.h>
> +#include <asm/cpuidle.h>
>
> /* We do _not_ want to define new machine types at all, those must die
> * in favor of using the device-tree
> @@ -518,9 +519,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
> extern int powersave_nap; /* set if nap mode can be used in idle loop */
>
> extern void power7_idle_type(unsigned long type);
> -extern void power9_idle_type(unsigned long stop_psscr_val,
> - unsigned long stop_psscr_mask);
> -
> +extern void power9_idle_type(struct pnv_idle_states_t *state);
> extern void flush_instruction_cache(void);
> extern void hard_reset_now(void);
> extern void poweroff_now(void);
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 755918402591..681a23a066bb 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -44,8 +44,7 @@ int nr_pnv_idle_states;
> * The default stop state that will be used by ppc_md.power_save
> * function on platforms that support stop instruction.
> */
> -static u64 pnv_default_stop_val;
> -static u64 pnv_default_stop_mask;
> +struct pnv_idle_states_t *pnv_default_state;
> static bool default_stop_found;
>
> /*
> @@ -72,9 +71,7 @@ const int nr_known_versions = 1;
> * psscr value and mask of the deepest stop idle state.
> * Used when a cpu is offlined.
> */
> -static u64 pnv_deepest_stop_psscr_val;
> -static u64 pnv_deepest_stop_psscr_mask;
> -static u64 pnv_deepest_stop_flag;
> +static struct pnv_idle_states_t *pnv_deepest_state;
> static bool deepest_stop_found;
>
> static unsigned long power7_offline_type;
> @@ -96,7 +93,7 @@ static int pnv_save_sprs_for_deep_states(void)
> uint64_t hid5_val = mfspr(SPRN_HID5);
> uint64_t hmeer_val = mfspr(SPRN_HMEER);
> uint64_t msr_val = MSR_IDLE;
> - uint64_t psscr_val = pnv_deepest_stop_psscr_val;
> + uint64_t psscr_val = pnv_deepest_state->psscr_val;
>
> for_each_present_cpu(cpu) {
> uint64_t pir = get_hard_smp_processor_id(cpu);
> @@ -820,17 +817,15 @@ static unsigned long power9_offline_stop(unsigned long psscr)
> return srr1;
> }
>
> -static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
> - unsigned long stop_psscr_mask)
> +static unsigned long __power9_idle_type(struct pnv_idle_states_t *state)
> {
> unsigned long psscr;
> unsigned long srr1;
>
> if (!prep_irq_for_idle_irqsoff())
> return 0;
> -
> psscr = mfspr(SPRN_PSSCR);
> - psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
> + psscr = (psscr & ~state->psscr_mask) | state->psscr_val;
>
> __ppc64_runlatch_off();
> srr1 = power9_idle_stop(psscr, true);
> @@ -841,12 +836,10 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
> return srr1;
> }
>
> -void power9_idle_type(unsigned long stop_psscr_val,
> - unsigned long stop_psscr_mask)
> +void power9_idle_type(struct pnv_idle_states_t *state)
> {
> unsigned long srr1;
> -
> - srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
> + srr1 = __power9_idle_type(state);
> irq_set_pending_from_srr1(srr1);
> }
>
> @@ -855,7 +848,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
> */
> void power9_idle(void)
> {
> - power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
> + power9_idle_type(pnv_default_state);
> }
>
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> @@ -974,8 +967,8 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
> unsigned long psscr;
>
> psscr = mfspr(SPRN_PSSCR);
> - psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
> - pnv_deepest_stop_psscr_val;
> + psscr = (psscr & ~pnv_deepest_state->psscr_mask) |
> + pnv_deepest_state->psscr_val;
> srr1 = power9_offline_stop(psscr);
> } else if (cpu_has_feature(CPU_FTR_ARCH_206) && power7_offline_type) {
> srr1 = power7_offline();
> @@ -1123,16 +1116,13 @@ static void __init pnv_power9_idle_init(void)
>
> if (max_residency_ns < state->residency_ns) {
> max_residency_ns = state->residency_ns;
> - pnv_deepest_stop_psscr_val = state->psscr_val;
> - pnv_deepest_stop_psscr_mask = state->psscr_mask;
> - pnv_deepest_stop_flag = state->flags;
> + pnv_deepest_state = state;
> deepest_stop_found = true;
> }
>
> if (!default_stop_found &&
> (state->flags & OPAL_PM_STOP_INST_FAST)) {
> - pnv_default_stop_val = state->psscr_val;
> - pnv_default_stop_mask = state->psscr_mask;
> + pnv_default_state = state;
> default_stop_found = true;
> WARN_ON(state->flags & OPAL_PM_LOSE_FULL_CONTEXT);
> }
> @@ -1143,15 +1133,15 @@ static void __init pnv_power9_idle_init(void)
> } else {
> ppc_md.power_save = power9_idle;
> pr_info("cpuidle-powernv: Default stop: psscr = 0x%016llx,mask=0x%016llx\n",
> - pnv_default_stop_val, pnv_default_stop_mask);
> + pnv_default_state->psscr_val, pnv_default_state->psscr_mask);
> }
>
> if (unlikely(!deepest_stop_found)) {
> pr_warn("cpuidle-powernv: No suitable stop state for CPU-Hotplug. Offlined CPUs will busy wait");
> } else {
> pr_info("cpuidle-powernv: Deepest stop: psscr = 0x%016llx,mask=0x%016llx\n",
> - pnv_deepest_stop_psscr_val,
> - pnv_deepest_stop_psscr_mask);
> + pnv_deepest_state->psscr_val,
> + pnv_deepest_state->psscr_mask);
> }
>
> pr_info("cpuidle-powernv: First stop level that may lose SPRs = 0x%lld\n",
> @@ -1173,16 +1163,15 @@ static void __init pnv_disable_deep_states(void)
> pr_warn("cpuidle-powernv: Idle power-savings, CPU-Hotplug affected\n");
>
> if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> - (pnv_deepest_stop_flag & OPAL_PM_LOSE_FULL_CONTEXT)) {
> + (pnv_deepest_state->flags & OPAL_PM_LOSE_FULL_CONTEXT)) {
> /*
> * Use the default stop state for CPU-Hotplug
> * if available.
> */
> if (default_stop_found) {
> - pnv_deepest_stop_psscr_val = pnv_default_stop_val;
> - pnv_deepest_stop_psscr_mask = pnv_default_stop_mask;
> + pnv_deepest_state = pnv_default_state;
> pr_warn("cpuidle-powernv: Offlined CPUs will stop with psscr = 0x%016llx\n",
> - pnv_deepest_stop_psscr_val);
> + pnv_deepest_state->psscr_val);
> } else { /* Fallback to snooze loop for CPU-Hotplug */
> deepest_stop_found = false;
> pr_warn("cpuidle-powernv: Offlined CPUs will busy wait\n");
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index a15514ebd1c3..5116d5991d30 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -35,13 +35,7 @@ static struct cpuidle_driver powernv_idle_driver = {
> static int max_idle_state __read_mostly;
> static struct cpuidle_state *cpuidle_state_table __read_mostly;
>
> -struct stop_psscr_table {
> - u64 val;
> - u64 mask;
> -};
> -
> -static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
> -
> +struct pnv_idle_states_t idx_to_state_ptr[CPUIDLE_STATE_MAX] __read_mostly;
> static u64 default_snooze_timeout __read_mostly;
> static bool snooze_timeout_en __read_mostly;
>
> @@ -143,8 +137,9 @@ static int stop_loop(struct cpuidle_device *dev,
> struct cpuidle_driver *drv,
> int index)
> {
> - power9_idle_type(stop_psscr_table[index].val,
> - stop_psscr_table[index].mask);
> + struct pnv_idle_states_t *state;
> + state = &pnv_idle_states[index];
> + power9_idle_type(state);
> return index;
> }
>
> @@ -242,8 +237,6 @@ static inline void add_powernv_state(int index, const char *name,
> powernv_states[index].exit_latency = exit_latency;
> powernv_states[index].enter = idle_fn;
> /* For power8 and below psscr_* will be 0 */
> - stop_psscr_table[index].val = psscr_val;
> - stop_psscr_table[index].mask = psscr_mask;
> }
>
> /*
>
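
The net effect of the hunks above is that the separate psscr val/mask/flag
globals collapse into pointers into a single idle-state table. A minimal
sketch of the resulting shape, assuming a struct layout containing only the
fields this diff actually dereferences (the real definition comes from an
earlier patch in the series and is not shown here):

	/* Sketch only; the field list is an assumption, not the series' declaration. */
	struct pnv_idle_states_t {
		u64 psscr_val;
		u64 psscr_mask;
		u32 flags;
		u32 residency_ns;
	};

	static struct pnv_idle_states_t *pnv_deepest_state;
	static struct pnv_idle_states_t *pnv_default_state;

	/* The offline path then derives PSSCR from the selected state: */
	u64 psscr = (mfspr(SPRN_PSSCR) & ~pnv_deepest_state->psscr_mask) |
		    pnv_deepest_state->psscr_val;

One pointer assignment in pnv_power9_idle_init() replaces the three per-field
copies, so the chosen state's value, mask and flags can no longer drift apart.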


2018-10-11 19:58:28

by Frank Rowand

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/3] cpuidle/powernv: save-restore sprs in opal

+ devicetree mail list

On 10/11/18 06:22, Akshay Adiga wrote:
> From: Abhishek Goel <[email protected]>
>
> This patch moves the saving and restoring of SPRs for P9 cpuidle
> from the kernel to OPAL.
> In an attempt to make the powernv idle code backward compatible,
> and to some extent forward compatible, add support for pre-stop entry
> and post-stop exit actions in OPAL. If a kernel knows about this
> OPAL call, then only a firmware update supporting the newer hardware
> is required, instead of waiting for kernel updates.
>
> Signed-off-by: Abhishek Goel <[email protected]>
> Signed-off-by: Akshay Adiga <[email protected]>
> ---
> Changes from v1 :
> - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s
> idle code in C"
> - Set a global variable "request_opal_call" to indicate that deep
> states should make opal_call.
> - All the states that lose hypervisor state will be handled by OPAL
> - All the decision making, such as identifying the first thread in
> the core and taking locks before restoring in such cases, has also been
> moved to OPAL
> arch/powerpc/include/asm/opal-api.h | 4 +-
> arch/powerpc/include/asm/opal.h | 3 +
> arch/powerpc/include/asm/processor.h | 3 +-
> arch/powerpc/kernel/idle_book3s.S | 6 +-
> arch/powerpc/platforms/powernv/idle.c | 88 +++++++++++++------
> .../powerpc/platforms/powernv/opal-wrappers.S | 2 +
> 6 files changed, 77 insertions(+), 29 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 8365353330b4..93ea1f79e295 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -210,7 +210,9 @@
> #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164
> #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
> #define OPAL_NX_COPROC_INIT 167
> -#define OPAL_LAST 167
> +#define OPAL_IDLE_SAVE 170
> +#define OPAL_IDLE_RESTORE 171
> +#define OPAL_LAST 171
>
> #define QUIESCE_HOLD 1 /* Spin all calls at entry */
> #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index ff3866473afe..26995e16171e 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -356,6 +356,9 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
> extern void opal_shutdown(void);
> extern int opal_resync_timebase(void);
>
> +extern int opal_cpuidle_save(u64 psscr);
> +extern int opal_cpuidle_restore(u64 psscr, u64 srr1);
> +
> extern void opal_lpc_init(void);
>
> extern void opal_kmsg_init(void);
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 822d3236ad7f..26fa6c1836f4 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -510,7 +510,8 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
>
> /* asm stubs */
> extern unsigned long isa300_idle_stop_noloss(unsigned long psscr_val);
> -extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val);
> +extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val,
> + bool request_opal_call);
> extern unsigned long isa206_idle_insn_mayloss(unsigned long type);
>
> extern unsigned long cpuidle_disable;
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index ffdee1ab4388..a2014d152035 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -52,14 +52,16 @@ _GLOBAL(isa300_idle_stop_noloss)
> _GLOBAL(isa300_idle_stop_mayloss)
> mtspr SPRN_PSSCR,r3
> std r1,PACAR1(r13)
> - mflr r4
> + mflr r7
> mfcr r5
> /* use stack red zone rather than a new frame */
> addi r6,r1,-INT_FRAME_SIZE
> SAVE_GPR(2, r6)
> SAVE_NVGPRS(r6)
> - std r4,_LINK(r6)
> + std r7,_LINK(r6)
> std r5,_CCR(r6)
> + cmpwi r4,0
> + bne opal_cpuidle_save
> PPC_STOP
> b . /* catch bugs */
>
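
Condensing the commit message and the stub change above, the entry side looks
roughly like this on the caller's side (a sketch of the idle.c hunks that
follow; save_sprs() is a hypothetical stand-in for the inline mfspr sequence,
and whether OPAL also executes the stop instruction inside opal_cpuidle_save()
is a firmware detail not shown in this patch):

	/* Sketch of the entry decision in power9_idle_stop(). */
	is_hv_loss = (psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level;

	if (is_hv_loss && !request_opal_call) {
		save_sprs(&sprs);	/* hypothetical helper: kernel-side SPR save */
		sprs_saved = true;
	}

	/* When the second argument is true, the asm stub above branches to
	 * opal_cpuidle_save(psscr) instead of issuing PPC_STOP itself. */
	srr1 = isa300_idle_stop_mayloss(psscr, is_hv_loss && request_opal_call);
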
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 681a23a066bb..bcfe08022e65 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -171,6 +171,7 @@ static void pnv_fastsleep_workaround_apply(void *info)
>
> static bool power7_fastsleep_workaround_entry = true;
> static bool power7_fastsleep_workaround_exit = true;
> +static bool request_opal_call = false;
>
> /*
> * Used to store fastsleep workaround state
> @@ -604,6 +605,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> unsigned long mmcr0 = 0;
> struct p9_sprs sprs;
> bool sprs_saved = false;
> + bool is_hv_loss = false;
>
> memset(&sprs, 0, sizeof(sprs));
>
> @@ -648,7 +650,9 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> */
> mmcr0 = mfspr(SPRN_MMCR0);
> }
> - if ((psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level) {
> +
> + is_hv_loss = (psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level;
> + if (is_hv_loss && (!request_opal_call)) {
> sprs.lpcr = mfspr(SPRN_LPCR);
> sprs.hfscr = mfspr(SPRN_HFSCR);
> sprs.fscr = mfspr(SPRN_FSCR);
> @@ -674,7 +678,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> atomic_start_thread_idle();
> }
>
> - srr1 = isa300_idle_stop_mayloss(psscr);
> + srr1 = isa300_idle_stop_mayloss(psscr,
> + is_hv_loss && request_opal_call);
>
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> local_paca->requested_psscr = 0;
> @@ -685,6 +690,25 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> WARN_ON_ONCE(!srr1);
> WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
>
> + /*
> + * On POWER9, SRR1 bits do not match exactly as expected.
> + * SRR1_WS_GPRLOSS (10b) can also result in SPR loss, so
> + * always test PSSCR if there is any state loss.
> + */
> + if (likely(((psscr & PSSCR_PLS) >> 60) < pnv_first_hv_loss_level)) {
> + if (sprs_saved)
> + atomic_stop_thread_idle();
> + goto out;
> + }
> +
> + if (request_opal_call) {
> + opal_cpuidle_restore(psscr, srr1);
> + goto opal_return;
> + }
> +
> + /* HV state loss */
> + BUG_ON(!sprs_saved);
> +
> if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
> unsigned long mmcra;
>
> @@ -712,19 +736,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
> if (unlikely((srr1 & SRR1_WAKEMASK_P8) == SRR1_WAKEHMI))
> hmi_exception_realmode(NULL);
>
> - /*
> - * On POWER9, SRR1 bits do not match exactly as expected.
> - * SRR1_WS_GPRLOSS (10b) can also result in SPR loss, so
> - * always test PSSCR if there is any state loss.
> - */
> - if (likely((psscr & PSSCR_RL_MASK) < pnv_first_hv_loss_level)) {
> - if (sprs_saved)
> - atomic_stop_thread_idle();
> - goto out;
> - }
> -
> - /* HV state loss */
> - BUG_ON(!sprs_saved);
>
> atomic_lock_thread_idle();
>
> @@ -771,6 +782,7 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
>
> mtspr(SPRN_SPRG3, local_paca->sprg_vdso);
>
> +opal_return:
> if (!radix_enabled())
> __slb_restore_bolted_realmode();
>
> @@ -1284,6 +1296,7 @@ static int pnv_parse_cpuidle_dt(void)
> u32 *temp_u32;
> u64 *temp_u64;
> const char **temp_string;
> + bool fall_back_to_opal = false;
>
> np = of_find_node_by_path("/ibm,opal/power-mgt");
> if (!np) {
> @@ -1396,23 +1409,48 @@ static int pnv_parse_cpuidle_dt(void)
> /* Parse each child node with appropriate parser_fn */
> for_each_child_of_node(np1, dt_node) {
> bool found_known_version = false;
> - /* we don't have state falling back to opal*/
> - for (i = 0; i < nr_known_versions ; i++) {
> - if (of_device_is_compatible(dt_node, known_versions[i].name)) {
> - rc = known_versions[i].parser_fn(dt_node);
> + if (!fall_back_to_opal) {
> +			/* no state has fallen back to opal yet */
> + for (i = 0; i < nr_known_versions ; i++) {
> + if (of_device_is_compatible(dt_node, known_versions[i].name)) {
> + rc = known_versions[i].parser_fn(dt_node);
> + if (rc) {
> + pr_err("%s could not parse\n", known_versions[i].name);
> + continue;
> + }
> + found_known_version = true;
> + }
> + }
> + }
> +
> +		/*
> +		 * If any previous state falls back to opal_call,
> +		 * then all further states will either call opal_call
> +		 * or not be included for cpuidle/cpuoffline.
> +		 *
> +		 * Moreover, having any intermediate state with neither
> +		 * kernel support nor opal support is potentially
> +		 * dangerous, as hardware can still wake up from
> +		 * that state. Hence, no further states are added
> +		 * to cpuidle/cpuoffline.
> +		 */
> + if (!found_known_version || fall_back_to_opal) {
> + if (of_device_is_compatible(dt_node, "opal-support")) {
> + rc = known_versions[0].parser_fn(dt_node);
> if (rc) {
> - pr_err("%s could not parse\n", known_versions[i].name);
> + pr_err("%s could not parse\n", "opal-support");
> continue;
> }
> - found_known_version = true;
> + fall_back_to_opal = true;
> + } else {
> + pr_info("Unsupported state, skipping all further state\n");
> + goto out;
> }
> }
> - if (!found_known_version) {
> - pr_info("Unsupported state, skipping all further state\n");
> - goto out;
> - }
> nr_pnv_idle_states++;
> }
> + if (fall_back_to_opal)
> + request_opal_call = true;
> out:
> kfree(temp_u32);
> kfree(temp_u64);
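
Stripped of the parse-error handling, the reworked device-tree loop above
implements the policy from the comment block roughly as follows (sketch only;
node_has_known_version() and parse_state() are placeholder names for the
known_versions[] lookup and its parser_fn calls):

	for_each_child_of_node(np1, dt_node) {
		if (!fall_back_to_opal && node_has_known_version(dt_node)) {
			parse_state(dt_node);		/* kernel drives stop transitions */
		} else if (of_device_is_compatible(dt_node, "opal-support")) {
			parse_state(dt_node);
			fall_back_to_opal = true;	/* this and deeper states go via OPAL */
		} else {
			/* Neither kernel nor OPAL support: skip all deeper
			 * states, since hardware could still wake from them. */
			break;
		}
		nr_pnv_idle_states++;
	}
	if (fall_back_to_opal)
		request_opal_call = true;
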
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 251528231a9e..7a039a81a67e 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -331,3 +331,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
> OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
> OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE);
> OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
> +OPAL_CALL(opal_cpuidle_save, OPAL_IDLE_SAVE);
> +OPAL_CALL(opal_cpuidle_restore, OPAL_IDLE_RESTORE);
>
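
On the wakeup side, the idle.c hunks in the patch above reduce to roughly the
following three-way decision (sketch; restore_sprs() is a placeholder for the
kernel's existing inline restore sequence):

	/* Sketch of the exit decision in power9_idle_stop(). */
	if (((psscr & PSSCR_PLS) >> 60) < pnv_first_hv_loss_level) {
		if (sprs_saved)
			atomic_stop_thread_idle();
		goto out;			/* no HV-privileged state was lost */
	}

	if (request_opal_call) {
		/* Firmware restores SPRs and resyncs the timebase. */
		opal_cpuidle_restore(psscr, srr1);
		goto opal_return;		/* skip the kernel restore sequence */
	}

	BUG_ON(!sprs_saved);			/* kernel saved SPRs on entry */
	restore_sprs(&sprs);			/* placeholder for the existing restore path */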