Over the years this series have been iterated and discussed at various Linux
conferences and LKML. In this new v10, a quite significant amount of changes
have been made to address comments from v8 and v9. A summary is available
below, although let's start with a brand new clarification of the motivation
behind this series.
For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
From a CPU idle state perspective, this means some states may be shared among a
group of CPUs (aka CPU cluster).
To deal with idle management of a group of CPUs, sometimes the kernel needs to
be involved to manage the last-man standing algorithm, simply because it can't
rely solely on power management FWs to deal with this. Depending on the
platform, of course.
There are a couple of typical scenarios for when the kernel needs to be in
control, dealing with synchronization of when the last CPU in a cluster is about
to enter a deep idle state.
1)
The kernel needs to carry out so called last-man activities before the
CPU cluster can enter a deep idle state. This may for example involve to
configure external logics for wakeups, as the GIC may no longer be functional
once a deep cluster idle state have been entered. Likewise, these operations
may need to be restored, when the first CPU wakes up.
2)
Other more generic I/O devices, such as an MMC controller for example, may be a
part of the same power domain as the CPU cluster, due to a shared power-rail.
For these scenarios, when the MMC controller is in use dealing with an MMC
request, a deeper idle state of the CPU cluster may needs to be temporarily
disabled. This is needed to retain the MMC controller in a functional state,
else it may loose its register-context in the middle of serving a request.
In this series, we are extending the generic PM domain (aka genpd) to be used
for also CPU devices. Hence the goal is to re-use much of its current code to
help us manage the last-man standing synchronization. Moreover, as we already
use genpd to model power domains for generic I/O devices, both 1) and 2) can be
address with its help.
Moreover, to address these problems for ARM64 DT based platforms, we are
deploying support for genpd and runtime PM to the PSCI FW driver - and finally
we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
layout.
The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
board. You may also find the code at:
git.linaro.org/people/ulf.hansson/linux-pm.git next
Kind regards
Ulf Hansson
Changes in v10:
- Quite significant changes have been to the PSCI driver deployment. According
to an agreement with Lorenzo, the hierarchical CPU layout for PSCI should be
orthogonal to whether the PSCI FW supports OSI or not. This has been taken
care of in this version.
- Drop the generic attach/detach helpers of CPUs to genpd, instead make that
related code internal to PSCI, for now.
- Fix "BUG: sleeping for invalid context" for hotplug, as reported by Raju.
- Addressed various comments from version 8 and 9.
- Clarified changelogs and re-wrote the cover-letter to better explain the
motivations behind these changes.
Changes in v9:
- Collect only a subset from the changes in v8.
- Patch 3 is new, documenting existing genpd flags. Future wise, this means
when a new genpd flag is invented, we must also properly document it.
- No changes have been made to the patches picked from v8.
- Dropped the text from v8 cover-letter[1], to avoid confusion. When posting v10
(or whatever the next version containing the rest becomes), I am going re-write
the cover-letter to clarify, more exactly, the problems this series intends to
solve. The earlier text was simply too vague.
[1]
https://lwn.net/Articles/758091/
Changes in v8:
- Added some tags for reviews and acks.
- Cleanup timer patch (patch6) according to comments from Rafael.
- Rebased series on top of v4.18rc1 - it applied cleanly, except for patch 5.
- While adopting patch 5 to new genpd changes, I took the opportunity to
improve the new function description a bit.
- Corrected malformed SPDX-License-Identifier in patch20.
Changes in v7:
- Addressed comments concerning the PSCI changes from Mark Rutland, which moves
the psci firmware driver to a new firmware subdir and change to force PSCI PC
mode during boot to cope with kexec'ed booted kernels.
- Added some maintainers in cc for the timer/nohz patches.
- Minor update to the new genpd governor, taking into account the state's
poweroff latency while validating the sleep duration time.
- Addressed a problem pointed out by Geert Uytterhoeven, around calling
pm_runtime_get|put() for CPUs that has not been attached to a CPU PM domain.
- Re-based on Linus' latest master.
Lina Iyer (5):
timer: Export next wakeup time of a CPU
dt: psci: Update DT bindings to support hierarchical PSCI states
cpuidle: dt: Support hierarchical CPU idle states
drivers: firmware: psci: Support hierarchical CPU idle states
arm64: dts: Convert to the hierarchical CPU topology layout for
MSM8916
Ulf Hansson (22):
PM / Domains: Add generic data pointer to genpd_power_state struct
PM / Domains: Add support for CPU devices to genpd
PM / Domains: Add genpd governor for CPUs
of: base: Add of_get_cpu_state_node() to get idle states for a CPU
node
ARM/ARM64: cpuidle: Let back-end init ops take the driver as input
drivers: firmware: psci: Move psci to separate directory
MAINTAINERS: Update files for PSCI
drivers: firmware: psci: Split psci_dt_cpu_init_idle()
drivers: firmware: psci: Simplify state node parsing
drivers: firmware: psci: Simplify error path of psci_dt_init()
drivers: firmware: psci: Announce support for OS initiated suspend
mode
drivers: firmware: psci: Prepare to use OS initiated suspend mode
drivers: firmware: psci: Prepare to support PM domains
drivers: firmware: psci: Add support for PM domains using genpd
drivers: firmware: psci: Add hierarchical domain idle states converter
drivers: firmware: psci: Introduce psci_dt_topology_init()
drivers: firmware: psci: Add a helper to attach a CPU to its PM domain
drivers: firmware: psci: Attach the CPU's device to its PM domain
drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
drivers: firmware: psci: Support CPU hotplug for the hierarchical
model
arm64: kernel: Respect the hierarchical CPU topology in DT for PSCI
arm64: dts: hikey: Convert to the hierarchical CPU topology layout
.../devicetree/bindings/arm/psci.txt | 166 ++++++++
MAINTAINERS | 2 +-
arch/arm/include/asm/cpuidle.h | 4 +-
arch/arm/kernel/cpuidle.c | 5 +-
arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 +++-
arch/arm64/boot/dts/qcom/msm8916.dtsi | 57 ++-
arch/arm64/include/asm/cpu_ops.h | 4 +-
arch/arm64/include/asm/cpuidle.h | 6 +-
arch/arm64/kernel/cpuidle.c | 6 +-
arch/arm64/kernel/setup.c | 3 +
drivers/base/power/domain.c | 74 +++-
drivers/base/power/domain_governor.c | 61 ++-
drivers/cpuidle/cpuidle-arm.c | 2 +-
drivers/cpuidle/dt_idle_states.c | 5 +-
drivers/firmware/Kconfig | 15 +-
drivers/firmware/Makefile | 3 +-
drivers/firmware/psci/Kconfig | 13 +
drivers/firmware/psci/Makefile | 4 +
drivers/firmware/{ => psci}/psci.c | 240 ++++++++---
drivers/firmware/psci/psci.h | 23 ++
drivers/firmware/{ => psci}/psci_checker.c | 0
drivers/firmware/psci/psci_pm_domain.c | 389 ++++++++++++++++++
drivers/of/base.c | 35 ++
drivers/soc/qcom/spm.c | 3 +-
include/linux/of.h | 8 +
include/linux/pm_domain.h | 19 +-
include/linux/psci.h | 6 +-
include/linux/tick.h | 8 +
include/uapi/linux/psci.h | 5 +
kernel/time/tick-sched.c | 13 +
30 files changed, 1163 insertions(+), 103 deletions(-)
create mode 100644 drivers/firmware/psci/Kconfig
create mode 100644 drivers/firmware/psci/Makefile
rename drivers/firmware/{ => psci}/psci.c (76%)
create mode 100644 drivers/firmware/psci/psci.h
rename drivers/firmware/{ => psci}/psci_checker.c (100%)
create mode 100644 drivers/firmware/psci/psci_pm_domain.c
--
2.17.1
To enable a device belonging to a CPU to be attached to a PM domain managed
by genpd, let's do a few changes to it, as to make it convenient to manage
the specifics around CPUs.
To be able to quickly find out what CPUs that are attached to a genpd,
which typically becomes useful from a genpd governor as following changes
is about to show, let's add a cpumask to the struct generic_pm_domain. At
the point when a CPU device gets attached to a genpd, let's update its
cpumask. Moreover, let's also propagate changes to the cpumask upwards in
the topology to the master PM domains. In this way, the cpumask for a genpd
hierarchically reflects all CPUs attached to the topology below it.
Finally, let's make this an opt-in feature, to avoid having to manage CPUs
and the cpumask for a genpd that doesn't need it. For that reason, let's
add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Don't allocate the cpumask when not used.
- Simplify the code that updates the cpumask.
- Document the GENPD_FLAG_CPU_DOMAIN.
---
drivers/base/power/domain.c | 66 ++++++++++++++++++++++++++++++++++++-
include/linux/pm_domain.h | 13 ++++++++
2 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index e27b91d36a2a..c3ff8e395308 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -20,6 +20,7 @@
#include <linux/sched.h>
#include <linux/suspend.h>
#include <linux/export.h>
+#include <linux/cpu.h>
#include "power.h"
@@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
#define genpd_is_irq_safe(genpd) (genpd->flags & GENPD_FLAG_IRQ_SAFE)
#define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON)
#define genpd_is_active_wakeup(genpd) (genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP)
+#define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN)
static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
const struct generic_pm_domain *genpd)
@@ -1377,6 +1379,56 @@ static void genpd_free_dev_data(struct device *dev,
dev_pm_put_subsys_data(dev);
}
+static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
+ int cpu, bool set, unsigned int depth)
+{
+ struct gpd_link *link;
+
+ if (!genpd_is_cpu_domain(genpd))
+ return;
+
+ list_for_each_entry(link, &genpd->slave_links, slave_node) {
+ struct generic_pm_domain *master = link->master;
+
+ genpd_lock_nested(master, depth + 1);
+ __genpd_update_cpumask(master, cpu, set, depth + 1);
+ genpd_unlock(master);
+ }
+
+ if (set)
+ cpumask_set_cpu(cpu, genpd->cpus);
+ else
+ cpumask_clear_cpu(cpu, genpd->cpus);
+}
+
+static void genpd_update_cpumask(struct generic_pm_domain *genpd,
+ struct device *dev, bool set)
+{
+ int cpu;
+
+ if (!genpd_is_cpu_domain(genpd))
+ return;
+
+ for_each_possible_cpu(cpu) {
+ if (get_cpu_device(cpu) == dev) {
+ __genpd_update_cpumask(genpd, cpu, set, 0);
+ return;
+ }
+ }
+}
+
+static void genpd_set_cpumask(struct generic_pm_domain *genpd,
+ struct device *dev)
+{
+ genpd_update_cpumask(genpd, dev, true);
+}
+
+static void genpd_clear_cpumask(struct generic_pm_domain *genpd,
+ struct device *dev)
+{
+ genpd_update_cpumask(genpd, dev, false);
+}
+
static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
struct gpd_timing_data *td)
{
@@ -1398,6 +1450,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
if (ret)
goto out;
+ genpd_set_cpumask(genpd, dev);
+
dev_pm_domain_set(dev, &genpd->domain);
genpd->device_count++;
@@ -1459,6 +1513,7 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
if (genpd->detach_dev)
genpd->detach_dev(genpd, dev);
+ genpd_clear_cpumask(genpd, dev);
dev_pm_domain_set(dev, NULL);
list_del_init(&pdd->list_node);
@@ -1686,11 +1741,18 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
if (genpd_is_always_on(genpd) && !genpd_status_on(genpd))
return -EINVAL;
+ if (genpd_is_cpu_domain(genpd) &&
+ !zalloc_cpumask_var(&genpd->cpus, GFP_KERNEL))
+ return -ENOMEM;
+
/* Use only one "off" state if there were no states declared */
if (genpd->state_count == 0) {
ret = genpd_set_default_power_state(genpd);
- if (ret)
+ if (ret) {
+ if (genpd_is_cpu_domain(genpd))
+ free_cpumask_var(genpd->cpus);
return ret;
+ }
} else if (!gov) {
pr_warn("%s : no governor for states\n", genpd->name);
}
@@ -1736,6 +1798,8 @@ static int genpd_remove(struct generic_pm_domain *genpd)
list_del(&genpd->gpd_list_node);
genpd_unlock(genpd);
cancel_work_sync(&genpd->power_off_work);
+ if (genpd_is_cpu_domain(genpd))
+ free_cpumask_var(genpd->cpus);
if (genpd->free_state) {
kfree(genpd->states);
genpd->states = NULL;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index f9e09bd4152c..5a4673605d22 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -16,6 +16,7 @@
#include <linux/of.h>
#include <linux/notifier.h>
#include <linux/spinlock.h>
+#include <linux/cpumask.h>
/*
* Flags to control the behaviour of a genpd.
@@ -42,11 +43,22 @@
* GENPD_FLAG_ACTIVE_WAKEUP: Instructs genpd to keep the PM domain powered
* on, in case any of its attached devices is used
* in the wakeup path to serve system wakeups.
+ *
+ * GENPD_FLAG_CPU_DOMAIN: Instructs genpd that it should expect to get
+ * devices attached, which may belong to CPUs or
+ * possibly have subdomains with CPUs attached.
+ * This flag enables the genpd backend driver to
+ * deploy idle power management support for CPUs
+ * and groups of CPUs. Note that, the backend
+ * driver must then comply with the so called,
+ * last-man-standing algorithm, for the CPUs in the
+ * PM domain.
*/
#define GENPD_FLAG_PM_CLK (1U << 0)
#define GENPD_FLAG_IRQ_SAFE (1U << 1)
#define GENPD_FLAG_ALWAYS_ON (1U << 2)
#define GENPD_FLAG_ACTIVE_WAKEUP (1U << 3)
+#define GENPD_FLAG_CPU_DOMAIN (1U << 4)
enum gpd_status {
GPD_STATE_ACTIVE = 0, /* PM domain is active */
@@ -93,6 +105,7 @@ struct generic_pm_domain {
unsigned int suspended_count; /* System suspend device counter */
unsigned int prepared_count; /* Suspend counter of prepared devices */
unsigned int performance_state; /* Aggregated max performance state */
+ cpumask_var_t cpus; /* A cpumask of the attached CPUs */
int (*power_off)(struct generic_pm_domain *domain);
int (*power_on)(struct generic_pm_domain *domain);
unsigned int (*opp_to_performance_state)(struct generic_pm_domain *genpd,
--
2.17.1
From: Lina Iyer <[email protected]>
Currently CPU's idle states are represented in a flattened model, via the
"cpu-idle-states" binding from within the CPU's device nodes.
Support the hierarchical layout during parsing and validating of the CPU's
idle states. This is simply done by calling the new OF helper,
of_get_cpu_state_node().
Cc: Lina Iyer <[email protected]>
Suggested-by: Sudeep Holla <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
drivers/cpuidle/dt_idle_states.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/cpuidle/dt_idle_states.c b/drivers/cpuidle/dt_idle_states.c
index 53342b7f1010..13f9b7cd32d1 100644
--- a/drivers/cpuidle/dt_idle_states.c
+++ b/drivers/cpuidle/dt_idle_states.c
@@ -118,8 +118,7 @@ static bool idle_state_valid(struct device_node *state_node, unsigned int idx,
for (cpu = cpumask_next(cpumask_first(cpumask), cpumask);
cpu < nr_cpu_ids; cpu = cpumask_next(cpu, cpumask)) {
cpu_node = of_cpu_device_node_get(cpu);
- curr_state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
- idx);
+ curr_state_node = of_get_cpu_state_node(cpu_node, idx);
if (state_node != curr_state_node)
valid = false;
@@ -176,7 +175,7 @@ int dt_init_idle_driver(struct cpuidle_driver *drv,
cpu_node = of_cpu_device_node_get(cpumask_first(cpumask));
for (i = 0; ; i++) {
- state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ state_node = of_get_cpu_state_node(cpu_node, i);
if (!state_node)
break;
--
2.17.1
Let's split the psci_dt_cpu_init_idle() function into two functions, as to
allow following changes to re-use some of the code.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
drivers/firmware/psci/psci.c | 42 ++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 878c4dcf0118..d50b46a0528f 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -270,10 +270,27 @@ static int __init psci_features(u32 psci_func_id)
#ifdef CONFIG_CPU_IDLE
static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
+{
+ int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
+
+ if (err) {
+ pr_warn("%pOF missing arm,psci-suspend-param property\n", np);
+ return err;
+ }
+
+ if (!psci_power_state_is_valid(*state)) {
+ pr_warn("Invalid PSCI power state %#x\n", *state);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
struct device_node *cpu_node, int cpu)
{
- int i, ret, count = 0;
+ int i, ret = 0, count = 0;
u32 *psci_states;
struct device_node *state_node;
@@ -292,29 +309,16 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
return -ENOMEM;
for (i = 0; i < count; i++) {
- u32 state;
-
state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ ret = psci_dt_parse_state_node(state_node, &psci_states[i]);
+ of_node_put(state_node);
- ret = of_property_read_u32(state_node,
- "arm,psci-suspend-param",
- &state);
- if (ret) {
- pr_warn(" * %pOF missing arm,psci-suspend-param property\n",
- state_node);
- of_node_put(state_node);
+ if (ret)
goto free_mem;
- }
- of_node_put(state_node);
- pr_debug("psci-power-state %#x index %d\n", state, i);
- if (!psci_power_state_is_valid(state)) {
- pr_warn("Invalid PSCI power state %#x\n", state);
- ret = -EINVAL;
- goto free_mem;
- }
- psci_states[i] = state;
+ pr_debug("psci-power-state %#x index %d\n", psci_states[i], i);
}
+
/* Idle states parsed correctly, initialize per-cpu pointer */
per_cpu(psci_power_state, cpu) = psci_states;
return 0;
--
2.17.1
If the hierarchical CPU topology is used, but the OS initiated mode isn't
supported, we rely solely on the regular cpuidle framework to manage the
idle state selection.
For this reason, introduce a new PSCI DT helper function,
psci_dt_pm_domains_parse_states(), which converts the hierarchically
described domain idle states, into regular flattened cpuidle states.
During the conversion, let's also insert the converted states into the
cpuidle driver's array of idle states, as to enables the cpuidle framework
to manage them.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch.
---
drivers/firmware/psci/psci.h | 2 +
drivers/firmware/psci/psci_pm_domain.c | 108 +++++++++++++++++++++++++
2 files changed, 110 insertions(+)
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index 8cf6d7206fab..05af462cc96e 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -13,6 +13,8 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state);
#ifdef CONFIG_CPU_IDLE
int psci_dt_init_pm_domains(struct device_node *np);
+int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, u32 *psci_states);
#else
static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
#endif
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
index d0dc38e96f85..6c9d6a644c7f 100644
--- a/drivers/firmware/psci/psci_pm_domain.c
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -14,6 +14,10 @@
#include <linux/pm_domain.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu_pm.h>
+
+#include <asm/cpuidle.h>
#include "psci.h"
@@ -97,6 +101,53 @@ static int psci_pd_parse_states(struct device_node *np,
return ret;
}
+static int psci_pd_enter_pc(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
+}
+
+static void psci_pd_enter_s2idle_pc(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ psci_pd_enter_pc(dev, drv, idx);
+}
+
+static void psci_pd_convert_states(struct cpuidle_state *idle_state,
+ u32 *psci_state, struct genpd_power_state *state)
+{
+ u32 *state_data = state->data;
+ u64 target_residency_us = state->residency_ns;
+ u64 exit_latency_us = state->power_on_latency_ns +
+ state->power_off_latency_ns;
+
+ *psci_state = *state_data;
+ do_div(target_residency_us, 1000);
+ idle_state->target_residency = target_residency_us;
+ do_div(exit_latency_us, 1000);
+ idle_state->exit_latency = exit_latency_us;
+ idle_state->enter = &psci_pd_enter_pc;
+ idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
+ idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+
+ strncpy(idle_state->name, to_of_node(state->fwnode)->name,
+ CPUIDLE_NAME_LEN - 1);
+ strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
+ CPUIDLE_NAME_LEN - 1);
+}
+
+static bool psci_pd_is_provider(struct device_node *np)
+{
+ struct psci_pd_provider *pd_prov, *it;
+
+ list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
+ if (pd_prov->node == np)
+ return true;
+ }
+
+ return false;
+}
+
static int psci_pd_init(struct device_node *np)
{
struct generic_pm_domain *pd;
@@ -259,4 +310,61 @@ int psci_dt_init_pm_domains(struct device_node *np)
pr_err("failed to create CPU PM domains ret=%d\n", ret);
return ret;
}
+
+int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, u32 *psci_states)
+{
+ struct genpd_power_state *pd_states;
+ struct of_phandle_args args;
+ int ret, pd_state_count, i, idx, psci_idx = drv->state_count - 2;
+ struct device_node *np = of_node_get(cpu_node);
+
+ /* Walk the CPU topology to find compatible domain idle states. */
+ while (np) {
+ ret = of_parse_phandle_with_args(np, "power-domains",
+ "#power-domain-cells", 0, &args);
+ of_node_put(np);
+ if (ret)
+ return 0;
+
+ np = args.np;
+
+ /* Verify that the node represents a psci pd provider. */
+ if (!psci_pd_is_provider(np)) {
+ of_node_put(np);
+ return 0;
+ }
+
+ /* Parse for compatible domain idle states. */
+ ret = psci_pd_parse_states(np, &pd_states, &pd_state_count);
+ if (ret) {
+ of_node_put(np);
+ return ret;
+ }
+
+ i = 0;
+ idx = drv->state_count;
+ while (i < pd_state_count && idx < CPUIDLE_STATE_MAX) {
+ psci_pd_convert_states(&drv->states[idx + i],
+ &psci_states[idx - 1 + i], &pd_states[i]);
+
+ /*
+ * In the hierarchical CPU topology the master PM domain
+ * idle state's DT property, "arm,psci-suspend-param",
+ * don't contain the bits for the idle state of the CPU.
+ * Take that into account here.
+ */
+ psci_states[idx - 1 + i] |= psci_states[psci_idx];
+ pr_debug("psci-power-state %#x index %d\n",
+ psci_states[idx - 1 + i], idx - 1 + i);
+
+ kfree(pd_states[i].data);
+ i++;
+ }
+ drv->state_count += i;
+ kfree(pd_states);
+ }
+
+ return 0;
+}
#endif
--
2.17.1
To enable the OS to manage last-man standing activities for a CPU, while an
idle state for a group of CPUs is selected, let's convert the Hikey
platform into using the hierarchical CPU topology layout.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch.
---
arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 ++++++++++++++++++++---
1 file changed, 76 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
index 97d5bf2c6ec5..fa5b385cfbc4 100644
--- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
@@ -20,6 +20,64 @@
psci {
compatible = "arm,psci-0.2";
method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD2: cpu-pd2 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD3: cpu-pd3 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD4: cpu-pd4 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD5: cpu-pd5 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD6: cpu-pd6 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD7: cpu-pd7 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CLUSTER_PD0: cluster-pd0 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_SLEEP>;
+ };
+
+ CLUSTER_PD1: cluster-pd1 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_SLEEP>;
+ };
};
cpus {
@@ -70,9 +128,8 @@
};
CLUSTER_SLEEP: cluster-sleep {
- compatible = "arm,idle-state";
- local-timer-stop;
- arm,psci-suspend-param = <0x1010000>;
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000000>;
entry-latency-us = <1000>;
exit-latency-us = <700>;
min-residency-us = <2700>;
@@ -88,9 +145,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
};
cpu1: cpu@1 {
@@ -101,9 +159,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
};
cpu2: cpu@2 {
@@ -114,9 +173,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD2>;
+ power-domain-names = "psci";
};
cpu3: cpu@3 {
@@ -127,9 +187,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD3>;
+ power-domain-names = "psci";
};
cpu4: cpu@100 {
@@ -140,9 +201,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD4>;
+ power-domain-names = "psci";
};
cpu5: cpu@101 {
@@ -153,9 +215,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD5>;
+ power-domain-names = "psci";
};
cpu6: cpu@102 {
@@ -166,9 +229,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD6>;
+ power-domain-names = "psci";
};
cpu7: cpu@103 {
@@ -179,9 +243,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD7>;
+ power-domain-names = "psci";
};
CLUSTER0_L2: l2-cache0 {
--
2.17.1
When the hierarchical CPU topology is used and when a CPU has been put
offline (hotplug), that same CPU prevents its PM domain and thus also
potential master PM domains, from being powered off. This is because genpd
observes the CPU's struct device to remain being active from a runtime PM
point of view.
To deal with this, let's decrease the runtime PM usage count by calling
pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
offline. Consequentially, we must then increase the runtime PM usage for
the CPU, while putting it online again.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Make it work when the hierarchical CPU topology is used, which may be
used both for OSI and PC mode.
- Rework the code to prevent "BUG: sleeping function called from
invalid context".
---
drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index b03bccce0a5d..f62c4963eb62 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -15,6 +15,7 @@
#include <linux/acpi.h>
#include <linux/arm-smccc.h>
+#include <linux/cpu.h>
#include <linux/cpuidle.h>
#include <linux/errno.h>
#include <linux/linkage.h>
@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
static int psci_cpu_off(u32 state)
{
+ struct device *dev;
int err;
u32 fn;
+ /*
+ * When the hierarchical CPU topology is used, decrease the runtime PM
+ * usage count for the current CPU, as to allow other parts in the
+ * topology to enter low power states.
+ */
+ if (psci_dt_topology) {
+ dev = get_cpu_device(smp_processor_id());
+ pm_runtime_put_sync_suspend(dev);
+ }
+
fn = psci_function_id[PSCI_FN_CPU_OFF];
err = invoke_psci_fn(fn, state, 0, 0);
return psci_to_linux_errno(err);
@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
{
+ struct device *dev;
int err;
u32 fn;
@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
err = invoke_psci_fn(fn, cpuid, entry_point, 0);
/* Clear the domain state to start fresh. */
psci_set_domain_state(0);
+
+ /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
+ if (!err && psci_dt_topology) {
+ dev = get_cpu_device(cpuid);
+ pm_runtime_get_sync(dev);
+ }
+
return psci_to_linux_errno(err);
}
--
2.17.1
From: Lina Iyer <[email protected]>
In the hierarchical layout, we are creating power domains around each CPU
and describes the idle states for them inside the power domain provider
node. Note that, the CPU's idle states still needs to be compatible with
"arm,idle-state".
Furthermore, represent the CPU cluster as a separate master power domain,
powering the CPU's power domains. The cluster node, contains the idle
states for the cluster and each idle state needs to be compatible with the
"domain-idle-state".
If the running platform is using a PSCI FW that supports the OS initiated
CPU suspend mode, which likely should be the case unless the PSCI FW is
very old, this change triggers the PSCI driver to enable it.
Cc: Andy Gross <[email protected]>
Cc: David Brown <[email protected]>
Cc: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Added power-domain-names property to the CPU nodes, as to avoid
possible future churns, if ever multiple power-domains specifiers.
---
arch/arm64/boot/dts/qcom/msm8916.dtsi | 57 +++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi b/arch/arm64/boot/dts/qcom/msm8916.dtsi
index d302d8d639a1..cfafce4bfdf0 100644
--- a/arch/arm64/boot/dts/qcom/msm8916.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi
@@ -110,10 +110,11 @@
reg = <0x0>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs 0>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
};
CPU1: cpu@1 {
@@ -122,10 +123,11 @@
reg = <0x1>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs 0>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
};
CPU2: cpu@2 {
@@ -134,10 +136,11 @@
reg = <0x2>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs 0>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD2>;
+ power-domain-names = "psci";
};
CPU3: cpu@3 {
@@ -146,10 +149,11 @@
reg = <0x3>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs 0>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD3>;
+ power-domain-names = "psci";
};
L2_0: l2-cache {
@@ -166,12 +170,57 @@
min-residency-us = <2000>;
local-timer-stop;
};
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000010>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-gdhs {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000030>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
};
};
psci {
compatible = "arm,psci-1.0";
method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD2: cpu-pd2 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD3: cpu-pd3 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CLUSTER_PD: cluster-pd {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
+ };
};
pmu {
--
2.17.1
As it's now perfectly possible that a PM domain managed by genpd contains
devices belonging to CPUs, we should start to take into account the
residency values for the idle states during the state selection process.
The residency value specifies the minimum duration of time, the CPU or a
group of CPUs, needs to spend in an idle state to not waste energy entering
it.
To deal with this, let's add a new genpd governor, pm_domain_cpu_gov, that
may be used for a PM domain that have CPU devices attached or if the CPUs
are attached through subdomains.
The new governor computes the minimum expected idle duration time for the
online CPUs being attached to the PM domain and its subdomains. Then in the
state selection process, trying the deepest state first, it verifies that
the idle duration time satisfies the state's residency value.
It should be noted that, when computing the minimum expected idle duration
time, we use the information from tick_nohz_get_next_wakeup(), to find the
next wakeup for the related CPUs. Future wise, this may deserve to be
improved, as there are more reasons to why a CPU may be woken up from idle.
Cc: Thomas Gleixner <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Lina Iyer <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Fold in patch that extended the new genpd CPU governor to cope with
QoS constraints, as to avoid confusion.
- Simplified the code according to suggestions from Rafael.
---
drivers/base/power/domain_governor.c | 61 +++++++++++++++++++++++++++-
include/linux/pm_domain.h | 3 ++
2 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index 99896fbf18e4..61a7c3c03c98 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -10,6 +10,9 @@
#include <linux/pm_domain.h>
#include <linux/pm_qos.h>
#include <linux/hrtimer.h>
+#include <linux/cpumask.h>
+#include <linux/ktime.h>
+#include <linux/tick.h>
static int dev_update_qos_constraint(struct device *dev, void *data)
{
@@ -211,8 +214,10 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
struct generic_pm_domain *genpd = pd_to_genpd(pd);
struct gpd_link *link;
- if (!genpd->max_off_time_changed)
+ if (!genpd->max_off_time_changed) {
+ genpd->state_idx = genpd->cached_power_down_state_idx;
return genpd->cached_power_down_ok;
+ }
/*
* We have to invalidate the cached results for the masters, so
@@ -237,6 +242,7 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
genpd->state_idx--;
}
+ genpd->cached_power_down_state_idx = genpd->state_idx;
return genpd->cached_power_down_ok;
}
@@ -245,6 +251,54 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
return false;
}
+static bool cpu_power_down_ok(struct dev_pm_domain *pd)
+{
+ struct generic_pm_domain *genpd = pd_to_genpd(pd);
+ ktime_t domain_wakeup, cpu_wakeup;
+ s64 idle_duration_ns;
+ int cpu, i;
+
+ /* Validate dev PM QoS constraints. */
+ if (!default_power_down_ok(pd))
+ return false;
+
+ if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
+ return true;
+
+ /*
+ * Find the next wakeup for any of the online CPUs within the PM domain
+ * and its subdomains. Note, we only need the genpd->cpus, as it already
+ * contains a mask of all CPUs from subdomains.
+ */
+ domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
+ for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
+ cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
+ if (ktime_before(cpu_wakeup, domain_wakeup))
+ domain_wakeup = cpu_wakeup;
+ }
+
+ /* The minimum idle duration is from now - until the next wakeup. */
+ idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, ktime_get()));
+ if (idle_duration_ns <= 0)
+ return false;
+
+ /*
+ * Find the deepest idle state that has its residency value satisfied
+ * and by also taking into account the power off latency for the state.
+ * Start at the state picked by the dev PM QoS constraint validation.
+ */
+ i = genpd->state_idx;
+ do {
+ if (idle_duration_ns >= (genpd->states[i].residency_ns +
+ genpd->states[i].power_off_latency_ns)) {
+ genpd->state_idx = i;
+ return true;
+ }
+ } while (--i >= 0);
+
+ return false;
+}
+
struct dev_power_governor simple_qos_governor = {
.suspend_ok = default_suspend_ok,
.power_down_ok = default_power_down_ok,
@@ -257,3 +311,8 @@ struct dev_power_governor pm_domain_always_on_gov = {
.power_down_ok = always_on_power_down_ok,
.suspend_ok = default_suspend_ok,
};
+
+struct dev_power_governor pm_domain_cpu_gov = {
+ .suspend_ok = default_suspend_ok,
+ .power_down_ok = cpu_power_down_ok,
+};
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 5a4673605d22..969a9b36c0db 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -116,6 +116,7 @@ struct generic_pm_domain {
s64 max_off_time_ns; /* Maximum allowed "suspended" time. */
bool max_off_time_changed;
bool cached_power_down_ok;
+ bool cached_power_down_state_idx;
int (*attach_dev)(struct generic_pm_domain *domain,
struct device *dev);
void (*detach_dev)(struct generic_pm_domain *domain,
@@ -195,6 +196,7 @@ int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);
extern struct dev_power_governor simple_qos_governor;
extern struct dev_power_governor pm_domain_always_on_gov;
+extern struct dev_power_governor pm_domain_cpu_gov;
#else
static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
@@ -238,6 +240,7 @@ static inline int dev_pm_genpd_set_performance_state(struct device *dev,
#define simple_qos_governor (*(struct dev_power_governor *)(NULL))
#define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL))
+#define pm_domain_cpu_gov (*(struct dev_power_governor *)(NULL))
#endif
#ifdef CONFIG_PM_GENERIC_DOMAINS_SLEEP
--
2.17.1
To be able to initiate the PM domain data structures, let's export a new
init function, psci_dt_topology_init() and make it call
psci_dt_init_pm_domains(). Following changes to the ARM64 code, invokes
this new init function.
At first glance, it may seem like feasible idea to hook into the existing
psci_dt_init() function, instead of adding yet another init function for
PSCI. However, this doesn't work because psci_dt_init() is called early in
the boot sequence, which means allocating dynamic data structures isn't yet
possible.
Moreover, additional following changes need to understand whether the
hierarchical PM domain topology has been successfully initialized,
therefore let's store the result from the initialization attempt into an
internal PSCI flag.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Simplified patch, by moving PSCI OSI related changes out into other
more appropriate patches.
- Add a flag to store the result of the PM domain initialization.
- Updated and clarified changelog.
---
drivers/firmware/psci/psci.c | 18 ++++++++++++++++++
include/linux/psci.h | 2 ++
2 files changed, 20 insertions(+)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 19af2093151b..5b481e91ccab 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -91,6 +91,7 @@ static u32 psci_function_id[PSCI_FN_MAX];
static DEFINE_PER_CPU(u32, domain_state);
static u32 psci_cpu_suspend_feature;
+static bool psci_dt_topology;
u32 psci_get_domain_state(void)
{
@@ -741,6 +742,23 @@ int __init psci_dt_init(void)
return ret;
}
+int __init psci_dt_topology_init(void)
+{
+ struct device_node *np;
+ int ret;
+
+ np = of_find_matching_node_and_match(NULL, psci_of_match, NULL);
+ if (!np)
+ return -ENODEV;
+
+ /* Initialize the CPU PM domains based on topology described in DT. */
+ ret = psci_dt_init_pm_domains(np);
+ psci_dt_topology = ret > 0;
+
+ of_node_put(np);
+ return ret;
+}
+
#ifdef CONFIG_ACPI
/*
* We use PSCI 0.2+ when ACPI is deployed on ARM64 and it's
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 4f29a3bff379..16beccccbbcc 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -55,8 +55,10 @@ extern struct psci_operations psci_ops;
#if defined(CONFIG_ARM_PSCI_FW)
int __init psci_dt_init(void);
+int __init psci_dt_topology_init(void);
#else
static inline int psci_dt_init(void) { return 0; }
+static inline int psci_dt_topology_init(void) { return 0; }
#endif
#if defined(CONFIG_ARM_PSCI_FW) && defined(CONFIG_ACPI)
--
2.17.1
When the hierarchical CPU topology layout is used in DT, we need to setup
the corresponding PM domain data structures, as to allow a CPU and a group
of CPUs to be power managed accordingly. Let's enable this by deploying
support through the genpd interface.
Additionally, when the OS initiated mode is supported by the PSCI FW, let's
also parse the domain idle states DT bindings as to make genpd responsible
for the state selection, when the states are compatible with
"domain-idle-state". Otherwise, when only Platform Coordinated mode is
supported, we rely solely on the state selection to be managed through the
regular cpuidle framework.
If the initialization of the PM domain data structures succeeds and the OS
initiated mode is supported, we try to switch to it. In case it fails,
let's fall back into a degraded mode, rather than bailing out and returning
an error code.
Due to that the OS initiated mode may become enabled, we need to adjust to
maintain backwards compatibility for a kernel started through a kexec call.
Do this by explicitly switch to Platform Coordinated mode during boot.
To try to initiate the PM domain data structures, the PSCI driver shall
call the new function, psci_dt_init_pm_domains(). However, this is done
from following changes.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in V10:
- Enable the PM domains to be used for both PC and OSI mode.
- Fixup error paths.
- Move the management of kexec started kernels into this patch.
- Rewrite changelog.
---
drivers/firmware/psci/Makefile | 2 +-
drivers/firmware/psci/psci.c | 7 +-
drivers/firmware/psci/psci.h | 6 +
drivers/firmware/psci/psci_pm_domain.c | 262 +++++++++++++++++++++++++
4 files changed, 275 insertions(+), 2 deletions(-)
create mode 100644 drivers/firmware/psci/psci_pm_domain.c
diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
index 1956b882470f..ff300f1fec86 100644
--- a/drivers/firmware/psci/Makefile
+++ b/drivers/firmware/psci/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
#
-obj-$(CONFIG_ARM_PSCI_FW) += psci.o
+obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 623591b541a4..19af2093151b 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -704,9 +704,14 @@ static int __init psci_1_0_init(struct device_node *np)
if (err)
return err;
- if (psci_has_osi_support())
+ if (psci_has_osi_support()) {
pr_info("OSI mode supported.\n");
+ /* Make sure we default to PC mode. */
+ invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
+ PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
+ }
+
return 0;
}
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index 7d9d38fd57e1..8cf6d7206fab 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -11,4 +11,10 @@ void psci_set_domain_state(u32 state);
bool psci_has_osi_support(void);
int psci_dt_parse_state_node(struct device_node *np, u32 *state);
+#ifdef CONFIG_CPU_IDLE
+int psci_dt_init_pm_domains(struct device_node *np);
+#else
+static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
+#endif
+
#endif /* __PSCI_H */
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
new file mode 100644
index 000000000000..d0dc38e96f85
--- /dev/null
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PM domains for CPUs via genpd - managed by PSCI.
+ *
+ * Copyright (C) 2018 Linaro Ltd.
+ * Author: Ulf Hansson <[email protected]>
+ *
+ */
+
+#define pr_fmt(fmt) "psci: " fmt
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/pm_domain.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "psci.h"
+
+#ifdef CONFIG_CPU_IDLE
+
+struct psci_pd_provider {
+ struct list_head link;
+ struct device_node *node;
+};
+
+static LIST_HEAD(psci_pd_providers);
+static bool osi_mode_enabled;
+
+static int psci_pd_power_off(struct generic_pm_domain *pd)
+{
+ struct genpd_power_state *state = &pd->states[pd->state_idx];
+ u32 *pd_state;
+ u32 composite_pd_state;
+
+ /* If we have failed to enable OSI mode, then abort power off. */
+ if (psci_has_osi_support() && !osi_mode_enabled)
+ return -EBUSY;
+
+ if (!state->data)
+ return 0;
+
+ /* When OSI mode is enabled, set the corresponding domain state. */
+ pd_state = state->data;
+ composite_pd_state = *pd_state | psci_get_domain_state();
+ psci_set_domain_state(composite_pd_state);
+
+ return 0;
+}
+
+static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
+ int state_count)
+{
+ int i, ret;
+ u32 psci_state, *psci_state_buf;
+
+ for (i = 0; i < state_count; i++) {
+ ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
+ &psci_state);
+ if (ret)
+ goto free_state;
+
+ psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
+ if (!psci_state_buf) {
+ ret = -ENOMEM;
+ goto free_state;
+ }
+ *psci_state_buf = psci_state;
+ states[i].data = psci_state_buf;
+ }
+
+ return 0;
+
+free_state:
+ while (i >= 0) {
+ kfree(states[i].data);
+ i--;
+ }
+ return ret;
+}
+
+static int psci_pd_parse_states(struct device_node *np,
+ struct genpd_power_state **states, int *state_count)
+{
+ int ret;
+
+ /* Parse the domain idle states. */
+ ret = of_genpd_parse_idle_states(np, states, state_count);
+ if (ret)
+ return ret;
+
+ /* Fill out the PSCI specifics for each found state. */
+ ret = psci_pd_parse_state_nodes(*states, *state_count);
+ if (ret)
+ kfree(*states);
+
+ return ret;
+}
+
+static int psci_pd_init(struct device_node *np)
+{
+ struct generic_pm_domain *pd;
+ struct psci_pd_provider *pd_provider;
+ struct dev_power_governor *pd_gov;
+ struct genpd_power_state *states = NULL;
+ int i, ret = -ENOMEM, state_count = 0;
+
+ pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+ if (!pd)
+ goto out;
+
+ pd_provider = kzalloc(sizeof(*pd_provider), GFP_KERNEL);
+ if (!pd_provider)
+ goto free_pd;
+
+ pd->name = kasprintf(GFP_KERNEL, "%pOF", np);
+ if (!pd->name)
+ goto free_pd_prov;
+
+ /*
+ * For OSI mode, parse the domain idle states and let genpd manage the
+ * state selection for those being compatible with "domain-idle-state".
+ */
+ if (psci_has_osi_support()) {
+ ret = psci_pd_parse_states(np, &states, &state_count);
+ if (ret)
+ goto free_name;
+ }
+
+ pd->name = kbasename(pd->name);
+ pd->power_off = psci_pd_power_off;
+ pd->states = states;
+ pd->state_count = state_count;
+ pd->flags |= GENPD_FLAG_IRQ_SAFE | GENPD_FLAG_CPU_DOMAIN;
+
+ /* Use governor for CPU PM domains if it has some states to manage. */
+ pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL;
+
+ ret = pm_genpd_init(pd, pd_gov, false);
+ if (ret)
+ goto free_state;
+
+ ret = of_genpd_add_provider_simple(np, pd);
+ if (ret)
+ goto remove_pd;
+
+ pd_provider->node = of_node_get(np);
+ list_add(&pd_provider->link, &psci_pd_providers);
+
+ pr_debug("init PM domain %s\n", pd->name);
+ return 0;
+
+remove_pd:
+ pm_genpd_remove(pd);
+free_state:
+ for (i = 0; i < state_count; i++)
+ kfree(states[i].data);
+ kfree(states);
+free_name:
+ kfree(pd->name);
+free_pd_prov:
+ kfree(pd_provider);
+free_pd:
+ kfree(pd);
+out:
+ pr_err("failed to init PM domain ret=%d %pOF\n", ret, np);
+ return ret;
+}
+
+static void psci_pd_remove(void)
+{
+ struct psci_pd_provider *pd_provider, *it;
+ struct generic_pm_domain *genpd;
+ int i;
+
+ list_for_each_entry_safe(pd_provider, it, &psci_pd_providers, link) {
+ of_genpd_del_provider(pd_provider->node);
+
+ genpd = of_genpd_remove_last(pd_provider->node);
+ if (!IS_ERR(genpd)) {
+ for (i = 0; i < genpd->state_count; i++)
+ kfree(genpd->states[i].data);
+ kfree(genpd->states);
+ kfree(genpd);
+ }
+
+ of_node_put(pd_provider->node);
+ list_del(&pd_provider->link);
+ kfree(pd_provider);
+ }
+}
+
+static int psci_pd_init_topology(struct device_node *np)
+{
+ struct device_node *node;
+ struct of_phandle_args child, parent;
+ int ret;
+
+ for_each_child_of_node(np, node) {
+ if (of_parse_phandle_with_args(node, "power-domains",
+ "#power-domain-cells", 0, &parent))
+ continue;
+
+ child.np = node;
+ child.args_count = 0;
+
+ ret = of_genpd_add_subdomain(&parent, &child);
+ of_node_put(parent.np);
+ if (ret) {
+ of_node_put(node);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+int psci_dt_init_pm_domains(struct device_node *np)
+{
+ struct device_node *node;
+ int ret, pd_count = 0;
+
+ /*
+ * Parse child nodes for the "#power-domain-cells" property and
+ * initialize a genpd/genpd-of-provider pair when it's found.
+ */
+ for_each_child_of_node(np, node) {
+ if (!of_find_property(node, "#power-domain-cells", NULL))
+ continue;
+
+ ret = psci_pd_init(node);
+ if (ret)
+ goto put_node;
+
+ pd_count++;
+ }
+
+ /* Bail out if not using the hierarchical CPU topology. */
+ if (!pd_count)
+ return 0;
+
+ /* Link genpd masters/subdomains to model the CPU topology. */
+ ret = psci_pd_init_topology(np);
+ if (ret)
+ goto remove_pd;
+
+ /* Try to enable OSI mode if supported. */
+ if (psci_has_osi_support())
+ osi_mode_enabled = psci_set_osi_mode();
+
+ pr_info("Initialized CPU PM domain topology\n");
+ return pd_count;
+
+put_node:
+ of_node_put(node);
+remove_pd:
+ if (pd_count)
+ psci_pd_remove();
+ pr_err("failed to create CPU PM domains ret=%d\n", ret);
+ return ret;
+}
+#endif
--
2.17.1
Following changes are about to implement support for PM domains to PSCI.
Those changes are mainly going to be implemented in a new separate file,
hence a couple of the internal PSCI functions needs to be shared to be
accessible. So, let's do that via adding new PSCI header file.
Moreover, the changes deploying support for PM domains, needs to be able to
switch the PSCI FW into the OS initiated mode. For that reason, let's add a
new function that deals with this and share it via the new PSCI header
file.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch. Re-places the earlier patch: "drivers: firmware: psci:
Share a few internal PSCI functions".
---
drivers/firmware/psci/psci.c | 28 +++++++++++++++++++++-------
drivers/firmware/psci/psci.h | 14 ++++++++++++++
2 files changed, 35 insertions(+), 7 deletions(-)
create mode 100644 drivers/firmware/psci/psci.h
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 8dbcdecc2ae4..623591b541a4 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -34,6 +34,8 @@
#include <asm/smp_plat.h>
#include <asm/suspend.h>
+#include "psci.h"
+
/*
* While a 64-bit OS can make calls with SMC32 calling conventions, for some
* calls it is necessary to use SMC64 to pass or return 64-bit values.
@@ -90,23 +92,35 @@ static u32 psci_function_id[PSCI_FN_MAX];
static DEFINE_PER_CPU(u32, domain_state);
static u32 psci_cpu_suspend_feature;
-static inline u32 psci_get_domain_state(void)
+u32 psci_get_domain_state(void)
{
return __this_cpu_read(domain_state);
}
-static inline void psci_set_domain_state(u32 state)
+void psci_set_domain_state(u32 state)
{
__this_cpu_write(domain_state, state);
}
+bool psci_set_osi_mode(void)
+{
+ int ret;
+
+ ret = invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
+ PSCI_1_0_SUSPEND_MODE_OSI, 0, 0);
+ if (ret)
+ pr_warn("failed to enable OSI mode: %d\n", ret);
+
+ return !ret;
+}
+
static inline bool psci_has_ext_power_state(void)
{
return psci_cpu_suspend_feature &
PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
}
-static inline bool psci_has_osi_support(void)
+bool psci_has_osi_support(void)
{
return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
}
@@ -285,10 +299,7 @@ static int __init psci_features(u32 psci_func_id)
psci_func_id, 0, 0);
}
-#ifdef CONFIG_CPU_IDLE
-static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
-
-static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
+int psci_dt_parse_state_node(struct device_node *np, u32 *state)
{
int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
@@ -305,6 +316,9 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
return 0;
}
+#ifdef CONFIG_CPU_IDLE
+static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+
static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
struct device_node *cpu_node, int cpu)
{
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
new file mode 100644
index 000000000000..7d9d38fd57e1
--- /dev/null
+++ b/drivers/firmware/psci/psci.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSCI_H
+#define __PSCI_H
+
+struct device_node;
+
+bool psci_set_osi_mode(void);
+u32 psci_get_domain_state(void);
+void psci_set_domain_state(u32 state);
+bool psci_has_osi_support(void);
+int psci_dt_parse_state_node(struct device_node *np, u32 *state);
+
+#endif /* __PSCI_H */
--
2.17.1
When the hierarchical CPU topology layout is used in DT, let's allow the
CPU to be power managed through its PM domain and via runtime PM. In other
words, let's deploy runtime PM support to the PSCI driver during idle
management of the CPU.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch: Replaces the earlier patch "kernel/cpu_pm: Manage runtime
PM in the idle path for CPUs". In the end it seemed like a better
idea to start with something specific to PSCI, rather than (ab?)using
the generic functions cpu_pm_enter|exit().
- Do runtime PM get/put for the deepest idle state for the CPU.
---
drivers/firmware/psci/psci.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 94cd0431b004..b03bccce0a5d 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -20,6 +20,7 @@
#include <linux/linkage.h>
#include <linux/of.h>
#include <linux/pm.h>
+#include <linux/pm_runtime.h>
#include <linux/printk.h>
#include <linux/psci.h>
#include <linux/reboot.h>
@@ -319,6 +320,7 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state)
#ifdef CONFIG_CPU_IDLE
static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+static DEFINE_PER_CPU_READ_MOSTLY(u32, psci_rpm_state_id);
static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
struct device_node *cpu_node, int cpu)
@@ -369,6 +371,9 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
ret = psci_dt_attach_cpu(cpu);
if (ret)
goto free_mem;
+
+ /* Store index of deepest state to later for runtime PM. */
+ per_cpu(psci_rpm_state_id, cpu) = drv->state_count - 1;
}
/* Idle states parsed correctly, initialize per-cpu pointer */
@@ -466,7 +471,9 @@ int psci_cpu_suspend_enter(unsigned long index)
{
int ret;
u32 *state = __this_cpu_read(psci_power_state);
- u32 composite_state = state[index - 1] | psci_get_domain_state();
+ u32 composite_state, rpm_state_id;
+ bool runtime_pm = false;
+ struct device *dev = NULL;
/*
* idle state index 0 corresponds to wfi, should never be called
@@ -475,11 +482,29 @@ int psci_cpu_suspend_enter(unsigned long index)
if (WARN_ON_ONCE(!index))
return -EINVAL;
+ /*
+ * Do runtime PM if we are using the hierarchical CPU toplogy, but only
+ * when cpuidle have selected the deepest idle state for the CPU.
+ */
+ if (psci_dt_topology) {
+ rpm_state_id = __this_cpu_read(psci_rpm_state_id);
+ runtime_pm = rpm_state_id == index;
+ if (runtime_pm) {
+ dev = get_cpu_device(smp_processor_id());
+ pm_runtime_put_sync_suspend(dev);
+ }
+ }
+
+ composite_state = state[index - 1] | psci_get_domain_state();
+
if (!psci_power_state_loses_context(composite_state))
ret = psci_ops.cpu_suspend(composite_state, 0);
else
ret = cpu_suspend(index, psci_suspend_finisher);
+ if (runtime_pm)
+ pm_runtime_get_sync(dev);
+
/* Clear the domain state to start fresh when back from idle. */
psci_set_domain_state(0);
--
2.17.1
To enable the OS initiated mode, the CPU topology needs to be described
using the hierarchical model in DT. When used, the idle state bits for the
CPU are created by ORing the bits for PM domain's idle state.
Let's prepare the PSCI driver to deal with this, via introducing a per CPU
variable called domain_state and by adding internal helpers to read/write
the value of the variable.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Use __this_cpu_read|write() rather than this_cpu_read|write().
---
drivers/firmware/psci/psci.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 4f0cbc95e41b..8dbcdecc2ae4 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -87,8 +87,19 @@ static u32 psci_function_id[PSCI_FN_MAX];
(PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
+static DEFINE_PER_CPU(u32, domain_state);
static u32 psci_cpu_suspend_feature;
+static inline u32 psci_get_domain_state(void)
+{
+ return __this_cpu_read(domain_state);
+}
+
+static inline void psci_set_domain_state(u32 state)
+{
+ __this_cpu_write(domain_state, state);
+}
+
static inline bool psci_has_ext_power_state(void)
{
return psci_cpu_suspend_feature &
@@ -187,6 +198,8 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
fn = psci_function_id[PSCI_FN_CPU_ON];
err = invoke_psci_fn(fn, cpuid, entry_point, 0);
+ /* Clear the domain state to start fresh. */
+ psci_set_domain_state(0);
return psci_to_linux_errno(err);
}
@@ -409,15 +422,17 @@ int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
static int psci_suspend_finisher(unsigned long index)
{
u32 *state = __this_cpu_read(psci_power_state);
+ u32 composite_state = state[index - 1] | psci_get_domain_state();
- return psci_ops.cpu_suspend(state[index - 1],
- __pa_symbol(cpu_resume));
+ return psci_ops.cpu_suspend(composite_state, __pa_symbol(cpu_resume));
}
int psci_cpu_suspend_enter(unsigned long index)
{
int ret;
u32 *state = __this_cpu_read(psci_power_state);
+ u32 composite_state = state[index - 1] | psci_get_domain_state();
+
/*
* idle state index 0 corresponds to wfi, should never be called
* from the cpu_suspend operations
@@ -425,11 +440,14 @@ int psci_cpu_suspend_enter(unsigned long index)
if (WARN_ON_ONCE(!index))
return -EINVAL;
- if (!psci_power_state_loses_context(state[index - 1]))
- ret = psci_ops.cpu_suspend(state[index - 1], 0);
+ if (!psci_power_state_loses_context(composite_state))
+ ret = psci_ops.cpu_suspend(composite_state, 0);
else
ret = cpu_suspend(index, psci_suspend_finisher);
+ /* Clear the domain state to start fresh when back from idle. */
+ psci_set_domain_state(0);
+
return ret;
}
--
2.17.1
Instead of having each psci init function taking care of the of_node_put(),
let's deal with that from psci_dt_init(), as this enables a bit simpler
error path for each psci init function.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
Changes in v10:
- None.
---
drivers/firmware/psci/psci.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 631e20720a22..6bfa47cbd174 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -609,9 +609,9 @@ static int __init psci_0_2_init(struct device_node *np)
int err;
err = get_set_conduit_method(np);
-
if (err)
- goto out_put_node;
+ return err;
+
/*
* Starting with v0.2, the PSCI specification introduced a call
* (PSCI_VERSION) that allows probing the firmware version, so
@@ -619,11 +619,7 @@ static int __init psci_0_2_init(struct device_node *np)
* can be carried out according to the specific version reported
* by firmware
*/
- err = psci_probe();
-
-out_put_node:
- of_node_put(np);
- return err;
+ return psci_probe();
}
/*
@@ -635,9 +631,8 @@ static int __init psci_0_1_init(struct device_node *np)
int err;
err = get_set_conduit_method(np);
-
if (err)
- goto out_put_node;
+ return err;
pr_info("Using PSCI v0.1 Function IDs from DT\n");
@@ -661,9 +656,7 @@ static int __init psci_0_1_init(struct device_node *np)
psci_ops.migrate = psci_migrate;
}
-out_put_node:
- of_node_put(np);
- return err;
+ return 0;
}
static const struct of_device_id psci_of_match[] __initconst = {
@@ -678,6 +671,7 @@ int __init psci_dt_init(void)
struct device_node *np;
const struct of_device_id *matched_np;
psci_initcall_t init_fn;
+ int ret;
np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np);
@@ -685,7 +679,10 @@ int __init psci_dt_init(void)
return -ENODEV;
init_fn = (psci_initcall_t)matched_np->data;
- return init_fn(np);
+ ret = init_fn(np);
+
+ of_node_put(np);
+ return ret;
}
#ifdef CONFIG_ACPI
--
2.17.1
Instead of iterating through all the state nodes in DT, to find out how
many states that needs to be allocated, let's use the number already known
by the cpuidle driver. In this way we can drop the iteration altogether.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch.
---
drivers/firmware/psci/psci.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index d50b46a0528f..cbfc936d251c 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -290,26 +290,20 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
struct device_node *cpu_node, int cpu)
{
- int i, ret = 0, count = 0;
+ int i, ret = 0, num_state_nodes = drv->state_count - 1;
u32 *psci_states;
struct device_node *state_node;
- /* Count idle states */
- while ((state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
- count))) {
- count++;
- of_node_put(state_node);
- }
-
- if (!count)
- return -ENODEV;
-
- psci_states = kcalloc(count, sizeof(*psci_states), GFP_KERNEL);
+ psci_states = kcalloc(num_state_nodes, sizeof(*psci_states),
+ GFP_KERNEL);
if (!psci_states)
return -ENOMEM;
- for (i = 0; i < count; i++) {
+ for (i = 0; i < num_state_nodes; i++) {
state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ if (!state_node)
+ break;
+
ret = psci_dt_parse_state_node(state_node, &psci_states[i]);
of_node_put(state_node);
@@ -319,6 +313,11 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
pr_debug("psci-power-state %#x index %d\n", psci_states[i], i);
}
+ if (i != num_state_nodes) {
+ ret = -ENODEV;
+ goto free_mem;
+ }
+
/* Idle states parsed correctly, initialize per-cpu pointer */
per_cpu(psci_power_state, cpu) = psci_states;
return 0;
--
2.17.1
From: Lina Iyer <[email protected]>
Currently CPU's idle states are represented in a flattened model, via the
"cpu-idle-states" binding from within the CPU's device nodes.
Support the hierarchical layout, simply by converting to calling the new OF
helper, of_get_cpu_state_node().
Cc: Lina Iyer <[email protected]>
Suggested-by: Sudeep Holla <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
drivers/firmware/psci/psci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index cbfc936d251c..631e20720a22 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -300,7 +300,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
return -ENOMEM;
for (i = 0; i < num_state_nodes; i++) {
- state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ state_node = of_get_cpu_state_node(cpu_node, i);
if (!state_node)
break;
--
2.17.1
In order to allow the CPU to be power managed through a potential PM domain
and the corresponding topology, it needs to be attached to it. For that
reason, check if the PM domain data structures have been initiated for PSCI
and if so, let's try to attach the CPU device to its PM domain.
However, before attaching the CPU to its PM domain, we need to check
whether the PSCI firmware supports OS initiated mode or not. If that isn't
the case, we rely solely on the cpuidle framework to deal with the idle
state selection, which means we need to parse DT and convert the
hierarchical described domain idle states into regular cpuidle states,
hence let's do that.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Attach CPU devices to their PM domains, regardless of OSI mode or PC
mode.
- For PC mode, convert domain idle states to generic cpuidle states
to let the cpuidle driver manage them.
---
drivers/firmware/psci/psci.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 5b481e91ccab..94cd0431b004 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -327,7 +327,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
u32 *psci_states;
struct device_node *state_node;
- psci_states = kcalloc(num_state_nodes, sizeof(*psci_states),
+ psci_states = kcalloc(CPUIDLE_STATE_MAX, sizeof(*psci_states),
GFP_KERNEL);
if (!psci_states)
return -ENOMEM;
@@ -351,6 +351,26 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
goto free_mem;
}
+ /*
+ * If the hierarchical CPU topology is used, let's attach the CPU device
+ * to its corresponding PM domain. If OSI mode isn't supported, pick up
+ * the additional cpuidle states, from the domain idle states described
+ * in the hierarchical DT layout, as to enable the cpuidle driver to
+ * manage them.
+ */
+ if (psci_dt_topology) {
+ if (!psci_has_osi_support()) {
+ ret = psci_dt_pm_domains_parse_states(drv, cpu_node,
+ psci_states);
+ if (ret)
+ goto free_mem;
+ }
+
+ ret = psci_dt_attach_cpu(cpu);
+ if (ret)
+ goto free_mem;
+ }
+
/* Idle states parsed correctly, initialize per-cpu pointer */
per_cpu(psci_power_state, cpu) = psci_states;
return 0;
--
2.17.1
Introduce a new PSCI DT helper function, psci_dt_attach_cpu(), which takes
a CPU number as an in-parameter and attaches the CPU's struct device to its
corresponding PM domain. Additionally, the helper prepares the CPU to be
power managed via runtime PM, which is the last step needed to enable the
interaction with the PM domain through the runtime PM callbacks.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch: Replaces "PM / Domains: Add helper functions to
attach/detach CPUs to/from genpd".
---
drivers/firmware/psci/psci.h | 1 +
drivers/firmware/psci/psci_pm_domain.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index 05af462cc96e..fbc9980dee69 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -15,6 +15,7 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state);
int psci_dt_init_pm_domains(struct device_node *np);
int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
struct device_node *cpu_node, u32 *psci_states);
+int psci_dt_attach_cpu(int cpu);
#else
static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
#endif
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
index 6c9d6a644c7f..b0fa7da8a0ce 100644
--- a/drivers/firmware/psci/psci_pm_domain.c
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -12,8 +12,10 @@
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/cpu.h>
#include <linux/cpuidle.h>
#include <linux/cpu_pm.h>
@@ -367,4 +369,21 @@ int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
return 0;
}
+
+int psci_dt_attach_cpu(int cpu)
+{
+ struct device *dev = get_cpu_device(cpu);
+ int ret;
+
+ ret = dev_pm_domain_attach(dev, true);
+ if (ret)
+ return ret;
+
+ pm_runtime_irq_safe(dev);
+ pm_runtime_get_noresume(dev);
+ pm_runtime_set_active(dev);
+ pm_runtime_enable(dev);
+
+ return 0;
+}
#endif
--
2.17.1
From: Lina Iyer <[email protected]>
Update DT bindings to represent hierarchical CPU and CPU PM domain idle
states for PSCI. Also update the PSCI examples to clearly show how
flattened and hierarchical idle states can be represented in DT.
Cc: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Sudeep Holla <[email protected]>
---
Change in V10:
- Clarified that the new hierarchical representation is orthogonal to
OS-initiated vs platform-coordinated PSCI CPU suspend mode.
- Clarified the representation for "arm,psci-suspend-param" in regards
to the flattened vs hierarchical model.
- Added power-domain-names property to the CPU nodes, as to avoid
future churns, if ever multiple power-domains specifiers.
---
.../devicetree/bindings/arm/psci.txt | 166 ++++++++++++++++++
1 file changed, 166 insertions(+)
diff --git a/Documentation/devicetree/bindings/arm/psci.txt b/Documentation/devicetree/bindings/arm/psci.txt
index a2c4f1d52492..e6d3553c8df8 100644
--- a/Documentation/devicetree/bindings/arm/psci.txt
+++ b/Documentation/devicetree/bindings/arm/psci.txt
@@ -105,7 +105,173 @@ Case 3: PSCI v0.2 and PSCI v0.1.
...
};
+ARM systems can have multiple cores sometimes in hierarchical arrangement.
+This often, but not always, maps directly to the processor power topology of
+the system. Individual nodes in a topology have their own specific power states
+and can be better represented in DT hierarchically.
+
+For these cases, the definitions of the idle states for the CPUs and the CPU
+topology, must conform to the domain idle state specification [3]. The domain
+idle states themselves, must be compatible with the defined 'domain-idle-state'
+binding [1], and also need to specify the arm,psci-suspend-param property for
+each idle state.
+
+DT allows representing CPUs and CPU idle states in two different ways -
+
+The flattened model as given in Example 1, lists CPU's idle states followed by
+the domain idle state that the CPUs may choose. Note that the idle states are
+all compatible with "arm,idle-state". Additionally, for the domain idle state
+the "arm,psci-suspend-param" represents a superset of the CPU's idle state.
+
+Example 2 represents the hierarchical model of CPUs and domain idle states.
+CPUs define their domain provider in their psci DT node. The domain controls
+the power to the CPU and possibly other h/w blocks that would enter an idle
+state along with the CPU. The CPU's idle states may therefore be considered as
+the domain's idle states and have the compatible "arm,idle-state". Such domains
+may also be embedded within another domain that may represent common h/w blocks
+between these CPUs. The idle states of the CPU topology shall be represented as
+the domain's idle states. Note that for the domain idle state, the
+"arm,psci-suspend-param" represents idle states hierarchically.
+
+In PSCI firmware v1.0, the OS-Initiated mode is introduced. However, the
+flattened vs hierarchical DT representation is orthogonal to the OS-Initiated
+vs the platform-coordinated PSCI CPU suspend modes, thus should be considered
+independent of each other.
+
+The hierarchical representation helps and makes it easy to implement OSI mode
+and OS implementations may choose to mandate it. For the default platform-
+coordinated mode, both representations are viable options.
+
+Example 1: Flattened representation of CPU and domain idle states
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a53", "arm,armv8";
+ reg = <0x0>;
+ enable-method = "psci";
+ cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
+ <&CLUSTER_PWRDN>;
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57", "arm,armv8";
+ reg = <0x100>;
+ enable-method = "psci";
+ cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
+ <&CLUSTER_PWRDN>;
+ };
+
+ idle-states {
+ CPU_PWRDN: cpu-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x0000001>;
+ entry-latency-us = <10>;
+ exit-latency-us = <10>;
+ min-residency-us = <100>;
+ };
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x1000011>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x1000031>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
+ };
+
+ psci {
+ compatible = "arm,psci-0.2";
+ method = "smc";
+ };
+
+Example 2: Hierarchical representation of CPU and domain idle states
+
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a53", "arm,armv8";
+ reg = <0x0>;
+ enable-method = "psci";
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57", "arm,armv8";
+ reg = <0x100>;
+ enable-method = "psci";
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
+ };
+
+ idle-states {
+ CPU_PWRDN: cpu-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x0000001>;
+ entry-latency-us = <10>;
+ exit-latency-us = <10>;
+ min-residency-us = <100>;
+ };
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000010>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-power-down {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000030>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
+ };
+ };
+
+ psci {
+ compatible = "arm,psci-1.0";
+ method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CPU_PWRDN>;
+ power-domains = <&CLUSTER_PD>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CPU_PWRDN>;
+ power-domains = <&CLUSTER_PD>;
+ };
+
+ CLUSTER_PD: cluster-pd {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
+ };
+ };
+
[1] Kernel documentation - ARM idle states bindings
Documentation/devicetree/bindings/arm/idle-states.txt
[2] Power State Coordination Interface (PSCI) specification
http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf
+[3]. PM Domains description
+ Documentation/devicetree/bindings/power/power_domain.txt
--
2.17.1
Let's add a data pointer to the genpd_power_state struct, to allow a genpd
backend driver to store per state specific data. In order to introduce the
pointer, we also need to adopt how genpd frees the allocated data for the
default genpd_power_state struct, that it may allocate at pm_genpd_init().
More precisely, let's use an internal genpd flag to understand when the
states needs to be freed by genpd. When freeing the states data in
genpd_remove(), let's also clear the corresponding genpd->states pointer
and reset the genpd->state_count. In this way, a genpd backend driver
becomes aware of when there is state specific data for it to free.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Update the patch allow backend drivers to free the states specific
data during genpd removal. Due to this added complexity, I decided to
keep the patch separate, rather than fold it into the patch that makes
use of the new void pointer, which was suggested by Rafael.
- Claim authorship of the patch as lots of changes has been done since
the original pick up from Lina Iyer.
---
drivers/base/power/domain.c | 8 ++++++--
include/linux/pm_domain.h | 3 ++-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 7f38a92b444a..e27b91d36a2a 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1620,7 +1620,7 @@ static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
genpd->states = state;
genpd->state_count = 1;
- genpd->free = state;
+ genpd->free_state = true;
return 0;
}
@@ -1736,7 +1736,11 @@ static int genpd_remove(struct generic_pm_domain *genpd)
list_del(&genpd->gpd_list_node);
genpd_unlock(genpd);
cancel_work_sync(&genpd->power_off_work);
- kfree(genpd->free);
+ if (genpd->free_state) {
+ kfree(genpd->states);
+ genpd->states = NULL;
+ genpd->state_count = 0;
+ }
pr_debug("%s: removed %s\n", __func__, genpd->name);
return 0;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 3b5d7280e52e..f9e09bd4152c 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -69,6 +69,7 @@ struct genpd_power_state {
s64 residency_ns;
struct fwnode_handle *fwnode;
ktime_t idle_time;
+ void *data;
};
struct genpd_lock_ops;
@@ -110,7 +111,7 @@ struct generic_pm_domain {
struct genpd_power_state *states;
unsigned int state_count; /* number of states */
unsigned int state_idx; /* state that genpd will go to when off */
- void *free; /* Free the state that was allocated for default */
+ bool free_state; /* Free the state that was allocated for default */
ktime_t on_time;
ktime_t accounting_time;
const struct genpd_lock_ops *lock_ops;
--
2.17.1
PSCI firmware v1.0+, supports two different modes for CPU_SUSPEND. The
Platform Coordinated mode, which is the default and mandatory mode, while
support for the OS initiated mode is optional.
This change introduces initial support for the OS initiated mode, in way
that it adds the related PSCI bits from the spec and prints a message in
the log to inform whether the mode is supported by the PSCI FW.
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
drivers/firmware/psci/psci.c | 21 ++++++++++++++++++++-
include/uapi/linux/psci.h | 5 +++++
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 6bfa47cbd174..4f0cbc95e41b 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -95,6 +95,11 @@ static inline bool psci_has_ext_power_state(void)
PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
}
+static inline bool psci_has_osi_support(void)
+{
+ return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
+}
+
static inline bool psci_power_state_loses_context(u32 state)
{
const u32 mask = psci_has_ext_power_state() ?
@@ -659,10 +664,24 @@ static int __init psci_0_1_init(struct device_node *np)
return 0;
}
+static int __init psci_1_0_init(struct device_node *np)
+{
+ int err;
+
+ err = psci_0_2_init(np);
+ if (err)
+ return err;
+
+ if (psci_has_osi_support())
+ pr_info("OSI mode supported.\n");
+
+ return 0;
+}
+
static const struct of_device_id psci_of_match[] __initconst = {
{ .compatible = "arm,psci", .data = psci_0_1_init},
{ .compatible = "arm,psci-0.2", .data = psci_0_2_init},
- { .compatible = "arm,psci-1.0", .data = psci_0_2_init},
+ { .compatible = "arm,psci-1.0", .data = psci_1_0_init},
{},
};
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index b3bcabe380da..581f72085c33 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -49,6 +49,7 @@
#define PSCI_1_0_FN_PSCI_FEATURES PSCI_0_2_FN(10)
#define PSCI_1_0_FN_SYSTEM_SUSPEND PSCI_0_2_FN(14)
+#define PSCI_1_0_FN_SET_SUSPEND_MODE PSCI_0_2_FN(15)
#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
@@ -97,6 +98,10 @@
#define PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK \
(0x1 << PSCI_1_0_FEATURES_CPU_SUSPEND_PF_SHIFT)
+#define PSCI_1_0_OS_INITIATED BIT(0)
+#define PSCI_1_0_SUSPEND_MODE_PC 0
+#define PSCI_1_0_SUSPEND_MODE_OSI 1
+
/* PSCI return values (inclusive of all PSCI versions) */
#define PSCI_RET_SUCCESS 0
#define PSCI_RET_NOT_SUPPORTED -1
--
2.17.1
The files for the PSCI firmware driver were moved to a sub-directory, let's
update MAINTAINERS to reflect that.
Suggested-by: Mark Rutland <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 380e43f585d3..9805444711ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11919,7 +11919,7 @@ M: Mark Rutland <[email protected]>
M: Lorenzo Pieralisi <[email protected]>
L: [email protected]
S: Maintained
-F: drivers/firmware/psci*.c
+F: drivers/firmware/psci/
F: include/linux/psci.h
F: include/uapi/linux/psci.h
--
2.17.1
From: Lina Iyer <[email protected]>
Knowing the sleep duration of CPUs, is known to be needed while selecting
the most energy efficient idle state for a CPU or a group of CPUs.
However, to be able to compute the sleep duration, we need to know at what
time the next expected wakeup is for the CPU. Therefore, let's export this
information via a new function, tick_nohz_get_next_wakeup(). Following
changes make use of it.
Cc: Thomas Gleixner <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Lina Iyer <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- Updated function header of tick_nohz_get_next_wakeup().
---
include/linux/tick.h | 8 ++++++++
kernel/time/tick-sched.c | 13 +++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 55388ab45fd4..e48f6b26b425 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void);
extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
extern unsigned long tick_nohz_get_idle_calls(void);
extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
+extern ktime_t tick_nohz_get_next_wakeup(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
@@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
*delta_next = TICK_NSEC;
return *delta_next;
}
+
+static inline ktime_t tick_nohz_get_next_wakeup(int cpu)
+{
+ /* Next wake up is the tick period, assume it starts now */
+ return ktime_add(ktime_get(), TICK_NSEC);
+}
+
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 69e673b88474..7a9166506503 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void)
return ts->idle_calls;
}
+/**
+ * tick_nohz_get_next_wakeup - return the next wake up of the CPU
+ * @cpu: the particular CPU to get next wake up for
+ *
+ * Called for idle CPUs only.
+ */
+ktime_t tick_nohz_get_next_wakeup(int cpu)
+{
+ struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
+
+ return dev->next_event;
+}
+
static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
--
2.17.1
To allow arch back-end init ops to operate on the cpuidle driver for the
corresponding CPU, let's pass along a pointer to the struct cpuidle_driver*
and forward it the relevant layers of callbacks for ARM/ARM64.
Following changes for the PSCI firmware driver starts making use of this.
Cc: Lina Iyer <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andy Gross <[email protected]>
Cc: David Brown <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- New patch!
I am seeking an ack from relevant maintainers. Please ping me if you need
further information about the hole series.
Thanks!
Ulf Hanssson
---
arch/arm/include/asm/cpuidle.h | 4 ++--
arch/arm/kernel/cpuidle.c | 5 +++--
arch/arm64/include/asm/cpu_ops.h | 4 +++-
arch/arm64/include/asm/cpuidle.h | 6 ++++--
arch/arm64/kernel/cpuidle.c | 6 +++---
drivers/cpuidle/cpuidle-arm.c | 2 +-
drivers/firmware/psci.c | 7 ++++---
drivers/soc/qcom/spm.c | 3 ++-
include/linux/psci.h | 4 +++-
9 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 6b2ff7243b4b..bee0a7847733 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -32,7 +32,7 @@ struct device_node;
struct cpuidle_ops {
int (*suspend)(unsigned long arg);
- int (*init)(struct device_node *, int cpu);
+ int (*init)(struct cpuidle_driver *, struct device_node *, int cpu);
};
struct of_cpuidle_method {
@@ -47,6 +47,6 @@ struct of_cpuidle_method {
extern int arm_cpuidle_suspend(int index);
-extern int arm_cpuidle_init(int cpu);
+extern int arm_cpuidle_init(struct cpuidle_driver *drv, int cpu);
#endif
diff --git a/arch/arm/kernel/cpuidle.c b/arch/arm/kernel/cpuidle.c
index fda5579123a8..43778c9b373d 100644
--- a/arch/arm/kernel/cpuidle.c
+++ b/arch/arm/kernel/cpuidle.c
@@ -122,6 +122,7 @@ static int __init arm_cpuidle_read_ops(struct device_node *dn, int cpu)
/**
* arm_cpuidle_init() - Initialize cpuidle_ops for a specific cpu
+ * @drv: the drv to be initialized
* @cpu: the cpu to be initialized
*
* Initialize the cpuidle ops with the device for the cpu and then call
@@ -137,7 +138,7 @@ static int __init arm_cpuidle_read_ops(struct device_node *dn, int cpu)
* -ENXIO if the HW reports a failure or a misconfiguration,
* -ENOMEM if the HW report an memory allocation failure
*/
-int __init arm_cpuidle_init(int cpu)
+int __init arm_cpuidle_init(struct cpuidle_driver *drv, int cpu)
{
struct device_node *cpu_node = of_cpu_device_node_get(cpu);
int ret;
@@ -147,7 +148,7 @@ int __init arm_cpuidle_init(int cpu)
ret = arm_cpuidle_read_ops(cpu_node, cpu);
if (!ret)
- ret = cpuidle_ops[cpu].init(cpu_node, cpu);
+ ret = cpuidle_ops[cpu].init(drv, cpu_node, cpu);
of_node_put(cpu_node);
diff --git a/arch/arm64/include/asm/cpu_ops.h b/arch/arm64/include/asm/cpu_ops.h
index 8f03446cf89f..8db870c29f1b 100644
--- a/arch/arm64/include/asm/cpu_ops.h
+++ b/arch/arm64/include/asm/cpu_ops.h
@@ -19,6 +19,8 @@
#include <linux/init.h>
#include <linux/threads.h>
+struct cpuidle_driver;
+
/**
* struct cpu_operations - Callback operations for hotplugging CPUs.
*
@@ -58,7 +60,7 @@ struct cpu_operations {
int (*cpu_kill)(unsigned int cpu);
#endif
#ifdef CONFIG_CPU_IDLE
- int (*cpu_init_idle)(unsigned int);
+ int (*cpu_init_idle)(struct cpuidle_driver *, unsigned int);
int (*cpu_suspend)(unsigned long);
#endif
};
diff --git a/arch/arm64/include/asm/cpuidle.h b/arch/arm64/include/asm/cpuidle.h
index 3c5ddb429ea2..3fd3efb61649 100644
--- a/arch/arm64/include/asm/cpuidle.h
+++ b/arch/arm64/include/asm/cpuidle.h
@@ -4,11 +4,13 @@
#include <asm/proc-fns.h>
+struct cpuidle_driver;
+
#ifdef CONFIG_CPU_IDLE
-extern int arm_cpuidle_init(unsigned int cpu);
+extern int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu);
extern int arm_cpuidle_suspend(int index);
#else
-static inline int arm_cpuidle_init(unsigned int cpu)
+static inline int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu)
{
return -EOPNOTSUPP;
}
diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index f2d13810daa8..aaf9dc5cb87a 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -18,13 +18,13 @@
#include <asm/cpuidle.h>
#include <asm/cpu_ops.h>
-int arm_cpuidle_init(unsigned int cpu)
+int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu)
{
int ret = -EOPNOTSUPP;
if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend &&
cpu_ops[cpu]->cpu_init_idle)
- ret = cpu_ops[cpu]->cpu_init_idle(cpu);
+ ret = cpu_ops[cpu]->cpu_init_idle(drv, cpu);
return ret;
}
@@ -51,7 +51,7 @@ int arm_cpuidle_suspend(int index)
int acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
- return arm_cpuidle_init(cpu);
+ return arm_cpuidle_init(NULL, cpu);
}
int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c
index 3a407a3ef22b..39413973b21d 100644
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -106,7 +106,7 @@ static int __init arm_idle_init_cpu(int cpu)
* Call arch CPU operations in order to initialize
* idle states suspend back-end specific data
*/
- ret = arm_cpuidle_init(cpu);
+ ret = arm_cpuidle_init(drv, cpu);
/*
* Allow the initialization to continue for other CPUs, if the reported
diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index c80ec1d03274..878c4dcf0118 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -270,7 +270,8 @@ static int __init psci_features(u32 psci_func_id)
#ifdef CONFIG_CPU_IDLE
static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
-static int psci_dt_cpu_init_idle(struct device_node *cpu_node, int cpu)
+static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, int cpu)
{
int i, ret, count = 0;
u32 *psci_states;
@@ -371,7 +372,7 @@ static int __maybe_unused psci_acpi_cpu_init_idle(unsigned int cpu)
}
#endif
-int psci_cpu_init_idle(unsigned int cpu)
+int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
{
struct device_node *cpu_node;
int ret;
@@ -390,7 +391,7 @@ int psci_cpu_init_idle(unsigned int cpu)
if (!cpu_node)
return -ENODEV;
- ret = psci_dt_cpu_init_idle(cpu_node, cpu);
+ ret = psci_dt_cpu_init_idle(drv, cpu_node, cpu);
of_node_put(cpu_node);
diff --git a/drivers/soc/qcom/spm.c b/drivers/soc/qcom/spm.c
index 53807e839664..6e967f0a8608 100644
--- a/drivers/soc/qcom/spm.c
+++ b/drivers/soc/qcom/spm.c
@@ -208,7 +208,8 @@ static const struct of_device_id qcom_idle_state_match[] __initconst = {
{ },
};
-static int __init qcom_cpuidle_init(struct device_node *cpu_node, int cpu)
+static int __init qcom_cpuidle_init(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, int cpu)
{
const struct of_device_id *match_id;
struct device_node *state_node;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 8b1b3b5935ab..4f29a3bff379 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -20,9 +20,11 @@
#define PSCI_POWER_STATE_TYPE_STANDBY 0
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1
+struct cpuidle_driver;
+
bool psci_tos_resident_on(int cpu);
-int psci_cpu_init_idle(unsigned int cpu);
+int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu);
int psci_cpu_suspend_enter(unsigned long index);
enum psci_conduit {
--
2.17.1
The CPU's idle state nodes are currently parsed at the common cpuidle DT
library, but also when initializing back-end data for the arch specific CPU
operations, as in the PSCI driver case.
To avoid open-coding, let's introduce of_get_cpu_state_node(), which takes
the device node for the CPU and the index to the requested idle state node,
as in-parameters. In case a corresponding idle state node is found, it
returns the node with the refcount incremented for it, else it returns
NULL.
Moreover, for ARM, there are two generic methods, to describe the CPU's
idle states, either via the flattened description through the
"cpu-idle-states" binding [1] or via the hierarchical layout, using the
"power-domains" and the "domain-idle-states" bindings [2]. Hence, let's
take both options into account.
[1]
Documentation/devicetree/bindings/arm/idle-states.txt
[2]
Documentation/devicetree/bindings/arm/psci.txt
Cc: Rob Herring <[email protected]>
Cc: [email protected]
Cc: Lina Iyer <[email protected]>
Suggested-by: Sudeep Holla <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
---
Changes in v10:
- None.
---
drivers/of/base.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/of.h | 8 ++++++++
2 files changed, 43 insertions(+)
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 09692c9b32a7..8f6974a22006 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -429,6 +429,41 @@ int of_cpu_node_to_id(struct device_node *cpu_node)
}
EXPORT_SYMBOL(of_cpu_node_to_id);
+/**
+ * of_get_cpu_state_node - Get CPU's idle state node at the given index
+ *
+ * @cpu_node: The device node for the CPU
+ * @index: The index in the list of the idle states
+ *
+ * Two generic methods can be used to describe a CPU's idle states, either via
+ * a flattened description through the "cpu-idle-states" binding or via the
+ * hierarchical layout, using the "power-domains" and the "domain-idle-states"
+ * bindings. This function check for both and returns the idle state node for
+ * the requested index.
+ *
+ * In case and idle state node is found at index, the refcount incremented for
+ * it, so call of_node_put() on it when done. Returns NULL if not found.
+ */
+struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index)
+{
+ struct of_phandle_args args;
+ int err;
+
+ err = of_parse_phandle_with_args(cpu_node, "power-domains",
+ "#power-domain-cells", 0, &args);
+ if (!err) {
+ struct device_node *state_node =
+ of_parse_phandle(args.np, "domain-idle-states", index);
+
+ of_node_put(args.np);
+ return state_node;
+ }
+
+ return of_parse_phandle(cpu_node, "cpu-idle-states", index);
+}
+EXPORT_SYMBOL(of_get_cpu_state_node);
+
/**
* __of_device_is_compatible() - Check if the node matches given constraints
* @device: pointer to node
diff --git a/include/linux/of.h b/include/linux/of.h
index a5aee3c438ad..f9f0c65c095c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -348,6 +348,8 @@ extern const void *of_get_property(const struct device_node *node,
int *lenp);
extern struct device_node *of_get_cpu_node(int cpu, unsigned int *thread);
extern struct device_node *of_get_next_cpu_node(struct device_node *prev);
+extern struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index);
#define for_each_property_of_node(dn, pp) \
for (pp = dn->properties; pp != NULL; pp = pp->next)
@@ -762,6 +764,12 @@ static inline struct device_node *of_get_next_cpu_node(struct device_node *prev)
return NULL;
}
+static inline struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index)
+{
+ return NULL;
+}
+
static inline int of_n_addr_cells(struct device_node *np)
{
return 0;
--
2.17.1
To let the PSCI driver parse for the hierarchical CPU topology in DT and
thus potentially initiate the corresponding PM domain data structures,
let's call psci_dt_topology_init() from the existing topology_init()
subsys_initcall.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Lina Iyer <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
arch/arm64/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f4fc1e0544b7..4d59a72f8b0b 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -364,6 +364,9 @@ static int __init topology_init(void)
{
int i;
+ if (acpi_disabled)
+ psci_dt_topology_init();
+
for_each_online_node(i)
register_one_node(i);
--
2.17.1
Some following changes extends the PSCI driver with some additional new
files. Let's avoid to continue cluttering the toplevel firmware directory
and first move the PSCI files into a PSCI sub-directory.
Suggested-by: Mark Rutland <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes in v10:
- None.
---
drivers/firmware/Kconfig | 15 +--------------
drivers/firmware/Makefile | 3 +--
drivers/firmware/psci/Kconfig | 13 +++++++++++++
drivers/firmware/psci/Makefile | 4 ++++
drivers/firmware/{ => psci}/psci.c | 0
drivers/firmware/{ => psci}/psci_checker.c | 0
6 files changed, 19 insertions(+), 16 deletions(-)
create mode 100644 drivers/firmware/psci/Kconfig
create mode 100644 drivers/firmware/psci/Makefile
rename drivers/firmware/{ => psci}/psci.c (100%)
rename drivers/firmware/{ => psci}/psci_checker.c (100%)
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 7273e5082b41..0400afb2fec7 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -5,20 +5,6 @@
menu "Firmware Drivers"
-config ARM_PSCI_FW
- bool
-
-config ARM_PSCI_CHECKER
- bool "ARM PSCI checker"
- depends on ARM_PSCI_FW && HOTPLUG_CPU && CPU_IDLE && !TORTURE_TEST
- help
- Run the PSCI checker during startup. This checks that hotplug and
- suspend operations work correctly when using PSCI.
-
- The torture tests may interfere with the PSCI checker by turning CPUs
- on and off through hotplug, so for now torture tests and PSCI checker
- are mutually exclusive.
-
config ARM_SCMI_PROTOCOL
bool "ARM System Control and Management Interface (SCMI) Message Protocol"
depends on ARM || ARM64 || COMPILE_TEST
@@ -258,6 +244,7 @@ config TI_SCI_PROTOCOL
config HAVE_ARM_SMCCC
bool
+source "drivers/firmware/psci/Kconfig"
source "drivers/firmware/broadcom/Kconfig"
source "drivers/firmware/google/Kconfig"
source "drivers/firmware/efi/Kconfig"
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 3158dffd9914..6670ebe21463 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -2,8 +2,6 @@
#
# Makefile for the linux kernel.
#
-obj-$(CONFIG_ARM_PSCI_FW) += psci.o
-obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
obj-$(CONFIG_ARM_SCPI_PROTOCOL) += arm_scpi.o
obj-$(CONFIG_ARM_SCPI_POWER_DOMAIN) += scpi_pm_domain.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += arm_sdei.o
@@ -24,6 +22,7 @@ CFLAGS_qcom_scm-32.o :=$(call as-instr,.arch armv7-a\n.arch_extension sec,-DREQU
obj-$(CONFIG_TI_SCI_PROTOCOL) += ti_sci.o
obj-$(CONFIG_ARM_SCMI_PROTOCOL) += arm_scmi/
+obj-y += psci/
obj-y += broadcom/
obj-y += meson/
obj-$(CONFIG_GOOGLE_FIRMWARE) += google/
diff --git a/drivers/firmware/psci/Kconfig b/drivers/firmware/psci/Kconfig
new file mode 100644
index 000000000000..26a3b32bf7ab
--- /dev/null
+++ b/drivers/firmware/psci/Kconfig
@@ -0,0 +1,13 @@
+config ARM_PSCI_FW
+ bool
+
+config ARM_PSCI_CHECKER
+ bool "ARM PSCI checker"
+ depends on ARM_PSCI_FW && HOTPLUG_CPU && CPU_IDLE && !TORTURE_TEST
+ help
+ Run the PSCI checker during startup. This checks that hotplug and
+ suspend operations work correctly when using PSCI.
+
+ The torture tests may interfere with the PSCI checker by turning CPUs
+ on and off through hotplug, so for now torture tests and PSCI checker
+ are mutually exclusive.
diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
new file mode 100644
index 000000000000..1956b882470f
--- /dev/null
+++ b/drivers/firmware/psci/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+obj-$(CONFIG_ARM_PSCI_FW) += psci.o
+obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci/psci.c
similarity index 100%
rename from drivers/firmware/psci.c
rename to drivers/firmware/psci/psci.c
diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci/psci_checker.c
similarity index 100%
rename from drivers/firmware/psci_checker.c
rename to drivers/firmware/psci/psci_checker.c
--
2.17.1
Hi Ulf,
On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
>When the hierarchical CPU topology is used and when a CPU has been put
>offline (hotplug), that same CPU prevents its PM domain and thus also
>potential master PM domains, from being powered off. This is because genpd
>observes the CPU's struct device to remain being active from a runtime PM
>point of view.
>
>To deal with this, let's decrease the runtime PM usage count by calling
>pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
>offline. Consequentially, we must then increase the runtime PM usage for
>the CPU, while putting it online again.
>
>Signed-off-by: Ulf Hansson <[email protected]>
>---
>
>Changes in v10:
> - Make it work when the hierarchical CPU topology is used, which may be
> used both for OSI and PC mode.
> - Rework the code to prevent "BUG: sleeping function called from
> invalid context".
>---
> drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
>diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>index b03bccce0a5d..f62c4963eb62 100644
>--- a/drivers/firmware/psci/psci.c
>+++ b/drivers/firmware/psci/psci.c
>@@ -15,6 +15,7 @@
>
> #include <linux/acpi.h>
> #include <linux/arm-smccc.h>
>+#include <linux/cpu.h>
> #include <linux/cpuidle.h>
> #include <linux/errno.h>
> #include <linux/linkage.h>
>@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>
> static int psci_cpu_off(u32 state)
> {
>+ struct device *dev;
> int err;
> u32 fn;
>
>+ /*
>+ * When the hierarchical CPU topology is used, decrease the runtime PM
>+ * usage count for the current CPU, as to allow other parts in the
>+ * topology to enter low power states.
>+ */
>+ if (psci_dt_topology) {
>+ dev = get_cpu_device(smp_processor_id());
>+ pm_runtime_put_sync_suspend(dev);
>+ }
>+
> fn = psci_function_id[PSCI_FN_CPU_OFF];
> err = invoke_psci_fn(fn, state, 0, 0);
> return psci_to_linux_errno(err);
>@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
>
> static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> {
>+ struct device *dev;
> int err;
> u32 fn;
>
>@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> /* Clear the domain state to start fresh. */
> psci_set_domain_state(0);
>+
>+ /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
>+ if (!err && psci_dt_topology) {
>+ dev = get_cpu_device(cpuid);
>+ pm_runtime_get_sync(dev);
I booted with a single CPU on my SDM845 device and when I tried to
online CPU1 and I see a crash.
# echo 1 > /sys/devices/system/cpu/cpu1/online
[ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[ 86.340195] Detected VIPT I-cache on CPU1
[ 86.348075] Mem abort info:
[ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
[ 86.352125] ESR = 0x96000006
[ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
[ 86.354956] Exception class = DABT (current EL), IL = 32 bits
[ 86.377700] SET = 0, FnV = 0
[ 86.380788] EA = 0, S1PTW = 0
[ 86.383967] Data abort info:
[ 86.386882] ISV = 0, ISS = 0x00000006
[ 86.390760] CM = 0, WnR = 0
[ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
[ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 86.414824] Modules linked in:
[ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
[ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
[ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 86.437233] pc : __pm_runtime_resume+0x20/0x74
[ 86.441720] lr : psci_cpu_on+0x84/0x90
[ 86.445498] sp : ffff00000db43a10
[ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
[ 86.454200] x27: ffff000009159000 x26: 0000000000000055
[ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
[ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
[ 86.470269] x21: 0000000000000100 x20: ffff000009314190
[ 86.475625] x19: 0000000000000000 x18: 0000000000000000
[ 86.480979] x17: 0000000000000000 x16: 0000000000000000
[ 86.486334] x15: 0000000000000000 x14: ffff000009162600
[ 86.491690] x13: 0000000000000300 x12: 0000000000000010
[ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
[ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
[ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
[ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
[ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
[ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
[ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
[ 86.535585] Call trace:
[ 86.538063] __pm_runtime_resume+0x20/0x74
[ 86.542197] psci_cpu_on+0x84/0x90
[ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
[ 86.549593] __cpu_up+0x68/0x210
[ 86.552852] bringup_cpu+0x30/0xe0
[ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
[ 86.560689] _cpu_up+0xe0/0x1d0
[ 86.563862] do_cpu_up+0x90/0xb0
[ 86.567118] cpu_up+0x10/0x18
[ 86.570113] cpu_subsys_online+0x44/0x98
[ 86.574079] device_online+0x68/0xac
[ 86.577685] online_store+0xa8/0xb4
[ 86.581202] dev_attr_store+0x18/0x28
[ 86.584908] sysfs_kf_write+0x40/0x48
[ 86.588606] kernfs_fop_write+0xcc/0x1cc
[ 86.592563] __vfs_write+0x40/0x16c
[ 86.596078] vfs_write+0xa8/0x1a0
[ 86.599424] ksys_write+0x58/0xbc
[ 86.602768] __arm64_sys_write+0x18/0x20
[ 86.606733] el0_svc_common+0x94/0xf0
[ 86.610433] el0_svc_handler+0x24/0x80
[ 86.614215] el0_svc+0x8/0x7c0
[ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
[ 86.623447] ---[ end trace 4573c3c0e0761290 ]---
>+ }
>+
> return psci_to_linux_errno(err);
> }
>
>--
>2.17.1
>
Thanks,
Lina
On Thu, 29 Nov 2018 at 23:31, Lina Iyer <[email protected]> wrote:
>
> Hi Ulf,
>
> On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
> >When the hierarchical CPU topology is used and when a CPU has been put
> >offline (hotplug), that same CPU prevents its PM domain and thus also
> >potential master PM domains, from being powered off. This is because genpd
> >observes the CPU's struct device to remain being active from a runtime PM
> >point of view.
> >
> >To deal with this, let's decrease the runtime PM usage count by calling
> >pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
> >offline. Consequentially, we must then increase the runtime PM usage for
> >the CPU, while putting it online again.
> >
> >Signed-off-by: Ulf Hansson <[email protected]>
> >---
> >
> >Changes in v10:
> > - Make it work when the hierarchical CPU topology is used, which may be
> > used both for OSI and PC mode.
> > - Rework the code to prevent "BUG: sleeping function called from
> > invalid context".
> >---
> > drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> >diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> >index b03bccce0a5d..f62c4963eb62 100644
> >--- a/drivers/firmware/psci/psci.c
> >+++ b/drivers/firmware/psci/psci.c
> >@@ -15,6 +15,7 @@
> >
> > #include <linux/acpi.h>
> > #include <linux/arm-smccc.h>
> >+#include <linux/cpu.h>
> > #include <linux/cpuidle.h>
> > #include <linux/errno.h>
> > #include <linux/linkage.h>
> >@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
> >
> > static int psci_cpu_off(u32 state)
> > {
> >+ struct device *dev;
> > int err;
> > u32 fn;
> >
> >+ /*
> >+ * When the hierarchical CPU topology is used, decrease the runtime PM
> >+ * usage count for the current CPU, as to allow other parts in the
> >+ * topology to enter low power states.
> >+ */
> >+ if (psci_dt_topology) {
> >+ dev = get_cpu_device(smp_processor_id());
> >+ pm_runtime_put_sync_suspend(dev);
> >+ }
> >+
> > fn = psci_function_id[PSCI_FN_CPU_OFF];
> > err = invoke_psci_fn(fn, state, 0, 0);
> > return psci_to_linux_errno(err);
> >@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
> >
> > static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > {
> >+ struct device *dev;
> > int err;
> > u32 fn;
> >
> >@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> > /* Clear the domain state to start fresh. */
> > psci_set_domain_state(0);
> >+
> >+ /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
> >+ if (!err && psci_dt_topology) {
> >+ dev = get_cpu_device(cpuid);
> >+ pm_runtime_get_sync(dev);
>
> I booted with a single CPU on my SDM845 device and when I tried to
> online CPU1 and I see a crash.
Thanks for testing!
If I understand correctly, that means that you haven't registered CPU1
using register_cpu(), hence there are no struct device created for it.
It sound like a special case, but on the other hand we shouldn't
crash, or course.
I guess a simple check like this would help.
if (dev)
pm_runtime_get_sync(dev);
...and then we need a similar check in psci_cpu_off() to deal with
putting the CPU offline.
Could you try this and see if it helps?
>
> # echo 1 > /sys/devices/system/cpu/cpu1/online
>
> [ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
> [ 86.340195] Detected VIPT I-cache on CPU1
> [ 86.348075] Mem abort info:
> [ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
> [ 86.352125] ESR = 0x96000006
> [ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
> [ 86.354956] Exception class = DABT (current EL), IL = 32 bits
> [ 86.377700] SET = 0, FnV = 0
> [ 86.380788] EA = 0, S1PTW = 0
> [ 86.383967] Data abort info:
> [ 86.386882] ISV = 0, ISS = 0x00000006
> [ 86.390760] CM = 0, WnR = 0
> [ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
> [ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [ 86.414824] Modules linked in:
> [ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
> [ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
> [ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [ 86.437233] pc : __pm_runtime_resume+0x20/0x74
> [ 86.441720] lr : psci_cpu_on+0x84/0x90
> [ 86.445498] sp : ffff00000db43a10
> [ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
> [ 86.454200] x27: ffff000009159000 x26: 0000000000000055
> [ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
> [ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
> [ 86.470269] x21: 0000000000000100 x20: ffff000009314190
> [ 86.475625] x19: 0000000000000000 x18: 0000000000000000
> [ 86.480979] x17: 0000000000000000 x16: 0000000000000000
> [ 86.486334] x15: 0000000000000000 x14: ffff000009162600
> [ 86.491690] x13: 0000000000000300 x12: 0000000000000010
> [ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
> [ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
> [ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
> [ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
> [ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
> [ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
> [ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
> [ 86.535585] Call trace:
> [ 86.538063] __pm_runtime_resume+0x20/0x74
> [ 86.542197] psci_cpu_on+0x84/0x90
> [ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
> [ 86.549593] __cpu_up+0x68/0x210
> [ 86.552852] bringup_cpu+0x30/0xe0
> [ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
> [ 86.560689] _cpu_up+0xe0/0x1d0
> [ 86.563862] do_cpu_up+0x90/0xb0
> [ 86.567118] cpu_up+0x10/0x18
> [ 86.570113] cpu_subsys_online+0x44/0x98
> [ 86.574079] device_online+0x68/0xac
> [ 86.577685] online_store+0xa8/0xb4
> [ 86.581202] dev_attr_store+0x18/0x28
> [ 86.584908] sysfs_kf_write+0x40/0x48
> [ 86.588606] kernfs_fop_write+0xcc/0x1cc
> [ 86.592563] __vfs_write+0x40/0x16c
> [ 86.596078] vfs_write+0xa8/0x1a0
> [ 86.599424] ksys_write+0x58/0xbc
> [ 86.602768] __arm64_sys_write+0x18/0x20
> [ 86.606733] el0_svc_common+0x94/0xf0
> [ 86.610433] el0_svc_handler+0x24/0x80
> [ 86.614215] el0_svc+0x8/0x7c0
> [ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
> [ 86.623447] ---[ end trace 4573c3c0e0761290 ]---
>
> >+ }
> >+
> > return psci_to_linux_errno(err);
> > }
> >
> >--
> >2.17.1
> >
>
> Thanks,
> Lina
On Fri, Nov 30 2018 at 01:25 -0700, Ulf Hansson wrote:
>On Thu, 29 Nov 2018 at 23:31, Lina Iyer <[email protected]> wrote:
>>
>> Hi Ulf,
>>
>> On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
>> >When the hierarchical CPU topology is used and when a CPU has been put
>> >offline (hotplug), that same CPU prevents its PM domain and thus also
>> >potential master PM domains, from being powered off. This is because genpd
>> >observes the CPU's struct device to remain being active from a runtime PM
>> >point of view.
>> >
>> >To deal with this, let's decrease the runtime PM usage count by calling
>> >pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
>> >offline. Consequentially, we must then increase the runtime PM usage for
>> >the CPU, while putting it online again.
>> >
>> >Signed-off-by: Ulf Hansson <[email protected]>
>> >---
>> >
>> >Changes in v10:
>> > - Make it work when the hierarchical CPU topology is used, which may be
>> > used both for OSI and PC mode.
>> > - Rework the code to prevent "BUG: sleeping function called from
>> > invalid context".
>> >---
>> > drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
>> > 1 file changed, 20 insertions(+)
>> >
>> >diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>> >index b03bccce0a5d..f62c4963eb62 100644
>> >--- a/drivers/firmware/psci/psci.c
>> >+++ b/drivers/firmware/psci/psci.c
>> >@@ -15,6 +15,7 @@
>> >
>> > #include <linux/acpi.h>
>> > #include <linux/arm-smccc.h>
>> >+#include <linux/cpu.h>
>> > #include <linux/cpuidle.h>
>> > #include <linux/errno.h>
>> > #include <linux/linkage.h>
>> >@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>> >
>> > static int psci_cpu_off(u32 state)
>> > {
>> >+ struct device *dev;
>> > int err;
>> > u32 fn;
>> >
>> >+ /*
>> >+ * When the hierarchical CPU topology is used, decrease the runtime PM
>> >+ * usage count for the current CPU, as to allow other parts in the
>> >+ * topology to enter low power states.
>> >+ */
>> >+ if (psci_dt_topology) {
>> >+ dev = get_cpu_device(smp_processor_id());
>> >+ pm_runtime_put_sync_suspend(dev);
>> >+ }
>> >+
>> > fn = psci_function_id[PSCI_FN_CPU_OFF];
>> > err = invoke_psci_fn(fn, state, 0, 0);
>> > return psci_to_linux_errno(err);
>> >@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
>> >
>> > static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
>> > {
>> >+ struct device *dev;
>> > int err;
>> > u32 fn;
>> >
>> >@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
>> > err = invoke_psci_fn(fn, cpuid, entry_point, 0);
>> > /* Clear the domain state to start fresh. */
>> > psci_set_domain_state(0);
>> >+
>> >+ /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
>> >+ if (!err && psci_dt_topology) {
>> >+ dev = get_cpu_device(cpuid);
>> >+ pm_runtime_get_sync(dev);
>>
>> I booted with a single CPU on my SDM845 device and when I tried to
>> online CPU1 and I see a crash.
>
>Thanks for testing!
>
>If I understand correctly, that means that you haven't registered CPU1
>using register_cpu(), hence there are no struct device created for it.
>It sound like a special case, but on the other hand we shouldn't
>crash, or course.
This infact is pretty common. Devices boot with only with low power
cores and bring in the high perf cores only when needed.
>
>I guess a simple check like this would help.
>
>if (dev)
> pm_runtime_get_sync(dev);
>
>...and then we need a similar check in psci_cpu_off() to deal with
>putting the CPU offline.
>
>Could you try this and see if it helps?
>
Yes, it fixes the issue.
Thanks,
Lina
>>
>> # echo 1 > /sys/devices/system/cpu/cpu1/online
>>
>> [ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
>> [ 86.340195] Detected VIPT I-cache on CPU1
>> [ 86.348075] Mem abort info:
>> [ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
>> [ 86.352125] ESR = 0x96000006
>> [ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
>> [ 86.354956] Exception class = DABT (current EL), IL = 32 bits
>> [ 86.377700] SET = 0, FnV = 0
>> [ 86.380788] EA = 0, S1PTW = 0
>> [ 86.383967] Data abort info:
>> [ 86.386882] ISV = 0, ISS = 0x00000006
>> [ 86.390760] CM = 0, WnR = 0
>> [ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>> [ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
>> [ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
>> [ 86.414824] Modules linked in:
>> [ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
>> [ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
>> [ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
>> [ 86.437233] pc : __pm_runtime_resume+0x20/0x74
>> [ 86.441720] lr : psci_cpu_on+0x84/0x90
>> [ 86.445498] sp : ffff00000db43a10
>> [ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
>> [ 86.454200] x27: ffff000009159000 x26: 0000000000000055
>> [ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
>> [ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
>> [ 86.470269] x21: 0000000000000100 x20: ffff000009314190
>> [ 86.475625] x19: 0000000000000000 x18: 0000000000000000
>> [ 86.480979] x17: 0000000000000000 x16: 0000000000000000
>> [ 86.486334] x15: 0000000000000000 x14: ffff000009162600
>> [ 86.491690] x13: 0000000000000300 x12: 0000000000000010
>> [ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
>> [ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
>> [ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
>> [ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
>> [ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
>> [ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
>> [ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
>> [ 86.535585] Call trace:
>> [ 86.538063] __pm_runtime_resume+0x20/0x74
>> [ 86.542197] psci_cpu_on+0x84/0x90
>> [ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
>> [ 86.549593] __cpu_up+0x68/0x210
>> [ 86.552852] bringup_cpu+0x30/0xe0
>> [ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
>> [ 86.560689] _cpu_up+0xe0/0x1d0
>> [ 86.563862] do_cpu_up+0x90/0xb0
>> [ 86.567118] cpu_up+0x10/0x18
>> [ 86.570113] cpu_subsys_online+0x44/0x98
>> [ 86.574079] device_online+0x68/0xac
>> [ 86.577685] online_store+0xa8/0xb4
>> [ 86.581202] dev_attr_store+0x18/0x28
>> [ 86.584908] sysfs_kf_write+0x40/0x48
>> [ 86.588606] kernfs_fop_write+0xcc/0x1cc
>> [ 86.592563] __vfs_write+0x40/0x16c
>> [ 86.596078] vfs_write+0xa8/0x1a0
>> [ 86.599424] ksys_write+0x58/0xbc
>> [ 86.602768] __arm64_sys_write+0x18/0x20
>> [ 86.606733] el0_svc_common+0x94/0xf0
>> [ 86.610433] el0_svc_handler+0x24/0x80
>> [ 86.614215] el0_svc+0x8/0x7c0
>> [ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
>> [ 86.623447] ---[ end trace 4573c3c0e0761290 ]---
>>
>> >+ }
>> >+
>> > return psci_to_linux_errno(err);
>> > }
>> >
>> >--
>> >2.17.1
>> >
>>
>> Thanks,
>> Lina
On Thu, Nov 29 2018 at 10:49 -0700, Ulf Hansson wrote:
>When the hierarchical CPU topology layout is used in DT, we need to setup
>the corresponding PM domain data structures, as to allow a CPU and a group
>of CPUs to be power managed accordingly. Let's enable this by deploying
>support through the genpd interface.
>
>Additionally, when the OS initiated mode is supported by the PSCI FW, let's
>also parse the domain idle states DT bindings as to make genpd responsible
>for the state selection, when the states are compatible with
>"domain-idle-state". Otherwise, when only Platform Coordinated mode is
>supported, we rely solely on the state selection to be managed through the
>regular cpuidle framework.
>
>If the initialization of the PM domain data structures succeeds and the OS
>initiated mode is supported, we try to switch to it. In case it fails,
>let's fall back into a degraded mode, rather than bailing out and returning
>an error code.
>
>Due to that the OS initiated mode may become enabled, we need to adjust to
>maintain backwards compatibility for a kernel started through a kexec call.
>Do this by explicitly switch to Platform Coordinated mode during boot.
>
>To try to initiate the PM domain data structures, the PSCI driver shall
>call the new function, psci_dt_init_pm_domains(). However, this is done
>from following changes.
>
>Cc: Lina Iyer <[email protected]>
>Co-developed-by: Lina Iyer <[email protected]>
>Signed-off-by: Ulf Hansson <[email protected]>
>---
>
>Changes in V10:
> - Enable the PM domains to be used for both PC and OSI mode.
> - Fixup error paths.
> - Move the management of kexec started kernels into this patch.
> - Rewrite changelog.
>
>---
> drivers/firmware/psci/Makefile | 2 +-
> drivers/firmware/psci/psci.c | 7 +-
> drivers/firmware/psci/psci.h | 6 +
> drivers/firmware/psci/psci_pm_domain.c | 262 +++++++++++++++++++++++++
> 4 files changed, 275 insertions(+), 2 deletions(-)
> create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>
>diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
>index 1956b882470f..ff300f1fec86 100644
>--- a/drivers/firmware/psci/Makefile
>+++ b/drivers/firmware/psci/Makefile
>@@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
> #
>-obj-$(CONFIG_ARM_PSCI_FW) += psci.o
>+obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
> obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
>diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>index 623591b541a4..19af2093151b 100644
>--- a/drivers/firmware/psci/psci.c
>+++ b/drivers/firmware/psci/psci.c
>@@ -704,9 +704,14 @@ static int __init psci_1_0_init(struct device_node *np)
> if (err)
> return err;
>
>- if (psci_has_osi_support())
>+ if (psci_has_osi_support()) {
> pr_info("OSI mode supported.\n");
>
>+ /* Make sure we default to PC mode. */
>+ invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
>+ PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
>+ }
>+
> return 0;
> }
>
>diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
>index 7d9d38fd57e1..8cf6d7206fab 100644
>--- a/drivers/firmware/psci/psci.h
>+++ b/drivers/firmware/psci/psci.h
>@@ -11,4 +11,10 @@ void psci_set_domain_state(u32 state);
> bool psci_has_osi_support(void);
> int psci_dt_parse_state_node(struct device_node *np, u32 *state);
>
>+#ifdef CONFIG_CPU_IDLE
>+int psci_dt_init_pm_domains(struct device_node *np);
>+#else
>+static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
>+#endif
>+
> #endif /* __PSCI_H */
>diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
>new file mode 100644
>index 000000000000..d0dc38e96f85
>--- /dev/null
>+++ b/drivers/firmware/psci/psci_pm_domain.c
>@@ -0,0 +1,262 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * PM domains for CPUs via genpd - managed by PSCI.
>+ *
>+ * Copyright (C) 2018 Linaro Ltd.
>+ * Author: Ulf Hansson <[email protected]>
>+ *
>+ */
>+
>+#define pr_fmt(fmt) "psci: " fmt
>+
>+#include <linux/device.h>
>+#include <linux/kernel.h>
>+#include <linux/pm_domain.h>
>+#include <linux/slab.h>
>+#include <linux/string.h>
>+
>+#include "psci.h"
>+
>+#ifdef CONFIG_CPU_IDLE
>+
>+struct psci_pd_provider {
>+ struct list_head link;
>+ struct device_node *node;
>+};
>+
>+static LIST_HEAD(psci_pd_providers);
>+static bool osi_mode_enabled;
>+
>+static int psci_pd_power_off(struct generic_pm_domain *pd)
>+{
>+ struct genpd_power_state *state = &pd->states[pd->state_idx];
>+ u32 *pd_state;
>+ u32 composite_pd_state;
>+
>+ /* If we have failed to enable OSI mode, then abort power off. */
>+ if (psci_has_osi_support() && !osi_mode_enabled)
>+ return -EBUSY;
>+
>+ if (!state->data)
>+ return 0;
>+
>+ /* When OSI mode is enabled, set the corresponding domain state. */
>+ pd_state = state->data;
>+ composite_pd_state = *pd_state | psci_get_domain_state();
This should not be needed. The domain_state should be 0x0 being set
after coming out of idle.
>+ psci_set_domain_state(composite_pd_state);
The three lines can be summarized as:
psci_set_domain_state(*state->data);
Thanks,
Lina
On Mon, 3 Dec 2018 at 17:37, Lina Iyer <[email protected]> wrote:
>
> On Thu, Nov 29 2018 at 10:49 -0700, Ulf Hansson wrote:
> >When the hierarchical CPU topology layout is used in DT, we need to setup
> >the corresponding PM domain data structures, as to allow a CPU and a group
> >of CPUs to be power managed accordingly. Let's enable this by deploying
> >support through the genpd interface.
> >
> >Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> >also parse the domain idle states DT bindings as to make genpd responsible
> >for the state selection, when the states are compatible with
> >"domain-idle-state". Otherwise, when only Platform Coordinated mode is
> >supported, we rely solely on the state selection to be managed through the
> >regular cpuidle framework.
> >
> >If the initialization of the PM domain data structures succeeds and the OS
> >initiated mode is supported, we try to switch to it. In case it fails,
> >let's fall back into a degraded mode, rather than bailing out and returning
> >an error code.
> >
> >Due to that the OS initiated mode may become enabled, we need to adjust to
> >maintain backwards compatibility for a kernel started through a kexec call.
> >Do this by explicitly switch to Platform Coordinated mode during boot.
> >
> >To try to initiate the PM domain data structures, the PSCI driver shall
> >call the new function, psci_dt_init_pm_domains(). However, this is done
> >from following changes.
> >
> >Cc: Lina Iyer <[email protected]>
> >Co-developed-by: Lina Iyer <[email protected]>
> >Signed-off-by: Ulf Hansson <[email protected]>
> >---
> >
> >Changes in V10:
> > - Enable the PM domains to be used for both PC and OSI mode.
> > - Fixup error paths.
> > - Move the management of kexec started kernels into this patch.
> > - Rewrite changelog.
> >
> >---
> > drivers/firmware/psci/Makefile | 2 +-
> > drivers/firmware/psci/psci.c | 7 +-
> > drivers/firmware/psci/psci.h | 6 +
> > drivers/firmware/psci/psci_pm_domain.c | 262 +++++++++++++++++++++++++
> > 4 files changed, 275 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
> >
> >diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
> >index 1956b882470f..ff300f1fec86 100644
> >--- a/drivers/firmware/psci/Makefile
> >+++ b/drivers/firmware/psci/Makefile
> >@@ -1,4 +1,4 @@
> > # SPDX-License-Identifier: GPL-2.0
> > #
> >-obj-$(CONFIG_ARM_PSCI_FW) += psci.o
> >+obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
> > obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
> >diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> >index 623591b541a4..19af2093151b 100644
> >--- a/drivers/firmware/psci/psci.c
> >+++ b/drivers/firmware/psci/psci.c
> >@@ -704,9 +704,14 @@ static int __init psci_1_0_init(struct device_node *np)
> > if (err)
> > return err;
> >
> >- if (psci_has_osi_support())
> >+ if (psci_has_osi_support()) {
> > pr_info("OSI mode supported.\n");
> >
> >+ /* Make sure we default to PC mode. */
> >+ invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> >+ PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
> >+ }
> >+
> > return 0;
> > }
> >
> >diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> >index 7d9d38fd57e1..8cf6d7206fab 100644
> >--- a/drivers/firmware/psci/psci.h
> >+++ b/drivers/firmware/psci/psci.h
> >@@ -11,4 +11,10 @@ void psci_set_domain_state(u32 state);
> > bool psci_has_osi_support(void);
> > int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> >
> >+#ifdef CONFIG_CPU_IDLE
> >+int psci_dt_init_pm_domains(struct device_node *np);
> >+#else
> >+static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> >+#endif
> >+
> > #endif /* __PSCI_H */
> >diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> >new file mode 100644
> >index 000000000000..d0dc38e96f85
> >--- /dev/null
> >+++ b/drivers/firmware/psci/psci_pm_domain.c
> >@@ -0,0 +1,262 @@
> >+// SPDX-License-Identifier: GPL-2.0
> >+/*
> >+ * PM domains for CPUs via genpd - managed by PSCI.
> >+ *
> >+ * Copyright (C) 2018 Linaro Ltd.
> >+ * Author: Ulf Hansson <[email protected]>
> >+ *
> >+ */
> >+
> >+#define pr_fmt(fmt) "psci: " fmt
> >+
> >+#include <linux/device.h>
> >+#include <linux/kernel.h>
> >+#include <linux/pm_domain.h>
> >+#include <linux/slab.h>
> >+#include <linux/string.h>
> >+
> >+#include "psci.h"
> >+
> >+#ifdef CONFIG_CPU_IDLE
> >+
> >+struct psci_pd_provider {
> >+ struct list_head link;
> >+ struct device_node *node;
> >+};
> >+
> >+static LIST_HEAD(psci_pd_providers);
> >+static bool osi_mode_enabled;
> >+
> >+static int psci_pd_power_off(struct generic_pm_domain *pd)
> >+{
> >+ struct genpd_power_state *state = &pd->states[pd->state_idx];
> >+ u32 *pd_state;
> >+ u32 composite_pd_state;
> >+
> >+ /* If we have failed to enable OSI mode, then abort power off. */
> >+ if (psci_has_osi_support() && !osi_mode_enabled)
> >+ return -EBUSY;
> >+
> >+ if (!state->data)
> >+ return 0;
> >+
> >+ /* When OSI mode is enabled, set the corresponding domain state. */
> >+ pd_state = state->data;
> >+ composite_pd_state = *pd_state | psci_get_domain_state();
> This should not be needed. The domain_state should be 0x0 being set
> after coming out of idle.
> >+ psci_set_domain_state(composite_pd_state);
> The three lines can be summarized as:
> psci_set_domain_state(*state->data);
Sure, let me change accordingly and thanks for spotting this!
Kind regards
Uffe
On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
>Introduce a new PSCI DT helper function, psci_dt_attach_cpu(), which takes
>a CPU number as an in-parameter and attaches the CPU's struct device to its
>corresponding PM domain. Additionally, the helper prepares the CPU to be
>power managed via runtime PM, which is the last step needed to enable the
>interaction with the PM domain through the runtime PM callbacks.
>
>Signed-off-by: Ulf Hansson <[email protected]>
>---
>
>Changes in v10:
> - New patch: Replaces "PM / Domains: Add helper functions to
> attach/detach CPUs to/from genpd".
>
>---
> drivers/firmware/psci/psci.h | 1 +
> drivers/firmware/psci/psci_pm_domain.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
>diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
>index 05af462cc96e..fbc9980dee69 100644
>--- a/drivers/firmware/psci/psci.h
>+++ b/drivers/firmware/psci/psci.h
>@@ -15,6 +15,7 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> int psci_dt_init_pm_domains(struct device_node *np);
> int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> struct device_node *cpu_node, u32 *psci_states);
>+int psci_dt_attach_cpu(int cpu);
> #else
> static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> #endif
>diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
>index 6c9d6a644c7f..b0fa7da8a0ce 100644
>--- a/drivers/firmware/psci/psci_pm_domain.c
>+++ b/drivers/firmware/psci/psci_pm_domain.c
>@@ -12,8 +12,10 @@
> #include <linux/device.h>
> #include <linux/kernel.h>
> #include <linux/pm_domain.h>
>+#include <linux/pm_runtime.h>
> #include <linux/slab.h>
> #include <linux/string.h>
>+#include <linux/cpu.h>
> #include <linux/cpuidle.h>
> #include <linux/cpu_pm.h>
>
>@@ -367,4 +369,21 @@ int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
>
> return 0;
> }
>+
>+int psci_dt_attach_cpu(int cpu)
>+{
>+ struct device *dev = get_cpu_device(cpu);
>+ int ret;
>+
>+ ret = dev_pm_domain_attach(dev, true);
>+ if (ret)
>+ return ret;
>+
>+ pm_runtime_irq_safe(dev);
>+ pm_runtime_get_noresume(dev);
>+ pm_runtime_set_active(dev);
You would want to set this only if the CPU is online. Otherwise we will
not power down the domain, if the CPU was never brought online.
>+ pm_runtime_enable(dev);
>+
>+ return 0;
>+}
> #endif
>--
>2.17.1
>
On Tue, 4 Dec 2018 at 19:45, Lina Iyer <[email protected]> wrote:
>
> On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
> >Introduce a new PSCI DT helper function, psci_dt_attach_cpu(), which takes
> >a CPU number as an in-parameter and attaches the CPU's struct device to its
> >corresponding PM domain. Additionally, the helper prepares the CPU to be
> >power managed via runtime PM, which is the last step needed to enable the
> >interaction with the PM domain through the runtime PM callbacks.
> >
> >Signed-off-by: Ulf Hansson <[email protected]>
> >---
> >
> >Changes in v10:
> > - New patch: Replaces "PM / Domains: Add helper functions to
> > attach/detach CPUs to/from genpd".
> >
> >---
> > drivers/firmware/psci/psci.h | 1 +
> > drivers/firmware/psci/psci_pm_domain.c | 19 +++++++++++++++++++
> > 2 files changed, 20 insertions(+)
> >
> >diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> >index 05af462cc96e..fbc9980dee69 100644
> >--- a/drivers/firmware/psci/psci.h
> >+++ b/drivers/firmware/psci/psci.h
> >@@ -15,6 +15,7 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> > int psci_dt_init_pm_domains(struct device_node *np);
> > int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > struct device_node *cpu_node, u32 *psci_states);
> >+int psci_dt_attach_cpu(int cpu);
> > #else
> > static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> > #endif
> >diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> >index 6c9d6a644c7f..b0fa7da8a0ce 100644
> >--- a/drivers/firmware/psci/psci_pm_domain.c
> >+++ b/drivers/firmware/psci/psci_pm_domain.c
> >@@ -12,8 +12,10 @@
> > #include <linux/device.h>
> > #include <linux/kernel.h>
> > #include <linux/pm_domain.h>
> >+#include <linux/pm_runtime.h>
> > #include <linux/slab.h>
> > #include <linux/string.h>
> >+#include <linux/cpu.h>
> > #include <linux/cpuidle.h>
> > #include <linux/cpu_pm.h>
> >
> >@@ -367,4 +369,21 @@ int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> >
> > return 0;
> > }
> >+
> >+int psci_dt_attach_cpu(int cpu)
> >+{
> >+ struct device *dev = get_cpu_device(cpu);
> >+ int ret;
> >+
> >+ ret = dev_pm_domain_attach(dev, true);
> >+ if (ret)
> >+ return ret;
> >+
> >+ pm_runtime_irq_safe(dev);
> >+ pm_runtime_get_noresume(dev);
> >+ pm_runtime_set_active(dev);
> You would want to set this only if the CPU is online. Otherwise we will
> not power down the domain, if the CPU was never brought online.
Nice catch!
The platforms I tested this series on brings all their CPUs online
during boot, hence I haven't observed the problem.
I will post a new version soon to address the problem. Again, thanks
for your review!
[...]
Kind regards
Uffe
Rafael, Sudeep, Lorenzo, Mark,
On Thu, 29 Nov 2018 at 18:47, Ulf Hansson <[email protected]> wrote:
>
> Over the years this series have been iterated and discussed at various Linux
> conferences and LKML. In this new v10, a quite significant amount of changes
> have been made to address comments from v8 and v9. A summary is available
> below, although let's start with a brand new clarification of the motivation
> behind this series.
>
> For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
> From a CPU idle state perspective, this means some states may be shared among a
> group of CPUs (aka CPU cluster).
>
> To deal with idle management of a group of CPUs, sometimes the kernel needs to
> be involved to manage the last-man standing algorithm, simply because it can't
> rely solely on power management FWs to deal with this. Depending on the
> platform, of course.
>
> There are a couple of typical scenarios for when the kernel needs to be in
> control, dealing with synchronization of when the last CPU in a cluster is about
> to enter a deep idle state.
>
> 1)
> The kernel needs to carry out so called last-man activities before the
> CPU cluster can enter a deep idle state. This may for example involve to
> configure external logics for wakeups, as the GIC may no longer be functional
> once a deep cluster idle state have been entered. Likewise, these operations
> may need to be restored, when the first CPU wakes up.
>
> 2)
> Other more generic I/O devices, such as an MMC controller for example, may be a
> part of the same power domain as the CPU cluster, due to a shared power-rail.
> For these scenarios, when the MMC controller is in use dealing with an MMC
> request, a deeper idle state of the CPU cluster may needs to be temporarily
> disabled. This is needed to retain the MMC controller in a functional state,
> else it may loose its register-context in the middle of serving a request.
>
> In this series, we are extending the generic PM domain (aka genpd) to be used
> for also CPU devices. Hence the goal is to re-use much of its current code to
> help us manage the last-man standing synchronization. Moreover, as we already
> use genpd to model power domains for generic I/O devices, both 1) and 2) can be
> address with its help.
>
> Moreover, to address these problems for ARM64 DT based platforms, we are
> deploying support for genpd and runtime PM to the PSCI FW driver - and finally
> we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
> layout.
>
> The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
> board. You may also find the code at:
>
> git.linaro.org/people/ulf.hansson/linux-pm.git next
It's soon been three weeks since I posted this and I would really
appreciate some feedback.
Rafael, I need your feedback on patch 1->4.
Mark, Sudeep, Lorenzo, please have a look at the PSCI related changes.
When it comes to the the cpuidle related changes, I have pinged Daniel
offlist - and he is preparing some responses.
Kind regards
Uffe
>
> Kind regards
> Ulf Hansson
>
>
> Changes in v10:
> - Quite significant changes have been to the PSCI driver deployment. According
> to an agreement with Lorenzo, the hierarchical CPU layout for PSCI should be
> orthogonal to whether the PSCI FW supports OSI or not. This has been taken
> care of in this version.
> - Drop the generic attach/detach helpers of CPUs to genpd, instead make that
> related code internal to PSCI, for now.
> - Fix "BUG: sleeping for invalid context" for hotplug, as reported by Raju.
> - Addressed various comments from version 8 and 9.
> - Clarified changelogs and re-wrote the cover-letter to better explain the
> motivations behind these changes.
>
> Changes in v9:
> - Collect only a subset from the changes in v8.
> - Patch 3 is new, documenting existing genpd flags. Future wise, this means
> when a new genpd flag is invented, we must also properly document it.
> - No changes have been made to the patches picked from v8.
> - Dropped the text from v8 cover-letter[1], to avoid confusion. When posting v10
> (or whatever the next version containing the rest becomes), I am going re-write
> the cover-letter to clarify, more exactly, the problems this series intends to
> solve. The earlier text was simply too vague.
>
> [1]
> https://lwn.net/Articles/758091/
>
> Changes in v8:
> - Added some tags for reviews and acks.
> - Cleanup timer patch (patch6) according to comments from Rafael.
> - Rebased series on top of v4.18rc1 - it applied cleanly, except for patch 5.
> - While adopting patch 5 to new genpd changes, I took the opportunity to
> improve the new function description a bit.
> - Corrected malformed SPDX-License-Identifier in patch20.
>
> Changes in v7:
> - Addressed comments concerning the PSCI changes from Mark Rutland, which moves
> the psci firmware driver to a new firmware subdir and change to force PSCI PC
> mode during boot to cope with kexec'ed booted kernels.
> - Added some maintainers in cc for the timer/nohz patches.
> - Minor update to the new genpd governor, taking into account the state's
> poweroff latency while validating the sleep duration time.
> - Addressed a problem pointed out by Geert Uytterhoeven, around calling
> pm_runtime_get|put() for CPUs that has not been attached to a CPU PM domain.
> - Re-based on Linus' latest master.
>
>
> Lina Iyer (5):
> timer: Export next wakeup time of a CPU
> dt: psci: Update DT bindings to support hierarchical PSCI states
> cpuidle: dt: Support hierarchical CPU idle states
> drivers: firmware: psci: Support hierarchical CPU idle states
> arm64: dts: Convert to the hierarchical CPU topology layout for
> MSM8916
>
> Ulf Hansson (22):
> PM / Domains: Add generic data pointer to genpd_power_state struct
> PM / Domains: Add support for CPU devices to genpd
> PM / Domains: Add genpd governor for CPUs
> of: base: Add of_get_cpu_state_node() to get idle states for a CPU
> node
> ARM/ARM64: cpuidle: Let back-end init ops take the driver as input
> drivers: firmware: psci: Move psci to separate directory
> MAINTAINERS: Update files for PSCI
> drivers: firmware: psci: Split psci_dt_cpu_init_idle()
> drivers: firmware: psci: Simplify state node parsing
> drivers: firmware: psci: Simplify error path of psci_dt_init()
> drivers: firmware: psci: Announce support for OS initiated suspend
> mode
> drivers: firmware: psci: Prepare to use OS initiated suspend mode
> drivers: firmware: psci: Prepare to support PM domains
> drivers: firmware: psci: Add support for PM domains using genpd
> drivers: firmware: psci: Add hierarchical domain idle states converter
> drivers: firmware: psci: Introduce psci_dt_topology_init()
> drivers: firmware: psci: Add a helper to attach a CPU to its PM domain
> drivers: firmware: psci: Attach the CPU's device to its PM domain
> drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
> drivers: firmware: psci: Support CPU hotplug for the hierarchical
> model
> arm64: kernel: Respect the hierarchical CPU topology in DT for PSCI
> arm64: dts: hikey: Convert to the hierarchical CPU topology layout
>
> .../devicetree/bindings/arm/psci.txt | 166 ++++++++
> MAINTAINERS | 2 +-
> arch/arm/include/asm/cpuidle.h | 4 +-
> arch/arm/kernel/cpuidle.c | 5 +-
> arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 +++-
> arch/arm64/boot/dts/qcom/msm8916.dtsi | 57 ++-
> arch/arm64/include/asm/cpu_ops.h | 4 +-
> arch/arm64/include/asm/cpuidle.h | 6 +-
> arch/arm64/kernel/cpuidle.c | 6 +-
> arch/arm64/kernel/setup.c | 3 +
> drivers/base/power/domain.c | 74 +++-
> drivers/base/power/domain_governor.c | 61 ++-
> drivers/cpuidle/cpuidle-arm.c | 2 +-
> drivers/cpuidle/dt_idle_states.c | 5 +-
> drivers/firmware/Kconfig | 15 +-
> drivers/firmware/Makefile | 3 +-
> drivers/firmware/psci/Kconfig | 13 +
> drivers/firmware/psci/Makefile | 4 +
> drivers/firmware/{ => psci}/psci.c | 240 ++++++++---
> drivers/firmware/psci/psci.h | 23 ++
> drivers/firmware/{ => psci}/psci_checker.c | 0
> drivers/firmware/psci/psci_pm_domain.c | 389 ++++++++++++++++++
> drivers/of/base.c | 35 ++
> drivers/soc/qcom/spm.c | 3 +-
> include/linux/of.h | 8 +
> include/linux/pm_domain.h | 19 +-
> include/linux/psci.h | 6 +-
> include/linux/tick.h | 8 +
> include/uapi/linux/psci.h | 5 +
> kernel/time/tick-sched.c | 13 +
> 30 files changed, 1163 insertions(+), 103 deletions(-)
> create mode 100644 drivers/firmware/psci/Kconfig
> create mode 100644 drivers/firmware/psci/Makefile
> rename drivers/firmware/{ => psci}/psci.c (76%)
> create mode 100644 drivers/firmware/psci/psci.h
> rename drivers/firmware/{ => psci}/psci_checker.c (100%)
> create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>
> --
> 2.17.1
>
On 29/11/2018 18:46, Ulf Hansson wrote:
> Let's add a data pointer to the genpd_power_state struct, to allow a genpd
> backend driver to store per state specific data. In order to introduce the
> pointer, we also need to adopt how genpd frees the allocated data for the
> default genpd_power_state struct, that it may allocate at pm_genpd_init().
>
> More precisely, let's use an internal genpd flag to understand when the
> states needs to be freed by genpd. When freeing the states data in
> genpd_remove(), let's also clear the corresponding genpd->states pointer
> and reset the genpd->state_count. In this way, a genpd backend driver
> becomes aware of when there is state specific data for it to free.
>
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Update the patch allow backend drivers to free the states specific
> data during genpd removal. Due to this added complexity, I decided to
> keep the patch separate, rather than fold it into the patch that makes
> use of the new void pointer, which was suggested by Rafael.
> - Claim authorship of the patch as lots of changes has been done since
> the original pick up from Lina Iyer.
>
> ---
> drivers/base/power/domain.c | 8 ++++++--
> include/linux/pm_domain.h | 3 ++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 7f38a92b444a..e27b91d36a2a 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1620,7 +1620,7 @@ static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
>
> genpd->states = state;
> genpd->state_count = 1;
> - genpd->free = state;
> + genpd->free_state = true;
>
> return 0;
> }
> @@ -1736,7 +1736,11 @@ static int genpd_remove(struct generic_pm_domain *genpd)
> list_del(&genpd->gpd_list_node);
> genpd_unlock(genpd);
> cancel_work_sync(&genpd->power_off_work);
> - kfree(genpd->free);
> + if (genpd->free_state) {
> + kfree(genpd->states);
> + genpd->states = NULL;
> + genpd->state_count = 0;
Why these two initializations? After genpd_remove, this structure
shouldn't be used anymore, no ?
> + }
Instead of a flag, replacing the 'free' pointer to a 'free' callback
will allow to keep the free path self-encapsulated in domain.c
genpd->free(genpd->states);
Patch 18/27 can fill this field with its specific free pointer.
> pr_debug("%s: removed %s\n", __func__, genpd->name);
>
> return 0;
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index 3b5d7280e52e..f9e09bd4152c 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -69,6 +69,7 @@ struct genpd_power_state {
> s64 residency_ns;
> struct fwnode_handle *fwnode;
> ktime_t idle_time;
> + void *data;
> };
>
> struct genpd_lock_ops;
> @@ -110,7 +111,7 @@ struct generic_pm_domain {
> struct genpd_power_state *states;
> unsigned int state_count; /* number of states */
> unsigned int state_idx; /* state that genpd will go to when off */
> - void *free; /* Free the state that was allocated for default */
> + bool free_state; /* Free the state that was allocated for default */
> ktime_t on_time;
> ktime_t accounting_time;
> const struct genpd_lock_ops *lock_ops;
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Tue, 18 Dec 2018 at 11:39, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > Let's add a data pointer to the genpd_power_state struct, to allow a genpd
> > backend driver to store per state specific data. In order to introduce the
> > pointer, we also need to adopt how genpd frees the allocated data for the
> > default genpd_power_state struct, that it may allocate at pm_genpd_init().
> >
> > More precisely, let's use an internal genpd flag to understand when the
> > states needs to be freed by genpd. When freeing the states data in
> > genpd_remove(), let's also clear the corresponding genpd->states pointer
> > and reset the genpd->state_count. In this way, a genpd backend driver
> > becomes aware of when there is state specific data for it to free.
> >
> > Cc: Lina Iyer <[email protected]>
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Update the patch allow backend drivers to free the states specific
> > data during genpd removal. Due to this added complexity, I decided to
> > keep the patch separate, rather than fold it into the patch that makes
> > use of the new void pointer, which was suggested by Rafael.
> > - Claim authorship of the patch as lots of changes has been done since
> > the original pick up from Lina Iyer.
> >
> > ---
> > drivers/base/power/domain.c | 8 ++++++--
> > include/linux/pm_domain.h | 3 ++-
> > 2 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 7f38a92b444a..e27b91d36a2a 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -1620,7 +1620,7 @@ static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
> >
> > genpd->states = state;
> > genpd->state_count = 1;
> > - genpd->free = state;
> > + genpd->free_state = true;
> >
> > return 0;
> > }
> > @@ -1736,7 +1736,11 @@ static int genpd_remove(struct generic_pm_domain *genpd)
> > list_del(&genpd->gpd_list_node);
> > genpd_unlock(genpd);
> > cancel_work_sync(&genpd->power_off_work);
> > - kfree(genpd->free);
> > + if (genpd->free_state) {
> > + kfree(genpd->states);
> > + genpd->states = NULL;
> > + genpd->state_count = 0;
>
> Why these two initializations? After genpd_remove, this structure
> shouldn't be used anymore, no ?
Correct.
>
> > + }
>
> Instead of a flag, replacing the 'free' pointer to a 'free' callback
> will allow to keep the free path self-encapsulated in domain.c
>
> genpd->free(genpd->states);
Right, I get your idea and it makes sense. Let me convert to that.
>
> Patch 18/27 can fill this field with its specific free pointer.
Yep!
[...]
Thanks for reviewing!
Kind regards
Uffe
On 29/11/2018 18:46, Ulf Hansson wrote:
> To enable a device belonging to a CPU to be attached to a PM domain managed
> by genpd, let's do a few changes to it, as to make it convenient to manage
> the specifics around CPUs.
>
> To be able to quickly find out what CPUs that are attached to a genpd,
> which typically becomes useful from a genpd governor as following changes
> is about to show, let's add a cpumask to the struct generic_pm_domain. At
> the point when a CPU device gets attached to a genpd, let's update its
> cpumask. Moreover, let's also propagate changes to the cpumask upwards in
> the topology to the master PM domains. In this way, the cpumask for a genpd
> hierarchically reflects all CPUs attached to the topology below it.
>
> Finally, let's make this an opt-in feature, to avoid having to manage CPUs
> and the cpumask for a genpd that doesn't need it. For that reason, let's
> add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN.
>
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Don't allocate the cpumask when not used.
> - Simplify the code that updates the cpumask.
> - Document the GENPD_FLAG_CPU_DOMAIN.
>
> ---
> drivers/base/power/domain.c | 66 ++++++++++++++++++++++++++++++++++++-
> include/linux/pm_domain.h | 13 ++++++++
> 2 files changed, 78 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index e27b91d36a2a..c3ff8e395308 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -20,6 +20,7 @@
> #include <linux/sched.h>
> #include <linux/suspend.h>
> #include <linux/export.h>
> +#include <linux/cpu.h>
>
> #include "power.h"
>
> @@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
> #define genpd_is_irq_safe(genpd) (genpd->flags & GENPD_FLAG_IRQ_SAFE)
> #define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON)
> #define genpd_is_active_wakeup(genpd) (genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP)
> +#define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN)
>
> static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
> const struct generic_pm_domain *genpd)
> @@ -1377,6 +1379,56 @@ static void genpd_free_dev_data(struct device *dev,
> dev_pm_put_subsys_data(dev);
> }
>
> +static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
> + int cpu, bool set, unsigned int depth)
> +{
> + struct gpd_link *link;
> +
> + if (!genpd_is_cpu_domain(genpd))
> + return;
With this test, we won't continue updating the cpumask for the other
masters. Is it done on purpose ?
> + list_for_each_entry(link, &genpd->slave_links, slave_node) {
> + struct generic_pm_domain *master = link->master;
> +
> + genpd_lock_nested(master, depth + 1);
> + __genpd_update_cpumask(master, cpu, set, depth + 1);
> + genpd_unlock(master);
> + }
> +
> + if (set)
> + cpumask_set_cpu(cpu, genpd->cpus);
> + else
> + cpumask_clear_cpu(cpu, genpd->cpus);
> +}
> +
> +static void genpd_update_cpumask(struct generic_pm_domain *genpd,
> + struct device *dev, bool set)
> +{
> + int cpu;
> +
> + if (!genpd_is_cpu_domain(genpd))
> + return;
> +
> + for_each_possible_cpu(cpu) {
> + if (get_cpu_device(cpu) == dev) {
> + __genpd_update_cpumask(genpd, cpu, set, 0);
> + return;
> + }
> + }
> +}
> +
> +static void genpd_set_cpumask(struct generic_pm_domain *genpd,
> + struct device *dev)
> +{
> + genpd_update_cpumask(genpd, dev, true);
> +}
> +
> +static void genpd_clear_cpumask(struct generic_pm_domain *genpd,
> + struct device *dev)
> +{
> + genpd_update_cpumask(genpd, dev, false);
> +}
> +
> static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
> struct gpd_timing_data *td)
> {
> @@ -1398,6 +1450,8 @@ static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev,
> if (ret)
> goto out;
>
> + genpd_set_cpumask(genpd, dev);
> +
> dev_pm_domain_set(dev, &genpd->domain);
>
> genpd->device_count++;
> @@ -1459,6 +1513,7 @@ static int genpd_remove_device(struct generic_pm_domain *genpd,
> if (genpd->detach_dev)
> genpd->detach_dev(genpd, dev);
>
> + genpd_clear_cpumask(genpd, dev);
> dev_pm_domain_set(dev, NULL);
>
> list_del_init(&pdd->list_node);
> @@ -1686,11 +1741,18 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
> if (genpd_is_always_on(genpd) && !genpd_status_on(genpd))
> return -EINVAL;
>
> + if (genpd_is_cpu_domain(genpd) &&
> + !zalloc_cpumask_var(&genpd->cpus, GFP_KERNEL))
> + return -ENOMEM;
> +
> /* Use only one "off" state if there were no states declared */
> if (genpd->state_count == 0) {
> ret = genpd_set_default_power_state(genpd);
> - if (ret)
> + if (ret) {
> + if (genpd_is_cpu_domain(genpd))
> + free_cpumask_var(genpd->cpus);
> return ret;
> + }
> } else if (!gov) {
> pr_warn("%s : no governor for states\n", genpd->name);
> }
> @@ -1736,6 +1798,8 @@ static int genpd_remove(struct generic_pm_domain *genpd)
> list_del(&genpd->gpd_list_node);
> genpd_unlock(genpd);
> cancel_work_sync(&genpd->power_off_work);
> + if (genpd_is_cpu_domain(genpd))
> + free_cpumask_var(genpd->cpus);
> if (genpd->free_state) {
> kfree(genpd->states);
> genpd->states = NULL;
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index f9e09bd4152c..5a4673605d22 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -16,6 +16,7 @@
> #include <linux/of.h>
> #include <linux/notifier.h>
> #include <linux/spinlock.h>
> +#include <linux/cpumask.h>
>
> /*
> * Flags to control the behaviour of a genpd.
> @@ -42,11 +43,22 @@
> * GENPD_FLAG_ACTIVE_WAKEUP: Instructs genpd to keep the PM domain powered
> * on, in case any of its attached devices is used
> * in the wakeup path to serve system wakeups.
> + *
> + * GENPD_FLAG_CPU_DOMAIN: Instructs genpd that it should expect to get
> + * devices attached, which may belong to CPUs or
> + * possibly have subdomains with CPUs attached.
> + * This flag enables the genpd backend driver to
> + * deploy idle power management support for CPUs
> + * and groups of CPUs. Note that, the backend
> + * driver must then comply with the so called,
> + * last-man-standing algorithm, for the CPUs in the
> + * PM domain.
> */
> #define GENPD_FLAG_PM_CLK (1U << 0)
> #define GENPD_FLAG_IRQ_SAFE (1U << 1)
> #define GENPD_FLAG_ALWAYS_ON (1U << 2)
> #define GENPD_FLAG_ACTIVE_WAKEUP (1U << 3)
> +#define GENPD_FLAG_CPU_DOMAIN (1U << 4)
>
> enum gpd_status {
> GPD_STATE_ACTIVE = 0, /* PM domain is active */
> @@ -93,6 +105,7 @@ struct generic_pm_domain {
> unsigned int suspended_count; /* System suspend device counter */
> unsigned int prepared_count; /* Suspend counter of prepared devices */
> unsigned int performance_state; /* Aggregated max performance state */
> + cpumask_var_t cpus; /* A cpumask of the attached CPUs */
> int (*power_off)(struct generic_pm_domain *domain);
> int (*power_on)(struct generic_pm_domain *domain);
> unsigned int (*opp_to_performance_state)(struct generic_pm_domain *genpd,
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wed, 19 Dec 2018 at 10:53, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > To enable a device belonging to a CPU to be attached to a PM domain managed
> > by genpd, let's do a few changes to it, as to make it convenient to manage
> > the specifics around CPUs.
> >
> > To be able to quickly find out what CPUs that are attached to a genpd,
> > which typically becomes useful from a genpd governor as following changes
> > is about to show, let's add a cpumask to the struct generic_pm_domain. At
> > the point when a CPU device gets attached to a genpd, let's update its
> > cpumask. Moreover, let's also propagate changes to the cpumask upwards in
> > the topology to the master PM domains. In this way, the cpumask for a genpd
> > hierarchically reflects all CPUs attached to the topology below it.
> >
> > Finally, let's make this an opt-in feature, to avoid having to manage CPUs
> > and the cpumask for a genpd that doesn't need it. For that reason, let's
> > add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN.
> >
> > Cc: Lina Iyer <[email protected]>
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Don't allocate the cpumask when not used.
> > - Simplify the code that updates the cpumask.
> > - Document the GENPD_FLAG_CPU_DOMAIN.
> >
> > ---
> > drivers/base/power/domain.c | 66 ++++++++++++++++++++++++++++++++++++-
> > include/linux/pm_domain.h | 13 ++++++++
> > 2 files changed, 78 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index e27b91d36a2a..c3ff8e395308 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -20,6 +20,7 @@
> > #include <linux/sched.h>
> > #include <linux/suspend.h>
> > #include <linux/export.h>
> > +#include <linux/cpu.h>
> >
> > #include "power.h"
> >
> > @@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
> > #define genpd_is_irq_safe(genpd) (genpd->flags & GENPD_FLAG_IRQ_SAFE)
> > #define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON)
> > #define genpd_is_active_wakeup(genpd) (genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP)
> > +#define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN)
> >
> > static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
> > const struct generic_pm_domain *genpd)
> > @@ -1377,6 +1379,56 @@ static void genpd_free_dev_data(struct device *dev,
> > dev_pm_put_subsys_data(dev);
> > }
> >
> > +static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
> > + int cpu, bool set, unsigned int depth)
> > +{
> > + struct gpd_link *link;
> > +
> > + if (!genpd_is_cpu_domain(genpd))
> > + return;
>
> With this test, we won't continue updating the cpumask for the other
> masters. Is it done on purpose ?
Correct, and yes it's on purpose.
We are not even allocating the cpumask for the genpd in question,
unless it has the GENPD_FLAG_CPU_DOMAIN set.
[...]
Kind regards
Uffe
On Wed, 19 Dec 2018 at 10:54, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > As it's now perfectly possible that a PM domain managed by genpd contains
> > devices belonging to CPUs, we should start to take into account the
> > residency values for the idle states during the state selection process.
> > The residency value specifies the minimum duration of time, the CPU or a
> > group of CPUs, needs to spend in an idle state to not waste energy entering
> > it.
> >
> > To deal with this, let's add a new genpd governor, pm_domain_cpu_gov, that
> > may be used for a PM domain that have CPU devices attached or if the CPUs
> > are attached through subdomains.
> >
> > The new governor computes the minimum expected idle duration time for the
> > online CPUs being attached to the PM domain and its subdomains. Then in the
> > state selection process, trying the deepest state first, it verifies that
> > the idle duration time satisfies the state's residency value.
> >
> > It should be noted that, when computing the minimum expected idle duration
> > time, we use the information from tick_nohz_get_next_wakeup(), to find the
> > next wakeup for the related CPUs. Future wise, this may deserve to be
> > improved, as there are more reasons to why a CPU may be woken up from idle.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Daniel Lezcano <[email protected]>
> > Cc: Lina Iyer <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Fold in patch that extended the new genpd CPU governor to cope with
> > QoS constraints, as to avoid confusion.
> > - Simplified the code according to suggestions from Rafael.
> >
> > ---
> > drivers/base/power/domain_governor.c | 61 +++++++++++++++++++++++++++-
> > include/linux/pm_domain.h | 3 ++
> > 2 files changed, 63 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
> > index 99896fbf18e4..61a7c3c03c98 100644
> > --- a/drivers/base/power/domain_governor.c
> > +++ b/drivers/base/power/domain_governor.c
> > @@ -10,6 +10,9 @@
> > #include <linux/pm_domain.h>
> > #include <linux/pm_qos.h>
> > #include <linux/hrtimer.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/ktime.h>
> > +#include <linux/tick.h>
> >
> > static int dev_update_qos_constraint(struct device *dev, void *data)
> > {
> > @@ -211,8 +214,10 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
> > struct generic_pm_domain *genpd = pd_to_genpd(pd);
> > struct gpd_link *link;
> >
> > - if (!genpd->max_off_time_changed)
> > + if (!genpd->max_off_time_changed) {
> > + genpd->state_idx = genpd->cached_power_down_state_idx;
> > return genpd->cached_power_down_ok;
> > + }
> >
> > /*
> > * We have to invalidate the cached results for the masters, so
> > @@ -237,6 +242,7 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
> > genpd->state_idx--;
> > }
> >
> > + genpd->cached_power_down_state_idx = genpd->state_idx;
> > return genpd->cached_power_down_ok;
> > }
> >
> > @@ -245,6 +251,54 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
> > return false;
> > }
> >
> > +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > +{
> > + struct generic_pm_domain *genpd = pd_to_genpd(pd);
> > + ktime_t domain_wakeup, cpu_wakeup;
> > + s64 idle_duration_ns;
> > + int cpu, i;
> > +
> > + /* Validate dev PM QoS constraints. */
> > + if (!default_power_down_ok(pd))
> > + return false;
> > +
> > + if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
> > + return true;
>
> Is it possible to have this function called without the
> GENPD_FLAG_CPU_DOMAIN flag set in the genpd?
Theoretically yes, however in practice, probably not.
Do note, if the GENPD_FLAG_CPU_DOMAIN isn't set, then we haven't
allocated the cpumask for the genpd, so then we shouldn't use it.
>
> > + /*
> > + * Find the next wakeup for any of the online CPUs within the PM domain
> > + * and its subdomains. Note, we only need the genpd->cpus, as it already
> > + * contains a mask of all CPUs from subdomains.
> > + */
> > + domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
> > + for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
> > + cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
> > + if (ktime_before(cpu_wakeup, domain_wakeup))
> > + domain_wakeup = cpu_wakeup;
> > + }
> > +
[...]
Kind regards
Uffe
On 29/11/2018 18:46, Ulf Hansson wrote:
> As it's now perfectly possible that a PM domain managed by genpd contains
> devices belonging to CPUs, we should start to take into account the
> residency values for the idle states during the state selection process.
> The residency value specifies the minimum duration of time, the CPU or a
> group of CPUs, needs to spend in an idle state to not waste energy entering
> it.
>
> To deal with this, let's add a new genpd governor, pm_domain_cpu_gov, that
> may be used for a PM domain that have CPU devices attached or if the CPUs
> are attached through subdomains.
>
> The new governor computes the minimum expected idle duration time for the
> online CPUs being attached to the PM domain and its subdomains. Then in the
> state selection process, trying the deepest state first, it verifies that
> the idle duration time satisfies the state's residency value.
>
> It should be noted that, when computing the minimum expected idle duration
> time, we use the information from tick_nohz_get_next_wakeup(), to find the
> next wakeup for the related CPUs. Future wise, this may deserve to be
> improved, as there are more reasons to why a CPU may be woken up from idle.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Daniel Lezcano <[email protected]>
> Cc: Lina Iyer <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Fold in patch that extended the new genpd CPU governor to cope with
> QoS constraints, as to avoid confusion.
> - Simplified the code according to suggestions from Rafael.
>
> ---
> drivers/base/power/domain_governor.c | 61 +++++++++++++++++++++++++++-
> include/linux/pm_domain.h | 3 ++
> 2 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
> index 99896fbf18e4..61a7c3c03c98 100644
> --- a/drivers/base/power/domain_governor.c
> +++ b/drivers/base/power/domain_governor.c
> @@ -10,6 +10,9 @@
> #include <linux/pm_domain.h>
> #include <linux/pm_qos.h>
> #include <linux/hrtimer.h>
> +#include <linux/cpumask.h>
> +#include <linux/ktime.h>
> +#include <linux/tick.h>
>
> static int dev_update_qos_constraint(struct device *dev, void *data)
> {
> @@ -211,8 +214,10 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
> struct generic_pm_domain *genpd = pd_to_genpd(pd);
> struct gpd_link *link;
>
> - if (!genpd->max_off_time_changed)
> + if (!genpd->max_off_time_changed) {
> + genpd->state_idx = genpd->cached_power_down_state_idx;
> return genpd->cached_power_down_ok;
> + }
>
> /*
> * We have to invalidate the cached results for the masters, so
> @@ -237,6 +242,7 @@ static bool default_power_down_ok(struct dev_pm_domain *pd)
> genpd->state_idx--;
> }
>
> + genpd->cached_power_down_state_idx = genpd->state_idx;
> return genpd->cached_power_down_ok;
> }
>
> @@ -245,6 +251,54 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
> return false;
> }
>
> +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> +{
> + struct generic_pm_domain *genpd = pd_to_genpd(pd);
> + ktime_t domain_wakeup, cpu_wakeup;
> + s64 idle_duration_ns;
> + int cpu, i;
> +
> + /* Validate dev PM QoS constraints. */
> + if (!default_power_down_ok(pd))
> + return false;
> +
> + if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
> + return true;
Is it possible to have this function called without the
GENPD_FLAG_CPU_DOMAIN flag set in the genpd?
> + /*
> + * Find the next wakeup for any of the online CPUs within the PM domain
> + * and its subdomains. Note, we only need the genpd->cpus, as it already
> + * contains a mask of all CPUs from subdomains.
> + */
> + domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
> + for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
> + cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
> + if (ktime_before(cpu_wakeup, domain_wakeup))
> + domain_wakeup = cpu_wakeup;
> + }
> +
> + /* The minimum idle duration is from now - until the next wakeup. */
> + idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, ktime_get()));
> + if (idle_duration_ns <= 0)
> + return false;
> +
> + /*
> + * Find the deepest idle state that has its residency value satisfied
> + * and by also taking into account the power off latency for the state.
> + * Start at the state picked by the dev PM QoS constraint validation.
> + */
> + i = genpd->state_idx;
> + do {
> + if (idle_duration_ns >= (genpd->states[i].residency_ns +
> + genpd->states[i].power_off_latency_ns)) {
> + genpd->state_idx = i;
> + return true;
> + }
> + } while (--i >= 0);
> +
> + return false;
> +}
> +
> struct dev_power_governor simple_qos_governor = {
> .suspend_ok = default_suspend_ok,
> .power_down_ok = default_power_down_ok,
> @@ -257,3 +311,8 @@ struct dev_power_governor pm_domain_always_on_gov = {
> .power_down_ok = always_on_power_down_ok,
> .suspend_ok = default_suspend_ok,
> };
> +
> +struct dev_power_governor pm_domain_cpu_gov = {
> + .suspend_ok = default_suspend_ok,
> + .power_down_ok = cpu_power_down_ok,
> +};
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index 5a4673605d22..969a9b36c0db 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -116,6 +116,7 @@ struct generic_pm_domain {
> s64 max_off_time_ns; /* Maximum allowed "suspended" time. */
> bool max_off_time_changed;
> bool cached_power_down_ok;
> + bool cached_power_down_state_idx;
> int (*attach_dev)(struct generic_pm_domain *domain,
> struct device *dev);
> void (*detach_dev)(struct generic_pm_domain *domain,
> @@ -195,6 +196,7 @@ int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);
>
> extern struct dev_power_governor simple_qos_governor;
> extern struct dev_power_governor pm_domain_always_on_gov;
> +extern struct dev_power_governor pm_domain_cpu_gov;
> #else
>
> static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
> @@ -238,6 +240,7 @@ static inline int dev_pm_genpd_set_performance_state(struct device *dev,
>
> #define simple_qos_governor (*(struct dev_power_governor *)(NULL))
> #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL))
> +#define pm_domain_cpu_gov (*(struct dev_power_governor *)(NULL))
> #endif
>
> #ifdef CONFIG_PM_GENERIC_DOMAINS_SLEEP
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> The CPU's idle state nodes are currently parsed at the common cpuidle DT
> library, but also when initializing back-end data for the arch specific CPU
> operations, as in the PSCI driver case.
>
> To avoid open-coding, let's introduce of_get_cpu_state_node(), which takes
> the device node for the CPU and the index to the requested idle state node,
> as in-parameters. In case a corresponding idle state node is found, it
> returns the node with the refcount incremented for it, else it returns
> NULL.
>
> Moreover, for ARM, there are two generic methods, to describe the CPU's
> idle states, either via the flattened description through the
> "cpu-idle-states" binding [1] or via the hierarchical layout, using the
> "power-domains" and the "domain-idle-states" bindings [2]. Hence, let's
> take both options into account.
>
> [1]
> Documentation/devicetree/bindings/arm/idle-states.txt
> [2]
> Documentation/devicetree/bindings/arm/psci.txt
>
> Cc: Rob Herring <[email protected]>
> Cc: [email protected]
> Cc: Lina Iyer <[email protected]>
> Suggested-by: Sudeep Holla <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
Nitpicking some kerneldoc formatting below.
Other than that:
Reviewed-by: Daniel Lezcano <[email protected]>
> ---
>
> Changes in v10:
> - None.
>
> ---
> drivers/of/base.c | 35 +++++++++++++++++++++++++++++++++++
> include/linux/of.h | 8 ++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 09692c9b32a7..8f6974a22006 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -429,6 +429,41 @@ int of_cpu_node_to_id(struct device_node *cpu_node)
> }
> EXPORT_SYMBOL(of_cpu_node_to_id);
>
> +/**
> + * of_get_cpu_state_node - Get CPU's idle state node at the given index
> + *
> + * @cpu_node: The device node for the CPU
> + * @index: The index in the list of the idle states
> + *
> + * Two generic methods can be used to describe a CPU's idle states, either via
> + * a flattened description through the "cpu-idle-states" binding or via the
> + * hierarchical layout, using the "power-domains" and the "domain-idle-states"
> + * bindings. This function check for both and returns the idle state node for
> + * the requested index.
> + *
> + * In case and idle state node is found at index, the refcount incremented for
s/and/an/
s/index/@index/
> + * it, so call of_node_put() on it when done. Returns NULL if not found.
The Return description must be in a separate section.
> + */
> +struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
> + int index)
> +{
> + struct of_phandle_args args;
> + int err;
> +
> + err = of_parse_phandle_with_args(cpu_node, "power-domains",
> + "#power-domain-cells", 0, &args);
> + if (!err) {
> + struct device_node *state_node =
> + of_parse_phandle(args.np, "domain-idle-states", index);
> +
> + of_node_put(args.np);
> + return state_node;
> + }
> +
> + return of_parse_phandle(cpu_node, "cpu-idle-states", index);
> +}
> +EXPORT_SYMBOL(of_get_cpu_state_node);
> +
> /**
> * __of_device_is_compatible() - Check if the node matches given constraints
> * @device: pointer to node
> diff --git a/include/linux/of.h b/include/linux/of.h
> index a5aee3c438ad..f9f0c65c095c 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -348,6 +348,8 @@ extern const void *of_get_property(const struct device_node *node,
> int *lenp);
> extern struct device_node *of_get_cpu_node(int cpu, unsigned int *thread);
> extern struct device_node *of_get_next_cpu_node(struct device_node *prev);
> +extern struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
> + int index);
>
> #define for_each_property_of_node(dn, pp) \
> for (pp = dn->properties; pp != NULL; pp = pp->next)
> @@ -762,6 +764,12 @@ static inline struct device_node *of_get_next_cpu_node(struct device_node *prev)
> return NULL;
> }
>
> +static inline struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
> + int index)
> +{
> + return NULL;
> +}
> +
> static inline int of_n_addr_cells(struct device_node *np)
> {
> return 0;
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Currently CPU's idle states are represented in a flattened model, via the
> "cpu-idle-states" binding from within the CPU's device nodes.
>
> Support the hierarchical layout during parsing and validating of the CPU's
> idle states. This is simply done by calling the new OF helper,
> of_get_cpu_state_node().
>
> Cc: Lina Iyer <[email protected]>
> Suggested-by: Sudeep Holla <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
Reviewed-by: Daniel Lezcano <[email protected]>
> ---
>
> Changes in v10:
> - None.
>
> ---
> drivers/cpuidle/dt_idle_states.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/dt_idle_states.c b/drivers/cpuidle/dt_idle_states.c
> index 53342b7f1010..13f9b7cd32d1 100644
> --- a/drivers/cpuidle/dt_idle_states.c
> +++ b/drivers/cpuidle/dt_idle_states.c
> @@ -118,8 +118,7 @@ static bool idle_state_valid(struct device_node *state_node, unsigned int idx,
> for (cpu = cpumask_next(cpumask_first(cpumask), cpumask);
> cpu < nr_cpu_ids; cpu = cpumask_next(cpu, cpumask)) {
> cpu_node = of_cpu_device_node_get(cpu);
> - curr_state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
> - idx);
> + curr_state_node = of_get_cpu_state_node(cpu_node, idx);
> if (state_node != curr_state_node)
> valid = false;
>
> @@ -176,7 +175,7 @@ int dt_init_idle_driver(struct cpuidle_driver *drv,
> cpu_node = of_cpu_device_node_get(cpumask_first(cpumask));
>
> for (i = 0; ; i++) {
> - state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
> + state_node = of_get_cpu_state_node(cpu_node, i);
> if (!state_node)
> break;
>
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wed, 19 Dec 2018 at 12:17, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Thu, Nov 29, 2018 at 06:46:57PM +0100, Ulf Hansson wrote:
> > When the hierarchical CPU topology is used and when a CPU has been put
> > offline (hotplug), that same CPU prevents its PM domain and thus also
> > potential master PM domains, from being powered off. This is because genpd
> > observes the CPU's struct device to remain being active from a runtime PM
> > point of view.
> >
> > To deal with this, let's decrease the runtime PM usage count by calling
> > pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
> > offline. Consequentially, we must then increase the runtime PM usage for
> > the CPU, while putting it online again.
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Make it work when the hierarchical CPU topology is used, which may be
> > used both for OSI and PC mode.
> > - Rework the code to prevent "BUG: sleeping function called from
> > invalid context".
> > ---
> > drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index b03bccce0a5d..f62c4963eb62 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -15,6 +15,7 @@
> >
> > #include <linux/acpi.h>
> > #include <linux/arm-smccc.h>
> > +#include <linux/cpu.h>
> > #include <linux/cpuidle.h>
> > #include <linux/errno.h>
> > #include <linux/linkage.h>
> > @@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
> >
> > static int psci_cpu_off(u32 state)
> > {
> > + struct device *dev;
> > int err;
> > u32 fn;
> >
> > + /*
> > + * When the hierarchical CPU topology is used, decrease the runtime PM
> > + * usage count for the current CPU, as to allow other parts in the
> > + * topology to enter low power states.
> > + */
> > + if (psci_dt_topology) {
> > + dev = get_cpu_device(smp_processor_id());
> > + pm_runtime_put_sync_suspend(dev);
> > + }
> > +
> > fn = psci_function_id[PSCI_FN_CPU_OFF];
> > err = invoke_psci_fn(fn, state, 0, 0);
> > return psci_to_linux_errno(err);
> > @@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
> >
> > static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > {
> > + struct device *dev;
> > int err;
> > u32 fn;
> >
> > @@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> > /* Clear the domain state to start fresh. */
> > psci_set_domain_state(0);
> > +
> > + /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
> > + if (!err && psci_dt_topology) {
> > + dev = get_cpu_device(cpuid);
>
> I do not like adding this code in the cpu_{on/off} method themselves, I will
> have a look at the patchset as whole to see how we can restructure it.
Any suggestions are welcome, of course. This was the best and most
simple option I could come up with.
Another option, could be to simply to remove the runtime PM deployment
from psci_cpu_off|on() altogether and just live with that limitation
for now. That works for me as well.
>
> More to the point, using cpuid as a logical cpu id is wrong, it is a
> physical id that you should convert to a logical id through
> get_logical_index().
Oh, didn't know that, thanks for pointing that out!
>
> Lorenzo
>
> > + pm_runtime_get_sync(dev);
> > + }
> > +
> > return psci_to_linux_errno(err);
> > }
> >
> > --
> > 2.17.1
> >
Kind regards
Uffe
On 29/11/2018 18:46, Ulf Hansson wrote:
> Instead of having each psci init function taking care of the of_node_put(),
> let's deal with that from psci_dt_init(), as this enables a bit simpler
> error path for each psci init function.
>
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> Acked-by: Mark Rutland <[email protected]>
Reviewed-by: Daniel Lezcano <[email protected]>
> ---
>
> Changes in v10:
> - None.
>
> ---
> drivers/firmware/psci/psci.c | 23 ++++++++++-------------
> 1 file changed, 10 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 631e20720a22..6bfa47cbd174 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -609,9 +609,9 @@ static int __init psci_0_2_init(struct device_node *np)
> int err;
>
> err = get_set_conduit_method(np);
> -
> if (err)
> - goto out_put_node;
> + return err;
> +
> /*
> * Starting with v0.2, the PSCI specification introduced a call
> * (PSCI_VERSION) that allows probing the firmware version, so
> @@ -619,11 +619,7 @@ static int __init psci_0_2_init(struct device_node *np)
> * can be carried out according to the specific version reported
> * by firmware
> */
> - err = psci_probe();
> -
> -out_put_node:
> - of_node_put(np);
> - return err;
> + return psci_probe();
> }
>
> /*
> @@ -635,9 +631,8 @@ static int __init psci_0_1_init(struct device_node *np)
> int err;
>
> err = get_set_conduit_method(np);
> -
> if (err)
> - goto out_put_node;
> + return err;
>
> pr_info("Using PSCI v0.1 Function IDs from DT\n");
>
> @@ -661,9 +656,7 @@ static int __init psci_0_1_init(struct device_node *np)
> psci_ops.migrate = psci_migrate;
> }
>
> -out_put_node:
> - of_node_put(np);
> - return err;
> + return 0;
> }
>
> static const struct of_device_id psci_of_match[] __initconst = {
> @@ -678,6 +671,7 @@ int __init psci_dt_init(void)
> struct device_node *np;
> const struct of_device_id *matched_np;
> psci_initcall_t init_fn;
> + int ret;
>
> np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np);
>
> @@ -685,7 +679,10 @@ int __init psci_dt_init(void)
> return -ENODEV;
>
> init_fn = (psci_initcall_t)matched_np->data;
> - return init_fn(np);
> + ret = init_fn(np);
> +
> + of_node_put(np);
> + return ret;
> }
>
> #ifdef CONFIG_ACPI
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wed, 19 Dec 2018 at 13:11, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > From: Lina Iyer <[email protected]>
> >
> > Currently CPU's idle states are represented in a flattened model, via the
> > "cpu-idle-states" binding from within the CPU's device nodes.
> >
> > Support the hierarchical layout, simply by converting to calling the new OF
> > helper, of_get_cpu_state_node().
> >
> > Cc: Lina Iyer <[email protected]>
> > Suggested-by: Sudeep Holla <[email protected]>
> > Signed-off-by: Lina Iyer <[email protected]>
> > Co-developed-by: Ulf Hansson <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
>
> Fold it with 07/27 ?
I can do that. However, normally we try to keep changes that touches
different subsystems, separate from each other. Of course sometimes
it's not possible and sometimes it just doesn't make sense to separate
changes.
Perhaps the PSCI maintainers and Rafael can give their opinion.
Kind regards
Uffe
>
> >
> > Changes in v10:
> > - None.
> >
> > ---
> > drivers/firmware/psci/psci.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index cbfc936d251c..631e20720a22 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -300,7 +300,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> > return -ENOMEM;
> >
> > for (i = 0; i < num_state_nodes; i++) {
> > - state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
> > + state_node = of_get_cpu_state_node(cpu_node, i);
> > if (!state_node)
> > break;
> >
> >
>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
On Thu, Nov 29, 2018 at 06:46:57PM +0100, Ulf Hansson wrote:
> When the hierarchical CPU topology is used and when a CPU has been put
> offline (hotplug), that same CPU prevents its PM domain and thus also
> potential master PM domains, from being powered off. This is because genpd
> observes the CPU's struct device to remain being active from a runtime PM
> point of view.
>
> To deal with this, let's decrease the runtime PM usage count by calling
> pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
> offline. Consequentially, we must then increase the runtime PM usage for
> the CPU, while putting it online again.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Make it work when the hierarchical CPU topology is used, which may be
> used both for OSI and PC mode.
> - Rework the code to prevent "BUG: sleeping function called from
> invalid context".
> ---
> drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index b03bccce0a5d..f62c4963eb62 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -15,6 +15,7 @@
>
> #include <linux/acpi.h>
> #include <linux/arm-smccc.h>
> +#include <linux/cpu.h>
> #include <linux/cpuidle.h>
> #include <linux/errno.h>
> #include <linux/linkage.h>
> @@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>
> static int psci_cpu_off(u32 state)
> {
> + struct device *dev;
> int err;
> u32 fn;
>
> + /*
> + * When the hierarchical CPU topology is used, decrease the runtime PM
> + * usage count for the current CPU, as to allow other parts in the
> + * topology to enter low power states.
> + */
> + if (psci_dt_topology) {
> + dev = get_cpu_device(smp_processor_id());
> + pm_runtime_put_sync_suspend(dev);
> + }
> +
> fn = psci_function_id[PSCI_FN_CPU_OFF];
> err = invoke_psci_fn(fn, state, 0, 0);
> return psci_to_linux_errno(err);
> @@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
>
> static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> {
> + struct device *dev;
> int err;
> u32 fn;
>
> @@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> /* Clear the domain state to start fresh. */
> psci_set_domain_state(0);
> +
> + /* Increase runtime PM usage count if the hierarchical CPU toplogy. */
> + if (!err && psci_dt_topology) {
> + dev = get_cpu_device(cpuid);
I do not like adding this code in the cpu_{on/off} method themselves, I will
have a look at the patchset as whole to see how we can restructure it.
More to the point, using cpuid as a logical cpu id is wrong, it is a
physical id that you should convert to a logical id through
get_logical_index().
Lorenzo
> + pm_runtime_get_sync(dev);
> + }
> +
> return psci_to_linux_errno(err);
> }
>
> --
> 2.17.1
>
On 29/11/2018 18:46, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Currently CPU's idle states are represented in a flattened model, via the
> "cpu-idle-states" binding from within the CPU's device nodes.
>
> Support the hierarchical layout, simply by converting to calling the new OF
> helper, of_get_cpu_state_node().
>
> Cc: Lina Iyer <[email protected]>
> Suggested-by: Sudeep Holla <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
Fold it with 07/27 ?
>
> Changes in v10:
> - None.
>
> ---
> drivers/firmware/psci/psci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index cbfc936d251c..631e20720a22 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -300,7 +300,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> return -ENOMEM;
>
> for (i = 0; i < num_state_nodes; i++) {
> - state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
> + state_node = of_get_cpu_state_node(cpu_node, i);
> if (!state_node)
> break;
>
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> PSCI firmware v1.0+, supports two different modes for CPU_SUSPEND. The
> Platform Coordinated mode, which is the default and mandatory mode, while
> support for the OS initiated mode is optional.
>
> This change introduces initial support for the OS initiated mode, in way
> that it adds the related PSCI bits from the spec and prints a message in
> the log to inform whether the mode is supported by the PSCI FW.
>
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
Reviewed-by: Daniel Lezcano <[email protected]>
> ---
>
> Changes in v10:
> - None.
>
> ---
> drivers/firmware/psci/psci.c | 21 ++++++++++++++++++++-
> include/uapi/linux/psci.h | 5 +++++
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 6bfa47cbd174..4f0cbc95e41b 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -95,6 +95,11 @@ static inline bool psci_has_ext_power_state(void)
> PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
> }
>
> +static inline bool psci_has_osi_support(void)
> +{
> + return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
> +}
> +
> static inline bool psci_power_state_loses_context(u32 state)
> {
> const u32 mask = psci_has_ext_power_state() ?
> @@ -659,10 +664,24 @@ static int __init psci_0_1_init(struct device_node *np)
> return 0;
> }
>
> +static int __init psci_1_0_init(struct device_node *np)
> +{
> + int err;
> +
> + err = psci_0_2_init(np);
> + if (err)
> + return err;
> +
> + if (psci_has_osi_support())
> + pr_info("OSI mode supported.\n");
> +
> + return 0;
> +}
> +
> static const struct of_device_id psci_of_match[] __initconst = {
> { .compatible = "arm,psci", .data = psci_0_1_init},
> { .compatible = "arm,psci-0.2", .data = psci_0_2_init},
> - { .compatible = "arm,psci-1.0", .data = psci_0_2_init},
> + { .compatible = "arm,psci-1.0", .data = psci_1_0_init},
> {},
> };
>
> diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
> index b3bcabe380da..581f72085c33 100644
> --- a/include/uapi/linux/psci.h
> +++ b/include/uapi/linux/psci.h
> @@ -49,6 +49,7 @@
>
> #define PSCI_1_0_FN_PSCI_FEATURES PSCI_0_2_FN(10)
> #define PSCI_1_0_FN_SYSTEM_SUSPEND PSCI_0_2_FN(14)
> +#define PSCI_1_0_FN_SET_SUSPEND_MODE PSCI_0_2_FN(15)
>
> #define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
>
> @@ -97,6 +98,10 @@
> #define PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK \
> (0x1 << PSCI_1_0_FEATURES_CPU_SUSPEND_PF_SHIFT)
>
> +#define PSCI_1_0_OS_INITIATED BIT(0)
> +#define PSCI_1_0_SUSPEND_MODE_PC 0
> +#define PSCI_1_0_SUSPEND_MODE_OSI 1
> +
> /* PSCI return values (inclusive of all PSCI versions) */
> #define PSCI_RET_SUCCESS 0
> #define PSCI_RET_NOT_SUPPORTED -1
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Thu, 20 Dec 2018 at 15:19, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > Following changes are about to implement support for PM domains to PSCI.
> > Those changes are mainly going to be implemented in a new separate file,
> > hence a couple of the internal PSCI functions needs to be shared to be
> > accessible. So, let's do that via adding new PSCI header file.
> >
> > Moreover, the changes deploying support for PM domains, needs to be able to
> > switch the PSCI FW into the OS initiated mode. For that reason, let's add a
> > new function that deals with this and share it via the new PSCI header
> > file.
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - New patch. Re-places the earlier patch: "drivers: firmware: psci:
> > Share a few internal PSCI functions".
> >
> > ---
> > drivers/firmware/psci/psci.c | 28 +++++++++++++++++++++-------
> > drivers/firmware/psci/psci.h | 14 ++++++++++++++
> > 2 files changed, 35 insertions(+), 7 deletions(-)
> > create mode 100644 drivers/firmware/psci/psci.h
> >
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index 8dbcdecc2ae4..623591b541a4 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -34,6 +34,8 @@
> > #include <asm/smp_plat.h>
> > #include <asm/suspend.h>
> >
> > +#include "psci.h"
> > +
> > /*
> > * While a 64-bit OS can make calls with SMC32 calling conventions, for some
> > * calls it is necessary to use SMC64 to pass or return 64-bit values.
> > @@ -90,23 +92,35 @@ static u32 psci_function_id[PSCI_FN_MAX];
> > static DEFINE_PER_CPU(u32, domain_state);
> > static u32 psci_cpu_suspend_feature;
> >
> > -static inline u32 psci_get_domain_state(void)
> > +u32 psci_get_domain_state(void)
> > {
> > return __this_cpu_read(domain_state);
> > }
> >
> > -static inline void psci_set_domain_state(u32 state)
> > +void psci_set_domain_state(u32 state)
> > {
> > __this_cpu_write(domain_state, state);
> > }
> >
> > +bool psci_set_osi_mode(void)
> > +{
> > + int ret;
> > +
> > + ret = invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> > + PSCI_1_0_SUSPEND_MODE_OSI, 0, 0);
> > + if (ret)
> > + pr_warn("failed to enable OSI mode: %d\n", ret);
> > +
> > + return !ret;
> > +}
>
> Please keep the convention with the error code (0 => success)
>
> In the next patch it can be called:
>
> if (psci_has_osi_support())
> osi_mode_enabled = psci_set_osi_mode() ? false : true;
>
Sure!
> > +
> > static inline bool psci_has_ext_power_state(void)
> > {
> > return psci_cpu_suspend_feature &
> > PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
> > }
> >
> > -static inline bool psci_has_osi_support(void)
> > +bool psci_has_osi_support(void)
> > {
> > return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
> > }
> > @@ -285,10 +299,7 @@ static int __init psci_features(u32 psci_func_id)
> > psci_func_id, 0, 0);
> > }
> >
> > -#ifdef CONFIG_CPU_IDLE
> > -static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
> > -
> > -static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> > +int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> > {
> > int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
> >
> > @@ -305,6 +316,9 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_CPU_IDLE
>
> It would be nicer if you can remove the CONFIG_CPU_IDLE by replacing it
> with a specific one (eg. CONFIG_PSCI_IDLE) and make it depend on
> CONFIG_CPU_IDLE, so the config options stay contained in their
> respective subsystems directory.
I am all for simplifying the Kconfig options in here, as indeed it's
rather messy. However, I would rather avoid folding in additional
cleanup changes to this series, is already extensive enough.
Would you be okay if we deal with that on top?
[...]
Kind regards
Uffe
On 20/12/2018 16:41, Ulf Hansson wrote:
> On Thu, 20 Dec 2018 at 15:09, Daniel Lezcano <[email protected]> wrote:
>>
>> On 29/11/2018 18:46, Ulf Hansson wrote:
>>> To enable the OS initiated mode, the CPU topology needs to be described
>>> using the hierarchical model in DT. When used, the idle state bits for the
>>> CPU are created by ORing the bits for PM domain's idle state.
>>>
>>> Let's prepare the PSCI driver to deal with this, via introducing a per CPU
>>> variable called domain_state and by adding internal helpers to read/write
>>> the value of the variable.
>>
>> What are the domain states ? What values can they have ?
>
> The existing psci_power_state, also defined as a per cpu variable,
> contains fixed values reflecting the corresponding
> arm,psci-suspend-param for the idle state in question.
>
> This isn't sufficient, when using the hierarchical CPU topology in DT
> and when OSI mode is supported, because of the way we vote with the
> PSCI CPU suspend parameter. Parts of this parameter shall inform about
> what state to allow for the cluster, while other parts tells the state
> for the CPU.
>
> The new "domain states" per CPU variable, gets dynamically changed
> when actively used by following patches that implements the PSCI PM
> domain support. Depending on what state the PM domain picks, the genpd
> ->power_off() callback sets a new "domain states" value, reflecting
> the state for the cluster.
>
> Does it makes sense? If you like, I can try to update the changelog to
> clarify this?
Yes, it makes sense. May be give a pointer or an information about the
parameter encoding in addition to the description above can help.
>>> Cc: Lina Iyer <[email protected]>
>>> Co-developed-by: Lina Iyer <[email protected]>
>>> Signed-off-by: Ulf Hansson <[email protected]>
>>> ---
>>>
>>> Changes in v10:
>>> - Use __this_cpu_read|write() rather than this_cpu_read|write().
>>>
>>> ---
>>> drivers/firmware/psci/psci.c | 26 ++++++++++++++++++++++----
>>> 1 file changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>>> index 4f0cbc95e41b..8dbcdecc2ae4 100644
>>> --- a/drivers/firmware/psci/psci.c
>>> +++ b/drivers/firmware/psci/psci.c
>>> @@ -87,8 +87,19 @@ static u32 psci_function_id[PSCI_FN_MAX];
>>> (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
>>> PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
>>>
>>> +static DEFINE_PER_CPU(u32, domain_state);
>>> static u32 psci_cpu_suspend_feature;
>>>
>>> +static inline u32 psci_get_domain_state(void)
>>> +{
>>> + return __this_cpu_read(domain_state);
>>> +}
>>> +
>>> +static inline void psci_set_domain_state(u32 state)
>>> +{
>>> + __this_cpu_write(domain_state, state);
>>> +}
>>> +
>>> static inline bool psci_has_ext_power_state(void)
>>> {
>>> return psci_cpu_suspend_feature &
>>> @@ -187,6 +198,8 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
>>>
>>> fn = psci_function_id[PSCI_FN_CPU_ON];
>>> err = invoke_psci_fn(fn, cpuid, entry_point, 0);
>>> + /* Clear the domain state to start fresh. */
>>> + psci_set_domain_state(0);
>>> return psci_to_linux_errno(err);
>>
>> I think this change is ambiguous:
>>
>> - if it is a change of the state because of the cpu_on, then I was
>> expecting a similar change in cpu_off and the change only if
>> invoke_psci_fn() succeeds.
>
> You are right. This rather belongs to patch 24, as its intent is to
> deal with CPU hotplug.
>
>>
>> - if it is a change to take opportunity of the code path to initialize
>> the domain state, I suggest to remove it from there and make it very
>> explicit with static DEFINE_PER_CPU(u32, domain_state) = { 0 };
>
> We shouldn't need to explicitly set static variables to zero, as that
> should be managed by the compiler.
Yeah, that was the purpose of the *very explicit* words, that is tell
the reader, the initialization relies on the static variables being set
to zero.
> Let me simply remove the call to psci_set_domain_state(0) and instead
> consider it for patch 24.
Yes, sure.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> To enable the OS initiated mode, the CPU topology needs to be described
> using the hierarchical model in DT. When used, the idle state bits for the
> CPU are created by ORing the bits for PM domain's idle state.
>
> Let's prepare the PSCI driver to deal with this, via introducing a per CPU
> variable called domain_state and by adding internal helpers to read/write
> the value of the variable.
What are the domain states ? What values can they have ?
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Use __this_cpu_read|write() rather than this_cpu_read|write().
>
> ---
> drivers/firmware/psci/psci.c | 26 ++++++++++++++++++++++----
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 4f0cbc95e41b..8dbcdecc2ae4 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -87,8 +87,19 @@ static u32 psci_function_id[PSCI_FN_MAX];
> (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
> PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
>
> +static DEFINE_PER_CPU(u32, domain_state);
> static u32 psci_cpu_suspend_feature;
>
> +static inline u32 psci_get_domain_state(void)
> +{
> + return __this_cpu_read(domain_state);
> +}
> +
> +static inline void psci_set_domain_state(u32 state)
> +{
> + __this_cpu_write(domain_state, state);
> +}
> +
> static inline bool psci_has_ext_power_state(void)
> {
> return psci_cpu_suspend_feature &
> @@ -187,6 +198,8 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
>
> fn = psci_function_id[PSCI_FN_CPU_ON];
> err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> + /* Clear the domain state to start fresh. */
> + psci_set_domain_state(0);
> return psci_to_linux_errno(err);
I think this change is ambiguous:
- if it is a change of the state because of the cpu_on, then I was
expecting a similar change in cpu_off and the change only if
invoke_psci_fn() succeeds.
- if it is a change to take opportunity of the code path to initialize
the domain state, I suggest to remove it from there and make it very
explicit with static DEFINE_PER_CPU(u32, domain_state) = { 0 };
> }
>
> @@ -409,15 +422,17 @@ int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
> static int psci_suspend_finisher(unsigned long index)
> {
> u32 *state = __this_cpu_read(psci_power_state);
> + u32 composite_state = state[index - 1] | psci_get_domain_state();
>
> - return psci_ops.cpu_suspend(state[index - 1],
> - __pa_symbol(cpu_resume));
> + return psci_ops.cpu_suspend(composite_state, __pa_symbol(cpu_resume));
> }
>
> int psci_cpu_suspend_enter(unsigned long index)
> {
> int ret;
> u32 *state = __this_cpu_read(psci_power_state);
> + u32 composite_state = state[index - 1] | psci_get_domain_state();
> +
> /*
> * idle state index 0 corresponds to wfi, should never be called
> * from the cpu_suspend operations
> @@ -425,11 +440,14 @@ int psci_cpu_suspend_enter(unsigned long index)
> if (WARN_ON_ONCE(!index))
> return -EINVAL;
>
> - if (!psci_power_state_loses_context(state[index - 1]))
> - ret = psci_ops.cpu_suspend(state[index - 1], 0);
> + if (!psci_power_state_loses_context(composite_state))
> + ret = psci_ops.cpu_suspend(composite_state, 0);
> else
> ret = cpu_suspend(index, psci_suspend_finisher);
>
> + /* Clear the domain state to start fresh when back from idle. */
> + psci_set_domain_state(0);
> +
> return ret;
> }
>
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> Following changes are about to implement support for PM domains to PSCI.
> Those changes are mainly going to be implemented in a new separate file,
> hence a couple of the internal PSCI functions needs to be shared to be
> accessible. So, let's do that via adding new PSCI header file.
>
> Moreover, the changes deploying support for PM domains, needs to be able to
> switch the PSCI FW into the OS initiated mode. For that reason, let's add a
> new function that deals with this and share it via the new PSCI header
> file.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - New patch. Re-places the earlier patch: "drivers: firmware: psci:
> Share a few internal PSCI functions".
>
> ---
> drivers/firmware/psci/psci.c | 28 +++++++++++++++++++++-------
> drivers/firmware/psci/psci.h | 14 ++++++++++++++
> 2 files changed, 35 insertions(+), 7 deletions(-)
> create mode 100644 drivers/firmware/psci/psci.h
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 8dbcdecc2ae4..623591b541a4 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -34,6 +34,8 @@
> #include <asm/smp_plat.h>
> #include <asm/suspend.h>
>
> +#include "psci.h"
> +
> /*
> * While a 64-bit OS can make calls with SMC32 calling conventions, for some
> * calls it is necessary to use SMC64 to pass or return 64-bit values.
> @@ -90,23 +92,35 @@ static u32 psci_function_id[PSCI_FN_MAX];
> static DEFINE_PER_CPU(u32, domain_state);
> static u32 psci_cpu_suspend_feature;
>
> -static inline u32 psci_get_domain_state(void)
> +u32 psci_get_domain_state(void)
> {
> return __this_cpu_read(domain_state);
> }
>
> -static inline void psci_set_domain_state(u32 state)
> +void psci_set_domain_state(u32 state)
> {
> __this_cpu_write(domain_state, state);
> }
>
> +bool psci_set_osi_mode(void)
> +{
> + int ret;
> +
> + ret = invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> + PSCI_1_0_SUSPEND_MODE_OSI, 0, 0);
> + if (ret)
> + pr_warn("failed to enable OSI mode: %d\n", ret);
> +
> + return !ret;
> +}
Please keep the convention with the error code (0 => success)
In the next patch it can be called:
if (psci_has_osi_support())
osi_mode_enabled = psci_set_osi_mode() ? false : true;
> +
> static inline bool psci_has_ext_power_state(void)
> {
> return psci_cpu_suspend_feature &
> PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
> }
>
> -static inline bool psci_has_osi_support(void)
> +bool psci_has_osi_support(void)
> {
> return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
> }
> @@ -285,10 +299,7 @@ static int __init psci_features(u32 psci_func_id)
> psci_func_id, 0, 0);
> }
>
> -#ifdef CONFIG_CPU_IDLE
> -static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
> -
> -static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> +int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> {
> int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
>
> @@ -305,6 +316,9 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> return 0;
> }
>
> +#ifdef CONFIG_CPU_IDLE
It would be nicer if you can remove the CONFIG_CPU_IDLE by replacing it
with a specific one (eg. CONFIG_PSCI_IDLE) and make it depend on
CONFIG_CPU_IDLE, so the config options stay contained in their
respective subsystems directory.
> +static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
> +
> static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> struct device_node *cpu_node, int cpu)
> {
> diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> new file mode 100644
> index 000000000000..7d9d38fd57e1
> --- /dev/null
> +++ b/drivers/firmware/psci/psci.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSCI_H
> +#define __PSCI_H
> +
> +struct device_node;
> +
> +bool psci_set_osi_mode(void);
> +u32 psci_get_domain_state(void);
> +void psci_set_domain_state(u32 state);
> +bool psci_has_osi_support(void);
> +int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> +
> +#endif /* __PSCI_H */
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On 29/11/2018 18:46, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, we need to setup
> the corresponding PM domain data structures, as to allow a CPU and a group
> of CPUs to be power managed accordingly. Let's enable this by deploying
> support through the genpd interface.
>
> Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> also parse the domain idle states DT bindings as to make genpd responsible
> for the state selection, when the states are compatible with
> "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> supported, we rely solely on the state selection to be managed through the
> regular cpuidle framework.
>
> If the initialization of the PM domain data structures succeeds and the OS
> initiated mode is supported, we try to switch to it. In case it fails,
> let's fall back into a degraded mode, rather than bailing out and returning
> an error code.
>
> Due to that the OS initiated mode may become enabled, we need to adjust to
> maintain backwards compatibility for a kernel started through a kexec call.
> Do this by explicitly switch to Platform Coordinated mode during boot.
>
> To try to initiate the PM domain data structures, the PSCI driver shall
> call the new function, psci_dt_init_pm_domains(). However, this is done
> from following changes.
>
> Cc: Lina Iyer <[email protected]>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in V10:
> - Enable the PM domains to be used for both PC and OSI mode.
> - Fixup error paths.
> - Move the management of kexec started kernels into this patch.
> - Rewrite changelog.
>
> ---
> drivers/firmware/psci/Makefile | 2 +-
> drivers/firmware/psci/psci.c | 7 +-
> drivers/firmware/psci/psci.h | 6 +
> drivers/firmware/psci/psci_pm_domain.c | 262 +++++++++++++++++++++++++
> 4 files changed, 275 insertions(+), 2 deletions(-)
> create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>
> diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
> index 1956b882470f..ff300f1fec86 100644
> --- a/drivers/firmware/psci/Makefile
> +++ b/drivers/firmware/psci/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
> #
> -obj-$(CONFIG_ARM_PSCI_FW) += psci.o
> +obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
Same comment as 17/27.
+obj-$(CONFIG_PSCI_IDLE) += psci_pm_domain.o
> obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 623591b541a4..19af2093151b 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -704,9 +704,14 @@ static int __init psci_1_0_init(struct device_node *np)
> if (err)
> return err;
>
> - if (psci_has_osi_support())
> + if (psci_has_osi_support()) {
> pr_info("OSI mode supported.\n");
>
> + /* Make sure we default to PC mode. */
> + invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> + PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
> + }
> +
> return 0;
> }
>
> diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> index 7d9d38fd57e1..8cf6d7206fab 100644
> --- a/drivers/firmware/psci/psci.h
> +++ b/drivers/firmware/psci/psci.h
> @@ -11,4 +11,10 @@ void psci_set_domain_state(u32 state);
> bool psci_has_osi_support(void);
> int psci_dt_parse_state_node(struct device_node *np, u32 *state);
>
> +#ifdef CONFIG_CPU_IDLE
Same comment as 17/27 for the config option.
> +int psci_dt_init_pm_domains(struct device_node *np);
> +#else
> +static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> +#endif
> +
> #endif /* __PSCI_H */
> diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> new file mode 100644
> index 000000000000..d0dc38e96f85
> --- /dev/null
> +++ b/drivers/firmware/psci/psci_pm_domain.c
> @@ -0,0 +1,262 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PM domains for CPUs via genpd - managed by PSCI.
> + *
> + * Copyright (C) 2018 Linaro Ltd.
> + * Author: Ulf Hansson <[email protected]>
> + *
> + */
> +
> +#define pr_fmt(fmt) "psci: " fmt
> +
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/pm_domain.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +
> +#include "psci.h"
> +
> +#ifdef CONFIG_CPU_IDLE
Same comment as 17/27 for the config option. This condition should go away.
> +struct psci_pd_provider {
> + struct list_head link;
> + struct device_node *node;
> +};
> +
> +static LIST_HEAD(psci_pd_providers);
> +static bool osi_mode_enabled;
> +
> +static int psci_pd_power_off(struct generic_pm_domain *pd)
> +{
> + struct genpd_power_state *state = &pd->states[pd->state_idx];
> + u32 *pd_state;
> + u32 composite_pd_state;
> +
> + /* If we have failed to enable OSI mode, then abort power off. */
> + if (psci_has_osi_support() && !osi_mode_enabled)
> + return -EBUSY;
I'm not sure EBUSY is the best error code to describe this situation.
May be ENOTSUP ?
However, how possible is it to pass in this function if the OSI mode was
not enabled ?
> + if (!state->data)
> + return 0;
> +
> + /* When OSI mode is enabled, set the corresponding domain state. */
> + pd_state = state->data;
> + composite_pd_state = *pd_state | psci_get_domain_state();
> + psci_set_domain_state(composite_pd_state);
> +
> + return 0;
> +}
> +
> +static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
> + int state_count)
__init ?
> +{
> + int i, ret;
> + u32 psci_state, *psci_state_buf;
> +
> + for (i = 0; i < state_count; i++) {
> + ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
> + &psci_state);
> + if (ret)
> + goto free_state;
> +
> + psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
> + if (!psci_state_buf) {
> + ret = -ENOMEM;
> + goto free_state;
> + }
> + *psci_state_buf = psci_state;
> + states[i].data = psci_state_buf;
> + }
> +
> + return 0;
> +
> +free_state:
> + while (i >= 0) {
> + kfree(states[i].data);
> + i--;
> + }
for (; i >= 0; i--)
> + return ret;
> +}
> +
> +static int psci_pd_parse_states(struct device_node *np,
> + struct genpd_power_state **states, int *state_count)
__init ?
> +{
> + int ret;
> +
> + /* Parse the domain idle states. */
> + ret = of_genpd_parse_idle_states(np, states, state_count);
> + if (ret)
> + return ret;
> +
> + /* Fill out the PSCI specifics for each found state. */
> + ret = psci_pd_parse_state_nodes(*states, *state_count);
> + if (ret)
> + kfree(*states);
> +
> + return ret;
> +}
> +
> +static int psci_pd_init(struct device_node *np)
__init ?
> +{
> + struct generic_pm_domain *pd;
> + struct psci_pd_provider *pd_provider;
> + struct dev_power_governor *pd_gov;
> + struct genpd_power_state *states = NULL;
> + int i, ret = -ENOMEM, state_count = 0;
> +
> + pd = kzalloc(sizeof(*pd), GFP_KERNEL);
> + if (!pd)
> + goto out;
> +
> + pd_provider = kzalloc(sizeof(*pd_provider), GFP_KERNEL);
> + if (!pd_provider)
> + goto free_pd;
> +
> + pd->name = kasprintf(GFP_KERNEL, "%pOF", np);
> + if (!pd->name)
> + goto free_pd_prov;
> +
> + /*
> + * For OSI mode, parse the domain idle states and let genpd manage the
> + * state selection for those being compatible with "domain-idle-state".
> + */
> + if (psci_has_osi_support()) {
> + ret = psci_pd_parse_states(np, &states, &state_count);
> + if (ret)
> + goto free_name;
> + }
> +
> + pd->name = kbasename(pd->name);
> + pd->power_off = psci_pd_power_off;
> + pd->states = states;
> + pd->state_count = state_count;
> + pd->flags |= GENPD_FLAG_IRQ_SAFE | GENPD_FLAG_CPU_DOMAIN;
> +
> + /* Use governor for CPU PM domains if it has some states to manage. */
> + pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL;
> +
> + ret = pm_genpd_init(pd, pd_gov, false);
> + if (ret)
> + goto free_state;
> +
> + ret = of_genpd_add_provider_simple(np, pd);
> + if (ret)
> + goto remove_pd;
> +
> + pd_provider->node = of_node_get(np);
> + list_add(&pd_provider->link, &psci_pd_providers);
> +
> + pr_debug("init PM domain %s\n", pd->name);
> + return 0;
> +
> +remove_pd:
> + pm_genpd_remove(pd);
> +free_state:
> + for (i = 0; i < state_count; i++)
> + kfree(states[i].data);
> + kfree(states);
> +free_name:
> + kfree(pd->name);
> +free_pd_prov:
> + kfree(pd_provider);
> +free_pd:
> + kfree(pd);
> +out:
> + pr_err("failed to init PM domain ret=%d %pOF\n", ret, np);
> + return ret;
> +}
> +
> +static void psci_pd_remove(void)
> +{
> + struct psci_pd_provider *pd_provider, *it;
> + struct generic_pm_domain *genpd;
> + int i;
> +
> + list_for_each_entry_safe(pd_provider, it, &psci_pd_providers, link) {
> + of_genpd_del_provider(pd_provider->node);
> +
> + genpd = of_genpd_remove_last(pd_provider->node);
> + if (!IS_ERR(genpd)) {
> + for (i = 0; i < genpd->state_count; i++)
> + kfree(genpd->states[i].data);
> + kfree(genpd->states);
> + kfree(genpd);
> + }
> +
> + of_node_put(pd_provider->node);
> + list_del(&pd_provider->link);
> + kfree(pd_provider);
> + }
> +}
> +
> +static int psci_pd_init_topology(struct device_node *np)
__init ?
> +{
> + struct device_node *node;
> + struct of_phandle_args child, parent;
> + int ret;
> +
> + for_each_child_of_node(np, node) {
> + if (of_parse_phandle_with_args(node, "power-domains",
> + "#power-domain-cells", 0, &parent))
> + continue;
> +
> + child.np = node;
> + child.args_count = 0;
> +
> + ret = of_genpd_add_subdomain(&parent, &child);
> + of_node_put(parent.np);
> + if (ret) {
> + of_node_put(node);
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int psci_dt_init_pm_domains(struct device_node *np)
__init ?
> +{
> + struct device_node *node;
> + int ret, pd_count = 0;
> +
> + /*
> + * Parse child nodes for the "#power-domain-cells" property and
> + * initialize a genpd/genpd-of-provider pair when it's found.
> + */
> + for_each_child_of_node(np, node) {
> + if (!of_find_property(node, "#power-domain-cells", NULL))
> + continue;
> +
> + ret = psci_pd_init(node);
> + if (ret)
> + goto put_node;
> +
> + pd_count++;
> + }
> +
> + /* Bail out if not using the hierarchical CPU topology. */
> + if (!pd_count)
> + return 0;
> +
> + /* Link genpd masters/subdomains to model the CPU topology. */
> + ret = psci_pd_init_topology(np);
> + if (ret)
> + goto remove_pd;
> +
> + /* Try to enable OSI mode if supported. */
> + if (psci_has_osi_support())
> + osi_mode_enabled = psci_set_osi_mode();
> +
> + pr_info("Initialized CPU PM domain topology\n");
> + return pd_count;
> +
> +put_node:
> + of_node_put(node);
> +remove_pd:
> + if (pd_count)
> + psci_pd_remove();
> + pr_err("failed to create CPU PM domains ret=%d\n", ret);
> + return ret;
> +}
> +#endif
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Thu, 20 Dec 2018 at 15:09, Daniel Lezcano <[email protected]> wrote:
>
> On 29/11/2018 18:46, Ulf Hansson wrote:
> > To enable the OS initiated mode, the CPU topology needs to be described
> > using the hierarchical model in DT. When used, the idle state bits for the
> > CPU are created by ORing the bits for PM domain's idle state.
> >
> > Let's prepare the PSCI driver to deal with this, via introducing a per CPU
> > variable called domain_state and by adding internal helpers to read/write
> > the value of the variable.
>
> What are the domain states ? What values can they have ?
The existing psci_power_state, also defined as a per cpu variable,
contains fixed values reflecting the corresponding
arm,psci-suspend-param for the idle state in question.
This isn't sufficient, when using the hierarchical CPU topology in DT
and when OSI mode is supported, because of the way we vote with the
PSCI CPU suspend parameter. Parts of this parameter shall inform about
what state to allow for the cluster, while other parts tells the state
for the CPU.
The new "domain states" per CPU variable, gets dynamically changed
when actively used by following patches that implements the PSCI PM
domain support. Depending on what state the PM domain picks, the genpd
->power_off() callback sets a new "domain states" value, reflecting
the state for the cluster.
Does it makes sense? If you like, I can try to update the changelog to
clarify this?
>
> > Cc: Lina Iyer <[email protected]>
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Use __this_cpu_read|write() rather than this_cpu_read|write().
> >
> > ---
> > drivers/firmware/psci/psci.c | 26 ++++++++++++++++++++++----
> > 1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index 4f0cbc95e41b..8dbcdecc2ae4 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -87,8 +87,19 @@ static u32 psci_function_id[PSCI_FN_MAX];
> > (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
> > PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
> >
> > +static DEFINE_PER_CPU(u32, domain_state);
> > static u32 psci_cpu_suspend_feature;
> >
> > +static inline u32 psci_get_domain_state(void)
> > +{
> > + return __this_cpu_read(domain_state);
> > +}
> > +
> > +static inline void psci_set_domain_state(u32 state)
> > +{
> > + __this_cpu_write(domain_state, state);
> > +}
> > +
> > static inline bool psci_has_ext_power_state(void)
> > {
> > return psci_cpu_suspend_feature &
> > @@ -187,6 +198,8 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> >
> > fn = psci_function_id[PSCI_FN_CPU_ON];
> > err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> > + /* Clear the domain state to start fresh. */
> > + psci_set_domain_state(0);
> > return psci_to_linux_errno(err);
>
> I think this change is ambiguous:
>
> - if it is a change of the state because of the cpu_on, then I was
> expecting a similar change in cpu_off and the change only if
> invoke_psci_fn() succeeds.
You are right. This rather belongs to patch 24, as its intent is to
deal with CPU hotplug.
>
> - if it is a change to take opportunity of the code path to initialize
> the domain state, I suggest to remove it from there and make it very
> explicit with static DEFINE_PER_CPU(u32, domain_state) = { 0 };
We shouldn't need to explicitly set static variables to zero, as that
should be managed by the compiler.
Let me simply remove the call to psci_set_domain_state(0) and instead
consider it for patch 24.
[...]
Thanks for reviewing!
Kind regards
Uffe
On 20/12/2018 16:49, Ulf Hansson wrote:
[ ... ]
>>> +#ifdef CONFIG_CPU_IDLE
>>
>> It would be nicer if you can remove the CONFIG_CPU_IDLE by replacing it
>> with a specific one (eg. CONFIG_PSCI_IDLE) and make it depend on
>> CONFIG_CPU_IDLE, so the config options stay contained in their
>> respective subsystems directory.
>
> I am all for simplifying the Kconfig options in here, as indeed it's
> rather messy. However, I would rather avoid folding in additional
> cleanup changes to this series, is already extensive enough.
>
> Would you be okay if we deal with that on top?
IMO, there are patches in this series which can be grouped into a
cleanup + set the scene patchset and merged immediately. An option
similar to ARM_SCMI_POWER_DOMAIN can be part of it.
However, if you swear you will do the change after and sign with your
blood, I'm fine with that 0:)
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
[...]
> > -obj-$(CONFIG_ARM_PSCI_FW) += psci.o
> > +obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
>
> Same comment as 17/27.
>
> +obj-$(CONFIG_PSCI_IDLE) += psci_pm_domain.o
Let's discuss that in patch 17 to agree on the way forward.
[...]
> > +static int psci_pd_power_off(struct generic_pm_domain *pd)
> > +{
> > + struct genpd_power_state *state = &pd->states[pd->state_idx];
> > + u32 *pd_state;
> > + u32 composite_pd_state;
> > +
> > + /* If we have failed to enable OSI mode, then abort power off. */
> > + if (psci_has_osi_support() && !osi_mode_enabled)
> > + return -EBUSY;
>
> I'm not sure EBUSY is the best error code to describe this situation.
> May be ENOTSUP ?
I see your point. However, -EBUSY is the correct code for this case as
it tells genpd that the PM domain could not power off, but needs to
stay powered on.
To be clear, genpd treat -EBUSY, in a bit special way.
>
> However, how possible is it to pass in this function if the OSI mode was
> not enabled ?
If we fail to enable OSI mode, we keep having the CPUs attached to
PSCI PM PM domains. Although, as we are then running in platform
coordinated mode, there is no point allowing a cluster idle state.
The option would be to convert to the flattened model in case we fail
to enable OSI mode, but that error path is going to be rather
complicated (we need to detach CPUs, unregister domains and providers,
etc). Instead the assumption is that it simply shouldn't fail when we
decide to try. If it does, it's likely something need to be fixed
anyways.
>
> > + if (!state->data)
> > + return 0;
> > +
> > + /* When OSI mode is enabled, set the corresponding domain state. */
> > + pd_state = state->data;
> > + composite_pd_state = *pd_state | psci_get_domain_state();
> > + psci_set_domain_state(composite_pd_state);
> > +
> > + return 0;
> > +}
> > +
> > +static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
> > + int state_count)
>
> __init ?
Yes, probably. I will check and see if/where it could makes sense.
Thanks for pointing it out!
[...]
Kind regards
Uffe
On Thu, 20 Dec 2018 at 19:06, Daniel Lezcano <[email protected]> wrote:
>
> On 20/12/2018 16:49, Ulf Hansson wrote:
>
> [ ... ]
>
> >>> +#ifdef CONFIG_CPU_IDLE
> >>
> >> It would be nicer if you can remove the CONFIG_CPU_IDLE by replacing it
> >> with a specific one (eg. CONFIG_PSCI_IDLE) and make it depend on
> >> CONFIG_CPU_IDLE, so the config options stay contained in their
> >> respective subsystems directory.
> >
> > I am all for simplifying the Kconfig options in here, as indeed it's
> > rather messy. However, I would rather avoid folding in additional
> > cleanup changes to this series, is already extensive enough.
> >
> > Would you be okay if we deal with that on top?
>
> IMO, there are patches in this series which can be grouped into a
> cleanup + set the scene patchset and merged immediately. An option
> similar to ARM_SCMI_POWER_DOMAIN can be part of it.
I certainly agree to that. The tricky is, to know what pieces people
are happy with to go in. :-)
Earlier, in v9 I tried your suggested approach (kind of), but then
Lorenzo stated that it's kind of all or nothing. Maybe we can bring up
that discussion again with him and see what we can figure out.
>
> However, if you swear you will do the change after and sign with your
> blood, I'm fine with that 0:)
>
Whatever it takes!
Anyway, as stated, the reason why I want to tackle that on top, is
that I don't want to make the series more extensive than it already
is.
Agree?
Kind regards
Uffe
On 20/12/2018 22:37, Ulf Hansson wrote:
> On Thu, 20 Dec 2018 at 19:06, Daniel Lezcano <[email protected]> wrote:
>>
>> On 20/12/2018 16:49, Ulf Hansson wrote:
>>
>> [ ... ]
>>
>>>>> +#ifdef CONFIG_CPU_IDLE
>>>>
>>>> It would be nicer if you can remove the CONFIG_CPU_IDLE by replacing it
>>>> with a specific one (eg. CONFIG_PSCI_IDLE) and make it depend on
>>>> CONFIG_CPU_IDLE, so the config options stay contained in their
>>>> respective subsystems directory.
>>>
>>> I am all for simplifying the Kconfig options in here, as indeed it's
>>> rather messy. However, I would rather avoid folding in additional
>>> cleanup changes to this series, is already extensive enough.
>>>
>>> Would you be okay if we deal with that on top?
>>
>> IMO, there are patches in this series which can be grouped into a
>> cleanup + set the scene patchset and merged immediately. An option
>> similar to ARM_SCMI_POWER_DOMAIN can be part of it.
>
> I certainly agree to that. The tricky is, to know what pieces people
> are happy with to go in. :-)
>
> Earlier, in v9 I tried your suggested approach (kind of), but then
> Lorenzo stated that it's kind of all or nothing. Maybe we can bring up
> that discussion again with him and see what we can figure out.
>
>>
>> However, if you swear you will do the change after and sign with your
>> blood, I'm fine with that 0:)
>>
>
> Whatever it takes!
>
> Anyway, as stated, the reason why I want to tackle that on top, is
> that I don't want to make the series more extensive than it already
> is.
>
> Agree?
Yes
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> Over the years this series have been iterated and discussed at various Linux
> conferences and LKML. In this new v10, a quite significant amount of changes
> have been made to address comments from v8 and v9. A summary is available
> below, although let's start with a brand new clarification of the motivation
> behind this series.
I would like to raise few points, not blockers as such but need to be
discussed and resolved before proceeding further.
1. CPU Idle Retention states
- How will be deal with flattening (which brings back the DT bindings,
i.e. do we have all we need) ? Because today there are no users of
this binding yet. I know we all agreed and added after LPC2017 but
I am not convinced about flattening with only valid states.
- Will domain governor ensure not to enter deeper idles states based
on its sub-domain states. E.g.: when CPUs are in retention, so
called container/cluster domain can enter retention or below and not
power off states.
- Is the case of not calling cpu_pm_{enter,exit} handled now ?
2. Now that we have SDM845 which may soon have platform co-ordinated idle
support in mainline, I *really* would like to see some power comparison
numbers(i.e. PC without cluster idle states). This has been the main theme
for most of the discussion on this topic for years and now we are close
to have some platform, we need to try.
3. Also, after adding such complexity, we really need a platform with an
option to build and upgrade firmware easily. This will help to prevent
this being not maintained for long without a platform to test, also
avoid adding lots of quirks to deal with broken firmware so that newer
platforms deal those issues in the firmware correctly.
--
Regards,
Sudeep
On Tuesday, December 18, 2018 12:53:28 PM CET Ulf Hansson wrote:
> On Tue, 18 Dec 2018 at 11:39, Daniel Lezcano <[email protected]> wrote:
> >
> > On 29/11/2018 18:46, Ulf Hansson wrote:
> > > Let's add a data pointer to the genpd_power_state struct, to allow a genpd
> > > backend driver to store per state specific data. In order to introduce the
> > > pointer, we also need to adopt how genpd frees the allocated data for the
> > > default genpd_power_state struct, that it may allocate at pm_genpd_init().
> > >
> > > More precisely, let's use an internal genpd flag to understand when the
> > > states needs to be freed by genpd. When freeing the states data in
> > > genpd_remove(), let's also clear the corresponding genpd->states pointer
> > > and reset the genpd->state_count. In this way, a genpd backend driver
> > > becomes aware of when there is state specific data for it to free.
> > >
> > > Cc: Lina Iyer <[email protected]>
> > > Co-developed-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes in v10:
> > > - Update the patch allow backend drivers to free the states specific
> > > data during genpd removal. Due to this added complexity, I decided to
> > > keep the patch separate, rather than fold it into the patch that makes
> > > use of the new void pointer, which was suggested by Rafael.
> > > - Claim authorship of the patch as lots of changes has been done since
> > > the original pick up from Lina Iyer.
> > >
> > > ---
> > > drivers/base/power/domain.c | 8 ++++++--
> > > include/linux/pm_domain.h | 3 ++-
> > > 2 files changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > index 7f38a92b444a..e27b91d36a2a 100644
> > > --- a/drivers/base/power/domain.c
> > > +++ b/drivers/base/power/domain.c
> > > @@ -1620,7 +1620,7 @@ static int genpd_set_default_power_state(struct generic_pm_domain *genpd)
> > >
> > > genpd->states = state;
> > > genpd->state_count = 1;
> > > - genpd->free = state;
> > > + genpd->free_state = true;
> > >
> > > return 0;
> > > }
> > > @@ -1736,7 +1736,11 @@ static int genpd_remove(struct generic_pm_domain *genpd)
> > > list_del(&genpd->gpd_list_node);
> > > genpd_unlock(genpd);
> > > cancel_work_sync(&genpd->power_off_work);
> > > - kfree(genpd->free);
> > > + if (genpd->free_state) {
> > > + kfree(genpd->states);
> > > + genpd->states = NULL;
> > > + genpd->state_count = 0;
> >
> > Why these two initializations? After genpd_remove, this structure
> > shouldn't be used anymore, no ?
>
> Correct.
>
> >
> > > + }
> >
> > Instead of a flag, replacing the 'free' pointer to a 'free' callback
> > will allow to keep the free path self-encapsulated in domain.c
> >
> > genpd->free(genpd->states);
>
> Right, I get your idea and it makes sense. Let me convert to that.
OK, so I'm expecting an update here.
On Wednesday, December 19, 2018 11:02:05 AM CET Ulf Hansson wrote:
> On Wed, 19 Dec 2018 at 10:53, Daniel Lezcano <[email protected]> wrote:
> >
> > On 29/11/2018 18:46, Ulf Hansson wrote:
> > > To enable a device belonging to a CPU to be attached to a PM domain managed
> > > by genpd, let's do a few changes to it, as to make it convenient to manage
> > > the specifics around CPUs.
> > >
> > > To be able to quickly find out what CPUs that are attached to a genpd,
> > > which typically becomes useful from a genpd governor as following changes
> > > is about to show, let's add a cpumask to the struct generic_pm_domain. At
> > > the point when a CPU device gets attached to a genpd, let's update its
> > > cpumask. Moreover, let's also propagate changes to the cpumask upwards in
> > > the topology to the master PM domains. In this way, the cpumask for a genpd
> > > hierarchically reflects all CPUs attached to the topology below it.
> > >
> > > Finally, let's make this an opt-in feature, to avoid having to manage CPUs
> > > and the cpumask for a genpd that doesn't need it. For that reason, let's
> > > add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN.
> > >
> > > Cc: Lina Iyer <[email protected]>
> > > Co-developed-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes in v10:
> > > - Don't allocate the cpumask when not used.
> > > - Simplify the code that updates the cpumask.
> > > - Document the GENPD_FLAG_CPU_DOMAIN.
> > >
> > > ---
> > > drivers/base/power/domain.c | 66 ++++++++++++++++++++++++++++++++++++-
> > > include/linux/pm_domain.h | 13 ++++++++
> > > 2 files changed, 78 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > index e27b91d36a2a..c3ff8e395308 100644
> > > --- a/drivers/base/power/domain.c
> > > +++ b/drivers/base/power/domain.c
> > > @@ -20,6 +20,7 @@
> > > #include <linux/sched.h>
> > > #include <linux/suspend.h>
> > > #include <linux/export.h>
> > > +#include <linux/cpu.h>
> > >
> > > #include "power.h"
> > >
> > > @@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
> > > #define genpd_is_irq_safe(genpd) (genpd->flags & GENPD_FLAG_IRQ_SAFE)
> > > #define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON)
> > > #define genpd_is_active_wakeup(genpd) (genpd->flags & GENPD_FLAG_ACTIVE_WAKEUP)
> > > +#define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN)
> > >
> > > static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
> > > const struct generic_pm_domain *genpd)
> > > @@ -1377,6 +1379,56 @@ static void genpd_free_dev_data(struct device *dev,
> > > dev_pm_put_subsys_data(dev);
> > > }
> > >
> > > +static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
> > > + int cpu, bool set, unsigned int depth)
> > > +{
> > > + struct gpd_link *link;
> > > +
> > > + if (!genpd_is_cpu_domain(genpd))
> > > + return;
> >
> > With this test, we won't continue updating the cpumask for the other
> > masters. Is it done on purpose ?
>
> Correct, and yes it's on purpose.
>
> We are not even allocating the cpumask for the genpd in question,
> unless it has the GENPD_FLAG_CPU_DOMAIN set.
So this patch is generally fine by me. You can add my ACK to it when
submitting it again, unless it is changed, so I know I've seen it
already.
On Thursday, November 29, 2018 6:46:36 PM CET Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Knowing the sleep duration of CPUs, is known to be needed while selecting
> the most energy efficient idle state for a CPU or a group of CPUs.
>
> However, to be able to compute the sleep duration, we need to know at what
> time the next expected wakeup is for the CPU. Therefore, let's export this
> information via a new function, tick_nohz_get_next_wakeup(). Following
> changes make use of it.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Daniel Lezcano <[email protected]>
> Cc: Lina Iyer <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v10:
> - Updated function header of tick_nohz_get_next_wakeup().
>
> ---
> include/linux/tick.h | 8 ++++++++
> kernel/time/tick-sched.c | 13 +++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 55388ab45fd4..e48f6b26b425 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void);
> extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
> extern unsigned long tick_nohz_get_idle_calls(void);
> extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
> +extern ktime_t tick_nohz_get_next_wakeup(int cpu);
> extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
>
> @@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
> *delta_next = TICK_NSEC;
> return *delta_next;
> }
> +
> +static inline ktime_t tick_nohz_get_next_wakeup(int cpu)
> +{
> + /* Next wake up is the tick period, assume it starts now */
> + return ktime_add(ktime_get(), TICK_NSEC);
> +}
> +
> static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
> static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 69e673b88474..7a9166506503 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void)
> return ts->idle_calls;
> }
>
> +/**
> + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> + * @cpu: the particular CPU to get next wake up for
> + *
> + * Called for idle CPUs only.
> + */
> +ktime_t tick_nohz_get_next_wakeup(int cpu)
> +{
> + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> +
> + return dev->next_event;
> +}
> +
> static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> {
> #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>
Well, I have concerns regarding this one.
I don't believe it is valid to call this new function for non-idle CPUs and
the kerneldoc kind of says so, but the next patch doesn't actually prevent
it from being called for a non-idle CPU (at the time it is called in there
the target CPU may not be idle any more AFAICS).
In principle, the cpuidle core can store this value, say in struct
cpuidle_device of the given CPU, and expose a helper to access it from
genpd, but that would be extra overhead totally unnecessary on everthing
that doesn't use genpd for cpuidle.
So maybe the driver could store it in its ->enter callback? After all,
the driver knows that genpd is going to be used later.
On Monday, December 17, 2018 5:12:54 PM CET Ulf Hansson wrote:
> Rafael, Sudeep, Lorenzo, Mark,
>
> On Thu, 29 Nov 2018 at 18:47, Ulf Hansson <[email protected]> wrote:
> >
> > Over the years this series have been iterated and discussed at various Linux
> > conferences and LKML. In this new v10, a quite significant amount of changes
> > have been made to address comments from v8 and v9. A summary is available
> > below, although let's start with a brand new clarification of the motivation
> > behind this series.
> >
> > For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
> > From a CPU idle state perspective, this means some states may be shared among a
> > group of CPUs (aka CPU cluster).
> >
> > To deal with idle management of a group of CPUs, sometimes the kernel needs to
> > be involved to manage the last-man standing algorithm, simply because it can't
> > rely solely on power management FWs to deal with this. Depending on the
> > platform, of course.
> >
> > There are a couple of typical scenarios for when the kernel needs to be in
> > control, dealing with synchronization of when the last CPU in a cluster is about
> > to enter a deep idle state.
> >
> > 1)
> > The kernel needs to carry out so called last-man activities before the
> > CPU cluster can enter a deep idle state. This may for example involve to
> > configure external logics for wakeups, as the GIC may no longer be functional
> > once a deep cluster idle state have been entered. Likewise, these operations
> > may need to be restored, when the first CPU wakes up.
> >
> > 2)
> > Other more generic I/O devices, such as an MMC controller for example, may be a
> > part of the same power domain as the CPU cluster, due to a shared power-rail.
> > For these scenarios, when the MMC controller is in use dealing with an MMC
> > request, a deeper idle state of the CPU cluster may needs to be temporarily
> > disabled. This is needed to retain the MMC controller in a functional state,
> > else it may loose its register-context in the middle of serving a request.
> >
> > In this series, we are extending the generic PM domain (aka genpd) to be used
> > for also CPU devices. Hence the goal is to re-use much of its current code to
> > help us manage the last-man standing synchronization. Moreover, as we already
> > use genpd to model power domains for generic I/O devices, both 1) and 2) can be
> > address with its help.
> >
> > Moreover, to address these problems for ARM64 DT based platforms, we are
> > deploying support for genpd and runtime PM to the PSCI FW driver - and finally
> > we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
> > layout.
> >
> > The series has been tested on the QCOM 410c dragonboard and the Hisilicon Hikey
> > board. You may also find the code at:
> >
> > git.linaro.org/people/ulf.hansson/linux-pm.git next
>
> It's soon been three weeks since I posted this and I would really
> appreciate some feedback.
>
> Rafael, I need your feedback on patch 1->4.
Sorry for the delay, I've replied to the patches.
The bottom line is that the mechanism introduced in patch 3 and used
in patch 4 doesn't look particularly clean to me.
Cheers,
Rafael
On Fri, 11 Jan 2019 at 12:07, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thursday, November 29, 2018 6:46:36 PM CET Ulf Hansson wrote:
> > From: Lina Iyer <[email protected]>
> >
> > Knowing the sleep duration of CPUs, is known to be needed while selecting
> > the most energy efficient idle state for a CPU or a group of CPUs.
> >
> > However, to be able to compute the sleep duration, we need to know at what
> > time the next expected wakeup is for the CPU. Therefore, let's export this
> > information via a new function, tick_nohz_get_next_wakeup(). Following
> > changes make use of it.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Daniel Lezcano <[email protected]>
> > Cc: Lina Iyer <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Signed-off-by: Lina Iyer <[email protected]>
> > Co-developed-by: Ulf Hansson <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v10:
> > - Updated function header of tick_nohz_get_next_wakeup().
> >
> > ---
> > include/linux/tick.h | 8 ++++++++
> > kernel/time/tick-sched.c | 13 +++++++++++++
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 55388ab45fd4..e48f6b26b425 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void);
> > extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
> > extern unsigned long tick_nohz_get_idle_calls(void);
> > extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
> > +extern ktime_t tick_nohz_get_next_wakeup(int cpu);
> > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> >
> > @@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
> > *delta_next = TICK_NSEC;
> > return *delta_next;
> > }
> > +
> > +static inline ktime_t tick_nohz_get_next_wakeup(int cpu)
> > +{
> > + /* Next wake up is the tick period, assume it starts now */
> > + return ktime_add(ktime_get(), TICK_NSEC);
> > +}
> > +
> > static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
> > static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 69e673b88474..7a9166506503 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void)
> > return ts->idle_calls;
> > }
> >
> > +/**
> > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> > + * @cpu: the particular CPU to get next wake up for
> > + *
> > + * Called for idle CPUs only.
> > + */
> > +ktime_t tick_nohz_get_next_wakeup(int cpu)
> > +{
> > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> > +
> > + return dev->next_event;
> > +}
> > +
> > static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> > {
> > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> >
>
> Well, I have concerns regarding this one.
>
> I don't believe it is valid to call this new function for non-idle CPUs and
> the kerneldoc kind of says so, but the next patch doesn't actually prevent
> it from being called for a non-idle CPU (at the time it is called in there
> the target CPU may not be idle any more AFAICS).
You are correct, but let me clarify things.
We are calling this new API from the new genpd governor, which may
have a cpumask indicating there is more than one CPU attached to its
PM domain+sub-PM domains. In other words, we may call the API for
another CPU than the one we are executing on.
When the new genpd governor is called, all CPUs in the cpumask of the
genpd in question, are already runtime suspended and will remain so
throughout the decisions made by the governor.
However, because of the race condition, which needs to be manged by
the genpd backend driver and its corresponding FW, one of the CPU in
the genpd cpumask could potentially wake up from idle when the genpd
governor runs. However, as a part of exiting from idle, that CPU needs
to wait for the call to pm_runtime_get_sync() to return before
completing the exit patch of idle. This also means waiting for the
genpd governor to finish.
The point is, no matter what decision the governor takes under these
circumstances, the genpd backend driver and its FW must manage this
race condition during the last man standing. For PSCI OSI mode, it
means that if a cluster idle state is suggested by Linux during these
circumstances, it must be prevented and aborted.
>
> In principle, the cpuidle core can store this value, say in struct
> cpuidle_device of the given CPU, and expose a helper to access it from
> genpd, but that would be extra overhead totally unnecessary on everthing
> that doesn't use genpd for cpuidle.
>
> So maybe the driver could store it in its ->enter callback? After all,
> the driver knows that genpd is going to be used later.
This would work, but it wouldn't really change much when it comes to
the race condition described above. Of course it would turn the code
into being more cpuidle specific, which seems reasonable to me.
Anyway, if I understand your suggestion, in principle it means
changing $subject patch in such way that the API should not take "int
cpu" as an in-parameter, but instead only use __this_cpu() to read out
the next event for current idle CPU.
Additionally, we need another new cpuidle API, which genpd can call to
retrieve a new per CPU "next event data" stored by the cpuidle driver
from its ->enter() callback. Is this a correct interpretation of your
suggestion?
Kind regards
Uffe
On Wed, Jan 16, 2019 at 8:58 AM Ulf Hansson <[email protected]> wrote:
>
> On Fri, 11 Jan 2019 at 12:07, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thursday, November 29, 2018 6:46:36 PM CET Ulf Hansson wrote:
> > > From: Lina Iyer <[email protected]>
> > >
> > > Knowing the sleep duration of CPUs, is known to be needed while selecting
> > > the most energy efficient idle state for a CPU or a group of CPUs.
> > >
> > > However, to be able to compute the sleep duration, we need to know at what
> > > time the next expected wakeup is for the CPU. Therefore, let's export this
> > > information via a new function, tick_nohz_get_next_wakeup(). Following
> > > changes make use of it.
> > >
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Daniel Lezcano <[email protected]>
> > > Cc: Lina Iyer <[email protected]>
> > > Cc: Frederic Weisbecker <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Signed-off-by: Lina Iyer <[email protected]>
> > > Co-developed-by: Ulf Hansson <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes in v10:
> > > - Updated function header of tick_nohz_get_next_wakeup().
> > >
> > > ---
> > > include/linux/tick.h | 8 ++++++++
> > > kernel/time/tick-sched.c | 13 +++++++++++++
> > > 2 files changed, 21 insertions(+)
> > >
> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index 55388ab45fd4..e48f6b26b425 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void);
> > > extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
> > > extern unsigned long tick_nohz_get_idle_calls(void);
> > > extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
> > > +extern ktime_t tick_nohz_get_next_wakeup(int cpu);
> > > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > > extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > >
> > > @@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
> > > *delta_next = TICK_NSEC;
> > > return *delta_next;
> > > }
> > > +
> > > +static inline ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > +{
> > > + /* Next wake up is the tick period, assume it starts now */
> > > + return ktime_add(ktime_get(), TICK_NSEC);
> > > +}
> > > +
> > > static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
> > > static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > >
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index 69e673b88474..7a9166506503 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void)
> > > return ts->idle_calls;
> > > }
> > >
> > > +/**
> > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> > > + * @cpu: the particular CPU to get next wake up for
> > > + *
> > > + * Called for idle CPUs only.
> > > + */
> > > +ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > +{
> > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> > > +
> > > + return dev->next_event;
> > > +}
> > > +
> > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> > > {
> > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> > >
> >
> > Well, I have concerns regarding this one.
> >
> > I don't believe it is valid to call this new function for non-idle CPUs and
> > the kerneldoc kind of says so, but the next patch doesn't actually prevent
> > it from being called for a non-idle CPU (at the time it is called in there
> > the target CPU may not be idle any more AFAICS).
>
> You are correct, but let me clarify things.
>
> We are calling this new API from the new genpd governor, which may
> have a cpumask indicating there is more than one CPU attached to its
> PM domain+sub-PM domains. In other words, we may call the API for
> another CPU than the one we are executing on.
>
> When the new genpd governor is called, all CPUs in the cpumask of the
> genpd in question, are already runtime suspended and will remain so
> throughout the decisions made by the governor.
>
> However, because of the race condition, which needs to be manged by
> the genpd backend driver and its corresponding FW, one of the CPU in
> the genpd cpumask could potentially wake up from idle when the genpd
> governor runs. However, as a part of exiting from idle, that CPU needs
> to wait for the call to pm_runtime_get_sync() to return before
> completing the exit patch of idle. This also means waiting for the
> genpd governor to finish.
OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off.
> The point is, no matter what decision the governor takes under these
> circumstances, the genpd backend driver and its FW must manage this
> race condition during the last man standing. For PSCI OSI mode, it
> means that if a cluster idle state is suggested by Linux during these
> circumstances, it must be prevented and aborted.
I would suggest putting a comment to explain that somewhere as it is
not really obvious.
> >
> > In principle, the cpuidle core can store this value, say in struct
> > cpuidle_device of the given CPU, and expose a helper to access it from
> > genpd, but that would be extra overhead totally unnecessary on everthing
> > that doesn't use genpd for cpuidle.
> >
> > So maybe the driver could store it in its ->enter callback? After all,
> > the driver knows that genpd is going to be used later.
>
> This would work, but it wouldn't really change much when it comes to
> the race condition described above.
No, it wouldn't make the race go away.
> Of course it would turn the code
> into being more cpuidle specific, which seems reasonable to me.
>
> Anyway, if I understand your suggestion, in principle it means
> changing $subject patch in such way that the API should not take "int
> cpu" as an in-parameter, but instead only use __this_cpu() to read out
> the next event for current idle CPU.
Yes.
> Additionally, we need another new cpuidle API, which genpd can call to
> retrieve a new per CPU "next event data" stored by the cpuidle driver
> from its ->enter() callback. Is this a correct interpretation of your
> suggestion?
Yes, it is.
Generally, something like "cpuidle, give me the wakeup time of this
CPU". And it may very well give you 0 if the CPU has woken up
already. :-)
On Wed, 16 Jan 2019 at 11:59, Rafael J. Wysocki <[email protected]> wrote:
>
> On Wed, Jan 16, 2019 at 8:58 AM Ulf Hansson <[email protected]> wrote:
> >
> > On Fri, 11 Jan 2019 at 12:07, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thursday, November 29, 2018 6:46:36 PM CET Ulf Hansson wrote:
> > > > From: Lina Iyer <[email protected]>
> > > >
> > > > Knowing the sleep duration of CPUs, is known to be needed while selecting
> > > > the most energy efficient idle state for a CPU or a group of CPUs.
> > > >
> > > > However, to be able to compute the sleep duration, we need to know at what
> > > > time the next expected wakeup is for the CPU. Therefore, let's export this
> > > > information via a new function, tick_nohz_get_next_wakeup(). Following
> > > > changes make use of it.
> > > >
> > > > Cc: Thomas Gleixner <[email protected]>
> > > > Cc: Daniel Lezcano <[email protected]>
> > > > Cc: Lina Iyer <[email protected]>
> > > > Cc: Frederic Weisbecker <[email protected]>
> > > > Cc: Ingo Molnar <[email protected]>
> > > > Signed-off-by: Lina Iyer <[email protected]>
> > > > Co-developed-by: Ulf Hansson <[email protected]>
> > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > ---
> > > >
> > > > Changes in v10:
> > > > - Updated function header of tick_nohz_get_next_wakeup().
> > > >
> > > > ---
> > > > include/linux/tick.h | 8 ++++++++
> > > > kernel/time/tick-sched.c | 13 +++++++++++++
> > > > 2 files changed, 21 insertions(+)
> > > >
> > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > index 55388ab45fd4..e48f6b26b425 100644
> > > > --- a/include/linux/tick.h
> > > > +++ b/include/linux/tick.h
> > > > @@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void);
> > > > extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
> > > > extern unsigned long tick_nohz_get_idle_calls(void);
> > > > extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
> > > > +extern ktime_t tick_nohz_get_next_wakeup(int cpu);
> > > > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > > > extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > > >
> > > > @@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
> > > > *delta_next = TICK_NSEC;
> > > > return *delta_next;
> > > > }
> > > > +
> > > > +static inline ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > > +{
> > > > + /* Next wake up is the tick period, assume it starts now */
> > > > + return ktime_add(ktime_get(), TICK_NSEC);
> > > > +}
> > > > +
> > > > static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
> > > > static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > > >
> > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > index 69e673b88474..7a9166506503 100644
> > > > --- a/kernel/time/tick-sched.c
> > > > +++ b/kernel/time/tick-sched.c
> > > > @@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void)
> > > > return ts->idle_calls;
> > > > }
> > > >
> > > > +/**
> > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> > > > + * @cpu: the particular CPU to get next wake up for
> > > > + *
> > > > + * Called for idle CPUs only.
> > > > + */
> > > > +ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > > +{
> > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> > > > +
> > > > + return dev->next_event;
> > > > +}
> > > > +
> > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> > > > {
> > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> > > >
> > >
> > > Well, I have concerns regarding this one.
> > >
> > > I don't believe it is valid to call this new function for non-idle CPUs and
> > > the kerneldoc kind of says so, but the next patch doesn't actually prevent
> > > it from being called for a non-idle CPU (at the time it is called in there
> > > the target CPU may not be idle any more AFAICS).
> >
> > You are correct, but let me clarify things.
> >
> > We are calling this new API from the new genpd governor, which may
> > have a cpumask indicating there is more than one CPU attached to its
> > PM domain+sub-PM domains. In other words, we may call the API for
> > another CPU than the one we are executing on.
> >
> > When the new genpd governor is called, all CPUs in the cpumask of the
> > genpd in question, are already runtime suspended and will remain so
> > throughout the decisions made by the governor.
> >
> > However, because of the race condition, which needs to be manged by
> > the genpd backend driver and its corresponding FW, one of the CPU in
> > the genpd cpumask could potentially wake up from idle when the genpd
> > governor runs. However, as a part of exiting from idle, that CPU needs
> > to wait for the call to pm_runtime_get_sync() to return before
> > completing the exit patch of idle. This also means waiting for the
> > genpd governor to finish.
>
> OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off.
Correct.
This is the part that is not very nice, but ideally it should be a
rather rare condition as it only happens during the last man standing
point.
>
> > The point is, no matter what decision the governor takes under these
> > circumstances, the genpd backend driver and its FW must manage this
> > race condition during the last man standing. For PSCI OSI mode, it
> > means that if a cluster idle state is suggested by Linux during these
> > circumstances, it must be prevented and aborted.
>
> I would suggest putting a comment to explain that somewhere as it is
> not really obvious.
Let me see if can squeeze in that somewhere, probably it's best suited
in the new genpd governor code somewhere.
>
> > >
> > > In principle, the cpuidle core can store this value, say in struct
> > > cpuidle_device of the given CPU, and expose a helper to access it from
> > > genpd, but that would be extra overhead totally unnecessary on everthing
> > > that doesn't use genpd for cpuidle.
> > >
> > > So maybe the driver could store it in its ->enter callback? After all,
> > > the driver knows that genpd is going to be used later.
> >
> > This would work, but it wouldn't really change much when it comes to
> > the race condition described above.
>
> No, it wouldn't make the race go away.
>
> > Of course it would turn the code
> > into being more cpuidle specific, which seems reasonable to me.
> >
> > Anyway, if I understand your suggestion, in principle it means
> > changing $subject patch in such way that the API should not take "int
> > cpu" as an in-parameter, but instead only use __this_cpu() to read out
> > the next event for current idle CPU.
>
> Yes.
>
> > Additionally, we need another new cpuidle API, which genpd can call to
> > retrieve a new per CPU "next event data" stored by the cpuidle driver
> > from its ->enter() callback. Is this a correct interpretation of your
> > suggestion?
>
> Yes, it is.
Thanks for confirming!
>
> Generally, something like "cpuidle, give me the wakeup time of this
> CPU". And it may very well give you 0 if the CPU has woken up
> already. :-)
Yep, I was thinking something like that, so in principle it may
minimize the window of receiving in-correct "next wakeup data" in
genpd for a non-idle CPU, but again it doesn't solve the race
condition.
Alright, I re-spin this according to your suggestions. Thanks for reviewing!
Kind regards
Uffe
On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <[email protected]> wrote:
>
> On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > Over the years this series have been iterated and discussed at various Linux
> > conferences and LKML. In this new v10, a quite significant amount of changes
> > have been made to address comments from v8 and v9. A summary is available
> > below, although let's start with a brand new clarification of the motivation
> > behind this series.
>
> I would like to raise few points, not blockers as such but need to be
> discussed and resolved before proceeding further.
> 1. CPU Idle Retention states
> - How will be deal with flattening (which brings back the DT bindings,
> i.e. do we have all we need) ? Because today there are no users of
> this binding yet. I know we all agreed and added after LPC2017 but
> I am not convinced about flattening with only valid states.
Not exactly sure I understand what you are concerned about here. When
it comes to users of the new DT binding, I am converting two new
platforms in this series to use of it.
Note, the flattened model is still a valid option to describe the CPU
idle states after these changes. Especially when there are no last man
standing activities to manage by Linux and no shared resource that
need to prevent cluster idle states, when it's active.
> - Will domain governor ensure not to enter deeper idles states based
> on its sub-domain states. E.g.: when CPUs are in retention, so
> called container/cluster domain can enter retention or below and not
> power off states.
I have tried to point this out as a known limitation in genpd of the
current series, possibly I have failed to communicate that clearly.
Anyway, I fully agree that this needs to be addressed in a future
step.
Note that, this isn't a specific limitation to how idle states are
selected for CPUs and CPU clusters by genpd, but is rather a
limitation to any hierarchical PM domain topology managed by genpd
that has multiple idle states.
Do note, I already started hacking on this and intend to post patches
on top of this series, as these changes isn't needed for those two
ARM64 platforms I have deployed support for.
> - Is the case of not calling cpu_pm_{enter,exit} handled now ?
It is still called, so no changes in regards to that as apart of this series.
When it comes to actually manage the "last man activities" as part of
selecting an idle state of the cluster, that is going to be addressed
on top as "optimizations".
In principle we should not need to call cpu_pm_enter|exit() in the
idle path at all, but rather only cpu_cluster_pm_enter|exit() when a
cluster idle state is selected. That should improve latency when
selecting an idle state for a CPU. However, to reach that point
additional changes are needed in various drivers, such as the gic
driver for example.
>
> 2. Now that we have SDM845 which may soon have platform co-ordinated idle
> support in mainline, I *really* would like to see some power comparison
> numbers(i.e. PC without cluster idle states). This has been the main theme
> for most of the discussion on this topic for years and now we are close
> to have some platform, we need to try.
I have quite recently been talking to Qcom folkz about this as well,
but no commitments are made.
Although I fully agree that some comparison would be great, it still
doesn't matter much, as we anyway need to support PSCI OSI mode in
Linux. Lorenzo have agreed to this as well.
>
> 3. Also, after adding such complexity, we really need a platform with an
> option to build and upgrade firmware easily. This will help to prevent
> this being not maintained for long without a platform to test, also
> avoid adding lots of quirks to deal with broken firmware so that newer
> platforms deal those issues in the firmware correctly.
I don't see how this series change anything from what we already have
today with the PSCI FW. No matter of OSI or PC mode is used, there are
complexity involved.
Although, of course I agree with you, that we should continue to try
to convince ARM vendors about moving to the public version of ATF and
avoid proprietary FW binaries as much as possible.
Kind regards
Uffe
On Wed, Jan 16, 2019 at 10:10:08AM +0100, Ulf Hansson wrote:
> On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <[email protected]> wrote:
> >
> > On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > > Over the years this series have been iterated and discussed at various Linux
> > > conferences and LKML. In this new v10, a quite significant amount of changes
> > > have been made to address comments from v8 and v9. A summary is available
> > > below, although let's start with a brand new clarification of the motivation
> > > behind this series.
> >
> > I would like to raise few points, not blockers as such but need to be
> > discussed and resolved before proceeding further.
> > 1. CPU Idle Retention states
> > - How will be deal with flattening (which brings back the DT bindings,
> > i.e. do we have all we need) ? Because today there are no users of
> > this binding yet. I know we all agreed and added after LPC2017 but
> > I am not convinced about flattening with only valid states.
>
> Not exactly sure I understand what you are concerned about here. When
> it comes to users of the new DT binding, I am converting two new
> platforms in this series to use of it.
>
Yes that's exactly my concern. So if someone updates DT(since it's part
of the kernel still), but don't update the firmware(for complexity reasons)
the end result on those platform is broken CPUIdle which is a regression/
feature break and that's what I am objecting here.
> Note, the flattened model is still a valid option to describe the CPU
> idle states after these changes. Especially when there are no last man
> standing activities to manage by Linux and no shared resource that
> need to prevent cluster idle states, when it's active.
Since OSI vs PC is discoverable, we shouldn't tie up with DT in anyway.
>
> > - Will domain governor ensure not to enter deeper idles states based
> > on its sub-domain states. E.g.: when CPUs are in retention, so
> > called container/cluster domain can enter retention or below and not
> > power off states.
>
> I have tried to point this out as a known limitation in genpd of the
> current series, possibly I have failed to communicate that clearly.
> Anyway, I fully agree that this needs to be addressed in a future
> step.
>
Sorry, I might have missed to read. The point is if we are sacrificing
few retention states with this new feature, I am sure PC would perform
better that OSI on platforms which has retention states. Another
reason for having comparison data or we should simply assume and state
clearly OSI may perform bad on such system until the support is added.
> Note that, this isn't a specific limitation to how idle states are
> selected for CPUs and CPU clusters by genpd, but is rather a
> limitation to any hierarchical PM domain topology managed by genpd
> that has multiple idle states.
>
Agreed, but with flattened mode we compile the list of valid states so
the limitation is automatically eliminated.
> Do note, I already started hacking on this and intend to post patches
> on top of this series, as these changes isn't needed for those two
> ARM64 platforms I have deployed support for.
>
Good to know.
> > - Is the case of not calling cpu_pm_{enter,exit} handled now ?
>
> It is still called, so no changes in regards to that as apart of this series.
>
OK, so I assume for now we are not going to support retention states with OSI
for now ?
> When it comes to actually manage the "last man activities" as part of
> selecting an idle state of the cluster, that is going to be addressed
> on top as "optimizations".
>
OK
> In principle we should not need to call cpu_pm_enter|exit() in the
> idle path at all,
Not sure if we can do that. We need to notify things like PMU, FP, GIC
which have per cpu context too and not just "cluster" context.
> but rather only cpu_cluster_pm_enter|exit() when a cluster idle state is
> selected.
We need to avoid relying on concept of "cluster" and just think of power
domains and what's hanging on those domains. Sorry for naive question, but
does genpd have concept of notifiers. I do understand that it's more
bottom up approach where each entity in genpd saves the context and requests
to enter a particular state. But with CPU devices like GIC/VFP/PMU, it
needs to be more top down approach where CPU genpd has to enter a enter
so it notifies the devices attached to it to save it's context. Not ideal
but that's current solution. Because with the new DT bindings, platforms
can express if PMU/GIC is in per cpu domain or any pd in the hierarchy and
we ideally need to honor that. But that's optimisation, just mentioning.
> That should improve latency when
> selecting an idle state for a CPU. However, to reach that point
> additional changes are needed in various drivers, such as the gic
> driver for example.
>
Agreed.
> >
> > 2. Now that we have SDM845 which may soon have platform co-ordinated idle
> > support in mainline, I *really* would like to see some power comparison
> > numbers(i.e. PC without cluster idle states). This has been the main theme
> > for most of the discussion on this topic for years and now we are close
> > to have some platform, we need to try.
>
> I have quite recently been talking to Qcom folkz about this as well,
> but no commitments are made.
>
Indeed that's the worrying. IMO, this is requested since day#1 and not
even simple interest is shown, but that's another topic.
> Although I fully agree that some comparison would be great, it still
> doesn't matter much, as we anyway need to support PSCI OSI mode in
> Linux. Lorenzo have agreed to this as well.
>
OK, I am fine if others agree. Since we are sacrificing on few (retention)
states that might disappear with OSI, I am still very much still interested
as OSI might perform bad that PC especially in such cases.
> >
> > 3. Also, after adding such complexity, we really need a platform with an
> > option to build and upgrade firmware easily. This will help to prevent
> > this being not maintained for long without a platform to test, also
> > avoid adding lots of quirks to deal with broken firmware so that newer
> > platforms deal those issues in the firmware correctly.
>
> I don't see how this series change anything from what we already have
> today with the PSCI FW. No matter of OSI or PC mode is used, there are
> complexity involved.
>
I agree, but PC is already merged, mainitained and well tested regularly
as it's default mode that must be supported and TF-A supports/maintains
that. OSI is new and is on platform which may not have much commitments
and can be thrown away and any bugs we find in future many need to worked
around in kernel. That's what I meant as worrying.
> Although, of course I agree with you, that we should continue to try
> to convince ARM vendors about moving to the public version of ATF and
> avoid proprietary FW binaries as much as possible.
>
Indeed.
--
Regards,
Sudeep
On Thu, 17 Jan 2019 at 18:44, Sudeep Holla <[email protected]> wrote:
>
> On Wed, Jan 16, 2019 at 10:10:08AM +0100, Ulf Hansson wrote:
> > On Thu, 3 Jan 2019 at 13:06, Sudeep Holla <[email protected]> wrote:
> > >
> > > On Thu, Nov 29, 2018 at 06:46:33PM +0100, Ulf Hansson wrote:
> > > > Over the years this series have been iterated and discussed at various Linux
> > > > conferences and LKML. In this new v10, a quite significant amount of changes
> > > > have been made to address comments from v8 and v9. A summary is available
> > > > below, although let's start with a brand new clarification of the motivation
> > > > behind this series.
> > >
> > > I would like to raise few points, not blockers as such but need to be
> > > discussed and resolved before proceeding further.
> > > 1. CPU Idle Retention states
> > > - How will be deal with flattening (which brings back the DT bindings,
> > > i.e. do we have all we need) ? Because today there are no users of
> > > this binding yet. I know we all agreed and added after LPC2017 but
> > > I am not convinced about flattening with only valid states.
> >
> > Not exactly sure I understand what you are concerned about here. When
> > it comes to users of the new DT binding, I am converting two new
> > platforms in this series to use of it.
> >
>
> Yes that's exactly my concern. So if someone updates DT(since it's part
> of the kernel still), but don't update the firmware(for complexity reasons)
> the end result on those platform is broken CPUIdle which is a regression/
> feature break and that's what I am objecting here.
There is not going to be a regression if that happens, you have got
that wrong. Let me clarify why.
For Hikey example, which is one of those platforms I convert into
using the new hierarchical DT bindings for the CPUs. It still uses the
existing PSCI FW, which is supporting PSCI PC mode only.
In this case, the PSCI FW driver, observes that there is no OSI mode
support in the FW, which triggers it to convert the hierarchically
described idle states into regular flattened cpuidle states. In this
way, the idle states can be manged by the cpuidle framework per CPU,
as they are currently.
So, why convert Hikey to the new DT bindings? It makes Linux aware of
the topology, thus it can monitor when the last CPU in the cluster
enters idle - and then take care of "last man activities".
>
> > Note, the flattened model is still a valid option to describe the CPU
> > idle states after these changes. Especially when there are no last man
> > standing activities to manage by Linux and no shared resource that
> > need to prevent cluster idle states, when it's active.
>
> Since OSI vs PC is discoverable, we shouldn't tie up with DT in anyway.
As stated above, we aren't. OSI and PC mode are orthogonal to the DT bindings.
>
> >
> > > - Will domain governor ensure not to enter deeper idles states based
> > > on its sub-domain states. E.g.: when CPUs are in retention, so
> > > called container/cluster domain can enter retention or below and not
> > > power off states.
> >
> > I have tried to point this out as a known limitation in genpd of the
> > current series, possibly I have failed to communicate that clearly.
> > Anyway, I fully agree that this needs to be addressed in a future
> > step.
> >
>
> Sorry, I might have missed to read. The point is if we are sacrificing
> few retention states with this new feature, I am sure PC would perform
> better that OSI on platforms which has retention states. Another
> reason for having comparison data or we should simply assume and state
> clearly OSI may perform bad on such system until the support is added.
I now understand that I misread your question. We are not scarifying
any idle states at all. Not in PC mode and not in OSI mode.
>
> > Note that, this isn't a specific limitation to how idle states are
> > selected for CPUs and CPU clusters by genpd, but is rather a
> > limitation to any hierarchical PM domain topology managed by genpd
> > that has multiple idle states.
> >
>
> Agreed, but with flattened mode we compile the list of valid states so
> the limitation is automatically eliminated.
What I was trying to point out above, was a limitation in genpd and
with its governors. If the PM domains have multiple idle states and
also have multiple sub-domain levels, the selection of idle state may
not be correct. However, that scenario doesn't exist for Hikey/410c.
Apologize for the noise, I simply thought it was this limitation you
referred to.
>
> > Do note, I already started hacking on this and intend to post patches
> > on top of this series, as these changes isn't needed for those two
> > ARM64 platforms I have deployed support for.
> >
>
> Good to know.
>
> > > - Is the case of not calling cpu_pm_{enter,exit} handled now ?
> >
> > It is still called, so no changes in regards to that as apart of this series.
> >
>
> OK, so I assume for now we are not going to support retention states with OSI
> for now ?
>
> > When it comes to actually manage the "last man activities" as part of
> > selecting an idle state of the cluster, that is going to be addressed
> > on top as "optimizations".
> >
>
> OK
>
> > In principle we should not need to call cpu_pm_enter|exit() in the
> > idle path at all,
>
> Not sure if we can do that. We need to notify things like PMU, FP, GIC
> which have per cpu context too and not just "cluster" context.
>
> > but rather only cpu_cluster_pm_enter|exit() when a cluster idle state is
> > selected.
>
> We need to avoid relying on concept of "cluster" and just think of power
> domains and what's hanging on those domains.
I fully agree. I just wanted to use a well know term to avoid confusion.
> Sorry for naive question, but
> does genpd have concept of notifiers. I do understand that it's more
> bottom up approach where each entity in genpd saves the context and requests
> to enter a particular state. But with CPU devices like GIC/VFP/PMU, it
> needs to be more top down approach where CPU genpd has to enter a enter
> so it notifies the devices attached to it to save it's context.
No, genpd don't have on/off notifiers . There have been attempts to
add them, but those didn't make it.
Anyway, it's nice that you brings this up! The problem is well
described and the approach you suggest may very well be the right one.
In principle, I am also worried that the cpu_cluster_pm_enter|exist()
notifiers, doesn't scale. We may fire them when we shouldn't and
consumers may get them when they don't need them.
> Not ideal
> but that's current solution. Because with the new DT bindings, platforms
> can express if PMU/GIC is in per cpu domain or any pd in the hierarchy and
> we ideally need to honor that. But that's optimisation, just mentioning.
Overall, it's great that you mention this - and I just want to
confirm. I have this in mind when I am thinking of the next steps.
In regards to the next steps, hopefully we can move forward with
$subject series soon, so we really can start discussing the next steps
for real. I even think we need some of them to be implemented, before
we can see the full benefits made to latency and energy efficiency.
>
> > That should improve latency when
> > selecting an idle state for a CPU. However, to reach that point
> > additional changes are needed in various drivers, such as the gic
> > driver for example.
> >
>
> Agreed.
>
> > >
> > > 2. Now that we have SDM845 which may soon have platform co-ordinated idle
> > > support in mainline, I *really* would like to see some power comparison
> > > numbers(i.e. PC without cluster idle states). This has been the main theme
> > > for most of the discussion on this topic for years and now we are close
> > > to have some platform, we need to try.
> >
> > I have quite recently been talking to Qcom folkz about this as well,
> > but no commitments are made.
> >
>
> Indeed that's the worrying. IMO, this is requested since day#1 and not
> even simple interest is shown, but that's another topic.
Well, at least we keep talking about it and I am sure we will be able
to compare at some point.
Another option is simply to implement support for OSI mode in the
public ARM Trusted Firmware, any of us could do that. That would open
up for testing for a bunch of "open" platforms, like Hikey for
example.
>
> > Although I fully agree that some comparison would be great, it still
> > doesn't matter much, as we anyway need to support PSCI OSI mode in
> > Linux. Lorenzo have agreed to this as well.
> >
>
> OK, I am fine if others agree. Since we are sacrificing on few (retention)
> states that might disappear with OSI, I am still very much still interested
> as OSI might perform bad that PC especially in such cases.
>
> > >
> > > 3. Also, after adding such complexity, we really need a platform with an
> > > option to build and upgrade firmware easily. This will help to prevent
> > > this being not maintained for long without a platform to test, also
> > > avoid adding lots of quirks to deal with broken firmware so that newer
> > > platforms deal those issues in the firmware correctly.
> >
> > I don't see how this series change anything from what we already have
> > today with the PSCI FW. No matter of OSI or PC mode is used, there are
> > complexity involved.
> >
>
> I agree, but PC is already merged, mainitained and well tested regularly
> as it's default mode that must be supported and TF-A supports/maintains
> that. OSI is new and is on platform which may not have much commitments
> and can be thrown away and any bugs we find in future many need to worked
> around in kernel. That's what I meant as worrying.
I see what you are saying. Hopefully my earlier answers above will
make you less worry. :-)
>
> > Although, of course I agree with you, that we should continue to try
> > to convince ARM vendors about moving to the public version of ATF and
> > avoid proprietary FW binaries as much as possible.
> >
>
> Indeed.
>
> --
> Regards,
> Sudeep
Kind regards
Uffe
[...]
> > > > > +/**
> > > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> > > > > + * @cpu: the particular CPU to get next wake up for
> > > > > + *
> > > > > + * Called for idle CPUs only.
> > > > > + */
> > > > > +ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > > > +{
> > > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> > > > > +
> > > > > + return dev->next_event;
> > > > > +}
> > > > > +
> > > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> > > > > {
> > > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> > > > >
> > > >
> > > > Well, I have concerns regarding this one.
> > > >
> > > > I don't believe it is valid to call this new function for non-idle CPUs and
> > > > the kerneldoc kind of says so, but the next patch doesn't actually prevent
> > > > it from being called for a non-idle CPU (at the time it is called in there
> > > > the target CPU may not be idle any more AFAICS).
> > >
> > > You are correct, but let me clarify things.
> > >
> > > We are calling this new API from the new genpd governor, which may
> > > have a cpumask indicating there is more than one CPU attached to its
> > > PM domain+sub-PM domains. In other words, we may call the API for
> > > another CPU than the one we are executing on.
> > >
> > > When the new genpd governor is called, all CPUs in the cpumask of the
> > > genpd in question, are already runtime suspended and will remain so
> > > throughout the decisions made by the governor.
> > >
> > > However, because of the race condition, which needs to be manged by
> > > the genpd backend driver and its corresponding FW, one of the CPU in
> > > the genpd cpumask could potentially wake up from idle when the genpd
> > > governor runs. However, as a part of exiting from idle, that CPU needs
> > > to wait for the call to pm_runtime_get_sync() to return before
> > > completing the exit patch of idle. This also means waiting for the
> > > genpd governor to finish.
> >
> > OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off.
>
> Correct.
>
> This is the part that is not very nice, but ideally it should be a
> rather rare condition as it only happens during the last man standing
> point.
>
> >
> > > The point is, no matter what decision the governor takes under these
> > > circumstances, the genpd backend driver and its FW must manage this
> > > race condition during the last man standing. For PSCI OSI mode, it
> > > means that if a cluster idle state is suggested by Linux during these
> > > circumstances, it must be prevented and aborted.
> >
> > I would suggest putting a comment to explain that somewhere as it is
> > not really obvious.
>
> Let me see if can squeeze in that somewhere, probably it's best suited
> in the new genpd governor code somewhere.
>
> >
> > > >
> > > > In principle, the cpuidle core can store this value, say in struct
> > > > cpuidle_device of the given CPU, and expose a helper to access it from
> > > > genpd, but that would be extra overhead totally unnecessary on everthing
> > > > that doesn't use genpd for cpuidle.
> > > >
> > > > So maybe the driver could store it in its ->enter callback? After all,
> > > > the driver knows that genpd is going to be used later.
> > >
> > > This would work, but it wouldn't really change much when it comes to
> > > the race condition described above.
> >
> > No, it wouldn't make the race go away.
> >
> > > Of course it would turn the code
> > > into being more cpuidle specific, which seems reasonable to me.
> > >
> > > Anyway, if I understand your suggestion, in principle it means
> > > changing $subject patch in such way that the API should not take "int
> > > cpu" as an in-parameter, but instead only use __this_cpu() to read out
> > > the next event for current idle CPU.
> >
> > Yes.
I have looked closer to this and it turned out that it seems that I
should probably not need introduce an entirely new thing here. Instead
I should likely be able to re-factor the current
tick_nohz_get_sleep_length() and tick_nohz_next_event(), as those are
in principle doing the similar things as I need. So I started hacking
on that, when Daniel Lezcano told me that he already have a patch
doing exactly what I want. :-) However, in the context of his "next
wakeup prediction" work, but that shouldn't matter.
If I can make it work, I will fold in his patch in the next version of
the series instead.
Please tell if you already at this point, see any issues with this approach.
> >
> > > Additionally, we need another new cpuidle API, which genpd can call to
> > > retrieve a new per CPU "next event data" stored by the cpuidle driver
> > > from its ->enter() callback. Is this a correct interpretation of your
> > > suggestion?
> >
> > Yes, it is.
>
> Thanks for confirming!
>
> >
> > Generally, something like "cpuidle, give me the wakeup time of this
> > CPU". And it may very well give you 0 if the CPU has woken up
> > already. :-)
>
> Yep, I was thinking something like that, so in principle it may
> minimize the window of receiving in-correct "next wakeup data" in
> genpd for a non-idle CPU, but again it doesn't solve the race
> condition.
>
Kind regards
Uffe
On Fri, Jan 25, 2019 at 11:04 AM Ulf Hansson <[email protected]> wrote:
>
> [...]
>
> > > > > > +/**
> > > > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU
> > > > > > + * @cpu: the particular CPU to get next wake up for
> > > > > > + *
> > > > > > + * Called for idle CPUs only.
> > > > > > + */
> > > > > > +ktime_t tick_nohz_get_next_wakeup(int cpu)
> > > > > > +{
> > > > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu);
> > > > > > +
> > > > > > + return dev->next_event;
> > > > > > +}
> > > > > > +
> > > > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
> > > > > > {
> > > > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> > > > > >
> > > > >
> > > > > Well, I have concerns regarding this one.
> > > > >
> > > > > I don't believe it is valid to call this new function for non-idle CPUs and
> > > > > the kerneldoc kind of says so, but the next patch doesn't actually prevent
> > > > > it from being called for a non-idle CPU (at the time it is called in there
> > > > > the target CPU may not be idle any more AFAICS).
> > > >
> > > > You are correct, but let me clarify things.
> > > >
> > > > We are calling this new API from the new genpd governor, which may
> > > > have a cpumask indicating there is more than one CPU attached to its
> > > > PM domain+sub-PM domains. In other words, we may call the API for
> > > > another CPU than the one we are executing on.
> > > >
> > > > When the new genpd governor is called, all CPUs in the cpumask of the
> > > > genpd in question, are already runtime suspended and will remain so
> > > > throughout the decisions made by the governor.
> > > >
> > > > However, because of the race condition, which needs to be manged by
> > > > the genpd backend driver and its corresponding FW, one of the CPU in
> > > > the genpd cpumask could potentially wake up from idle when the genpd
> > > > governor runs. However, as a part of exiting from idle, that CPU needs
> > > > to wait for the call to pm_runtime_get_sync() to return before
> > > > completing the exit patch of idle. This also means waiting for the
> > > > genpd governor to finish.
> > >
> > > OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off.
> >
> > Correct.
> >
> > This is the part that is not very nice, but ideally it should be a
> > rather rare condition as it only happens during the last man standing
> > point.
> >
> > >
> > > > The point is, no matter what decision the governor takes under these
> > > > circumstances, the genpd backend driver and its FW must manage this
> > > > race condition during the last man standing. For PSCI OSI mode, it
> > > > means that if a cluster idle state is suggested by Linux during these
> > > > circumstances, it must be prevented and aborted.
> > >
> > > I would suggest putting a comment to explain that somewhere as it is
> > > not really obvious.
> >
> > Let me see if can squeeze in that somewhere, probably it's best suited
> > in the new genpd governor code somewhere.
> >
> > >
> > > > >
> > > > > In principle, the cpuidle core can store this value, say in struct
> > > > > cpuidle_device of the given CPU, and expose a helper to access it from
> > > > > genpd, but that would be extra overhead totally unnecessary on everthing
> > > > > that doesn't use genpd for cpuidle.
> > > > >
> > > > > So maybe the driver could store it in its ->enter callback? After all,
> > > > > the driver knows that genpd is going to be used later.
> > > >
> > > > This would work, but it wouldn't really change much when it comes to
> > > > the race condition described above.
> > >
> > > No, it wouldn't make the race go away.
> > >
> > > > Of course it would turn the code
> > > > into being more cpuidle specific, which seems reasonable to me.
> > > >
> > > > Anyway, if I understand your suggestion, in principle it means
> > > > changing $subject patch in such way that the API should not take "int
> > > > cpu" as an in-parameter, but instead only use __this_cpu() to read out
> > > > the next event for current idle CPU.
> > >
> > > Yes.
>
> I have looked closer to this and it turned out that it seems that I
> should probably not need introduce an entirely new thing here. Instead
> I should likely be able to re-factor the current
> tick_nohz_get_sleep_length() and tick_nohz_next_event(), as those are
> in principle doing the similar things as I need. So I started hacking
> on that, when Daniel Lezcano told me that he already have a patch
> doing exactly what I want. :-) However, in the context of his "next
> wakeup prediction" work, but that shouldn't matter.
>
> If I can make it work, I will fold in his patch in the next version of
> the series instead.
>
> Please tell if you already at this point, see any issues with this approach.
Not in principle as long as you do that in the context of the cpuidle
framework. That is, I still would like to have this "cpuidle, give
me the wakeup time of this CPU" I/F to the genpd governor, but you can
do the above to implement it as long as I'm concerned.