This series enables support for hierarchical CPU arrangement, managed by PSCI
for ARM/ARM64. It's based on using the generic PM domain (genpd), which
recently was extended to manage devices belonging to CPUs.
The last two DTS patches enables the hierarchical topology to be used for the
Qcom 410c Dragonboard and the Hisilicon Hikey board. The former uses PSCI OS-
initiated mode, while the latter uses the PSCI Platform-Coordinated mode. In
other words, the hierarchical description of the topology in DT, is orthogonal
to the supported PSCI CPU suspend mode.
Do note, these patches have been posted earlier, but then being part of bigger
series, which at that point also included the needed infrastructure changes to
genpd and cpuidle. Rather than continue to carry the old version history,
which may be a bit confusing, I decided to start over. Although, for clarity,
the changelog below explains what changes that have been made since the last
submission was made.
Changes since last submission:
- Converted to use dev_pm_domain_attach_by_name() rather than
dev_pm_domain_attach(),when attaching a CPU to its PM domain. This is done to
cope with multiple PM domains per CPU, if that turns out to be needed in the
future. Changes mainly consisted of storing the returned struct device* from
dev_pm_domain_attach_by_name() into a per CPU struct.
- Due to above changes, some simplification of the code became possible, in
particular the deployment of runtime PM became a bit nicer, I think.
- Moved some of the new code inside "#ifdef CONFIG_CPU_IDLE".
- Addressed various comments for each patch.
The series is also available at:
git.linaro.org/people/ulf.hansson/linux-pm.git next
More background (if you are still awake):
For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
From a CPU idle state perspective, this means some states may be shared among a
group of CPUs (aka CPU cluster).
To deal with idle management of a group of CPUs, sometimes the kernel needs to
be involved to manage the last-man standing algorithm, simply because it can't
rely solely on power management FWs to deal with this. Depending on the
platform, of course.
There are a couple of typical scenarios for when the kernel needs to be in
control, dealing with synchronization of when the last CPU in a cluster is about
to enter a deep idle state.
1)
The kernel needs to carry out so called last-man activities before the
CPU cluster can enter a deep idle state. This may for example involve to
configure external logics for wakeups, as the GIC may no longer be functional
once a deep cluster idle state have been entered. Likewise, these operations
may need to be restored, when the first CPU wakes up.
2)
Other more generic I/O devices, such as an MMC controller for example, may be a
part of the same power domain as the CPU cluster, due to a shared power-rail.
For these scenarios, when the MMC controller is in use dealing with an MMC
request, a deeper idle state of the CPU cluster may needs to be temporarily
disabled. This is needed to retain the MMC controller in a functional state,
else it may loose its register-context in the middle of serving a request.
Kind regards
Ulf Hansson
Lina Iyer (4):
dt: psci: Update DT bindings to support hierarchical PSCI states
cpuidle: dt: Support hierarchical CPU idle states
drivers: firmware: psci: Support hierarchical CPU idle states
arm64: dts: Convert to the hierarchical CPU topology layout for
MSM8916
Ulf Hansson (14):
of: base: Add of_get_cpu_state_node() to get idle states for a CPU
node
ARM/ARM64: cpuidle: Let back-end init ops take the driver as input
drivers: firmware: psci: Simplify state node parsing
drivers: firmware: psci: Prepare to use OS initiated suspend mode
drivers: firmware: psci: Prepare to support PM domains
drivers: firmware: psci: Add support for PM domains using genpd
drivers: firmware: psci: Add hierarchical domain idle states converter
drivers: firmware: psci: Introduce psci_dt_topology_init()
drivers: firmware: psci: Add a helper to attach a CPU to its PM domain
drivers: firmware: psci: Attach the CPU's device to its PM domain
drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
drivers: firmware: psci: Support CPU hotplug for the hierarchical
model
arm64: kernel: Respect the hierarchical CPU topology in DT for PSCI
arm64: dts: hikey: Convert to the hierarchical CPU topology layout
.../devicetree/bindings/arm/psci.txt | 166 ++++++++
arch/arm/include/asm/cpuidle.h | 4 +-
arch/arm/kernel/cpuidle.c | 5 +-
arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 +++-
arch/arm64/boot/dts/qcom/msm8916.dtsi | 57 ++-
arch/arm64/include/asm/cpu_ops.h | 4 +-
arch/arm64/include/asm/cpuidle.h | 6 +-
arch/arm64/kernel/cpuidle.c | 6 +-
arch/arm64/kernel/setup.c | 3 +
drivers/cpuidle/cpuidle-arm.c | 2 +-
drivers/cpuidle/dt_idle_states.c | 5 +-
drivers/firmware/psci/Makefile | 2 +-
drivers/firmware/psci/psci.c | 219 ++++++++--
drivers/firmware/psci/psci.h | 29 ++
drivers/firmware/psci/psci_pm_domain.c | 403 ++++++++++++++++++
drivers/of/base.c | 36 ++
drivers/soc/qcom/spm.c | 3 +-
include/linux/of.h | 8 +
include/linux/psci.h | 6 +-
19 files changed, 987 insertions(+), 64 deletions(-)
create mode 100644 drivers/firmware/psci/psci.h
create mode 100644 drivers/firmware/psci/psci_pm_domain.c
--
2.17.1
From: Lina Iyer <[email protected]>
Update DT bindings to represent hierarchical CPU and CPU PM domain idle
states for PSCI. Also update the PSCI examples to clearly show how
flattened and hierarchical idle states can be represented in DT.
Signed-off-by: Lina Iyer <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Sudeep Holla <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
.../devicetree/bindings/arm/psci.txt | 166 ++++++++++++++++++
1 file changed, 166 insertions(+)
diff --git a/Documentation/devicetree/bindings/arm/psci.txt b/Documentation/devicetree/bindings/arm/psci.txt
index a2c4f1d52492..e6d3553c8df8 100644
--- a/Documentation/devicetree/bindings/arm/psci.txt
+++ b/Documentation/devicetree/bindings/arm/psci.txt
@@ -105,7 +105,173 @@ Case 3: PSCI v0.2 and PSCI v0.1.
...
};
+ARM systems can have multiple cores sometimes in hierarchical arrangement.
+This often, but not always, maps directly to the processor power topology of
+the system. Individual nodes in a topology have their own specific power states
+and can be better represented in DT hierarchically.
+
+For these cases, the definitions of the idle states for the CPUs and the CPU
+topology, must conform to the domain idle state specification [3]. The domain
+idle states themselves, must be compatible with the defined 'domain-idle-state'
+binding [1], and also need to specify the arm,psci-suspend-param property for
+each idle state.
+
+DT allows representing CPUs and CPU idle states in two different ways -
+
+The flattened model as given in Example 1, lists CPU's idle states followed by
+the domain idle state that the CPUs may choose. Note that the idle states are
+all compatible with "arm,idle-state". Additionally, for the domain idle state
+the "arm,psci-suspend-param" represents a superset of the CPU's idle state.
+
+Example 2 represents the hierarchical model of CPUs and domain idle states.
+CPUs define their domain provider in their psci DT node. The domain controls
+the power to the CPU and possibly other h/w blocks that would enter an idle
+state along with the CPU. The CPU's idle states may therefore be considered as
+the domain's idle states and have the compatible "arm,idle-state". Such domains
+may also be embedded within another domain that may represent common h/w blocks
+between these CPUs. The idle states of the CPU topology shall be represented as
+the domain's idle states. Note that for the domain idle state, the
+"arm,psci-suspend-param" represents idle states hierarchically.
+
+In PSCI firmware v1.0, the OS-Initiated mode is introduced. However, the
+flattened vs hierarchical DT representation is orthogonal to the OS-Initiated
+vs the platform-coordinated PSCI CPU suspend modes, thus should be considered
+independent of each other.
+
+The hierarchical representation helps and makes it easy to implement OSI mode
+and OS implementations may choose to mandate it. For the default platform-
+coordinated mode, both representations are viable options.
+
+Example 1: Flattened representation of CPU and domain idle states
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a53", "arm,armv8";
+ reg = <0x0>;
+ enable-method = "psci";
+ cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
+ <&CLUSTER_PWRDN>;
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57", "arm,armv8";
+ reg = <0x100>;
+ enable-method = "psci";
+ cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
+ <&CLUSTER_PWRDN>;
+ };
+
+ idle-states {
+ CPU_PWRDN: cpu-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x0000001>;
+ entry-latency-us = <10>;
+ exit-latency-us = <10>;
+ min-residency-us = <100>;
+ };
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x1000011>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x1000031>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
+ };
+
+ psci {
+ compatible = "arm,psci-0.2";
+ method = "smc";
+ };
+
+Example 2: Hierarchical representation of CPU and domain idle states
+
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a53", "arm,armv8";
+ reg = <0x0>;
+ enable-method = "psci";
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57", "arm,armv8";
+ reg = <0x100>;
+ enable-method = "psci";
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
+ };
+
+ idle-states {
+ CPU_PWRDN: cpu-power-down {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x0000001>;
+ entry-latency-us = <10>;
+ exit-latency-us = <10>;
+ min-residency-us = <100>;
+ };
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000010>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-power-down {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000030>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
+ };
+ };
+
+ psci {
+ compatible = "arm,psci-1.0";
+ method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CPU_PWRDN>;
+ power-domains = <&CLUSTER_PD>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CPU_PWRDN>;
+ power-domains = <&CLUSTER_PD>;
+ };
+
+ CLUSTER_PD: cluster-pd {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
+ };
+ };
+
[1] Kernel documentation - ARM idle states bindings
Documentation/devicetree/bindings/arm/idle-states.txt
[2] Power State Coordination Interface (PSCI) specification
http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf
+[3]. PM Domains description
+ Documentation/devicetree/bindings/power/power_domain.txt
--
2.17.1
Instead of iterating through all the state nodes in DT, to find out how
many states that needs to be allocated, let's use the number already known
by the cpuidle driver. In this way we can drop the iteration altogether.
Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
Changes:
- None.
---
drivers/firmware/psci/psci.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 88e90e0f06b9..9c2180bcee4c 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -306,26 +306,20 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
struct device_node *cpu_node, int cpu)
{
- int i, ret = 0, count = 0;
+ int i, ret = 0, num_state_nodes = drv->state_count - 1;
u32 *psci_states;
struct device_node *state_node;
- /* Count idle states */
- while ((state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
- count))) {
- count++;
- of_node_put(state_node);
- }
-
- if (!count)
- return -ENODEV;
-
- psci_states = kcalloc(count, sizeof(*psci_states), GFP_KERNEL);
+ psci_states = kcalloc(num_state_nodes, sizeof(*psci_states),
+ GFP_KERNEL);
if (!psci_states)
return -ENOMEM;
- for (i = 0; i < count; i++) {
+ for (i = 0; i < num_state_nodes; i++) {
state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ if (!state_node)
+ break;
+
ret = psci_dt_parse_state_node(state_node, &psci_states[i]);
of_node_put(state_node);
@@ -335,6 +329,11 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
pr_debug("psci-power-state %#x index %d\n", psci_states[i], i);
}
+ if (i != num_state_nodes) {
+ ret = -ENODEV;
+ goto free_mem;
+ }
+
/* Idle states parsed correctly, initialize per-cpu pointer */
per_cpu(psci_power_state, cpu) = psci_states;
return 0;
--
2.17.1
From: Lina Iyer <[email protected]>
Currently CPU's idle states are represented in a flattened model, via the
"cpu-idle-states" binding from within the CPU's device nodes.
Support the hierarchical layout, simply by converting to calling the new OF
helper, of_get_cpu_state_node().
Suggested-by: Sudeep Holla <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
drivers/firmware/psci/psci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 9c2180bcee4c..b11560f7c4b9 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -316,7 +316,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
return -ENOMEM;
for (i = 0; i < num_state_nodes; i++) {
- state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ state_node = of_get_cpu_state_node(cpu_node, i);
if (!state_node)
break;
--
2.17.1
The per CPU variable psci_power_state, contains an array of fixed values,
which reflects the corresponding arm,psci-suspend-param parsed from DT, for
each of the available CPU idle states.
This isn't sufficient when using the hierarchical CPU topology in DT in
combination with having PSCI OS initiated (OSI) mode enabled. More
precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
idle state the cluster (a group of CPUs) should enter, while in PSCI
Platform Coordinated (PC) mode, each CPU independently votes for an idle
state of the cluster.
For this reason, let's introduce an additional per CPU variable called
domain_state and implement two helper functions to read/write its values.
Following patches, which implements PM domain support for PSCI, will use
the domain_state variable and set it to corresponding bits that represents
the selected idle state for the cluster.
Finally, in psci_cpu_suspend_enter() and psci_suspend_finisher(), let's
take into account the values in the domain_state, as to get the complete
suspend parameter.
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Clarify changelog.
- Drop changes in psci_cpu_on() as it belongs in the patch for hotplug.
- Move some code inside "#ifdef CONFIG_CPU_IDLE".
---
drivers/firmware/psci/psci.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index b11560f7c4b9..4aec513136e4 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -285,6 +285,17 @@ static int __init psci_features(u32 psci_func_id)
#ifdef CONFIG_CPU_IDLE
static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+static DEFINE_PER_CPU(u32, domain_state);
+
+static inline u32 psci_get_domain_state(void)
+{
+ return __this_cpu_read(domain_state);
+}
+
+static inline void psci_set_domain_state(u32 state)
+{
+ __this_cpu_write(domain_state, state);
+}
static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
{
@@ -420,15 +431,17 @@ int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
static int psci_suspend_finisher(unsigned long index)
{
u32 *state = __this_cpu_read(psci_power_state);
+ u32 composite_state = state[index - 1] | psci_get_domain_state();
- return psci_ops.cpu_suspend(state[index - 1],
- __pa_symbol(cpu_resume));
+ return psci_ops.cpu_suspend(composite_state, __pa_symbol(cpu_resume));
}
int psci_cpu_suspend_enter(unsigned long index)
{
int ret;
u32 *state = __this_cpu_read(psci_power_state);
+ u32 composite_state = state[index - 1] | psci_get_domain_state();
+
/*
* idle state index 0 corresponds to wfi, should never be called
* from the cpu_suspend operations
@@ -436,11 +449,14 @@ int psci_cpu_suspend_enter(unsigned long index)
if (WARN_ON_ONCE(!index))
return -EINVAL;
- if (!psci_power_state_loses_context(state[index - 1]))
- ret = psci_ops.cpu_suspend(state[index - 1], 0);
+ if (!psci_power_state_loses_context(composite_state))
+ ret = psci_ops.cpu_suspend(composite_state, 0);
else
ret = cpu_suspend(index, psci_suspend_finisher);
+ /* Clear the domain state to start fresh when back from idle. */
+ psci_set_domain_state(0);
+
return ret;
}
--
2.17.1
To be able to initiate the PM domain data structures for PSCI at a specific
point during boot, let's export a new init function,
psci_dt_topology_init(). If CONFIG_CPU_IDLE is set, it calls
psci_dt_init_pm_domains(), which performs the actual initialization.
Note that, it may seem like feasible idea to hook into the existing
psci_dt_init() function, rather than adding a new separate init function.
However, this doesn't work because psci_dt_init() is called early in the
boot sequence, when allocating dynamic data structures isn't yet possible.
Subsequent changes calls this new init function.
Finally, following changes on top needs to know whether the hierarchical PM
domain topology is used or not. Therefore, let's store this information in
an internal PSCI flag.
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Moved some code inside "#ifdef CONFIG_CPU_IDLE".
- Updated changelog.
---
drivers/firmware/psci/psci.c | 30 ++++++++++++++++++++++++++++++
include/linux/psci.h | 2 ++
2 files changed, 32 insertions(+)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index bfef300b7ebe..28745234b53f 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -297,6 +297,7 @@ static int __init psci_features(u32 psci_func_id)
#ifdef CONFIG_CPU_IDLE
static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
static DEFINE_PER_CPU(u32, domain_state);
+static bool psci_dt_topology;
static inline u32 psci_get_domain_state(void)
{
@@ -480,6 +481,19 @@ static const struct cpuidle_ops psci_cpuidle_ops __initconst = {
CPUIDLE_METHOD_OF_DECLARE(psci, "psci", &psci_cpuidle_ops);
#endif
+
+static int __init _psci_dt_topology_init(struct device_node *np)
+{
+ int ret;
+
+ /* Initialize the CPU PM domains based on topology described in DT. */
+ ret = psci_dt_init_pm_domains(np);
+ psci_dt_topology = ret > 0;
+
+ return ret;
+}
+#else
+static inline int _psci_dt_topology_init(struct device_node *np) { return 0; }
#endif
static int psci_system_suspend(unsigned long unused)
@@ -758,6 +772,22 @@ int __init psci_dt_init(void)
return ret;
}
+int __init psci_dt_topology_init(void)
+{
+ struct device_node *np;
+ int ret;
+
+ np = of_find_matching_node_and_match(NULL, psci_of_match, NULL);
+ if (!np)
+ return -ENODEV;
+
+ /* Initialize the topology described in DT. */
+ ret = _psci_dt_topology_init(np);
+
+ of_node_put(np);
+ return ret;
+}
+
#ifdef CONFIG_ACPI
/*
* We use PSCI 0.2+ when ACPI is deployed on ARM64 and it's
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 4f29a3bff379..16beccccbbcc 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -55,8 +55,10 @@ extern struct psci_operations psci_ops;
#if defined(CONFIG_ARM_PSCI_FW)
int __init psci_dt_init(void);
+int __init psci_dt_topology_init(void);
#else
static inline int psci_dt_init(void) { return 0; }
+static inline int psci_dt_topology_init(void) { return 0; }
#endif
#if defined(CONFIG_ARM_PSCI_FW) && defined(CONFIG_ACPI)
--
2.17.1
To enable the OS to manage last-man standing activities for a CPU, while an
idle state for a group of CPUs is selected, let's convert the Hikey
platform into using the hierarchical CPU topology layout.
Cc: Wei Xu <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 ++++++++++++++++++++---
1 file changed, 76 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
index 108e2a4227f6..36ff460f428f 100644
--- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
@@ -20,6 +20,64 @@
psci {
compatible = "arm,psci-0.2";
method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD2: cpu-pd2 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD3: cpu-pd3 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD0>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD4: cpu-pd4 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD5: cpu-pd5 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD6: cpu-pd6 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CPU_PD7: cpu-pd7 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD1>;
+ domain-idle-states = <&CPU_SLEEP>;
+ };
+
+ CLUSTER_PD0: cluster-pd0 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_SLEEP>;
+ };
+
+ CLUSTER_PD1: cluster-pd1 {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_SLEEP>;
+ };
};
cpus {
@@ -70,9 +128,8 @@
};
CLUSTER_SLEEP: cluster-sleep {
- compatible = "arm,idle-state";
- local-timer-stop;
- arm,psci-suspend-param = <0x1010000>;
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000000>;
entry-latency-us = <1000>;
exit-latency-us = <700>;
min-residency-us = <2700>;
@@ -88,9 +145,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
};
cpu1: cpu@1 {
@@ -101,9 +159,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
};
cpu2: cpu@2 {
@@ -114,9 +173,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD2>;
+ power-domain-names = "psci";
};
cpu3: cpu@3 {
@@ -127,9 +187,10 @@
next-level-cache = <&CLUSTER0_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD3>;
+ power-domain-names = "psci";
};
cpu4: cpu@100 {
@@ -140,9 +201,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD4>;
+ power-domain-names = "psci";
};
cpu5: cpu@101 {
@@ -153,9 +215,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD5>;
+ power-domain-names = "psci";
};
cpu6: cpu@102 {
@@ -166,9 +229,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD6>;
+ power-domain-names = "psci";
};
cpu7: cpu@103 {
@@ -179,9 +243,10 @@
next-level-cache = <&CLUSTER1_L2>;
clocks = <&stub_clock 0>;
operating-points-v2 = <&cpu_opp_table>;
- cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
#cooling-cells = <2>; /* min followed by max */
dynamic-power-coefficient = <311>;
+ power-domains = <&CPU_PD7>;
+ power-domain-names = "psci";
};
CLUSTER0_L2: l2-cache0 {
--
2.17.1
The CPU's idle state nodes are currently parsed at the common cpuidle DT
library, but also when initializing back-end data for the arch specific CPU
operations, as in the PSCI driver case.
To avoid open-coding, let's introduce of_get_cpu_state_node(), which takes
the device node for the CPU and the index to the requested idle state node,
as in-parameters. In case a corresponding idle state node is found, it
returns the node with the refcount incremented for it, else it returns
NULL.
Moreover, for ARM, there are two generic methods, to describe the CPU's
idle states, either via the flattened description through the
"cpu-idle-states" binding [1] or via the hierarchical layout, using the
"power-domains" and the "domain-idle-states" bindings [2]. Hence, let's
take both options into account.
[1]
Documentation/devicetree/bindings/arm/idle-states.txt
[2]
Documentation/devicetree/bindings/arm/psci.txt
Suggested-by: Sudeep Holla <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Daniel Lezcano <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Fixed some kernel docs typos.
- Fall-back to use "cpu-idle-states" when "power-domains" is present but
"domain-idle-states" is missing.
---
drivers/of/base.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/of.h | 8 ++++++++
2 files changed, 44 insertions(+)
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 20e0e7ee4edf..05866f0c65b4 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -477,6 +477,42 @@ int of_cpu_node_to_id(struct device_node *cpu_node)
}
EXPORT_SYMBOL(of_cpu_node_to_id);
+/**
+ * of_get_cpu_state_node - Get CPU's idle state node at the given index
+ *
+ * @cpu_node: The device node for the CPU
+ * @index: The index in the list of the idle states
+ *
+ * Two generic methods can be used to describe a CPU's idle states, either via
+ * a flattened description through the "cpu-idle-states" binding or via the
+ * hierarchical layout, using the "power-domains" and the "domain-idle-states"
+ * bindings. This function check for both and returns the idle state node for
+ * the requested index.
+ *
+ * In case an idle state node is found at @index, the refcount is incremented
+ * for it, so call of_node_put() on it when done. Returns NULL if not found.
+ */
+struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index)
+{
+ struct of_phandle_args args;
+ int err;
+
+ err = of_parse_phandle_with_args(cpu_node, "power-domains",
+ "#power-domain-cells", 0, &args);
+ if (!err) {
+ struct device_node *state_node =
+ of_parse_phandle(args.np, "domain-idle-states", index);
+
+ of_node_put(args.np);
+ if (state_node)
+ return state_node;
+ }
+
+ return of_parse_phandle(cpu_node, "cpu-idle-states", index);
+}
+EXPORT_SYMBOL(of_get_cpu_state_node);
+
/**
* __of_device_is_compatible() - Check if the node matches given constraints
* @device: pointer to node
diff --git a/include/linux/of.h b/include/linux/of.h
index 0cf857012f11..6ae5c2c4b104 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -351,6 +351,8 @@ extern const void *of_get_property(const struct device_node *node,
int *lenp);
extern struct device_node *of_get_cpu_node(int cpu, unsigned int *thread);
extern struct device_node *of_get_next_cpu_node(struct device_node *prev);
+extern struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index);
#define for_each_property_of_node(dn, pp) \
for (pp = dn->properties; pp != NULL; pp = pp->next)
@@ -765,6 +767,12 @@ static inline struct device_node *of_get_next_cpu_node(struct device_node *prev)
return NULL;
}
+static inline struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,
+ int index)
+{
+ return NULL;
+}
+
static inline int of_n_addr_cells(struct device_node *np)
{
return 0;
--
2.17.1
From: Lina Iyer <[email protected]>
Currently CPU's idle states are represented in a flattened model, via the
"cpu-idle-states" binding from within the CPU's device nodes.
Support the hierarchical layout during parsing and validating of the CPU's
idle states. This is simply done by calling the new OF helper,
of_get_cpu_state_node().
Suggested-by: Sudeep Holla <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Reviewed-by: Daniel Lezcano <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
drivers/cpuidle/dt_idle_states.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/cpuidle/dt_idle_states.c b/drivers/cpuidle/dt_idle_states.c
index add9569636b5..97ad25399ca8 100644
--- a/drivers/cpuidle/dt_idle_states.c
+++ b/drivers/cpuidle/dt_idle_states.c
@@ -114,8 +114,7 @@ static bool idle_state_valid(struct device_node *state_node, unsigned int idx,
for (cpu = cpumask_next(cpumask_first(cpumask), cpumask);
cpu < nr_cpu_ids; cpu = cpumask_next(cpu, cpumask)) {
cpu_node = of_cpu_device_node_get(cpu);
- curr_state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
- idx);
+ curr_state_node = of_get_cpu_state_node(cpu_node, idx);
if (state_node != curr_state_node)
valid = false;
@@ -173,7 +172,7 @@ int dt_init_idle_driver(struct cpuidle_driver *drv,
cpu_node = of_cpu_device_node_get(cpumask_first(cpumask));
for (i = 0; ; i++) {
- state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
+ state_node = of_get_cpu_state_node(cpu_node, i);
if (!state_node)
break;
--
2.17.1
Introduce a new PSCI DT helper function, psci_dt_attach_cpu(), which takes
a CPU number as an in-parameter and tries to attach the CPU's struct device
to its corresponding PM domain. Let's use dev_pm_domain_attach_by_name(),
as to be able to specify "psci" as the required "power-domain-names".
Additionally, let the helper prepare the CPU to be power managed via
runtime PM.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Take into account whether the CPU is online/offline.
- Convert from dev_pm_domain_attach() into using the more future proof
dev_pm_domain_attach_by_name().
---
drivers/firmware/psci/psci.h | 3 +++
drivers/firmware/psci/psci_pm_domain.c | 17 +++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index c36e0e6649e9..b4254405b8ef 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -4,6 +4,7 @@
#define __PSCI_H
struct cpuidle_driver;
+struct device;
struct device_node;
int psci_set_osi_mode(void);
@@ -16,10 +17,12 @@ int psci_dt_parse_state_node(struct device_node *np, u32 *state);
int psci_dt_init_pm_domains(struct device_node *np);
int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
struct device_node *cpu_node, u32 *psci_states);
+struct device *psci_dt_attach_cpu(int cpu);
#else
static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
static inline int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
struct device_node *cpu_node, u32 *psci_states) { return 0; }
+static inline struct device *psci_dt_attach_cpu(int cpu) { return NULL; }
#endif
#endif
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
index 3aa645dba81b..1cbe745ee001 100644
--- a/drivers/firmware/psci/psci_pm_domain.c
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -12,8 +12,10 @@
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/cpu.h>
#include <linux/cpuidle.h>
#include <linux/cpu_pm.h>
@@ -383,4 +385,19 @@ int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
return 0;
}
+
+struct device *psci_dt_attach_cpu(int cpu)
+{
+ struct device *dev, *cpu_dev = get_cpu_device(cpu);
+
+ dev = dev_pm_domain_attach_by_name(cpu_dev, "psci");
+ if (IS_ERR_OR_NULL(dev))
+ return dev;
+
+ pm_runtime_irq_safe(dev);
+ if (cpu_online(cpu))
+ pm_runtime_get_sync(dev);
+
+ return dev;
+}
#endif
--
2.17.1
To allow arch back-end init ops to operate on the cpuidle driver for the
corresponding CPU, let's pass along a pointer to the struct cpuidle_driver*
and forward it the relevant layers of callbacks for ARM/ARM64.
Following changes for the PSCI firmware driver starts making use of this.
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andy Gross <[email protected]>
Cc: David Brown <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
Changes:
- None.
Note:
- This patch is needed by the subsequent patch, but more importantly,
also by "[PATCH 10/18] drivers: firmware: psci: Add hierarchical
domain idle states converter".
---
arch/arm/include/asm/cpuidle.h | 4 ++--
arch/arm/kernel/cpuidle.c | 5 +++--
arch/arm64/include/asm/cpu_ops.h | 4 +++-
arch/arm64/include/asm/cpuidle.h | 6 ++++--
arch/arm64/kernel/cpuidle.c | 6 +++---
drivers/cpuidle/cpuidle-arm.c | 2 +-
drivers/firmware/psci/psci.c | 7 ++++---
drivers/soc/qcom/spm.c | 3 ++-
include/linux/psci.h | 4 +++-
9 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 6b2ff7243b4b..bee0a7847733 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -32,7 +32,7 @@ struct device_node;
struct cpuidle_ops {
int (*suspend)(unsigned long arg);
- int (*init)(struct device_node *, int cpu);
+ int (*init)(struct cpuidle_driver *, struct device_node *, int cpu);
};
struct of_cpuidle_method {
@@ -47,6 +47,6 @@ struct of_cpuidle_method {
extern int arm_cpuidle_suspend(int index);
-extern int arm_cpuidle_init(int cpu);
+extern int arm_cpuidle_init(struct cpuidle_driver *drv, int cpu);
#endif
diff --git a/arch/arm/kernel/cpuidle.c b/arch/arm/kernel/cpuidle.c
index fda5579123a8..43778c9b373d 100644
--- a/arch/arm/kernel/cpuidle.c
+++ b/arch/arm/kernel/cpuidle.c
@@ -122,6 +122,7 @@ static int __init arm_cpuidle_read_ops(struct device_node *dn, int cpu)
/**
* arm_cpuidle_init() - Initialize cpuidle_ops for a specific cpu
+ * @drv: the drv to be initialized
* @cpu: the cpu to be initialized
*
* Initialize the cpuidle ops with the device for the cpu and then call
@@ -137,7 +138,7 @@ static int __init arm_cpuidle_read_ops(struct device_node *dn, int cpu)
* -ENXIO if the HW reports a failure or a misconfiguration,
* -ENOMEM if the HW report an memory allocation failure
*/
-int __init arm_cpuidle_init(int cpu)
+int __init arm_cpuidle_init(struct cpuidle_driver *drv, int cpu)
{
struct device_node *cpu_node = of_cpu_device_node_get(cpu);
int ret;
@@ -147,7 +148,7 @@ int __init arm_cpuidle_init(int cpu)
ret = arm_cpuidle_read_ops(cpu_node, cpu);
if (!ret)
- ret = cpuidle_ops[cpu].init(cpu_node, cpu);
+ ret = cpuidle_ops[cpu].init(drv, cpu_node, cpu);
of_node_put(cpu_node);
diff --git a/arch/arm64/include/asm/cpu_ops.h b/arch/arm64/include/asm/cpu_ops.h
index 8f03446cf89f..8db870c29f1b 100644
--- a/arch/arm64/include/asm/cpu_ops.h
+++ b/arch/arm64/include/asm/cpu_ops.h
@@ -19,6 +19,8 @@
#include <linux/init.h>
#include <linux/threads.h>
+struct cpuidle_driver;
+
/**
* struct cpu_operations - Callback operations for hotplugging CPUs.
*
@@ -58,7 +60,7 @@ struct cpu_operations {
int (*cpu_kill)(unsigned int cpu);
#endif
#ifdef CONFIG_CPU_IDLE
- int (*cpu_init_idle)(unsigned int);
+ int (*cpu_init_idle)(struct cpuidle_driver *, unsigned int);
int (*cpu_suspend)(unsigned long);
#endif
};
diff --git a/arch/arm64/include/asm/cpuidle.h b/arch/arm64/include/asm/cpuidle.h
index 3c5ddb429ea2..3fd3efb61649 100644
--- a/arch/arm64/include/asm/cpuidle.h
+++ b/arch/arm64/include/asm/cpuidle.h
@@ -4,11 +4,13 @@
#include <asm/proc-fns.h>
+struct cpuidle_driver;
+
#ifdef CONFIG_CPU_IDLE
-extern int arm_cpuidle_init(unsigned int cpu);
+extern int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu);
extern int arm_cpuidle_suspend(int index);
#else
-static inline int arm_cpuidle_init(unsigned int cpu)
+static inline int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu)
{
return -EOPNOTSUPP;
}
diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index f2d13810daa8..aaf9dc5cb87a 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -18,13 +18,13 @@
#include <asm/cpuidle.h>
#include <asm/cpu_ops.h>
-int arm_cpuidle_init(unsigned int cpu)
+int arm_cpuidle_init(struct cpuidle_driver *drv, unsigned int cpu)
{
int ret = -EOPNOTSUPP;
if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend &&
cpu_ops[cpu]->cpu_init_idle)
- ret = cpu_ops[cpu]->cpu_init_idle(cpu);
+ ret = cpu_ops[cpu]->cpu_init_idle(drv, cpu);
return ret;
}
@@ -51,7 +51,7 @@ int arm_cpuidle_suspend(int index)
int acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
- return arm_cpuidle_init(cpu);
+ return arm_cpuidle_init(NULL, cpu);
}
int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c
index 3a407a3ef22b..39413973b21d 100644
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -106,7 +106,7 @@ static int __init arm_idle_init_cpu(int cpu)
* Call arch CPU operations in order to initialize
* idle states suspend back-end specific data
*/
- ret = arm_cpuidle_init(cpu);
+ ret = arm_cpuidle_init(drv, cpu);
/*
* Allow the initialization to continue for other CPUs, if the reported
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index fe090ef43d28..88e90e0f06b9 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -303,7 +303,8 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
return 0;
}
-static int psci_dt_cpu_init_idle(struct device_node *cpu_node, int cpu)
+static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, int cpu)
{
int i, ret = 0, count = 0;
u32 *psci_states;
@@ -391,7 +392,7 @@ static int __maybe_unused psci_acpi_cpu_init_idle(unsigned int cpu)
}
#endif
-int psci_cpu_init_idle(unsigned int cpu)
+int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
{
struct device_node *cpu_node;
int ret;
@@ -410,7 +411,7 @@ int psci_cpu_init_idle(unsigned int cpu)
if (!cpu_node)
return -ENODEV;
- ret = psci_dt_cpu_init_idle(cpu_node, cpu);
+ ret = psci_dt_cpu_init_idle(drv, cpu_node, cpu);
of_node_put(cpu_node);
diff --git a/drivers/soc/qcom/spm.c b/drivers/soc/qcom/spm.c
index 53807e839664..6e967f0a8608 100644
--- a/drivers/soc/qcom/spm.c
+++ b/drivers/soc/qcom/spm.c
@@ -208,7 +208,8 @@ static const struct of_device_id qcom_idle_state_match[] __initconst = {
{ },
};
-static int __init qcom_cpuidle_init(struct device_node *cpu_node, int cpu)
+static int __init qcom_cpuidle_init(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, int cpu)
{
const struct of_device_id *match_id;
struct device_node *state_node;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 8b1b3b5935ab..4f29a3bff379 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -20,9 +20,11 @@
#define PSCI_POWER_STATE_TYPE_STANDBY 0
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1
+struct cpuidle_driver;
+
bool psci_tos_resident_on(int cpu);
-int psci_cpu_init_idle(unsigned int cpu);
+int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu);
int psci_cpu_suspend_enter(unsigned long index);
enum psci_conduit {
--
2.17.1
When the hierarchical CPU topology is used and when a CPU has been put
offline (hotplug), that same CPU prevents its PM domain and thus also
potential master PM domains, from being powered off. This is because genpd
observes the CPU's attached device as being active from a runtime PM point
of view.
To deal with this, let's decrease the runtime PM usage count by calling
pm_runtime_put_sync_suspend() of the attached struct device when putting
the CPU offline. Consequentially, we must then increase the runtime PM
usage count, while putting the CPU online again.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Use get_logical_index() to find the CPU number.
- Verify that a corresponding struct device* has been attached to the
PM domain before doing runtime PM refrence counting.
- Clear the domain state when the CPU goes offline, to start fresh.
- Move code to internal helper functions and move them inside
"ifdef CONFIG_CPU_IDLE.
---
drivers/firmware/psci/psci.c | 47 +++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 2c4157d3a616..5ad93c3694b5 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -15,6 +15,7 @@
#include <linux/acpi.h>
#include <linux/arm-smccc.h>
+#include <linux/cpu.h>
#include <linux/cpuidle.h>
#include <linux/errno.h>
#include <linux/linkage.h>
@@ -93,6 +94,9 @@ static u32 psci_function_id[PSCI_FN_MAX];
static u32 psci_cpu_suspend_feature;
static bool psci_system_reset2_supported;
+static void psci_cpuidle_cpu_off(void);
+static void psci_cpuidle_cpu_on(unsigned long cpuid);
+
static inline bool psci_has_ext_power_state(void)
{
return psci_cpu_suspend_feature &
@@ -188,6 +192,8 @@ static int psci_cpu_off(u32 state)
int err;
u32 fn;
+ psci_cpuidle_cpu_off();
+
fn = psci_function_id[PSCI_FN_CPU_OFF];
err = invoke_psci_fn(fn, state, 0, 0);
return psci_to_linux_errno(err);
@@ -200,7 +206,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
fn = psci_function_id[PSCI_FN_CPU_ON];
err = invoke_psci_fn(fn, cpuid, entry_point, 0);
- return psci_to_linux_errno(err);
+ err = psci_to_linux_errno(err);
+ if (err)
+ return err;
+
+ psci_cpuidle_cpu_on(cpuid);
+
+ return 0;
}
static int psci_migrate(unsigned long cpuid)
@@ -540,8 +552,41 @@ static int __init _psci_dt_topology_init(struct device_node *np)
return ret;
}
+
+static void psci_cpuidle_cpu_off(void)
+{
+ struct device *dev = __this_cpu_read(psci_cpuidle_data.dev);
+
+ /*
+ * Drop the runtime PM usage count if the CPU has been attached to a
+ * CPU PM domain. This is needed to, for example, not prevent other
+ * master domains in the hierarchy to remain powered on.
+ */
+ if (dev)
+ pm_runtime_put_sync_suspend(dev);
+}
+
+static void psci_cpuidle_cpu_on(unsigned long cpuid)
+{
+ struct device *dev;
+ int cpu;
+
+ if (!psci_dt_topology)
+ return;
+
+ cpu = get_logical_index(cpuid);
+ if (cpu < 0)
+ return;
+
+ dev = per_cpu(psci_cpuidle_data.dev, cpu);
+ if (dev)
+ pm_runtime_get_sync(dev);
+}
+
#else
static inline int _psci_dt_topology_init(struct device_node *np) { return 0; }
+static void psci_cpuidle_cpu_off(void) {}
+static void psci_cpuidle_cpu_on(unsigned long cpuid) {}
#endif
static int psci_system_suspend(unsigned long unused)
--
2.17.1
Subsequent changes implements support for PM domains to PSCI. Those changes
are mainly implemented in a new separate c-file, hence a couple of the
internal PSCI functions needs to be shared to be accessible. Let's do that
via adding a new PSCI header file.
Moreover, to implement support for PM domains, switching the PSCI FW into
the OS initiated mode is sometimes needed. Therefore, let's share a new
function that implement this.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Convert psci_set_osi_mode() to return an int.
- Don't share psci_get_domain_state() as that's no longer needed.
- Update changelog.
---
drivers/firmware/psci/psci.c | 17 ++++++++++++++---
drivers/firmware/psci/psci.h | 16 ++++++++++++++++
2 files changed, 30 insertions(+), 3 deletions(-)
create mode 100644 drivers/firmware/psci/psci.h
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 4aec513136e4..0e91d864e346 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -34,6 +34,8 @@
#include <asm/smp_plat.h>
#include <asm/suspend.h>
+#include "psci.h"
+
/*
* While a 64-bit OS can make calls with SMC32 calling conventions, for some
* calls it is necessary to use SMC64 to pass or return 64-bit values.
@@ -96,7 +98,7 @@ static inline bool psci_has_ext_power_state(void)
PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
}
-static inline bool psci_has_osi_support(void)
+bool psci_has_osi_support(void)
{
return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
}
@@ -161,6 +163,15 @@ static u32 psci_get_version(void)
return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
}
+int psci_set_osi_mode(void)
+{
+ int err;
+
+ err = invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
+ PSCI_1_0_SUSPEND_MODE_OSI, 0, 0);
+ return psci_to_linux_errno(err);
+}
+
static int psci_cpu_suspend(u32 state, unsigned long entry_point)
{
int err;
@@ -292,12 +303,12 @@ static inline u32 psci_get_domain_state(void)
return __this_cpu_read(domain_state);
}
-static inline void psci_set_domain_state(u32 state)
+void psci_set_domain_state(u32 state)
{
__this_cpu_write(domain_state, state);
}
-static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
+int psci_dt_parse_state_node(struct device_node *np, u32 *state)
{
int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
new file mode 100644
index 000000000000..f2277c3ad405
--- /dev/null
+++ b/drivers/firmware/psci/psci.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PSCI_H
+#define __PSCI_H
+
+struct device_node;
+
+int psci_set_osi_mode(void);
+bool psci_has_osi_support(void);
+
+#ifdef CONFIG_CPU_IDLE
+void psci_set_domain_state(u32 state);
+int psci_dt_parse_state_node(struct device_node *np, u32 *state);
+#endif
+
+#endif /* __PSCI_H */
--
2.17.1
From: Lina Iyer <[email protected]>
In the hierarchical layout, we are creating power domains around each CPU
and describes the idle states for them inside the power domain provider
node. Note that, the CPU's idle states still needs to be compatible with
"arm,idle-state".
Furthermore, represent the CPU cluster as a separate master power domain,
powering the CPU's power domains. The cluster node, contains the idle
states for the cluster and each idle state needs to be compatible with the
"domain-idle-state".
If the running platform is using a PSCI FW that supports the OS initiated
CPU suspend mode, which likely should be the case unless the PSCI FW is
very old, this change triggers the PSCI driver to enable it.
Cc: Andy Gross <[email protected]>
Cc: David Brown <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Co-developed-by: Ulf Hansson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
arch/arm64/boot/dts/qcom/msm8916.dtsi | 57 +++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi b/arch/arm64/boot/dts/qcom/msm8916.dtsi
index 0803ca8c02da..1bb33f0326b5 100644
--- a/arch/arm64/boot/dts/qcom/msm8916.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi
@@ -110,10 +110,11 @@
reg = <0x0>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD0>;
+ power-domain-names = "psci";
};
CPU1: cpu@1 {
@@ -122,10 +123,11 @@
reg = <0x1>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD1>;
+ power-domain-names = "psci";
};
CPU2: cpu@2 {
@@ -134,10 +136,11 @@
reg = <0x2>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD2>;
+ power-domain-names = "psci";
};
CPU3: cpu@3 {
@@ -146,10 +149,11 @@
reg = <0x3>;
next-level-cache = <&L2_0>;
enable-method = "psci";
- cpu-idle-states = <&CPU_SPC>;
clocks = <&apcs>;
operating-points-v2 = <&cpu_opp_table>;
#cooling-cells = <2>;
+ power-domains = <&CPU_PD3>;
+ power-domain-names = "psci";
};
L2_0: l2-cache {
@@ -166,12 +170,57 @@
min-residency-us = <2000>;
local-timer-stop;
};
+
+ CLUSTER_RET: cluster-retention {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000010>;
+ entry-latency-us = <500>;
+ exit-latency-us = <500>;
+ min-residency-us = <2000>;
+ };
+
+ CLUSTER_PWRDN: cluster-gdhs {
+ compatible = "domain-idle-state";
+ arm,psci-suspend-param = <0x1000030>;
+ entry-latency-us = <2000>;
+ exit-latency-us = <2000>;
+ min-residency-us = <6000>;
+ };
};
};
psci {
compatible = "arm,psci-1.0";
method = "smc";
+
+ CPU_PD0: cpu-pd0 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD1: cpu-pd1 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD2: cpu-pd2 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CPU_PD3: cpu-pd3 {
+ #power-domain-cells = <0>;
+ power-domains = <&CLUSTER_PD>;
+ domain-idle-states = <&CPU_SPC>;
+ };
+
+ CLUSTER_PD: cluster-pd {
+ #power-domain-cells = <0>;
+ domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
+ };
};
pmu {
--
2.17.1
In order to allow the CPU to be power managed through a potential PM domain
and the corresponding topology, it needs to be attached to it. For that
reason, check if the PM domain data structures have been initiated for PSCI
and if so, let's try to attach the CPU device to its PM domain.
However, before attaching the CPU to its PM domain, we need to check
whether the PSCI firmware supports OS initiated mode or not. If that isn't
the case, we rely solely on the cpuidle framework to deal with the idle
state selection, which means we need to parse DT and convert the
hierarchical described domain idle states into regular cpuidle states,
hence let's do that.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Adapt to updated psci_dt_attach_cpu() helper, as it now returns a
struct device * instead of an int.
- Create a per CPU struct, to store the relevant PSCI cpuidle data.
---
drivers/firmware/psci/psci.c | 46 +++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 28745234b53f..54e23d4ed0ea 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -295,7 +295,13 @@ static int __init psci_features(u32 psci_func_id)
}
#ifdef CONFIG_CPU_IDLE
-static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
+
+struct psci_cpuidle_data {
+ u32 *psci_states;
+ struct device *dev;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
static DEFINE_PER_CPU(u32, domain_state);
static bool psci_dt_topology;
@@ -332,8 +338,9 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
int i, ret = 0, num_state_nodes = drv->state_count - 1;
u32 *psci_states;
struct device_node *state_node;
+ struct psci_cpuidle_data *data = per_cpu_ptr(&psci_cpuidle_data, cpu);
- psci_states = kcalloc(num_state_nodes, sizeof(*psci_states),
+ psci_states = kcalloc(CPUIDLE_STATE_MAX, sizeof(*psci_states),
GFP_KERNEL);
if (!psci_states)
return -ENOMEM;
@@ -357,8 +364,31 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
goto free_mem;
}
- /* Idle states parsed correctly, initialize per-cpu pointer */
- per_cpu(psci_power_state, cpu) = psci_states;
+ /*
+ * If the hierarchical CPU topology is used, let's attach the CPU device
+ * to its corresponding PM domain. If OSI mode isn't supported, convert
+ * the additional domain idle states from the hierarchical DT layout
+ * into regular flattened cpuidle states, as to let cpuidle manage them.
+ */
+ if (psci_dt_topology) {
+ struct device *dev;
+
+ if (!psci_has_osi_support()) {
+ ret = psci_dt_pm_domains_parse_states(drv, cpu_node,
+ psci_states);
+ if (ret)
+ goto free_mem;
+ }
+
+ dev = psci_dt_attach_cpu(cpu);
+ if (IS_ERR_OR_NULL(dev))
+ goto free_mem;
+
+ data->dev = dev;
+ }
+
+ /* Idle states parsed correctly, store them in the per-cpu struct. */
+ data->psci_states = psci_states;
return 0;
free_mem:
@@ -403,8 +433,8 @@ static int __maybe_unused psci_acpi_cpu_init_idle(unsigned int cpu)
}
psci_states[i] = state;
}
- /* Idle states parsed correctly, initialize per-cpu pointer */
- per_cpu(psci_power_state, cpu) = psci_states;
+ /* Idle states parsed correctly, store them in the per-cpu struct. */
+ per_cpu(psci_cpuidle_data.psci_states, cpu) = psci_states;
return 0;
}
#else
@@ -442,7 +472,7 @@ int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
static int psci_suspend_finisher(unsigned long index)
{
- u32 *state = __this_cpu_read(psci_power_state);
+ u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
u32 composite_state = state[index - 1] | psci_get_domain_state();
return psci_ops.cpu_suspend(composite_state, __pa_symbol(cpu_resume));
@@ -451,7 +481,7 @@ static int psci_suspend_finisher(unsigned long index)
int psci_cpu_suspend_enter(unsigned long index)
{
int ret;
- u32 *state = __this_cpu_read(psci_power_state);
+ u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
u32 composite_state = state[index - 1] | psci_get_domain_state();
/*
--
2.17.1
If the hierarchical CPU topology is used, but the OS initiated mode isn't
supported, we need to rely solely on the regular cpuidle framework to
manage the idle state selection, rather than using genpd and its governor.
For this reason, introduce a new PSCI DT helper function,
psci_dt_pm_domains_parse_states(), which parses and converts the
hierarchically described domain idle states from DT, into regular flattened
cpuidle states. The converted states are added to the existing cpuidle
driver's array of idle states, which make them available for cpuidle.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Some simplification of the code.
---
drivers/firmware/psci/psci.h | 5 ++
drivers/firmware/psci/psci_pm_domain.c | 118 +++++++++++++++++++++++++
2 files changed, 123 insertions(+)
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index 00d2e3dcef49..c36e0e6649e9 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -3,6 +3,7 @@
#ifndef __PSCI_H
#define __PSCI_H
+struct cpuidle_driver;
struct device_node;
int psci_set_osi_mode(void);
@@ -13,8 +14,12 @@ void psci_set_domain_state(u32 state);
int psci_dt_parse_state_node(struct device_node *np, u32 *state);
#ifdef CONFIG_PM_GENERIC_DOMAINS_OF
int psci_dt_init_pm_domains(struct device_node *np);
+int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, u32 *psci_states);
#else
static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
+static inline int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, u32 *psci_states) { return 0; }
#endif
#endif
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
index 3c6ca846caf4..3aa645dba81b 100644
--- a/drivers/firmware/psci/psci_pm_domain.c
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -14,6 +14,10 @@
#include <linux/pm_domain.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu_pm.h>
+
+#include <asm/cpuidle.h>
#include "psci.h"
@@ -104,6 +108,53 @@ static void psci_pd_free_states(struct genpd_power_state *states,
kfree(states);
}
+static int psci_pd_enter_pc(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
+}
+
+static void psci_pd_enter_s2idle_pc(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ psci_pd_enter_pc(dev, drv, idx);
+}
+
+static void psci_pd_convert_states(struct cpuidle_state *idle_state,
+ u32 *psci_state, struct genpd_power_state *state)
+{
+ u32 *state_data = state->data;
+ u64 target_residency_us = state->residency_ns;
+ u64 exit_latency_us = state->power_on_latency_ns +
+ state->power_off_latency_ns;
+
+ *psci_state = *state_data;
+ do_div(target_residency_us, 1000);
+ idle_state->target_residency = target_residency_us;
+ do_div(exit_latency_us, 1000);
+ idle_state->exit_latency = exit_latency_us;
+ idle_state->enter = &psci_pd_enter_pc;
+ idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
+ idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+
+ strncpy(idle_state->name, to_of_node(state->fwnode)->name,
+ CPUIDLE_NAME_LEN - 1);
+ strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
+ CPUIDLE_NAME_LEN - 1);
+}
+
+static bool psci_pd_is_provider(struct device_node *np)
+{
+ struct psci_pd_provider *pd_prov, *it;
+
+ list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
+ if (pd_prov->node == np)
+ return true;
+ }
+
+ return false;
+}
+
static int psci_pd_init(struct device_node *np)
{
struct generic_pm_domain *pd;
@@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np)
pr_err("failed to create CPU PM domains ret=%d\n", ret);
return ret;
}
+
+int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
+ struct device_node *cpu_node, u32 *psci_states)
+{
+ struct genpd_power_state *pd_states;
+ struct of_phandle_args args;
+ int ret, pd_state_count, i, state_idx, psci_idx;
+ u32 cpu_psci_state = psci_states[drv->state_count - 2];
+ struct device_node *np = of_node_get(cpu_node);
+
+
+ /* Walk the CPU topology to find compatible domain idle states. */
+ while (np) {
+ ret = of_parse_phandle_with_args(np, "power-domains",
+ "#power-domain-cells", 0, &args);
+ of_node_put(np);
+ if (ret)
+ return 0;
+
+ np = args.np;
+
+ /* Verify that the node represents a psci pd provider. */
+ if (!psci_pd_is_provider(np)) {
+ of_node_put(np);
+ return 0;
+ }
+
+ /* Parse for compatible domain idle states. */
+ ret = psci_pd_parse_states(np, &pd_states, &pd_state_count);
+ if (ret) {
+ of_node_put(np);
+ return ret;
+ }
+
+ i = 0;
+ while (i < pd_state_count) {
+
+ state_idx = drv->state_count;
+ if (state_idx >= CPUIDLE_STATE_MAX) {
+ pr_warn("exceeding max cpuidle states\n");
+ of_node_put(np);
+ return 0;
+ }
+
+ /* WFI state is not part of psci_states. */
+ psci_idx = state_idx - 1 + i;
+ psci_pd_convert_states(&drv->states[state_idx + i],
+ &psci_states[psci_idx], &pd_states[i]);
+
+ /*
+ * In the hierarchical CPU topology the master PM domain
+ * idle state's DT property, "arm,psci-suspend-param",
+ * don't contain the bits for the idle state of the CPU,
+ * let's add those here.
+ */
+ psci_states[psci_idx] |= cpu_psci_state;
+ pr_debug("psci-power-state %#x index %d\n",
+ psci_states[psci_idx], psci_idx);
+
+ drv->state_count++;
+ i++;
+ }
+ psci_pd_free_states(pd_states, pd_state_count);
+ }
+
+ return 0;
+}
#endif
--
2.17.1
To let the PSCI driver parse for the hierarchical CPU topology in DT and
thus potentially initiate the corresponding PM domain data structures,
let's call psci_dt_topology_init() from the existing topology_init()
subsys_initcall.
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- None.
---
arch/arm64/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 413d566405d1..f1559223c55b 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -367,6 +367,9 @@ static int __init topology_init(void)
{
int i;
+ if (acpi_disabled)
+ psci_dt_topology_init();
+
for_each_online_node(i)
register_one_node(i);
--
2.17.1
When the hierarchical CPU topology layout is used in DT, we need to setup
the corresponding PM domain data structures, as to allow a CPU and a group
of CPUs to be power managed accordingly. Let's enable this by deploying
support through the genpd interface.
Additionally, when the OS initiated mode is supported by the PSCI FW, let's
also parse the domain idle states DT bindings as to make genpd responsible
for the state selection, when the states are compatible with
"domain-idle-state". Otherwise, when only Platform Coordinated mode is
supported, we rely solely on the state selection to be managed through the
regular cpuidle framework.
If the initialization of the PM domain data structures succeeds and the OS
initiated mode is supported, we try to switch to it. In case it fails,
let's fall back into a degraded mode, rather than bailing out and returning
an error code.
Due to that the OS initiated mode may become enabled, we need to adjust to
maintain backwards compatibility for a kernel started through a kexec call.
Do this by explicitly switch to Platform Coordinated mode during boot.
Finally, the actual initialization of the PM domain data structures, is
done via calling the new shared function, psci_dt_init_pm_domains().
However, this is implemented by subsequent changes.
Co-developed-by: Lina Iyer <[email protected]>
Signed-off-by: Lina Iyer <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Simplify code setting domain_state at power off.
- Use the genpd ->free_state() callback to manage freeing of states.
- Fixup a bogus while loop.
---
drivers/firmware/psci/Makefile | 2 +-
drivers/firmware/psci/psci.c | 7 +-
drivers/firmware/psci/psci.h | 5 +
drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
4 files changed, 280 insertions(+), 2 deletions(-)
create mode 100644 drivers/firmware/psci/psci_pm_domain.c
diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
index 1956b882470f..ff300f1fec86 100644
--- a/drivers/firmware/psci/Makefile
+++ b/drivers/firmware/psci/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
#
-obj-$(CONFIG_ARM_PSCI_FW) += psci.o
+obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 0e91d864e346..bfef300b7ebe 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -721,9 +721,14 @@ static int __init psci_1_0_init(struct device_node *np)
if (err)
return err;
- if (psci_has_osi_support())
+ if (psci_has_osi_support()) {
pr_info("OSI mode supported.\n");
+ /* Make sure we default to PC mode. */
+ invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
+ PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
+ }
+
return 0;
}
diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
index f2277c3ad405..00d2e3dcef49 100644
--- a/drivers/firmware/psci/psci.h
+++ b/drivers/firmware/psci/psci.h
@@ -11,6 +11,11 @@ bool psci_has_osi_support(void);
#ifdef CONFIG_CPU_IDLE
void psci_set_domain_state(u32 state);
int psci_dt_parse_state_node(struct device_node *np, u32 *state);
+#ifdef CONFIG_PM_GENERIC_DOMAINS_OF
+int psci_dt_init_pm_domains(struct device_node *np);
+#else
+static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
+#endif
#endif
#endif /* __PSCI_H */
diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
new file mode 100644
index 000000000000..3c6ca846caf4
--- /dev/null
+++ b/drivers/firmware/psci/psci_pm_domain.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PM domains for CPUs via genpd - managed by PSCI.
+ *
+ * Copyright (C) 2019 Linaro Ltd.
+ * Author: Ulf Hansson <[email protected]>
+ *
+ */
+
+#define pr_fmt(fmt) "psci: " fmt
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/pm_domain.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "psci.h"
+
+#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_PM_GENERIC_DOMAINS_OF)
+
+struct psci_pd_provider {
+ struct list_head link;
+ struct device_node *node;
+};
+
+static LIST_HEAD(psci_pd_providers);
+static bool osi_mode_enabled;
+
+static int psci_pd_power_off(struct generic_pm_domain *pd)
+{
+ struct genpd_power_state *state = &pd->states[pd->state_idx];
+ u32 *pd_state;
+
+ /* If we have failed to enable OSI mode, then abort power off. */
+ if (psci_has_osi_support() && !osi_mode_enabled)
+ return -EBUSY;
+
+ if (!state->data)
+ return 0;
+
+ /* When OSI mode is enabled, set the corresponding domain state. */
+ pd_state = state->data;
+ psci_set_domain_state(*pd_state);
+
+ return 0;
+}
+
+static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
+ int state_count)
+{
+ int i, ret;
+ u32 psci_state, *psci_state_buf;
+
+ for (i = 0; i < state_count; i++) {
+ ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
+ &psci_state);
+ if (ret)
+ goto free_state;
+
+ psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
+ if (!psci_state_buf) {
+ ret = -ENOMEM;
+ goto free_state;
+ }
+ *psci_state_buf = psci_state;
+ states[i].data = psci_state_buf;
+ }
+
+ return 0;
+
+free_state:
+ i--;
+ for (; i >= 0; i--)
+ kfree(states[i].data);
+ return ret;
+}
+
+static int psci_pd_parse_states(struct device_node *np,
+ struct genpd_power_state **states, int *state_count)
+{
+ int ret;
+
+ /* Parse the domain idle states. */
+ ret = of_genpd_parse_idle_states(np, states, state_count);
+ if (ret)
+ return ret;
+
+ /* Fill out the PSCI specifics for each found state. */
+ ret = psci_pd_parse_state_nodes(*states, *state_count);
+ if (ret)
+ kfree(*states);
+
+ return ret;
+}
+
+static void psci_pd_free_states(struct genpd_power_state *states,
+ unsigned int state_count)
+{
+ int i;
+
+ for (i = 0; i < state_count; i++)
+ kfree(states[i].data);
+ kfree(states);
+}
+
+static int psci_pd_init(struct device_node *np)
+{
+ struct generic_pm_domain *pd;
+ struct psci_pd_provider *pd_provider;
+ struct dev_power_governor *pd_gov;
+ struct genpd_power_state *states = NULL;
+ int ret = -ENOMEM, state_count = 0;
+
+ pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+ if (!pd)
+ goto out;
+
+ pd_provider = kzalloc(sizeof(*pd_provider), GFP_KERNEL);
+ if (!pd_provider)
+ goto free_pd;
+
+ pd->name = kasprintf(GFP_KERNEL, "%pOF", np);
+ if (!pd->name)
+ goto free_pd_prov;
+
+ /*
+ * For OSI mode, parse the domain idle states and let genpd manage the
+ * state selection for those being compatible with "domain-idle-state".
+ */
+ if (psci_has_osi_support()) {
+ ret = psci_pd_parse_states(np, &states, &state_count);
+ if (ret)
+ goto free_name;
+ pd->free_states = psci_pd_free_states;
+ }
+
+ pd->name = kbasename(pd->name);
+ pd->power_off = psci_pd_power_off;
+ pd->states = states;
+ pd->state_count = state_count;
+ pd->flags |= GENPD_FLAG_IRQ_SAFE | GENPD_FLAG_CPU_DOMAIN;
+
+ /* Use governor for CPU PM domains if it has some states to manage. */
+ pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL;
+
+ ret = pm_genpd_init(pd, pd_gov, false);
+ if (ret) {
+ psci_pd_free_states(states, state_count);
+ goto free_name;
+ }
+
+ ret = of_genpd_add_provider_simple(np, pd);
+ if (ret)
+ goto remove_pd;
+
+ pd_provider->node = of_node_get(np);
+ list_add(&pd_provider->link, &psci_pd_providers);
+
+ pr_debug("init PM domain %s\n", pd->name);
+ return 0;
+
+remove_pd:
+ pm_genpd_remove(pd);
+free_name:
+ kfree(pd->name);
+free_pd_prov:
+ kfree(pd_provider);
+free_pd:
+ kfree(pd);
+out:
+ pr_err("failed to init PM domain ret=%d %pOF\n", ret, np);
+ return ret;
+}
+
+static void psci_pd_remove(void)
+{
+ struct psci_pd_provider *pd_provider, *it;
+ struct generic_pm_domain *genpd;
+
+ list_for_each_entry_safe(pd_provider, it, &psci_pd_providers, link) {
+ of_genpd_del_provider(pd_provider->node);
+
+ genpd = of_genpd_remove_last(pd_provider->node);
+ if (!IS_ERR(genpd))
+ kfree(genpd);
+
+ of_node_put(pd_provider->node);
+ list_del(&pd_provider->link);
+ kfree(pd_provider);
+ }
+}
+
+static int psci_pd_init_topology(struct device_node *np)
+{
+ struct device_node *node;
+ struct of_phandle_args child, parent;
+ int ret;
+
+ for_each_child_of_node(np, node) {
+ if (of_parse_phandle_with_args(node, "power-domains",
+ "#power-domain-cells", 0, &parent))
+ continue;
+
+ child.np = node;
+ child.args_count = 0;
+
+ ret = of_genpd_add_subdomain(&parent, &child);
+ of_node_put(parent.np);
+ if (ret) {
+ of_node_put(node);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+int psci_dt_init_pm_domains(struct device_node *np)
+{
+ struct device_node *node;
+ int ret, pd_count = 0;
+
+ /*
+ * Parse child nodes for the "#power-domain-cells" property and
+ * initialize a genpd/genpd-of-provider pair when it's found.
+ */
+ for_each_child_of_node(np, node) {
+ if (!of_find_property(node, "#power-domain-cells", NULL))
+ continue;
+
+ ret = psci_pd_init(node);
+ if (ret)
+ goto put_node;
+
+ pd_count++;
+ }
+
+ /* Bail out if not using the hierarchical CPU topology. */
+ if (!pd_count)
+ return 0;
+
+ /* Link genpd masters/subdomains to model the CPU topology. */
+ ret = psci_pd_init_topology(np);
+ if (ret)
+ goto remove_pd;
+
+ /* Try to enable OSI mode if supported. */
+ if (psci_has_osi_support()) {
+ ret = psci_set_osi_mode();
+ if (ret)
+ pr_warn("failed to enable OSI mode: %d\n", ret);
+ else
+ osi_mode_enabled = true;
+ }
+
+ pr_info("Initialized CPU PM domain topology\n");
+ return pd_count;
+
+put_node:
+ of_node_put(node);
+remove_pd:
+ if (pd_count)
+ psci_pd_remove();
+ pr_err("failed to create CPU PM domains ret=%d\n", ret);
+ return ret;
+}
+#endif
--
2.17.1
When the hierarchical CPU topology layout is used in DT, let's allow the
CPU to be power managed through its PM domain, via deploying runtime PM
support.
To know for which idle states runtime PM reference counting is needed,
let's store the index of deepest idle state for the CPU, in a per CPU
variable. This allows psci_cpu_suspend_enter() to compare this index with
the requested idle state index and then act accordingly.
Signed-off-by: Ulf Hansson <[email protected]>
---
Changes:
- Simplify the code by using the new per CPU struct, that stores the
needed struct device*.
---
drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 54e23d4ed0ea..2c4157d3a616 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -20,6 +20,7 @@
#include <linux/linkage.h>
#include <linux/of.h>
#include <linux/pm.h>
+#include <linux/pm_runtime.h>
#include <linux/printk.h>
#include <linux/psci.h>
#include <linux/reboot.h>
@@ -298,6 +299,7 @@ static int __init psci_features(u32 psci_func_id)
struct psci_cpuidle_data {
u32 *psci_states;
+ u32 rpm_state_id;
struct device *dev;
};
@@ -385,6 +387,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
goto free_mem;
data->dev = dev;
+ data->rpm_state_id = drv->state_count - 1;
}
/* Idle states parsed correctly, store them in the per-cpu struct. */
@@ -481,8 +484,11 @@ static int psci_suspend_finisher(unsigned long index)
int psci_cpu_suspend_enter(unsigned long index)
{
int ret;
- u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
- u32 composite_state = state[index - 1] | psci_get_domain_state();
+ struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
+ u32 *states = data->psci_states;
+ struct device *dev = data->dev;
+ bool runtime_pm = (dev && data->rpm_state_id == index);
+ u32 composite_state;
/*
* idle state index 0 corresponds to wfi, should never be called
@@ -491,11 +497,23 @@ int psci_cpu_suspend_enter(unsigned long index)
if (WARN_ON_ONCE(!index))
return -EINVAL;
+ /*
+ * Do runtime PM if we are using the hierarchical CPU toplogy, but only
+ * when cpuidle have selected the deepest idle state for the CPU.
+ */
+ if (runtime_pm)
+ pm_runtime_put_sync_suspend(dev);
+
+ composite_state = states[index - 1] | psci_get_domain_state();
+
if (!psci_power_state_loses_context(composite_state))
ret = psci_ops.cpu_suspend(composite_state, 0);
else
ret = cpu_suspend(index, psci_suspend_finisher);
+ if (runtime_pm)
+ pm_runtime_get_sync(dev);
+
/* Clear the domain state to start fresh when back from idle. */
psci_set_domain_state(0);
--
2.17.1
On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
>
> This series enables support for hierarchical CPU arrangement, managed by PSCI
> for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> recently was extended to manage devices belonging to CPUs.
ACK for the patches touching cpuidle in this series (from the
framework perspective), but I'm assuming it to be taken care of by
ARM/ARM64 maintainers.
On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
> >
> > This series enables support for hierarchical CPU arrangement, managed by PSCI
> > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > recently was extended to manage devices belonging to CPUs.
>
> ACK for the patches touching cpuidle in this series (from the
> framework perspective), but I'm assuming it to be taken care of by
> ARM/ARM64 maintainers.
Thanks for the ack! Yes, this is for PSCI/ARM maintainers.
BTW, apologize for sending this in the merge window, but wanted to
take the opportunity for people to have a look before OSPM Pisa next
week.
Kind regards
Uffe
Sudeep, Lorenzo, Mark,
On Mon, 13 May 2019 at 21:23, Ulf Hansson <[email protected]> wrote:
>
> This series enables support for hierarchical CPU arrangement, managed by PSCI
> for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> recently was extended to manage devices belonging to CPUs.
>
> The last two DTS patches enables the hierarchical topology to be used for the
> Qcom 410c Dragonboard and the Hisilicon Hikey board. The former uses PSCI OS-
> initiated mode, while the latter uses the PSCI Platform-Coordinated mode. In
> other words, the hierarchical description of the topology in DT, is orthogonal
> to the supported PSCI CPU suspend mode.
>
> Do note, these patches have been posted earlier, but then being part of bigger
> series, which at that point also included the needed infrastructure changes to
> genpd and cpuidle. Rather than continue to carry the old version history,
> which may be a bit confusing, I decided to start over. Although, for clarity,
> the changelog below explains what changes that have been made since the last
> submission was made.
Is there anything I can do to help the review to get going here?
FYI, I hosted a talk about "cluster idle" at OSPM in Pisa a few weeks
ago. There is a couple of slides [1] with flowcharts of how it works,
that may be of interest for you.
Kind regards
Uffe
[...]
[1]
http://retis.sssup.it/ospm-summit/Downloads/01_02-ClusterIdle_UlfHansson.pdf
On Mon, May 13, 2019 at 09:22:46PM +0200, Ulf Hansson wrote:
> To allow arch back-end init ops to operate on the cpuidle driver for the
> corresponding CPU, let's pass along a pointer to the struct cpuidle_driver*
> and forward it the relevant layers of callbacks for ARM/ARM64.
>
I may be wrong, but I see drv is just used to get state_count mostly.
It's also used in flattening part, but that should not be here either.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:47PM +0200, Ulf Hansson wrote:
> Instead of iterating through all the state nodes in DT, to find out how
> many states that needs to be allocated, let's use the number already known
> by the cpuidle driver. In this way we can drop the iteration altogether.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> Acked-by: Daniel Lezcano <[email protected]>
> ---
>
> Changes:
> - None.
>
> ---
> drivers/firmware/psci/psci.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 88e90e0f06b9..9c2180bcee4c 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -306,26 +306,20 @@ static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> struct device_node *cpu_node, int cpu)
> {
> - int i, ret = 0, count = 0;
> + int i, ret = 0, num_state_nodes = drv->state_count - 1;
> u32 *psci_states;
> struct device_node *state_node;
>
> - /* Count idle states */
> - while ((state_node = of_parse_phandle(cpu_node, "cpu-idle-states",
> - count))) {
> - count++;
> - of_node_put(state_node);
> - }
> -
> - if (!count)
> - return -ENODEV;
> -
> - psci_states = kcalloc(count, sizeof(*psci_states), GFP_KERNEL);
> + psci_states = kcalloc(num_state_nodes, sizeof(*psci_states),
> + GFP_KERNEL);
> if (!psci_states)
> return -ENOMEM;
>
> - for (i = 0; i < count; i++) {
> + for (i = 0; i < num_state_nodes; i++) {
> state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
Why not start using of_get_cpu_state_node here which was introduced in
earlier patch as part of this simplification ?
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:48PM +0200, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Currently CPU's idle states are represented in a flattened model, via the
> "cpu-idle-states" binding from within the CPU's device nodes.
>
> Support the hierarchical layout, simply by converting to calling the new OF
> helper, of_get_cpu_state_node().
>
> Suggested-by: Sudeep Holla <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - None.
>
> ---
> drivers/firmware/psci/psci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 9c2180bcee4c..b11560f7c4b9 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -316,7 +316,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> return -ENOMEM;
>
> for (i = 0; i < num_state_nodes; i++) {
> - state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
> + state_node = of_get_cpu_state_node(cpu_node, i);
Ah I spoke too early, it's introduced here. Can be part of earlier
simplification patch, but I am fine either way.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:49PM +0200, Ulf Hansson wrote:
> The per CPU variable psci_power_state, contains an array of fixed values,
> which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> each of the available CPU idle states.
>
> This isn't sufficient when using the hierarchical CPU topology in DT in
> combination with having PSCI OS initiated (OSI) mode enabled. More
> precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> idle state the cluster (a group of CPUs) should enter, while in PSCI
> Platform Coordinated (PC) mode, each CPU independently votes for an idle
> state of the cluster.
>
> For this reason, let's introduce an additional per CPU variable called
> domain_state and implement two helper functions to read/write its values.
> Following patches, which implements PM domain support for PSCI, will use
> the domain_state variable and set it to corresponding bits that represents
> the selected idle state for the cluster.
>
> Finally, in psci_cpu_suspend_enter() and psci_suspend_finisher(), let's
> take into account the values in the domain_state, as to get the complete
> suspend parameter.
>
I understand it was split to ease review, but this patch also does
nothing as domain_state = 0 always. I was trying hard to find where it's
set, but I assume it will be done in later patches. Again may be this
can be squashed into the first caller of psci_set_domain_state
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, we need to setup
> the corresponding PM domain data structures, as to allow a CPU and a group
> of CPUs to be power managed accordingly. Let's enable this by deploying
> support through the genpd interface.
>
> Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> also parse the domain idle states DT bindings as to make genpd responsible
> for the state selection, when the states are compatible with
> "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> supported, we rely solely on the state selection to be managed through the
> regular cpuidle framework.
>
> If the initialization of the PM domain data structures succeeds and the OS
> initiated mode is supported, we try to switch to it. In case it fails,
> let's fall back into a degraded mode, rather than bailing out and returning
> an error code.
>
> Due to that the OS initiated mode may become enabled, we need to adjust to
> maintain backwards compatibility for a kernel started through a kexec call.
> Do this by explicitly switch to Platform Coordinated mode during boot.
>
> Finally, the actual initialization of the PM domain data structures, is
> done via calling the new shared function, psci_dt_init_pm_domains().
> However, this is implemented by subsequent changes.
>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Simplify code setting domain_state at power off.
> - Use the genpd ->free_state() callback to manage freeing of states.
> - Fixup a bogus while loop.
>
> ---
> drivers/firmware/psci/Makefile | 2 +-
> drivers/firmware/psci/psci.c | 7 +-
> drivers/firmware/psci/psci.h | 5 +
> drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> 4 files changed, 280 insertions(+), 2 deletions(-)
> create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>
> diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
> index 1956b882470f..ff300f1fec86 100644
> --- a/drivers/firmware/psci/Makefile
> +++ b/drivers/firmware/psci/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
> #
> -obj-$(CONFIG_ARM_PSCI_FW) += psci.o
> +obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
> obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 0e91d864e346..bfef300b7ebe 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -721,9 +721,14 @@ static int __init psci_1_0_init(struct device_node *np)
> if (err)
> return err;
>
> - if (psci_has_osi_support())
> + if (psci_has_osi_support()) {
> pr_info("OSI mode supported.\n");
>
> + /* Make sure we default to PC mode. */
> + invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> + PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
> + }
> +
This can go on it's own, any issues with that ?
> return 0;
> }
>
> diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> index f2277c3ad405..00d2e3dcef49 100644
> --- a/drivers/firmware/psci/psci.h
> +++ b/drivers/firmware/psci/psci.h
> @@ -11,6 +11,11 @@ bool psci_has_osi_support(void);
> #ifdef CONFIG_CPU_IDLE
> void psci_set_domain_state(u32 state);
> int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> +#ifdef CONFIG_PM_GENERIC_DOMAINS_OF
> +int psci_dt_init_pm_domains(struct device_node *np);
> +#else
> +static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> +#endif
> #endif
>
> #endif /* __PSCI_H */
> diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> new file mode 100644
> index 000000000000..3c6ca846caf4
> --- /dev/null
> +++ b/drivers/firmware/psci/psci_pm_domain.c
> @@ -0,0 +1,268 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PM domains for CPUs via genpd - managed by PSCI.
> + *
> + * Copyright (C) 2019 Linaro Ltd.
> + * Author: Ulf Hansson <[email protected]>
> + *
> + */
> +
> +#define pr_fmt(fmt) "psci: " fmt
> +
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/pm_domain.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +
> +#include "psci.h"
> +
> +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_PM_GENERIC_DOMAINS_OF)
> +
> +struct psci_pd_provider {
> + struct list_head link;
> + struct device_node *node;
> +};
> +
> +static LIST_HEAD(psci_pd_providers);
> +static bool osi_mode_enabled;
> +
> +static int psci_pd_power_off(struct generic_pm_domain *pd)
> +{
> + struct genpd_power_state *state = &pd->states[pd->state_idx];
> + u32 *pd_state;
> +
> + /* If we have failed to enable OSI mode, then abort power off. */
> + if (psci_has_osi_support() && !osi_mode_enabled)
> + return -EBUSY;
> +
> + if (!state->data)
> + return 0;
> +
> + /* When OSI mode is enabled, set the corresponding domain state. */
> + pd_state = state->data;
> + psci_set_domain_state(*pd_state);
> +
> + return 0;
> +}
> +
> +static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
> + int state_count)
> +{
> + int i, ret;
> + u32 psci_state, *psci_state_buf;
> +
> + for (i = 0; i < state_count; i++) {
> + ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
> + &psci_state);
> + if (ret)
> + goto free_state;
> +
> + psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
> + if (!psci_state_buf) {
> + ret = -ENOMEM;
> + goto free_state;
> + }
> + *psci_state_buf = psci_state;
> + states[i].data = psci_state_buf;
> + }
> +
> + return 0;
> +
> +free_state:
> + i--;
> + for (; i >= 0; i--)
> + kfree(states[i].data);
> + return ret;
> +}
> +
> +static int psci_pd_parse_states(struct device_node *np,
> + struct genpd_power_state **states, int *state_count)
> +{
> + int ret;
> +
> + /* Parse the domain idle states. */
> + ret = of_genpd_parse_idle_states(np, states, state_count);
> + if (ret)
> + return ret;
> +
Lots of things here in this file are not psci specific. They can be
moved as generic CPU PM domain support.
> + /* Fill out the PSCI specifics for each found state. */
> + ret = psci_pd_parse_state_nodes(*states, *state_count);
> + if (ret)
> + kfree(*states);
> +
Things like above are PSCI.
I am trying to see if we can do something to achieve partitions like we
have today: psci.c just has PSCI specific stuff and dt_idle_states.c
deals with generic idle stuff.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:57PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology is used and when a CPU has been put
> offline (hotplug), that same CPU prevents its PM domain and thus also
> potential master PM domains, from being powered off. This is because genpd
> observes the CPU's attached device as being active from a runtime PM point
> of view.
>
> To deal with this, let's decrease the runtime PM usage count by calling
> pm_runtime_put_sync_suspend() of the attached struct device when putting
> the CPU offline. Consequentially, we must then increase the runtime PM
> usage count, while putting the CPU online again.
>
Why is this firmware/driver specific ? Why can't this be dealt in core
pm-domain ? I am concerned that if any other architectures or firmware
method decides to use this feature, this need to be duplicated there.
The way I see this is pure reference counting and is hardware/firmware/
driver agnostic and can be made generic.
--
Regards,
Sudeep
On Tue, May 14, 2019 at 10:58:04AM +0200, Ulf Hansson wrote:
> On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > This series enables support for hierarchical CPU arrangement, managed by PSCI
> > > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > > recently was extended to manage devices belonging to CPUs.
> >
> > ACK for the patches touching cpuidle in this series (from the
> > framework perspective), but I'm assuming it to be taken care of by
> > ARM/ARM64 maintainers.
>
> Thanks for the ack! Yes, this is for PSCI/ARM maintainers.
>
> BTW, apologize for sending this in the merge window, but wanted to
> take the opportunity for people to have a look before OSPM Pisa next
> week.
>
I will start looking at this series. But I would request PSCI/other
maintainers to wait until we see some comparison data before we merge.
If they are fine to merge w/o that, I am fine. As of now we have just
1-2 platforms to test(that too not so simple to get started) and the
long term support for them are questionable. Also with SDM845 supporting
PC, we have excellent opportunity to compare and conclude the results
found.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:50PM +0200, Ulf Hansson wrote:
> Subsequent changes implements support for PM domains to PSCI. Those changes
> are mainly implemented in a new separate c-file, hence a couple of the
> internal PSCI functions needs to be shared to be accessible. Let's do that
> via adding a new PSCI header file.
>
> Moreover, to implement support for PM domains, switching the PSCI FW into
> the OS initiated mode is sometimes needed. Therefore, let's share a new
> function that implement this.
>
This looks fine.
--
Regards,
Sudeep
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Convert psci_set_osi_mode() to return an int.
> - Don't share psci_get_domain_state() as that's no longer needed.
> - Update changelog.
>
> ---
> drivers/firmware/psci/psci.c | 17 ++++++++++++++---
> drivers/firmware/psci/psci.h | 16 ++++++++++++++++
> 2 files changed, 30 insertions(+), 3 deletions(-)
> create mode 100644 drivers/firmware/psci/psci.h
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 4aec513136e4..0e91d864e346 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -34,6 +34,8 @@
> #include <asm/smp_plat.h>
> #include <asm/suspend.h>
>
> +#include "psci.h"
> +
> /*
> * While a 64-bit OS can make calls with SMC32 calling conventions, for some
> * calls it is necessary to use SMC64 to pass or return 64-bit values.
> @@ -96,7 +98,7 @@ static inline bool psci_has_ext_power_state(void)
> PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
> }
>
> -static inline bool psci_has_osi_support(void)
> +bool psci_has_osi_support(void)
> {
> return psci_cpu_suspend_feature & PSCI_1_0_OS_INITIATED;
> }
> @@ -161,6 +163,15 @@ static u32 psci_get_version(void)
> return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> }
>
> +int psci_set_osi_mode(void)
> +{
> + int err;
> +
> + err = invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> + PSCI_1_0_SUSPEND_MODE_OSI, 0, 0);
> + return psci_to_linux_errno(err);
> +}
> +
> static int psci_cpu_suspend(u32 state, unsigned long entry_point)
> {
> int err;
> @@ -292,12 +303,12 @@ static inline u32 psci_get_domain_state(void)
> return __this_cpu_read(domain_state);
> }
>
> -static inline void psci_set_domain_state(u32 state)
> +void psci_set_domain_state(u32 state)
> {
> __this_cpu_write(domain_state, state);
> }
>
> -static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> +int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> {
> int err = of_property_read_u32(np, "arm,psci-suspend-param", state);
>
> diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> new file mode 100644
> index 000000000000..f2277c3ad405
> --- /dev/null
> +++ b/drivers/firmware/psci/psci.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PSCI_H
> +#define __PSCI_H
> +
> +struct device_node;
> +
> +int psci_set_osi_mode(void);
> +bool psci_has_osi_support(void);
> +
> +#ifdef CONFIG_CPU_IDLE
> +void psci_set_domain_state(u32 state);
> +int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> +#endif
> +
> +#endif /* __PSCI_H */
> --
> 2.17.1
>
On Fri 07 Jun 08:42 PDT 2019, Sudeep Holla wrote:
> On Tue, May 14, 2019 at 10:58:04AM +0200, Ulf Hansson wrote:
> > On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > This series enables support for hierarchical CPU arrangement, managed by PSCI
> > > > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > > > recently was extended to manage devices belonging to CPUs.
> > >
> > > ACK for the patches touching cpuidle in this series (from the
> > > framework perspective), but I'm assuming it to be taken care of by
> > > ARM/ARM64 maintainers.
> >
> > Thanks for the ack! Yes, this is for PSCI/ARM maintainers.
> >
> > BTW, apologize for sending this in the merge window, but wanted to
> > take the opportunity for people to have a look before OSPM Pisa next
> > week.
> >
>
> I will start looking at this series. But I would request PSCI/other
> maintainers to wait until we see some comparison data before we merge.
What comparison are you asking for here? Do you want to see the
improvement this series gives or are you hoping to compare it with some
other mechanism?
> If they are fine to merge w/o that, I am fine. As of now we have just
> 1-2 platforms to test(that too not so simple to get started) and the
> long term support for them are questionable.
Why is the support for these platforms questionable? People are actively
working on these platforms and the feature set constantly improving.
> Also with SDM845 supporting PC, we have excellent opportunity to
> compare and conclude the results found.
That's correct, ATF exists for SDM845. But with the standard choice of
firmware you will get OSI and I don't know of a board out there where
you can switch between them and do a apple to apple comparison.
Devices such as RB3 (96boards SDM845), Pixel3 and the Windows laptops
are all OSI only.
So landing this support is not a question of PC or OSI being the better
choice, it's a question of do we want to be able to enter these lower
power states - with the upstream kernel - on any past, present or future
Qualcomm devices.
Regards,
Bjorn
On Fri, 7 Jun 2019 at 17:01, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:46PM +0200, Ulf Hansson wrote:
> > To allow arch back-end init ops to operate on the cpuidle driver for the
> > corresponding CPU, let's pass along a pointer to the struct cpuidle_driver*
> > and forward it the relevant layers of callbacks for ARM/ARM64.
> >
>
> I may be wrong, but I see drv is just used to get state_count mostly.
> It's also used in flattening part, but that should not be here either.
Let me copy the note I added below the changelog for $subject patch,
as hopefully that should clarify the reason to why this is needed.
"- This patch is needed by the subsequent patch, but more importantly,
also by "[PATCH 10/18] drivers: firmware: psci: Add hierarchical
domain idle states converter"."
Kind regards
Uffe
On Fri, 7 Jun 2019 at 17:27, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology layout is used in DT, we need to setup
> > the corresponding PM domain data structures, as to allow a CPU and a group
> > of CPUs to be power managed accordingly. Let's enable this by deploying
> > support through the genpd interface.
> >
> > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> > also parse the domain idle states DT bindings as to make genpd responsible
> > for the state selection, when the states are compatible with
> > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> > supported, we rely solely on the state selection to be managed through the
> > regular cpuidle framework.
> >
> > If the initialization of the PM domain data structures succeeds and the OS
> > initiated mode is supported, we try to switch to it. In case it fails,
> > let's fall back into a degraded mode, rather than bailing out and returning
> > an error code.
> >
> > Due to that the OS initiated mode may become enabled, we need to adjust to
> > maintain backwards compatibility for a kernel started through a kexec call.
> > Do this by explicitly switch to Platform Coordinated mode during boot.
> >
> > Finally, the actual initialization of the PM domain data structures, is
> > done via calling the new shared function, psci_dt_init_pm_domains().
> > However, this is implemented by subsequent changes.
> >
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes:
> > - Simplify code setting domain_state at power off.
> > - Use the genpd ->free_state() callback to manage freeing of states.
> > - Fixup a bogus while loop.
> >
> > ---
> > drivers/firmware/psci/Makefile | 2 +-
> > drivers/firmware/psci/psci.c | 7 +-
> > drivers/firmware/psci/psci.h | 5 +
> > drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> > 4 files changed, 280 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
> >
> > diff --git a/drivers/firmware/psci/Makefile b/drivers/firmware/psci/Makefile
> > index 1956b882470f..ff300f1fec86 100644
> > --- a/drivers/firmware/psci/Makefile
> > +++ b/drivers/firmware/psci/Makefile
> > @@ -1,4 +1,4 @@
> > # SPDX-License-Identifier: GPL-2.0
> > #
> > -obj-$(CONFIG_ARM_PSCI_FW) += psci.o
> > +obj-$(CONFIG_ARM_PSCI_FW) += psci.o psci_pm_domain.o
> > obj-$(CONFIG_ARM_PSCI_CHECKER) += psci_checker.o
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index 0e91d864e346..bfef300b7ebe 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -721,9 +721,14 @@ static int __init psci_1_0_init(struct device_node *np)
> > if (err)
> > return err;
> >
> > - if (psci_has_osi_support())
> > + if (psci_has_osi_support()) {
> > pr_info("OSI mode supported.\n");
> >
> > + /* Make sure we default to PC mode. */
> > + invoke_psci_fn(PSCI_1_0_FN_SET_SUSPEND_MODE,
> > + PSCI_1_0_SUSPEND_MODE_PC, 0, 0);
> > + }
> > +
>
> This can go on it's own, any issues with that ?
Sure, I can move that out to a separate patch.
>
> > return 0;
> > }
> >
> > diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> > index f2277c3ad405..00d2e3dcef49 100644
> > --- a/drivers/firmware/psci/psci.h
> > +++ b/drivers/firmware/psci/psci.h
> > @@ -11,6 +11,11 @@ bool psci_has_osi_support(void);
> > #ifdef CONFIG_CPU_IDLE
> > void psci_set_domain_state(u32 state);
> > int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> > +#ifdef CONFIG_PM_GENERIC_DOMAINS_OF
> > +int psci_dt_init_pm_domains(struct device_node *np);
> > +#else
> > +static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> > +#endif
> > #endif
> >
> > #endif /* __PSCI_H */
> > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> > new file mode 100644
> > index 000000000000..3c6ca846caf4
> > --- /dev/null
> > +++ b/drivers/firmware/psci/psci_pm_domain.c
> > @@ -0,0 +1,268 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * PM domains for CPUs via genpd - managed by PSCI.
> > + *
> > + * Copyright (C) 2019 Linaro Ltd.
> > + * Author: Ulf Hansson <[email protected]>
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "psci: " fmt
> > +
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > +#include <linux/pm_domain.h>
> > +#include <linux/slab.h>
> > +#include <linux/string.h>
> > +
> > +#include "psci.h"
> > +
> > +#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_PM_GENERIC_DOMAINS_OF)
> > +
> > +struct psci_pd_provider {
> > + struct list_head link;
> > + struct device_node *node;
> > +};
> > +
> > +static LIST_HEAD(psci_pd_providers);
> > +static bool osi_mode_enabled;
> > +
> > +static int psci_pd_power_off(struct generic_pm_domain *pd)
> > +{
> > + struct genpd_power_state *state = &pd->states[pd->state_idx];
> > + u32 *pd_state;
> > +
> > + /* If we have failed to enable OSI mode, then abort power off. */
> > + if (psci_has_osi_support() && !osi_mode_enabled)
> > + return -EBUSY;
> > +
> > + if (!state->data)
> > + return 0;
> > +
> > + /* When OSI mode is enabled, set the corresponding domain state. */
> > + pd_state = state->data;
> > + psci_set_domain_state(*pd_state);
> > +
> > + return 0;
> > +}
> > +
> > +static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
> > + int state_count)
> > +{
> > + int i, ret;
> > + u32 psci_state, *psci_state_buf;
> > +
> > + for (i = 0; i < state_count; i++) {
> > + ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
> > + &psci_state);
> > + if (ret)
> > + goto free_state;
> > +
> > + psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
> > + if (!psci_state_buf) {
> > + ret = -ENOMEM;
> > + goto free_state;
> > + }
> > + *psci_state_buf = psci_state;
> > + states[i].data = psci_state_buf;
> > + }
> > +
> > + return 0;
> > +
> > +free_state:
> > + i--;
> > + for (; i >= 0; i--)
> > + kfree(states[i].data);
> > + return ret;
> > +}
> > +
> > +static int psci_pd_parse_states(struct device_node *np,
> > + struct genpd_power_state **states, int *state_count)
> > +{
> > + int ret;
> > +
> > + /* Parse the domain idle states. */
> > + ret = of_genpd_parse_idle_states(np, states, state_count);
> > + if (ret)
> > + return ret;
> > +
>
>
> Lots of things here in this file are not psci specific. They can be
> moved as generic CPU PM domain support.
What exactly do you mean by CPU PM domain support?
The current split is based upon how the generic PM domain (genpd)
supports CPU devices (see GENPD_FLAG_CPU_DOMAIN), which is already
available.
I agree that finding the right balance between what can be made
generic and driver specific is not always obvious. Often it's better
to start with having more things in the driver code, then move things
into a common framework, later on, when that turns out to make sense.
>
> > + /* Fill out the PSCI specifics for each found state. */
> > + ret = psci_pd_parse_state_nodes(*states, *state_count);
> > + if (ret)
> > + kfree(*states);
> > +
>
> Things like above are PSCI.
>
> I am trying to see if we can do something to achieve partitions like we
> have today: psci.c just has PSCI specific stuff and dt_idle_states.c
> deals with generic idle stuff.
I am open to any suggestions. Although, I am not sure I understand
your comment and nor the reason to why you want me to change.
So, what is the problem with having the code that you refer to, inside
drivers/firmware/psci/psci_pm_domain.c? Can't we just start with that
and see how it plays?
Kind regards
Uffe
On Fri, 7 Jun 2019 at 17:17, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:49PM +0200, Ulf Hansson wrote:
> > The per CPU variable psci_power_state, contains an array of fixed values,
> > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > each of the available CPU idle states.
> >
> > This isn't sufficient when using the hierarchical CPU topology in DT in
> > combination with having PSCI OS initiated (OSI) mode enabled. More
> > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > state of the cluster.
> >
> > For this reason, let's introduce an additional per CPU variable called
> > domain_state and implement two helper functions to read/write its values.
> > Following patches, which implements PM domain support for PSCI, will use
> > the domain_state variable and set it to corresponding bits that represents
> > the selected idle state for the cluster.
> >
> > Finally, in psci_cpu_suspend_enter() and psci_suspend_finisher(), let's
> > take into account the values in the domain_state, as to get the complete
> > suspend parameter.
> >
>
> I understand it was split to ease review, but this patch also does
> nothing as domain_state = 0 always. I was trying hard to find where it's
> set, but I assume it will be done in later patches. Again may be this
> can be squashed into the first caller of psci_set_domain_state
You have a point, but I am worried that it would look like this series
is solely needed to support OSI mode. This is not the case. Let me
explain.
Having $subject patch separate shows the specific changes needed to
support OSI mode. The first caller of psci_set_domain_state() is added
in patch9, however, patch9 is useful no matter of OSI or PC mode.
Moreover, if I squash $subject patch with patch9, I would have to
squash also the subsequent patch (patch8), as it depends on $subject
patch.
So, to conclude, are you happy with this as is or do you want me to
squash the patches?
Kind regards
Uffe
On Fri, 7 Jun 2019 at 17:31, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:57PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology is used and when a CPU has been put
> > offline (hotplug), that same CPU prevents its PM domain and thus also
> > potential master PM domains, from being powered off. This is because genpd
> > observes the CPU's attached device as being active from a runtime PM point
> > of view.
> >
> > To deal with this, let's decrease the runtime PM usage count by calling
> > pm_runtime_put_sync_suspend() of the attached struct device when putting
> > the CPU offline. Consequentially, we must then increase the runtime PM
> > usage count, while putting the CPU online again.
> >
>
> Why is this firmware/driver specific ? Why can't this be dealt in core
> pm-domain ? I am concerned that if any other architectures or firmware
> method decides to use this feature, this need to be duplicated there.
What is the core pm-domain? Do you refer to the generic PM domain (genpd), no?
In such case, this is not the job of genpd, but rather the opposite
(to *monitor* the reference count).
>
> The way I see this is pure reference counting and is hardware/firmware/
> driver agnostic and can be made generic.
As stated in the another reply, I would rather start with having more
things driver specific rather than generic. Later on we can always
consider to move/split things, when there are more users.
In this particular case, the runtime PM reference counting is done on
the struct device*, that genpd returned via
dev_pm_domain_attach_by_name(). And because
dev_pm_domain_attach_by_name() is called from PSCI code, I decided to
keep this struct device* internal to PSCI.
Kind regards
Uffe
On Fri, Jun 07, 2019 at 12:34:07PM -0700, Bjorn Andersson wrote:
> On Fri 07 Jun 08:42 PDT 2019, Sudeep Holla wrote:
>
> > On Tue, May 14, 2019 at 10:58:04AM +0200, Ulf Hansson wrote:
> > > On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > This series enables support for hierarchical CPU arrangement, managed by PSCI
> > > > > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > > > > recently was extended to manage devices belonging to CPUs.
> > > >
> > > > ACK for the patches touching cpuidle in this series (from the
> > > > framework perspective), but I'm assuming it to be taken care of by
> > > > ARM/ARM64 maintainers.
> > >
> > > Thanks for the ack! Yes, this is for PSCI/ARM maintainers.
> > >
> > > BTW, apologize for sending this in the merge window, but wanted to
> > > take the opportunity for people to have a look before OSPM Pisa next
> > > week.
> > >
> >
> > I will start looking at this series. But I would request PSCI/other
> > maintainers to wait until we see some comparison data before we merge.
>
> What comparison are you asking for here? Do you want to see the
> improvement this series gives or are you hoping to compare it with some
> other mechanism?
>
OK, I have mentioned this many times already, let me repeat it again.
This series adds an alternative to the existing PC mode of CPU idle
management. And it's clear that the main reason for the same is the
improvement OSI mode offers vs the PC mode. I am asking the comparison
for the same. And yes we need to compare apples with apples and not
oranges here.
> > If they are fine to merge w/o that, I am fine. As of now we have just
> > 1-2 platforms to test(that too not so simple to get started) and the
> > long term support for them are questionable.
>
> Why is the support for these platforms questionable? People are actively
> working on these platforms and the feature set constantly improving.
>
Qualcomm will never fix any firmware issues and we need to quirk
any bugs found. I would prefer the first platform to minimize those
as it would be reference. But I am sure QC won't care about firmware
on SDM845 anymore, so not an ideal fit.
We need to add support in TF-A to build complete reference story around
OSI mode.
> > Also with SDM845 supporting PC, we have excellent opportunity to
> > compare and conclude the results found.
>
> That's correct, ATF exists for SDM845. But with the standard choice of
> firmware you will get OSI and I don't know of a board out there where
> you can switch between them and do a apple to apple comparison.
>
One that's not PSCI compliant, system must boot in PC. If QC was any
serious about this, they would have attempted to fix them in firmware.
We have given this comment at-least 4 years back and if that's not
still in the current gen products, it says something. Sorry I don't
trust the firmware story from QC.
> Devices such as RB3 (96boards SDM845), Pixel3 and the Windows laptops
> are all OSI only.
>
Again not fully PSCI compliant.
>
> So landing this support is not a question of PC or OSI being the better
> choice, it's a question of do we want to be able to enter these lower
> power states - with the upstream kernel - on any past, present or future
> Qualcomm devices.
>
Nope, I disagree. Better they fix future products. This is a new feature
in the kernel with the claim that it's better and since last 2-3 years
no efforts are made to prove the claim. So I am not really worried
about running low power modes on their past/present devices, but more
worried about the precedence this might set with unproven claim and other
vendors moving to this without considering all the implications.
--
Regards,
Sudeep
On Mon, Jun 10, 2019 at 12:21:10PM +0200, Ulf Hansson wrote:
> On Fri, 7 Jun 2019 at 17:17, Sudeep Holla <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:49PM +0200, Ulf Hansson wrote:
> > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > each of the available CPU idle states.
> > >
> > > This isn't sufficient when using the hierarchical CPU topology in DT in
> > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > state of the cluster.
> > >
> > > For this reason, let's introduce an additional per CPU variable called
> > > domain_state and implement two helper functions to read/write its values.
> > > Following patches, which implements PM domain support for PSCI, will use
> > > the domain_state variable and set it to corresponding bits that represents
> > > the selected idle state for the cluster.
> > >
> > > Finally, in psci_cpu_suspend_enter() and psci_suspend_finisher(), let's
> > > take into account the values in the domain_state, as to get the complete
> > > suspend parameter.
> > >
> >
> > I understand it was split to ease review, but this patch also does
> > nothing as domain_state = 0 always. I was trying hard to find where it's
> > set, but I assume it will be done in later patches. Again may be this
> > can be squashed into the first caller of psci_set_domain_state
>
> You have a point, but I am worried that it would look like this series
> is solely needed to support OSI mode. This is not the case. Let me
> explain.
>
> Having $subject patch separate shows the specific changes needed to
> support OSI mode. The first caller of psci_set_domain_state() is added
> in patch9, however, patch9 is useful no matter of OSI or PC mode.
>
> Moreover, if I squash $subject patch with patch9, I would have to
> squash also the subsequent patch (patch8), as it depends on $subject
> patch.
>
> So, to conclude, are you happy with this as is or do you want me to
> squash the patches?
>
Yes I am fine either way. As I put the comments in the same flow as I
did review, I thought it's worth mentioning if someone else get similar
thoughts. I am fine if you prefer to keep it the same way unless someone
else raise the same point.
--
Regards,
Sudeep
On Mon, Jun 10, 2019 at 12:21:41PM +0200, Ulf Hansson wrote:
> On Fri, 7 Jun 2019 at 17:27, Sudeep Holla <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology layout is used in DT, we need to setup
> > > the corresponding PM domain data structures, as to allow a CPU and a group
> > > of CPUs to be power managed accordingly. Let's enable this by deploying
> > > support through the genpd interface.
> > >
> > > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> > > also parse the domain idle states DT bindings as to make genpd responsible
> > > for the state selection, when the states are compatible with
> > > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> > > supported, we rely solely on the state selection to be managed through the
> > > regular cpuidle framework.
> > >
> > > If the initialization of the PM domain data structures succeeds and the OS
> > > initiated mode is supported, we try to switch to it. In case it fails,
> > > let's fall back into a degraded mode, rather than bailing out and returning
> > > an error code.
> > >
> > > Due to that the OS initiated mode may become enabled, we need to adjust to
> > > maintain backwards compatibility for a kernel started through a kexec call.
> > > Do this by explicitly switch to Platform Coordinated mode during boot.
> > >
> > > Finally, the actual initialization of the PM domain data structures, is
> > > done via calling the new shared function, psci_dt_init_pm_domains().
> > > However, this is implemented by subsequent changes.
> > >
> > > Co-developed-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes:
> > > - Simplify code setting domain_state at power off.
> > > - Use the genpd ->free_state() callback to manage freeing of states.
> > > - Fixup a bogus while loop.
> > >
> > > ---
> > > drivers/firmware/psci/Makefile | 2 +-
> > > drivers/firmware/psci/psci.c | 7 +-
> > > drivers/firmware/psci/psci.h | 5 +
> > > drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> > > 4 files changed, 280 insertions(+), 2 deletions(-)
> > > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
> > >
[...]
> > > +
> > > +static int psci_pd_parse_states(struct device_node *np,
> > > + struct genpd_power_state **states, int *state_count)
> > > +{
> > > + int ret;
> > > +
> > > + /* Parse the domain idle states. */
> > > + ret = of_genpd_parse_idle_states(np, states, state_count);
> > > + if (ret)
> > > + return ret;
> > > +
> >
> >
> > Lots of things here in this file are not psci specific. They can be
> > moved as generic CPU PM domain support.
>
> What exactly do you mean by CPU PM domain support?
>
> The current split is based upon how the generic PM domain (genpd)
> supports CPU devices (see GENPD_FLAG_CPU_DOMAIN), which is already
> available.
>
> I agree that finding the right balance between what can be made
> generic and driver specific is not always obvious. Often it's better
> to start with having more things in the driver code, then move things
> into a common framework, later on, when that turns out to make sense.
>
Indeed, I agree. But when reviewing this time I thought it should be
possible to push generic stuff into existing dt_idle_driver. I must
admit that I haven't thought much in details, just thought of expressing
the idea and see. But yes it's difficult to find the balance but at the
same time we need reasons to have these in psci :)
> >
> > > + /* Fill out the PSCI specifics for each found state. */
> > > + ret = psci_pd_parse_state_nodes(*states, *state_count);
> > > + if (ret)
> > > + kfree(*states);
> > > +
> >
> > Things like above are PSCI.
> >
> > I am trying to see if we can do something to achieve partitions like we
> > have today: psci.c just has PSCI specific stuff and dt_idle_states.c
> > deals with generic idle stuff.
>
> I am open to any suggestions. Although, I am not sure I understand
> your comment and nor the reason to why you want me to change.
>
> So, what is the problem with having the code that you refer to, inside
> drivers/firmware/psci/psci_pm_domain.c? Can't we just start with that
> and see how it plays?
>
I need to think how to partition this well. I don't have suggestions
right away, but I need to get convinced what we have here is best we
can do or come up with a better solution. I didn't like it as is at
this time.
--
Regards,
Sudeep
On Mon, Jun 10, 2019 at 12:21:47PM +0200, Ulf Hansson wrote:
> On Fri, 7 Jun 2019 at 17:31, Sudeep Holla <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:57PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology is used and when a CPU has been put
> > > offline (hotplug), that same CPU prevents its PM domain and thus also
> > > potential master PM domains, from being powered off. This is because genpd
> > > observes the CPU's attached device as being active from a runtime PM point
> > > of view.
> > >
> > > To deal with this, let's decrease the runtime PM usage count by calling
> > > pm_runtime_put_sync_suspend() of the attached struct device when putting
> > > the CPU offline. Consequentially, we must then increase the runtime PM
> > > usage count, while putting the CPU online again.
> > >
> >
> > Why is this firmware/driver specific ? Why can't this be dealt in core
> > pm-domain ? I am concerned that if any other architectures or firmware
> > method decides to use this feature, this need to be duplicated there.
>
> What is the core pm-domain? Do you refer to the generic PM domain (genpd), no?
>
Sorry for my bad choice of names. I just wrote names as I understand
rather than looking for exact match. But yes, I meant generic place
where such ref-counting is done currently for other things.
> In such case, this is not the job of genpd, but rather the opposite
> (to *monitor* the reference count).
>
OK, I need to understand that then.
> >
> > The way I see this is pure reference counting and is hardware/firmware/
> > driver agnostic and can be made generic.
>
> As stated in the another reply, I would rather start with having more
> things driver specific rather than generic. Later on we can always
> consider to move/split things, when there are more users.
>
> In this particular case, the runtime PM reference counting is done on
> the struct device*, that genpd returned via
> dev_pm_domain_attach_by_name(). And because
> dev_pm_domain_attach_by_name() is called from PSCI code, I decided to
> keep this struct device* internal to PSCI.
Sure, I understand your intent. I have just mentioned my thoughts/comments
as I reviewed.
--
Regards,
Sudeep
On Mon, 10 Jun 2019 at 12:32, Sudeep Holla <[email protected]> wrote:
>
> On Fri, Jun 07, 2019 at 12:34:07PM -0700, Bjorn Andersson wrote:
> > On Fri 07 Jun 08:42 PDT 2019, Sudeep Holla wrote:
> >
> > > On Tue, May 14, 2019 at 10:58:04AM +0200, Ulf Hansson wrote:
> > > > On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Mon, May 13, 2019 at 9:23 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > This series enables support for hierarchical CPU arrangement, managed by PSCI
> > > > > > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > > > > > recently was extended to manage devices belonging to CPUs.
> > > > >
> > > > > ACK for the patches touching cpuidle in this series (from the
> > > > > framework perspective), but I'm assuming it to be taken care of by
> > > > > ARM/ARM64 maintainers.
> > > >
> > > > Thanks for the ack! Yes, this is for PSCI/ARM maintainers.
> > > >
> > > > BTW, apologize for sending this in the merge window, but wanted to
> > > > take the opportunity for people to have a look before OSPM Pisa next
> > > > week.
> > > >
> > >
> > > I will start looking at this series. But I would request PSCI/other
> > > maintainers to wait until we see some comparison data before we merge.
> >
> > What comparison are you asking for here? Do you want to see the
> > improvement this series gives or are you hoping to compare it with some
> > other mechanism?
> >
>
> OK, I have mentioned this many times already, let me repeat it again.
> This series adds an alternative to the existing PC mode of CPU idle
> management. And it's clear that the main reason for the same is the
> improvement OSI mode offers vs the PC mode. I am asking the comparison
> for the same. And yes we need to compare apples with apples and not
> oranges here.
In the cover letter you see the two main reasons behind this series.
Yeah, OSI support is a part of the series, but OSI or PC mode is
orthogonal to the overall changes this series implements.
When it comes to comparing OSI mode vs PC mode, let's try to avoid
that tiring discussion again, please. :-)
My summary from the earlier ones, is that because the PSCI spec
includes support for OSI, we should also support it in the kernel (and
ATF). In a discussion offlist, Lorenzo agreed that it's okay to add,
without an apple to apple comparison. Maybe Lorenzo can fill in and
state this publicly, to save us all some time?
My final point in regards to the OSI mode support, it's a minor part
of the series. I don't see how that should hurt from a maintenance
point of view, or perhaps I am wrong? In any case, I offer my help
with review/maintenance in any form as you may see need/fit.
[...]
Kind regards
Uffe
On Mon, Jun 10, 2019 at 05:54:39PM +0200, Ulf Hansson wrote:
[...]
> My summary from the earlier ones, is that because the PSCI spec
> includes support for OSI, we should also support it in the kernel (and
> ATF). In a discussion offlist, Lorenzo agreed that it's okay to add,
> without an apple to apple comparison. Maybe Lorenzo can fill in and
> state this publicly, to save us all some time?
The comparison should have been made before even requesting PSCI OSI
mode changes to the specifications, so we have a chip on our shoulders
anyway.
We will enable PSCI OSI but that's not where the problem lies, enabling
PSCI OSI from a firmware perspective should take 10 lines of code,
not:
drivers/firmware/psci/Makefile | 2 +-
drivers/firmware/psci/psci.c | 219 ++++++++--
drivers/firmware/psci/psci.h | 29 ++
drivers/firmware/psci/psci_pm_domain.c | 403 ++++++++++++++++++
I have some concerns about these changes that I will state in the
relevant patches.
> My final point in regards to the OSI mode support, it's a minor part
> of the series. I don't see how that should hurt from a maintenance
> point of view, or perhaps I am wrong? In any case, I offer my help
> with review/maintenance in any form as you may see need/fit.
I will go through the series but most of this code should move
to core PM code, it has nothing to do with PSCI.
BTW, apologies for the delay, I was away.
Thanks,
Lorenzo
On Mon, 10 Jun 2019 at 19:16, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Mon, Jun 10, 2019 at 05:54:39PM +0200, Ulf Hansson wrote:
>
> [...]
>
> > My summary from the earlier ones, is that because the PSCI spec
> > includes support for OSI, we should also support it in the kernel (and
> > ATF). In a discussion offlist, Lorenzo agreed that it's okay to add,
> > without an apple to apple comparison. Maybe Lorenzo can fill in and
> > state this publicly, to save us all some time?
>
> The comparison should have been made before even requesting PSCI OSI
> mode changes to the specifications, so we have a chip on our shoulders
> anyway.
>
> We will enable PSCI OSI but that's not where the problem lies, enabling
> PSCI OSI from a firmware perspective should take 10 lines of code,
> not:
Thanks for confirming!
>
> drivers/firmware/psci/Makefile | 2 +-
> drivers/firmware/psci/psci.c | 219 ++++++++--
> drivers/firmware/psci/psci.h | 29 ++
> drivers/firmware/psci/psci_pm_domain.c | 403 ++++++++++++++++++
>
> I have some concerns about these changes that I will state in the
> relevant patches.
Most of the above changes isn't for solely for OSI, but to support a
hierarchical topology described in the PSCI DT layout. This is for
example needed when other resources shares the same power rail as the
CPU cluster.
In other words, the series is orthogonal to whether OSI or PC mode is
used for PSCI, just to make that clear. BTW, this is what you
requested me to change into, a while ago.
>
> > My final point in regards to the OSI mode support, it's a minor part
> > of the series. I don't see how that should hurt from a maintenance
> > point of view, or perhaps I am wrong? In any case, I offer my help
> > with review/maintenance in any form as you may see need/fit.
>
> I will go through the series but most of this code should move
> to core PM code, it has nothing to do with PSCI.
I am looking forward to your review - and for sure, I am open to suggestions!
>
> BTW, apologies for the delay, I was away.
>
> Thanks,
> Lorenzo
Kind regards
Uffe
On Mon, 10 Jun 2019 at 20:57, Ulf Hansson <[email protected]> wrote:
>
> On Mon, 10 Jun 2019 at 19:16, Lorenzo Pieralisi
> <[email protected]> wrote:
> >
> > On Mon, Jun 10, 2019 at 05:54:39PM +0200, Ulf Hansson wrote:
> >
> > [...]
> >
> > > My summary from the earlier ones, is that because the PSCI spec
> > > includes support for OSI, we should also support it in the kernel (and
> > > ATF). In a discussion offlist, Lorenzo agreed that it's okay to add,
> > > without an apple to apple comparison. Maybe Lorenzo can fill in and
> > > state this publicly, to save us all some time?
> >
> > The comparison should have been made before even requesting PSCI OSI
> > mode changes to the specifications, so we have a chip on our shoulders
> > anyway.
> >
> > We will enable PSCI OSI but that's not where the problem lies, enabling
> > PSCI OSI from a firmware perspective should take 10 lines of code,
> > not:
>
> Thanks for confirming!
>
> >
> > drivers/firmware/psci/Makefile | 2 +-
> > drivers/firmware/psci/psci.c | 219 ++++++++--
> > drivers/firmware/psci/psci.h | 29 ++
> > drivers/firmware/psci/psci_pm_domain.c | 403 ++++++++++++++++++
> >
> > I have some concerns about these changes that I will state in the
> > relevant patches.
>
> Most of the above changes isn't for solely for OSI, but to support a
> hierarchical topology described in the PSCI DT layout. This is for
> example needed when other resources shares the same power rail as the
> CPU cluster.
>
> In other words, the series is orthogonal to whether OSI or PC mode is
> used for PSCI, just to make that clear. BTW, this is what you
> requested me to change into, a while ago.
>
> >
> > > My final point in regards to the OSI mode support, it's a minor part
> > > of the series. I don't see how that should hurt from a maintenance
> > > point of view, or perhaps I am wrong? In any case, I offer my help
> > > with review/maintenance in any form as you may see need/fit.
> >
> > I will go through the series but most of this code should move
> > to core PM code, it has nothing to do with PSCI.
>
> I am looking forward to your review - and for sure, I am open to suggestions!
>
> >
> > BTW, apologies for the delay, I was away.
Lorenzo, a gentle ping.
Kind regards
Uffe
On Mon, May 13, 2019 at 09:22:52PM +0200, Ulf Hansson wrote:
> If the hierarchical CPU topology is used, but the OS initiated mode isn't
> supported, we need to rely solely on the regular cpuidle framework to
> manage the idle state selection, rather than using genpd and its governor.
>
> For this reason, introduce a new PSCI DT helper function,
> psci_dt_pm_domains_parse_states(), which parses and converts the
> hierarchically described domain idle states from DT, into regular flattened
> cpuidle states. The converted states are added to the existing cpuidle
> driver's array of idle states, which make them available for cpuidle.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Some simplification of the code.
>
> ---
> drivers/firmware/psci/psci.h | 5 ++
> drivers/firmware/psci/psci_pm_domain.c | 118 +++++++++++++++++++++++++
> 2 files changed, 123 insertions(+)
>
> diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> index 00d2e3dcef49..c36e0e6649e9 100644
> --- a/drivers/firmware/psci/psci.h
> +++ b/drivers/firmware/psci/psci.h
> @@ -3,6 +3,7 @@
> #ifndef __PSCI_H
> #define __PSCI_H
>
> +struct cpuidle_driver;
> struct device_node;
>
> int psci_set_osi_mode(void);
> @@ -13,8 +14,12 @@ void psci_set_domain_state(u32 state);
> int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> #ifdef CONFIG_PM_GENERIC_DOMAINS_OF
> int psci_dt_init_pm_domains(struct device_node *np);
> +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> + struct device_node *cpu_node, u32 *psci_states);
> #else
> static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> +static inline int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> + struct device_node *cpu_node, u32 *psci_states) { return 0; }
> #endif
> #endif
>
> diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> index 3c6ca846caf4..3aa645dba81b 100644
> --- a/drivers/firmware/psci/psci_pm_domain.c
> +++ b/drivers/firmware/psci/psci_pm_domain.c
> @@ -14,6 +14,10 @@
> #include <linux/pm_domain.h>
> #include <linux/slab.h>
> #include <linux/string.h>
> +#include <linux/cpuidle.h>
> +#include <linux/cpu_pm.h>
> +
> +#include <asm/cpuidle.h>
>
> #include "psci.h"
>
> @@ -104,6 +108,53 @@ static void psci_pd_free_states(struct genpd_power_state *states,
> kfree(states);
> }
>
> +static int psci_pd_enter_pc(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int idx)
> +{
> + return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
> +}
> +
> +static void psci_pd_enter_s2idle_pc(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int idx)
> +{
> + psci_pd_enter_pc(dev, drv, idx);
> +}
> +
> +static void psci_pd_convert_states(struct cpuidle_state *idle_state,
> + u32 *psci_state, struct genpd_power_state *state)
> +{
> + u32 *state_data = state->data;
> + u64 target_residency_us = state->residency_ns;
> + u64 exit_latency_us = state->power_on_latency_ns +
> + state->power_off_latency_ns;
> +
> + *psci_state = *state_data;
> + do_div(target_residency_us, 1000);
> + idle_state->target_residency = target_residency_us;
> + do_div(exit_latency_us, 1000);
> + idle_state->exit_latency = exit_latency_us;
> + idle_state->enter = &psci_pd_enter_pc;
> + idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
> + idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
This is arbitrary and not necessarily true.
I think that this patch is useful to represent my reservations about the
current approach. As a matter of fact, idle state entry will always be a
CPUidle decision.
You only need PM domain information to understand when all CPUs
in a power domain are actually idle but that's all genPD can do
in this respect.
I think this patchset would be much simpler if both CPUidle and
genPD governor would work on *one* set of idle states, globally
indexed (and that would be true for PSCI suspend parameters too).
To work with a unified set of idle states between CPUidle and genPD
(tossing some ideas around):
- We can implement a genPD CPUidle governor that in its select method
takes into account genPD information (for instance by avoiding
selection of idle states that require multiple cpus to be in idle
to be effectively active)
- We can use genPD to enable/disable CPUidle states through runtime
PM information
There may be other ways. My point is that current code, with two (or
more if the hierarchy grows) sets of idle states across two subsystems
(CPUidle and genPD) is not very well defined and honestly very hard to
grasp and prone to errors.
> +
> + strncpy(idle_state->name, to_of_node(state->fwnode)->name,
> + CPUIDLE_NAME_LEN - 1);
> + strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
> + CPUIDLE_NAME_LEN - 1);
> +}
> +
> +static bool psci_pd_is_provider(struct device_node *np)
> +{
> + struct psci_pd_provider *pd_prov, *it;
> +
> + list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
> + if (pd_prov->node == np)
> + return true;
> + }
> +
> + return false;
> +}
> +
> static int psci_pd_init(struct device_node *np)
> {
> struct generic_pm_domain *pd;
> @@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np)
> pr_err("failed to create CPU PM domains ret=%d\n", ret);
> return ret;
> }
> +
> +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> + struct device_node *cpu_node, u32 *psci_states)
> +{
> + struct genpd_power_state *pd_states;
> + struct of_phandle_args args;
> + int ret, pd_state_count, i, state_idx, psci_idx;
> + u32 cpu_psci_state = psci_states[drv->state_count - 2];
This (-2) is very dodgy and I doubt it would work on hierarchies going
above "cluster" level.
As I say above, I think we should work towards a single array of
idle states to be selected by a CPUidle governor using genPD
runtime information to bias the results according to the number
of CPUs in a genPD that entered/exit idle.
To be more precise, all idles states should be "domain-idle-state"
compatible, even the CPU ones, the distinction between what CPUidle
and genPD manage is a bit stretched IMO in this patchset.
We will have a chance to talk about this but I thought I would
comment publically if anyone else is willing to chime in, this
is not a PSCI problem at all, it is a CPUidle/genPD coexistence
design problem which is much broader.
Lorenzo
> + struct device_node *np = of_node_get(cpu_node);
> +
> +
> + /* Walk the CPU topology to find compatible domain idle states. */
> + while (np) {
> + ret = of_parse_phandle_with_args(np, "power-domains",
> + "#power-domain-cells", 0, &args);
> + of_node_put(np);
> + if (ret)
> + return 0;
> +
> + np = args.np;
> +
> + /* Verify that the node represents a psci pd provider. */
> + if (!psci_pd_is_provider(np)) {
> + of_node_put(np);
> + return 0;
> + }
> +
> + /* Parse for compatible domain idle states. */
> + ret = psci_pd_parse_states(np, &pd_states, &pd_state_count);
> + if (ret) {
> + of_node_put(np);
> + return ret;
> + }
> +
> + i = 0;
> + while (i < pd_state_count) {
> +
> + state_idx = drv->state_count;
> + if (state_idx >= CPUIDLE_STATE_MAX) {
> + pr_warn("exceeding max cpuidle states\n");
> + of_node_put(np);
> + return 0;
> + }
> +
> + /* WFI state is not part of psci_states. */
> + psci_idx = state_idx - 1 + i;
> + psci_pd_convert_states(&drv->states[state_idx + i],
> + &psci_states[psci_idx], &pd_states[i]);
> +
> + /*
> + * In the hierarchical CPU topology the master PM domain
> + * idle state's DT property, "arm,psci-suspend-param",
> + * don't contain the bits for the idle state of the CPU,
> + * let's add those here.
> + */
> + psci_states[psci_idx] |= cpu_psci_state;
> + pr_debug("psci-power-state %#x index %d\n",
> + psci_states[psci_idx], psci_idx);
> +
> + drv->state_count++;
> + i++;
> + }
> + psci_pd_free_states(pd_states, pd_state_count);
> + }
> +
> + return 0;
> +}
> #endif
> --
> 2.17.1
>
Lorenzo,
Just wanted to give some feedback to you in public, even if we have
already discussed this series, offlist, last week.
On Tue, 9 Jul 2019 at 17:31, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:52PM +0200, Ulf Hansson wrote:
> > If the hierarchical CPU topology is used, but the OS initiated mode isn't
> > supported, we need to rely solely on the regular cpuidle framework to
> > manage the idle state selection, rather than using genpd and its governor.
> >
> > For this reason, introduce a new PSCI DT helper function,
> > psci_dt_pm_domains_parse_states(), which parses and converts the
> > hierarchically described domain idle states from DT, into regular flattened
> > cpuidle states. The converted states are added to the existing cpuidle
> > driver's array of idle states, which make them available for cpuidle.
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes:
> > - Some simplification of the code.
> >
> > ---
> > drivers/firmware/psci/psci.h | 5 ++
> > drivers/firmware/psci/psci_pm_domain.c | 118 +++++++++++++++++++++++++
> > 2 files changed, 123 insertions(+)
> >
> > diff --git a/drivers/firmware/psci/psci.h b/drivers/firmware/psci/psci.h
> > index 00d2e3dcef49..c36e0e6649e9 100644
> > --- a/drivers/firmware/psci/psci.h
> > +++ b/drivers/firmware/psci/psci.h
> > @@ -3,6 +3,7 @@
> > #ifndef __PSCI_H
> > #define __PSCI_H
> >
> > +struct cpuidle_driver;
> > struct device_node;
> >
> > int psci_set_osi_mode(void);
> > @@ -13,8 +14,12 @@ void psci_set_domain_state(u32 state);
> > int psci_dt_parse_state_node(struct device_node *np, u32 *state);
> > #ifdef CONFIG_PM_GENERIC_DOMAINS_OF
> > int psci_dt_init_pm_domains(struct device_node *np);
> > +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > + struct device_node *cpu_node, u32 *psci_states);
> > #else
> > static inline int psci_dt_init_pm_domains(struct device_node *np) { return 0; }
> > +static inline int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > + struct device_node *cpu_node, u32 *psci_states) { return 0; }
> > #endif
> > #endif
> >
> > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> > index 3c6ca846caf4..3aa645dba81b 100644
> > --- a/drivers/firmware/psci/psci_pm_domain.c
> > +++ b/drivers/firmware/psci/psci_pm_domain.c
> > @@ -14,6 +14,10 @@
> > #include <linux/pm_domain.h>
> > #include <linux/slab.h>
> > #include <linux/string.h>
> > +#include <linux/cpuidle.h>
> > +#include <linux/cpu_pm.h>
> > +
> > +#include <asm/cpuidle.h>
> >
> > #include "psci.h"
> >
> > @@ -104,6 +108,53 @@ static void psci_pd_free_states(struct genpd_power_state *states,
> > kfree(states);
> > }
> >
> > +static int psci_pd_enter_pc(struct cpuidle_device *dev,
> > + struct cpuidle_driver *drv, int idx)
> > +{
> > + return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
> > +}
> > +
> > +static void psci_pd_enter_s2idle_pc(struct cpuidle_device *dev,
> > + struct cpuidle_driver *drv, int idx)
> > +{
> > + psci_pd_enter_pc(dev, drv, idx);
> > +}
> > +
> > +static void psci_pd_convert_states(struct cpuidle_state *idle_state,
> > + u32 *psci_state, struct genpd_power_state *state)
> > +{
> > + u32 *state_data = state->data;
> > + u64 target_residency_us = state->residency_ns;
> > + u64 exit_latency_us = state->power_on_latency_ns +
> > + state->power_off_latency_ns;
> > +
> > + *psci_state = *state_data;
> > + do_div(target_residency_us, 1000);
> > + idle_state->target_residency = target_residency_us;
> > + do_div(exit_latency_us, 1000);
> > + idle_state->exit_latency = exit_latency_us;
> > + idle_state->enter = &psci_pd_enter_pc;
> > + idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
> > + idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
>
> This is arbitrary and not necessarily true.
The arbitrary thing you refer to here, is that the
CPUIDLE_FLAG_TIMER_STOP? Or are you referring to the complete function
psci_pd_convert_states()?
>
> I think that this patch is useful to represent my reservations about the
> current approach. As a matter of fact, idle state entry will always be a
> CPUidle decision.
>
> You only need PM domain information to understand when all CPUs
> in a power domain are actually idle but that's all genPD can do
> in this respect.
>
> I think this patchset would be much simpler if both CPUidle and
> genPD governor would work on *one* set of idle states, globally
> indexed (and that would be true for PSCI suspend parameters too).
>
> To work with a unified set of idle states between CPUidle and genPD
> (tossing some ideas around):
>
> - We can implement a genPD CPUidle governor that in its select method
> takes into account genPD information (for instance by avoiding
> selection of idle states that require multiple cpus to be in idle
> to be effectively active)
> - We can use genPD to enable/disable CPUidle states through runtime
> PM information
I don't understand how to make this work.
The CPUidle governor works on per CPU basis. The genpd governor works
on per PM domain basis, which typically can be a group of CPUs (and
even other devices) via subdomains, for example.
1.
In case of Linux being in *charge* of what idle state to pick for a
group of CPUs, that decision is best done by the genpd governor as it
operates on "groups" of CPUs. This is used for PSCI OSI mode. Of
course, there are idle states also per CPU, which potentially could be
managed by the genpd governor as well, but at this point I decided to
re-use the cpuidle governor as it's already doing the job.
2. In case the decision of what idle state to enter for the group is
done by the FW, we can rely solely on the cpuidle governor and let it
select states per CPU basis. This is used for PSCI PC mode.
>
> There may be other ways. My point is that current code, with two (or
> more if the hierarchy grows) sets of idle states across two subsystems
> (CPUidle and genPD) is not very well defined and honestly very hard to
> grasp and prone to errors.
The complexity is there, I admit that.
I guess some deeper insight about genpd+its governor+runtime PM are
needed when reviewing this, of course. As an option, you may also have
a look at my slides [1] from OSPM (Pisa) in May this year, which via
flow charts try to describes things in more detail.
In our offlist meeting, we discussed that perhaps moving some of the
new PSCI code introduced in this series, into a cpuidle driver
instead, could make things more clear. For sure, I can explore that
option, but before I go there, I think we should agree on it publicly.
In principle what it means is to invent a special cpuidle driver for
PSCI, so we would need access to some of the PSCI internal functions,
for example.
One thing though, the initialization of the PSCI PM domain topology is
a separate step, managed via the new initcall, psci_dt_topology_init()
(introduced in patch 11). That part still seems to be belong to the
PSCI code, don't you think?
>
> > +
> > + strncpy(idle_state->name, to_of_node(state->fwnode)->name,
> > + CPUIDLE_NAME_LEN - 1);
> > + strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
> > + CPUIDLE_NAME_LEN - 1);
> > +}
> > +
> > +static bool psci_pd_is_provider(struct device_node *np)
> > +{
> > + struct psci_pd_provider *pd_prov, *it;
> > +
> > + list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
> > + if (pd_prov->node == np)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > static int psci_pd_init(struct device_node *np)
> > {
> > struct generic_pm_domain *pd;
> > @@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np)
> > pr_err("failed to create CPU PM domains ret=%d\n", ret);
> > return ret;
> > }
> > +
> > +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > + struct device_node *cpu_node, u32 *psci_states)
> > +{
> > + struct genpd_power_state *pd_states;
> > + struct of_phandle_args args;
> > + int ret, pd_state_count, i, state_idx, psci_idx;
> > + u32 cpu_psci_state = psci_states[drv->state_count - 2];
>
> This (-2) is very dodgy and I doubt it would work on hierarchies going
> above "cluster" level.
>
> As I say above, I think we should work towards a single array of
> idle states to be selected by a CPUidle governor using genPD
> runtime information to bias the results according to the number
> of CPUs in a genPD that entered/exit idle.
>
> To be more precise, all idles states should be "domain-idle-state"
> compatible, even the CPU ones, the distinction between what CPUidle
> and genPD manage is a bit stretched IMO in this patchset.
>
> We will have a chance to talk about this but I thought I would
> comment publically if anyone else is willing to chime in, this
> is not a PSCI problem at all, it is a CPUidle/genPD coexistence
> design problem which is much broader.
To move this forward, I think we need to move from vague ideas to
clear and distinct plans. Whatever that means. :-)
I understand you are concerned about the level of changes introduced
to the PSCI code. As I stated somewhere in earlier replies, I picked
that approach as I thought it was better to implement things in a PSCI
specific manner to start with, then we could move things around, when
we realize that it make sense.
Anyway, as a suggestion to address your concern, how about this:
1. Move some things out to a PSCI cpuidle driver. We need to decide
more exactly on what to move and find the right level for the
interfaces.
2. Don't attach the CPU to the PM domain topology in case the PSCI PC
mode is used. I think this makes it easier, at least as a first step,
to understand when runtime PM needs to be used/enabled.
3. Would it help if I volunteer to help you guys as a maintainer for
PSCI. At least for the part of the new code that becomes introduced?
[...]
Kind regards
Uffe
On Mon, May 13, 2019 at 09:22:59PM +0200, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> In the hierarchical layout, we are creating power domains around each CPU
> and describes the idle states for them inside the power domain provider
> node. Note that, the CPU's idle states still needs to be compatible with
> "arm,idle-state".
>
> Furthermore, represent the CPU cluster as a separate master power domain,
> powering the CPU's power domains. The cluster node, contains the idle
> states for the cluster and each idle state needs to be compatible with the
> "domain-idle-state".
>
> If the running platform is using a PSCI FW that supports the OS initiated
> CPU suspend mode, which likely should be the case unless the PSCI FW is
> very old, this change triggers the PSCI driver to enable it.
>
> Cc: Andy Gross <[email protected]>
> Cc: David Brown <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
[...]
> @@ -166,12 +170,57 @@
> min-residency-us = <2000>;
> local-timer-stop;
> };
> +
> + CLUSTER_RET: cluster-retention {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000010>;
> + entry-latency-us = <500>;
> + exit-latency-us = <500>;
> + min-residency-us = <2000>;
> + };
> +
> + CLUSTER_PWRDN: cluster-gdhs {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000030>;
> + entry-latency-us = <2000>;
> + exit-latency-us = <2000>;
> + min-residency-us = <6000>;
> + };
> };
> };
I was trying to understand the composition of composite state parameters
in this series and that made me look at these DT examples.
What format does the above platform use ? I tried matching them to
both original as well as extended format and I fail to understand.
Assuming original format:
State power_state PowerLevel StateType StateID
SPC 0x40000002 0(core) 0(Retention) 0x2 (Res0 b[29]=1?)
CLUSTER_RET 0x1000010 1(clusters) 0(Retention) 0x10
CLUSTER_PWRDN 0x1000030 1(clusters) 0(Retention?) 0x30
Now extended format:
State power_state StateType StateID
SPC 0x40000002 0(Retention) 0x40000002 (Res0 b[29]=1?)
CLUSTER_RET 0x1000010 0(Retention) 0x1000010
CLUSTER_PWRDN 0x1000030 0(Retention?) 0x1000030
What am I missing ?
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:23:00PM +0200, Ulf Hansson wrote:
> To enable the OS to manage last-man standing activities for a CPU, while an
> idle state for a group of CPUs is selected, let's convert the Hikey
> platform into using the hierarchical CPU topology layout.
>
> Cc: Wei Xu <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - None.
>
> ---
> arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 ++++++++++++++++++++---
> 1 file changed, 76 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> index 108e2a4227f6..36ff460f428f 100644
> --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> cpus {
[...]
> @@ -70,9 +128,8 @@
> };
>
> CLUSTER_SLEEP: cluster-sleep {
> - compatible = "arm,idle-state";
> - local-timer-stop;
> - arm,psci-suspend-param = <0x1010000>;
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000000>;
> entry-latency-us = <1000>;
> exit-latency-us = <700>;
> min-residency-us = <2700>;
Again this must be original format and as per PSCI spec, your patch
changes this cluster sleep state into cluster retention state which I
think is not what you intended.
--
Regards,
Sudeep
On Tue, Jul 16, 2019 at 10:45:49AM +0200, Ulf Hansson wrote:
[...]
> > > +static void psci_pd_convert_states(struct cpuidle_state *idle_state,
> > > + u32 *psci_state, struct genpd_power_state *state)
> > > +{
> > > + u32 *state_data = state->data;
> > > + u64 target_residency_us = state->residency_ns;
> > > + u64 exit_latency_us = state->power_on_latency_ns +
> > > + state->power_off_latency_ns;
> > > +
> > > + *psci_state = *state_data;
> > > + do_div(target_residency_us, 1000);
> > > + idle_state->target_residency = target_residency_us;
> > > + do_div(exit_latency_us, 1000);
> > > + idle_state->exit_latency = exit_latency_us;
> > > + idle_state->enter = &psci_pd_enter_pc;
> > > + idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
> > > + idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
> >
> > This is arbitrary and not necessarily true.
>
> The arbitrary thing you refer to here, is that the
> CPUIDLE_FLAG_TIMER_STOP? Or are you referring to the complete function
> psci_pd_convert_states()?
I refer to CPUIDLE_FLAG_TIMER_STOP. I think that on platform coordinated
system we should not bother about the hierarchical representation of the
states (I understand I asked you to make it work but it has become too
complex, I would rather focus on making the hierarchical representation
work for all idle states combination in OSI mode).
Plus side, another level of complexity removed.
> > I think that this patch is useful to represent my reservations about the
> > current approach. As a matter of fact, idle state entry will always be a
> > CPUidle decision.
> >
> > You only need PM domain information to understand when all CPUs
> > in a power domain are actually idle but that's all genPD can do
> > in this respect.
> >
> > I think this patchset would be much simpler if both CPUidle and
> > genPD governor would work on *one* set of idle states, globally
> > indexed (and that would be true for PSCI suspend parameters too).
> >
> > To work with a unified set of idle states between CPUidle and genPD
> > (tossing some ideas around):
> >
> > - We can implement a genPD CPUidle governor that in its select method
> > takes into account genPD information (for instance by avoiding
> > selection of idle states that require multiple cpus to be in idle
> > to be effectively active)
> > - We can use genPD to enable/disable CPUidle states through runtime
> > PM information
>
> I don't understand how to make this work.
>
> The CPUidle governor works on per CPU basis. The genpd governor works
> on per PM domain basis, which typically can be a group of CPUs (and
> even other devices) via subdomains, for example.
>
> 1.
> In case of Linux being in *charge* of what idle state to pick for a
> group of CPUs, that decision is best done by the genpd governor as it
> operates on "groups" of CPUs. This is used for PSCI OSI mode. Of
> course, there are idle states also per CPU, which potentially could be
> managed by the genpd governor as well, but at this point I decided to
> re-use the cpuidle governor as it's already doing the job.
>
> 2. In case the decision of what idle state to enter for the group is
> done by the FW, we can rely solely on the cpuidle governor and let it
> select states per CPU basis. This is used for PSCI PC mode.
>
> >
> > There may be other ways. My point is that current code, with two (or
> > more if the hierarchy grows) sets of idle states across two subsystems
> > (CPUidle and genPD) is not very well defined and honestly very hard to
> > grasp and prone to errors.
>
> The complexity is there, I admit that.
>
> I guess some deeper insight about genpd+its governor+runtime PM are
> needed when reviewing this, of course. As an option, you may also have
> a look at my slides [1] from OSPM (Pisa) in May this year, which via
> flow charts try to describes things in more detail.
>
> In our offlist meeting, we discussed that perhaps moving some of the
> new PSCI code introduced in this series, into a cpuidle driver
> instead, could make things more clear. For sure, I can explore that
> option, but before I go there, I think we should agree on it publicly.
I will do it but given that the generic idle infrastructure basically
is there for PSCI and:
drivers/soc/qcom/spm.c
if we create a PSCI CPUidle driver we can write one for qcom-spm and
remove the generic idle infrastructure, there would not be much
point in keeping it in the kernel; at least on ARM64 not using
PSCI for CPUidle is not even an option.
> In principle what it means is to invent a special cpuidle driver for
> PSCI, so we would need access to some of the PSCI internal functions,
> for example.
Yes.
> One thing though, the initialization of the PSCI PM domain topology is
> a separate step, managed via the new initcall, psci_dt_topology_init()
> (introduced in patch 11). That part still seems to be belong to the
> PSCI code, don't you think?
Yes but at least we can call it from a sensible place (well, sensible,
most likely from an initcall given how idle drivers are initialized).
> > > + strncpy(idle_state->name, to_of_node(state->fwnode)->name,
> > > + CPUIDLE_NAME_LEN - 1);
> > > + strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
> > > + CPUIDLE_NAME_LEN - 1);
> > > +}
> > > +
> > > +static bool psci_pd_is_provider(struct device_node *np)
> > > +{
> > > + struct psci_pd_provider *pd_prov, *it;
> > > +
> > > + list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
> > > + if (pd_prov->node == np)
> > > + return true;
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > > static int psci_pd_init(struct device_node *np)
> > > {
> > > struct generic_pm_domain *pd;
> > > @@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np)
> > > pr_err("failed to create CPU PM domains ret=%d\n", ret);
> > > return ret;
> > > }
> > > +
> > > +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > > + struct device_node *cpu_node, u32 *psci_states)
> > > +{
> > > + struct genpd_power_state *pd_states;
> > > + struct of_phandle_args args;
> > > + int ret, pd_state_count, i, state_idx, psci_idx;
> > > + u32 cpu_psci_state = psci_states[drv->state_count - 2];
> >
> > This (-2) is very dodgy and I doubt it would work on hierarchies going
> > above "cluster" level.
> >
> > As I say above, I think we should work towards a single array of
> > idle states to be selected by a CPUidle governor using genPD
> > runtime information to bias the results according to the number
> > of CPUs in a genPD that entered/exit idle.
> >
> > To be more precise, all idles states should be "domain-idle-state"
> > compatible, even the CPU ones, the distinction between what CPUidle
> > and genPD manage is a bit stretched IMO in this patchset.
> >
> > We will have a chance to talk about this but I thought I would
> > comment publically if anyone else is willing to chime in, this
> > is not a PSCI problem at all, it is a CPUidle/genPD coexistence
> > design problem which is much broader.
>
> To move this forward, I think we need to move from vague ideas to
> clear and distinct plans. Whatever that means. :-)
See above.
> I understand you are concerned about the level of changes introduced
> to the PSCI code. As I stated somewhere in earlier replies, I picked
> that approach as I thought it was better to implement things in a PSCI
> specific manner to start with, then we could move things around, when
> we realize that it make sense.
I am also concerned about how the idle states are managed in
this patchset and I am pretty certain it will break when we
move away from a simple hierarchy with one CPU state and one
cluster state, we will comment on the specifics.
Moving PSCI code into a CPUidle driver will cater for the rest.
> Anyway, as a suggestion to address your concern, how about this:
>
> 1. Move some things out to a PSCI cpuidle driver. We need to decide
> more exactly on what to move and find the right level for the
> interfaces.
I will do it and post patches asap.
> 2. Don't attach the CPU to the PM domain topology in case the PSCI PC
> mode is used. I think this makes it easier, at least as a first step,
> to understand when runtime PM needs to be used/enabled.
In the PSCI CPUidle driver we can have two distinct struct
cpuidle_state->enter functions for PC and OSI, no overhead
for PC, runtime PM for OSI, decoupling done.
We can choose one or the other depending on whether:
OSI iff:
- OSI is available
- hierarchical idle states are present in DT
otherwise PC.
That's what this patch does but we will do it in a unified file.
> 3. Would it help if I volunteer to help you guys as a maintainer for
> PSCI. At least for the part of the new code that becomes introduced?
We will do as described above if that makes sense.
Thanks,
Lorenzo
On Mon, May 13, 2019 at 09:22:49PM +0200, Ulf Hansson wrote:
> The per CPU variable psci_power_state, contains an array of fixed values,
> which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> each of the available CPU idle states.
>
> This isn't sufficient when using the hierarchical CPU topology in DT in
> combination with having PSCI OS initiated (OSI) mode enabled. More
> precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
> idle state the cluster (a group of CPUs) should enter, while in PSCI
> Platform Coordinated (PC) mode, each CPU independently votes for an idle
> state of the cluster.
>
> For this reason, let's introduce an additional per CPU variable called
> domain_state and implement two helper functions to read/write its values.
> Following patches, which implements PM domain support for PSCI, will use
> the domain_state variable and set it to corresponding bits that represents
> the selected idle state for the cluster.
>
> Finally, in psci_cpu_suspend_enter() and psci_suspend_finisher(), let's
> take into account the values in the domain_state, as to get the complete
> suspend parameter.
>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Clarify changelog.
> - Drop changes in psci_cpu_on() as it belongs in the patch for hotplug.
> - Move some code inside "#ifdef CONFIG_CPU_IDLE".
>
> ---
> drivers/firmware/psci/psci.c | 24 ++++++++++++++++++++----
> 1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index b11560f7c4b9..4aec513136e4 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -285,6 +285,17 @@ static int __init psci_features(u32 psci_func_id)
>
> #ifdef CONFIG_CPU_IDLE
> static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
> +static DEFINE_PER_CPU(u32, domain_state);
> +
> +static inline u32 psci_get_domain_state(void)
> +{
> + return __this_cpu_read(domain_state);
> +}
> +
> +static inline void psci_set_domain_state(u32 state)
> +{
> + __this_cpu_write(domain_state, state);
> +}
>
> static int psci_dt_parse_state_node(struct device_node *np, u32 *state)
> {
> @@ -420,15 +431,17 @@ int psci_cpu_init_idle(struct cpuidle_driver *drv, unsigned int cpu)
> static int psci_suspend_finisher(unsigned long index)
> {
> u32 *state = __this_cpu_read(psci_power_state);
> + u32 composite_state = state[index - 1] | psci_get_domain_state();
>
The more I read this code and PSCI spec, I think it's not simple OR here
unless the specification states that. ACPI LPI explicitly stated that as
it was generic and PSCI doesn't. It can be made workable for original
format, but I think it's not that simple for extended format unless the
suspend parameters are carefully designed to achieve that, so we can't
just convert existing platforms the way it's shown on HiKey in this series.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, we need to setup
> the corresponding PM domain data structures, as to allow a CPU and a group
> of CPUs to be power managed accordingly. Let's enable this by deploying
> support through the genpd interface.
>
> Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> also parse the domain idle states DT bindings as to make genpd responsible
> for the state selection, when the states are compatible with
> "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> supported, we rely solely on the state selection to be managed through the
> regular cpuidle framework.
>
> If the initialization of the PM domain data structures succeeds and the OS
> initiated mode is supported, we try to switch to it. In case it fails,
> let's fall back into a degraded mode, rather than bailing out and returning
> an error code.
>
> Due to that the OS initiated mode may become enabled, we need to adjust to
> maintain backwards compatibility for a kernel started through a kexec call.
> Do this by explicitly switch to Platform Coordinated mode during boot.
>
> Finally, the actual initialization of the PM domain data structures, is
> done via calling the new shared function, psci_dt_init_pm_domains().
> However, this is implemented by subsequent changes.
>
> Co-developed-by: Lina Iyer <[email protected]>
> Signed-off-by: Lina Iyer <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Simplify code setting domain_state at power off.
> - Use the genpd ->free_state() callback to manage freeing of states.
> - Fixup a bogus while loop.
>
> ---
> drivers/firmware/psci/Makefile | 2 +-
> drivers/firmware/psci/psci.c | 7 +-
> drivers/firmware/psci/psci.h | 5 +
> drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> 4 files changed, 280 insertions(+), 2 deletions(-)
> create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>
[...]
> #endif /* __PSCI_H */
> diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> new file mode 100644
> index 000000000000..3c6ca846caf4
> --- /dev/null
> +++ b/drivers/firmware/psci/psci_pm_domain.c
> @@ -0,0 +1,268 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PM domains for CPUs via genpd - managed by PSCI.
> + *
> + * Copyright (C) 2019 Linaro Ltd.
> + * Author: Ulf Hansson <[email protected]>
> + *
> + */
> +
[...]
> +static int psci_pd_power_off(struct generic_pm_domain *pd)
> +{
> + struct genpd_power_state *state = &pd->states[pd->state_idx];
> + u32 *pd_state;
> +
> + /* If we have failed to enable OSI mode, then abort power off. */
> + if (psci_has_osi_support() && !osi_mode_enabled)
> + return -EBUSY;
> +
> + if (!state->data)
> + return 0;
> +
> + /* When OSI mode is enabled, set the corresponding domain state. */
> + pd_state = state->data;
> + psci_set_domain_state(*pd_state);
I trying to understand how would this scale to level 2(cluster of
clusters or for simply system). The current code for psci_set_domain_state
just stores the value @pd_state into per-cpu domain_state. E.g.: Now if
the system level pd is getting called after cluster PD, it will set the
domain state to system level PD state. It won't work with original
format and it may work with extended format if it's carefully crafted.
In short, the point is just over-writing domain_state is asking for
troubles IMO.
--
Regards,
Sudeep
On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> When the hierarchical CPU topology layout is used in DT, let's allow the
> CPU to be power managed through its PM domain, via deploying runtime PM
> support.
>
> To know for which idle states runtime PM reference counting is needed,
> let's store the index of deepest idle state for the CPU, in a per CPU
> variable. This allows psci_cpu_suspend_enter() to compare this index with
> the requested idle state index and then act accordingly.
I do not see why a system with two CPU CPUidle states, say CPU retention
and CPU shutdown, should not be calling runtime PM on CPU retention
entry.
The question then is what cluster/package/system states
are allowed for a given CPU idle state, to understand
what idle states can be actually entered at any hierarchy
level given the choice made for the CPU idle state.
In the case above, a CPU entering retention state should prevent
runtime PM selecting a cluster shutdown state; most likely firmware
would demote the request to cluster retention but still, we should
find a way to describe these dependencies.
Thanks,
Lorenzo
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - Simplify the code by using the new per CPU struct, that stores the
> needed struct device*.
>
> ---
> drivers/firmware/psci/psci.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 54e23d4ed0ea..2c4157d3a616 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -20,6 +20,7 @@
> #include <linux/linkage.h>
> #include <linux/of.h>
> #include <linux/pm.h>
> +#include <linux/pm_runtime.h>
> #include <linux/printk.h>
> #include <linux/psci.h>
> #include <linux/reboot.h>
> @@ -298,6 +299,7 @@ static int __init psci_features(u32 psci_func_id)
>
> struct psci_cpuidle_data {
> u32 *psci_states;
> + u32 rpm_state_id;
> struct device *dev;
> };
>
> @@ -385,6 +387,7 @@ static int psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
> goto free_mem;
>
> data->dev = dev;
> + data->rpm_state_id = drv->state_count - 1;
> }
>
> /* Idle states parsed correctly, store them in the per-cpu struct. */
> @@ -481,8 +484,11 @@ static int psci_suspend_finisher(unsigned long index)
> int psci_cpu_suspend_enter(unsigned long index)
> {
> int ret;
> - u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
> - u32 composite_state = state[index - 1] | psci_get_domain_state();
> + struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> + u32 *states = data->psci_states;
> + struct device *dev = data->dev;
> + bool runtime_pm = (dev && data->rpm_state_id == index);
> + u32 composite_state;
>
> /*
> * idle state index 0 corresponds to wfi, should never be called
> @@ -491,11 +497,23 @@ int psci_cpu_suspend_enter(unsigned long index)
> if (WARN_ON_ONCE(!index))
> return -EINVAL;
>
> + /*
> + * Do runtime PM if we are using the hierarchical CPU toplogy, but only
> + * when cpuidle have selected the deepest idle state for the CPU.
> + */
> + if (runtime_pm)
> + pm_runtime_put_sync_suspend(dev);
> +
> + composite_state = states[index - 1] | psci_get_domain_state();
> +
> if (!psci_power_state_loses_context(composite_state))
> ret = psci_ops.cpu_suspend(composite_state, 0);
> else
> ret = cpu_suspend(index, psci_suspend_finisher);
>
> + if (runtime_pm)
> + pm_runtime_get_sync(dev);
> +
> /* Clear the domain state to start fresh when back from idle. */
> psci_set_domain_state(0);
>
> --
> 2.17.1
>
On Tue, Jul 16 2019 at 08:47 -0600, Sudeep Holla wrote:
>On Mon, May 13, 2019 at 09:22:59PM +0200, Ulf Hansson wrote:
>> From: Lina Iyer <[email protected]>
>>
>> In the hierarchical layout, we are creating power domains around each CPU
>> and describes the idle states for them inside the power domain provider
>> node. Note that, the CPU's idle states still needs to be compatible with
>> "arm,idle-state".
>>
>> Furthermore, represent the CPU cluster as a separate master power domain,
>> powering the CPU's power domains. The cluster node, contains the idle
>> states for the cluster and each idle state needs to be compatible with the
>> "domain-idle-state".
>>
>> If the running platform is using a PSCI FW that supports the OS initiated
>> CPU suspend mode, which likely should be the case unless the PSCI FW is
>> very old, this change triggers the PSCI driver to enable it.
>>
>> Cc: Andy Gross <[email protected]>
>> Cc: David Brown <[email protected]>
>> Signed-off-by: Lina Iyer <[email protected]>
>> Co-developed-by: Ulf Hansson <[email protected]>
>> Signed-off-by: Ulf Hansson <[email protected]>
>> ---
>
>[...]
>
>> @@ -166,12 +170,57 @@
>> min-residency-us = <2000>;
>> local-timer-stop;
>> };
>> +
>> + CLUSTER_RET: cluster-retention {
>> + compatible = "domain-idle-state";
>> + arm,psci-suspend-param = <0x1000010>;
>> + entry-latency-us = <500>;
>> + exit-latency-us = <500>;
>> + min-residency-us = <2000>;
>> + };
>> +
>> + CLUSTER_PWRDN: cluster-gdhs {
>> + compatible = "domain-idle-state";
>> + arm,psci-suspend-param = <0x1000030>;
>> + entry-latency-us = <2000>;
>> + exit-latency-us = <2000>;
>> + min-residency-us = <6000>;
>> + };
>> };
>> };
>
>I was trying to understand the composition of composite state parameters
>in this series and that made me look at these DT examples.
>
This was meant to depict a hierarchical state format for OSI.
>What format does the above platform use ? I tried matching them to
>both original as well as extended format and I fail to understand.
>Assuming original format:
> State power_state PowerLevel StateType StateID
> SPC 0x40000002 0(core) 0(Retention) 0x2 (Res0 b[29]=1?)
> CLUSTER_RET 0x1000010 1(clusters) 0(Retention) 0x10
> CLUSTER_PWRDN 0x1000030 1(clusters) 0(Retention?) 0x30
>Now extended format:
> State power_state StateType StateID
> SPC 0x40000002 0(Retention) 0x40000002 (Res0 b[29]=1?)
> CLUSTER_RET 0x1000010 0(Retention) 0x1000010
The composite state would comprise of CPU state and Cluster state.
So for the last CPU entering idle -
(CLUSTER_RET | SPC)
0x41000012
> CLUSTER_PWRDN 0x1000030 0(Retention?) 0x1000030
>
(CLUSTER_PWRDN | SPC)
0x41000032
Hope this helps.
Lina
On Tue, Jul 16, 2019 at 02:36:31PM -0600, Lina Iyer wrote:
> On Tue, Jul 16 2019 at 08:47 -0600, Sudeep Holla wrote:
> > On Mon, May 13, 2019 at 09:22:59PM +0200, Ulf Hansson wrote:
> > > From: Lina Iyer <[email protected]>
> > >
> > > In the hierarchical layout, we are creating power domains around each CPU
> > > and describes the idle states for them inside the power domain provider
> > > node. Note that, the CPU's idle states still needs to be compatible with
> > > "arm,idle-state".
> > >
> > > Furthermore, represent the CPU cluster as a separate master power domain,
> > > powering the CPU's power domains. The cluster node, contains the idle
> > > states for the cluster and each idle state needs to be compatible with the
> > > "domain-idle-state".
> > >
> > > If the running platform is using a PSCI FW that supports the OS initiated
> > > CPU suspend mode, which likely should be the case unless the PSCI FW is
> > > very old, this change triggers the PSCI driver to enable it.
> > >
> > > Cc: Andy Gross <[email protected]>
> > > Cc: David Brown <[email protected]>
> > > Signed-off-by: Lina Iyer <[email protected]>
> > > Co-developed-by: Ulf Hansson <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> >
> > [...]
> >
> > > @@ -166,12 +170,57 @@
> > > min-residency-us = <2000>;
> > > local-timer-stop;
> > > };
> > > +
> > > + CLUSTER_RET: cluster-retention {
> > > + compatible = "domain-idle-state";
> > > + arm,psci-suspend-param = <0x1000010>;
> > > + entry-latency-us = <500>;
> > > + exit-latency-us = <500>;
> > > + min-residency-us = <2000>;
> > > + };
> > > +
> > > + CLUSTER_PWRDN: cluster-gdhs {
> > > + compatible = "domain-idle-state";
> > > + arm,psci-suspend-param = <0x1000030>;
> > > + entry-latency-us = <2000>;
> > > + exit-latency-us = <2000>;
> > > + min-residency-us = <6000>;
> > > + };
> > > };
> > > };
> >
> > I was trying to understand the composition of composite state parameters
> > in this series and that made me look at these DT examples.
> >
> This was meant to depict a hierarchical state format for OSI.
>
Hmm, I am more confused. We have 2 formats: original and extended.
1. Original:
31:26 Reserved. Must be zero.
25:24 PowerLevel
23:17 Reserved. Must be zero.
16 StateType
15:0 StateID
2. Extended
31 Reserved. Must be zero.
30 StateType
29:28 Reserved. Must be zero.
27:0 StateID
I was trying to match them to that. I think I commented on other patches.
I think simple OR logic breaks with extended format easily if StateIDs
are not carefully crafted which is not mandated and hence the trouble.
The same holds to original format but with PowerLevel, it slightly
relaxing things a bit but still it needs to be crafted when firmware
decides these parameters. E.g.: what is done with HiKey platform is
completely wrong.
It's helpful if we want to avoid save/restore for retention states.
CPU_PM_CPU_IDLE_ENTER_RETENTION vs CPU_PM_CPU_IDLE_ENTER
> > What format does the above platform use ? I tried matching them to
> > both original as well as extended format and I fail to understand.
> > Assuming original format:
> > State power_state PowerLevel StateType StateID
> > SPC 0x40000002 0(core) 0(Retention) 0x2 (Res0 b[29]=1?)
> > CLUSTER_RET 0x1000010 1(clusters) 0(Retention) 0x10
> > CLUSTER_PWRDN 0x1000030 1(clusters) 0(Retention?) 0x30
> > Now extended format:
> > State power_state StateType StateID
> > SPC 0x40000002 0(Retention) 0x40000002 (Res0 b[29]=1?)
> > CLUSTER_RET 0x1000010 0(Retention) 0x1000010
> The composite state would comprise of CPU state and Cluster state.
> So for the last CPU entering idle -
> (CLUSTER_RET | SPC)
> 0x41000012
> > CLUSTER_PWRDN 0x1000030 0(Retention?) 0x1000030
> >
> (CLUSTER_PWRDN | SPC)
> 0x41000032
>
> Hope this helps.
>
I just follow OR logic. I have made wrong reference to bit 29 above(I
can't read simple 32 bit number anymore :(), it should bit 30 and if
this platform follow extended state, then it makes some sense. But
I expect CLUSTER_PWRDN also to have bit 30 set. I tried to match to both
formats and failed to understand which it follows, so thought of asking.
--
Regards,
Sudeep
On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology layout is used in DT, let's allow the
> > CPU to be power managed through its PM domain, via deploying runtime PM
> > support.
> >
> > To know for which idle states runtime PM reference counting is needed,
> > let's store the index of deepest idle state for the CPU, in a per CPU
> > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > the requested idle state index and then act accordingly.
>
> I do not see why a system with two CPU CPUidle states, say CPU retention
> and CPU shutdown, should not be calling runtime PM on CPU retention
> entry.
If the CPU idle governor did select the CPU retention for the CPU, it
was probably because the target residency for the CPU shutdown state
could not be met.
In this case, there is no point in allowing any other deeper idle
states for cluster/package/system, since those have even greater
residencies, hence calling runtime PM doesn't make sense.
>
> The question then is what cluster/package/system states
> are allowed for a given CPU idle state, to understand
> what idle states can be actually entered at any hierarchy
> level given the choice made for the CPU idle state.
>
> In the case above, a CPU entering retention state should prevent
> runtime PM selecting a cluster shutdown state; most likely firmware
> would demote the request to cluster retention but still, we should
> find a way to describe these dependencies.
See above.
[...]
Kind regards
Uffe
On Tue, 16 Jul 2019 at 16:47, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:23:00PM +0200, Ulf Hansson wrote:
> > To enable the OS to manage last-man standing activities for a CPU, while an
> > idle state for a group of CPUs is selected, let's convert the Hikey
> > platform into using the hierarchical CPU topology layout.
> >
> > Cc: Wei Xu <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes:
> > - None.
> >
> > ---
> > arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 ++++++++++++++++++++---
> > 1 file changed, 76 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > index 108e2a4227f6..36ff460f428f 100644
> > --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > cpus {
>
> [...]
>
> > @@ -70,9 +128,8 @@
> > };
> >
> > CLUSTER_SLEEP: cluster-sleep {
> > - compatible = "arm,idle-state";
> > - local-timer-stop;
> > - arm,psci-suspend-param = <0x1010000>;
> > + compatible = "domain-idle-state";
> > + arm,psci-suspend-param = <0x1000000>;
> > entry-latency-us = <1000>;
> > exit-latency-us = <700>;
> > min-residency-us = <2700>;
>
> Again this must be original format and as per PSCI spec, your patch
> changes this cluster sleep state into cluster retention state which I
> think is not what you intended.
If the hierarchical topology is used, the parameter for cluster states
are ORed with the deepest idle state for the CPU.
CPU_SLEEP: 0x0010000
CLUSTER_SLEEP: 0x1000000
After the ORed operation
CLUSTER_SLEEP: 0x1010000
So, this indeed works as expected.
However, are you saying that ORing the state parameters like above has
other problems? I am reading your other replies...
Kind regards
Uffe
On Tue, 16 Jul 2019 at 17:05, Sudeep Holla <[email protected]> wrote:
>
> On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> > When the hierarchical CPU topology layout is used in DT, we need to setup
> > the corresponding PM domain data structures, as to allow a CPU and a group
> > of CPUs to be power managed accordingly. Let's enable this by deploying
> > support through the genpd interface.
> >
> > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> > also parse the domain idle states DT bindings as to make genpd responsible
> > for the state selection, when the states are compatible with
> > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> > supported, we rely solely on the state selection to be managed through the
> > regular cpuidle framework.
> >
> > If the initialization of the PM domain data structures succeeds and the OS
> > initiated mode is supported, we try to switch to it. In case it fails,
> > let's fall back into a degraded mode, rather than bailing out and returning
> > an error code.
> >
> > Due to that the OS initiated mode may become enabled, we need to adjust to
> > maintain backwards compatibility for a kernel started through a kexec call.
> > Do this by explicitly switch to Platform Coordinated mode during boot.
> >
> > Finally, the actual initialization of the PM domain data structures, is
> > done via calling the new shared function, psci_dt_init_pm_domains().
> > However, this is implemented by subsequent changes.
> >
> > Co-developed-by: Lina Iyer <[email protected]>
> > Signed-off-by: Lina Iyer <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes:
> > - Simplify code setting domain_state at power off.
> > - Use the genpd ->free_state() callback to manage freeing of states.
> > - Fixup a bogus while loop.
> >
> > ---
> > drivers/firmware/psci/Makefile | 2 +-
> > drivers/firmware/psci/psci.c | 7 +-
> > drivers/firmware/psci/psci.h | 5 +
> > drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> > 4 files changed, 280 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
> >
>
> [...]
>
> > #endif /* __PSCI_H */
> > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> > new file mode 100644
> > index 000000000000..3c6ca846caf4
> > --- /dev/null
> > +++ b/drivers/firmware/psci/psci_pm_domain.c
> > @@ -0,0 +1,268 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * PM domains for CPUs via genpd - managed by PSCI.
> > + *
> > + * Copyright (C) 2019 Linaro Ltd.
> > + * Author: Ulf Hansson <[email protected]>
> > + *
> > + */
> > +
>
> [...]
>
> > +static int psci_pd_power_off(struct generic_pm_domain *pd)
> > +{
> > + struct genpd_power_state *state = &pd->states[pd->state_idx];
> > + u32 *pd_state;
> > +
> > + /* If we have failed to enable OSI mode, then abort power off. */
> > + if (psci_has_osi_support() && !osi_mode_enabled)
> > + return -EBUSY;
> > +
> > + if (!state->data)
> > + return 0;
> > +
> > + /* When OSI mode is enabled, set the corresponding domain state. */
> > + pd_state = state->data;
> > + psci_set_domain_state(*pd_state);
>
> I trying to understand how would this scale to level 2(cluster of
> clusters or for simply system). The current code for psci_set_domain_state
> just stores the value @pd_state into per-cpu domain_state. E.g.: Now if
> the system level pd is getting called after cluster PD, it will set the
> domain state to system level PD state. It won't work with original
> format and it may work with extended format if it's carefully crafted.
> In short, the point is just over-writing domain_state is asking for
> troubles IMO.
Thanks for spotting this!
While walking upwards in the PM domain topology, I thought I was ORing
the domain states, but clearly the code isn't doing that.
In principle we need to do the below instead.
pd_state = state->data;
composite_pd_state = *pd_state | psci_get_domain_state();
psci_set_domain_state(composite_pd_state);
Kind regards
Uffe
On Tue, 16 Jul 2019 at 16:51, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Tue, Jul 16, 2019 at 10:45:49AM +0200, Ulf Hansson wrote:
>
> [...]
>
> > > > +static void psci_pd_convert_states(struct cpuidle_state *idle_state,
> > > > + u32 *psci_state, struct genpd_power_state *state)
> > > > +{
> > > > + u32 *state_data = state->data;
> > > > + u64 target_residency_us = state->residency_ns;
> > > > + u64 exit_latency_us = state->power_on_latency_ns +
> > > > + state->power_off_latency_ns;
> > > > +
> > > > + *psci_state = *state_data;
> > > > + do_div(target_residency_us, 1000);
> > > > + idle_state->target_residency = target_residency_us;
> > > > + do_div(exit_latency_us, 1000);
> > > > + idle_state->exit_latency = exit_latency_us;
> > > > + idle_state->enter = &psci_pd_enter_pc;
> > > > + idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc;
> > > > + idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
> > >
> > > This is arbitrary and not necessarily true.
> >
> > The arbitrary thing you refer to here, is that the
> > CPUIDLE_FLAG_TIMER_STOP? Or are you referring to the complete function
> > psci_pd_convert_states()?
>
> I refer to CPUIDLE_FLAG_TIMER_STOP. I think that on platform coordinated
> system we should not bother about the hierarchical representation of the
> states (I understand I asked you to make it work but it has become too
> complex, I would rather focus on making the hierarchical representation
> work for all idle states combination in OSI mode).
>
> Plus side, another level of complexity removed.
Oh, well, that's why I left it out in the first place, to start simple. :-)
Anyway, no problem, I will revert back to that option.
>
> > > I think that this patch is useful to represent my reservations about the
> > > current approach. As a matter of fact, idle state entry will always be a
> > > CPUidle decision.
> > >
> > > You only need PM domain information to understand when all CPUs
> > > in a power domain are actually idle but that's all genPD can do
> > > in this respect.
> > >
> > > I think this patchset would be much simpler if both CPUidle and
> > > genPD governor would work on *one* set of idle states, globally
> > > indexed (and that would be true for PSCI suspend parameters too).
> > >
> > > To work with a unified set of idle states between CPUidle and genPD
> > > (tossing some ideas around):
> > >
> > > - We can implement a genPD CPUidle governor that in its select method
> > > takes into account genPD information (for instance by avoiding
> > > selection of idle states that require multiple cpus to be in idle
> > > to be effectively active)
> > > - We can use genPD to enable/disable CPUidle states through runtime
> > > PM information
> >
> > I don't understand how to make this work.
> >
> > The CPUidle governor works on per CPU basis. The genpd governor works
> > on per PM domain basis, which typically can be a group of CPUs (and
> > even other devices) via subdomains, for example.
> >
> > 1.
> > In case of Linux being in *charge* of what idle state to pick for a
> > group of CPUs, that decision is best done by the genpd governor as it
> > operates on "groups" of CPUs. This is used for PSCI OSI mode. Of
> > course, there are idle states also per CPU, which potentially could be
> > managed by the genpd governor as well, but at this point I decided to
> > re-use the cpuidle governor as it's already doing the job.
> >
> > 2. In case the decision of what idle state to enter for the group is
> > done by the FW, we can rely solely on the cpuidle governor and let it
> > select states per CPU basis. This is used for PSCI PC mode.
> >
> > >
> > > There may be other ways. My point is that current code, with two (or
> > > more if the hierarchy grows) sets of idle states across two subsystems
> > > (CPUidle and genPD) is not very well defined and honestly very hard to
> > > grasp and prone to errors.
> >
> > The complexity is there, I admit that.
> >
> > I guess some deeper insight about genpd+its governor+runtime PM are
> > needed when reviewing this, of course. As an option, you may also have
> > a look at my slides [1] from OSPM (Pisa) in May this year, which via
> > flow charts try to describes things in more detail.
> >
> > In our offlist meeting, we discussed that perhaps moving some of the
> > new PSCI code introduced in this series, into a cpuidle driver
> > instead, could make things more clear. For sure, I can explore that
> > option, but before I go there, I think we should agree on it publicly.
>
> I will do it but given that the generic idle infrastructure basically
> is there for PSCI and:
>
> drivers/soc/qcom/spm.c
>
> if we create a PSCI CPUidle driver we can write one for qcom-spm and
> remove the generic idle infrastructure, there would not be much
> point in keeping it in the kernel; at least on ARM64 not using
> PSCI for CPUidle is not even an option.
To make it clear, I definitely like this idea!
I am not really fond of current cpuidle backend infrastructure for
ARM/ARM64, it's really hard to follow all the things that happens in
those corresponding callbacks.
>
> > In principle what it means is to invent a special cpuidle driver for
> > PSCI, so we would need access to some of the PSCI internal functions,
> > for example.
>
> Yes.
>
> > One thing though, the initialization of the PSCI PM domain topology is
> > a separate step, managed via the new initcall, psci_dt_topology_init()
> > (introduced in patch 11). That part still seems to be belong to the
> > PSCI code, don't you think?
>
> Yes but at least we can call it from a sensible place (well, sensible,
> most likely from an initcall given how idle drivers are initialized).
Okay.
>
> > > > + strncpy(idle_state->name, to_of_node(state->fwnode)->name,
> > > > + CPUIDLE_NAME_LEN - 1);
> > > > + strncpy(idle_state->desc, to_of_node(state->fwnode)->name,
> > > > + CPUIDLE_NAME_LEN - 1);
> > > > +}
> > > > +
> > > > +static bool psci_pd_is_provider(struct device_node *np)
> > > > +{
> > > > + struct psci_pd_provider *pd_prov, *it;
> > > > +
> > > > + list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) {
> > > > + if (pd_prov->node == np)
> > > > + return true;
> > > > + }
> > > > +
> > > > + return false;
> > > > +}
> > > > +
> > > > static int psci_pd_init(struct device_node *np)
> > > > {
> > > > struct generic_pm_domain *pd;
> > > > @@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np)
> > > > pr_err("failed to create CPU PM domains ret=%d\n", ret);
> > > > return ret;
> > > > }
> > > > +
> > > > +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv,
> > > > + struct device_node *cpu_node, u32 *psci_states)
> > > > +{
> > > > + struct genpd_power_state *pd_states;
> > > > + struct of_phandle_args args;
> > > > + int ret, pd_state_count, i, state_idx, psci_idx;
> > > > + u32 cpu_psci_state = psci_states[drv->state_count - 2];
> > >
> > > This (-2) is very dodgy and I doubt it would work on hierarchies going
> > > above "cluster" level.
> > >
> > > As I say above, I think we should work towards a single array of
> > > idle states to be selected by a CPUidle governor using genPD
> > > runtime information to bias the results according to the number
> > > of CPUs in a genPD that entered/exit idle.
> > >
> > > To be more precise, all idles states should be "domain-idle-state"
> > > compatible, even the CPU ones, the distinction between what CPUidle
> > > and genPD manage is a bit stretched IMO in this patchset.
> > >
> > > We will have a chance to talk about this but I thought I would
> > > comment publically if anyone else is willing to chime in, this
> > > is not a PSCI problem at all, it is a CPUidle/genPD coexistence
> > > design problem which is much broader.
> >
> > To move this forward, I think we need to move from vague ideas to
> > clear and distinct plans. Whatever that means. :-)
>
> See above.
>
> > I understand you are concerned about the level of changes introduced
> > to the PSCI code. As I stated somewhere in earlier replies, I picked
> > that approach as I thought it was better to implement things in a PSCI
> > specific manner to start with, then we could move things around, when
> > we realize that it make sense.
>
> I am also concerned about how the idle states are managed in
> this patchset and I am pretty certain it will break when we
> move away from a simple hierarchy with one CPU state and one
> cluster state, we will comment on the specifics.
The intent with the series is that this should be supported, no matter
of the number of states or the level of hierarchy.
Well, there are some limitations/bugs in genpd (and the genpd
governor) to support a greater level than 2, but that is on my TODO
list to fix. Again, some of my slides from OSPM Pisa explains this.
More importantly, the current deployment to PSCI should remain
unchanged after this series (unless there is a bug somewhere, as also
was pointed out by Sudeep in another reply).
>
> Moving PSCI code into a CPUidle driver will cater for the rest.
Great news, we seems to have a plan!
>
> > Anyway, as a suggestion to address your concern, how about this:
> >
> > 1. Move some things out to a PSCI cpuidle driver. We need to decide
> > more exactly on what to move and find the right level for the
> > interfaces.
>
> I will do it and post patches asap.
Okay, so I will wait for you to converting the cpuidle-arm driver into
a cpuidle-psci driver (and all the changes that comes with it) and
then base my re-base my series on top.
Then, would you mind sharing (even in an early phase) a
branch/git-tree so I can start re-basing my series on top?
>
> > 2. Don't attach the CPU to the PM domain topology in case the PSCI PC
> > mode is used. I think this makes it easier, at least as a first step,
> > to understand when runtime PM needs to be used/enabled.
>
> In the PSCI CPUidle driver we can have two distinct struct
> cpuidle_state->enter functions for PC and OSI, no overhead
> for PC, runtime PM for OSI, decoupling done.
Good idea!
>
> We can choose one or the other depending on whether:
>
> OSI iff:
> - OSI is available
> - hierarchical idle states are present in DT
>
> otherwise PC.
>
> That's what this patch does but we will do it in a unified file.
Sure, it makes sense.
>
> > 3. Would it help if I volunteer to help you guys as a maintainer for
> > PSCI. At least for the part of the new code that becomes introduced?
>
> We will do as described above if that makes sense.
Yep, I am okay with your suggestions, assuming I have understood them correctly.
BTW, have you considered to host a git tree for PSCI so we can have
changes pre-integrated and tested in Stephen Rothwell's linux-next
tree?
Kind regards
Uffe
On Thu, Jul 18, 2019 at 12:48:14PM +0200, Ulf Hansson wrote:
> On Tue, 16 Jul 2019 at 16:47, Sudeep Holla <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:23:00PM +0200, Ulf Hansson wrote:
> > > To enable the OS to manage last-man standing activities for a CPU, while an
> > > idle state for a group of CPUs is selected, let's convert the Hikey
> > > platform into using the hierarchical CPU topology layout.
> > >
> > > Cc: Wei Xu <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes:
> > > - None.
> > >
> > > ---
> > > arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 87 ++++++++++++++++++++---
> > > 1 file changed, 76 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > > index 108e2a4227f6..36ff460f428f 100644
> > > --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > > +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
> > > cpus {
> >
> > [...]
> >
> > > @@ -70,9 +128,8 @@
> > > };
> > >
> > > CLUSTER_SLEEP: cluster-sleep {
> > > - compatible = "arm,idle-state";
> > > - local-timer-stop;
> > > - arm,psci-suspend-param = <0x1010000>;
> > > + compatible = "domain-idle-state";
> > > + arm,psci-suspend-param = <0x1000000>;
> > > entry-latency-us = <1000>;
> > > exit-latency-us = <700>;
> > > min-residency-us = <2700>;
> >
> > Again this must be original format and as per PSCI spec, your patch
> > changes this cluster sleep state into cluster retention state which I
> > think is not what you intended.
>
> If the hierarchical topology is used, the parameter for cluster states
> are ORed with the deepest idle state for the CPU.
>
> CPU_SLEEP: 0x0010000
> CLUSTER_SLEEP: 0x1000000
>
> After the ORed operation
> CLUSTER_SLEEP: 0x1010000
>
> So, this indeed works as expected.
>
Yes, it works. But we are not XOR-ing so what's wrong in keeping the
StateType as required and be compliant to specification. Why do we need
to make the state param on it's own non-complaint.
What's wrong in retaining CLUSTER_SLEEP as 0x1010000 so that it reflects
the state level and type correctly on it's own ?
> However, are you saying that ORing the state parameters like above has
> other problems? I am reading your other replies...
>
Yes OR-ing may have other problems but the point I made here was more on
PSCI spec compliance for each suspend parameter values independently.
--
Regards,
Sudeep
On Thu, Jul 18, 2019 at 01:04:03PM +0200, Ulf Hansson wrote:
> On Tue, 16 Jul 2019 at 17:05, Sudeep Holla <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology layout is used in DT, we need to setup
> > > the corresponding PM domain data structures, as to allow a CPU and a group
> > > of CPUs to be power managed accordingly. Let's enable this by deploying
> > > support through the genpd interface.
> > >
> > > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
> > > also parse the domain idle states DT bindings as to make genpd responsible
> > > for the state selection, when the states are compatible with
> > > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
> > > supported, we rely solely on the state selection to be managed through the
> > > regular cpuidle framework.
> > >
> > > If the initialization of the PM domain data structures succeeds and the OS
> > > initiated mode is supported, we try to switch to it. In case it fails,
> > > let's fall back into a degraded mode, rather than bailing out and returning
> > > an error code.
> > >
> > > Due to that the OS initiated mode may become enabled, we need to adjust to
> > > maintain backwards compatibility for a kernel started through a kexec call.
> > > Do this by explicitly switch to Platform Coordinated mode during boot.
> > >
> > > Finally, the actual initialization of the PM domain data structures, is
> > > done via calling the new shared function, psci_dt_init_pm_domains().
> > > However, this is implemented by subsequent changes.
> > >
> > > Co-developed-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Lina Iyer <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes:
> > > - Simplify code setting domain_state at power off.
> > > - Use the genpd ->free_state() callback to manage freeing of states.
> > > - Fixup a bogus while loop.
> > >
> > > ---
> > > drivers/firmware/psci/Makefile | 2 +-
> > > drivers/firmware/psci/psci.c | 7 +-
> > > drivers/firmware/psci/psci.h | 5 +
> > > drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
> > > 4 files changed, 280 insertions(+), 2 deletions(-)
> > > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
> > >
> >
> > [...]
> >
> > > #endif /* __PSCI_H */
> > > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
> > > new file mode 100644
> > > index 000000000000..3c6ca846caf4
> > > --- /dev/null
> > > +++ b/drivers/firmware/psci/psci_pm_domain.c
> > > @@ -0,0 +1,268 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * PM domains for CPUs via genpd - managed by PSCI.
> > > + *
> > > + * Copyright (C) 2019 Linaro Ltd.
> > > + * Author: Ulf Hansson <[email protected]>
> > > + *
> > > + */
> > > +
> >
> > [...]
> >
> > > +static int psci_pd_power_off(struct generic_pm_domain *pd)
> > > +{
> > > + struct genpd_power_state *state = &pd->states[pd->state_idx];
> > > + u32 *pd_state;
> > > +
> > > + /* If we have failed to enable OSI mode, then abort power off. */
> > > + if (psci_has_osi_support() && !osi_mode_enabled)
> > > + return -EBUSY;
> > > +
> > > + if (!state->data)
> > > + return 0;
> > > +
> > > + /* When OSI mode is enabled, set the corresponding domain state. */
> > > + pd_state = state->data;
> > > + psci_set_domain_state(*pd_state);
> >
> > I trying to understand how would this scale to level 2(cluster of
> > clusters or for simply system). The current code for psci_set_domain_state
> > just stores the value @pd_state into per-cpu domain_state. E.g.: Now if
> > the system level pd is getting called after cluster PD, it will set the
> > domain state to system level PD state. It won't work with original
> > format and it may work with extended format if it's carefully crafted.
> > In short, the point is just over-writing domain_state is asking for
> > troubles IMO.
>
> Thanks for spotting this!
>
> While walking upwards in the PM domain topology, I thought I was ORing
> the domain states, but clearly the code isn't doing that.
>
> In principle we need to do the below instead.
>
> pd_state = state->data;
> composite_pd_state = *pd_state | psci_get_domain_state();
> psci_set_domain_state(composite_pd_state);
>
Yes 2 different issues here:
1. The direct assignment overwriting the value is problem which you agree.
2. The OR logic on it's own is bit not so clear from the specification.
Since firmware authors need to be aware of this to make all these
work. So it's not implicit, either we set this requirement in form
of binding. We were also thinking of stating composite state in the
DT, still just a thought, need to come up with examples/illustrations.
--
Regards,
Sudeep
On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> <[email protected]> wrote:
> >
> > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > support.
> > >
> > > To know for which idle states runtime PM reference counting is needed,
> > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > the requested idle state index and then act accordingly.
> >
> > I do not see why a system with two CPU CPUidle states, say CPU retention
> > and CPU shutdown, should not be calling runtime PM on CPU retention
> > entry.
>
> If the CPU idle governor did select the CPU retention for the CPU, it
> was probably because the target residency for the CPU shutdown state
> could not be met.
The kernel does not know what those cpu states represent, so, this is an
assumption you are making and it must be made clear that this code works
as long as your assumption is valid.
If eg a "cluster" retention state has lower target_residency than
the deepest CPU idle state this assumption is wrong.
And CPUidle and genPD governor decisions are not synced anyway so,
again, this is an assumption, not a certainty.
> In this case, there is no point in allowing any other deeper idle
> states for cluster/package/system, since those have even greater
> residencies, hence calling runtime PM doesn't make sense.
On the systems you are testing on.
Lorenzo
> > The question then is what cluster/package/system states
> > are allowed for a given CPU idle state, to understand
> > what idle states can be actually entered at any hierarchy
> > level given the choice made for the CPU idle state.
> >
> > In the case above, a CPU entering retention state should prevent
> > runtime PM selecting a cluster shutdown state; most likely firmware
> > would demote the request to cluster retention but still, we should
> > find a way to describe these dependencies.
>
> See above.
>
> [...]
>
> Kind regards
> Uffe
On Thu, Jul 18, 2019 at 01:43:44PM +0200, Ulf Hansson wrote:
[...]
> > > Anyway, as a suggestion to address your concern, how about this:
> > >
> > > 1. Move some things out to a PSCI cpuidle driver. We need to decide
> > > more exactly on what to move and find the right level for the
> > > interfaces.
> >
> > I will do it and post patches asap.
>
> Okay, so I will wait for you to converting the cpuidle-arm driver into
> a cpuidle-psci driver (and all the changes that comes with it) and
> then base my re-base my series on top.
>
> Then, would you mind sharing (even in an early phase) a
> branch/git-tree so I can start re-basing my series on top?
Sure, I should be able to post at -rc1 and will publish a branch
here [1].
> > > 2. Don't attach the CPU to the PM domain topology in case the PSCI PC
> > > mode is used. I think this makes it easier, at least as a first step,
> > > to understand when runtime PM needs to be used/enabled.
> >
> > In the PSCI CPUidle driver we can have two distinct struct
> > cpuidle_state->enter functions for PC and OSI, no overhead
> > for PC, runtime PM for OSI, decoupling done.
>
> Good idea!
>
> >
> > We can choose one or the other depending on whether:
> >
> > OSI iff:
> > - OSI is available
> > - hierarchical idle states are present in DT
> >
> > otherwise PC.
> >
> > That's what this patch does but we will do it in a unified file.
>
> Sure, it makes sense.
>
> >
> > > 3. Would it help if I volunteer to help you guys as a maintainer for
> > > PSCI. At least for the part of the new code that becomes introduced?
> >
> > We will do as described above if that makes sense.
>
> Yep, I am okay with your suggestions, assuming I have understood them correctly.
>
> BTW, have you considered to host a git tree for PSCI so we can have
> changes pre-integrated and tested in Stephen Rothwell's linux-next
> tree?
I will ask Stephen to pull when needed a branch in the tree below[1]
[1] https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git/
On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > <[email protected]> wrote:
> > >
> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > > support.
> > > >
> > > > To know for which idle states runtime PM reference counting is needed,
> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > > the requested idle state index and then act accordingly.
> > >
> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > > entry.
> >
> > If the CPU idle governor did select the CPU retention for the CPU, it
> > was probably because the target residency for the CPU shutdown state
> > could not be met.
>
> The kernel does not know what those cpu states represent, so, this is an
> assumption you are making and it must be made clear that this code works
> as long as your assumption is valid.
>
> If eg a "cluster" retention state has lower target_residency than
> the deepest CPU idle state this assumption is wrong.
Good point, you are right. I try to find a place to document this assumption.
>
> And CPUidle and genPD governor decisions are not synced anyway so,
> again, this is an assumption, not a certainty.
>
> > In this case, there is no point in allowing any other deeper idle
> > states for cluster/package/system, since those have even greater
> > residencies, hence calling runtime PM doesn't make sense.
>
> On the systems you are testing on.
So what you are saying typically means, that if all CPUs in the same
cluster have entered the CPU retention state, on some system the
cluster may also put into a cluster retention state (assuming the
target residency is met)?
Do you know of any systems that has these characteristics?
[...]
Kind regards
Uffe
On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
>On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
><[email protected]> wrote:
>>
>> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
>> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
>> > <[email protected]> wrote:
>> > >
>> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
>> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
>> > > > CPU to be power managed through its PM domain, via deploying runtime PM
>> > > > support.
>> > > >
>> > > > To know for which idle states runtime PM reference counting is needed,
>> > > > let's store the index of deepest idle state for the CPU, in a per CPU
>> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
>> > > > the requested idle state index and then act accordingly.
>> > >
>> > > I do not see why a system with two CPU CPUidle states, say CPU retention
>> > > and CPU shutdown, should not be calling runtime PM on CPU retention
>> > > entry.
>> >
>> > If the CPU idle governor did select the CPU retention for the CPU, it
>> > was probably because the target residency for the CPU shutdown state
>> > could not be met.
>>
>> The kernel does not know what those cpu states represent, so, this is an
>> assumption you are making and it must be made clear that this code works
>> as long as your assumption is valid.
>>
>> If eg a "cluster" retention state has lower target_residency than
>> the deepest CPU idle state this assumption is wrong.
>
>Good point, you are right. I try to find a place to document this assumption.
>
>>
>> And CPUidle and genPD governor decisions are not synced anyway so,
>> again, this is an assumption, not a certainty.
>>
>> > In this case, there is no point in allowing any other deeper idle
>> > states for cluster/package/system, since those have even greater
>> > residencies, hence calling runtime PM doesn't make sense.
>>
>> On the systems you are testing on.
>
>So what you are saying typically means, that if all CPUs in the same
>cluster have entered the CPU retention state, on some system the
>cluster may also put into a cluster retention state (assuming the
>target residency is met)?
>
>Do you know of any systems that has these characteristics?
>
Many QCOM SoCs can do that. But with the hardware improving, the
power-performance benefits skew the results in favor of powering off
the cluster than keeping the CPU and cluster in retention.
Kevin H and I thought of this problem earlier on. But that is a second
level problem to solve and definitely to be thought of after we have the
support for the deepest states in the kernel. We left that out for a
later date. The idea would have been to setup the allowable state(s) in
the DT for CPU and cluster state definitions and have the genpd take
that into consideration when deciding the idle state for the domain.
Thanks,
Lina
On Thu, Jul 18 2019 at 07:19 -0600, Sudeep Holla wrote:
>On Thu, Jul 18, 2019 at 01:04:03PM +0200, Ulf Hansson wrote:
>> On Tue, 16 Jul 2019 at 17:05, Sudeep Holla <[email protected]> wrote:
>> >
>> > On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
>> > > When the hierarchical CPU topology layout is used in DT, we need to setup
>> > > the corresponding PM domain data structures, as to allow a CPU and a group
>> > > of CPUs to be power managed accordingly. Let's enable this by deploying
>> > > support through the genpd interface.
>> > >
>> > > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
>> > > also parse the domain idle states DT bindings as to make genpd responsible
>> > > for the state selection, when the states are compatible with
>> > > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
>> > > supported, we rely solely on the state selection to be managed through the
>> > > regular cpuidle framework.
>> > >
>> > > If the initialization of the PM domain data structures succeeds and the OS
>> > > initiated mode is supported, we try to switch to it. In case it fails,
>> > > let's fall back into a degraded mode, rather than bailing out and returning
>> > > an error code.
>> > >
>> > > Due to that the OS initiated mode may become enabled, we need to adjust to
>> > > maintain backwards compatibility for a kernel started through a kexec call.
>> > > Do this by explicitly switch to Platform Coordinated mode during boot.
>> > >
>> > > Finally, the actual initialization of the PM domain data structures, is
>> > > done via calling the new shared function, psci_dt_init_pm_domains().
>> > > However, this is implemented by subsequent changes.
>> > >
>> > > Co-developed-by: Lina Iyer <[email protected]>
>> > > Signed-off-by: Lina Iyer <[email protected]>
>> > > Signed-off-by: Ulf Hansson <[email protected]>
>> > > ---
>> > >
>> > > Changes:
>> > > - Simplify code setting domain_state at power off.
>> > > - Use the genpd ->free_state() callback to manage freeing of states.
>> > > - Fixup a bogus while loop.
>> > >
>> > > ---
>> > > drivers/firmware/psci/Makefile | 2 +-
>> > > drivers/firmware/psci/psci.c | 7 +-
>> > > drivers/firmware/psci/psci.h | 5 +
>> > > drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
>> > > 4 files changed, 280 insertions(+), 2 deletions(-)
>> > > create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>> > >
>> >
>> > [...]
>> >
>> > > #endif /* __PSCI_H */
>> > > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
>> > > new file mode 100644
>> > > index 000000000000..3c6ca846caf4
>> > > --- /dev/null
>> > > +++ b/drivers/firmware/psci/psci_pm_domain.c
>> > > @@ -0,0 +1,268 @@
>> > > +// SPDX-License-Identifier: GPL-2.0
>> > > +/*
>> > > + * PM domains for CPUs via genpd - managed by PSCI.
>> > > + *
>> > > + * Copyright (C) 2019 Linaro Ltd.
>> > > + * Author: Ulf Hansson <[email protected]>
>> > > + *
>> > > + */
>> > > +
>> >
>> > [...]
>> >
>> > > +static int psci_pd_power_off(struct generic_pm_domain *pd)
>> > > +{
>> > > + struct genpd_power_state *state = &pd->states[pd->state_idx];
>> > > + u32 *pd_state;
>> > > +
>> > > + /* If we have failed to enable OSI mode, then abort power off. */
>> > > + if (psci_has_osi_support() && !osi_mode_enabled)
>> > > + return -EBUSY;
>> > > +
>> > > + if (!state->data)
>> > > + return 0;
>> > > +
>> > > + /* When OSI mode is enabled, set the corresponding domain state. */
>> > > + pd_state = state->data;
>> > > + psci_set_domain_state(*pd_state);
>> >
>> > I trying to understand how would this scale to level 2(cluster of
>> > clusters or for simply system). The current code for psci_set_domain_state
>> > just stores the value @pd_state into per-cpu domain_state. E.g.: Now if
>> > the system level pd is getting called after cluster PD, it will set the
>> > domain state to system level PD state. It won't work with original
>> > format and it may work with extended format if it's carefully crafted.
>> > In short, the point is just over-writing domain_state is asking for
>> > troubles IMO.
>>
>> Thanks for spotting this!
>>
>> While walking upwards in the PM domain topology, I thought I was ORing
>> the domain states, but clearly the code isn't doing that.
>>
>> In principle we need to do the below instead.
>>
>> pd_state = state->data;
>> composite_pd_state = *pd_state | psci_get_domain_state();
>> psci_set_domain_state(composite_pd_state);
>>
>
>Yes 2 different issues here:
>1. The direct assignment overwriting the value is problem which you agree.
>2. The OR logic on it's own is bit not so clear from the specification.
> Since firmware authors need to be aware of this to make all these
> work. So it's not implicit, either we set this requirement in form
> of binding. We were also thinking of stating composite state in the
> DT, still just a thought, need to come up with examples/illustrations.
>
It is generally very obvious to firmware authors to map hardware
definitions to specific bits in the state param. If a cluster component
has more than on/off state, more bits are assigned to the define the
idle states of the component.
Addition is also an option, but there are enough bits compared to the
hardware components that we have in each state, that it hasn't been a
problem.
--Lina
On Thu, 18 Jul 2019 at 19:41, Lina Iyer <[email protected]> wrote:
>
> On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> ><[email protected]> wrote:
> >>
> >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> >> > <[email protected]> wrote:
> >> > >
> >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> >> > > > support.
> >> > > >
> >> > > > To know for which idle states runtime PM reference counting is needed,
> >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> >> > > > the requested idle state index and then act accordingly.
> >> > >
> >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> >> > > entry.
> >> >
> >> > If the CPU idle governor did select the CPU retention for the CPU, it
> >> > was probably because the target residency for the CPU shutdown state
> >> > could not be met.
> >>
> >> The kernel does not know what those cpu states represent, so, this is an
> >> assumption you are making and it must be made clear that this code works
> >> as long as your assumption is valid.
> >>
> >> If eg a "cluster" retention state has lower target_residency than
> >> the deepest CPU idle state this assumption is wrong.
> >
> >Good point, you are right. I try to find a place to document this assumption.
> >
> >>
> >> And CPUidle and genPD governor decisions are not synced anyway so,
> >> again, this is an assumption, not a certainty.
> >>
> >> > In this case, there is no point in allowing any other deeper idle
> >> > states for cluster/package/system, since those have even greater
> >> > residencies, hence calling runtime PM doesn't make sense.
> >>
> >> On the systems you are testing on.
> >
> >So what you are saying typically means, that if all CPUs in the same
> >cluster have entered the CPU retention state, on some system the
> >cluster may also put into a cluster retention state (assuming the
> >target residency is met)?
> >
> >Do you know of any systems that has these characteristics?
> >
> Many QCOM SoCs can do that. But with the hardware improving, the
> power-performance benefits skew the results in favor of powering off
> the cluster than keeping the CPU and cluster in retention.
>
> Kevin H and I thought of this problem earlier on. But that is a second
> level problem to solve and definitely to be thought of after we have the
> support for the deepest states in the kernel. We left that out for a
> later date. The idea would have been to setup the allowable state(s) in
> the DT for CPU and cluster state definitions and have the genpd take
> that into consideration when deciding the idle state for the domain.
Thanks for confirming.
This more or less means we need to improve the hierarchical support in
genpd to support more levels, such that it makes sense to have a genpd
governor assigned at more than one level. This doesn't work well
today. As I also have stated, this is on my todo list for genpd.
However, I also agree with your standpoint, that let's start simple to
enable the deepest state as a start with, then we can improve things
on top.
Kind regards
Uffe
On Thu, Jul 18, 2019 at 11:57:46AM -0600, Lina Iyer wrote:
> On Thu, Jul 18 2019 at 07:19 -0600, Sudeep Holla wrote:
[...]
> >
> > Yes 2 different issues here:
> > 1. The direct assignment overwriting the value is problem which you agree.
> > 2. The OR logic on it's own is bit not so clear from the specification.
> > Since firmware authors need to be aware of this to make all these
> > work. So it's not implicit, either we set this requirement in form
> > of binding. We were also thinking of stating composite state in the
> > DT, still just a thought, need to come up with examples/illustrations.
> >
> It is generally very obvious to firmware authors to map hardware
> definitions to specific bits in the state param. If a cluster component
> has more than on/off state, more bits are assigned to the define the
> idle states of the component.
While I completely agree that generally the firmware authors tend to
assign a bit(s) to indicate a state, it not so evident from the PSCI
specification. My worry is someone shouldn't come up with sequential
numbering and expect this to work.
I am fine with OR, but we need to document it somewhere so that we
can point people so that the driver in the kernel is not expected
to work with any other schema of numbering StateID that violates
the assumption.
> Addition is also an option, but there are enough bits compared to the
> hardware components that we have in each state, that it hasn't been a
> problem.
>
Yes, but with extended format, the StateType move to bit 30, and if
we represent each individual state to indicate that bit correctly(I
except everyone to so that and it shouldn't cause issue if we OR the
parameters to get composite states), addition may cause issues.
--
Regards,
Sudeep
On Thu, Jul 18, 2019 at 11:49:11PM +0200, Ulf Hansson wrote:
> On Thu, 18 Jul 2019 at 19:41, Lina Iyer <[email protected]> wrote:
> >
> > On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> > >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> > ><[email protected]> wrote:
> > >>
> > >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > >> > <[email protected]> wrote:
> > >> > >
> > >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > >> > > > support.
> > >> > > >
> > >> > > > To know for which idle states runtime PM reference counting is needed,
> > >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > >> > > > the requested idle state index and then act accordingly.
> > >> > >
> > >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > >> > > entry.
> > >> >
> > >> > If the CPU idle governor did select the CPU retention for the CPU, it
> > >> > was probably because the target residency for the CPU shutdown state
> > >> > could not be met.
> > >>
> > >> The kernel does not know what those cpu states represent, so, this is an
> > >> assumption you are making and it must be made clear that this code works
> > >> as long as your assumption is valid.
> > >>
> > >> If eg a "cluster" retention state has lower target_residency than
> > >> the deepest CPU idle state this assumption is wrong.
> > >
> > >Good point, you are right. I try to find a place to document this assumption.
> > >
> > >>
> > >> And CPUidle and genPD governor decisions are not synced anyway so,
> > >> again, this is an assumption, not a certainty.
> > >>
> > >> > In this case, there is no point in allowing any other deeper idle
> > >> > states for cluster/package/system, since those have even greater
> > >> > residencies, hence calling runtime PM doesn't make sense.
> > >>
> > >> On the systems you are testing on.
> > >
> > >So what you are saying typically means, that if all CPUs in the same
> > >cluster have entered the CPU retention state, on some system the
> > >cluster may also put into a cluster retention state (assuming the
> > >target residency is met)?
> > >
> > >Do you know of any systems that has these characteristics?
> > >
> > Many QCOM SoCs can do that. But with the hardware improving, the
> > power-performance benefits skew the results in favor of powering off
> > the cluster than keeping the CPU and cluster in retention.
> >
> > Kevin H and I thought of this problem earlier on. But that is a second
> > level problem to solve and definitely to be thought of after we have the
> > support for the deepest states in the kernel. We left that out for a
> > later date. The idea would have been to setup the allowable state(s) in
> > the DT for CPU and cluster state definitions and have the genpd take
> > that into consideration when deciding the idle state for the domain.
>
> Thanks for confirming.
>
> This more or less means we need to improve the hierarchical support in
> genpd to support more levels, such that it makes sense to have a genpd
> governor assigned at more than one level. This doesn't work well
> today. As I also have stated, this is on my todo list for genpd.
>
> However, I also agree with your standpoint, that let's start simple to
> enable the deepest state as a start with, then we can improve things
> on top.
How to solve this in the kernel I don't know but please do make sure
that the DT bindings allow you to describe what's needed, once they are
merged you won't be able to change them and I won't bodge the code to
make things fit, so if anything let's focus on getting them right as a
matter of priority to get this done please.
Thanks,
Lorenzo
On Mon, May 13, 2019 at 09:22:43PM +0200, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Update DT bindings to represent hierarchical CPU and CPU PM domain idle
> states for PSCI. Also update the PSCI examples to clearly show how
> flattened and hierarchical idle states can be represented in DT.
>
> Signed-off-by: Lina Iyer <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
> Reviewed-by: Sudeep Holla <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - None.
>
> ---
> .../devicetree/bindings/arm/psci.txt | 166 ++++++++++++++++++
> 1 file changed, 166 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/psci.txt b/Documentation/devicetree/bindings/arm/psci.txt
> index a2c4f1d52492..e6d3553c8df8 100644
> --- a/Documentation/devicetree/bindings/arm/psci.txt
> +++ b/Documentation/devicetree/bindings/arm/psci.txt
> @@ -105,7 +105,173 @@ Case 3: PSCI v0.2 and PSCI v0.1.
> ...
> };
>
> +ARM systems can have multiple cores sometimes in hierarchical arrangement.
> +This often, but not always, maps directly to the processor power topology of
> +the system. Individual nodes in a topology have their own specific power states
> +and can be better represented in DT hierarchically.
> +
> +For these cases, the definitions of the idle states for the CPUs and the CPU
> +topology, must conform to the domain idle state specification [3]. The domain
> +idle states themselves, must be compatible with the defined 'domain-idle-state'
> +binding [1], and also need to specify the arm,psci-suspend-param property for
> +each idle state.
> +
> +DT allows representing CPUs and CPU idle states in two different ways -
> +
> +The flattened model as given in Example 1, lists CPU's idle states followed by
> +the domain idle state that the CPUs may choose. Note that the idle states are
> +all compatible with "arm,idle-state". Additionally, for the domain idle state
> +the "arm,psci-suspend-param" represents a superset of the CPU's idle state.
> +
> +Example 2 represents the hierarchical model of CPUs and domain idle states.
> +CPUs define their domain provider in their psci DT node. The domain controls
> +the power to the CPU and possibly other h/w blocks that would enter an idle
> +state along with the CPU. The CPU's idle states may therefore be considered as
> +the domain's idle states and have the compatible "arm,idle-state". Such domains
> +may also be embedded within another domain that may represent common h/w blocks
> +between these CPUs. The idle states of the CPU topology shall be represented as
> +the domain's idle states. Note that for the domain idle state, the
> +"arm,psci-suspend-param" represents idle states hierarchically.
> +
> +In PSCI firmware v1.0, the OS-Initiated mode is introduced. However, the
> +flattened vs hierarchical DT representation is orthogonal to the OS-Initiated
> +vs the platform-coordinated PSCI CPU suspend modes, thus should be considered
> +independent of each other.
> +
> +The hierarchical representation helps and makes it easy to implement OSI mode
> +and OS implementations may choose to mandate it. For the default platform-
> +coordinated mode, both representations are viable options.
> +
> +Example 1: Flattened representation of CPU and domain idle states
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + CPU0: cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a53", "arm,armv8";
> + reg = <0x0>;
> + enable-method = "psci";
> + cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
> + <&CLUSTER_PWRDN>;
> + };
> +
> + CPU1: cpu@1 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a57", "arm,armv8";
> + reg = <0x100>;
> + enable-method = "psci";
> + cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
> + <&CLUSTER_PWRDN>;
> + };
> +
> + idle-states {
> + CPU_PWRDN: cpu-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x0000001>;
This value is wrong, StateType must be 1 for CPU power down states.
> + entry-latency-us = <10>;
> + exit-latency-us = <10>;
> + min-residency-us = <100>;
> + };
> +
> + CLUSTER_RET: cluster-retention {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x1000011>;
It must be made crystal clear that this is the *full* power_state
that is passed to the CPU_SUSPEND call. It is already specified
in the bindings.
As Sudeep pointed out already, OR'ing the power_state parameters values
across power domains is wrong, in that there is nothing in the PSCI
specifications that enforces a power_state parameter scheme whereby
different "levels" are assigned different bitfields, in particular with
the extended format a power_state parameter for eg "system" level is not
necessarily OR'ed value of cluster|cpu|system values.
So, to sum it up, arm,psci-suspend-param must be the full power_state
parameter to be passed to CPU_SUSPEND and must be specified in full for
every CPU and power domain idle state.
Thanks,
Lorenzo
> + entry-latency-us = <500>;
> + exit-latency-us = <500>;
> + min-residency-us = <2000>;
> + };
> +
> + CLUSTER_PWRDN: cluster-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x1000031>;
> + entry-latency-us = <2000>;
> + exit-latency-us = <2000>;
> + min-residency-us = <6000>;
> + };
> + };
> +
> + psci {
> + compatible = "arm,psci-0.2";
> + method = "smc";
> + };
> +
> +Example 2: Hierarchical representation of CPU and domain idle states
> +
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + CPU0: cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a53", "arm,armv8";
> + reg = <0x0>;
> + enable-method = "psci";
> + power-domains = <&CPU_PD0>;
> + power-domain-names = "psci";
> + };
> +
> + CPU1: cpu@1 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a57", "arm,armv8";
> + reg = <0x100>;
> + enable-method = "psci";
> + power-domains = <&CPU_PD1>;
> + power-domain-names = "psci";
> + };
> +
> + idle-states {
> + CPU_PWRDN: cpu-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x0000001>;
> + entry-latency-us = <10>;
> + exit-latency-us = <10>;
> + min-residency-us = <100>;
> + };
> +
> + CLUSTER_RET: cluster-retention {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000010>;
> + entry-latency-us = <500>;
> + exit-latency-us = <500>;
> + min-residency-us = <2000>;
> + };
> +
> + CLUSTER_PWRDN: cluster-power-down {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000030>;
> + entry-latency-us = <2000>;
> + exit-latency-us = <2000>;
> + min-residency-us = <6000>;
> + };
> + };
> + };
> +
> + psci {
> + compatible = "arm,psci-1.0";
> + method = "smc";
> +
> + CPU_PD0: cpu-pd0 {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CPU_PWRDN>;
> + power-domains = <&CLUSTER_PD>;
> + };
> +
> + CPU_PD1: cpu-pd1 {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CPU_PWRDN>;
> + power-domains = <&CLUSTER_PD>;
> + };
> +
> + CLUSTER_PD: cluster-pd {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
> + };
> + };
> +
> [1] Kernel documentation - ARM idle states bindings
> Documentation/devicetree/bindings/arm/idle-states.txt
> [2] Power State Coordination Interface (PSCI) specification
> http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf
> +[3]. PM Domains description
> + Documentation/devicetree/bindings/power/power_domain.txt
> --
> 2.17.1
>