Subject: Re: [PATCH 14/18] drivers: firmware: psci: Manage runtime PM in the idle path for CPUs

On Thu, Jul 18, 2019 at 11:49:11PM +0200, Ulf Hansson wrote:
> On Thu, 18 Jul 2019 at 19:41, Lina Iyer <[email protected]> wrote:
> >
> > On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> > >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> > ><[email protected]> wrote:
> > >>
> > >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > >> > <[email protected]> wrote:
> > >> > >
> > >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > >> > > > support.
> > >> > > >
> > >> > > > To know for which idle states runtime PM reference counting is needed,
> > >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > >> > > > the requested idle state index and then act accordingly.
> > >> > >
> > >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > >> > > entry.
> > >> >
> > >> > If the CPU idle governor did select the CPU retention for the CPU, it
> > >> > was probably because the target residency for the CPU shutdown state
> > >> > could not be met.
> > >>
> > >> The kernel does not know what those cpu states represent, so, this is an
> > >> assumption you are making and it must be made clear that this code works
> > >> as long as your assumption is valid.
> > >>
> > >> If eg a "cluster" retention state has lower target_residency than
> > >> the deepest CPU idle state this assumption is wrong.
> > >
> > >Good point, you are right. I try to find a place to document this assumption.
> > >
> > >>
> > >> And CPUidle and genPD governor decisions are not synced anyway so,
> > >> again, this is an assumption, not a certainty.
> > >>
> > >> > In this case, there is no point in allowing any other deeper idle
> > >> > states for cluster/package/system, since those have even greater
> > >> > residencies, hence calling runtime PM doesn't make sense.
> > >>
> > >> On the systems you are testing on.
> > >
> > >So what you are saying typically means, that if all CPUs in the same
> > >cluster have entered the CPU retention state, on some system the
> > >cluster may also put into a cluster retention state (assuming the
> > >target residency is met)?
> > >
> > >Do you know of any systems that has these characteristics?
> > >
> > Many QCOM SoCs can do that. But with the hardware improving, the
> > power-performance benefits skew the results in favor of powering off
> > the cluster than keeping the CPU and cluster in retention.
> >
> > Kevin H and I thought of this problem earlier on. But that is a second
> > level problem to solve and definitely to be thought of after we have the
> > support for the deepest states in the kernel. We left that out for a
> > later date. The idea would have been to setup the allowable state(s) in
> > the DT for CPU and cluster state definitions and have the genpd take
> > that into consideration when deciding the idle state for the domain.
>
> Thanks for confirming.
>
> This more or less means we need to improve the hierarchical support in
> genpd to support more levels, such that it makes sense to have a genpd
> governor assigned at more than one level. This doesn't work well
> today. As I also have stated, this is on my todo list for genpd.
>
> However, I also agree with your standpoint, that let's start simple to
> enable the deepest state as a start with, then we can improve things
> on top.

How to solve this in the kernel I don't know but please do make sure
that the DT bindings allow you to describe what's needed, once they are
merged you won't be able to change them and I won't bodge the code to
make things fit, so if anything let's focus on getting them right as a
matter of priority to get this done please.

Thanks,
Lorenzo

2019-07-19 12:15:31

by Lorenzo Pieralisi

[permalink] [raw]

Subject: Re: [PATCH 01/18] dt: psci: Update DT bindings to support hierarchical PSCI states

On Mon, May 13, 2019 at 09:22:43PM +0200, Ulf Hansson wrote:
> From: Lina Iyer <[email protected]>
>
> Update DT bindings to represent hierarchical CPU and CPU PM domain idle
> states for PSCI. Also update the PSCI examples to clearly show how
> flattened and hierarchical idle states can be represented in DT.
>
> Signed-off-by: Lina Iyer <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
> Reviewed-by: Sudeep Holla <[email protected]>
> Co-developed-by: Ulf Hansson <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes:
> - None.
>
> ---
> .../devicetree/bindings/arm/psci.txt | 166 ++++++++++++++++++
> 1 file changed, 166 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/psci.txt b/Documentation/devicetree/bindings/arm/psci.txt
> index a2c4f1d52492..e6d3553c8df8 100644
> --- a/Documentation/devicetree/bindings/arm/psci.txt
> +++ b/Documentation/devicetree/bindings/arm/psci.txt
> @@ -105,7 +105,173 @@ Case 3: PSCI v0.2 and PSCI v0.1.
> ...
> };
>
> +ARM systems can have multiple cores sometimes in hierarchical arrangement.
> +This often, but not always, maps directly to the processor power topology of
> +the system. Individual nodes in a topology have their own specific power states
> +and can be better represented in DT hierarchically.
> +
> +For these cases, the definitions of the idle states for the CPUs and the CPU
> +topology, must conform to the domain idle state specification [3]. The domain
> +idle states themselves, must be compatible with the defined 'domain-idle-state'
> +binding [1], and also need to specify the arm,psci-suspend-param property for
> +each idle state.
> +
> +DT allows representing CPUs and CPU idle states in two different ways -
> +
> +The flattened model as given in Example 1, lists CPU's idle states followed by
> +the domain idle state that the CPUs may choose. Note that the idle states are
> +all compatible with "arm,idle-state". Additionally, for the domain idle state
> +the "arm,psci-suspend-param" represents a superset of the CPU's idle state.
> +
> +Example 2 represents the hierarchical model of CPUs and domain idle states.
> +CPUs define their domain provider in their psci DT node. The domain controls
> +the power to the CPU and possibly other h/w blocks that would enter an idle
> +state along with the CPU. The CPU's idle states may therefore be considered as
> +the domain's idle states and have the compatible "arm,idle-state". Such domains
> +may also be embedded within another domain that may represent common h/w blocks
> +between these CPUs. The idle states of the CPU topology shall be represented as
> +the domain's idle states. Note that for the domain idle state, the
> +"arm,psci-suspend-param" represents idle states hierarchically.
> +
> +In PSCI firmware v1.0, the OS-Initiated mode is introduced. However, the
> +flattened vs hierarchical DT representation is orthogonal to the OS-Initiated
> +vs the platform-coordinated PSCI CPU suspend modes, thus should be considered
> +independent of each other.
> +
> +The hierarchical representation helps and makes it easy to implement OSI mode
> +and OS implementations may choose to mandate it. For the default platform-
> +coordinated mode, both representations are viable options.
> +
> +Example 1: Flattened representation of CPU and domain idle states
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + CPU0: cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a53", "arm,armv8";
> + reg = <0x0>;
> + enable-method = "psci";
> + cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
> + <&CLUSTER_PWRDN>;
> + };
> +
> + CPU1: cpu@1 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a57", "arm,armv8";
> + reg = <0x100>;
> + enable-method = "psci";
> + cpu-idle-states = <&CPU_PWRDN>, <&CLUSTER_RET>,
> + <&CLUSTER_PWRDN>;
> + };
> +
> + idle-states {
> + CPU_PWRDN: cpu-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x0000001>;

This value is wrong, StateType must be 1 for CPU power down states.

> + entry-latency-us = <10>;
> + exit-latency-us = <10>;
> + min-residency-us = <100>;
> + };
> +
> + CLUSTER_RET: cluster-retention {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x1000011>;

It must be made crystal clear that this is the *full* power_state
that is passed to the CPU_SUSPEND call. It is already specified
in the bindings.

As Sudeep pointed out already, OR'ing the power_state parameters values
across power domains is wrong, in that there is nothing in the PSCI
specifications that enforces a power_state parameter scheme whereby
different "levels" are assigned different bitfields, in particular with
the extended format a power_state parameter for eg "system" level is not
necessarily OR'ed value of cluster|cpu|system values.

So, to sum it up, arm,psci-suspend-param must be the full power_state
parameter to be passed to CPU_SUSPEND and must be specified in full for
every CPU and power domain idle state.

Thanks,
Lorenzo

> + entry-latency-us = <500>;
> + exit-latency-us = <500>;
> + min-residency-us = <2000>;
> + };
> +
> + CLUSTER_PWRDN: cluster-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x1000031>;
> + entry-latency-us = <2000>;
> + exit-latency-us = <2000>;
> + min-residency-us = <6000>;
> + };
> + };
> +
> + psci {
> + compatible = "arm,psci-0.2";
> + method = "smc";
> + };
> +
> +Example 2: Hierarchical representation of CPU and domain idle states
> +
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + CPU0: cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a53", "arm,armv8";
> + reg = <0x0>;
> + enable-method = "psci";
> + power-domains = <&CPU_PD0>;
> + power-domain-names = "psci";
> + };
> +
> + CPU1: cpu@1 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a57", "arm,armv8";
> + reg = <0x100>;
> + enable-method = "psci";
> + power-domains = <&CPU_PD1>;
> + power-domain-names = "psci";
> + };
> +
> + idle-states {
> + CPU_PWRDN: cpu-power-down {
> + compatible = "arm,idle-state";
> + arm,psci-suspend-param = <0x0000001>;
> + entry-latency-us = <10>;
> + exit-latency-us = <10>;
> + min-residency-us = <100>;
> + };
> +
> + CLUSTER_RET: cluster-retention {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000010>;
> + entry-latency-us = <500>;
> + exit-latency-us = <500>;
> + min-residency-us = <2000>;
> + };
> +
> + CLUSTER_PWRDN: cluster-power-down {
> + compatible = "domain-idle-state";
> + arm,psci-suspend-param = <0x1000030>;
> + entry-latency-us = <2000>;
> + exit-latency-us = <2000>;
> + min-residency-us = <6000>;
> + };
> + };
> + };
> +
> + psci {
> + compatible = "arm,psci-1.0";
> + method = "smc";
> +
> + CPU_PD0: cpu-pd0 {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CPU_PWRDN>;
> + power-domains = <&CLUSTER_PD>;
> + };
> +
> + CPU_PD1: cpu-pd1 {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CPU_PWRDN>;
> + power-domains = <&CLUSTER_PD>;
> + };
> +
> + CLUSTER_PD: cluster-pd {
> + #power-domain-cells = <0>;
> + domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
> + };
> + };
> +
> [1] Kernel documentation - ARM idle states bindings
> Documentation/devicetree/bindings/arm/idle-states.txt
> [2] Power State Coordination Interface (PSCI) specification
> http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf
> +[3]. PM Domains description
> + Documentation/devicetree/bindings/power/power_domain.txt
> --
> 2.17.1
>