2022-02-16 13:43:33

by Shawn Guo

[permalink] [raw]
Subject: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

Make a call to cpu_cluster_pm_enter() on the last CPU going to low power
state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
platforms can be notified to set up hardware for getting into the cluster
low power state.

Signed-off-by: Shawn Guo <[email protected]>
---
drivers/cpuidle/cpuidle-psci.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index b51b5df08450..c748c1a7d7b1 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -37,6 +37,7 @@ struct psci_cpuidle_data {
static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
static DEFINE_PER_CPU(u32, domain_state);
static bool psci_cpuidle_use_cpuhp;
+static atomic_t cpus_in_idle;

void psci_set_domain_state(u32 state)
{
@@ -67,6 +68,14 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
if (ret)
return -1;

+ if (atomic_inc_return(&cpus_in_idle) == num_online_cpus()) {
+ ret = cpu_cluster_pm_enter();
+ if (ret) {
+ ret = -1;
+ goto dec_atomic;
+ }
+ }
+
/* Do runtime PM to manage a hierarchical CPU toplogy. */
rcu_irq_enter_irqson();
if (s2idle)
@@ -88,6 +97,10 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
pm_runtime_get_sync(pd_dev);
rcu_irq_exit_irqson();

+ if (atomic_read(&cpus_in_idle) == num_online_cpus())
+ cpu_cluster_pm_exit();
+dec_atomic:
+ atomic_dec(&cpus_in_idle);
cpu_pm_exit();

/* Clear the domain state to start fresh when back from idle. */
--
2.17.1


2022-02-16 14:58:22

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

+Ulf (as you he is the author of cpuidle-psci-domains.c and can help you
with that if you require)

On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
> Make a call to cpu_cluster_pm_enter() on the last CPU going to low power
> state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> platforms can be notified to set up hardware for getting into the cluster
> low power state.
>

NACK. We are not getting the notion of CPU cluster back to cpuidle again.
That must die. Remember the cluster doesn't map to idle states especially
in the DSU systems where HMP CPUs are in the same cluster but can be in
different power domains.

You need to decide which PSCI CPU_SUSPEND mode you want to use first. If it is
Platform Co-ordinated(PC), then you need not notify anything to the platform.
Just request the desired idle state on each CPU and platform will take care
from there.

If for whatever reason you have chosen OS initiated mode(OSI), then specify
the PSCI power domains correctly in the DT which will make use of the
cpuidle-psci-domains and handle the so called "cluster" state correctly.

--
Regards,
Sudeep

2022-02-16 15:44:35

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On 2022-02-16 13:28, Shawn Guo wrote:
> Make a call to cpu_cluster_pm_enter() on the last CPU going to low
> power
> state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> platforms can be notified to set up hardware for getting into the
> cluster
> low power state.
>
> Signed-off-by: Shawn Guo <[email protected]>
> ---
> drivers/cpuidle/cpuidle-psci.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-psci.c
> b/drivers/cpuidle/cpuidle-psci.c
> index b51b5df08450..c748c1a7d7b1 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -37,6 +37,7 @@ struct psci_cpuidle_data {
> static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data,
> psci_cpuidle_data);
> static DEFINE_PER_CPU(u32, domain_state);
> static bool psci_cpuidle_use_cpuhp;
> +static atomic_t cpus_in_idle;
>
> void psci_set_domain_state(u32 state)
> {
> @@ -67,6 +68,14 @@ static int __psci_enter_domain_idle_state(struct
> cpuidle_device *dev,
> if (ret)
> return -1;
>
> + if (atomic_inc_return(&cpus_in_idle) == num_online_cpus()) {
> + ret = cpu_cluster_pm_enter();
> + if (ret) {
> + ret = -1;
> + goto dec_atomic;
> + }
> + }
> +
> /* Do runtime PM to manage a hierarchical CPU toplogy. */
> rcu_irq_enter_irqson();
> if (s2idle)
> @@ -88,6 +97,10 @@ static int __psci_enter_domain_idle_state(struct
> cpuidle_device *dev,
> pm_runtime_get_sync(pd_dev);
> rcu_irq_exit_irqson();
>
> + if (atomic_read(&cpus_in_idle) == num_online_cpus())
> + cpu_cluster_pm_exit();
> +dec_atomic:
> + atomic_dec(&cpus_in_idle);
> cpu_pm_exit();
>
> /* Clear the domain state to start fresh when back from idle. */

Is it just me, or does anyone else find it a bit odd that a cpuidle
driver
calls back into the core cpuidle code to generate new events?

Also, why is this PSCI specific? I would assume that the core cpuidle
code
should be responsible for these transitions, not a random cpuidle
driver.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2022-02-16 16:25:22

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On 2022-02-16 14:49, Sudeep Holla wrote:
> +Ulf (as you he is the author of cpuidle-psci-domains.c and can help
> you
> with that if you require)
>
> On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
>> Make a call to cpu_cluster_pm_enter() on the last CPU going to low
>> power
>> state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
>> platforms can be notified to set up hardware for getting into the
>> cluster
>> low power state.
>>
>
> NACK. We are not getting the notion of CPU cluster back to cpuidle
> again.
> That must die. Remember the cluster doesn't map to idle states
> especially
> in the DSU systems where HMP CPUs are in the same cluster but can be in
> different power domains.
>
> You need to decide which PSCI CPU_SUSPEND mode you want to use first.
> If it is
> Platform Co-ordinated(PC), then you need not notify anything to the
> platform.
> Just request the desired idle state on each CPU and platform will take
> care
> from there.
>
> If for whatever reason you have chosen OS initiated mode(OSI), then
> specify
> the PSCI power domains correctly in the DT which will make use of the
> cpuidle-psci-domains and handle the so called "cluster" state
> correctly.

My understanding is that what Shawn is after is a way to detect the
"last
man standing" on the system to kick off some funky wake-up controller
that
really should be handled by the power controller (because only that guy
knows for sure who is the last CPU on the block).

There was previously some really funky stuff (copy pasted from the
existing
rpmh_rsc_cpu_pm_callback()), which I totally objected to having hidden
in
an irqchip driver.

My ask was that if we needed such information, and assuming that it is
possible to obtain it in a reliable way, this should come from the core
code, and not be invented by random drivers.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2022-02-17 08:00:06

by Shawn Guo

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On Wed, Feb 16, 2022 at 02:39:26PM +0000, Marc Zyngier wrote:
> On 2022-02-16 13:28, Shawn Guo wrote:
> > Make a call to cpu_cluster_pm_enter() on the last CPU going to low power
> > state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> > platforms can be notified to set up hardware for getting into the
> > cluster
> > low power state.
> >
> > Signed-off-by: Shawn Guo <[email protected]>
> > ---
> > drivers/cpuidle/cpuidle-psci.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/cpuidle/cpuidle-psci.c
> > b/drivers/cpuidle/cpuidle-psci.c
> > index b51b5df08450..c748c1a7d7b1 100644
> > --- a/drivers/cpuidle/cpuidle-psci.c
> > +++ b/drivers/cpuidle/cpuidle-psci.c
> > @@ -37,6 +37,7 @@ struct psci_cpuidle_data {
> > static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data,
> > psci_cpuidle_data);
> > static DEFINE_PER_CPU(u32, domain_state);
> > static bool psci_cpuidle_use_cpuhp;
> > +static atomic_t cpus_in_idle;
> >
> > void psci_set_domain_state(u32 state)
> > {
> > @@ -67,6 +68,14 @@ static int __psci_enter_domain_idle_state(struct
> > cpuidle_device *dev,
> > if (ret)
> > return -1;
> >
> > + if (atomic_inc_return(&cpus_in_idle) == num_online_cpus()) {
> > + ret = cpu_cluster_pm_enter();
> > + if (ret) {
> > + ret = -1;
> > + goto dec_atomic;
> > + }
> > + }
> > +
> > /* Do runtime PM to manage a hierarchical CPU toplogy. */
> > rcu_irq_enter_irqson();
> > if (s2idle)
> > @@ -88,6 +97,10 @@ static int __psci_enter_domain_idle_state(struct
> > cpuidle_device *dev,
> > pm_runtime_get_sync(pd_dev);
> > rcu_irq_exit_irqson();
> >
> > + if (atomic_read(&cpus_in_idle) == num_online_cpus())
> > + cpu_cluster_pm_exit();
> > +dec_atomic:
> > + atomic_dec(&cpus_in_idle);
> > cpu_pm_exit();
> >
> > /* Clear the domain state to start fresh when back from idle. */
>
> Is it just me, or does anyone else find it a bit odd that a cpuidle driver
> calls back into the core cpuidle code to generate new events?

It's not uncommon that a platform driver calls some helper functions
provided by core.

> Also, why is this PSCI specific? I would assume that the core cpuidle code
> should be responsible for these transitions, not a random cpuidle driver.

The CPU PM helpers cpu_pm_enter() and cpu_cluster_pm_enter() are provided
by kernel/cpu_pm.c rather than cpuidle core. This PSCI cpuidle driver
already uses cpu_pm_enter(), and my patch is making a call to
cpu_cluster_pm_enter().

Shawn

2022-02-17 13:09:52

by Shawn Guo

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On Wed, Feb 16, 2022 at 03:58:41PM +0000, Marc Zyngier wrote:
> On 2022-02-16 14:49, Sudeep Holla wrote:
> > +Ulf (as you he is the author of cpuidle-psci-domains.c and can help you
> > with that if you require)

Thanks, Sudeep!

> >
> > On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
> > > Make a call to cpu_cluster_pm_enter() on the last CPU going to low
> > > power
> > > state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> > > platforms can be notified to set up hardware for getting into the
> > > cluster
> > > low power state.
> > >
> >
> > NACK. We are not getting the notion of CPU cluster back to cpuidle
> > again.
> > That must die. Remember the cluster doesn't map to idle states
> > especially
> > in the DSU systems where HMP CPUs are in the same cluster but can be in
> > different power domains.

The 'cluster' in cpu_cluster_pm_enter() doesn't necessarily means
a physical CPU cluster. I think the documentation of the function has a
better description.

* Notifies listeners that all cpus in a power domain are entering a low power
* state that may cause some blocks in the same power domain to reset.

So cpu_domain_pm_enter() might be a better name? Anyways ...

> >
> > You need to decide which PSCI CPU_SUSPEND mode you want to use first. If
> > it is
> > Platform Co-ordinated(PC), then you need not notify anything to the
> > platform.
> > Just request the desired idle state on each CPU and platform will take
> > care
> > from there.
> >
> > If for whatever reason you have chosen OS initiated mode(OSI), then
> > specify
> > the PSCI power domains correctly in the DT which will make use of the
> > cpuidle-psci-domains and handle the so called "cluster" state correctly.

Yes, I'm running a Qualcomm platform that has OSI supported in PSCI.

>
> My understanding is that what Shawn is after is a way to detect the "last
> man standing" on the system to kick off some funky wake-up controller that
> really should be handled by the power controller (because only that guy
> knows for sure who is the last CPU on the block).
>
> There was previously some really funky stuff (copy pasted from the existing
> rpmh_rsc_cpu_pm_callback()), which I totally objected to having hidden in
> an irqchip driver.
>
> My ask was that if we needed such information, and assuming that it is
> possible to obtain it in a reliable way, this should come from the core
> code, and not be invented by random drivers.

Thanks Marc for explain my problem!

Right, all I need is a notification in MPM irqchip driver when the CPU
domain/cluster is about to enter low power state. As cpu_pm -
kernel/cpu_pm.c, already has helper cpu_cluster_pm_enter() sending
CPU_CLUSTER_PM_ENTER event, I just need to find a caller to this cpu_pm
helper.

Is .power_off hook of generic_pm_domain a better place for calling the
helper?

Shawn

----8<------------
diff --git a/drivers/cpuidle/cpuidle-psci-domain.c b/drivers/cpuidle/cpuidle-psci-domain.c
index ff2c3f8e4668..58aad15851f9 100644
--- a/drivers/cpuidle/cpuidle-psci-domain.c
+++ b/drivers/cpuidle/cpuidle-psci-domain.c
@@ -10,6 +10,7 @@
#define pr_fmt(fmt) "CPUidle PSCI: " fmt

#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
@@ -33,6 +34,7 @@ static int psci_pd_power_off(struct generic_pm_domain *pd)
{
struct genpd_power_state *state = &pd->states[pd->state_idx];
u32 *pd_state;
+ int ret;

if (!state->data)
return 0;
@@ -44,6 +46,16 @@ static int psci_pd_power_off(struct generic_pm_domain *pd)
pd_state = state->data;
psci_set_domain_state(*pd_state);

+ if (list_empty(&pd->child_links)) {
+ /*
+ * The top domain (not being a child of anyone) should be the
+ * best one triggering PM notification.
+ */
+ ret = cpu_cluster_pm_enter();
+ if (ret)
+ return ret;
+ }
+
return 0;
}

2022-02-17 18:39:28

by Shawn Guo

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On Thu, Feb 17, 2022 at 08:52:35AM +0000, Marc Zyngier wrote:
> On Thu, 17 Feb 2022 07:31:32 +0000,
> Shawn Guo <[email protected]> wrote:
> >
> > On Wed, Feb 16, 2022 at 03:58:41PM +0000, Marc Zyngier wrote:
> > > On 2022-02-16 14:49, Sudeep Holla wrote:
> > > > +Ulf (as you he is the author of cpuidle-psci-domains.c and can help you
> > > > with that if you require)
> >
> > Thanks, Sudeep!
> >
> > > >
> > > > On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
> > > > > Make a call to cpu_cluster_pm_enter() on the last CPU going to low
> > > > > power
> > > > > state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> > > > > platforms can be notified to set up hardware for getting into the
> > > > > cluster
> > > > > low power state.
> > > > >
> > > >
> > > > NACK. We are not getting the notion of CPU cluster back to cpuidle
> > > > again.
> > > > That must die. Remember the cluster doesn't map to idle states
> > > > especially
> > > > in the DSU systems where HMP CPUs are in the same cluster but can be in
> > > > different power domains.
> >
> > The 'cluster' in cpu_cluster_pm_enter() doesn't necessarily means
> > a physical CPU cluster. I think the documentation of the function has a
> > better description.
> >
> > * Notifies listeners that all cpus in a power domain are entering a low power
> > * state that may cause some blocks in the same power domain to reset.
> >
> > So cpu_domain_pm_enter() might be a better name? Anyways ...
> >
> > > >
> > > > You need to decide which PSCI CPU_SUSPEND mode you want to use first. If
> > > > it is
> > > > Platform Co-ordinated(PC), then you need not notify anything to the
> > > > platform.
> > > > Just request the desired idle state on each CPU and platform will take
> > > > care
> > > > from there.
> > > >
> > > > If for whatever reason you have chosen OS initiated mode(OSI), then
> > > > specify
> > > > the PSCI power domains correctly in the DT which will make use of the
> > > > cpuidle-psci-domains and handle the so called "cluster" state correctly.
> >
> > Yes, I'm running a Qualcomm platform that has OSI supported in PSCI.
> >
> > >
> > > My understanding is that what Shawn is after is a way to detect the "last
> > > man standing" on the system to kick off some funky wake-up controller that
> > > really should be handled by the power controller (because only that guy
> > > knows for sure who is the last CPU on the block).
> > >
> > > There was previously some really funky stuff (copy pasted from the existing
> > > rpmh_rsc_cpu_pm_callback()), which I totally objected to having hidden in
> > > an irqchip driver.
> > >
> > > My ask was that if we needed such information, and assuming that it is
> > > possible to obtain it in a reliable way, this should come from the core
> > > code, and not be invented by random drivers.
> >
> > Thanks Marc for explain my problem!
> >
> > Right, all I need is a notification in MPM irqchip driver when the CPU
> > domain/cluster is about to enter low power state. As cpu_pm -
> > kernel/cpu_pm.c, already has helper cpu_cluster_pm_enter() sending
> > CPU_CLUSTER_PM_ENTER event, I just need to find a caller to this cpu_pm
> > helper.
> >
> > Is .power_off hook of generic_pm_domain a better place for calling the
> > helper?
>
> I really don't understand why you want a cluster PM event generated by
> the idle driver. Specially considering that you are not after a
> *cluster* PM event, but after some sort of system-wide event (last man
> standing).

That's primarily because MPM driver is used in the idle context, either
s2idle or cpuidle, and idle driver already has some infrastructure that
could help trigger the event.

> It looks to me that having a predicate that can be called from a PM
> notifier event to find out whether you're the last in line would be
> better suited, and could further be used to remove the crap from the
> rpmh-rsc driver.

I will see if I can come up with a patch for discussion.

Shawn

2022-02-17 23:43:19

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On Thu, 17 Feb 2022 07:31:32 +0000,
Shawn Guo <[email protected]> wrote:
>
> On Wed, Feb 16, 2022 at 03:58:41PM +0000, Marc Zyngier wrote:
> > On 2022-02-16 14:49, Sudeep Holla wrote:
> > > +Ulf (as you he is the author of cpuidle-psci-domains.c and can help you
> > > with that if you require)
>
> Thanks, Sudeep!
>
> > >
> > > On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
> > > > Make a call to cpu_cluster_pm_enter() on the last CPU going to low
> > > > power
> > > > state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> > > > platforms can be notified to set up hardware for getting into the
> > > > cluster
> > > > low power state.
> > > >
> > >
> > > NACK. We are not getting the notion of CPU cluster back to cpuidle
> > > again.
> > > That must die. Remember the cluster doesn't map to idle states
> > > especially
> > > in the DSU systems where HMP CPUs are in the same cluster but can be in
> > > different power domains.
>
> The 'cluster' in cpu_cluster_pm_enter() doesn't necessarily means
> a physical CPU cluster. I think the documentation of the function has a
> better description.
>
> * Notifies listeners that all cpus in a power domain are entering a low power
> * state that may cause some blocks in the same power domain to reset.
>
> So cpu_domain_pm_enter() might be a better name? Anyways ...
>
> > >
> > > You need to decide which PSCI CPU_SUSPEND mode you want to use first. If
> > > it is
> > > Platform Co-ordinated(PC), then you need not notify anything to the
> > > platform.
> > > Just request the desired idle state on each CPU and platform will take
> > > care
> > > from there.
> > >
> > > If for whatever reason you have chosen OS initiated mode(OSI), then
> > > specify
> > > the PSCI power domains correctly in the DT which will make use of the
> > > cpuidle-psci-domains and handle the so called "cluster" state correctly.
>
> Yes, I'm running a Qualcomm platform that has OSI supported in PSCI.
>
> >
> > My understanding is that what Shawn is after is a way to detect the "last
> > man standing" on the system to kick off some funky wake-up controller that
> > really should be handled by the power controller (because only that guy
> > knows for sure who is the last CPU on the block).
> >
> > There was previously some really funky stuff (copy pasted from the existing
> > rpmh_rsc_cpu_pm_callback()), which I totally objected to having hidden in
> > an irqchip driver.
> >
> > My ask was that if we needed such information, and assuming that it is
> > possible to obtain it in a reliable way, this should come from the core
> > code, and not be invented by random drivers.
>
> Thanks Marc for explain my problem!
>
> Right, all I need is a notification in MPM irqchip driver when the CPU
> domain/cluster is about to enter low power state. As cpu_pm -
> kernel/cpu_pm.c, already has helper cpu_cluster_pm_enter() sending
> CPU_CLUSTER_PM_ENTER event, I just need to find a caller to this cpu_pm
> helper.
>
> Is .power_off hook of generic_pm_domain a better place for calling the
> helper?

I really don't understand why you want a cluster PM event generated by
the idle driver. Specially considering that you are not after a
*cluster* PM event, but after some sort of system-wide event (last man
standing).

It looks to me that having a predicate that can be called from a PM
notifier event to find out whether you're the last in line would be
better suited, and could further be used to remove the crap from the
rpmh-rsc driver.

M.

--
Without deviation from the norm, progress is not possible.

2022-02-20 20:21:38

by Shawn Guo

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] cpuidle: psci: Call cpu_cluster_pm_enter() on the last CPU

On Thu, Feb 17, 2022 at 09:37:03PM +0800, Shawn Guo wrote:
> On Thu, Feb 17, 2022 at 08:52:35AM +0000, Marc Zyngier wrote:
> > On Thu, 17 Feb 2022 07:31:32 +0000,
> > Shawn Guo <[email protected]> wrote:
> > >
> > > On Wed, Feb 16, 2022 at 03:58:41PM +0000, Marc Zyngier wrote:
> > > > On 2022-02-16 14:49, Sudeep Holla wrote:
> > > > > +Ulf (as you he is the author of cpuidle-psci-domains.c and can help you
> > > > > with that if you require)
> > >
> > > Thanks, Sudeep!
> > >
> > > > >
> > > > > On Wed, Feb 16, 2022 at 09:28:28PM +0800, Shawn Guo wrote:
> > > > > > Make a call to cpu_cluster_pm_enter() on the last CPU going to low
> > > > > > power
> > > > > > state (and cpu_cluster_pm_exit() on the firt CPU coming back), so that
> > > > > > platforms can be notified to set up hardware for getting into the
> > > > > > cluster
> > > > > > low power state.
> > > > > >
> > > > >
> > > > > NACK. We are not getting the notion of CPU cluster back to cpuidle
> > > > > again.
> > > > > That must die. Remember the cluster doesn't map to idle states
> > > > > especially
> > > > > in the DSU systems where HMP CPUs are in the same cluster but can be in
> > > > > different power domains.
> > >
> > > The 'cluster' in cpu_cluster_pm_enter() doesn't necessarily means
> > > a physical CPU cluster. I think the documentation of the function has a
> > > better description.
> > >
> > > * Notifies listeners that all cpus in a power domain are entering a low power
> > > * state that may cause some blocks in the same power domain to reset.
> > >
> > > So cpu_domain_pm_enter() might be a better name? Anyways ...
> > >
> > > > >
> > > > > You need to decide which PSCI CPU_SUSPEND mode you want to use first. If
> > > > > it is
> > > > > Platform Co-ordinated(PC), then you need not notify anything to the
> > > > > platform.
> > > > > Just request the desired idle state on each CPU and platform will take
> > > > > care
> > > > > from there.
> > > > >
> > > > > If for whatever reason you have chosen OS initiated mode(OSI), then
> > > > > specify
> > > > > the PSCI power domains correctly in the DT which will make use of the
> > > > > cpuidle-psci-domains and handle the so called "cluster" state correctly.
> > >
> > > Yes, I'm running a Qualcomm platform that has OSI supported in PSCI.
> > >
> > > >
> > > > My understanding is that what Shawn is after is a way to detect the "last
> > > > man standing" on the system to kick off some funky wake-up controller that
> > > > really should be handled by the power controller (because only that guy
> > > > knows for sure who is the last CPU on the block).
> > > >
> > > > There was previously some really funky stuff (copy pasted from the existing
> > > > rpmh_rsc_cpu_pm_callback()), which I totally objected to having hidden in
> > > > an irqchip driver.
> > > >
> > > > My ask was that if we needed such information, and assuming that it is
> > > > possible to obtain it in a reliable way, this should come from the core
> > > > code, and not be invented by random drivers.
> > >
> > > Thanks Marc for explain my problem!
> > >
> > > Right, all I need is a notification in MPM irqchip driver when the CPU
> > > domain/cluster is about to enter low power state. As cpu_pm -
> > > kernel/cpu_pm.c, already has helper cpu_cluster_pm_enter() sending
> > > CPU_CLUSTER_PM_ENTER event, I just need to find a caller to this cpu_pm
> > > helper.
> > >
> > > Is .power_off hook of generic_pm_domain a better place for calling the
> > > helper?
> >
> > I really don't understand why you want a cluster PM event generated by
> > the idle driver. Specially considering that you are not after a
> > *cluster* PM event, but after some sort of system-wide event (last man
> > standing).
>
> That's primarily because MPM driver is used in the idle context, either
> s2idle or cpuidle, and idle driver already has some infrastructure that
> could help trigger the event.
>
> > It looks to me that having a predicate that can be called from a PM
> > notifier event to find out whether you're the last in line would be
> > better suited, and could further be used to remove the crap from the
> > rpmh-rsc driver.
>
> I will see if I can come up with a patch for discussion.

How about this?

---8<-----------
From 0176b2311764959bb5f3322865bf85440eaf769e Mon Sep 17 00:00:00 2001
From: Shawn Guo <[email protected]>
Date: Sat, 19 Feb 2022 13:35:47 +0800
Subject: [PATCH] PM: cpu: Add CPU_LAST_PM_ENTER and CPU_FIRST_PM_EXIT support

It becomes a common situation on some platforms that certain hardware
setup needs to be done on the last standing cpu, and rpmh-rsc[1] is such
an existing example. As figuring out the last standing cpu is really
something generic, it adds CPU_LAST_PM_ENTER (and CPU_FIRST_PM_EXIT)
event support to cpu_pm helper, so that individual driver can be
notified when the last standing cpu is about to enter low power state.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/soc/qcom/rpmh-rsc.c?id=v5.16#n773

Signed-off-by: Shawn Guo <[email protected]>
---
include/linux/cpu_pm.h | 15 +++++++++++++++
kernel/cpu_pm.c | 33 +++++++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu_pm.h b/include/linux/cpu_pm.h
index 552b8f9ea05e..153344307b7c 100644
--- a/include/linux/cpu_pm.h
+++ b/include/linux/cpu_pm.h
@@ -55,6 +55,21 @@ enum cpu_pm_event {

/* A cpu power domain is exiting a low power state */
CPU_CLUSTER_PM_EXIT,
+
+ /*
+ * A cpu is entering a low power state after all other cpus
+ * in the system have entered the lower power state.
+ */
+ CPU_LAST_PM_ENTER,
+
+ /* The last cpu failed to enter a low power state */
+ CPU_LAST_PM_ENTER_FAILED,
+
+ /*
+ * A cpu is exiting a low power state before any other cpus
+ * in the system exits the low power state.
+ */
+ CPU_FIRST_PM_EXIT,
};

#ifdef CONFIG_CPU_PM
diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 246efc74e3f3..7c104446e1e9 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -26,6 +26,8 @@ static struct {
.lock = __RAW_SPIN_LOCK_UNLOCKED(cpu_pm_notifier.lock),
};

+static atomic_t cpus_in_pm;
+
static int cpu_pm_notify(enum cpu_pm_event event)
{
int ret;
@@ -116,7 +118,20 @@ EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);
*/
int cpu_pm_enter(void)
{
- return cpu_pm_notify_robust(CPU_PM_ENTER, CPU_PM_ENTER_FAILED);
+ int ret;
+
+ ret = cpu_pm_notify_robust(CPU_PM_ENTER, CPU_PM_ENTER_FAILED);
+ if (ret)
+ return ret;
+
+ if (atomic_inc_return(&cpus_in_pm) == num_online_cpus()) {
+ ret = cpu_pm_notify_robust(CPU_LAST_PM_ENTER,
+ CPU_LAST_PM_ENTER_FAILED);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
}
EXPORT_SYMBOL_GPL(cpu_pm_enter);

@@ -134,7 +149,21 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
*/
int cpu_pm_exit(void)
{
- return cpu_pm_notify(CPU_PM_EXIT);
+ int ret;
+
+ ret = cpu_pm_notify(CPU_PM_EXIT);
+ if (ret)
+ return ret;
+
+ if (atomic_read(&cpus_in_pm) == num_online_cpus()) {
+ ret = cpu_pm_notify(CPU_FIRST_PM_EXIT);
+ if (ret)
+ return ret;
+ }
+
+ atomic_dec(&cpus_in_pm);
+
+ return 0;
}
EXPORT_SYMBOL_GPL(cpu_pm_exit);

--
2.17.1