2021-09-29 14:46:03

by Ulf Hansson

[permalink] [raw]
Subject: [PATCH 0/2] cpuidle: Fix runtime PM based cpuidle for s2idle

Maulik Shah reported problems to me around the s2idle support in the
cpuidle-psci driver. More precisely, calls to pm_runtime_get|put fails during
system wide suspend, because runtime PM gets disabled by the PM core.

This small series intends to fix the problem. More details in the commit
messages.

Kind regards
Ulf Hansson


Ulf Hansson (2):
cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle
PM: sleep: Fix runtime PM based cpuidle support

drivers/base/power/main.c | 6 ++----
drivers/cpuidle/cpuidle.c | 7 ++++++-
include/linux/cpuidle.h | 2 ++
kernel/power/suspend.c | 2 --
kernel/sched/idle.c | 40 ++++++++++++++++++++++-----------------
5 files changed, 33 insertions(+), 24 deletions(-)

--
2.25.1


2021-09-29 14:46:12

by Ulf Hansson

[permalink] [raw]
Subject: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
the cpuidle callbacks during s2idle operations. This is needed because
cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().

However, calling cpuidle_resume|pause() from s2idle_enter() looks a bit
superfluous, as it also causes all CPUs to be waken up when the first CPU
wakes up from s2idle.

Therefore, let's drop the calls to cpuidle_resume|pause() from
s2idle_enter(). To make this work, let's also adopt the path in the
cpuidle_idle_call() to allow cpuidle callbacks to be invoked for s2idle,
even if cpuidle has been paused.

Signed-off-by: Ulf Hansson <[email protected]>
---
drivers/cpuidle/cpuidle.c | 7 ++++++-
include/linux/cpuidle.h | 2 ++
kernel/power/suspend.c | 2 --
kernel/sched/idle.c | 40 ++++++++++++++++++++++-----------------
4 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..c76747e497e7 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -49,7 +49,12 @@ void disable_cpuidle(void)
bool cpuidle_not_available(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
- return off || !initialized || !drv || !dev || !dev->enabled;
+ return off || !drv || !dev || !dev->enabled;
+}
+
+bool cpuidle_paused(void)
+{
+ return !initialized;
}

/**
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fce476275e16..51698b385ab5 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -165,6 +165,7 @@ extern void cpuidle_pause_and_lock(void);
extern void cpuidle_resume_and_unlock(void);
extern void cpuidle_pause(void);
extern void cpuidle_resume(void);
+extern bool cpuidle_paused(void);
extern int cpuidle_enable_device(struct cpuidle_device *dev);
extern void cpuidle_disable_device(struct cpuidle_device *dev);
extern int cpuidle_play_dead(void);
@@ -204,6 +205,7 @@ static inline void cpuidle_pause_and_lock(void) { }
static inline void cpuidle_resume_and_unlock(void) { }
static inline void cpuidle_pause(void) { }
static inline void cpuidle_resume(void) { }
+static inline bool cpuidle_paused(void) {return true; }
static inline int cpuidle_enable_device(struct cpuidle_device *dev)
{return -ENODEV; }
static inline void cpuidle_disable_device(struct cpuidle_device *dev) { }
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index eb75f394a059..388a5de4836e 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -97,7 +97,6 @@ static void s2idle_enter(void)
raw_spin_unlock_irq(&s2idle_lock);

cpus_read_lock();
- cpuidle_resume();

/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
@@ -105,7 +104,6 @@ static void s2idle_enter(void)
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);

- cpuidle_pause();
cpus_read_unlock();

raw_spin_lock_irq(&s2idle_lock);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..3bc3a2c46731 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -158,6 +158,17 @@ static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
return cpuidle_enter(drv, dev, next_state);
}

+static void cpuidle_deepest_state(struct cpuidle_driver *drv,
+ struct cpuidle_device *dev,
+ u64 max_latency_ns)
+{
+ int next_state;
+
+ tick_nohz_idle_stop_tick();
+ next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
+ call_cpuidle(drv, dev, next_state);
+}
+
/**
* cpuidle_idle_call - the main idle function
*
@@ -189,6 +200,7 @@ static void cpuidle_idle_call(void)
*/

if (cpuidle_not_available(drv, dev)) {
+default_idle:
tick_nohz_idle_stop_tick();

default_idle_call();
@@ -204,25 +216,19 @@ static void cpuidle_idle_call(void)
* timekeeping to prevent timer interrupts from kicking us out of idle
* until a proper wakeup interrupt happens.
*/
+ if (idle_should_enter_s2idle()) {
+ entered_state = call_cpuidle_s2idle(drv, dev);
+ if (entered_state <= 0)
+ cpuidle_deepest_state(drv, dev, U64_MAX);
+ goto exit_idle;
+ }

- if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) {
- u64 max_latency_ns;
-
- if (idle_should_enter_s2idle()) {
-
- entered_state = call_cpuidle_s2idle(drv, dev);
- if (entered_state > 0)
- goto exit_idle;
-
- max_latency_ns = U64_MAX;
- } else {
- max_latency_ns = dev->forced_idle_latency_limit_ns;
- }
-
- tick_nohz_idle_stop_tick();
+ if (cpuidle_paused())
+ goto default_idle;

- next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
- call_cpuidle(drv, dev, next_state);
+ if (dev->forced_idle_latency_limit_ns) {
+ cpuidle_deepest_state(drv, dev,
+ dev->forced_idle_latency_limit_ns);
} else {
bool stop_tick = true;

--
2.25.1

2021-09-29 14:47:57

by Ulf Hansson

[permalink] [raw]
Subject: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

In the cpuidle-psci case, runtime PM in combination with the generic PM
domain (genpd), may be used when entering/exiting an idlestate. More
precisely, genpd relies on runtime PM to be enabled for the attached device
(in this case it belongs to a CPU), to properly manage the reference
counting of its PM domain.

This works fine most of the time, but during system suspend in the
dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
Beyond this point and until runtime PM becomes re-enabled in the
dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.

To make sure the reference counting in genpd becomes correct, we need to
prevent cpuidle-psci from using runtime PM when it has been disabled for
the device. Therefore, let's move the call to cpuidle_pause() from
dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
dpm_resume_noirq() into dpm_resume_early().

Diagnosed-by: Maulik Shah <[email protected]>
Suggested-by: Maulik Shah <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
---
drivers/base/power/main.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..1c753b651272 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)

resume_device_irqs();
device_wakeup_disarm_wake_irqs();
-
- cpuidle_resume();
}

/**
@@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
}
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
+ cpuidle_resume();
dpm_show_time(starttime, state, 0, "early");
trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
}
@@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
{
int ret;

- cpuidle_pause();
-
device_wakeup_arm_wake_irqs();
suspend_device_irqs();

@@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
int error = 0;

trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
+ cpuidle_pause();
mutex_lock(&dpm_list_mtx);
pm_transition = state;
async_error = 0;
--
2.25.1

2021-10-06 10:24:41

by Maulik Shah

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

Hi,

On 9/29/2021 8:14 PM, Ulf Hansson wrote:
> In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
> the cpuidle callbacks during s2idle operations. This is needed because
> cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().
>
> However, calling cpuidle_resume|pause() from s2idle_enter() looks a bit
> superfluous, as it also causes all CPUs to be waken up when the first CPU
> wakes up from s2idle.

Thanks for the patch. This can be good optimization to avoid waking up
all CPUs always.

>
> Therefore, let's drop the calls to cpuidle_resume|pause() from
> s2idle_enter(). To make this work, let's also adopt the path in the
> cpuidle_idle_call() to allow cpuidle callbacks to be invoked for s2idle,
> even if cpuidle has been paused.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
> drivers/cpuidle/cpuidle.c | 7 ++++++-
> include/linux/cpuidle.h | 2 ++
> kernel/power/suspend.c | 2 --
> kernel/sched/idle.c | 40 ++++++++++++++++++++++-----------------
> 4 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index ef2ea1b12cd8..c76747e497e7 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -49,7 +49,12 @@ void disable_cpuidle(void)
> bool cpuidle_not_available(struct cpuidle_driver *drv,
> struct cpuidle_device *dev)
> {
> - return off || !initialized || !drv || !dev || !dev->enabled;
> + return off || !drv || !dev || !dev->enabled;
> +}
> +
> +bool cpuidle_paused(void)
> +{
> + return !initialized;
> }
>
> /**
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index fce476275e16..51698b385ab5 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -165,6 +165,7 @@ extern void cpuidle_pause_and_lock(void);
> extern void cpuidle_resume_and_unlock(void);
> extern void cpuidle_pause(void);
> extern void cpuidle_resume(void);
> +extern bool cpuidle_paused(void);
> extern int cpuidle_enable_device(struct cpuidle_device *dev);
> extern void cpuidle_disable_device(struct cpuidle_device *dev);
> extern int cpuidle_play_dead(void);
> @@ -204,6 +205,7 @@ static inline void cpuidle_pause_and_lock(void) { }
> static inline void cpuidle_resume_and_unlock(void) { }
> static inline void cpuidle_pause(void) { }
> static inline void cpuidle_resume(void) { }
> +static inline bool cpuidle_paused(void) {return true; }
> static inline int cpuidle_enable_device(struct cpuidle_device *dev)
> {return -ENODEV; }
> static inline void cpuidle_disable_device(struct cpuidle_device *dev) { }
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index eb75f394a059..388a5de4836e 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -97,7 +97,6 @@ static void s2idle_enter(void)
> raw_spin_unlock_irq(&s2idle_lock);
>
> cpus_read_lock();
> - cpuidle_resume();
>
> /* Push all the CPUs into the idle loop. */
> wake_up_all_idle_cpus();

wake_up_all_idle_cpus() will still cause all CPUs to be woken up when
first cpu wakes up.

say for example,
1. device goes to s2idle suspend.
2. one CPU wakes up to handle irq (irq is not a wake irq but left
enabled at GIC because of IRQF_NOSUSPEND flag) so such irq will not
break suspend.
3. The cpu handles the irq.
4. same cpu don't break s2idle_loop() and goes to s2idle_enter() where
it wakes up all existing idle cpus due to wake_up_all_idle_cpus()
5. all of CPUs again enter s2idle.

to avoid waking up all CPUs in above case, something like below snip may
help (i have not tested yet),

when CPUs are in s2idle_loop(),

1. set the s2idle state to enter.
2. wake up all cpus from shallow state, so that they can re-enter
deepest state.
3. Forever loop until a break with some wake irq.
4. clear the s2idle state.
5. wake up all cpus from deepest state so that they can now stay in
shallow state/running state.

void s2idle_loop(void)
{

+ s2idle_state = S2IDLE_STATE_ENTER;
+ /* Push all the CPUs to enter deepest available state */
+ wake_up_all_idle_cpus();
for (;;) {
if (s2idle_ops && s2idle_ops->wake) {
if (s2idle_ops->wake())
..
s2idle_enter();
}
+ s2idle_state = S2IDLE_STATE_NONE;
+ /* Push all the CPUs to enter default_idle() from this point */
+ wake_up_all_idle_cpus();
}

Thanks,
Maulik


> @@ -105,7 +104,6 @@ static void s2idle_enter(void)
> swait_event_exclusive(s2idle_wait_head,
> s2idle_state == S2IDLE_STATE_WAKE);
>
> - cpuidle_pause();
> cpus_read_unlock();
>
> raw_spin_lock_irq(&s2idle_lock);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index d17b0a5ce6ac..3bc3a2c46731 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -158,6 +158,17 @@ static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> return cpuidle_enter(drv, dev, next_state);
> }
>
> +static void cpuidle_deepest_state(struct cpuidle_driver *drv,
> + struct cpuidle_device *dev,
> + u64 max_latency_ns)
> +{
> + int next_state;
> +
> + tick_nohz_idle_stop_tick();
> + next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
> + call_cpuidle(drv, dev, next_state);
> +}
> +
> /**
> * cpuidle_idle_call - the main idle function
> *
> @@ -189,6 +200,7 @@ static void cpuidle_idle_call(void)
> */
>
> if (cpuidle_not_available(drv, dev)) {
> +default_idle:
> tick_nohz_idle_stop_tick();
>
> default_idle_call();
> @@ -204,25 +216,19 @@ static void cpuidle_idle_call(void)
> * timekeeping to prevent timer interrupts from kicking us out of idle
> * until a proper wakeup interrupt happens.
> */
> + if (idle_should_enter_s2idle()) {
> + entered_state = call_cpuidle_s2idle(drv, dev);
> + if (entered_state <= 0)
> + cpuidle_deepest_state(drv, dev, U64_MAX);
> + goto exit_idle;
> + }
>
> - if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) {
> - u64 max_latency_ns;
> -
> - if (idle_should_enter_s2idle()) {
> -
> - entered_state = call_cpuidle_s2idle(drv, dev);
> - if (entered_state > 0)
> - goto exit_idle;
> -
> - max_latency_ns = U64_MAX;
> - } else {
> - max_latency_ns = dev->forced_idle_latency_limit_ns;
> - }
> -
> - tick_nohz_idle_stop_tick();
> + if (cpuidle_paused())
> + goto default_idle;
>
> - next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
> - call_cpuidle(drv, dev, next_state);
> + if (dev->forced_idle_latency_limit_ns) {
> + cpuidle_deepest_state(drv, dev,
> + dev->forced_idle_latency_limit_ns);
> } else {
> bool stop_tick = true;
>
>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation

2021-10-06 10:30:59

by Maulik Shah

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

Hi,

Thanks for the patch. Looks good to me.

Reviewed-by: Maulik Shah <[email protected]>

Thanks,
Maulik

On 9/29/2021 8:14 PM, Ulf Hansson wrote:
> In the cpuidle-psci case, runtime PM in combination with the generic PM
> domain (genpd), may be used when entering/exiting an idlestate. More
> precisely, genpd relies on runtime PM to be enabled for the attached device
> (in this case it belongs to a CPU), to properly manage the reference
> counting of its PM domain.
>
> This works fine most of the time, but during system suspend in the
> dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> Beyond this point and until runtime PM becomes re-enabled in the
> dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
>
> To make sure the reference counting in genpd becomes correct, we need to
> prevent cpuidle-psci from using runtime PM when it has been disabled for
> the device. Therefore, let's move the call to cpuidle_pause() from
> dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> dpm_resume_noirq() into dpm_resume_early().
>
> Diagnosed-by: Maulik Shah <[email protected]>
> Suggested-by: Maulik Shah <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
> drivers/base/power/main.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cbea78e79f3d..1c753b651272 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
>
> resume_device_irqs();
> device_wakeup_disarm_wake_irqs();
> -
> - cpuidle_resume();
> }
>
> /**
> @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> }
> mutex_unlock(&dpm_list_mtx);
> async_synchronize_full();
> + cpuidle_resume();
> dpm_show_time(starttime, state, 0, "early");
> trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> }
> @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> {
> int ret;
>
> - cpuidle_pause();
> -
> device_wakeup_arm_wake_irqs();
> suspend_device_irqs();
>
> @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> int error = 0;
>
> trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> + cpuidle_pause();
> mutex_lock(&dpm_list_mtx);
> pm_transition = state;
> async_error = 0;
>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation

2021-10-06 13:13:07

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

On Wed, 6 Oct 2021 at 12:22, Maulik Shah <[email protected]> wrote:
>
> Hi,
>
> On 9/29/2021 8:14 PM, Ulf Hansson wrote:
> > In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
> > the cpuidle callbacks during s2idle operations. This is needed because
> > cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().
> >
> > However, calling cpuidle_resume|pause() from s2idle_enter() looks a bit
> > superfluous, as it also causes all CPUs to be waken up when the first CPU
> > wakes up from s2idle.
>
> Thanks for the patch. This can be good optimization to avoid waking up
> all CPUs always.
>
> >
> > Therefore, let's drop the calls to cpuidle_resume|pause() from
> > s2idle_enter(). To make this work, let's also adopt the path in the
> > cpuidle_idle_call() to allow cpuidle callbacks to be invoked for s2idle,
> > even if cpuidle has been paused.
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> > drivers/cpuidle/cpuidle.c | 7 ++++++-
> > include/linux/cpuidle.h | 2 ++
> > kernel/power/suspend.c | 2 --
> > kernel/sched/idle.c | 40 ++++++++++++++++++++++-----------------
> > 4 files changed, 31 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index ef2ea1b12cd8..c76747e497e7 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -49,7 +49,12 @@ void disable_cpuidle(void)
> > bool cpuidle_not_available(struct cpuidle_driver *drv,
> > struct cpuidle_device *dev)
> > {
> > - return off || !initialized || !drv || !dev || !dev->enabled;
> > + return off || !drv || !dev || !dev->enabled;
> > +}
> > +
> > +bool cpuidle_paused(void)
> > +{
> > + return !initialized;
> > }
> >
> > /**
> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> > index fce476275e16..51698b385ab5 100644
> > --- a/include/linux/cpuidle.h
> > +++ b/include/linux/cpuidle.h
> > @@ -165,6 +165,7 @@ extern void cpuidle_pause_and_lock(void);
> > extern void cpuidle_resume_and_unlock(void);
> > extern void cpuidle_pause(void);
> > extern void cpuidle_resume(void);
> > +extern bool cpuidle_paused(void);
> > extern int cpuidle_enable_device(struct cpuidle_device *dev);
> > extern void cpuidle_disable_device(struct cpuidle_device *dev);
> > extern int cpuidle_play_dead(void);
> > @@ -204,6 +205,7 @@ static inline void cpuidle_pause_and_lock(void) { }
> > static inline void cpuidle_resume_and_unlock(void) { }
> > static inline void cpuidle_pause(void) { }
> > static inline void cpuidle_resume(void) { }
> > +static inline bool cpuidle_paused(void) {return true; }
> > static inline int cpuidle_enable_device(struct cpuidle_device *dev)
> > {return -ENODEV; }
> > static inline void cpuidle_disable_device(struct cpuidle_device *dev) { }
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index eb75f394a059..388a5de4836e 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -97,7 +97,6 @@ static void s2idle_enter(void)
> > raw_spin_unlock_irq(&s2idle_lock);
> >
> > cpus_read_lock();
> > - cpuidle_resume();
> >
> > /* Push all the CPUs into the idle loop. */
> > wake_up_all_idle_cpus();
>
> wake_up_all_idle_cpus() will still cause all CPUs to be woken up when
> first cpu wakes up.
>
> say for example,
> 1. device goes to s2idle suspend.
> 2. one CPU wakes up to handle irq (irq is not a wake irq but left
> enabled at GIC because of IRQF_NOSUSPEND flag) so such irq will not
> break suspend.
> 3. The cpu handles the irq.
> 4. same cpu don't break s2idle_loop() and goes to s2idle_enter() where
> it wakes up all existing idle cpus due to wake_up_all_idle_cpus()
> 5. all of CPUs again enter s2idle.
>
> to avoid waking up all CPUs in above case, something like below snip may
> help (i have not tested yet),
>
> when CPUs are in s2idle_loop(),
>
> 1. set the s2idle state to enter.
> 2. wake up all cpus from shallow state, so that they can re-enter
> deepest state.
> 3. Forever loop until a break with some wake irq.
> 4. clear the s2idle state.
> 5. wake up all cpus from deepest state so that they can now stay in
> shallow state/running state.
>
> void s2idle_loop(void)
> {
>
> + s2idle_state = S2IDLE_STATE_ENTER;
> + /* Push all the CPUs to enter deepest available state */
> + wake_up_all_idle_cpus();
> for (;;) {
> if (s2idle_ops && s2idle_ops->wake) {
> if (s2idle_ops->wake())
> ..
> s2idle_enter();
> }
> + s2idle_state = S2IDLE_STATE_NONE;
> + /* Push all the CPUs to enter default_idle() from this point */
> + wake_up_all_idle_cpus();
> }

Overall, I follow your reasoning above and I think it makes sense to
me, but maybe Rafael has some concerns about it.

Even if the above code needs some polishing, the logic seems
reasonable to me. I suggest you post a patch, based on top of my small
series, so we can discuss your suggested improvements separately. Or
just tell me, if you would like me to do it.

>
> Thanks,
> Maulik

Thanks for reviewing!

[...]

Kind regards
Uffe

2021-10-09 15:42:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

On Wednesday, September 29, 2021 4:44:50 PM CEST Ulf Hansson wrote:
> In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
> the cpuidle callbacks during s2idle operations. This is needed because
> cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().

Well, in fact, doing that last thing for s2idle is pointless, because cpuidle
is going to be resumed eventually anyway in that case and the breakage expected
to be prevented by the pausing will still occur.

So I would rather do something like the patch below (untested).

---
drivers/base/power/main.c | 11 ++++++-----
kernel/power/suspend.c | 8 ++++++--
2 files changed, 12 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state

resume_device_irqs();
device_wakeup_disarm_wake_irqs();
-
- cpuidle_resume();
}

/**
@@ -881,6 +879,7 @@ void dpm_resume_early(pm_message_t state
void dpm_resume_start(pm_message_t state)
{
dpm_resume_noirq(state);
+ cpuidle_resume();
dpm_resume_early(state);
}
EXPORT_SYMBOL_GPL(dpm_resume_start);
@@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state
{
int ret;

- cpuidle_pause();
-
device_wakeup_arm_wake_irqs();
suspend_device_irqs();

@@ -1521,9 +1518,13 @@ int dpm_suspend_end(pm_message_t state)
if (error)
goto out;

+ cpuidle_pause();
+
error = dpm_suspend_noirq(state);
- if (error)
+ if (error) {
+ cpuidle_resume();
dpm_resume_early(resume_event(state));
+ }

out:
dpm_show_time(starttime, state, error, "end");
Index: linux-pm/kernel/power/suspend.c
===================================================================
--- linux-pm.orig/kernel/power/suspend.c
+++ linux-pm/kernel/power/suspend.c
@@ -97,7 +97,6 @@ static void s2idle_enter(void)
raw_spin_unlock_irq(&s2idle_lock);

cpus_read_lock();
- cpuidle_resume();

/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
@@ -105,7 +104,6 @@ static void s2idle_enter(void)
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);

- cpuidle_pause();
cpus_read_unlock();

raw_spin_lock_irq(&s2idle_lock);
@@ -405,6 +403,9 @@ static int suspend_enter(suspend_state_t
if (error)
goto Devices_early_resume;

+ if (state != PM_SUSPEND_TO_IDLE)
+ cpuidle_pause();
+
error = dpm_suspend_noirq(PMSG_SUSPEND);
if (error) {
pr_err("noirq suspend of devices failed\n");
@@ -459,6 +460,9 @@ static int suspend_enter(suspend_state_t
dpm_resume_noirq(PMSG_RESUME);

Platform_early_resume:
+ if (state != PM_SUSPEND_TO_IDLE)
+ cpuidle_resume();
+
platform_resume_early(state);

Devices_early_resume:



2021-10-09 15:45:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

On Wednesday, October 6, 2021 3:10:55 PM CEST Ulf Hansson wrote:
> On Wed, 6 Oct 2021 at 12:22, Maulik Shah <[email protected]> wrote:
> >
> > Hi,
> >
> > On 9/29/2021 8:14 PM, Ulf Hansson wrote:
> > > In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
> > > the cpuidle callbacks during s2idle operations. This is needed because
> > > cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().
> > >
> > > However, calling cpuidle_resume|pause() from s2idle_enter() looks a bit
> > > superfluous, as it also causes all CPUs to be waken up when the first CPU
> > > wakes up from s2idle.
> >
> > Thanks for the patch. This can be good optimization to avoid waking up
> > all CPUs always.
> >
> > >
> > > Therefore, let's drop the calls to cpuidle_resume|pause() from
> > > s2idle_enter(). To make this work, let's also adopt the path in the
> > > cpuidle_idle_call() to allow cpuidle callbacks to be invoked for s2idle,
> > > even if cpuidle has been paused.
> > >
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > > drivers/cpuidle/cpuidle.c | 7 ++++++-
> > > include/linux/cpuidle.h | 2 ++
> > > kernel/power/suspend.c | 2 --
> > > kernel/sched/idle.c | 40 ++++++++++++++++++++++-----------------
> > > 4 files changed, 31 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > > index ef2ea1b12cd8..c76747e497e7 100644
> > > --- a/drivers/cpuidle/cpuidle.c
> > > +++ b/drivers/cpuidle/cpuidle.c
> > > @@ -49,7 +49,12 @@ void disable_cpuidle(void)
> > > bool cpuidle_not_available(struct cpuidle_driver *drv,
> > > struct cpuidle_device *dev)
> > > {
> > > - return off || !initialized || !drv || !dev || !dev->enabled;
> > > + return off || !drv || !dev || !dev->enabled;
> > > +}
> > > +
> > > +bool cpuidle_paused(void)
> > > +{
> > > + return !initialized;
> > > }
> > >
> > > /**
> > > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> > > index fce476275e16..51698b385ab5 100644
> > > --- a/include/linux/cpuidle.h
> > > +++ b/include/linux/cpuidle.h
> > > @@ -165,6 +165,7 @@ extern void cpuidle_pause_and_lock(void);
> > > extern void cpuidle_resume_and_unlock(void);
> > > extern void cpuidle_pause(void);
> > > extern void cpuidle_resume(void);
> > > +extern bool cpuidle_paused(void);
> > > extern int cpuidle_enable_device(struct cpuidle_device *dev);
> > > extern void cpuidle_disable_device(struct cpuidle_device *dev);
> > > extern int cpuidle_play_dead(void);
> > > @@ -204,6 +205,7 @@ static inline void cpuidle_pause_and_lock(void) { }
> > > static inline void cpuidle_resume_and_unlock(void) { }
> > > static inline void cpuidle_pause(void) { }
> > > static inline void cpuidle_resume(void) { }
> > > +static inline bool cpuidle_paused(void) {return true; }
> > > static inline int cpuidle_enable_device(struct cpuidle_device *dev)
> > > {return -ENODEV; }
> > > static inline void cpuidle_disable_device(struct cpuidle_device *dev) { }
> > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > index eb75f394a059..388a5de4836e 100644
> > > --- a/kernel/power/suspend.c
> > > +++ b/kernel/power/suspend.c
> > > @@ -97,7 +97,6 @@ static void s2idle_enter(void)
> > > raw_spin_unlock_irq(&s2idle_lock);
> > >
> > > cpus_read_lock();
> > > - cpuidle_resume();
> > >
> > > /* Push all the CPUs into the idle loop. */
> > > wake_up_all_idle_cpus();
> >
> > wake_up_all_idle_cpus() will still cause all CPUs to be woken up when
> > first cpu wakes up.
> >
> > say for example,
> > 1. device goes to s2idle suspend.
> > 2. one CPU wakes up to handle irq (irq is not a wake irq but left
> > enabled at GIC because of IRQF_NOSUSPEND flag) so such irq will not
> > break suspend.
> > 3. The cpu handles the irq.
> > 4. same cpu don't break s2idle_loop() and goes to s2idle_enter() where
> > it wakes up all existing idle cpus due to wake_up_all_idle_cpus()
> > 5. all of CPUs again enter s2idle.
> >
> > to avoid waking up all CPUs in above case, something like below snip may
> > help (i have not tested yet),
> >
> > when CPUs are in s2idle_loop(),
> >
> > 1. set the s2idle state to enter.
> > 2. wake up all cpus from shallow state, so that they can re-enter
> > deepest state.
> > 3. Forever loop until a break with some wake irq.
> > 4. clear the s2idle state.
> > 5. wake up all cpus from deepest state so that they can now stay in
> > shallow state/running state.
> >
> > void s2idle_loop(void)
> > {
> >
> > + s2idle_state = S2IDLE_STATE_ENTER;
> > + /* Push all the CPUs to enter deepest available state */
> > + wake_up_all_idle_cpus();
> > for (;;) {
> > if (s2idle_ops && s2idle_ops->wake) {
> > if (s2idle_ops->wake())

So generally you don't know what will break loose in the ->wake() callback
and so you don't know what idle states the CPUs will end up in before
s2idle_enter().

You could optimize for the case when ->wake() is not present, though.

> > ..
> > s2idle_enter();
> > }
> > + s2idle_state = S2IDLE_STATE_NONE;
> > + /* Push all the CPUs to enter default_idle() from this point */
> > + wake_up_all_idle_cpus();
> > }



2021-10-11 13:33:30

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle: Avoid calls to cpuidle_resume|pause() for s2idle

On Sat, 9 Oct 2021 at 17:39, Rafael J. Wysocki <[email protected]> wrote:
>
> On Wednesday, September 29, 2021 4:44:50 PM CEST Ulf Hansson wrote:
> > In s2idle_enter(), cpuidle_resume|pause() are invoked to re-allow calls to
> > the cpuidle callbacks during s2idle operations. This is needed because
> > cpuidle is paused in-between in dpm_suspend_noirq() and dpm_resume_noirq().
>
> Well, in fact, doing that last thing for s2idle is pointless, because cpuidle
> is going to be resumed eventually anyway in that case and the breakage expected
> to be prevented by the pausing will still occur.
>
> So I would rather do something like the patch below (untested).

Hi Rafael,

From a standalone change point of view, what you suggest seems reasonable to me.

However, the main issue I am really trying to fix in this series is
being done in patch2/2. And unfortunately, the below change doesn't
really fit with what I suggest in patch2/2. Can you please have a look
at patch2 as well?

If you think it may be better, I squash the two patches?

Kind regards
Uffe

>
> ---
> drivers/base/power/main.c | 11 ++++++-----
> kernel/power/suspend.c | 8 ++++++--
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state
>
> resume_device_irqs();
> device_wakeup_disarm_wake_irqs();
> -
> - cpuidle_resume();
> }
>
> /**
> @@ -881,6 +879,7 @@ void dpm_resume_early(pm_message_t state
> void dpm_resume_start(pm_message_t state)
> {
> dpm_resume_noirq(state);
> + cpuidle_resume();
> dpm_resume_early(state);
> }
> EXPORT_SYMBOL_GPL(dpm_resume_start);
> @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state
> {
> int ret;
>
> - cpuidle_pause();
> -
> device_wakeup_arm_wake_irqs();
> suspend_device_irqs();
>
> @@ -1521,9 +1518,13 @@ int dpm_suspend_end(pm_message_t state)
> if (error)
> goto out;
>
> + cpuidle_pause();
> +
> error = dpm_suspend_noirq(state);
> - if (error)
> + if (error) {
> + cpuidle_resume();
> dpm_resume_early(resume_event(state));
> + }
>
> out:
> dpm_show_time(starttime, state, error, "end");
> Index: linux-pm/kernel/power/suspend.c
> ===================================================================
> --- linux-pm.orig/kernel/power/suspend.c
> +++ linux-pm/kernel/power/suspend.c
> @@ -97,7 +97,6 @@ static void s2idle_enter(void)
> raw_spin_unlock_irq(&s2idle_lock);
>
> cpus_read_lock();
> - cpuidle_resume();
>
> /* Push all the CPUs into the idle loop. */
> wake_up_all_idle_cpus();
> @@ -105,7 +104,6 @@ static void s2idle_enter(void)
> swait_event_exclusive(s2idle_wait_head,
> s2idle_state == S2IDLE_STATE_WAKE);
>
> - cpuidle_pause();
> cpus_read_unlock();
>
> raw_spin_lock_irq(&s2idle_lock);
> @@ -405,6 +403,9 @@ static int suspend_enter(suspend_state_t
> if (error)
> goto Devices_early_resume;
>
> + if (state != PM_SUSPEND_TO_IDLE)
> + cpuidle_pause();
> +
> error = dpm_suspend_noirq(PMSG_SUSPEND);
> if (error) {
> pr_err("noirq suspend of devices failed\n");
> @@ -459,6 +460,9 @@ static int suspend_enter(suspend_state_t
> dpm_resume_noirq(PMSG_RESUME);
>
> Platform_early_resume:
> + if (state != PM_SUSPEND_TO_IDLE)
> + cpuidle_resume();
> +
> platform_resume_early(state);
>
> Devices_early_resume:
>

2021-10-20 18:19:48

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
>
> In the cpuidle-psci case, runtime PM in combination with the generic PM
> domain (genpd), may be used when entering/exiting an idlestate. More
> precisely, genpd relies on runtime PM to be enabled for the attached device
> (in this case it belongs to a CPU), to properly manage the reference
> counting of its PM domain.
>
> This works fine most of the time, but during system suspend in the
> dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> Beyond this point and until runtime PM becomes re-enabled in the
> dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
>
> To make sure the reference counting in genpd becomes correct, we need to
> prevent cpuidle-psci from using runtime PM when it has been disabled for
> the device. Therefore, let's move the call to cpuidle_pause() from
> dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> dpm_resume_noirq() into dpm_resume_early().
>
> Diagnosed-by: Maulik Shah <[email protected]>
> Suggested-by: Maulik Shah <[email protected]>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
> drivers/base/power/main.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index cbea78e79f3d..1c753b651272 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
>
> resume_device_irqs();
> device_wakeup_disarm_wake_irqs();
> -
> - cpuidle_resume();
> }
>
> /**
> @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> }
> mutex_unlock(&dpm_list_mtx);
> async_synchronize_full();
> + cpuidle_resume();
> dpm_show_time(starttime, state, 0, "early");
> trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> }
> @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> {
> int ret;
>
> - cpuidle_pause();
> -
> device_wakeup_arm_wake_irqs();
> suspend_device_irqs();
>
> @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> int error = 0;
>
> trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> + cpuidle_pause();
> mutex_lock(&dpm_list_mtx);
> pm_transition = state;
> async_error = 0;
> --

Well, this is somewhat heavy-handed and it affects even the systems
that don't really need to pause cpuidle at all in the suspend path.

Also, IIUC you don't need to pause cpuidle completely, but make it
temporarily avoid idle states potentially affected by this issue. An
additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
suppose and it could be set via cpuidle_suspend() called from the core
next to cpufreq_suspend().

The other guys who rely on the cpuidle pausing today could be switched
over to this new mechanism later and it would be possible to get rid
of the pausing from the system suspend path completely.

2021-10-21 11:50:38

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
>
> On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> >
> > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > domain (genpd), may be used when entering/exiting an idlestate. More
> > precisely, genpd relies on runtime PM to be enabled for the attached device
> > (in this case it belongs to a CPU), to properly manage the reference
> > counting of its PM domain.
> >
> > This works fine most of the time, but during system suspend in the
> > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > Beyond this point and until runtime PM becomes re-enabled in the
> > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> >
> > To make sure the reference counting in genpd becomes correct, we need to
> > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > the device. Therefore, let's move the call to cpuidle_pause() from
> > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > dpm_resume_noirq() into dpm_resume_early().
> >
> > Diagnosed-by: Maulik Shah <[email protected]>
> > Suggested-by: Maulik Shah <[email protected]>
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> > drivers/base/power/main.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index cbea78e79f3d..1c753b651272 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> >
> > resume_device_irqs();
> > device_wakeup_disarm_wake_irqs();
> > -
> > - cpuidle_resume();
> > }
> >
> > /**
> > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > }
> > mutex_unlock(&dpm_list_mtx);
> > async_synchronize_full();
> > + cpuidle_resume();
> > dpm_show_time(starttime, state, 0, "early");
> > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > }
> > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > {
> > int ret;
> >
> > - cpuidle_pause();
> > -
> > device_wakeup_arm_wake_irqs();
> > suspend_device_irqs();
> >
> > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > int error = 0;
> >
> > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > + cpuidle_pause();
> > mutex_lock(&dpm_list_mtx);
> > pm_transition = state;
> > async_error = 0;
> > --
>
> Well, this is somewhat heavy-handed and it affects even the systems
> that don't really need to pause cpuidle at all in the suspend path.

Yes, I agree.

Although, I am not really changing the behaviour in regards to this.
cpuidle_pause() is already being called in dpm_suspend_noirq(), for
everybody today.

>
> Also, IIUC you don't need to pause cpuidle completely, but make it
> temporarily avoid idle states potentially affected by this issue. An
> additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> suppose and it could be set via cpuidle_suspend() called from the core
> next to cpufreq_suspend().

cpuidle_suspend() would then need to go and fetch the cpuidle driver
instance, which in some cases is one driver per CPU. Doesn't that get
rather messy?

Additionally, since find_deepest_state() is being called for
cpuidle_enter_s2idle() too, we would need to treat the new
CPUIDLE_STATE_DISABLED_ flag in a special way, right?

Is this really what we want?

>
> The other guys who rely on the cpuidle pausing today could be switched
> over to this new mechanism later and it would be possible to get rid
> of the pausing from the system suspend path completely.

Avoiding to pause cpuidle when it's not needed makes perfect sense.
Although, it looks to me that we could also implement that on top of
$subject patch.

Unless you insist on the CPUIDLE_STATE_DISABLED_ way, I would probably
explore an option to let a cpuidle driver to set a global cpuidle flag
during ->probe(). Depending if this flag is set, we can simply skip
calling cpuidle_pause() during system suspend.

What do you think?

Kind regards
Uffe

2021-10-21 13:48:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
>
> On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > (in this case it belongs to a CPU), to properly manage the reference
> > > counting of its PM domain.
> > >
> > > This works fine most of the time, but during system suspend in the
> > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > Beyond this point and until runtime PM becomes re-enabled in the
> > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > >
> > > To make sure the reference counting in genpd becomes correct, we need to
> > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > dpm_resume_noirq() into dpm_resume_early().
> > >
> > > Diagnosed-by: Maulik Shah <[email protected]>
> > > Suggested-by: Maulik Shah <[email protected]>
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > > drivers/base/power/main.c | 6 ++----
> > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index cbea78e79f3d..1c753b651272 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > >
> > > resume_device_irqs();
> > > device_wakeup_disarm_wake_irqs();
> > > -
> > > - cpuidle_resume();
> > > }
> > >
> > > /**
> > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > }
> > > mutex_unlock(&dpm_list_mtx);
> > > async_synchronize_full();
> > > + cpuidle_resume();
> > > dpm_show_time(starttime, state, 0, "early");
> > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > }
> > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > {
> > > int ret;
> > >
> > > - cpuidle_pause();
> > > -
> > > device_wakeup_arm_wake_irqs();
> > > suspend_device_irqs();
> > >
> > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > int error = 0;
> > >
> > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > + cpuidle_pause();
> > > mutex_lock(&dpm_list_mtx);
> > > pm_transition = state;
> > > async_error = 0;
> > > --
> >
> > Well, this is somewhat heavy-handed and it affects even the systems
> > that don't really need to pause cpuidle at all in the suspend path.
>
> Yes, I agree.
>
> Although, I am not really changing the behaviour in regards to this.
> cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> everybody today.

Yes, it is, but pausing it earlier will cause more energy to be spent,
potentially.

That said, there are not too many users of suspend_late callbacks in
the tree, so it may not matter too much.

> >
> > Also, IIUC you don't need to pause cpuidle completely, but make it
> > temporarily avoid idle states potentially affected by this issue. An
> > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > suppose and it could be set via cpuidle_suspend() called from the core
> > next to cpufreq_suspend().
>
> cpuidle_suspend() would then need to go and fetch the cpuidle driver
> instance, which in some cases is one driver per CPU. Doesn't that get
> rather messy?

Per-CPU variables are used for that, so it is quite straightforward.

> Additionally, since find_deepest_state() is being called for
> cpuidle_enter_s2idle() too, we would need to treat the new
> CPUIDLE_STATE_DISABLED_ flag in a special way, right?

No, it already checks "disabled".

> Is this really what we want?
>
> >
> > The other guys who rely on the cpuidle pausing today could be switched
> > over to this new mechanism later and it would be possible to get rid
> > of the pausing from the system suspend path completely.
>
> Avoiding to pause cpuidle when it's not needed makes perfect sense.
> Although, it looks to me that we could also implement that on top of
> $subject patch.

Yes, it could.

> Unless you insist on the CPUIDLE_STATE_DISABLED_ way, I would probably
> explore an option to let a cpuidle driver to set a global cpuidle flag
> during ->probe(). Depending if this flag is set, we can simply skip
> calling cpuidle_pause() during system suspend.
>
> What do you think?

Well, which driver in particular is in question here?

2021-10-21 14:08:29

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > counting of its PM domain.
> > > >
> > > > This works fine most of the time, but during system suspend in the
> > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > >
> > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > dpm_resume_noirq() into dpm_resume_early().
> > > >
> > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > Suggested-by: Maulik Shah <[email protected]>
> > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > ---
> > > > drivers/base/power/main.c | 6 ++----
> > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index cbea78e79f3d..1c753b651272 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > >
> > > > resume_device_irqs();
> > > > device_wakeup_disarm_wake_irqs();
> > > > -
> > > > - cpuidle_resume();
> > > > }
> > > >
> > > > /**
> > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > }
> > > > mutex_unlock(&dpm_list_mtx);
> > > > async_synchronize_full();
> > > > + cpuidle_resume();
> > > > dpm_show_time(starttime, state, 0, "early");
> > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > }
> > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > {
> > > > int ret;
> > > >
> > > > - cpuidle_pause();
> > > > -
> > > > device_wakeup_arm_wake_irqs();
> > > > suspend_device_irqs();
> > > >
> > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > int error = 0;
> > > >
> > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > + cpuidle_pause();
> > > > mutex_lock(&dpm_list_mtx);
> > > > pm_transition = state;
> > > > async_error = 0;
> > > > --
> > >
> > > Well, this is somewhat heavy-handed and it affects even the systems
> > > that don't really need to pause cpuidle at all in the suspend path.
> >
> > Yes, I agree.
> >
> > Although, I am not really changing the behaviour in regards to this.
> > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > everybody today.
>
> Yes, it is, but pausing it earlier will cause more energy to be spent,
> potentially.
>
> That said, there are not too many users of suspend_late callbacks in
> the tree, so it may not matter too much.
>
> > >
> > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > temporarily avoid idle states potentially affected by this issue. An
> > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > suppose and it could be set via cpuidle_suspend() called from the core
> > > next to cpufreq_suspend().
> >
> > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > instance, which in some cases is one driver per CPU. Doesn't that get
> > rather messy?
>
> Per-CPU variables are used for that, so it is quite straightforward.
>
> > Additionally, since find_deepest_state() is being called for
> > cpuidle_enter_s2idle() too, we would need to treat the new
> > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
>
> No, it already checks "disabled".

Yes, but that would be wrong.

The use case I want to support, for cpuidle-psci, is to allow all idle
states in suspend-to-idle, but prevent those that rely on runtime PM
(after it has been disabled) for the regular idle path.

>
> > Is this really what we want?
> >
> > >
> > > The other guys who rely on the cpuidle pausing today could be switched
> > > over to this new mechanism later and it would be possible to get rid
> > > of the pausing from the system suspend path completely.
> >
> > Avoiding to pause cpuidle when it's not needed makes perfect sense.
> > Although, it looks to me that we could also implement that on top of
> > $subject patch.
>
> Yes, it could.
>
> > Unless you insist on the CPUIDLE_STATE_DISABLED_ way, I would probably
> > explore an option to let a cpuidle driver to set a global cpuidle flag
> > during ->probe(). Depending if this flag is set, we can simply skip
> > calling cpuidle_pause() during system suspend.
> >
> > What do you think?
>
> Well, which driver in particular is in question here?

Honestly, I don't know. It has not been my goal to entirely prevent
calling cpuidle_pause().

In any case, it was introduced in the below commit, perhaps that can
give us a hint when this is still needed?

commit 8651f97bd951d0bb1c10fa24e3fa3455193f3548
Author: Preeti U Murthy <[email protected]>
Date: Mon Jul 9 10:12:56 2012 +0200
PM / cpuidle: System resume hang fix with cpuidle

Kind regards
Uffe

2021-10-21 15:11:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > counting of its PM domain.
> > > > >
> > > > > This works fine most of the time, but during system suspend in the
> > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > >
> > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > >
> > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > ---
> > > > > drivers/base/power/main.c | 6 ++----
> > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > --- a/drivers/base/power/main.c
> > > > > +++ b/drivers/base/power/main.c
> > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > >
> > > > > resume_device_irqs();
> > > > > device_wakeup_disarm_wake_irqs();
> > > > > -
> > > > > - cpuidle_resume();
> > > > > }
> > > > >
> > > > > /**
> > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > }
> > > > > mutex_unlock(&dpm_list_mtx);
> > > > > async_synchronize_full();
> > > > > + cpuidle_resume();
> > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > }
> > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > {
> > > > > int ret;
> > > > >
> > > > > - cpuidle_pause();
> > > > > -
> > > > > device_wakeup_arm_wake_irqs();
> > > > > suspend_device_irqs();
> > > > >
> > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > int error = 0;
> > > > >
> > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > + cpuidle_pause();
> > > > > mutex_lock(&dpm_list_mtx);
> > > > > pm_transition = state;
> > > > > async_error = 0;
> > > > > --
> > > >
> > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > that don't really need to pause cpuidle at all in the suspend path.
> > >
> > > Yes, I agree.
> > >
> > > Although, I am not really changing the behaviour in regards to this.
> > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > everybody today.
> >
> > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > potentially.
> >
> > That said, there are not too many users of suspend_late callbacks in
> > the tree, so it may not matter too much.
> >
> > > >
> > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > temporarily avoid idle states potentially affected by this issue. An
> > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > next to cpufreq_suspend().
> > >
> > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > rather messy?
> >
> > Per-CPU variables are used for that, so it is quite straightforward.
> >
> > > Additionally, since find_deepest_state() is being called for
> > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> >
> > No, it already checks "disabled".
>
> Yes, but that would be wrong.

Hmmm.

> The use case I want to support, for cpuidle-psci, is to allow all idle
> states in suspend-to-idle,

So does PM-runtime work in suspend-to-idle? How?

> but prevent those that rely on runtime PM
> (after it has been disabled) for the regular idle path.

Do you have a special suspend-to-idle handling of those states that
doesn't require PM-runtime?

> >
> > > Is this really what we want?
> > >
> > > >
> > > > The other guys who rely on the cpuidle pausing today could be switched
> > > > over to this new mechanism later and it would be possible to get rid
> > > > of the pausing from the system suspend path completely.
> > >
> > > Avoiding to pause cpuidle when it's not needed makes perfect sense.
> > > Although, it looks to me that we could also implement that on top of
> > > $subject patch.
> >
> > Yes, it could.
> >
> > > Unless you insist on the CPUIDLE_STATE_DISABLED_ way, I would probably
> > > explore an option to let a cpuidle driver to set a global cpuidle flag
> > > during ->probe(). Depending if this flag is set, we can simply skip
> > > calling cpuidle_pause() during system suspend.
> > >
> > > What do you think?
> >
> > Well, which driver in particular is in question here?
>
> Honestly, I don't know. It has not been my goal to entirely prevent
> calling cpuidle_pause().
>
> In any case, it was introduced in the below commit, perhaps that can
> give us a hint when this is still needed?
>
> commit 8651f97bd951d0bb1c10fa24e3fa3455193f3548
> Author: Preeti U Murthy <[email protected]>
> Date: Mon Jul 9 10:12:56 2012 +0200
> PM / cpuidle: System resume hang fix with cpuidle

Yes, I remember that.

2021-10-21 15:48:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 5:09 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > counting of its PM domain.
> > > > > >
> > > > > > This works fine most of the time, but during system suspend in the
> > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > >
> > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > >
> > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > ---
> > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > --- a/drivers/base/power/main.c
> > > > > > +++ b/drivers/base/power/main.c
> > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > >
> > > > > > resume_device_irqs();
> > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > -
> > > > > > - cpuidle_resume();
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > }
> > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > async_synchronize_full();
> > > > > > + cpuidle_resume();
> > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > }
> > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > {
> > > > > > int ret;
> > > > > >
> > > > > > - cpuidle_pause();
> > > > > > -
> > > > > > device_wakeup_arm_wake_irqs();
> > > > > > suspend_device_irqs();
> > > > > >
> > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > int error = 0;
> > > > > >
> > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > + cpuidle_pause();
> > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > pm_transition = state;
> > > > > > async_error = 0;
> > > > > > --
> > > > >
> > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > >
> > > > Yes, I agree.
> > > >
> > > > Although, I am not really changing the behaviour in regards to this.
> > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > everybody today.
> > >
> > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > potentially.
> > >
> > > That said, there are not too many users of suspend_late callbacks in
> > > the tree, so it may not matter too much.
> > >
> > > > >
> > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > next to cpufreq_suspend().
> > > >
> > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > rather messy?
> > >
> > > Per-CPU variables are used for that, so it is quite straightforward.
> > >
> > > > Additionally, since find_deepest_state() is being called for
> > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > >
> > > No, it already checks "disabled".
> >
> > Yes, but that would be wrong.
>
> Hmmm.
>
> > The use case I want to support, for cpuidle-psci, is to allow all idle
> > states in suspend-to-idle,
>
> So does PM-runtime work in suspend-to-idle? How?
>
> > but prevent those that rely on runtime PM
> > (after it has been disabled) for the regular idle path.
>
> Do you have a special suspend-to-idle handling of those states that
> doesn't require PM-runtime?

Regardless, pausing cpuidle in the suspend-to-idle path simply doesn't
make sense at all, so this needs to be taken care of in the first
place.

The problem with PM-runtime being unavailable after dpm_suspend()
needs to be addressed in a different way IMO, because it only affects
one specific use case.

2021-10-21 16:19:55

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 17:09, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > counting of its PM domain.
> > > > > >
> > > > > > This works fine most of the time, but during system suspend in the
> > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > >
> > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > >
> > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > ---
> > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > --- a/drivers/base/power/main.c
> > > > > > +++ b/drivers/base/power/main.c
> > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > >
> > > > > > resume_device_irqs();
> > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > -
> > > > > > - cpuidle_resume();
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > }
> > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > async_synchronize_full();
> > > > > > + cpuidle_resume();
> > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > }
> > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > {
> > > > > > int ret;
> > > > > >
> > > > > > - cpuidle_pause();
> > > > > > -
> > > > > > device_wakeup_arm_wake_irqs();
> > > > > > suspend_device_irqs();
> > > > > >
> > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > int error = 0;
> > > > > >
> > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > + cpuidle_pause();
> > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > pm_transition = state;
> > > > > > async_error = 0;
> > > > > > --
> > > > >
> > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > >
> > > > Yes, I agree.
> > > >
> > > > Although, I am not really changing the behaviour in regards to this.
> > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > everybody today.
> > >
> > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > potentially.
> > >
> > > That said, there are not too many users of suspend_late callbacks in
> > > the tree, so it may not matter too much.
> > >
> > > > >
> > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > next to cpufreq_suspend().
> > > >
> > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > rather messy?
> > >
> > > Per-CPU variables are used for that, so it is quite straightforward.
> > >
> > > > Additionally, since find_deepest_state() is being called for
> > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > >
> > > No, it already checks "disabled".
> >
> > Yes, but that would be wrong.
>
> Hmmm.
>
> > The use case I want to support, for cpuidle-psci, is to allow all idle
> > states in suspend-to-idle,
>
> So does PM-runtime work in suspend-to-idle? How?

No it doesn't. See below.

>
> > but prevent those that rely on runtime PM
> > (after it has been disabled) for the regular idle path.
>
> Do you have a special suspend-to-idle handling of those states that
> doesn't require PM-runtime?

Yes. Feel free to have a look in __psci_enter_domain_idle_state().

In principle, when running the s2idle path, we call
dev_pm_genpd_suspend|resume(), rather than pm_runtime_get|put*.

This let genpd manage the reference counting (hierarchically too) and
it also ignores the genpd governor in this stage, which also is needed
to enter the deepest state. Quite similar to how cpuidle works.

[...]

Kind regards
Uffe

2021-10-21 16:33:04

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 17:46, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 5:09 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > > >
> > > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > > counting of its PM domain.
> > > > > > >
> > > > > > > This works fine most of the time, but during system suspend in the
> > > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > > >
> > > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > > >
> > > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > > ---
> > > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > > --- a/drivers/base/power/main.c
> > > > > > > +++ b/drivers/base/power/main.c
> > > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > > >
> > > > > > > resume_device_irqs();
> > > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > > -
> > > > > > > - cpuidle_resume();
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > > }
> > > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > > async_synchronize_full();
> > > > > > > + cpuidle_resume();
> > > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > > }
> > > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > > {
> > > > > > > int ret;
> > > > > > >
> > > > > > > - cpuidle_pause();
> > > > > > > -
> > > > > > > device_wakeup_arm_wake_irqs();
> > > > > > > suspend_device_irqs();
> > > > > > >
> > > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > > int error = 0;
> > > > > > >
> > > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > > + cpuidle_pause();
> > > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > > pm_transition = state;
> > > > > > > async_error = 0;
> > > > > > > --
> > > > > >
> > > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > > >
> > > > > Yes, I agree.
> > > > >
> > > > > Although, I am not really changing the behaviour in regards to this.
> > > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > > everybody today.
> > > >
> > > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > > potentially.
> > > >
> > > > That said, there are not too many users of suspend_late callbacks in
> > > > the tree, so it may not matter too much.
> > > >
> > > > > >
> > > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > > next to cpufreq_suspend().
> > > > >
> > > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > > rather messy?
> > > >
> > > > Per-CPU variables are used for that, so it is quite straightforward.
> > > >
> > > > > Additionally, since find_deepest_state() is being called for
> > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > >
> > > > No, it already checks "disabled".
> > >
> > > Yes, but that would be wrong.
> >
> > Hmmm.
> >
> > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > states in suspend-to-idle,
> >
> > So does PM-runtime work in suspend-to-idle? How?
> >
> > > but prevent those that rely on runtime PM
> > > (after it has been disabled) for the regular idle path.
> >
> > Do you have a special suspend-to-idle handling of those states that
> > doesn't require PM-runtime?
>
> Regardless, pausing cpuidle in the suspend-to-idle path simply doesn't
> make sense at all, so this needs to be taken care of in the first
> place.

Right, I do agree, don't get me wrong. But, do we really want to treat
s2-to-idle differently, compared to s2-to-ram in regards to this?

Wouldn't it be a lot easier to let cpuidle drivers to opt-out for
cpuidle_pause|resume(), no matter whether it's for s2-to-idle or
s2-to-ram?

>
> The problem with PM-runtime being unavailable after dpm_suspend()
> needs to be addressed in a different way IMO, because it only affects
> one specific use case.

It's one specific case so far, but we have the riscv driver on its
way, which would suffer from the same problem.

Anyway, an option is to figure out what platforms and cpuidle drivers,
that really needs cpuidle_pause|resume() at this point and make an
opt-in solution instead. This could then be used by runtime PM based
cpuidle drivers as well. Would that be a way forward?

Kind regards
Uffe

2021-10-21 16:36:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 17:09, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > > >
> > > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > > counting of its PM domain.
> > > > > > >
> > > > > > > This works fine most of the time, but during system suspend in the
> > > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > > >
> > > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > > >
> > > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > > ---
> > > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > > --- a/drivers/base/power/main.c
> > > > > > > +++ b/drivers/base/power/main.c
> > > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > > >
> > > > > > > resume_device_irqs();
> > > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > > -
> > > > > > > - cpuidle_resume();
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > > }
> > > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > > async_synchronize_full();
> > > > > > > + cpuidle_resume();
> > > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > > }
> > > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > > {
> > > > > > > int ret;
> > > > > > >
> > > > > > > - cpuidle_pause();
> > > > > > > -
> > > > > > > device_wakeup_arm_wake_irqs();
> > > > > > > suspend_device_irqs();
> > > > > > >
> > > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > > int error = 0;
> > > > > > >
> > > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > > + cpuidle_pause();
> > > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > > pm_transition = state;
> > > > > > > async_error = 0;
> > > > > > > --
> > > > > >
> > > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > > >
> > > > > Yes, I agree.
> > > > >
> > > > > Although, I am not really changing the behaviour in regards to this.
> > > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > > everybody today.
> > > >
> > > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > > potentially.
> > > >
> > > > That said, there are not too many users of suspend_late callbacks in
> > > > the tree, so it may not matter too much.
> > > >
> > > > > >
> > > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > > next to cpufreq_suspend().
> > > > >
> > > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > > rather messy?
> > > >
> > > > Per-CPU variables are used for that, so it is quite straightforward.
> > > >
> > > > > Additionally, since find_deepest_state() is being called for
> > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > >
> > > > No, it already checks "disabled".
> > >
> > > Yes, but that would be wrong.
> >
> > Hmmm.
> >
> > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > states in suspend-to-idle,
> >
> > So does PM-runtime work in suspend-to-idle? How?
>
> No it doesn't. See below.
>
> >
> > > but prevent those that rely on runtime PM
> > > (after it has been disabled) for the regular idle path.
> >
> > Do you have a special suspend-to-idle handling of those states that
> > doesn't require PM-runtime?
>
> Yes. Feel free to have a look in __psci_enter_domain_idle_state().

So in theory you could check the pm_runtime_put_sync_suspend() return
value and fall back to something like WFI if that's an error code.

> In principle, when running the s2idle path, we call
> dev_pm_genpd_suspend|resume(), rather than pm_runtime_get|put*.
>
> This let genpd manage the reference counting (hierarchically too) and
> it also ignores the genpd governor in this stage, which also is needed
> to enter the deepest state. Quite similar to how cpuidle works.

OK

2021-10-21 16:45:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 6:29 PM Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 17:46, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 5:09 PM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > > > >
> > > > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > > > counting of its PM domain.
> > > > > > > >
> > > > > > > > This works fine most of the time, but during system suspend in the
> > > > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > > > >
> > > > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > > > >
> > > > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > > > --- a/drivers/base/power/main.c
> > > > > > > > +++ b/drivers/base/power/main.c
> > > > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > > > >
> > > > > > > > resume_device_irqs();
> > > > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > > > -
> > > > > > > > - cpuidle_resume();
> > > > > > > > }
> > > > > > > >
> > > > > > > > /**
> > > > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > > > }
> > > > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > > > async_synchronize_full();
> > > > > > > > + cpuidle_resume();
> > > > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > > > }
> > > > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > > > {
> > > > > > > > int ret;
> > > > > > > >
> > > > > > > > - cpuidle_pause();
> > > > > > > > -
> > > > > > > > device_wakeup_arm_wake_irqs();
> > > > > > > > suspend_device_irqs();
> > > > > > > >
> > > > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > > > int error = 0;
> > > > > > > >
> > > > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > > > + cpuidle_pause();
> > > > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > > > pm_transition = state;
> > > > > > > > async_error = 0;
> > > > > > > > --
> > > > > > >
> > > > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > > > >
> > > > > > Yes, I agree.
> > > > > >
> > > > > > Although, I am not really changing the behaviour in regards to this.
> > > > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > > > everybody today.
> > > > >
> > > > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > > > potentially.
> > > > >
> > > > > That said, there are not too many users of suspend_late callbacks in
> > > > > the tree, so it may not matter too much.
> > > > >
> > > > > > >
> > > > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > > > next to cpufreq_suspend().
> > > > > >
> > > > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > > > rather messy?
> > > > >
> > > > > Per-CPU variables are used for that, so it is quite straightforward.
> > > > >
> > > > > > Additionally, since find_deepest_state() is being called for
> > > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > > >
> > > > > No, it already checks "disabled".
> > > >
> > > > Yes, but that would be wrong.
> > >
> > > Hmmm.
> > >
> > > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > > states in suspend-to-idle,
> > >
> > > So does PM-runtime work in suspend-to-idle? How?
> > >
> > > > but prevent those that rely on runtime PM
> > > > (after it has been disabled) for the regular idle path.
> > >
> > > Do you have a special suspend-to-idle handling of those states that
> > > doesn't require PM-runtime?
> >
> > Regardless, pausing cpuidle in the suspend-to-idle path simply doesn't
> > make sense at all, so this needs to be taken care of in the first
> > place.
>
> Right, I do agree, don't get me wrong. But, do we really want to treat
> s2-to-idle differently, compared to s2-to-ram in regards to this?
>
> Wouldn't it be a lot easier to let cpuidle drivers to opt-out for
> cpuidle_pause|resume(), no matter whether it's for s2-to-idle or
> s2-to-ram?

I don't think so.

Suspend-to-idle resume cpuidle after pausing it which is just plain
confusing and waste of energy and the fact that the system-wide
suspend flow interferes with using PM-runtime for implementing cpuidle
callbacks at the low level really is an orthogonal problem.

> >
> > The problem with PM-runtime being unavailable after dpm_suspend()
> > needs to be addressed in a different way IMO, because it only affects
> > one specific use case.
>
> It's one specific case so far, but we have the riscv driver on its
> way, which would suffer from the same problem.

So perhaps they should be advised about this issue.

> Anyway, an option is to figure out what platforms and cpuidle drivers,
> that really needs cpuidle_pause|resume() at this point and make an
> opt-in solution instead.

None of them need to pause cpuidle for suspend-to-idle AFAICS.

Some may want it in the non-s2idle suspend path, but I'm not sure
about the exact point where cpuidle needs to be paused in this case.
Possibly before offlining the nonboot CPUs.

> This could then be used by runtime PM based
> cpuidle drivers as well. Would that be a way forward?

The PM-runtime case should be addressed directly IMO, we only need to
figure out how to do that.

I'm wondering how you are dealing with the case when user space
prevents pd_dev from suspending via sysfs, for that matter.

2021-10-21 17:08:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 6:41 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 6:29 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 17:46, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 5:09 PM Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > > > > counting of its PM domain.
> > > > > > > > >
> > > > > > > > > This works fine most of the time, but during system suspend in the
> > > > > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > > > > >
> > > > > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > > > > >
> > > > > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > > > > ---
> > > > > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > > > > --- a/drivers/base/power/main.c
> > > > > > > > > +++ b/drivers/base/power/main.c
> > > > > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > > > > >
> > > > > > > > > resume_device_irqs();
> > > > > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > > > > -
> > > > > > > > > - cpuidle_resume();
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > /**
> > > > > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > > > > }
> > > > > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > > > > async_synchronize_full();
> > > > > > > > > + cpuidle_resume();
> > > > > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > > > > }
> > > > > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > > > > {
> > > > > > > > > int ret;
> > > > > > > > >
> > > > > > > > > - cpuidle_pause();
> > > > > > > > > -
> > > > > > > > > device_wakeup_arm_wake_irqs();
> > > > > > > > > suspend_device_irqs();
> > > > > > > > >
> > > > > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > > > > int error = 0;
> > > > > > > > >
> > > > > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > > > > + cpuidle_pause();
> > > > > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > > > > pm_transition = state;
> > > > > > > > > async_error = 0;
> > > > > > > > > --
> > > > > > > >
> > > > > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > > > > >
> > > > > > > Yes, I agree.
> > > > > > >
> > > > > > > Although, I am not really changing the behaviour in regards to this.
> > > > > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > > > > everybody today.
> > > > > >
> > > > > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > > > > potentially.
> > > > > >
> > > > > > That said, there are not too many users of suspend_late callbacks in
> > > > > > the tree, so it may not matter too much.
> > > > > >
> > > > > > > >
> > > > > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > > > > next to cpufreq_suspend().
> > > > > > >
> > > > > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > > > > rather messy?
> > > > > >
> > > > > > Per-CPU variables are used for that, so it is quite straightforward.
> > > > > >
> > > > > > > Additionally, since find_deepest_state() is being called for
> > > > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > > > >
> > > > > > No, it already checks "disabled".
> > > > >
> > > > > Yes, but that would be wrong.
> > > >
> > > > Hmmm.
> > > >
> > > > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > > > states in suspend-to-idle,
> > > >
> > > > So does PM-runtime work in suspend-to-idle? How?
> > > >
> > > > > but prevent those that rely on runtime PM
> > > > > (after it has been disabled) for the regular idle path.
> > > >
> > > > Do you have a special suspend-to-idle handling of those states that
> > > > doesn't require PM-runtime?
> > >
> > > Regardless, pausing cpuidle in the suspend-to-idle path simply doesn't
> > > make sense at all, so this needs to be taken care of in the first
> > > place.
> >
> > Right, I do agree, don't get me wrong. But, do we really want to treat
> > s2-to-idle differently, compared to s2-to-ram in regards to this?
> >
> > Wouldn't it be a lot easier to let cpuidle drivers to opt-out for
> > cpuidle_pause|resume(), no matter whether it's for s2-to-idle or
> > s2-to-ram?
>
> I don't think so.
>
> Suspend-to-idle resume cpuidle after pausing it which is just plain
> confusing and waste of energy and the fact that the system-wide
> suspend flow interferes with using PM-runtime for implementing cpuidle
> callbacks at the low level really is an orthogonal problem.
>
> > >
> > > The problem with PM-runtime being unavailable after dpm_suspend()
> > > needs to be addressed in a different way IMO, because it only affects
> > > one specific use case.
> >
> > It's one specific case so far, but we have the riscv driver on its
> > way, which would suffer from the same problem.
>
> So perhaps they should be advised about this issue.
>
> > Anyway, an option is to figure out what platforms and cpuidle drivers,
> > that really needs cpuidle_pause|resume() at this point and make an
> > opt-in solution instead.
>
> None of them need to pause cpuidle for suspend-to-idle AFAICS.
>
> Some may want it in the non-s2idle suspend path, but I'm not sure
> about the exact point where cpuidle needs to be paused in this case.
> Possibly before offlining the nonboot CPUs.
>
> > This could then be used by runtime PM based
> > cpuidle drivers as well. Would that be a way forward?
>
> The PM-runtime case should be addressed directly IMO, we only need to
> figure out how to do that.
>
> I'm wondering how you are dealing with the case when user space
> prevents pd_dev from suspending via sysfs, for that matter.

Or what happens if rpm_suspend() returns -EAGAIN, because someone has
started to resume the device right after its reference counter went
down to 0.

It looks to me like the problem is there regardless of the whole
interference with system suspend.

2021-10-21 18:15:20

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 17:09, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 4:05 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Thu, 21 Oct 2021 at 15:45, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 1:49 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <[email protected]> wrote:
> > > > > > > >
> > > > > > > > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > > > > > > > domain (genpd), may be used when entering/exiting an idlestate. More
> > > > > > > > precisely, genpd relies on runtime PM to be enabled for the attached device
> > > > > > > > (in this case it belongs to a CPU), to properly manage the reference
> > > > > > > > counting of its PM domain.
> > > > > > > >
> > > > > > > > This works fine most of the time, but during system suspend in the
> > > > > > > > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > > > > > > > Beyond this point and until runtime PM becomes re-enabled in the
> > > > > > > > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> > > > > > > >
> > > > > > > > To make sure the reference counting in genpd becomes correct, we need to
> > > > > > > > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > > > > > > > the device. Therefore, let's move the call to cpuidle_pause() from
> > > > > > > > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > > > > > > > dpm_resume_noirq() into dpm_resume_early().
> > > > > > > >
> > > > > > > > Diagnosed-by: Maulik Shah <[email protected]>
> > > > > > > > Suggested-by: Maulik Shah <[email protected]>
> > > > > > > > Signed-off-by: Ulf Hansson <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/base/power/main.c | 6 ++----
> > > > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > > > > index cbea78e79f3d..1c753b651272 100644
> > > > > > > > --- a/drivers/base/power/main.c
> > > > > > > > +++ b/drivers/base/power/main.c
> > > > > > > > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> > > > > > > >
> > > > > > > > resume_device_irqs();
> > > > > > > > device_wakeup_disarm_wake_irqs();
> > > > > > > > -
> > > > > > > > - cpuidle_resume();
> > > > > > > > }
> > > > > > > >
> > > > > > > > /**
> > > > > > > > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> > > > > > > > }
> > > > > > > > mutex_unlock(&dpm_list_mtx);
> > > > > > > > async_synchronize_full();
> > > > > > > > + cpuidle_resume();
> > > > > > > > dpm_show_time(starttime, state, 0, "early");
> > > > > > > > trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> > > > > > > > }
> > > > > > > > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> > > > > > > > {
> > > > > > > > int ret;
> > > > > > > >
> > > > > > > > - cpuidle_pause();
> > > > > > > > -
> > > > > > > > device_wakeup_arm_wake_irqs();
> > > > > > > > suspend_device_irqs();
> > > > > > > >
> > > > > > > > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> > > > > > > > int error = 0;
> > > > > > > >
> > > > > > > > trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > > > > > > > + cpuidle_pause();
> > > > > > > > mutex_lock(&dpm_list_mtx);
> > > > > > > > pm_transition = state;
> > > > > > > > async_error = 0;
> > > > > > > > --
> > > > > > >
> > > > > > > Well, this is somewhat heavy-handed and it affects even the systems
> > > > > > > that don't really need to pause cpuidle at all in the suspend path.
> > > > > >
> > > > > > Yes, I agree.
> > > > > >
> > > > > > Although, I am not really changing the behaviour in regards to this.
> > > > > > cpuidle_pause() is already being called in dpm_suspend_noirq(), for
> > > > > > everybody today.
> > > > >
> > > > > Yes, it is, but pausing it earlier will cause more energy to be spent,
> > > > > potentially.
> > > > >
> > > > > That said, there are not too many users of suspend_late callbacks in
> > > > > the tree, so it may not matter too much.
> > > > >
> > > > > > >
> > > > > > > Also, IIUC you don't need to pause cpuidle completely, but make it
> > > > > > > temporarily avoid idle states potentially affected by this issue. An
> > > > > > > additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> > > > > > > suppose and it could be set via cpuidle_suspend() called from the core
> > > > > > > next to cpufreq_suspend().
> > > > > >
> > > > > > cpuidle_suspend() would then need to go and fetch the cpuidle driver
> > > > > > instance, which in some cases is one driver per CPU. Doesn't that get
> > > > > > rather messy?
> > > > >
> > > > > Per-CPU variables are used for that, so it is quite straightforward.
> > > > >
> > > > > > Additionally, since find_deepest_state() is being called for
> > > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > > >
> > > > > No, it already checks "disabled".
> > > >
> > > > Yes, but that would be wrong.
> > >
> > > Hmmm.
> > >
> > > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > > states in suspend-to-idle,
> > >
> > > So does PM-runtime work in suspend-to-idle? How?
> >
> > No it doesn't. See below.
> >
> > >
> > > > but prevent those that rely on runtime PM
> > > > (after it has been disabled) for the regular idle path.
> > >
> > > Do you have a special suspend-to-idle handling of those states that
> > > doesn't require PM-runtime?
> >
> > Yes. Feel free to have a look in __psci_enter_domain_idle_state().
>
> So in theory you could check the pm_runtime_put_sync_suspend() return
> value and fall back to something like WFI if that's an error code.

I have already tried that, but it simply got too complicated. The main
issue was that runtime PM could become disabled for the device in the
middle of executing the ->enter() callback.

For example, if pm_runtime_get_sync() fails, I still need to make sure
the reference counting in genpd becomes correct - and I can't do that
using dev_pm_genpd_resume(). That's because it's not designed to be
called in this "unknown" suspend phase, but should be called after the
noirq phase and be properly balanced with dev_pm_genpd_suspend().

In other words, the error path didn't work out for me.

[...]

Kind regards
Uffe

2021-10-21 18:38:44

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

[...]

> > > > > >
> > > > > > > Additionally, since find_deepest_state() is being called for
> > > > > > > cpuidle_enter_s2idle() too, we would need to treat the new
> > > > > > > CPUIDLE_STATE_DISABLED_ flag in a special way, right?
> > > > > >
> > > > > > No, it already checks "disabled".
> > > > >
> > > > > Yes, but that would be wrong.
> > > >
> > > > Hmmm.
> > > >
> > > > > The use case I want to support, for cpuidle-psci, is to allow all idle
> > > > > states in suspend-to-idle,
> > > >
> > > > So does PM-runtime work in suspend-to-idle? How?
> > > >
> > > > > but prevent those that rely on runtime PM
> > > > > (after it has been disabled) for the regular idle path.
> > > >
> > > > Do you have a special suspend-to-idle handling of those states that
> > > > doesn't require PM-runtime?
> > >
> > > Regardless, pausing cpuidle in the suspend-to-idle path simply doesn't
> > > make sense at all, so this needs to be taken care of in the first
> > > place.
> >
> > Right, I do agree, don't get me wrong. But, do we really want to treat
> > s2-to-idle differently, compared to s2-to-ram in regards to this?
> >
> > Wouldn't it be a lot easier to let cpuidle drivers to opt-out for
> > cpuidle_pause|resume(), no matter whether it's for s2-to-idle or
> > s2-to-ram?
>
> I don't think so.
>
> Suspend-to-idle resume cpuidle after pausing it which is just plain
> confusing and waste of energy and the fact that the system-wide
> suspend flow interferes with using PM-runtime for implementing cpuidle
> callbacks at the low level really is an orthogonal problem.

It's certainly an orthogonal problem, I agree. However, trying to
solve it in two different ways, may not really be worth the effort, in
my opinion.

As I kind of pointed out in the earlier reply, I am not sure there are
any other relatively easy solutions available, to fix the problem for
runtime PM based cpuidle drivers. We probably need to call
cpuidle_pause() (or similar) in some way.

>
> > >
> > > The problem with PM-runtime being unavailable after dpm_suspend()
> > > needs to be addressed in a different way IMO, because it only affects
> > > one specific use case.
> >
> > It's one specific case so far, but we have the riscv driver on its
> > way, which would suffer from the same problem.
>
> So perhaps they should be advised about this issue.

Yes, I will let them know - and hopefully I will soon also be able to
provide them with a fix. :-)

>
> > Anyway, an option is to figure out what platforms and cpuidle drivers,
> > that really needs cpuidle_pause|resume() at this point and make an
> > opt-in solution instead.
>
> None of them need to pause cpuidle for suspend-to-idle AFAICS.

I assume so too, otherwise things would have been broken when
cpuidle_resume() is called in s2idle_enter(). But, it's still a bit
unclear.

>
> Some may want it in the non-s2idle suspend path, but I'm not sure
> about the exact point where cpuidle needs to be paused in this case.
> Possibly before offlining the nonboot CPUs.

Okay.

Note that, I assume it would be okay to also pause cpuidle a bit
earlier in these cases, like in dpm_suspend() for example. The point
is, it's really a limited short period of time for when cpuidle would
be paused, so I doubt it would have any impact on the consumed energy.
Right?

>
> > This could then be used by runtime PM based
> > cpuidle drivers as well. Would that be a way forward?
>
> The PM-runtime case should be addressed directly IMO, we only need to
> figure out how to do that.

If you have any other suggestions, I am listening. :-)

>
> I'm wondering how you are dealing with the case when user space
> prevents pd_dev from suspending via sysfs, for that matter.

That should work fine during runtime - because runtime PM is enabled
for the device.

Kind regards
Uffe

2021-10-21 18:52:03

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

[...]

> > The PM-runtime case should be addressed directly IMO, we only need to
> > figure out how to do that.
> >
> > I'm wondering how you are dealing with the case when user space
> > prevents pd_dev from suspending via sysfs, for that matter.
>
> Or what happens if rpm_suspend() returns -EAGAIN, because someone has
> started to resume the device right after its reference counter went
> down to 0.

That would mean that the pm_runtime_put_sync() call fails to runtime
suspend the device. In other words, the corresponding genpd stays
powered on, which prevents idle states from being selected by the
genpd governor.

So, yes, this should work fine.

>
> It looks to me like the problem is there regardless of the whole
> interference with system suspend.

I don't think so, but I may be overlooking some points.

Kind regards
Uffe

2021-10-21 19:05:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:

[cut]

> > So in theory you could check the pm_runtime_put_sync_suspend() return
> > value and fall back to something like WFI if that's an error code.
>
> I have already tried that, but it simply got too complicated. The main
> issue was that runtime PM could become disabled for the device in the
> middle of executing the ->enter() callback.

So IIUC the problem is that you cannot resume after suspending in that case.

IOW, you need to guarantee that if the suspend is successful, the
resume also will take place, but if the suspend fails, you basically
don't care.

> For example, if pm_runtime_get_sync() fails, I still need to make sure
> the reference counting in genpd becomes correct - and I can't do that
> using dev_pm_genpd_resume(). That's because it's not designed to be
> called in this "unknown" suspend phase, but should be called after the
> noirq phase and be properly balanced with dev_pm_genpd_suspend().
>
> In other words, the error path didn't work out for me.

It should be sufficient to call wake_up_all_idle_cpus() in the suspend
path before dpm_suspend_late(), because system suspend acquires a
PM-runtime reference on every device. IOW, it won't let any devices
runtime-suspend, so if your power domain devices are resumed in that
path, they will never suspend again in it and the
pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
becomes a reference counter management call which works regardless of
whether or not PM runtime is disabled.

2021-10-21 19:59:27

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
>
> [cut]
>
> > > So in theory you could check the pm_runtime_put_sync_suspend() return
> > > value and fall back to something like WFI if that's an error code.
> >
> > I have already tried that, but it simply got too complicated. The main
> > issue was that runtime PM could become disabled for the device in the
> > middle of executing the ->enter() callback.
>
> So IIUC the problem is that you cannot resume after suspending in that case.
>
> IOW, you need to guarantee that if the suspend is successful, the
> resume also will take place, but if the suspend fails, you basically
> don't care.

Exactly.

>
> > For example, if pm_runtime_get_sync() fails, I still need to make sure
> > the reference counting in genpd becomes correct - and I can't do that
> > using dev_pm_genpd_resume(). That's because it's not designed to be
> > called in this "unknown" suspend phase, but should be called after the
> > noirq phase and be properly balanced with dev_pm_genpd_suspend().
> >
> > In other words, the error path didn't work out for me.
>
> It should be sufficient to call wake_up_all_idle_cpus() in the suspend
> path before dpm_suspend_late(), because system suspend acquires a
> PM-runtime reference on every device. IOW, it won't let any devices
> runtime-suspend, so if your power domain devices are resumed in that
> path, they will never suspend again in it and the
> pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
> becomes a reference counter management call which works regardless of
> whether or not PM runtime is disabled.

That sounds like a great idea, this should work too! Then the question
is, how to make that call to wake_up_all_idle_cpus() to become
optional - or only invoked for the cpuidle drivers that need it.

In any case, I will try this out, thanks for the suggestion!

Kind regards
Uffe

2021-10-22 09:17:00

by Maulik Shah

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

Hi,

On 10/22/2021 1:26 AM, Ulf Hansson wrote:
> On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
>>
>> On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
>>>
>>> On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
>>>>
>>>> On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
>>
>> [cut]
>>
>>>> So in theory you could check the pm_runtime_put_sync_suspend() return
>>>> value and fall back to something like WFI if that's an error code.
>>>
>>> I have already tried that, but it simply got too complicated. The main
>>> issue was that runtime PM could become disabled for the device in the
>>> middle of executing the ->enter() callback.
>>
>> So IIUC the problem is that you cannot resume after suspending in that case.
>>
>> IOW, you need to guarantee that if the suspend is successful, the
>> resume also will take place, but if the suspend fails, you basically
>> don't care.
>
> Exactly.
>
>>
>>> For example, if pm_runtime_get_sync() fails, I still need to make sure
>>> the reference counting in genpd becomes correct - and I can't do that
>>> using dev_pm_genpd_resume(). That's because it's not designed to be
>>> called in this "unknown" suspend phase, but should be called after the
>>> noirq phase and be properly balanced with dev_pm_genpd_suspend().
>>>
>>> In other words, the error path didn't work out for me.
>>
>> It should be sufficient to call wake_up_all_idle_cpus() in the suspend
>> path before dpm_suspend_late(), because system suspend acquires a
>> PM-runtime reference on every device. IOW, it won't let any devices
>> runtime-suspend, so if your power domain devices are resumed in that
>> path, they will never suspend again in it and the
>> pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
>> becomes a reference counter management call which works regardless of
>> whether or not PM runtime is disabled.
>
> That sounds like a great idea, this should work too! Then the question
> is, how to make that call to wake_up_all_idle_cpus() to become
> optional - or only invoked for the cpuidle drivers that need it.
>
> In any case, I will try this out, thanks for the suggestion!
>
> Kind regards
> Uffe

This may not work given that CPUs may re-enter idle after call to
wake_up_all_idle_cpus() is finished.

say a case where,

1. wake_up_all_idle_cpus() is called first then
2. __device_suspend_late() does __pm_runtime_disable() for all CPU devices.

inbetween 1 and 2, CPUs may have entered idle again and would have done
pm_runtime_put_sync_suspend() while entering idle from cpuidle-psci.

I was thinking if PM-QoS may be of help here, before runtime PM is
disabled, if QoS is taken with less value (say 1us) then all CPUs will
wake up to serve the QoS and then stay in simple WFI() due to QoS and
won't do any runtime put/get, which serve similar to cpuidle_pause() +
default idle call.

if QoS can be somehow taken from cpuidle-psci driver registering for
suspend ops? it won't effect other archs/idle drivers.

Thanks,
Maulik
>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation

2021-10-22 10:21:18

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Thu, 21 Oct 2021 at 21:56, Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
> >
> > [cut]
> >
> > > > So in theory you could check the pm_runtime_put_sync_suspend() return
> > > > value and fall back to something like WFI if that's an error code.
> > >
> > > I have already tried that, but it simply got too complicated. The main
> > > issue was that runtime PM could become disabled for the device in the
> > > middle of executing the ->enter() callback.
> >
> > So IIUC the problem is that you cannot resume after suspending in that case.
> >
> > IOW, you need to guarantee that if the suspend is successful, the
> > resume also will take place, but if the suspend fails, you basically
> > don't care.
>
> Exactly.
>
> >
> > > For example, if pm_runtime_get_sync() fails, I still need to make sure
> > > the reference counting in genpd becomes correct - and I can't do that
> > > using dev_pm_genpd_resume(). That's because it's not designed to be
> > > called in this "unknown" suspend phase, but should be called after the
> > > noirq phase and be properly balanced with dev_pm_genpd_suspend().
> > >
> > > In other words, the error path didn't work out for me.
> >
> > It should be sufficient to call wake_up_all_idle_cpus() in the suspend
> > path before dpm_suspend_late(), because system suspend acquires a
> > PM-runtime reference on every device. IOW, it won't let any devices
> > runtime-suspend, so if your power domain devices are resumed in that
> > path, they will never suspend again in it and the
> > pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
> > becomes a reference counter management call which works regardless of
> > whether or not PM runtime is disabled.
>
> That sounds like a great idea, this should work too! Then the question
> is, how to make that call to wake_up_all_idle_cpus() to become
> optional - or only invoked for the cpuidle drivers that need it.
>
> In any case, I will try this out, thanks for the suggestion!

I now recall that I have already tried this, but unfortunately it doesn't work.

The problem is that the dev->power.syscore flag is set for the device,
which makes device_prepare() to bail out early and skip calling
pm_runtime_get_noresume().

Kind regards
Uffe

2021-10-22 12:07:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Fri, Oct 22, 2021 at 12:18 PM Ulf Hansson <[email protected]> wrote:
>
> On Thu, 21 Oct 2021 at 21:56, Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > [cut]
> > >
> > > > > So in theory you could check the pm_runtime_put_sync_suspend() return
> > > > > value and fall back to something like WFI if that's an error code.
> > > >
> > > > I have already tried that, but it simply got too complicated. The main
> > > > issue was that runtime PM could become disabled for the device in the
> > > > middle of executing the ->enter() callback.
> > >
> > > So IIUC the problem is that you cannot resume after suspending in that case.
> > >
> > > IOW, you need to guarantee that if the suspend is successful, the
> > > resume also will take place, but if the suspend fails, you basically
> > > don't care.
> >
> > Exactly.
> >
> > >
> > > > For example, if pm_runtime_get_sync() fails, I still need to make sure
> > > > the reference counting in genpd becomes correct - and I can't do that
> > > > using dev_pm_genpd_resume(). That's because it's not designed to be
> > > > called in this "unknown" suspend phase, but should be called after the
> > > > noirq phase and be properly balanced with dev_pm_genpd_suspend().
> > > >
> > > > In other words, the error path didn't work out for me.
> > >
> > > It should be sufficient to call wake_up_all_idle_cpus() in the suspend
> > > path before dpm_suspend_late(), because system suspend acquires a
> > > PM-runtime reference on every device. IOW, it won't let any devices
> > > runtime-suspend, so if your power domain devices are resumed in that
> > > path, they will never suspend again in it and the
> > > pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
> > > becomes a reference counter management call which works regardless of
> > > whether or not PM runtime is disabled.
> >
> > That sounds like a great idea, this should work too! Then the question
> > is, how to make that call to wake_up_all_idle_cpus() to become
> > optional - or only invoked for the cpuidle drivers that need it.

It need not be optional.

For suspend-to-idle it doesn't matter, because all CPUs will be woken
up from idle shortly anyway.

For other suspend variants this doesn't matter, because all secondary
CPUs will be taken offline shortly and the primary CPU will call into
the platform "sleep" handler.

> >
> > In any case, I will try this out, thanks for the suggestion!
>
> I now recall that I have already tried this, but unfortunately it doesn't work.
>
> The problem is that the dev->power.syscore flag is set for the device,
> which makes device_prepare() to bail out early and skip calling
> pm_runtime_get_noresume().

This needs to be fixed, then.

2021-10-22 12:59:46

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Fri, 22 Oct 2021 at 14:02, Rafael J. Wysocki <[email protected]> wrote:
>
> On Fri, Oct 22, 2021 at 12:18 PM Ulf Hansson <[email protected]> wrote:
> >
> > On Thu, 21 Oct 2021 at 21:56, Ulf Hansson <[email protected]> wrote:
> > >
> > > On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
> > > >
> > > > [cut]
> > > >
> > > > > > So in theory you could check the pm_runtime_put_sync_suspend() return
> > > > > > value and fall back to something like WFI if that's an error code.
> > > > >
> > > > > I have already tried that, but it simply got too complicated. The main
> > > > > issue was that runtime PM could become disabled for the device in the
> > > > > middle of executing the ->enter() callback.
> > > >
> > > > So IIUC the problem is that you cannot resume after suspending in that case.
> > > >
> > > > IOW, you need to guarantee that if the suspend is successful, the
> > > > resume also will take place, but if the suspend fails, you basically
> > > > don't care.
> > >
> > > Exactly.
> > >
> > > >
> > > > > For example, if pm_runtime_get_sync() fails, I still need to make sure
> > > > > the reference counting in genpd becomes correct - and I can't do that
> > > > > using dev_pm_genpd_resume(). That's because it's not designed to be
> > > > > called in this "unknown" suspend phase, but should be called after the
> > > > > noirq phase and be properly balanced with dev_pm_genpd_suspend().
> > > > >
> > > > > In other words, the error path didn't work out for me.
> > > >
> > > > It should be sufficient to call wake_up_all_idle_cpus() in the suspend
> > > > path before dpm_suspend_late(), because system suspend acquires a
> > > > PM-runtime reference on every device. IOW, it won't let any devices
> > > > runtime-suspend, so if your power domain devices are resumed in that
> > > > path, they will never suspend again in it and the
> > > > pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
> > > > becomes a reference counter management call which works regardless of
> > > > whether or not PM runtime is disabled.
> > >
> > > That sounds like a great idea, this should work too! Then the question
> > > is, how to make that call to wake_up_all_idle_cpus() to become
> > > optional - or only invoked for the cpuidle drivers that need it.
>
> It need not be optional.
>
> For suspend-to-idle it doesn't matter, because all CPUs will be woken
> up from idle shortly anyway.
>
> For other suspend variants this doesn't matter, because all secondary
> CPUs will be taken offline shortly and the primary CPU will call into
> the platform "sleep" handler.
>
> > >
> > > In any case, I will try this out, thanks for the suggestion!
> >
> > I now recall that I have already tried this, but unfortunately it doesn't work.
> >
> > The problem is that the dev->power.syscore flag is set for the device,
> > which makes device_prepare() to bail out early and skip calling
> > pm_runtime_get_noresume().
>
> This needs to be fixed, then.

So bumping the usage count even if the dev->power.syscore is set,
should be fine? (And of course dropping it in the complete phase).

I can work with that, let me try!

Kind regards
Uffe

2021-10-22 13:13:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

On Fri, Oct 22, 2021 at 2:57 PM Ulf Hansson <[email protected]> wrote:
>
> On Fri, 22 Oct 2021 at 14:02, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Fri, Oct 22, 2021 at 12:18 PM Ulf Hansson <[email protected]> wrote:
> > >
> > > On Thu, 21 Oct 2021 at 21:56, Ulf Hansson <[email protected]> wrote:
> > > >
> > > > On Thu, 21 Oct 2021 at 21:02, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 8:12 PM Ulf Hansson <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, 21 Oct 2021 at 18:33, Rafael J. Wysocki <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 21, 2021 at 6:17 PM Ulf Hansson <[email protected]> wrote:
> > > > >
> > > > > [cut]
> > > > >
> > > > > > > So in theory you could check the pm_runtime_put_sync_suspend() return
> > > > > > > value and fall back to something like WFI if that's an error code.
> > > > > >
> > > > > > I have already tried that, but it simply got too complicated. The main
> > > > > > issue was that runtime PM could become disabled for the device in the
> > > > > > middle of executing the ->enter() callback.
> > > > >
> > > > > So IIUC the problem is that you cannot resume after suspending in that case.
> > > > >
> > > > > IOW, you need to guarantee that if the suspend is successful, the
> > > > > resume also will take place, but if the suspend fails, you basically
> > > > > don't care.
> > > >
> > > > Exactly.
> > > >
> > > > >
> > > > > > For example, if pm_runtime_get_sync() fails, I still need to make sure
> > > > > > the reference counting in genpd becomes correct - and I can't do that
> > > > > > using dev_pm_genpd_resume(). That's because it's not designed to be
> > > > > > called in this "unknown" suspend phase, but should be called after the
> > > > > > noirq phase and be properly balanced with dev_pm_genpd_suspend().
> > > > > >
> > > > > > In other words, the error path didn't work out for me.
> > > > >
> > > > > It should be sufficient to call wake_up_all_idle_cpus() in the suspend
> > > > > path before dpm_suspend_late(), because system suspend acquires a
> > > > > PM-runtime reference on every device. IOW, it won't let any devices
> > > > > runtime-suspend, so if your power domain devices are resumed in that
> > > > > path, they will never suspend again in it and the
> > > > > pm_runtime_put_sync_suspend() in __psci_enter_domain_idle_state()
> > > > > becomes a reference counter management call which works regardless of
> > > > > whether or not PM runtime is disabled.
> > > >
> > > > That sounds like a great idea, this should work too! Then the question
> > > > is, how to make that call to wake_up_all_idle_cpus() to become
> > > > optional - or only invoked for the cpuidle drivers that need it.
> >
> > It need not be optional.
> >
> > For suspend-to-idle it doesn't matter, because all CPUs will be woken
> > up from idle shortly anyway.
> >
> > For other suspend variants this doesn't matter, because all secondary
> > CPUs will be taken offline shortly and the primary CPU will call into
> > the platform "sleep" handler.
> >
> > > >
> > > > In any case, I will try this out, thanks for the suggestion!
> > >
> > > I now recall that I have already tried this, but unfortunately it doesn't work.
> > >
> > > The problem is that the dev->power.syscore flag is set for the device,
> > > which makes device_prepare() to bail out early and skip calling
> > > pm_runtime_get_noresume().
> >
> > This needs to be fixed, then.
>
> So bumping the usage count even if the dev->power.syscore is set,
> should be fine? (And of course dropping it in the complete phase).

Yes, please see
https://patchwork.kernel.org/project/linux-pm/patch/5773062.lOV4Wx5bFT@kreacher/

It should have been done this way from the outset, but I messed up a
merge (and said that it was "trivial" :-/).

> I can work with that, let me try!