2021-06-03 09:39:59

by Ulf Hansson

[permalink] [raw]
Subject: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

Various discussions on LKML have pointed out that many subsystem/drivers for
devices that may be attached to a genpd and which manages DVFS/OPP though the
genpd performance states, would need very similar updates.

More precisely, they would likely have to call dev_pm_opp_set_rate|opp() to
drop and restore OPPs (which propagates upwards into performance states votes
in genpd), every time their devices should enter/exit a low power state, via
their device PM callbacks.

Rather than having to add the boilerplate code for these things into the
subsystems/drivers, this series implements the logic internally into genpd.

Concerns have been raised about this approach, mostly by myself, around that it
limits flexibility. On the other hand, it starts to look like more and more
people are requesting this to be manged internally in genpd, for good reasons.
So, I think it's worth to give this a try.

In the long run, if it turns out that the flexibility was indeed needed, we can
always deal with that as special cases on top.

Test and reviews are of course greatly appreciated!

Kind regards
Ulf Hansson

Ulf Hansson (4):
PM: domains: Split code in dev_pm_genpd_set_performance_state()
PM: domains: Return early if perf state is already set for the device
PM: domains: Drop/restore performance state votes for devices at
runtime PM
PM: domains: Drop/restore performance state votes for devices at
system PM

drivers/base/power/domain.c | 70 +++++++++++++++++++++++++++++--------
include/linux/pm_domain.h | 2 ++
2 files changed, 58 insertions(+), 14 deletions(-)

--
2.25.1


2021-06-03 09:40:20

by Ulf Hansson

[permalink] [raw]
Subject: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

A subsystem/driver that need to manage OPPs for its device, should
typically drop its vote for the OPP when the device becomes runtime
suspended. In this way, the corresponding aggregation of the performance
state votes that is managed in genpd for the attached PM domain, may find
that the aggregated vote can be decreased. Hence, it may allow genpd to set
the lower performance state for the PM domain, thus avoiding to waste
energy.

To accomplish this, typically a subsystem/driver would need to call
dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
callback, to drop the vote for the OPP. Accordingly, it needs another call
to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
->runtime_resume() callback.

To avoid boilerplate code in subsystems/driver to deal with these things,
let's instead manage this internally in genpd.

Signed-off-by: Ulf Hansson <[email protected]>
---

Changes in v2:
- Rebased.
- A few minor cosmetic changes.
- Deal with the error path in genpd_runtime_resume().

---
drivers/base/power/domain.c | 27 +++++++++++++++++++++++++--
include/linux/pm_domain.h | 1 +
2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index ef25a5b18587..e5d97174c254 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -400,6 +400,23 @@ static int genpd_set_performance_state(struct device *dev, unsigned int state)
return ret;
}

+static int genpd_drop_performance_state(struct device *dev)
+{
+ unsigned int prev_state = dev_gpd_data(dev)->performance_state;
+
+ if (!genpd_set_performance_state(dev, 0))
+ return prev_state;
+
+ return 0;
+}
+
+static void genpd_restore_performance_state(struct device *dev,
+ unsigned int state)
+{
+ if (state)
+ genpd_set_performance_state(dev, state);
+}
+
/**
* dev_pm_genpd_set_performance_state- Set performance state of device's power
* domain.
@@ -842,7 +859,8 @@ static int genpd_runtime_suspend(struct device *dev)
{
struct generic_pm_domain *genpd;
bool (*suspend_ok)(struct device *__dev);
- struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
+ struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
+ struct gpd_timing_data *td = &gpd_data->td;
bool runtime_pm = pm_runtime_enabled(dev);
ktime_t time_start;
s64 elapsed_ns;
@@ -899,6 +917,7 @@ static int genpd_runtime_suspend(struct device *dev)
return 0;

genpd_lock(genpd);
+ gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
genpd_power_off(genpd, true, 0);
genpd_unlock(genpd);

@@ -916,7 +935,8 @@ static int genpd_runtime_suspend(struct device *dev)
static int genpd_runtime_resume(struct device *dev)
{
struct generic_pm_domain *genpd;
- struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
+ struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
+ struct gpd_timing_data *td = &gpd_data->td;
bool runtime_pm = pm_runtime_enabled(dev);
ktime_t time_start;
s64 elapsed_ns;
@@ -940,6 +960,8 @@ static int genpd_runtime_resume(struct device *dev)

genpd_lock(genpd);
ret = genpd_power_on(genpd, 0);
+ if (!ret)
+ genpd_restore_performance_state(dev, gpd_data->rpm_pstate);
genpd_unlock(genpd);

if (ret)
@@ -978,6 +1000,7 @@ static int genpd_runtime_resume(struct device *dev)
err_poweroff:
if (!pm_runtime_is_irq_safe(dev) || genpd_is_irq_safe(genpd)) {
genpd_lock(genpd);
+ gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
genpd_power_off(genpd, true, 0);
genpd_unlock(genpd);
}
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index dfcfbcecc34b..21a0577305ef 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -198,6 +198,7 @@ struct generic_pm_domain_data {
struct notifier_block *power_nb;
int cpu;
unsigned int performance_state;
+ unsigned int rpm_pstate;
ktime_t next_wakeup;
void *data;
};
--
2.25.1

2021-06-03 09:58:10

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On 03-06-21, 11:34, Ulf Hansson wrote:
> A subsystem/driver that need to manage OPPs for its device, should
> typically drop its vote for the OPP when the device becomes runtime
> suspended. In this way, the corresponding aggregation of the performance
> state votes that is managed in genpd for the attached PM domain, may find
> that the aggregated vote can be decreased. Hence, it may allow genpd to set
> the lower performance state for the PM domain, thus avoiding to waste
> energy.
>
> To accomplish this, typically a subsystem/driver would need to call
> dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
> callback, to drop the vote for the OPP. Accordingly, it needs another call
> to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
> ->runtime_resume() callback.
>
> To avoid boilerplate code in subsystems/driver to deal with these things,
> let's instead manage this internally in genpd.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v2:
> - Rebased.
> - A few minor cosmetic changes.
> - Deal with the error path in genpd_runtime_resume().
>
> ---
> drivers/base/power/domain.c | 27 +++++++++++++++++++++++++--
> include/linux/pm_domain.h | 1 +
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index ef25a5b18587..e5d97174c254 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -400,6 +400,23 @@ static int genpd_set_performance_state(struct device *dev, unsigned int state)
> return ret;
> }
>
> +static int genpd_drop_performance_state(struct device *dev)

What about passing the state pointer here? that will simplify the
callers to just a call.

> +{
> + unsigned int prev_state = dev_gpd_data(dev)->performance_state;
> +
> + if (!genpd_set_performance_state(dev, 0))
> + return prev_state;
> +
> + return 0;
> +}
> +
> +static void genpd_restore_performance_state(struct device *dev,
> + unsigned int state)
> +{
> + if (state)

I will skip this check, as we are checking it in
genpd_set_performance_state() anyway ?

> + genpd_set_performance_state(dev, state);
> +}
> +
> /**
> * dev_pm_genpd_set_performance_state- Set performance state of device's power
> * domain.
> @@ -842,7 +859,8 @@ static int genpd_runtime_suspend(struct device *dev)
> {
> struct generic_pm_domain *genpd;
> bool (*suspend_ok)(struct device *__dev);
> - struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
> + struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
> + struct gpd_timing_data *td = &gpd_data->td;
> bool runtime_pm = pm_runtime_enabled(dev);
> ktime_t time_start;
> s64 elapsed_ns;
> @@ -899,6 +917,7 @@ static int genpd_runtime_suspend(struct device *dev)
> return 0;
>
> genpd_lock(genpd);
> + gpd_data->rpm_pstate = genpd_drop_performance_state(dev);

So this will become:

genpd_drop_performance_state(dev, &gpd_data->rpm_pstate);

and it can have return type of void.

--
viresh

2021-06-03 10:35:46

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On Thu, 3 Jun 2021 at 11:55, Viresh Kumar <[email protected]> wrote:
>
> On 03-06-21, 11:34, Ulf Hansson wrote:
> > A subsystem/driver that need to manage OPPs for its device, should
> > typically drop its vote for the OPP when the device becomes runtime
> > suspended. In this way, the corresponding aggregation of the performance
> > state votes that is managed in genpd for the attached PM domain, may find
> > that the aggregated vote can be decreased. Hence, it may allow genpd to set
> > the lower performance state for the PM domain, thus avoiding to waste
> > energy.
> >
> > To accomplish this, typically a subsystem/driver would need to call
> > dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
> > callback, to drop the vote for the OPP. Accordingly, it needs another call
> > to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
> > ->runtime_resume() callback.
> >
> > To avoid boilerplate code in subsystems/driver to deal with these things,
> > let's instead manage this internally in genpd.
> >
> > Signed-off-by: Ulf Hansson <[email protected]>
> > ---
> >
> > Changes in v2:
> > - Rebased.
> > - A few minor cosmetic changes.
> > - Deal with the error path in genpd_runtime_resume().
> >
> > ---
> > drivers/base/power/domain.c | 27 +++++++++++++++++++++++++--
> > include/linux/pm_domain.h | 1 +
> > 2 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index ef25a5b18587..e5d97174c254 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -400,6 +400,23 @@ static int genpd_set_performance_state(struct device *dev, unsigned int state)
> > return ret;
> > }
> >
> > +static int genpd_drop_performance_state(struct device *dev)
>
> What about passing the state pointer here? that will simplify the
> callers to just a call.

Not sure I get that. Can you elaborate a bit more?

>
> > +{
> > + unsigned int prev_state = dev_gpd_data(dev)->performance_state;
> > +
> > + if (!genpd_set_performance_state(dev, 0))
> > + return prev_state;
> > +
> > + return 0;
> > +}
> > +
> > +static void genpd_restore_performance_state(struct device *dev,
> > + unsigned int state)
> > +{
> > + if (state)
>
> I will skip this check, as we are checking it in
> genpd_set_performance_state() anyway ?

I don't want us to override OPP votes made by the subsystem/driver
level runtime PM callbacks. For example, if the drivers manage this
thing themselves, that should be preserved.

That said, by the check above I want to avoid setting the state to
zero internally by genpd, if the driver level ->runtime_resume()
callback has already restored the state.

>
> > + genpd_set_performance_state(dev, state);
> > +}
> > +
> > /**
> > * dev_pm_genpd_set_performance_state- Set performance state of device's power
> > * domain.
> > @@ -842,7 +859,8 @@ static int genpd_runtime_suspend(struct device *dev)
> > {
> > struct generic_pm_domain *genpd;
> > bool (*suspend_ok)(struct device *__dev);
> > - struct gpd_timing_data *td = &dev_gpd_data(dev)->td;
> > + struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev);
> > + struct gpd_timing_data *td = &gpd_data->td;
> > bool runtime_pm = pm_runtime_enabled(dev);
> > ktime_t time_start;
> > s64 elapsed_ns;
> > @@ -899,6 +917,7 @@ static int genpd_runtime_suspend(struct device *dev)
> > return 0;
> >
> > genpd_lock(genpd);
> > + gpd_data->rpm_pstate = genpd_drop_performance_state(dev);
>
> So this will become:
>
> genpd_drop_performance_state(dev, &gpd_data->rpm_pstate);
>
> and it can have return type of void.

See more above, about the reason why it looks like this. Hopefully
that explains it.

Kind regards
Uffe

2021-06-03 11:17:18

by Stephan Gerhold

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Thu, Jun 03, 2021 at 11:34:34AM +0200, Ulf Hansson wrote:
> Various discussions on LKML have pointed out that many subsystem/drivers for
> devices that may be attached to a genpd and which manages DVFS/OPP though the
> genpd performance states, would need very similar updates.
>
> More precisely, they would likely have to call dev_pm_opp_set_rate|opp() to
> drop and restore OPPs (which propagates upwards into performance states votes
> in genpd), every time their devices should enter/exit a low power state, via
> their device PM callbacks.
>
> Rather than having to add the boilerplate code for these things into the
> subsystems/drivers, this series implements the logic internally into genpd.
>
> Concerns have been raised about this approach, mostly by myself, around that it
> limits flexibility. On the other hand, it starts to look like more and more
> people are requesting this to be manged internally in genpd, for good reasons.
> So, I think it's worth to give this a try.
>
> In the long run, if it turns out that the flexibility was indeed needed, we can
> always deal with that as special cases on top.
>

Do I understand your patch set correctly that you basically make the
performance state votes conditional to the "power-on" vote of the device
(which is automatically toggled during runtime/system PM)?

If yes, I think that's a good thing. It was always really confusing to me
that a device can make performance state votes if it doesn't actually
want the power domain to be powered on.

What happens if a driver calls dev_pm_genpd_set_performance_state(...)
while the device is suspended? Will that mess up the performance state
when the device resumes?

I think this might also go into the direction of my problem with the OPP
core for CPU DVFS [1] since the OPP core currently does not "power-on"
the power domains, it just sets a performance state. I got kind of stuck
with all the complexity of power domains in Linux so I think we never
solved that.

Stephan

[1]: https://lore.kernel.org/linux-pm/[email protected]/

2021-06-03 11:20:08

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On Thu, 3 Jun 2021 at 12:31, Ulf Hansson <[email protected]> wrote:
>
> On Thu, 3 Jun 2021 at 11:55, Viresh Kumar <[email protected]> wrote:
> >
> > On 03-06-21, 11:34, Ulf Hansson wrote:
> > > A subsystem/driver that need to manage OPPs for its device, should
> > > typically drop its vote for the OPP when the device becomes runtime
> > > suspended. In this way, the corresponding aggregation of the performance
> > > state votes that is managed in genpd for the attached PM domain, may find
> > > that the aggregated vote can be decreased. Hence, it may allow genpd to set
> > > the lower performance state for the PM domain, thus avoiding to waste
> > > energy.
> > >
> > > To accomplish this, typically a subsystem/driver would need to call
> > > dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
> > > callback, to drop the vote for the OPP. Accordingly, it needs another call
> > > to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
> > > ->runtime_resume() callback.
> > >
> > > To avoid boilerplate code in subsystems/driver to deal with these things,
> > > let's instead manage this internally in genpd.
> > >
> > > Signed-off-by: Ulf Hansson <[email protected]>
> > > ---
> > >
> > > Changes in v2:
> > > - Rebased.
> > > - A few minor cosmetic changes.
> > > - Deal with the error path in genpd_runtime_resume().
> > >
> > > ---
> > > drivers/base/power/domain.c | 27 +++++++++++++++++++++++++--
> > > include/linux/pm_domain.h | 1 +
> > > 2 files changed, 26 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > index ef25a5b18587..e5d97174c254 100644
> > > --- a/drivers/base/power/domain.c
> > > +++ b/drivers/base/power/domain.c
> > > @@ -400,6 +400,23 @@ static int genpd_set_performance_state(struct device *dev, unsigned int state)
> > > return ret;
> > > }
> > >
> > > +static int genpd_drop_performance_state(struct device *dev)
> >
> > What about passing the state pointer here? that will simplify the
> > callers to just a call.
>
> Not sure I get that. Can you elaborate a bit more?
>
> >
> > > +{
> > > + unsigned int prev_state = dev_gpd_data(dev)->performance_state;
> > > +
> > > + if (!genpd_set_performance_state(dev, 0))
> > > + return prev_state;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static void genpd_restore_performance_state(struct device *dev,
> > > + unsigned int state)
> > > +{
> > > + if (state)
> >
> > I will skip this check, as we are checking it in
> > genpd_set_performance_state() anyway ?
>
> I don't want us to override OPP votes made by the subsystem/driver
> level runtime PM callbacks. For example, if the drivers manage this
> thing themselves, that should be preserved.
>
> That said, by the check above I want to avoid setting the state to
> zero internally by genpd, if the driver level ->runtime_resume()
> callback has already restored the state.

Ehh, forget about what I said about the ->runtime_resume() callback.

I am mostly trying to avoid restoring a state that is zero, just to be
sure nobody else on some different level outside gendp, have decided
to set a new OPP in-between our calls to
genpd_drop|restore_performance state.

[...]

Kind regards
Uffe

2021-06-03 15:31:49

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
>
> On Thu, Jun 03, 2021 at 11:34:34AM +0200, Ulf Hansson wrote:
> > Various discussions on LKML have pointed out that many subsystem/drivers for
> > devices that may be attached to a genpd and which manages DVFS/OPP though the
> > genpd performance states, would need very similar updates.
> >
> > More precisely, they would likely have to call dev_pm_opp_set_rate|opp() to
> > drop and restore OPPs (which propagates upwards into performance states votes
> > in genpd), every time their devices should enter/exit a low power state, via
> > their device PM callbacks.
> >
> > Rather than having to add the boilerplate code for these things into the
> > subsystems/drivers, this series implements the logic internally into genpd.
> >
> > Concerns have been raised about this approach, mostly by myself, around that it
> > limits flexibility. On the other hand, it starts to look like more and more
> > people are requesting this to be manged internally in genpd, for good reasons.
> > So, I think it's worth to give this a try.
> >
> > In the long run, if it turns out that the flexibility was indeed needed, we can
> > always deal with that as special cases on top.
> >
>
> Do I understand your patch set correctly that you basically make the
> performance state votes conditional to the "power-on" vote of the device
> (which is automatically toggled during runtime/system PM)?

The series can be considered as a step in that direction, but no, this
series doesn't change that behaviour.

Users of dev_pm_genpd_set_performance_state() are still free to set a
performance state, orthogonally to whether the PM domain is powered on
or off.

>
> If yes, I think that's a good thing. It was always really confusing to me
> that a device can make performance state votes if it doesn't actually
> want the power domain to be powered on.

I share your view, it's a bit confusing.

Just adding the condition internally to genpd to prevent the caller of
dev_pm_genpd_set_performance() from succeeding to set a new state,
unless the genpd is powered on, should be a rather simple thing to
add.

However, to change this, we first need to double check that all the
callers are making sure they have turned on the PM domain (typically
via runtime PM).

>
> What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> while the device is suspended? Will that mess up the performance state
> when the device resumes?

Good question. The idea is:

If genpd in genpd_runtime_suspend() are able to drop an existing vote
for a performance state, it should restore the vote in
genpd_runtime_resume(). This also means, if there is no vote to drop
in genpd_runtime_suspend(), genpd should just leave the vote as is in
genpd_runtime_resume().

When it comes to the system suspend/resume path, being implemented in
patch4, we should probably defer that patch from being merged. It
turned out that we probably need to think more about that approach.

>
> I think this might also go into the direction of my problem with the OPP
> core for CPU DVFS [1] since the OPP core currently does not "power-on"
> the power domains, it just sets a performance state. I got kind of stuck
> with all the complexity of power domains in Linux so I think we never
> solved that.

Hmm, that issue is in a way related.

Although, if I understand correctly, that was rather about at what
layer it makes best sense to activate the device (from runtime PM
point of view). And this was needed due to the fact that the
corresponding genpd provider, requires the PM domain to be power on to
allow changing a performance state for it. Did I get that correct?

>
> Stephan
>
> [1]: https://lore.kernel.org/linux-pm/[email protected]/

Kind regards
Uffe

2021-06-03 17:19:47

by Stephan Gerhold

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > I think this might also go into the direction of my problem with the OPP
> > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > the power domains, it just sets a performance state. I got kind of stuck
> > with all the complexity of power domains in Linux so I think we never
> > solved that.
>
> Hmm, that issue is in a way related.
>
> Although, if I understand correctly, that was rather about at what
> layer it makes best sense to activate the device (from runtime PM
> point of view). And this was needed due to the fact that the
> corresponding genpd provider, requires the PM domain to be power on to
> allow changing a performance state for it. Did I get that correct?
>

Yes, mostly. But I guess I keep coming back to the same question:

When/why does it make sense to vote for a "performance state" of
a power domain that is or might be powered off?

"Powered off" sounds like the absolutely lowest possible performance
state to me, it's just not on at all. And if suddenly a device comes and
says "I want performance state X", nothing can change until the power
domain is also "powered on".

I think my "CPU DVFS" problem only exists because in many other
situations it's possible to rely on one of the following side effects:

1. The genpd provider does not care if it's powered on or not.
(i.e. it's always-on or implicitly powers on if state > 0).
2. There is some other device that votes to keep the power domain on.

And that's how the problem relates to my comment for this patch series ...

>
> >
> > Do I understand your patch set correctly that you basically make the
> > performance state votes conditional to the "power-on" vote of the device
> > (which is automatically toggled during runtime/system PM)?
>
> The series can be considered as a step in that direction, but no, this
> series doesn't change that behaviour.
>
> Users of dev_pm_genpd_set_performance_state() are still free to set a
> performance state, orthogonally to whether the PM domain is powered on
> or off.
>
> >
> > If yes, I think that's a good thing. It was always really confusing to me
> > that a device can make performance state votes if it doesn't actually
> > want the power domain to be powered on.
>
> I share your view, it's a bit confusing.
>
> Just adding the condition internally to genpd to prevent the caller of
> dev_pm_genpd_set_performance() from succeeding to set a new state,
> unless the genpd is powered on, should be a rather simple thing to
> add.
>
> However, to change this, we first need to double check that all the
> callers are making sure they have turned on the PM domain (typically
> via runtime PM).
>

... because if performance state votes would be conditional to the
"power-on" vote of the device, it would no longer be possible
to rely on the side effects mentioned above. So this would most
certainly break some code that (incorrectly?) relies on these side
effects, but would also prevent such code.

My (personal) feeling so far is that just dropping performance votes
during runtime/system suspend just makes the entire situation even more
confusing.

> >
> > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > while the device is suspended? Will that mess up the performance state
> > when the device resumes?
>
> Good question. The idea is:
>
> If genpd in genpd_runtime_suspend() are able to drop an existing vote
> for a performance state, it should restore the vote in
> genpd_runtime_resume(). This also means, if there is no vote to drop
> in genpd_runtime_suspend(), genpd should just leave the vote as is in
> genpd_runtime_resume().
>

But the next time the device enters runtime suspend that vote would be
dropped, wouldn't it? That feels kind of strange to me.

Stephan

2021-06-03 19:06:12

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

03.06.2021 12:34, Ulf Hansson пишет:
> A subsystem/driver that need to manage OPPs for its device, should
> typically drop its vote for the OPP when the device becomes runtime
> suspended. In this way, the corresponding aggregation of the performance
> state votes that is managed in genpd for the attached PM domain, may find
> that the aggregated vote can be decreased. Hence, it may allow genpd to set
> the lower performance state for the PM domain, thus avoiding to waste
> energy.
>
> To accomplish this, typically a subsystem/driver would need to call
> dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
> callback, to drop the vote for the OPP. Accordingly, it needs another call
> to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
> ->runtime_resume() callback.
>
> To avoid boilerplate code in subsystems/driver to deal with these things,
> let's instead manage this internally in genpd.
>
> Signed-off-by: Ulf Hansson <[email protected]>
> ---
>
> Changes in v2:
> - Rebased.
> - A few minor cosmetic changes.
> - Deal with the error path in genpd_runtime_resume().

I tested this on NVIDIA Tegra by removing the boilerplate code from
drivers' RPM and haven't noticed any problems, the performance state is
dropped/restored as expected. Thank you.

2021-06-03 19:10:42

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

03.06.2021 22:02, Dmitry Osipenko пишет:
> 03.06.2021 12:34, Ulf Hansson пишет:
>> A subsystem/driver that need to manage OPPs for its device, should
>> typically drop its vote for the OPP when the device becomes runtime
>> suspended. In this way, the corresponding aggregation of the performance
>> state votes that is managed in genpd for the attached PM domain, may find
>> that the aggregated vote can be decreased. Hence, it may allow genpd to set
>> the lower performance state for the PM domain, thus avoiding to waste
>> energy.
>>
>> To accomplish this, typically a subsystem/driver would need to call
>> dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
>> callback, to drop the vote for the OPP. Accordingly, it needs another call
>> to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
>> ->runtime_resume() callback.
>>
>> To avoid boilerplate code in subsystems/driver to deal with these things,
>> let's instead manage this internally in genpd.
>>
>> Signed-off-by: Ulf Hansson <[email protected]>
>> ---
>>
>> Changes in v2:
>> - Rebased.
>> - A few minor cosmetic changes.
>> - Deal with the error path in genpd_runtime_resume().
>
> I tested this on NVIDIA Tegra by removing the boilerplate code from
> drivers' RPM and haven't noticed any problems, the performance state is
> dropped/restored as expected. Thank you.
>

Tested-by: Dmitry Osipenko <[email protected]>

2021-06-04 03:56:11

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On 03-06-21, 13:17, Ulf Hansson wrote:
> On Thu, 3 Jun 2021 at 12:31, Ulf Hansson <[email protected]> wrote:
> > > > +static int genpd_drop_performance_state(struct device *dev)
> > > > +{
> > > > + unsigned int prev_state = dev_gpd_data(dev)->performance_state;
> > > > +
> > > > + if (!genpd_set_performance_state(dev, 0))
> > > > + return prev_state;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void genpd_restore_performance_state(struct device *dev,
> > > > + unsigned int state)
> > > > +{
> > > > + if (state)
> > >
> > > I will skip this check, as we are checking it in
> > > genpd_set_performance_state() anyway ?
> >
> > I don't want us to override OPP votes made by the subsystem/driver
> > level runtime PM callbacks. For example, if the drivers manage this
> > thing themselves, that should be preserved.
> >
> > That said, by the check above I want to avoid setting the state to
> > zero internally by genpd, if the driver level ->runtime_resume()
> > callback has already restored the state.
>
> Ehh, forget about what I said about the ->runtime_resume() callback.
>
> I am mostly trying to avoid restoring a state that is zero, just to be
> sure nobody else on some different level outside gendp, have decided
> to set a new OPP in-between our calls to
> genpd_drop|restore_performance state.

What stops the core to call genpd_drop_performance_state() in the
first place here, if the driver was doing its own thing ? If that gets
called, then restore should be without any checks IMO. The state
should already be 0 at this point of time, I don't know why this will
get called again with state 0, but it will have no effect.

Can you give some sort of flow sequence where I can see the problem a
bit more clearly ?

--
viresh

2021-06-04 07:21:54

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <[email protected]> wrote:
>
> On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > > I think this might also go into the direction of my problem with the OPP
> > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > the power domains, it just sets a performance state. I got kind of stuck
> > > with all the complexity of power domains in Linux so I think we never
> > > solved that.
> >
> > Hmm, that issue is in a way related.
> >
> > Although, if I understand correctly, that was rather about at what
> > layer it makes best sense to activate the device (from runtime PM
> > point of view). And this was needed due to the fact that the
> > corresponding genpd provider, requires the PM domain to be power on to
> > allow changing a performance state for it. Did I get that correct?
> >
>
> Yes, mostly. But I guess I keep coming back to the same question:
>
> When/why does it make sense to vote for a "performance state" of
> a power domain that is or might be powered off?
>
> "Powered off" sounds like the absolutely lowest possible performance
> state to me, it's just not on at all. And if suddenly a device comes and
> says "I want performance state X", nothing can change until the power
> domain is also "powered on".
>
> I think my "CPU DVFS" problem only exists because in many other
> situations it's possible to rely on one of the following side effects:
>
> 1. The genpd provider does not care if it's powered on or not.
> (i.e. it's always-on or implicitly powers on if state > 0).
> 2. There is some other device that votes to keep the power domain on.
>
> And that's how the problem relates to my comment for this patch series ...
>
> >
> > >
> > > Do I understand your patch set correctly that you basically make the
> > > performance state votes conditional to the "power-on" vote of the device
> > > (which is automatically toggled during runtime/system PM)?
> >
> > The series can be considered as a step in that direction, but no, this
> > series doesn't change that behaviour.
> >
> > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > performance state, orthogonally to whether the PM domain is powered on
> > or off.
> >
> > >
> > > If yes, I think that's a good thing. It was always really confusing to me
> > > that a device can make performance state votes if it doesn't actually
> > > want the power domain to be powered on.
> >
> > I share your view, it's a bit confusing.
> >
> > Just adding the condition internally to genpd to prevent the caller of
> > dev_pm_genpd_set_performance() from succeeding to set a new state,
> > unless the genpd is powered on, should be a rather simple thing to
> > add.
> >
> > However, to change this, we first need to double check that all the
> > callers are making sure they have turned on the PM domain (typically
> > via runtime PM).
> >
>
> ... because if performance state votes would be conditional to the
> "power-on" vote of the device, it would no longer be possible
> to rely on the side effects mentioned above. So this would most
> certainly break some code that (incorrectly?) relies on these side
> effects, but would also prevent such code.

Right. I understand your point and I am open to discuss an
implementation. Although, I suggest we continue that separately from
the $subject series.

>
> My (personal) feeling so far is that just dropping performance votes
> during runtime/system suspend just makes the entire situation even more
> confusing.

Well, that's what most subsystems/drivers need to do.

Moreover, we have specific devices that only use one default OPP [1].

>
> > >
> > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > while the device is suspended? Will that mess up the performance state
> > > when the device resumes?
> >
> > Good question. The idea is:
> >
> > If genpd in genpd_runtime_suspend() are able to drop an existing vote
> > for a performance state, it should restore the vote in
> > genpd_runtime_resume(). This also means, if there is no vote to drop
> > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > genpd_runtime_resume().
> >
>
> But the next time the device enters runtime suspend that vote would be
> dropped, wouldn't it? That feels kind of strange to me.

What do you mean by "next time"?

My main point is, if the device enters runtime suspend state, why
should we keep the vote for an OPP for the device? I mean, the device
isn't going to be used anyway.

>
> Stephan

Kind regards
Uffe

[1]
https://patchwork.kernel.org/project/linux-pm/list/?series=489309

2021-06-04 07:23:39

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On Thu, 3 Jun 2021 at 21:08, Dmitry Osipenko <[email protected]> wrote:
>
> 03.06.2021 22:02, Dmitry Osipenko пишет:
> > 03.06.2021 12:34, Ulf Hansson пишет:
> >> A subsystem/driver that need to manage OPPs for its device, should
> >> typically drop its vote for the OPP when the device becomes runtime
> >> suspended. In this way, the corresponding aggregation of the performance
> >> state votes that is managed in genpd for the attached PM domain, may find
> >> that the aggregated vote can be decreased. Hence, it may allow genpd to set
> >> the lower performance state for the PM domain, thus avoiding to waste
> >> energy.
> >>
> >> To accomplish this, typically a subsystem/driver would need to call
> >> dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend()
> >> callback, to drop the vote for the OPP. Accordingly, it needs another call
> >> to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its
> >> ->runtime_resume() callback.
> >>
> >> To avoid boilerplate code in subsystems/driver to deal with these things,
> >> let's instead manage this internally in genpd.
> >>
> >> Signed-off-by: Ulf Hansson <[email protected]>
> >> ---
> >>
> >> Changes in v2:
> >> - Rebased.
> >> - A few minor cosmetic changes.
> >> - Deal with the error path in genpd_runtime_resume().
> >
> > I tested this on NVIDIA Tegra by removing the boilerplate code from
> > drivers' RPM and haven't noticed any problems, the performance state is
> > dropped/restored as expected. Thank you.
> >
>
> Tested-by: Dmitry Osipenko <[email protected]>

Thanks a lot, much appreciated!

Kind regards
Uffe

2021-06-04 07:48:50

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On Fri, 4 Jun 2021 at 05:53, Viresh Kumar <[email protected]> wrote:
>
> On 03-06-21, 13:17, Ulf Hansson wrote:
> > On Thu, 3 Jun 2021 at 12:31, Ulf Hansson <[email protected]> wrote:
> > > > > +static int genpd_drop_performance_state(struct device *dev)
> > > > > +{
> > > > > + unsigned int prev_state = dev_gpd_data(dev)->performance_state;
> > > > > +
> > > > > + if (!genpd_set_performance_state(dev, 0))
> > > > > + return prev_state;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static void genpd_restore_performance_state(struct device *dev,
> > > > > + unsigned int state)
> > > > > +{
> > > > > + if (state)
> > > >
> > > > I will skip this check, as we are checking it in
> > > > genpd_set_performance_state() anyway ?
> > >
> > > I don't want us to override OPP votes made by the subsystem/driver
> > > level runtime PM callbacks. For example, if the drivers manage this
> > > thing themselves, that should be preserved.
> > >
> > > That said, by the check above I want to avoid setting the state to
> > > zero internally by genpd, if the driver level ->runtime_resume()
> > > callback has already restored the state.
> >
> > Ehh, forget about what I said about the ->runtime_resume() callback.
> >
> > I am mostly trying to avoid restoring a state that is zero, just to be
> > sure nobody else on some different level outside gendp, have decided
> > to set a new OPP in-between our calls to
> > genpd_drop|restore_performance state.
>
> What stops the core to call genpd_drop_performance_state() in the
> first place here, if the driver was doing its own thing ? If that gets
> called, then restore should be without any checks IMO. The state
> should already be 0 at this point of time, I don't know why this will
> get called again with state 0, but it will have no effect.
>
> Can you give some sort of flow sequence where I can see the problem a
> bit more clearly ?

Starting calls from the subsystem/driver:

------
dev_pm_genpd_set_performance_state(dev, 100);
"run a use case with device runtime resumed"
...
"use case ends"
dev_pm_genpd_set_performance_state(dev, 0);
pm_runtime_put()
->genpd_runtime_suspend()
gpd_data->performance_state == 0, -> gpd_data->rpm_pstate = 0;
...
"new use case start"
dev_pm_genpd_set_performance_state(dev, 100);
pm_runtime_get_sync()
->genpd_runtime_resume()
gpd_data->performance_state == 100, -> gpd_data->rpm_pstate = 0;
(This is where we need to check for "zero" to not override the value)
.....
------

I wouldn't say that the above is the way how I see the calls to
dev_pm_genpd_set_performance_state (or actually
dev_pm_opp_set_rate|opp()) being deployed. The calls should rather be
done from the subsystem/driver's ->runtime_suspend|resume() callback,
then the path above would work in the way you suggest.

Although, as we currently treat performance states and power states in
genpd orthogonally, I wanted to make sure we could cope with both
situations.

Did this help? :-)

Kind regards
Uffe

2021-06-04 08:27:45

by Stephan Gerhold

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Fri, Jun 04, 2021 at 09:18:45AM +0200, Ulf Hansson wrote:
> On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <[email protected]> wrote:
> >
> > On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > > > I think this might also go into the direction of my problem with the OPP
> > > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > > the power domains, it just sets a performance state. I got kind of stuck
> > > > with all the complexity of power domains in Linux so I think we never
> > > > solved that.
> > >
> > > Hmm, that issue is in a way related.
> > >
> > > Although, if I understand correctly, that was rather about at what
> > > layer it makes best sense to activate the device (from runtime PM
> > > point of view). And this was needed due to the fact that the
> > > corresponding genpd provider, requires the PM domain to be power on to
> > > allow changing a performance state for it. Did I get that correct?
> > >
> >
> > Yes, mostly. But I guess I keep coming back to the same question:
> >
> > When/why does it make sense to vote for a "performance state" of
> > a power domain that is or might be powered off?
> >
> > "Powered off" sounds like the absolutely lowest possible performance
> > state to me, it's just not on at all. And if suddenly a device comes and
> > says "I want performance state X", nothing can change until the power
> > domain is also "powered on".
> >
> > I think my "CPU DVFS" problem only exists because in many other
> > situations it's possible to rely on one of the following side effects:
> >
> > 1. The genpd provider does not care if it's powered on or not.
> > (i.e. it's always-on or implicitly powers on if state > 0).
> > 2. There is some other device that votes to keep the power domain on.
> >
> > And that's how the problem relates to my comment for this patch series ...
> >
> > >
> > > >
> > > > Do I understand your patch set correctly that you basically make the
> > > > performance state votes conditional to the "power-on" vote of the device
> > > > (which is automatically toggled during runtime/system PM)?
> > >
> > > The series can be considered as a step in that direction, but no, this
> > > series doesn't change that behaviour.
> > >
> > > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > > performance state, orthogonally to whether the PM domain is powered on
> > > or off.
> > >
> > > >
> > > > If yes, I think that's a good thing. It was always really confusing to me
> > > > that a device can make performance state votes if it doesn't actually
> > > > want the power domain to be powered on.
> > >
> > > I share your view, it's a bit confusing.
> > >
> > > Just adding the condition internally to genpd to prevent the caller of
> > > dev_pm_genpd_set_performance() from succeeding to set a new state,
> > > unless the genpd is powered on, should be a rather simple thing to
> > > add.
> > >
> > > However, to change this, we first need to double check that all the
> > > callers are making sure they have turned on the PM domain (typically
> > > via runtime PM).
> > >
> >
> > ... because if performance state votes would be conditional to the
> > "power-on" vote of the device, it would no longer be possible
> > to rely on the side effects mentioned above. So this would most
> > certainly break some code that (incorrectly?) relies on these side
> > effects, but would also prevent such code.
>
> Right. I understand your point and I am open to discuss an
> implementation. Although, I suggest we continue that separately from
> the $subject series.
>
> >
> > My (personal) feeling so far is that just dropping performance votes
> > during runtime/system suspend just makes the entire situation even more
> > confusing.
>
> Well, that's what most subsystems/drivers need to do.
>
> Moreover, we have specific devices that only use one default OPP [1].
>
> >
> > > >
> > > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > > while the device is suspended? Will that mess up the performance state
> > > > when the device resumes?
> > >
> > > Good question. The idea is:
> > >
> > > If genpd in genpd_runtime_suspend() are able to drop an existing vote
> > > for a performance state, it should restore the vote in
> > > genpd_runtime_resume(). This also means, if there is no vote to drop
> > > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > > genpd_runtime_resume().
> > >
> >
> > But the next time the device enters runtime suspend that vote would be
> > dropped, wouldn't it? That feels kind of strange to me.
>
> What do you mean by "next time"?
>

Basically just like:

<device runtime-suspended>
driver does dev_pm_genpd_set_performance_state(...)
- performance state is applied immediately, even though device does
apparently not actually want the power domain to be powered on
<device runtime resumes>
- performance state is kept
<device runtime suspends>
- performance state is dropped
...

I'm not saying this example makes sense (it doesn't for me). It doesn't
make sense to vote for a performance state while runtime suspended.

But with this patch series we still allow that, and it will kind of
produce inconsistent behavior that the performance state is applied
immediately, even though the device is currently runtime-suspended.
But once it runtime suspends again, suddenly it is dropped.

And when you say:

> My main point is, if the device enters runtime suspend state, why
> should we keep the vote for an OPP for the device? I mean, the device
> isn't going to be used anyway.
>

A very similar point would be: "If the device *is* in runtime suspend
state, why should we take a vote for an OPP for the device?"

But I understand that this might be something we should address
separately in a follow-up patch/discussion. Don't get me wrong, I agree
this patch set is good, I just think we should go one step further and
finally make this consistent and less prone to side effects.

A good first step might be something like a WARN_ON_ONCE(...) if a
device tries to vote for a performance state while runtime suspended.
Then we might get a clearer picture which drivers do that currently.

Stephan

2021-06-04 11:01:43

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Fri, 4 Jun 2021 at 10:23, Stephan Gerhold <[email protected]> wrote:
>
> On Fri, Jun 04, 2021 at 09:18:45AM +0200, Ulf Hansson wrote:
> > On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <[email protected]> wrote:
> > >
> > > On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > > > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > > > > I think this might also go into the direction of my problem with the OPP
> > > > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > > > the power domains, it just sets a performance state. I got kind of stuck
> > > > > with all the complexity of power domains in Linux so I think we never
> > > > > solved that.
> > > >
> > > > Hmm, that issue is in a way related.
> > > >
> > > > Although, if I understand correctly, that was rather about at what
> > > > layer it makes best sense to activate the device (from runtime PM
> > > > point of view). And this was needed due to the fact that the
> > > > corresponding genpd provider, requires the PM domain to be power on to
> > > > allow changing a performance state for it. Did I get that correct?
> > > >
> > >
> > > Yes, mostly. But I guess I keep coming back to the same question:
> > >
> > > When/why does it make sense to vote for a "performance state" of
> > > a power domain that is or might be powered off?
> > >
> > > "Powered off" sounds like the absolutely lowest possible performance
> > > state to me, it's just not on at all. And if suddenly a device comes and
> > > says "I want performance state X", nothing can change until the power
> > > domain is also "powered on".
> > >
> > > I think my "CPU DVFS" problem only exists because in many other
> > > situations it's possible to rely on one of the following side effects:
> > >
> > > 1. The genpd provider does not care if it's powered on or not.
> > > (i.e. it's always-on or implicitly powers on if state > 0).
> > > 2. There is some other device that votes to keep the power domain on.
> > >
> > > And that's how the problem relates to my comment for this patch series ...
> > >
> > > >
> > > > >
> > > > > Do I understand your patch set correctly that you basically make the
> > > > > performance state votes conditional to the "power-on" vote of the device
> > > > > (which is automatically toggled during runtime/system PM)?
> > > >
> > > > The series can be considered as a step in that direction, but no, this
> > > > series doesn't change that behaviour.
> > > >
> > > > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > > > performance state, orthogonally to whether the PM domain is powered on
> > > > or off.
> > > >
> > > > >
> > > > > If yes, I think that's a good thing. It was always really confusing to me
> > > > > that a device can make performance state votes if it doesn't actually
> > > > > want the power domain to be powered on.
> > > >
> > > > I share your view, it's a bit confusing.
> > > >
> > > > Just adding the condition internally to genpd to prevent the caller of
> > > > dev_pm_genpd_set_performance() from succeeding to set a new state,
> > > > unless the genpd is powered on, should be a rather simple thing to
> > > > add.
> > > >
> > > > However, to change this, we first need to double check that all the
> > > > callers are making sure they have turned on the PM domain (typically
> > > > via runtime PM).
> > > >
> > >
> > > ... because if performance state votes would be conditional to the
> > > "power-on" vote of the device, it would no longer be possible
> > > to rely on the side effects mentioned above. So this would most
> > > certainly break some code that (incorrectly?) relies on these side
> > > effects, but would also prevent such code.
> >
> > Right. I understand your point and I am open to discuss an
> > implementation. Although, I suggest we continue that separately from
> > the $subject series.
> >
> > >
> > > My (personal) feeling so far is that just dropping performance votes
> > > during runtime/system suspend just makes the entire situation even more
> > > confusing.
> >
> > Well, that's what most subsystems/drivers need to do.
> >
> > Moreover, we have specific devices that only use one default OPP [1].
> >
> > >
> > > > >
> > > > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > > > while the device is suspended? Will that mess up the performance state
> > > > > when the device resumes?
> > > >
> > > > Good question. The idea is:
> > > >
> > > > If genpd in genpd_runtime_suspend() are able to drop an existing vote
> > > > for a performance state, it should restore the vote in
> > > > genpd_runtime_resume(). This also means, if there is no vote to drop
> > > > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > > > genpd_runtime_resume().
> > > >
> > >
> > > But the next time the device enters runtime suspend that vote would be
> > > dropped, wouldn't it? That feels kind of strange to me.
> >
> > What do you mean by "next time"?
> >
>
> Basically just like:
>
> <device runtime-suspended>
> driver does dev_pm_genpd_set_performance_state(...)
> - performance state is applied immediately, even though device does
> apparently not actually want the power domain to be powered on
> <device runtime resumes>
> - performance state is kept
> <device runtime suspends>
> - performance state is dropped

Yep, this is what would happen.

> ...
>
> I'm not saying this example makes sense (it doesn't for me). It doesn't
> make sense to vote for a performance state while runtime suspended.
>
> But with this patch series we still allow that, and it will kind of
> produce inconsistent behavior that the performance state is applied
> immediately, even though the device is currently runtime-suspended.
> But once it runtime suspends again, suddenly it is dropped.

Yes.

Note that, I have been looking at the existing callers of
dev_pm_genpd_set_performance_state() in the kernel as of today. It
should not be an issue, at least as far as I can tell.

>
> And when you say:
>
> > My main point is, if the device enters runtime suspend state, why
> > should we keep the vote for an OPP for the device? I mean, the device
> > isn't going to be used anyway.
> >
>
> A very similar point would be: "If the device *is* in runtime suspend
> state, why should we take a vote for an OPP for the device?"
>
> But I understand that this might be something we should address
> separately in a follow-up patch/discussion. Don't get me wrong, I agree
> this patch set is good, I just think we should go one step further and
> finally make this consistent and less prone to side effects.

I agree. We should look into how to change the behaviour. I intend to
have a look at it in a while.

>
> A good first step might be something like a WARN_ON_ONCE(...) if a
> device tries to vote for a performance state while runtime suspended.
> Then we might get a clearer picture which drivers do that currently.

That's an idea we could try, even if the number of users are quite
limited today. I can try the "git grep" analyze-method, I will
probably find most of them.

>
> Stephan

That said, are you okay that we move forward with the $subject series
(except patch4)?

Kind regards
Uffe

2021-06-04 11:53:00

by Stephan Gerhold

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Fri, Jun 04, 2021 at 12:57:59PM +0200, Ulf Hansson wrote:
> On Fri, 4 Jun 2021 at 10:23, Stephan Gerhold <[email protected]> wrote:
> >
> > On Fri, Jun 04, 2021 at 09:18:45AM +0200, Ulf Hansson wrote:
> > > On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <[email protected]> wrote:
> > > >
> > > > On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > > > > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > > > > > I think this might also go into the direction of my problem with the OPP
> > > > > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > > > > the power domains, it just sets a performance state. I got kind of stuck
> > > > > > with all the complexity of power domains in Linux so I think we never
> > > > > > solved that.
> > > > >
> > > > > Hmm, that issue is in a way related.
> > > > >
> > > > > Although, if I understand correctly, that was rather about at what
> > > > > layer it makes best sense to activate the device (from runtime PM
> > > > > point of view). And this was needed due to the fact that the
> > > > > corresponding genpd provider, requires the PM domain to be power on to
> > > > > allow changing a performance state for it. Did I get that correct?
> > > > >
> > > >
> > > > Yes, mostly. But I guess I keep coming back to the same question:
> > > >
> > > > When/why does it make sense to vote for a "performance state" of
> > > > a power domain that is or might be powered off?
> > > >
> > > > "Powered off" sounds like the absolutely lowest possible performance
> > > > state to me, it's just not on at all. And if suddenly a device comes and
> > > > says "I want performance state X", nothing can change until the power
> > > > domain is also "powered on".
> > > >
> > > > I think my "CPU DVFS" problem only exists because in many other
> > > > situations it's possible to rely on one of the following side effects:
> > > >
> > > > 1. The genpd provider does not care if it's powered on or not.
> > > > (i.e. it's always-on or implicitly powers on if state > 0).
> > > > 2. There is some other device that votes to keep the power domain on.
> > > >
> > > > And that's how the problem relates to my comment for this patch series ...
> > > >
> > > > >
> > > > > >
> > > > > > Do I understand your patch set correctly that you basically make the
> > > > > > performance state votes conditional to the "power-on" vote of the device
> > > > > > (which is automatically toggled during runtime/system PM)?
> > > > >
> > > > > The series can be considered as a step in that direction, but no, this
> > > > > series doesn't change that behaviour.
> > > > >
> > > > > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > > > > performance state, orthogonally to whether the PM domain is powered on
> > > > > or off.
> > > > >
> > > > > >
> > > > > > If yes, I think that's a good thing. It was always really confusing to me
> > > > > > that a device can make performance state votes if it doesn't actually
> > > > > > want the power domain to be powered on.
> > > > >
> > > > > I share your view, it's a bit confusing.
> > > > >
> > > > > Just adding the condition internally to genpd to prevent the caller of
> > > > > dev_pm_genpd_set_performance() from succeeding to set a new state,
> > > > > unless the genpd is powered on, should be a rather simple thing to
> > > > > add.
> > > > >
> > > > > However, to change this, we first need to double check that all the
> > > > > callers are making sure they have turned on the PM domain (typically
> > > > > via runtime PM).
> > > > >
> > > >
> > > > ... because if performance state votes would be conditional to the
> > > > "power-on" vote of the device, it would no longer be possible
> > > > to rely on the side effects mentioned above. So this would most
> > > > certainly break some code that (incorrectly?) relies on these side
> > > > effects, but would also prevent such code.
> > >
> > > Right. I understand your point and I am open to discuss an
> > > implementation. Although, I suggest we continue that separately from
> > > the $subject series.
> > >
> > > >
> > > > My (personal) feeling so far is that just dropping performance votes
> > > > during runtime/system suspend just makes the entire situation even more
> > > > confusing.
> > >
> > > Well, that's what most subsystems/drivers need to do.
> > >
> > > Moreover, we have specific devices that only use one default OPP [1].
> > >
> > > >
> > > > > >
> > > > > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > > > > while the device is suspended? Will that mess up the performance state
> > > > > > when the device resumes?
> > > > >
> > > > > Good question. The idea is:
> > > > >
> > > > > If genpd in genpd_runtime_suspend() are able to drop an existing vote
> > > > > for a performance state, it should restore the vote in
> > > > > genpd_runtime_resume(). This also means, if there is no vote to drop
> > > > > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > > > > genpd_runtime_resume().
> > > > >
> > > >
> > > > But the next time the device enters runtime suspend that vote would be
> > > > dropped, wouldn't it? That feels kind of strange to me.
> > >
> > > What do you mean by "next time"?
> > >
> >
> > Basically just like:
> >
> > <device runtime-suspended>
> > driver does dev_pm_genpd_set_performance_state(...)
> > - performance state is applied immediately, even though device does
> > apparently not actually want the power domain to be powered on
> > <device runtime resumes>
> > - performance state is kept
> > <device runtime suspends>
> > - performance state is dropped
>
> Yep, this is what would happen.
>
> > ...
> >
> > I'm not saying this example makes sense (it doesn't for me). It doesn't
> > make sense to vote for a performance state while runtime suspended.
> >
> > But with this patch series we still allow that, and it will kind of
> > produce inconsistent behavior that the performance state is applied
> > immediately, even though the device is currently runtime-suspended.
> > But once it runtime suspends again, suddenly it is dropped.
>
> Yes.
>
> Note that, I have been looking at the existing callers of
> dev_pm_genpd_set_performance_state() in the kernel as of today. It
> should not be an issue, at least as far as I can tell.
>
> >
> > And when you say:
> >
> > > My main point is, if the device enters runtime suspend state, why
> > > should we keep the vote for an OPP for the device? I mean, the device
> > > isn't going to be used anyway.
> > >
> >
> > A very similar point would be: "If the device *is* in runtime suspend
> > state, why should we take a vote for an OPP for the device?"
> >
> > But I understand that this might be something we should address
> > separately in a follow-up patch/discussion. Don't get me wrong, I agree
> > this patch set is good, I just think we should go one step further and
> > finally make this consistent and less prone to side effects.
>
> I agree. We should look into how to change the behaviour. I intend to
> have a look at it in a while.
>

Great, thanks!

> >
> > A good first step might be something like a WARN_ON_ONCE(...) if a
> > device tries to vote for a performance state while runtime suspended.
> > Then we might get a clearer picture which drivers do that currently.
>
> That's an idea we could try, even if the number of users are quite
> limited today. I can try the "git grep" analyze-method, I will
> probably find most of them.
>

The current user of "required-opps" for CPU DVFS (just qcom/qcs404.dtsi
with qcom/cpr.c I think?) is definitely broken (never votes to turn on
the power domain). So one requirement for making that change of behavior
is figuring out how to deal with enabling power domains at the OPP core
(or whereever else).

>
> That said, are you okay that we move forward with the $subject series
> (except patch4)?
>

It sounds fine to me. My system doesn't have power domain performance
states set up properly yet (due to various open questions), so I can't
test it properly though. Also, I'm not sure if it's a good idea to omit
patch 4, doesn't that mean drivers that currently drop the performance
states themselves can be only partially cleaned up?

I do have some thoughts about the "regulator-fixed-domain". Not exactly
a solution for the problem you mentioned, just some related thoughts.
Will try to reply there later.

Thanks!
Stephan

2021-06-07 04:50:18

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On 04-06-21, 09:45, Ulf Hansson wrote:
> Starting calls from the subsystem/driver:
>
> ------
> dev_pm_genpd_set_performance_state(dev, 100);
> "run a use case with device runtime resumed"
> ...
> "use case ends"
> dev_pm_genpd_set_performance_state(dev, 0);
> pm_runtime_put()
> ->genpd_runtime_suspend()
> gpd_data->performance_state == 0, -> gpd_data->rpm_pstate = 0;
> ...
> "new use case start"
> dev_pm_genpd_set_performance_state(dev, 100);
> pm_runtime_get_sync()
> ->genpd_runtime_resume()
> gpd_data->performance_state == 100, -> gpd_data->rpm_pstate = 0;
> (This is where we need to check for "zero" to not override the value)
> .....
> ------
>
> I wouldn't say that the above is the way how I see the calls to
> dev_pm_genpd_set_performance_state (or actually
> dev_pm_opp_set_rate|opp()) being deployed. The calls should rather be
> done from the subsystem/driver's ->runtime_suspend|resume() callback,
> then the path above would work in the way you suggest.
>
> Although, as we currently treat performance states and power states in
> genpd orthogonally, I wanted to make sure we could cope with both
> situations.

I think letting the drivers to call
dev_pm_genpd_set_performance_state(dev, 0) from suspend/resume makes
it really ugly/racy as both depend on the gpd_data->performance_state
for this. It doesn't look nice. And we shouldn't try to protect such
drivers.

Anyway, your call :)

> Did this help? :-)

Yes :)

--
viresh

2021-06-09 15:08:50

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] PM: domains: Drop/restore performance state votes for devices at runtime PM

On Mon, 7 Jun 2021 at 06:47, Viresh Kumar <[email protected]> wrote:
>
> On 04-06-21, 09:45, Ulf Hansson wrote:
> > Starting calls from the subsystem/driver:
> >
> > ------
> > dev_pm_genpd_set_performance_state(dev, 100);
> > "run a use case with device runtime resumed"
> > ...
> > "use case ends"
> > dev_pm_genpd_set_performance_state(dev, 0);
> > pm_runtime_put()
> > ->genpd_runtime_suspend()
> > gpd_data->performance_state == 0, -> gpd_data->rpm_pstate = 0;
> > ...
> > "new use case start"
> > dev_pm_genpd_set_performance_state(dev, 100);
> > pm_runtime_get_sync()
> > ->genpd_runtime_resume()
> > gpd_data->performance_state == 100, -> gpd_data->rpm_pstate = 0;
> > (This is where we need to check for "zero" to not override the value)
> > .....
> > ------
> >
> > I wouldn't say that the above is the way how I see the calls to
> > dev_pm_genpd_set_performance_state (or actually
> > dev_pm_opp_set_rate|opp()) being deployed. The calls should rather be
> > done from the subsystem/driver's ->runtime_suspend|resume() callback,
> > then the path above would work in the way you suggest.
> >
> > Although, as we currently treat performance states and power states in
> > genpd orthogonally, I wanted to make sure we could cope with both
> > situations.
>
> I think letting the drivers to call
> dev_pm_genpd_set_performance_state(dev, 0) from suspend/resume makes
> it really ugly/racy as both depend on the gpd_data->performance_state
> for this. It doesn't look nice. And we shouldn't try to protect such
> drivers.
>
> Anyway, your call :)

Well, I am not sure we have an option at this point. As long as we
allow performance states to be managed orthogonally to on/off states
in genpd, the check in genpd_restore_performance_state() is needed.

I have started to prepare a new version of the series - and will add a
comment about this in the code to try to clarify this.

[...]

Kind regards
Uffe

2021-06-11 16:45:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

On Fri, Jun 4, 2021 at 1:51 PM Stephan Gerhold <[email protected]> wrote:
>
> On Fri, Jun 04, 2021 at 12:57:59PM +0200, Ulf Hansson wrote:
> > On Fri, 4 Jun 2021 at 10:23, Stephan Gerhold <[email protected]> wrote:
> > >
> > > On Fri, Jun 04, 2021 at 09:18:45AM +0200, Ulf Hansson wrote:
> > > > On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <[email protected]> wrote:
> > > > >
> > > > > On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > > > > > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <[email protected]> wrote:
> > > > > > > I think this might also go into the direction of my problem with the OPP
> > > > > > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > > > > > the power domains, it just sets a performance state. I got kind of stuck
> > > > > > > with all the complexity of power domains in Linux so I think we never
> > > > > > > solved that.
> > > > > >
> > > > > > Hmm, that issue is in a way related.
> > > > > >
> > > > > > Although, if I understand correctly, that was rather about at what
> > > > > > layer it makes best sense to activate the device (from runtime PM
> > > > > > point of view). And this was needed due to the fact that the
> > > > > > corresponding genpd provider, requires the PM domain to be power on to
> > > > > > allow changing a performance state for it. Did I get that correct?
> > > > > >
> > > > >
> > > > > Yes, mostly. But I guess I keep coming back to the same question:
> > > > >
> > > > > When/why does it make sense to vote for a "performance state" of
> > > > > a power domain that is or might be powered off?
> > > > >
> > > > > "Powered off" sounds like the absolutely lowest possible performance
> > > > > state to me, it's just not on at all. And if suddenly a device comes and
> > > > > says "I want performance state X", nothing can change until the power
> > > > > domain is also "powered on".
> > > > >
> > > > > I think my "CPU DVFS" problem only exists because in many other
> > > > > situations it's possible to rely on one of the following side effects:
> > > > >
> > > > > 1. The genpd provider does not care if it's powered on or not.
> > > > > (i.e. it's always-on or implicitly powers on if state > 0).
> > > > > 2. There is some other device that votes to keep the power domain on.
> > > > >
> > > > > And that's how the problem relates to my comment for this patch series ...
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Do I understand your patch set correctly that you basically make the
> > > > > > > performance state votes conditional to the "power-on" vote of the device
> > > > > > > (which is automatically toggled during runtime/system PM)?
> > > > > >
> > > > > > The series can be considered as a step in that direction, but no, this
> > > > > > series doesn't change that behaviour.
> > > > > >
> > > > > > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > > > > > performance state, orthogonally to whether the PM domain is powered on
> > > > > > or off.
> > > > > >
> > > > > > >
> > > > > > > If yes, I think that's a good thing. It was always really confusing to me
> > > > > > > that a device can make performance state votes if it doesn't actually
> > > > > > > want the power domain to be powered on.
> > > > > >
> > > > > > I share your view, it's a bit confusing.
> > > > > >
> > > > > > Just adding the condition internally to genpd to prevent the caller of
> > > > > > dev_pm_genpd_set_performance() from succeeding to set a new state,
> > > > > > unless the genpd is powered on, should be a rather simple thing to
> > > > > > add.
> > > > > >
> > > > > > However, to change this, we first need to double check that all the
> > > > > > callers are making sure they have turned on the PM domain (typically
> > > > > > via runtime PM).
> > > > > >
> > > > >
> > > > > ... because if performance state votes would be conditional to the
> > > > > "power-on" vote of the device, it would no longer be possible
> > > > > to rely on the side effects mentioned above. So this would most
> > > > > certainly break some code that (incorrectly?) relies on these side
> > > > > effects, but would also prevent such code.
> > > >
> > > > Right. I understand your point and I am open to discuss an
> > > > implementation. Although, I suggest we continue that separately from
> > > > the $subject series.
> > > >
> > > > >
> > > > > My (personal) feeling so far is that just dropping performance votes
> > > > > during runtime/system suspend just makes the entire situation even more
> > > > > confusing.
> > > >
> > > > Well, that's what most subsystems/drivers need to do.
> > > >
> > > > Moreover, we have specific devices that only use one default OPP [1].
> > > >
> > > > >
> > > > > > >
> > > > > > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > > > > > while the device is suspended? Will that mess up the performance state
> > > > > > > when the device resumes?
> > > > > >
> > > > > > Good question. The idea is:
> > > > > >
> > > > > > If genpd in genpd_runtime_suspend() are able to drop an existing vote
> > > > > > for a performance state, it should restore the vote in
> > > > > > genpd_runtime_resume(). This also means, if there is no vote to drop
> > > > > > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > > > > > genpd_runtime_resume().
> > > > > >
> > > > >
> > > > > But the next time the device enters runtime suspend that vote would be
> > > > > dropped, wouldn't it? That feels kind of strange to me.
> > > >
> > > > What do you mean by "next time"?
> > > >
> > >
> > > Basically just like:
> > >
> > > <device runtime-suspended>
> > > driver does dev_pm_genpd_set_performance_state(...)
> > > - performance state is applied immediately, even though device does
> > > apparently not actually want the power domain to be powered on
> > > <device runtime resumes>
> > > - performance state is kept
> > > <device runtime suspends>
> > > - performance state is dropped
> >
> > Yep, this is what would happen.
> >
> > > ...
> > >
> > > I'm not saying this example makes sense (it doesn't for me). It doesn't
> > > make sense to vote for a performance state while runtime suspended.
> > >
> > > But with this patch series we still allow that, and it will kind of
> > > produce inconsistent behavior that the performance state is applied
> > > immediately, even though the device is currently runtime-suspended.
> > > But once it runtime suspends again, suddenly it is dropped.
> >
> > Yes.
> >
> > Note that, I have been looking at the existing callers of
> > dev_pm_genpd_set_performance_state() in the kernel as of today. It
> > should not be an issue, at least as far as I can tell.
> >
> > >
> > > And when you say:
> > >
> > > > My main point is, if the device enters runtime suspend state, why
> > > > should we keep the vote for an OPP for the device? I mean, the device
> > > > isn't going to be used anyway.
> > > >
> > >
> > > A very similar point would be: "If the device *is* in runtime suspend
> > > state, why should we take a vote for an OPP for the device?"
> > >
> > > But I understand that this might be something we should address
> > > separately in a follow-up patch/discussion. Don't get me wrong, I agree
> > > this patch set is good, I just think we should go one step further and
> > > finally make this consistent and less prone to side effects.
> >
> > I agree. We should look into how to change the behaviour. I intend to
> > have a look at it in a while.
> >
>
> Great, thanks!
>
> > >
> > > A good first step might be something like a WARN_ON_ONCE(...) if a
> > > device tries to vote for a performance state while runtime suspended.
> > > Then we might get a clearer picture which drivers do that currently.
> >
> > That's an idea we could try, even if the number of users are quite
> > limited today. I can try the "git grep" analyze-method, I will
> > probably find most of them.
> >
>
> The current user of "required-opps" for CPU DVFS (just qcom/qcs404.dtsi
> with qcom/cpr.c I think?) is definitely broken (never votes to turn on
> the power domain). So one requirement for making that change of behavior
> is figuring out how to deal with enabling power domains at the OPP core
> (or whereever else).
>
> >
> > That said, are you okay that we move forward with the $subject series
> > (except patch4)?
> >
>
> It sounds fine to me.

All right.

So patches [1-3/4] have been applied as 5.14 material, thanks!