2017-07-19 22:54:35

by Florian Fainelli

Subject: cpuidle and cpufreq coupling?

Hi,

We have a particular ARM CPU design that draws quite a lot of current
upon exit from WFI, and it does so even before the first instruction
out of WFI is executed. That means we cannot directly influence the
exit from WFI other than by changing the state in which WFI was
entered, because of this "dead" time during which the internal logic
needs to ramp back up to where it left off.

Since we have CPU frequency scaling available, a naive approach to
solving this problem would be to do the following:

- just before entering WFI, switch to a low frequency OPP
- enter WFI
- upon exit from WFI, ramp the frequency back up, e.g. to the highest OPP
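
As an illustration only, the three steps above can be modelled in a few
lines of Python. This is a toy model, not kernel code: FakeCpu,
idle_with_low_opp() and the frequencies used with them are hypothetical
names and values made up for the sketch. It only shows the ordering,
i.e. that the wake-up out of WFI happens while the low OPP is still in
effect.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCpu:
    """Hypothetical stand-in for the hardware; records events in order."""
    freq_khz: int
    log: list = field(default_factory=list)

    def set_freq(self, khz: int) -> None:
        self.freq_khz = khz
        self.log.append(f"set_freq {khz}")

    def wfi(self) -> None:
        # Real hardware would stall here until an interrupt arrives.
        # The wake-up ramp happens at whatever frequency is set now.
        self.log.append(f"wfi at {self.freq_khz}")


def idle_with_low_opp(cpu: FakeCpu, low_khz: int, resume_khz: int) -> None:
    cpu.set_freq(low_khz)     # step 1: drop to a low OPP before idling
    cpu.wfi()                 # step 2: enter WFI (and later wake up) at low_khz
    cpu.set_freq(resume_khz)  # step 3: restore once instructions run again
```

The model ignores everything that makes the real problem hard: switch
latency, shared clocks, and the rule that the idle path must not block.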

Some of the parts that I am not exactly clear on would be:

- would that qualify as a cpuidle governor of some kind that ties in
with cpufreq?
- would cpufreq_driver_fast_switch() be an appropriate API to use
from outside?

Thanks!
--
Florian



2017-07-19 23:17:36

by Rafael J. Wysocki

Subject: Re: cpuidle and cpufreq coupling?

On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
> Hi,
>
> We have a particular ARM CPU design that is drawing quite a lot of
> current upon exit from WFI, and it does so in a way even before the
> first instruction out of WFI is executed. That means we cannot influence
> directly the exit from WFI other than by changing the state in which it
> would be previously entered because of this "dead" time during which the
> internal logic needs to ramp up back where it left.
>
> A naive approach to solving this problem because we have CPU frequency
> scaling available would be to do the following:
>
> - just before entering WFI, switch to a low frequency OPP
> - enter WFI
> - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
>
> Some of the parts that I am not exactly clear on would be:
>
> - would that qualify as a cpuidle governor of some kind that ties in
> which cpufreq?
> - would using cpufreq_driver_fast_switch() be an appropriate API to use
> from outside

Generally, the idle driver is expected to manipulate OPPs as suitable
for it at the low level.

Thanks,
Rafael

2017-07-20 07:18:52

by Viresh Kumar

Subject: Re: cpuidle and cpufreq coupling?

On 20-07-17, 01:17, Rafael J. Wysocki wrote:
> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
> > Hi,
> >
> > We have a particular ARM CPU design that is drawing quite a lot of
> > current upon exit from WFI, and it does so in a way even before the
> > first instruction out of WFI is executed. That means we cannot influence
> > directly the exit from WFI other than by changing the state in which it
> > would be previously entered because of this "dead" time during which the
> > internal logic needs to ramp up back where it left.
> >
> > A naive approach to solving this problem because we have CPU frequency
> > scaling available would be to do the following:
> >
> > - just before entering WFI, switch to a low frequency OPP
> > - enter WFI
> > - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
> >
> > Some of the parts that I am not exactly clear on would be:
> >
> > - would that qualify as a cpuidle governor of some kind that ties in
> > which cpufreq?
> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
> > from outside
>
> Generally, the idle driver is expected to manipulate OPPs as suitable
> for it at the low level.

Does any idle driver do it today?

I am not sure, but I haven't heard of anyone from ARM doing it. Though I
may have completely missed it :)

So, that must call into cpufreq (somehow) and look for a low power
OPP?

@Florian: It would be trickier than we anticipate. We don't always
want to go to a low OPP on idle, as we may get out of it very quickly,
and changing the OPP twice (before and after idle) in that scenario
would be a complete waste of time. Also, I assume your ARM CPUs share
clock/voltage lines with each other as well? In that case we
shouldn't touch the OPP unless the whole cluster is going down, as
some CPUs might be running code then.
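
The first concern, paying for two OPP changes around a short idle
period, can be expressed as a simple break-even check. The sketch below
is only an illustration: worth_lowering_opp(), its parameters and the
default margin of 2.0 are assumptions made for the example and do not
come from any kernel API.

```python
def worth_lowering_opp(predicted_idle_us: float,
                       switch_down_us: float,
                       switch_up_us: float,
                       margin: float = 2.0) -> bool:
    """Return True if the predicted idle time comfortably exceeds the
    combined cost of switching the OPP down and back up again.

    `margin` is an arbitrary safety factor (assumed, not measured).
    """
    if predicted_idle_us < 0 or switch_down_us < 0 or switch_up_us < 0:
        raise ValueError("durations must be non-negative")
    overhead_us = switch_down_us + switch_up_us
    return predicted_idle_us >= margin * overhead_us
```

With two 50 us switches, for instance, a predicted 1000 us idle period
passes the check and a 150 us one does not. The shared clock/voltage
concern is a separate condition and is not covered here.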

--
viresh

2017-07-20 09:23:36

by Sudeep Holla

Subject: Re: cpuidle and cpufreq coupling?



On 20/07/17 08:18, Viresh Kumar wrote:
> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
>>> Hi,
>>>
>>> We have a particular ARM CPU design that is drawing quite a lot of
>>> current upon exit from WFI, and it does so in a way even before the
>>> first instruction out of WFI is executed. That means we cannot influence
>>> directly the exit from WFI other than by changing the state in which it
>>> would be previously entered because of this "dead" time during which the
>>> internal logic needs to ramp up back where it left.
>>>
>>> A naive approach to solving this problem because we have CPU frequency
>>> scaling available would be to do the following:
>>>
>>> - just before entering WFI, switch to a low frequency OPP
>>> - enter WFI
>>> - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
>>>
>>> Some of the parts that I am not exactly clear on would be:
>>>
>>> - would that qualify as a cpuidle governor of some kind that ties in
>>> which cpufreq?
>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>> from outside
>>
>> Generally, the idle driver is expected to manipulate OPPs as suitable
>> for it at the low level.
>
> Does any idle driver do it today ?

> I am not sure, but I haven't heard anyone from ARM doing it. Though I
> may have completely missed it :)
>

It doesn't need to be in Linux. E.g. PSCI or any low-level driver can do
that transparently.

> So, that must call into cpufreq (somehow) and look for a low power
> OPP?
>

That seems hacky, and it's a NAK if it's a PSCI platform. It's cleaner to
do such hacks/workarounds in platform-specific PSCI firmware.

> @Florian: It would be more tricky then we anticipate. We don't always
> want to go to low OPP on idle, as we may get out of it very quickly
> and changing OPP twice (before and after idle) in that scenario would
> be a complete waste of time.

Exactly.

--
Regards,
Sudeep

2017-07-20 09:52:47

by Rafael J. Wysocki

Subject: Re: cpuidle and cpufreq coupling?

On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar <[email protected]> wrote:
> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
>> > Hi,
>> >
>> > We have a particular ARM CPU design that is drawing quite a lot of
>> > current upon exit from WFI, and it does so in a way even before the
>> > first instruction out of WFI is executed. That means we cannot influence
>> > directly the exit from WFI other than by changing the state in which it
>> > would be previously entered because of this "dead" time during which the
>> > internal logic needs to ramp up back where it left.
>> >
>> > A naive approach to solving this problem because we have CPU frequency
>> > scaling available would be to do the following:
>> >
>> > - just before entering WFI, switch to a low frequency OPP
>> > - enter WFI
>> > - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
>> >
>> > Some of the parts that I am not exactly clear on would be:
>> >
>> > - would that qualify as a cpuidle governor of some kind that ties in
>> > which cpufreq?
>> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
>> > from outside
>>
>> Generally, the idle driver is expected to manipulate OPPs as suitable
>> for it at the low level.
>
> Does any idle driver do it today ?
>
> I am not sure, but I haven't heard anyone from ARM doing it. Though I
> may have completely missed it :)

You may not, but that's what is recommended.

Had you attended PM sessions at the LPC and similar, you might have
heard about it ...

> So, that must call into cpufreq (somehow) and look for a low power
> OPP?

It should know what OPP to use and then coordinate with cpufreq so
they don't go against each other (on shared policies).

2017-07-20 14:45:39

by Peter Zijlstra

Subject: Re: cpuidle and cpufreq coupling?

On Thu, Jul 20, 2017 at 11:52:41AM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar <[email protected]> wrote:
> > On 20-07-17, 01:17, Rafael J. Wysocki wrote:
> >> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > We have a particular ARM CPU design that is drawing quite a lot of
> >> > current upon exit from WFI, and it does so in a way even before the
> >> > first instruction out of WFI is executed. That means we cannot influence
> >> > directly the exit from WFI other than by changing the state in which it
> >> > would be previously entered because of this "dead" time during which the
> >> > internal logic needs to ramp up back where it left.
> >> >
> >> > A naive approach to solving this problem because we have CPU frequency
> >> > scaling available would be to do the following:
> >> >
> >> > - just before entering WFI, switch to a low frequency OPP
> >> > - enter WFI
> >> > - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
> >> >
> >> > Some of the parts that I am not exactly clear on would be:
> >> >
> >> > - would that qualify as a cpuidle governor of some kind that ties in
> >> > which cpufreq?
> >> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
> >> > from outside

Can your ARM part change OPP without scheduling? Because (for obvious
reasons) the idle thread is not supposed to block.

2017-07-20 22:56:14

by Florian Fainelli

Subject: Re: cpuidle and cpufreq coupling?

On 07/20/2017 07:45 AM, Peter Zijlstra wrote:
> On Thu, Jul 20, 2017 at 11:52:41AM +0200, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar <[email protected]> wrote:
>>> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>>>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> We have a particular ARM CPU design that is drawing quite a lot of
>>>>> current upon exit from WFI, and it does so in a way even before the
>>>>> first instruction out of WFI is executed. That means we cannot influence
>>>>> directly the exit from WFI other than by changing the state in which it
>>>>> would be previously entered because of this "dead" time during which the
>>>>> internal logic needs to ramp up back where it left.
>>>>>
>>>>> A naive approach to solving this problem because we have CPU frequency
>>>>> scaling available would be to do the following:
>>>>>
>>>>> - just before entering WFI, switch to a low frequency OPP
>>>>> - enter WFI
>>>>> - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
>>>>>
>>>>> Some of the parts that I am not exactly clear on would be:
>>>>>
>>>>> - would that qualify as a cpuidle governor of some kind that ties in
>>>>> which cpufreq?
>>>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>>>> from outside
>
> Can your ARM part change OPP without scheduling? Because (for obvious
> reasons) the idle thread is not supposed to block.

I think it should be able to do that, but I am not sure it would be
that straightforward if I went through the cpufreq API, so I may have
to re-implement some of the frequency scaling logic outside of cpufreq
(or rather make the low-level parts some kind of library, I guess).

2017-07-20 23:01:45

by Florian Fainelli

Subject: Re: cpuidle and cpufreq coupling?

On 07/20/2017 02:23 AM, Sudeep Holla wrote:
>
>
> On 20/07/17 08:18, Viresh Kumar wrote:
>> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> We have a particular ARM CPU design that is drawing quite a lot of
>>>> current upon exit from WFI, and it does so in a way even before the
>>>> first instruction out of WFI is executed. That means we cannot influence
>>>> directly the exit from WFI other than by changing the state in which it
>>>> would be previously entered because of this "dead" time during which the
>>>> internal logic needs to ramp up back where it left.
>>>>
>>>> A naive approach to solving this problem because we have CPU frequency
>>>> scaling available would be to do the following:
>>>>
>>>> - just before entering WFI, switch to a low frequency OPP
>>>> - enter WFI
>>>> - upon exit from WFI, ramp up the frequency back to e.g: highest OPP
>>>>
>>>> Some of the parts that I am not exactly clear on would be:
>>>>
>>>> - would that qualify as a cpuidle governor of some kind that ties in
>>>> which cpufreq?
>>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>>> from outside
>>>
>>> Generally, the idle driver is expected to manipulate OPPs as suitable
>>> for it at the low level.
>>
>> Does any idle driver do it today ?
>
>> I am not sure, but I haven't heard anyone from ARM doing it. Though I
>> may have completely missed it :)
>>
>
> It doesn't need to be in Linux. E.g. PSCI or any low lever driver can do
> that transparently.

Not everything is PSCI-based; this platform is 32-bit ARM and now
several years old. Still, the logic and spirit remain largely the same.

>
>> So, that must call into cpufreq (somehow) and look for a low power
>> OPP?
>>
>
> That's seems hacky and NAK if it's PSCI platform. It's cleaner do such
> hacks/workarounds in platform specific PSCI firmware.
>
>> @Florian: It would be more tricky then we anticipate. We don't always
>> want to go to low OPP on idle, as we may get out of it very quickly
>> and changing OPP twice (before and after idle) in that scenario would
>> be a complete waste of time.
>
> Exactly.
>

I completely agree. This is a trade-off between creating a big but short
energy spike that a poorly designed regulator/power distribution may
not handle, and creating a smaller-amplitude but longer-lasting energy
demand.

The key point is that if your lowest OPP is just the lowest CPU
frequency, and the low-level logic to make that happen is already there
in the cpufreq driver, can we somehow both utilize it and feed its
latency back into cpuidle? Or should the cpufreq driver have hooks into
cpuidle? (Either way is probably fine, but the former scales better to
the number of diverse cpufreq drivers out there.)
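
To make "feed its latency back into cpuidle" concrete, here is a toy
model of the idea. It is a sketch under assumptions, not how cpuidle
works internally: IdleState, effective_exit_latency() and pick_state()
are hypothetical names, and the only thing modelled is that the
frequency switch latency gets added to the exit latency of an idle
state that needs an OPP change, before that state is compared against
a latency limit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # latency of the idle state on its own
    needs_opp_switch: bool    # True if entering it also changes the OPP


def effective_exit_latency(state: IdleState, opp_switch_latency_us: int) -> int:
    """Exit latency including the OPP switch cost, if the state needs one."""
    extra = opp_switch_latency_us if state.needs_opp_switch else 0
    return state.exit_latency_us + extra


def pick_state(states: Sequence[IdleState],
               latency_limit_us: int,
               opp_switch_latency_us: int) -> Optional[IdleState]:
    """Return the last state in `states` that fits the latency limit.

    `states` is assumed to be ordered from shallowest to deepest, so the
    last fitting entry is the deepest one allowed. Returns None if no
    state fits.
    """
    chosen = None
    for state in states:
        if effective_exit_latency(state, opp_switch_latency_us) <= latency_limit_us:
            chosen = state
    return chosen
```

With a plain "wfi" state and a "wfi-low-opp" state that both have a
1 us exit latency, and a 40 us switch latency, a 100 us limit selects
"wfi-low-opp" while a 20 us limit falls back to plain "wfi".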

Thanks!
--
Florian

2017-07-21 00:11:04

by Vikram Mulukutla

Subject: Re: cpuidle and cpufreq coupling?

On 7/20/2017 3:56 PM, Florian Fainelli wrote:
> On 07/20/2017 07:45 AM, Peter Zijlstra wrote:

<snip>

>>
>> Can your ARM part change OPP without scheduling? Because (for obvious
>> reasons) the idle thread is not supposed to block.
>
> I think it should be able to do that, but I am not sure that if I went
> through the cpufreq API it would be that straight forward so I may have
> to re-implement some of the frequency scaling logic outside of cpufreq
> (or rather make the low-level parts some kind of library I guess).
>

I think I can safely mention that some of our non-upstream idle drivers
in the past have invoked low level clock drivers to atomically switch
CPUs to low frequency OPPs, with no interaction whatsoever with cpufreq.
It was maintainable since both the idle and clock drivers were
qcom-specific. However this is no longer necessary in recent designs and
I really hope we never need to do this again...

We didn't have to do a voltage switch, just PLL or mux work, so this
was doable. I'm guessing your atomic switching also allows voltage
reduction?

If your architecture allows another CPU to change the entering-idle
CPU's frequency, synchronization will be necessary as well - this is
where it can get a bit tricky.

Thanks,
Vikram

2017-07-21 00:30:47

by Florian Fainelli

Subject: Re: cpuidle and cpufreq coupling?

On 07/20/2017 05:11 PM, Vikram Mulukutla wrote:
> On 7/20/2017 3:56 PM, Florian Fainelli wrote:
>> On 07/20/2017 07:45 AM, Peter Zijlstra wrote:
>
> <snip>
>
>>>
>>> Can your ARM part change OPP without scheduling? Because (for obvious
>>> reasons) the idle thread is not supposed to block.
>>
>> I think it should be able to do that, but I am not sure that if I went
>> through the cpufreq API it would be that straight forward so I may have
>> to re-implement some of the frequency scaling logic outside of cpufreq
>> (or rather make the low-level parts some kind of library I guess).
>>
>
> I think I can safely mention that some of our non-upstream idle drivers
> in the past have invoked low level clock drivers to atomically switch
> CPUs to low frequency OPPs, with no interaction whatsoever with cpufreq.
> It was maintainable since both the idle and clock drivers were
> qcom-specific. However this is no longer necessary in recent designs and
> I really hope we never need to do this again...

Yes, same here: this is for a past-generation product; the current
generation has a smarter design that so far does not require this.

>
> We didn't have to do a voltage switch and just PLL or mux
> work so this was doable. I'm guessing your atomic switching also allows
> voltage reduction?

Correct, there is a voltage reduction occurring, which is largely under
the control of a separate MCU/firmware.

>
> If your architecture allows another CPU to change the entering-idle CPU's
> frequency, synchronization will be necessary as well - this is where it
> can get a bit tricky.

That is a very good point: the frequency scaling is not per-CPU but for
the entire CPU complex (up to 4 cores), so that might indeed be a problem.
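
One common way to handle a clock shared by the whole CPU complex is a
"last CPU in, first CPU out" count: only the last core to go idle
lowers the frequency, and the first core to wake restores it. The
sketch below models that in Python purely as an illustration.
ClusterClock and its methods are hypothetical, the frequencies passed
to it are made up, and threading.Lock merely stands in for whatever
non-sleeping lock a real idle path would need.

```python
import threading


class ClusterClock:
    """Toy model of one clock shared by all CPUs in a cluster."""

    def __init__(self, n_cpus: int, run_khz: int, idle_khz: int) -> None:
        self._lock = threading.Lock()  # stand-in for a non-sleeping lock
        self._n_cpus = n_cpus
        self._active = n_cpus          # CPUs currently running code
        self.run_khz = run_khz
        self.idle_khz = idle_khz
        self.freq_khz = run_khz

    def cpu_enter_idle(self) -> None:
        with self._lock:
            if self._active == 0:
                raise RuntimeError("all CPUs are already idle")
            self._active -= 1
            if self._active == 0:
                # Last CPU going idle: nobody is running code any more.
                self.freq_khz = self.idle_khz

    def cpu_exit_idle(self) -> None:
        with self._lock:
            if self._active == self._n_cpus:
                raise RuntimeError("all CPUs are already active")
            if self._active == 0:
                # First CPU waking up: restore the running frequency.
                self.freq_khz = self.run_khz
            self._active += 1
```

In this model the frequency stays at run_khz while any core is active,
which matches the earlier point that the OPP should not be touched
unless the whole cluster is going down.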

Thanks!
--
Florian