2016-03-01 13:58:23

by Peter Zijlstra

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:

> Another point to take into account is that the RT tasks will "steal"
> the compute capacity that has been requested by the cfs tasks.
>
> Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
> and B and 1 cfs task C.

> Let assume that the real time constraint of RT task A is too agressive
> for the lowest OPP0 and that the change of the frequency of the core
> is too slow compare to this constraint but the real time constraint of
> RT task B can be handle whatever the OPP. System don't have other
> choice than setting the cpufreq min freq to OPP1 to be sure that
> constraint of task A will be covered at anytime.

> Then, we still have 2
> possible OPPs. The CFS task asks for compute capacity that fits in
> OPP1 but a part of this capacity will be stolen by RT tasks. If we
> monitor the load of RT tasks and request capacity for these RT tasks
> according to their current utilization, we can decide to switch to
> highest OPP2 to ensure that task C will have enough remaining
> capacity. A lot of embedded platform faces such kind of use cases

Still doesn't make sense. How would you know the constraint of RT task
A, and that it cannot be satisfied by OPP0 ? The only information you
have in the task model is a static priority.

The only possible choice the kernel has at this point is max OPP. It
doesn't have enough (_any_) information about worst case execution of
that task.


2016-03-01 14:16:07

by Juri Lelli

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On 01/03/16 14:58, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:
>
> > Another point to take into account is that the RT tasks will "steal"
> > the compute capacity that has been requested by the cfs tasks.
> >
> > Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
> > and B and 1 cfs task C.
>
> > Let assume that the real time constraint of RT task A is too agressive
> > for the lowest OPP0 and that the change of the frequency of the core
> > is too slow compare to this constraint but the real time constraint of
> > RT task B can be handle whatever the OPP. System don't have other
> > choice than setting the cpufreq min freq to OPP1 to be sure that
> > constraint of task A will be covered at anytime.
>
> > Then, we still have 2
> > possible OPPs. The CFS task asks for compute capacity that fits in
> > OPP1 but a part of this capacity will be stolen by RT tasks. If we
> > monitor the load of RT tasks and request capacity for these RT tasks
> > according to their current utilization, we can decide to switch to
> > highest OPP2 to ensure that task C will have enough remaining
> > capacity. A lot of embedded platform faces such kind of use cases
>
> Still doesn't make sense. How would you know the constraint of RT task
> A, and that it cannot be satisfied by OPP0 ? The only information you
> have in the task model is a static priority.
>

But, can't we have the problem Vincent describes if we s/RT/DL/ ?

Thanks,

- Juri

2016-03-01 14:25:08

by Peter Zijlstra

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On Tue, Mar 01, 2016 at 02:17:06PM +0000, Juri Lelli wrote:
> On 01/03/16 14:58, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:
> >
> > > Another point to take into account is that the RT tasks will "steal"
> > > the compute capacity that has been requested by the cfs tasks.
> > >
> > > Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
> > > and B and 1 cfs task C.
> >
> > > Let assume that the real time constraint of RT task A is too agressive
> > > for the lowest OPP0 and that the change of the frequency of the core
> > > is too slow compare to this constraint but the real time constraint of
> > > RT task B can be handle whatever the OPP. System don't have other
> > > choice than setting the cpufreq min freq to OPP1 to be sure that
> > > constraint of task A will be covered at anytime.
> >
> > > Then, we still have 2
> > > possible OPPs. The CFS task asks for compute capacity that fits in
> > > OPP1 but a part of this capacity will be stolen by RT tasks. If we
> > > monitor the load of RT tasks and request capacity for these RT tasks
> > > according to their current utilization, we can decide to switch to
> > > highest OPP2 to ensure that task C will have enough remaining
> > > capacity. A lot of embedded platform faces such kind of use cases
> >
> > Still doesn't make sense. How would you know the constraint of RT task
> > A, and that it cannot be satisfied by OPP0 ? The only information you
> > have in the task model is a static priority.
> >
>
> But, can't we have the problem Vincent describes if we s/RT/DL/ ?

Still not sure I actually see a problem. With DL you have a minimal OPP
required to guarantee correct execution of the DL tasks. For CFS you
have an average util reflecting its workload.

Add the two and you've got an effective OPP request. Or in CPPC terms:
we request a min freq of the DL and a max freq of DL+avg_CFS.

We could probably improve upon that by also tracking an avg DL and
lowering the max freq request to: min(DL, avg_DL + avg_CFS). The
consequence is that when the DL tasks hit peaks (over their avg) the CFS
tasks get a little more delay. But this might be a worthwhile trade-off.

2016-03-01 14:26:23

by Peter Zijlstra

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On Tue, Mar 01, 2016 at 03:24:59PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 02:17:06PM +0000, Juri Lelli wrote:
> > On 01/03/16 14:58, Peter Zijlstra wrote:
> > > On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:
> > >
> > > > Another point to take into account is that the RT tasks will "steal"
> > > > the compute capacity that has been requested by the cfs tasks.
> > > >
> > > > Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
> > > > and B and 1 cfs task C.
> > >
> > > > Let assume that the real time constraint of RT task A is too agressive
> > > > for the lowest OPP0 and that the change of the frequency of the core
> > > > is too slow compare to this constraint but the real time constraint of
> > > > RT task B can be handle whatever the OPP. System don't have other
> > > > choice than setting the cpufreq min freq to OPP1 to be sure that
> > > > constraint of task A will be covered at anytime.
> > >
> > > > Then, we still have 2
> > > > possible OPPs. The CFS task asks for compute capacity that fits in
> > > > OPP1 but a part of this capacity will be stolen by RT tasks. If we
> > > > monitor the load of RT tasks and request capacity for these RT tasks
> > > > according to their current utilization, we can decide to switch to
> > > > highest OPP2 to ensure that task C will have enough remaining
> > > > capacity. A lot of embedded platform faces such kind of use cases
> > >
> > > Still doesn't make sense. How would you know the constraint of RT task
> > > A, and that it cannot be satisfied by OPP0 ? The only information you
> > > have in the task model is a static priority.
> > >
> >
> > But, can't we have the problem Vincent describes if we s/RT/DL/ ?
>
> Still not sure I actually see a problem. With DL you have a minimal OPP
> required to guarantee correct execution of the DL tasks. For CFS you
> have an average util reflecting its workload.
>
> Add the two and you've got an effective OPP request. Or in CPPC terms:
> we request a min freq of the DL and a max freq of DL+avg_CFS.
>
> We could probably improve upon that by also tracking an avg DL and
> lowering the max freq request to: min(DL, avg_DL + avg_CFS). The

max(DL, avg_DL + avg_CFS) obviously! ;-)

> consequence is that when the DL tasks hit peaks (over their avg) the CFS
> tasks get a little more delay. But this might be a worthwhile trade-off.
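
A minimal sketch of the request described above, assuming utilization is
expressed on a 0..1024 capacity scale. The struct, the function name and
the parameter names (dl_bw for the capacity DL needs to guarantee its
deadlines, dl_avg and cfs_avg for the tracked averages) are hypothetical
and are not taken from any posted patch:

```c
#include <assert.h>

/* Hypothetical CPPC-style request: a hard floor plus a desired level. */
struct perf_request {
	unsigned long min;	/* never go below this, or DL guarantees break */
	unsigned long max;	/* what we would like when there is headroom */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * dl_bw:   capacity required to guarantee the admitted DL tasks
 * dl_avg:  tracked average DL utilization (typically below dl_bw)
 * cfs_avg: tracked average CFS utilization
 * cap:     capacity of the CPU at its highest OPP
 */
static struct perf_request build_request(unsigned long dl_bw,
					 unsigned long dl_avg,
					 unsigned long cfs_avg,
					 unsigned long cap)
{
	struct perf_request req;

	assert(dl_bw <= cap);	/* otherwise the DL tasks cannot be admitted */

	req.min = dl_bw;
	/* max(DL, avg_DL + avg_CFS): never request less than the DL floor. */
	req.max = max_ul(dl_bw, dl_avg + cfs_avg);
	if (req.max > cap)
		req.max = cap;

	return req;
}
```

With dl_bw = 300, dl_avg = 100 and cfs_avg = 400 this requests min = 300
and max = 500 instead of the 700 that DL + avg_CFS would ask for. The
price is the one named above: when the DL tasks run over their average,
CFS gets delayed.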

2016-03-01 14:41:11

by Juri Lelli

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On 01/03/16 15:26, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 03:24:59PM +0100, Peter Zijlstra wrote:
> > On Tue, Mar 01, 2016 at 02:17:06PM +0000, Juri Lelli wrote:
> > > On 01/03/16 14:58, Peter Zijlstra wrote:
> > > > On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:
> > > >
> > > > > Another point to take into account is that the RT tasks will "steal"
> > > > > the compute capacity that has been requested by the cfs tasks.
> > > > >
> > > > > Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
> > > > > and B and 1 cfs task C.
> > > >
> > > > > Let assume that the real time constraint of RT task A is too agressive
> > > > > for the lowest OPP0 and that the change of the frequency of the core
> > > > > is too slow compare to this constraint but the real time constraint of
> > > > > RT task B can be handle whatever the OPP. System don't have other
> > > > > choice than setting the cpufreq min freq to OPP1 to be sure that
> > > > > constraint of task A will be covered at anytime.
> > > >
> > > > > Then, we still have 2
> > > > > possible OPPs. The CFS task asks for compute capacity that fits in
> > > > > OPP1 but a part of this capacity will be stolen by RT tasks. If we
> > > > > monitor the load of RT tasks and request capacity for these RT tasks
> > > > > according to their current utilization, we can decide to switch to
> > > > > highest OPP2 to ensure that task C will have enough remaining
> > > > > capacity. A lot of embedded platform faces such kind of use cases
> > > >
> > > > Still doesn't make sense. How would you know the constraint of RT task
> > > > A, and that it cannot be satisfied by OPP0 ? The only information you
> > > > have in the task model is a static priority.
> > > >
> > >
> > > But, can't we have the problem Vincent describes if we s/RT/DL/ ?
> >
> > Still not sure I actually see a problem. With DL you have a minimal OPP
> > required to guarantee correct execution of the DL tasks. For CFS you
> > have an average util reflecting its workload.
> >
> > Add the two and you've got an effective OPP request. Or in CPPC terms:
> > we request a min freq of the DL and a max freq of DL+avg_CFS.
> >
> > We could probably improve upon that by also tracking an avg DL and
> > lowering the max freq request to: min(DL, avg_DL + avg_CFS). The
>
> max(DL, avg_DL + avg_CFS) obviously! ;-)
>
> > consequence is that when the DL tasks hit peaks (over their avg) the CFS
> > tasks get a little more delay. But this might be a worthwhile trade-off.
>

Agree. My point was actually more about Rafael's schedutil RFC (I should
probably have posted this there, but I thought it fitted well with this
example). I realize that Rafael is starting simple, but I fear that some
aggregation of util coming from the different classes will be needed in
the end; schedfreq has already something along this line.

IMHO, the general approach would be that every scheduling class has an
interface to communicate its util requirement. Then RT will probably
have to ask for max, but CFS and DL will do better.

Thanks,

- Juri
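
One way to picture the per-class interface suggested above, as a sketch
only. The enum, the struct and aggregate_util() are hypothetical names,
and the 0..1024 scale is an assumption; the only policy encoded is the
one stated in the message: RT asks for max, DL and CFS report a number.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY_MAX 1024UL	/* assumed capacity at the highest OPP */

enum class_id { CLASS_RT, CLASS_DL, CLASS_CFS, CLASS_COUNT };

/* What each scheduling class would report for one CPU (hypothetical). */
struct class_util {
	int has_tasks;		/* nonzero if the class has runnable tasks */
	unsigned long util;	/* requested capacity; ignored for RT */
};

/* Combine the per-class requirements into one capacity request. */
static unsigned long aggregate_util(const struct class_util u[CLASS_COUNT])
{
	unsigned long total = 0;

	/* Nothing is known about RT worst-case execution: ask for max. */
	if (u[CLASS_RT].has_tasks)
		return CAPACITY_MAX;

	if (u[CLASS_DL].has_tasks)
		total += u[CLASS_DL].util;
	if (u[CLASS_CFS].has_tasks)
		total += u[CLASS_CFS].util;

	return total > CAPACITY_MAX ? CAPACITY_MAX : total;
}
```

With only DL (300) and CFS (200) runnable the result is 500; as soon as
an RT task is runnable the result is CAPACITY_MAX regardless of the
other two.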

2016-03-01 14:58:47

by Vincent Guittot

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On 1 March 2016 at 14:58, Peter Zijlstra <[email protected]> wrote:
> On Fri, Feb 12, 2016 at 03:48:54PM +0100, Vincent Guittot wrote:
>
>> Another point to take into account is that the RT tasks will "steal"
>> the compute capacity that has been requested by the cfs tasks.
>>
>> Let takes the example of a CPU with 3 OPP on which run 2 rt tasks A
>> and B and 1 cfs task C.
>
>> Let assume that the real time constraint of RT task A is too agressive
>> for the lowest OPP0 and that the change of the frequency of the core
>> is too slow compare to this constraint but the real time constraint of
>> RT task B can be handle whatever the OPP. System don't have other
>> choice than setting the cpufreq min freq to OPP1 to be sure that
>> constraint of task A will be covered at anytime.
>
>> Then, we still have 2
>> possible OPPs. The CFS task asks for compute capacity that fits in
>> OPP1 but a part of this capacity will be stolen by RT tasks. If we
>> monitor the load of RT tasks and request capacity for these RT tasks
>> according to their current utilization, we can decide to switch to
>> highest OPP2 to ensure that task C will have enough remaining
>> capacity. A lot of embedded platform faces such kind of use cases
>
> Still doesn't make sense. How would you know the constraint of RT task
> A, and that it cannot be satisfied by OPP0 ? The only information you
> have in the task model is a static priority.

The kernel doesn't have this information, which is why the sysfs
cpufreq/scaling_min_freq has to be used to prevent the kernel (and
cpufreq in particular) from using OPP0.
From a kernel/sched/cpufreq pov, we assume that all OPPs at or above
cpufreq/scaling_min_freq can be used with the RT tasks of the system. And
the performance governor is used if only the highest OPP can be used.
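
For illustration, raising the floor from userspace could look like the
sketch below. The helper names are hypothetical; the sysfs file takes a
frequency in kHz, writing it normally needs root, and the value for
"OPP1" is platform specific, so none is hard-coded here.

```c
#include <assert.h>
#include <stdio.h>

/* Build the per-CPU sysfs path; returns 0 on success, -1 if buf is too small. */
static int min_freq_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_min_freq",
			 cpu);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write a new minimum frequency (in kHz); returns 0 on success, -1 on error. */
static int set_min_freq_khz(unsigned int cpu, unsigned int khz)
{
	char path[128];
	FILE *f;
	int ok;

	if (min_freq_path(path, sizeof(path), cpu) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;	/* no such CPU, no cpufreq, or no permission */

	ok = fprintf(f, "%u\n", khz) > 0;
	if (fclose(f) != 0)
		ok = 0;		/* the kernel may reject the value at flush time */

	return ok ? 0 : -1;
}
```

A caller would pass the frequency of the lowest OPP that still meets the
RT constraint, for example set_min_freq_khz(0, opp1_khz), where opp1_khz
is a placeholder for that platform's value.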

>
> The only possible choice the kernel has at this point is max OPP. It
> doesn't have enough (_any_) information about worst case execution of
> that task.
>

2016-03-01 15:04:09

by Peter Zijlstra

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On Tue, Mar 01, 2016 at 02:42:10PM +0000, Juri Lelli wrote:
> Agree. My point was actually more about Rafael's schedutil RFC (I should
> probably have posted this there, but I thought it fitted well with this
> example). I realize that Rafael is starting simple, but I fear that some
> aggregation of util coming from the different classes will be needed in
> the end; schedfreq has already something along this line.

Right, but I'm not sure that's a hard thing to add. But yes, it needs
doing.

It also very much has a bearing on the OPP state selection. As already
pointed out, the nearest OPP thing Rafael did is just wrong for DL.

It probably makes sense to pass a CPPC like form into the (software) OPP
selector.
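
A sketch of why rounding matters once the request contains a hard floor,
as it does for DL. Both functions are hypothetical and assume a frequency
table sorted in ascending order; neither is the code from the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the table entry closest to target; this may round DOWN. */
static unsigned int opp_nearest(const unsigned int *freqs, size_t n,
				unsigned int target)
{
	unsigned int best, best_diff;
	size_t i;

	assert(n > 0);
	best = freqs[0];
	best_diff = best > target ? best - target : target - best;

	for (i = 1; i < n; i++) {
		unsigned int d = freqs[i] > target ? freqs[i] - target
						   : target - freqs[i];
		if (d < best_diff) {
			best = freqs[i];
			best_diff = d;
		}
	}
	return best;
}

/* Pick the lowest entry that is >= target; falls back to the highest OPP. */
static unsigned int opp_ceil(const unsigned int *freqs, size_t n,
			     unsigned int target)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (freqs[i] >= target)
			return freqs[i];
	}
	return freqs[n - 1];
}
```

For a table of 500000, 1000000 and 1500000 kHz and a DL floor of 700000,
opp_nearest() returns 500000, which is below the floor, while opp_ceil()
returns 1000000. A CPPC-like min/max pair would let the selector apply
the ceiling rule to the floor and stay free to choose anything between
that and the max.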

> IMHO, the general approach would be that every scheduling class has an
> interface to communicate its util requirement. Then RT will probably
> have to ask for max, but CFS and DL will do better.

Right, so on IRC you mentioned that we could also use the global (or
cgroup) RT throttle to lower the RT util/OPP.

2016-03-01 19:49:44

by Rafael J. Wysocki

Subject: Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

On Tue, Mar 1, 2016 at 4:04 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, Mar 01, 2016 at 02:42:10PM +0000, Juri Lelli wrote:
>> Agree. My point was actually more about Rafael's schedutil RFC (I should
>> probably have posted this there, but I thought it fitted well with this
>> example). I realize that Rafael is starting simple, but I fear that some
>> aggregation of util coming from the different classes will be needed in
>> the end; schedfreq has already something along this line.
>
> Right, but I'm not sure that's a hard thing to add. But yes, it needs
> doing.
>
> It also very much has a bearing on the OPP state selection. As already
> pointed out, the nearest OPP thing Rafael did is just wrong for DL.
>
> It probably makes sense to pass a CPPC like form into the (software) OPP
> selector.
>
>> IMHO, the general approach would be that every scheduling class has an
>> interface to communicate its util requirement. Then RT will probably
>> have to ask for max, but CFS and DL will do better.
>
> Right, so on IRC you mentioned that we could also use the global (or
> cgroup) RT throttle to lower the RT util/OPP.

The current code simply treats RT/DL as "unknown" and will always ask
for the max for them. That should work, although it's suboptimal for
DL at least. However, I'd prefer to add anything more sophisticated
on top of it later, just to keep things simple to start with.