2014-06-05 08:49:59

by Vincent Guittot

Subject: Re: [RFC PATCH 01/16] sched: Documentation for scheduler energy cost model

Hi Morten,

On 23 May 2014 20:16, Morten Rasmussen <[email protected]> wrote:
> This documentation patch provide a brief overview of the experimental
> scheduler energy costing model and associated data structures.
>
> Signed-off-by: Morten Rasmussen <[email protected]>
> ---
> Documentation/scheduler/sched-energy.txt | 66 ++++++++++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
> create mode 100644 Documentation/scheduler/sched-energy.txt
>
> diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
> new file mode 100644
> index 0000000..c6896c0
> --- /dev/null
> +++ b/Documentation/scheduler/sched-energy.txt
> @@ -0,0 +1,66 @@
> +Energy cost model for energy-aware scheduling (EXPERIMENTAL)
> +
> +Introduction
> +=============
> +The basic energy model uses platform energy data stored in sched_energy data
> +structures attached to the sched_groups in the sched_domain hierarchy. The
> +energy cost model offers two function that can be used to guide scheduling
> +decisions:
> +
> +1. energy_diff_util(cpu, util, wakeups)

Could you give us more details of what util and wakeups are?
Is util an absolute value or a delta?
Is wakeups a boolean, or does it define the number of tasks/cpus
that wake up?

> +2. energy_diff_task(cpu, task)
> +
> +Both return the energy cost delta caused by adding/removing utilization or a
> +task to/from a specific cpu.
> +
> +CONFIG_SCHED_ENERGY needs to be defined in Kconfig to enable the energy cost
> +model and associated data structures.
> +
> +The basic algorithm
> +====================
> +The basic idea is to determine the energy cost at each level in sched_domain
> +hierarchy based on utilization:
> +
> + for_each_domain(cpu, sd) {
> + sg = sched_group_of(cpu)
> + energy_before = curr_util(sg) * busy_power(sg)
> + + 1-curr_util(sg) * idle_power(sg)
> + energy_after = new_util(sg) * busy_power(sg)
> + + 1-new_util(sg) * idle_power(sg)
> + + new_util(sg) * task_wakeups
> + * wakeup_energy(sg)
> + energy_diff += energy_before - energy_after
> + }
> +
> + return energy_diff

So this is the algorithm used in energy_diff_util and energy_diff_task?

It's not straightforward for me to map the algorithm variables to the
function arguments.

> +
> +Platform energy data
> +=====================
> +struct sched_energy has the following members:
> +
> +cap_states:
> + List of struct capacity_state representing the supported capacity states
> + (P-states). struct capacity_state has two members: cap and power, which
> + represents the compute capacity and the busy power of the state. The
> + list must ordered by capacity low->high.
> +
> +nr_cap_states:
> + Number of capacity states in cap_states.
> +
> +max_capacity:
> + The highest capacity supported by any of the capacity states in
> + cap_states.

Can't you directly use cap_states[nr_cap_states - 1].cap, as the array is ordered?

Vincent
> +
> +idle_power:
> + Idle power consumption. Will be extended to support multiple C-states
> + later.
> +
> +wakeup_energy:
> + Energy cost of wakeup/power-down cycle for the sched_group which this is
> + attached to. Will be extended to support different costs for different
> + C-states later.
> +
> +There are no unit requirements for the energy cost data. Data can be normalized
> +with any reference, however, the normalization must be consistent across all
> +energy cost data. That is, one bogo-joule/watt must be same quantity for data,
> +but we don't care what it is.
> --
> 1.7.9.5
>
>


2014-06-05 11:35:48

by Morten Rasmussen

Subject: Re: [RFC PATCH 01/16] sched: Documentation for scheduler energy cost model

On Thu, Jun 05, 2014 at 09:49:35AM +0100, Vincent Guittot wrote:
> Hi Morten,
>
> On 23 May 2014 20:16, Morten Rasmussen <[email protected]> wrote:
> > This documentation patch provide a brief overview of the experimental
> > scheduler energy costing model and associated data structures.
> >
> > Signed-off-by: Morten Rasmussen <[email protected]>
> > ---
> > Documentation/scheduler/sched-energy.txt | 66 ++++++++++++++++++++++++++++++
> > 1 file changed, 66 insertions(+)
> > create mode 100644 Documentation/scheduler/sched-energy.txt
> >
> > diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
> > new file mode 100644
> > index 0000000..c6896c0
> > --- /dev/null
> > +++ b/Documentation/scheduler/sched-energy.txt
> > @@ -0,0 +1,66 @@
> > +Energy cost model for energy-aware scheduling (EXPERIMENTAL)
> > +
> > +Introduction
> > +=============
> > +The basic energy model uses platform energy data stored in sched_energy data
> > +structures attached to the sched_groups in the sched_domain hierarchy. The
> > +energy cost model offers two function that can be used to guide scheduling
> > +decisions:
> > +
> > +1. energy_diff_util(cpu, util, wakeups)
>
> Could you give us more details of what util and wakeups are?
> Is util an absolute value or a delta?
> Is wakeups a boolean, or does it define the number of tasks/cpus
> that wake up?

Good point... It is not clear at all. Improving the documentation is at
the top of my todo list.

cpu: The cpu in question.

util: Is a signed utilization delta. That is, the amount of utilization
we want to add to or remove from the cpu. We don't have a good metric for
utilization yet (I assume you have followed the thread on that topic
that started from your recent patch posting), so for now I have used
load_avg_contrib. energy_diff_task() just passes the task's
load_avg_contrib as the utilization to energy_diff_util().

wakeups: Is the number of wakeups (task enqueues, not idle exits) caused
by the utilization we are about to add or remove from the cpu. We need
to pick some period to measure the wakeups over. For that I have
introduced task wakeup tracking, very similar to the existing load tracking.
The wakeup tracking gives us an indication of how often a task will
cause an idle exit if it ran alone on a cpu. For short but frequently
running tasks, the wakeup cost may be where the majority of the energy
is spent.

>
> > +2. energy_diff_task(cpu, task)
> > +
> > +Both return the energy cost delta caused by adding/removing utilization or a
> > +task to/from a specific cpu.
> > +
> > +CONFIG_SCHED_ENERGY needs to be defined in Kconfig to enable the energy cost
> > +model and associated data structures.
> > +
> > +The basic algorithm
> > +====================
> > +The basic idea is to determine the energy cost at each level in sched_domain
> > +hierarchy based on utilization:
> > +
> > + for_each_domain(cpu, sd) {
> > + sg = sched_group_of(cpu)
> > + energy_before = curr_util(sg) * busy_power(sg)
> > + + 1-curr_util(sg) * idle_power(sg)
> > + energy_after = new_util(sg) * busy_power(sg)
> > + + 1-new_util(sg) * idle_power(sg)
> > + + new_util(sg) * task_wakeups
> > + * wakeup_energy(sg)
> > + energy_diff += energy_before - energy_after
> > + }
> > +
> > + return energy_diff
>
> So this is the algorithm used in energy_diff_util and energy_diff_task ?

It is. energy_diff_task() is basically just a wrapper for
energy_diff_util().

> It's not straightforward for me to map the algorithm variables to the
> function arguments.

The pseudo-code above is very simplified. It is an attempt to show that
the algorithm goes up the sched_domain hierarchy and estimates the
energy impact of adding/removing 'util' amount of utilization to/from
the cpu.

{curr, new}_util is the cpu utilization at the lowest level and
the overall non-idle time for the entire group for higher levels.
Utilization is in the range 0.0 to 1.0.

busy_power is the power consumption of the group (for TC2, cpu at the
lowest level, cluster at the next).

idle_power is the power consumption of the group while idle (for TC2,
WFI at the lowest level, cluster power down at cluster level).

task_wakeups (should have been just 'wakeups' in the general case) is the
number of wakeups caused by the utilization we are adding/removing. To
predict how many of the wakeups cause idle exits, we scale the
number by the utilization (assuming that wakeups are uniformly
distributed). wakeup_energy is the energy consumed for an idle
exit/entry cycle for the group (for TC2, WFI at lowest level, cluster
power down at cluster level).

At each level we need to compute the energy before and after the change
to find the energy delta.
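
A minimal user-space sketch of the per-level computation described above,
not the code from the patch set. struct level, clamp01(), level_energy()
and energy_diff_sketch() are hypothetical names made up for illustration.
The sketch assumes floating-point utilization in the range 0.0 to 1.0,
applies the same utilization delta at every level (a simplification,
since higher levels really use the group's overall non-idle time), and
writes the (1 - util) factor with explicit parentheses:

```c
/* Hypothetical per-level data; stands in for one sched_group's energy data. */
struct level {
	double busy_power;    /* power consumption of the group while busy */
	double idle_power;    /* power consumption of the group while idle */
	double wakeup_energy; /* energy of one idle exit/entry cycle */
	double curr_util;     /* current utilization, 0.0 to 1.0 */
};

/* Keep utilization inside the 0.0 to 1.0 range. */
double clamp01(double x)
{
	return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);
}

/* Busy plus idle cost of one level at the given utilization. */
double level_energy(const struct level *l, double util)
{
	return util * l->busy_power + (1.0 - util) * l->idle_power;
}

/*
 * Walk the levels from lowest to highest and accumulate the difference.
 * The sign follows the pseudo-code (before - after), so a negative result
 * means the change is estimated to cost more energy.
 */
double energy_diff_sketch(const struct level *levels, int nr_levels,
			  double util_delta, double wakeups)
{
	double diff = 0.0;
	int i;

	for (i = 0; i < nr_levels; i++) {
		const struct level *l = &levels[i];
		double new_util = clamp01(l->curr_util + util_delta);
		double before = level_energy(l, l->curr_util);
		/* Wakeups are scaled by utilization to estimate idle exits. */
		double after = level_energy(l, new_util)
			       + new_util * wakeups * l->wakeup_energy;

		diff += before - after;
	}

	return diff;
}
```

For a single level with busy_power 10, idle_power 1 and curr_util 0.5,
adding 0.25 utilization with no wakeups gives 5.5 - 7.75 = -2.25 in
whatever normalized unit the platform data uses.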

Does that answer your question?

>
> > +
> > +Platform energy data
> > +=====================
> > +struct sched_energy has the following members:
> > +
> > +cap_states:
> > + List of struct capacity_state representing the supported capacity states
> > + (P-states). struct capacity_state has two members: cap and power, which
> > + represents the compute capacity and the busy power of the state. The
> > + list must ordered by capacity low->high.
> > +
> > +nr_cap_states:
> > + Number of capacity states in cap_states.
> > +
> > +max_capacity:
> > + The highest capacity supported by any of the capacity states in
> > + cap_states.
>
> Can't you directly use cap_states[nr_cap_states - 1].cap, as the array is ordered?

Yes, indeed. max_capacity can be removed.
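
As a rough sketch of what that could look like, assuming the list stays
ordered by capacity low->high: the highest capacity is simply the last
entry. The field types, struct sched_energy_sketch and the helper
max_capacity_of() are assumptions for illustration only and are not
taken from the patch:

```c
/* Two members as described in the documentation; the types are assumed. */
struct capacity_state {
	unsigned long cap;   /* compute capacity of the state */
	unsigned long power; /* busy power of the state */
};

/* Hypothetical reduced view of struct sched_energy without max_capacity. */
struct sched_energy_sketch {
	unsigned int nr_cap_states;              /* entries in cap_states */
	const struct capacity_state *cap_states; /* ordered by cap, low->high */
};

/* Highest capacity is the last entry; 0 if there are no states at all. */
unsigned long max_capacity_of(const struct sched_energy_sketch *se)
{
	if (!se || !se->cap_states || se->nr_cap_states == 0)
		return 0;

	return se->cap_states[se->nr_cap_states - 1].cap;
}
```

Note the index is nr_cap_states - 1, since the array is zero-based.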

Morten

2014-06-05 15:02:43

by Vincent Guittot

Subject: Re: [RFC PATCH 01/16] sched: Documentation for scheduler energy cost model

On 5 June 2014 13:35, Morten Rasmussen <[email protected]> wrote:
> On Thu, Jun 05, 2014 at 09:49:35AM +0100, Vincent Guittot wrote:
>> Hi Morten,
>>
>> On 23 May 2014 20:16, Morten Rasmussen <[email protected]> wrote:
>> > This documentation patch provide a brief overview of the experimental
>> > scheduler energy costing model and associated data structures.
>> >
>> > Signed-off-by: Morten Rasmussen <[email protected]>
>> > ---
>> > Documentation/scheduler/sched-energy.txt | 66 ++++++++++++++++++++++++++++++
>> > 1 file changed, 66 insertions(+)
>> > create mode 100644 Documentation/scheduler/sched-energy.txt
>> >
>> > diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
>> > new file mode 100644
>> > index 0000000..c6896c0
>> > --- /dev/null
>> > +++ b/Documentation/scheduler/sched-energy.txt
>> > @@ -0,0 +1,66 @@
>> > +Energy cost model for energy-aware scheduling (EXPERIMENTAL)
>> > +
>> > +Introduction
>> > +=============
>> > +The basic energy model uses platform energy data stored in sched_energy data
>> > +structures attached to the sched_groups in the sched_domain hierarchy. The
>> > +energy cost model offers two function that can be used to guide scheduling
>> > +decisions:
>> > +
>> > +1. energy_diff_util(cpu, util, wakeups)
>>
>> Could you give us more details of what util and wakeups are?
>> Is util an absolute value or a delta?
>> Is wakeups a boolean, or does it define the number of tasks/cpus
>> that wake up?
>
> Good point... It is not clear at all. Improving the documentation is at
> the top of my todo list.
>
> cpu: The cpu in question.
>
> util: Is a signed utilization delta. That is, the amount of utilization
> we want to add to or remove from the cpu. We don't have a good metric for
> utilization yet (I assume you have followed the thread on that topic
> that started from your recent patch posting), so for now I have used
> load_avg_contrib. energy_diff_task() just passes the task's
> load_avg_contrib as the utilization to energy_diff_util().
>
> wakeups: Is the number of wakeups (task enqueues, not idle exits) caused
> by the utilization we are about to add or remove from the cpu. We need
> to pick some period to measure the wakeups over. For that I have
> introduced task wakeup tracking, very similar to the existing load tracking.
> The wakeup tracking gives us an indication of how often a task will
> cause an idle exit if it ran alone on a cpu. For short but frequently
> running tasks, the wakeup cost may be where the majority of the energy
> is spent.
>
>>
>> > +2. energy_diff_task(cpu, task)
>> > +
>> > +Both return the energy cost delta caused by adding/removing utilization or a
>> > +task to/from a specific cpu.
>> > +
>> > +CONFIG_SCHED_ENERGY needs to be defined in Kconfig to enable the energy cost
>> > +model and associated data structures.
>> > +
>> > +The basic algorithm
>> > +====================
>> > +The basic idea is to determine the energy cost at each level in sched_domain
>> > +hierarchy based on utilization:
>> > +
>> > + for_each_domain(cpu, sd) {
>> > + sg = sched_group_of(cpu)
>> > + energy_before = curr_util(sg) * busy_power(sg)
>> > + + 1-curr_util(sg) * idle_power(sg)
>> > + energy_after = new_util(sg) * busy_power(sg)
>> > + + 1-new_util(sg) * idle_power(sg)
>> > + + new_util(sg) * task_wakeups
>> > + * wakeup_energy(sg)
>> > + energy_diff += energy_before - energy_after
>> > + }
>> > +
>> > + return energy_diff
>>
>> So this is the algorithm used in energy_diff_util and energy_diff_task ?
>
> It is. energy_diff_task() is basically just a wrapper for
> energy_diff_util().
>
>> It's not straightforward for me to map the algorithm variables to the
>> function arguments.
>
> The pseudo-code above is very simplified. It is an attempt to show that
> the algorithm goes up the sched_domain hierarchy and estimates the
> energy impact of adding/removing 'util' amount of utilization to/from
> the cpu.
>
> {curr, new}_util is the cpu utilization at the lowest level and
> the overall non-idle time for the entire group for higher levels.
> Utilization is in the range 0.0 to 1.0.
>
> busy_power is the power consumption of the group (for TC2, cpu at the
> lowest level, cluster at the next).
>
> idle_power is the power consumption of the group while idle (for TC2,
> WFI at the lowest level, cluster power down at cluster level).
>
> task_wakeups (should have been just 'wakeups' in the general case) is the
> number of wakeups caused by the utilization we are adding/removing. To
> predict how many of the wakeups cause idle exits, we scale the
> number by the utilization (assuming that wakeups are uniformly
> distributed). wakeup_energy is the energy consumed for an idle
> exit/entry cycle for the group (for TC2, WFI at lowest level, cluster
> power down at cluster level).
>
> At each level we need to compute the energy before and after the change
> to find the energy delta.
>
> Does that answer your question?

yes, thanks

>
>>
>> > +
>> > +Platform energy data
>> > +=====================
>> > +struct sched_energy has the following members:
>> > +
>> > +cap_states:
>> > + List of struct capacity_state representing the supported capacity states
>> > + (P-states). struct capacity_state has two members: cap and power, which
>> > + represents the compute capacity and the busy power of the state. The
>> > + list must ordered by capacity low->high.
>> > +
>> > +nr_cap_states:
>> > + Number of capacity states in cap_states.
>> > +
>> > +max_capacity:
>> > + The highest capacity supported by any of the capacity states in
>> > + cap_states.
>>
>> Can't you directly use cap_states[nr_cap_states - 1].cap, as the array is ordered?
>
> Yes, indeed. max_capacity can be removed.
>
> Morten
>