2014-06-03 11:44:40

by Peter Zijlstra

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Fri, May 23, 2014 at 07:16:33PM +0100, Morten Rasmussen wrote:
> +static struct capacity_state cap_states_cluster_a7[] = {
> + /* Cluster only power */
> + { .cap = 358, .power = 2967, }, /* 350 MHz */
> + { .cap = 410, .power = 2792, }, /* 400 MHz */
> + { .cap = 512, .power = 2810, }, /* 500 MHz */
> + { .cap = 614, .power = 2815, }, /* 600 MHz */
> + { .cap = 717, .power = 2919, }, /* 700 MHz */
> + { .cap = 819, .power = 2847, }, /* 800 MHz */
> + { .cap = 922, .power = 3917, }, /* 900 MHz */
> + { .cap = 1024, .power = 4905, }, /* 1000 MHz */
> + };
> +
> +static struct capacity_state cap_states_cluster_a15[] = {
> + /* Cluster only power */
> + { .cap = 840, .power = 7920, }, /* 500 MHz */
> + { .cap = 1008, .power = 8165, }, /* 600 MHz */
> + { .cap = 1176, .power = 8172, }, /* 700 MHz */
> + { .cap = 1343, .power = 8195, }, /* 800 MHz */
> + { .cap = 1511, .power = 8265, }, /* 900 MHz */
> + { .cap = 1679, .power = 8446, }, /* 1000 MHz */
> + { .cap = 1847, .power = 11426, }, /* 1100 MHz */
> + { .cap = 2015, .power = 15200, }, /* 1200 MHz */
> + };


So how did you obtain these numbers? Did you use numbers provided by the
hardware people, or did you run a particular benchmark and record the
power usage?

Does that benchmark do some actual work (as opposed to a while(1) loop)
to keep more silicon lit up?

If you have a setup for measuring these, should we try and publish that
too so that people can run it on their platform and provide these
numbers?



2014-06-04 15:42:22

by Morten Rasmussen

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Tue, Jun 03, 2014 at 12:44:28PM +0100, Peter Zijlstra wrote:
> On Fri, May 23, 2014 at 07:16:33PM +0100, Morten Rasmussen wrote:
> > +static struct capacity_state cap_states_cluster_a7[] = {
> > + /* Cluster only power */
> > + { .cap = 358, .power = 2967, }, /* 350 MHz */
> > + { .cap = 410, .power = 2792, }, /* 400 MHz */
> > + { .cap = 512, .power = 2810, }, /* 500 MHz */
> > + { .cap = 614, .power = 2815, }, /* 600 MHz */
> > + { .cap = 717, .power = 2919, }, /* 700 MHz */
> > + { .cap = 819, .power = 2847, }, /* 800 MHz */
> > + { .cap = 922, .power = 3917, }, /* 900 MHz */
> > + { .cap = 1024, .power = 4905, }, /* 1000 MHz */
> > + };
> > +
> > +static struct capacity_state cap_states_cluster_a15[] = {
> > + /* Cluster only power */
> > + { .cap = 840, .power = 7920, }, /* 500 MHz */
> > + { .cap = 1008, .power = 8165, }, /* 600 MHz */
> > + { .cap = 1176, .power = 8172, }, /* 700 MHz */
> > + { .cap = 1343, .power = 8195, }, /* 800 MHz */
> > + { .cap = 1511, .power = 8265, }, /* 900 MHz */
> > + { .cap = 1679, .power = 8446, }, /* 1000 MHz */
> > + { .cap = 1847, .power = 11426, }, /* 1100 MHz */
> > + { .cap = 2015, .power = 15200, }, /* 1200 MHz */
> > + };
>
>
> So how did you obtain these numbers? Did you use numbers provided by the
> hardware people, or did you run a particular benchmark and record the
> power usage?
>
> Does that benchmark do some actual work (as opposed to a while(1) loop)
> to keep more silicon lit up?

Hardware people don't like sharing data, so I did my own measurements
and calculations to get the numbers above.

ARM TC2 has on-chip energy counters for counting energy consumed by the
A7 and A15 clusters. They are fairly accurate. I used sysbench cpu
benchmark as test workload for the above numbers. sysbench might not be
a representative workload, but it is easy to use. I think, ideally,
vendors would run their own mix of workloads they care about and derive
their numbers for their platform based on that.

> If you have a setup for measuring these, should we try and publish that
> too so that people can run it on their platform and provide these
> numbers?

The workload setup I used is quite simple. I ran sysbench with taskset with
different numbers of threads to extrapolate power consumed by each
individual cpu and how much comes from just powering on the domain.
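
One way to read that extrapolation is as a straight-line fit: if each run
keeps n cpus busy, measured power is roughly cluster power plus n times
per-cpu power. The sketch below is only an illustration of that idea, not
the scripts used on TC2; the function name and the sample readings are
hypothetical.

```python
def split_cluster_and_cpu_power(samples):
    """Fit power = cluster_power + n_busy * per_cpu_power by least squares.

    samples: list of (n_busy_cpus, measured_power) pairs, one per run.
    Returns (cluster_power, per_cpu_power) in the units of measured_power.
    """
    if len({n for n, _ in samples}) < 2:
        raise ValueError("need runs with at least two different thread counts")
    count = len(samples)
    mean_n = sum(n for n, _ in samples) / count
    mean_p = sum(p for _, p in samples) / count
    covariance = sum((n - mean_n) * (p - mean_p) for n, p in samples)
    variance = sum((n - mean_n) ** 2 for n, _ in samples)
    per_cpu_power = covariance / variance              # slope of the line
    cluster_power = mean_p - per_cpu_power * mean_n    # intercept at n = 0
    return cluster_power, per_cpu_power


# Hypothetical readings in arbitrary units, NOT TC2 measurements.
runs = [(1, 130.0), (2, 160.0), (3, 190.0)]
cluster, per_cpu = split_cluster_and_cpu_power(runs)
print(cluster, per_cpu)
```

The intercept is the part attributed to just powering on the domain and the
slope is the per-cpu share. This has to be repeated per frequency, and it
assumes every busy cpu adds the same amount, which may not hold on real
hardware.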

Measuring the actual power is very platform specific. Developing a fully
automated tool to do it for any given platform isn't straightforward, but
I'm happy to share how I did it. I can add a description of the method I
used on TC2 to the documentation so others can use it as reference.

2014-06-04 16:26:16

by Peter Zijlstra

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Wed, Jun 04, 2014 at 04:42:27PM +0100, Morten Rasmussen wrote:
> On Tue, Jun 03, 2014 at 12:44:28PM +0100, Peter Zijlstra wrote:
> > On Fri, May 23, 2014 at 07:16:33PM +0100, Morten Rasmussen wrote:
> > > +static struct capacity_state cap_states_cluster_a7[] = {
> > > + /* Cluster only power */
> > > + { .cap = 358, .power = 2967, }, /* 350 MHz */
> > > + { .cap = 410, .power = 2792, }, /* 400 MHz */
> > > + { .cap = 512, .power = 2810, }, /* 500 MHz */
> > > + { .cap = 614, .power = 2815, }, /* 600 MHz */
> > > + { .cap = 717, .power = 2919, }, /* 700 MHz */
> > > + { .cap = 819, .power = 2847, }, /* 800 MHz */
> > > + { .cap = 922, .power = 3917, }, /* 900 MHz */
> > > + { .cap = 1024, .power = 4905, }, /* 1000 MHz */
> > > + };
> > > +
> > > +static struct capacity_state cap_states_cluster_a15[] = {
> > > + /* Cluster only power */
> > > + { .cap = 840, .power = 7920, }, /* 500 MHz */
> > > + { .cap = 1008, .power = 8165, }, /* 600 MHz */
> > > + { .cap = 1176, .power = 8172, }, /* 700 MHz */
> > > + { .cap = 1343, .power = 8195, }, /* 800 MHz */
> > > + { .cap = 1511, .power = 8265, }, /* 900 MHz */
> > > + { .cap = 1679, .power = 8446, }, /* 1000 MHz */
> > > + { .cap = 1847, .power = 11426, }, /* 1100 MHz */
> > > + { .cap = 2015, .power = 15200, }, /* 1200 MHz */
> > > + };
> >
> >
> > So how did you obtain these numbers? Did you use numbers provided by the
> > hardware people, or did you run a particular benchmark and record the
> > power usage?
> >
> > Does that benchmark do some actual work (as opposed to a while(1) loop)
> > to keep more silicon lit up?
>
> Hardware people don't like sharing data, so I did my own measurements
> and calculations to get the numbers above.
>
> ARM TC2 has on-chip energy counters for counting energy consumed by the
> A7 and A15 clusters. They are fairly accurate.

Recent Intel chips have that too; they come packaged as:

perf stat -a -e "power/energy-cores/" -- cmd

(through the perf_event_intel_rapl.c driver). It would be ideal if the
ARM equivalent was available through a similar interface.

http://lwn.net/Articles/573602/

> I used sysbench cpu
> benchmark as test workload for the above numbers. sysbench might not be
> a representative workload, but it is easy to use. I think, ideally,
> vendors would run their own mix of workloads they care about and derive
> their numbers for their platform based on that.
>
> > If you have a setup for measuring these, should we try and publish that
> > too so that people can run it on their platform and provide these
> > numbers?
>
> The workload setup I used is quite simple. I ran sysbench with taskset with
> different numbers of threads to extrapolate power consumed by each
> individual cpu and how much comes from just powering on the domain.
>
> Measuring the actual power is very platform specific. Developing a fully
> automated tool to do it for any given platform isn't straightforward, but
> I'm happy to share how I did it. I can add a description of the method I
> used on TC2 to the documentation so others can use it as reference.

That would be good I think, esp. if we can get similar perf based energy
measurement things sorted. And if we make the tool consume the machine
topology present in sysfs we can get a long way towards automating this
I think.
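
A minimal sketch of what consuming the sysfs topology could look like,
assuming the clusters show up as distinct physical_package_id values under
/sys/devices/system/cpu (that mapping is an assumption about the platform,
and the function name is hypothetical):

```python
import glob
import os
from collections import defaultdict


def cpus_by_package(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Group CPU numbers by the physical_package_id reported in sysfs."""
    groups = defaultdict(list)
    pattern = os.path.join(
        sysfs_cpu_root, "cpu[0-9]*", "topology", "physical_package_id"
    )
    for path in glob.glob(pattern):
        # path is .../cpuN/topology/physical_package_id; recover N.
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            package_id = int(f.read().strip())
        groups[package_id].append(int(cpu_dir[len("cpu"):]))
    return {pkg: sorted(cpus) for pkg, cpus in groups.items()}


if __name__ == "__main__":
    # Print one taskset CPU list per package as a starting point for pinning.
    for package_id, cpus in sorted(cpus_by_package().items()):
        cpu_list = ",".join(str(cpu) for cpu in cpus)
        print(f"package {package_id}: taskset -c {cpu_list} <workload>")
```

The resulting cpu lists can then drive the per-cluster taskset runs without
hard-coding which cpus belong to which cluster.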




2014-06-06 13:15:15

by Morten Rasmussen

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Wed, Jun 04, 2014 at 05:16:18PM +0100, Peter Zijlstra wrote:
> On Wed, Jun 04, 2014 at 04:42:27PM +0100, Morten Rasmussen wrote:
> > On Tue, Jun 03, 2014 at 12:44:28PM +0100, Peter Zijlstra wrote:
> > > On Fri, May 23, 2014 at 07:16:33PM +0100, Morten Rasmussen wrote:
> > > > +static struct capacity_state cap_states_cluster_a7[] = {
> > > > + /* Cluster only power */
> > > > + { .cap = 358, .power = 2967, }, /* 350 MHz */
> > > > + { .cap = 410, .power = 2792, }, /* 400 MHz */
> > > > + { .cap = 512, .power = 2810, }, /* 500 MHz */
> > > > + { .cap = 614, .power = 2815, }, /* 600 MHz */
> > > > + { .cap = 717, .power = 2919, }, /* 700 MHz */
> > > > + { .cap = 819, .power = 2847, }, /* 800 MHz */
> > > > + { .cap = 922, .power = 3917, }, /* 900 MHz */
> > > > + { .cap = 1024, .power = 4905, }, /* 1000 MHz */
> > > > + };
> > > > +
> > > > +static struct capacity_state cap_states_cluster_a15[] = {
> > > > + /* Cluster only power */
> > > > + { .cap = 840, .power = 7920, }, /* 500 MHz */
> > > > + { .cap = 1008, .power = 8165, }, /* 600 MHz */
> > > > + { .cap = 1176, .power = 8172, }, /* 700 MHz */
> > > > + { .cap = 1343, .power = 8195, }, /* 800 MHz */
> > > > + { .cap = 1511, .power = 8265, }, /* 900 MHz */
> > > > + { .cap = 1679, .power = 8446, }, /* 1000 MHz */
> > > > + { .cap = 1847, .power = 11426, }, /* 1100 MHz */
> > > > + { .cap = 2015, .power = 15200, }, /* 1200 MHz */
> > > > + };
> > >
> > >
> > > So how did you obtain these numbers? Did you use numbers provided by the
> > > hardware people, or did you run a particular benchmark and record the
> > > power usage?
> > >
> > > Does that benchmark do some actual work (as opposed to a while(1) loop)
> > > to keep more silicon lit up?
> >
> > Hardware people don't like sharing data, so I did my own measurements
> > and calculations to get the numbers above.
> >
> > ARM TC2 has on-chip energy counters for counting energy consumed by the
> > A7 and A15 clusters. They are fairly accurate.
>
> Recent Intel chips have that too; they come packaged as:
>
> perf stat -a -e "power/energy-cores/" -- cmd
>
> (through the perf_event_intel_rapl.c driver). It would be ideal if the
> ARM equivalent was available through a similar interface.
>
> http://lwn.net/Articles/573602/

Nice. On ARM it is not mandatory to have energy counters and what they
actually measure if they are implemented is implementation dependent.
However, each vendor does extensive evaluation and characterization of
their implementation already, so I don't think it would be a problem for
them to provide the numbers.

> > I used sysbench cpu
> > benchmark as test workload for the above numbers. sysbench might not be
> > a representative workload, but it is easy to use. I think, ideally,
> > vendors would run their own mix of workloads they care about and derive
> > their numbers for their platform based on that.
> >
> > > If you have a setup for measuring these, should we try and publish that
> > > too so that people can run it on their platform and provide these
> > > numbers?
> >
> > The workload setup I used is quite simple. I ran sysbench with taskset with
> > different numbers of threads to extrapolate power consumed by each
> > individual cpu and how much comes from just powering on the domain.
> >
> > Measuring the actual power is very platform specific. Developing a fully
> > automated tool to do it for any given platform isn't straightforward, but
> > I'm happy to share how I did it. I can add a description of the method I
> > used on TC2 to the documentation so others can use it as reference.
>
> That would be good I think, esp. if we can get similar perf based energy
> measurement things sorted. And if we make the tool consume the machine
> topology present in sysfs we can get a long way towards automating this
> I think.

Some of the measurements could be automated. Others are hard to
automate as they require extensive knowledge about the platform (wakeup
energy, for example). You may need to do various tricks and hacks to
force the platform to use a specific idle-state so you know what you are
measuring.

I will add the TC2 recipe as a start and then see if my ugly scripts can
be turned into something generally useful.
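
One such trick, sketched here under the assumption that the platform uses
the generic cpuidle sysfs interface with a per-state "disable" file, is to
switch off every idle state except the one being measured. The function
name is hypothetical, and this is not the TC2 recipe itself:

```python
import glob
import os


def allow_only_idle_state(cpu, keep_state,
                          sysfs_cpu_root="/sys/devices/system/cpu"):
    """Disable every cpuidle state of `cpu` except `keep_state`.

    Returns the sorted list of state indices that were disabled.
    Writing to the real sysfs files needs root.
    """
    disabled = []
    pattern = os.path.join(
        sysfs_cpu_root, f"cpu{cpu}", "cpuidle", "state[0-9]*"
    )
    for state_dir in glob.glob(pattern):
        index = int(os.path.basename(state_dir)[len("state"):])
        # "1" disables the state, "0" leaves it available to the governor.
        with open(os.path.join(state_dir, "disable"), "w") as f:
            f.write("0" if index == keep_state else "1")
        if index != keep_state:
            disabled.append(index)
    return sorted(disabled)
```

This only narrows what the governor may pick. Whether the hardware really
enters the remaining state (a cluster-level state, for instance, typically
needs all cpus in the cluster idle) still has to be checked per platform.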

2014-06-06 13:43:08

by Peter Zijlstra

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Fri, Jun 06, 2014 at 02:15:10PM +0100, Morten Rasmussen wrote:
> > > ARM TC2 has on-chip energy counters for counting energy consumed by the
> > > A7 and A15 clusters. They are fairly accurate.
> >
> > Recent Intel chips have that too; they come packaged as:
> >
> > perf stat -a -e "power/energy-cores/" -- cmd
> >
> > (through the perf_event_intel_rapl.c driver). It would be ideal if the
> > ARM equivalent was available through a similar interface.
> >
> > http://lwn.net/Articles/573602/
>
> Nice. On ARM it is not mandatory to have energy counters and what they
> actually measure if they are implemented is implementation dependent.
> However, each vendor does extensive evaluation and characterization of
> their implementation already, so I don't think it would be a problem for
> them to provide the numbers.

How is the ARM energy thing exposed? Through the regular PMU but with
vendor specific events, or through a separate interface, entirely vendor
specific?

In any case, would it be at all possible to nudge them to provide a
'driver' for this so that they can be more easily used?

> Some of the measurements could be automated. Others are hard to
> automate as they require extensive knowledge about the platform (wakeup
> energy, for example). You may need to do various tricks and hacks to
> force the platform to use a specific idle-state so you know what you are
> measuring.
>
> I will add the TC2 recipe as a start and then see if my ugly scripts can
> be turned into something generally useful.

Fair enough; I would prefer to have a situation where 'we' can validate
whatever magic numbers the vendors provide for their hardware, or can
generate numbers for hardware where the vendor is not interested.

But yes, publishing your hacks is a good first step at getting such a
thing going, if we then further require everybody to use this 'tool' and
improve if not suitable, we might end up with something useful ;-)

2014-06-06 14:29:57

by Morten Rasmussen

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Fri, Jun 06, 2014 at 02:43:03PM +0100, Peter Zijlstra wrote:
> On Fri, Jun 06, 2014 at 02:15:10PM +0100, Morten Rasmussen wrote:
> > > > ARM TC2 has on-chip energy counters for counting energy consumed by the
> > > > A7 and A15 clusters. They are fairly accurate.
> > >
> > > Recent Intel chips have that too; they come packaged as:
> > >
> > > perf stat -a -e "power/energy-cores/" -- cmd
> > >
> > > (through the perf_event_intel_rapl.c driver). It would be ideal if the
> > > ARM equivalent was available through a similar interface.
> > >
> > > http://lwn.net/Articles/573602/
> >
> > Nice. On ARM it is not mandatory to have energy counters and what they
> > actually measure if they are implemented is implementation dependent.
> > However, each vendor does extensive evaluation and characterization of
> > their implementation already, so I don't think it would be a problem for
> > them to provide the numbers.
>
> How is the ARM energy thing exposed? Through the regular PMU but with
> vendor specific events, or through a separate interface, entirely vendor
> specific?

There is an upstream hwmon driver for TC2 already with an easy to use
sysfs interface for all the energy counters. So it is somewhat vendor
specific at the moment unfortunately.
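
For reference, a hedged sketch of turning such a sysfs energy counter into
an average power figure. It assumes the counter follows the generic hwmon
convention of a cumulative energyN_input file in microjoules; the exact
file for each TC2 cluster is not given here, so the path is a parameter,
and the function names are hypothetical.

```python
import time


def read_microjoules(path):
    """Read one cumulative hwmon energy counter (energyN_input, microjoules)."""
    with open(path) as f:
        return int(f.read().strip())


def average_power_watts(energy_path, run_workload, clock=time.monotonic):
    """Run `run_workload()` and return the average power drawn while it ran.

    Counter wrap-around is not handled in this sketch.
    """
    start_energy = read_microjoules(energy_path)
    start_time = clock()
    run_workload()
    elapsed = clock() - start_time
    consumed = read_microjoules(energy_path) - start_energy
    return (consumed / 1e6) / elapsed   # joules / seconds = watts
```

Here run_workload would be something like a callable that launches the
pinned sysbench run and waits for it to finish.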

> In any case, would it be at all possible to nudge them to provide a
> 'driver' for this so that they can be more easily used?

I have raised it internally that unification on this front is needed.

> > Some of the measurements could be automated. Others are hard to
> > automate as they require extensive knowledge about the platform (wakeup
> > energy, for example). You may need to do various tricks and hacks to
> > force the platform to use a specific idle-state so you know what you are
> > measuring.
> >
> > I will add the TC2 recipe as a start and then see if my ugly scripts can
> > be turned into something generally useful.
>
> Fair enough; I would prefer to have a situation where 'we' can validate
> whatever magic numbers the vendors provide for their hardware, or can
> generate numbers for hardware where the vendor is not interested.
>
> But yes, publishing your hacks is a good first step at getting such a
> thing going, if we then further require everybody to use this 'tool' and
> improve if not suitable, we might end up with something useful ;-)

Fair plan ;-)

That said, vendors may want to provide slightly different numbers if
they do characterization based on workloads they care about rather than
sysbench or whatever 'we' end up using. The numbers will vary depending
on which workload(s) you use.

2014-06-12 15:13:14

by Vince Weaver

Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

On Fri, 6 Jun 2014, Morten Rasmussen wrote:

> On Fri, Jun 06, 2014 at 02:43:03PM +0100, Peter Zijlstra wrote:
> > On Fri, Jun 06, 2014 at 02:15:10PM +0100, Morten Rasmussen wrote:
> > > > > ARM TC2 has on-chip energy counters for counting energy consumed by the
> > > > > A7 and A15 clusters. They are fairly accurate.
> > > >
> > > > Recent Intel chips have that too; they come packaged as:
> > > >
> > > > perf stat -a -e "power/energy-cores/" -- cmd
> > > >
> > > > (through the perf_event_intel_rapl.c driver). It would be ideal if the
> > > > ARM equivalent was available through a similar interface.
> > > >
> > > > http://lwn.net/Articles/573602/
> > >
> > > Nice. On ARM it is not mandatory to have energy counters and what they
> > > actually measure if they are implemented is implementation dependent.
> > > However, each vendor does extensive evaluation and characterization of
> > > their implementation already, so I don't think it would be a problem for
> > > them to provide the numbers.
> >
> > How is the ARM energy thing exposed? Through the regular PMU but with
> > vendor specific events, or through a separate interface, entirely vendor
> > specific?
>
> There is an upstream hwmon driver for TC2 already with an easy to use
> sysfs interface for all the energy counters. So it is somewhat vendor
> specific at the moment unfortunately.

What is the plan about future interfaces for energy info?

Intel RAPL of course has a perf_event interface.

However AMD's (somewhat unfortunately acronymed) Application Power
Management exports similar information via hwmon and the fam15h_power
driver.

And it sounds like ARM systems also put things in hwmon.

User tools like PAPI can sort of abstract this (for example it supports
getting RAPL data from perf_event while it also has a driver for getting
info from hwmon). But users stuck with perf end up having to use multiple
tools to get energy and performance info simultaneously on non-Intel
hardware.

Vince