2010-12-02 05:18:01

by Lin Ming

Subject: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events

Uncore events are monitored as raw events with an "ru" prefix ("u" for uncore).
Note that per-task uncore events are not allowed.

$ ./perf stat -e ru0101 -- ls
No permission to collect stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid.

$ ./perf stat -a -C 0 -e ru0101 -- ls

Performance counter stats for 'ls':

949585 raw 0x101

0.001741513 seconds time elapsed

TODO:

- Add a new sub-command for uncore events statistics.
For example, perf package -p <package-id> -e <event>

Signed-off-by: Lin Ming <[email protected]>
---
tools/perf/util/parse-events.c | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 73215e7..81df119 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -278,7 +278,7 @@ const char *__event_name(int type, u64 config)
 {
 	static char buf[32];

-	if (type == PERF_TYPE_RAW) {
+	if (type == PERF_TYPE_RAW || type == PERF_TYPE_UNCORE) {
 		sprintf(buf, "raw 0x%llx", config);
 		return buf;
 	}
@@ -659,14 +659,20 @@ parse_raw_event(const char **strp, struct perf_event_attr *attr)
 {
 	const char *str = *strp;
 	u64 config;
+	int uncore = 0;
 	int n;

 	if (*str != 'r')
 		return EVT_FAILED;
-	n = hex2u64(str + 1, &config);
+	if (*(str+1) == 'u')
+		uncore = 1;
+	n = hex2u64(str + 1 + uncore, &config);
 	if (n > 0) {
-		*strp = str + n + 1;
-		attr->type = PERF_TYPE_RAW;
+		*strp = str + n + 1 + uncore;
+		if (!uncore)
+			attr->type = PERF_TYPE_RAW;
+		else
+			attr->type = PERF_TYPE_UNCORE;
 		attr->config = config;
 		return EVT_HANDLED;
 	}
--
1.5.3
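
For readers who want to try the prefix handling outside the perf tree, here is a minimal standalone sketch of what the second hunk above does: accept "r<hex>" or "ru<hex>" and report the event type and config value. The names sketch_parse_raw, sketch_hex_digit and enum sketch_type are hypothetical stand-ins (the patch itself uses hex2u64() and the PERF_TYPE_* constants), and the 16-digit limit is an assumption added here, not something the patch checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PERF_TYPE_RAW and the patch's PERF_TYPE_UNCORE. */
enum sketch_type { SKETCH_TYPE_RAW, SKETCH_TYPE_UNCORE };

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int sketch_hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse "r<hex>" or "ru<hex>" at the start of str.
 * Returns the number of characters consumed, or 0 if str is not a raw
 * event (wrong prefix, no hex digits, or more than 16 hex digits).
 */
static size_t sketch_parse_raw(const char *str, enum sketch_type *type,
			       uint64_t *config)
{
	size_t prefix, n = 0;
	uint64_t value = 0;
	int digit;

	assert(str && type && config);

	if (str[0] != 'r')
		return 0;
	/* An optional 'u' right after 'r' selects the uncore type. */
	prefix = (str[1] == 'u') ? 2 : 1;

	while ((digit = sketch_hex_digit(str[prefix + n])) >= 0) {
		if (++n > 16)
			return 0;	/* would not fit in 64 bits */
		value = (value << 4) | (uint64_t)digit;
	}
	if (n == 0)
		return 0;

	*type = (prefix == 2) ? SKETCH_TYPE_UNCORE : SKETCH_TYPE_RAW;
	*config = value;
	return prefix + n;
}
```

Called on "ru0101", this sketch consumes all six characters and yields the uncore type with config 0x101, which matches the "raw 0x101" line in the perf stat output above.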



2010-12-14 01:28:46

by Corey Ashford

Subject: Re: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events

On 12/01/2010 09:20 PM, Lin Ming wrote:
> Uncore events are monitored with raw events with "ru" prefix("u" for uncore).
> Note that, per-task uncore event is not allowed.
>
> $ ./perf stat -e ru0101 -- ls
> No permission to collect stats.
> Consider tweaking /proc/sys/kernel/perf_event_paranoid.
>
> ./perf stat -a -C 0 -e ru0101 -- ls

Sorry for replying to this thread so late, but I have some concerns
about this modification.

First of all, "uncore" is an x86-specific term and so it's not clear to
me if you meant for all arches to utilize this encoding for all "not
core but on the same die" events (IBM Power arch refers to this as
"nest" logic).

In the case of the IBM PowerEN chip (aka WireSpeed Processor) we have a
large number of "uncore" PMUs. It's not clear to me how we should break
them up using the syntax you've suggested here. Until now, we (IBM)
have stuck with encoding all PowerEN nest events as PERF_TYPE_RAW and
utilizing the 64-bit config value to encode which PMU, which event, and
other necessary event attribute bits.
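
As an illustration of the approach described in the paragraph above, the sketch below packs a PMU id, an event code, and extra attribute bits into one 64-bit config value. The bit layout is purely an assumption made for this example; it is not the real PowerEN encoding, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real PowerEN encoding:
 *   bits  0..31  event code
 *   bits 32..39  PMU (nest unit) id
 *   bits 40..63  extra attribute bits
 */
#define SKETCH_PMU_SHIFT	32
#define SKETCH_ATTR_SHIFT	40

/* Combine the three fields into a single 64-bit config value. */
static uint64_t sketch_pack_config(uint32_t pmu, uint32_t event, uint32_t attrs)
{
	assert(pmu < (1u << 8));	/* PMU id must fit in 8 bits */
	assert(attrs < (1u << 24));	/* attributes must fit in 24 bits */

	return ((uint64_t)attrs << SKETCH_ATTR_SHIFT) |
	       ((uint64_t)pmu << SKETCH_PMU_SHIFT) |
	       (uint64_t)event;
}

/* Extract the PMU id field. */
static uint32_t sketch_config_pmu(uint64_t config)
{
	return (uint32_t)((config >> SKETCH_PMU_SHIFT) & 0xffu);
}

/* Extract the event code field. */
static uint32_t sketch_config_event(uint64_t config)
{
	return (uint32_t)(config & 0xffffffffu);
}

/* Extract the attribute bits field. */
static uint32_t sketch_config_attrs(uint64_t config)
{
	return (uint32_t)(config >> SKETCH_ATTR_SHIFT);
}
```

With a scheme like this the user has to know the PMU id and the field positions to build the raw value by hand, which is the inconvenience the prefix and sysfs proposals in this thread try to remove.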

In one scenario, we could utilize the "u" encoding as suggested in this
patch, but then we'd be stuck with encoding the specific PMU into the
config value, really not buying us any convenience.

Another way might be to introduce a bunch of new prefixes for each of
the PMUs and add corresponding PERF_TYPE_* values. Do we want a bunch
of arch-specific PERF_TYPE_* values in include/linux/perf_event.h?
Having that many PMUs, we might want a more sophisticated prefix scheme,
perhaps something like what Stephane Eranian uses in libpfm4:
<pmu>::nnnn, e.g:

perf stat -C 0 -e runc::0101
or more verbosely,
perf stat -C 0 -e runcore::0101

For the PowerEN chip, we have PMUs for these nest functional units:
XML accelerator
Regular expression accelerator
Crypto accelerator
PowerBus interface chiplet (0..3)
Network accelerator
Memory controller Synchronous (0..1)
Memory controller Asynchronous (0..1)
etc.

I can see value in adding something like:

perf stat -C 0 -e rmcs0::1d
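
A rough sketch of how such an "r<pmu>::<hex>" string could be split into a PMU name and an event code follows. It is only an illustration of the syntax proposed above: sketch_parse_pmu_event is a hypothetical name, neither perf nor libpfm4 is claimed to provide it, and the rule that everything after "::" must be hex digits is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Parse a hypothetical "r<pmu>::<hex>" event string such as "rmcs0::1d".
 * Copies the PMU name into pmu (a buffer of pmu_len bytes) and stores the
 * event code in *config. Returns 0 on success, -1 on malformed input.
 */
static int sketch_parse_pmu_event(const char *str, char *pmu, size_t pmu_len,
				  uint64_t *config)
{
	const char *sep, *p;
	size_t name_len, ndigits = 0;
	uint64_t value = 0;

	assert(str && pmu && config);

	if (str[0] != 'r')
		return -1;
	sep = strstr(str + 1, "::");
	if (!sep)
		return -1;

	/* The PMU name is everything between the 'r' and the "::". */
	name_len = (size_t)(sep - (str + 1));
	if (name_len == 0 || name_len >= pmu_len)
		return -1;

	/* The rest of the string must consist of 1 to 16 hex digits. */
	for (p = sep + 2; *p; p++) {
		unsigned int d;

		if (*p >= '0' && *p <= '9')
			d = (unsigned int)(*p - '0');
		else if (*p >= 'a' && *p <= 'f')
			d = (unsigned int)(*p - 'a') + 10;
		else if (*p >= 'A' && *p <= 'F')
			d = (unsigned int)(*p - 'A') + 10;
		else
			return -1;
		if (++ndigits > 16)
			return -1;
		value = (value << 4) | d;
	}
	if (ndigits == 0)
		return -1;

	memcpy(pmu, str + 1, name_len);
	pmu[name_len] = '\0';
	*config = value;
	return 0;
}
```

The PMU name returned here would still have to be mapped to a PERF_TYPE_* value or some other PMU identifier, which is the open question raised above.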

On the other hand, I don't want to get carried away with this, when we
have the sysfs solution coming down the road, which I think will reduce
or eliminate the need for these prefixes.

Is PERF_TYPE_UNCORE and the "u" encoding intended to be a temporary
solution? How do you envision someone using sysfs to specify an uncore
event (especially one with a raw encoding)?

- Corey

2010-12-14 02:10:29

by Lin Ming

Subject: Re: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events

On Tue, 2010-12-14 at 09:28 +0800, Corey Ashford wrote:
> On 12/01/2010 09:20 PM, Lin Ming wrote:
> > Uncore events are monitored with raw events with "ru" prefix("u" for uncore).
> > Note that, per-task uncore event is not allowed.
> >
> > $ ./perf stat -e ru0101 -- ls
> > No permission to collect stats.
> > Consider tweaking /proc/sys/kernel/perf_event_paranoid.
> >
> > ./perf stat -a -C 0 -e ru0101 -- ls
>
> Sorry for replying to this thread so late, but I have some concerns
> about this modification.
>
> First of all, "uncore" is an x86-specific term and so it's not clear to
> me if you meant for all arches to utilize this encoding for all "not
> core but on the same die" events (IBM Power arch refers to this as
> "nest" logic).
>
> In the case of the IBM PowerEN chip (aka WireSpeed Processor) we have a
> large number of "uncore" PMUs. It's not clear to me how we should break
> them up using the syntax you've suggested here. Until now, we (IBM)
> have stuck with encoding all PowerEN nest events as PERF_TYPE_RAW and
> utilizing the 64-bit config value to encode which PMU, which event, and
> other necessary event attribute bits.
>
> In one scenario, we could utilize the "u" encoding as suggested in this
> patch, but then we'd be stuck with encoding the specific PMU into the
> config value, really not buying us any convenience.
>
> Another way might be to introduce a bunch of new prefixes for each of
> the PMU's and add corresponding PERF_TYPE_* values. Do we want a bunch
> of arch-specific PERF_TYPE_* values in include/linux/perf_event.h?
> Having that many PMUs, we might want a more sophisticated prefix scheme,
> perhaps something like what Stephane Eranian uses in libpfm4:
> <pmu>::nnnn, e.g:
>
> perf stat -C 0 -e runc::0101
> or more verbosely,
> perf stat -C 0 -e runcore::0101
>
> For the PowerEN chip, we have PMUs for these nest functional units:
> XML accelerator
> Regular expression accelerator
> Crypto accelerator
> PowerBus interface chiplet (0..3)
> Network accelerator
> Memory controller Synchronous (0..1)
> Memory controller Asynchronous (0..1)
> etc.
>
> I can see value in adding something like:
>
> perf stat -C 0 -e rmcs0::1d
>
> On the other hand, I don't want to get carried away with this, when we
> have the sysfs solution coming down the road, which I think will reduce
> or eliminate the need for these prefixes.
>
> Is PERF_TYPE_UNCORE and the "u" encoding intended to be a temporary
> solution? How do you envision someone using sysfs to specify an uncore
> event (especially one with a raw encoding)?

Yes, they are a temporary solution. I use them to easily test the uncore
patches. Sorry, I should have mentioned that.

sysfs is the final solution, but it's not clear to me what the sysfs
structure should look like.

As we discussed before,

1. Should we list all events under sysfs?

/sys/devices/system/cpu/cpuN/events/event0
...
/sys/devices/system/cpu/cpuN/events/eventN

/sys/devices/system/node/nodeN/pmuN/events/event0
...
/sys/devices/system/node/nodeN/pmuN/events/eventN

2. Or should we use a sysfs file to pass in raw config value?

/sys/devices/system/cpu/cpuN/raw
/sys/devices/system/node/nodeN/pmuN/raw?

3. How will the additional attributes (needed by the IBM PowerEN chip, etc.)
be passed in?

4. Other problems that I don't remember right now.

>
> - Corey

2010-12-14 12:34:27

by Peter Zijlstra

Subject: Re: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events

On Tue, 2010-12-14 at 10:15 +0800, Lin Ming wrote:

> > First of all, "uncore" is an x86-specific term and so it's not clear to
> > me if you meant for all arches to utilize this encoding for all "not
> > core but on the same die" events (IBM Power arch refers to this as
> > "nest" logic).

I don't think the x86 uncore matches the "not on core but on the same
die" definition. The x86-uncore thing is more like a memory controller
PMU (and since the memory controller is on die it is of course on die,
but it's not just any random on-die thing).

The wire-speed thing has tons of special purpose 'cores' on die, each of
them having a PMU.

Using the sysfs stuff you could actually expose each individually.

> Yes, they are temporary solution. I use it to easily test uncore
> patches. Sorry I should mention that.
>
> sysfs is the final solution, but I'm not clear how the sysfs structures
> should be.
>
> As we discussed before,
>
> 1. Should we list all events under sysfs?
>
> /sys/devices/system/cpu/cpuN/events/event0
> ...
> /sys/devices/system/cpu/cpuN/events/eventN
>
> /sys/devices/system/node/nodeN/pmuN/events/event0
> ...
> /sys/devices/system/node/nodeN/pmuN/events/eventN

I'd not put _all_ events there, maybe a key few. Then again, Corey would
like to make all optional on module load or somesuch.

> 2. Or should we use a sysfs file to pass in raw config value?

That's not mutually exclusive with 1). The way I've implemented it, the
sysfs event files provide the raw config values needed for that event.
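
To make that idea concrete, here is a small sketch of how a tool might read the raw config value from such an event file. The file format (a single hexadecimal number, optionally prefixed with "0x") and the function name sketch_read_event_config are assumptions for illustration only; the thread does not specify the actual sysfs layout or contents.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one hexadecimal config value from an event file.
 * The path and the one-number file format are assumptions of this sketch.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int sketch_read_event_config(const char *path, uint64_t *config)
{
	FILE *f;
	int matched;

	assert(path && config);

	f = fopen(path, "r");
	if (!f)
		return -1;
	/* %x-style conversions accept an optional "0x" prefix. */
	matched = fscanf(f, "%" SCNx64, config);
	fclose(f);

	return matched == 1 ? 0 : -1;
}
```

The value read this way could then be placed in the config field of the event attributes, so the user never has to type the raw number.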

> /sys/devices/system/cpu/cpuN/raw
> /sys/devices/system/node/nodeN/pmuN/raw?
>
> 3. How will the additional attributes(needed by IBM PowerEN chip, etc)
> be passed in?

From what I understood they all still fit in the single u64 config
field.

2010-12-14 14:13:20

by Andi Kleen

Subject: Re: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events

On Tue, Dec 14, 2010 at 01:33:27PM +0100, Peter Zijlstra wrote:
> > > First of all, "uncore" is an x86-specific term and so it's not clear to
> > > me if you meant for all arches to utilize this encoding for all "not
> > > core but on the same die" events (IBM Power arch refers to this as
> > > "nest" logic).
>
> I don't think the x86 uncore matches the "not on core but on the same
> die" definition. The x86-uncore thing is more like a memory controller
> PMU (and since the memory controller is on die it is of course on die,

memory controller + interconnect + cache + power management + various other things.

Older x86 CPUs also had special PMUs on die for parts of that, but
without memory controller.

> but its not just any random on-die thing).
>
> The wire-speed thing has tons of special purpose 'cores' on die, each of
> them having a PMU.

Modern x86 CPUs also have other PMUs, at least in package (e.g. in the
GPU)

> Using the sysfs stuff you could actually expose each individually.

I expect this will also be needed on x86. Also there are x86 SOCs
where other parts of the SOC will have their own counters too.
So in general a flexible scheme to describe that is useful.

-Andi

2010-12-15 17:14:55

by Martin Hicks

Subject: Re: [RFC PATCH 3/3 v3] perf: Update perf tool to monitor uncore events


On Tue, Dec 14, 2010 at 03:13:14PM +0100, Andi Kleen wrote:
> On Tue, Dec 14, 2010 at 01:33:27PM +0100, Peter Zijlstra wrote:
> > > > First of all, "uncore" is an x86-specific term and so it's not clear to
> > > > me if you meant for all arches to utilize this encoding for all "not
> > > > core but on the same die" events (IBM Power arch refers to this as
> > > > "nest" logic).
> >
> > I don't think the x86 uncore matches the "not on core but on the same
> > die" definition. The x86-uncore thing is more like a memory controller
> > PMU (and since the memory controller is on die it is of course on die,
>
> memory controller + interconnect + cache + power management + various other things.
>
> Older x86 CPUs also had special PMUs on die for parts of that, but
> without memory controller.
>
> > but its not just any random on-die thing).
> >
> > The wire-speed thing has tons of special purpose 'cores' on die, each of
> > them having a PMU.
>
> Modern x86 CPUs also have other PMUs, at least in package (e.g. in the
> GPU)
>
> > Using the sysfs stuff you could actually expose each individually.
>
> I expect this will be also needed on x86. Also there are x86 SOCs
> where other parts of the SOC will have their own counters too.
> So in general a flexible scheme to describe that is useful.

Definitely. For instance, SGI intends to use this to expose the
performance metrics that are found in the ASIC hub chip that does the
cache coherency in the big SSI systems. There will be one of these per
blade.

mh