2011-04-29 15:06:12

by Vince Weaver

Subject: re-enable Nehalem raw Offcore-Events support

Hello Linus

can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09

This removed functionality from perf_events that allowed raw event access
for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.

To be fair, this is not technically a regression as the feature was only
(finally!) added in the 2.6.39 merge window. However this is a useful
feature and many tools (including the PAPI performance counter library
that I work on) had added support for it in anticipation of the 2.6.39
release.

Ingo's reasons for removing the feature seem to boil down to
1. "perf" doesn't use the functionality, and any other userspace
program that uses the perf_events syscalls doesn't matter
2. Users are too stupid to use the raw functionality properly;
we should only allow a kernel-developer-approved small subset
of the features provided by the CPU as described in the Intel
developers manuals.

#2 seems like a gross misinterpretation of the whole "Linux gives you
enough rope to shoot yourself in the foot" policy from days past, but
maybe things have moved on.

Thanks,

Vince
[email protected]
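
A minimal sketch of the raw access under discussion, for readers who have
not used it. On x86 a raw perf event puts the event select in bits 0-7 and
the unit mask in bits 8-15 of `config`; OFFCORE_RESPONSE_0 is event 0xB7
with umask 0x01, and the `config1` field carries the value for its companion
response MSR. The helper names and the three response-bit constants below
are illustrative assumptions (check the bit positions against the Intel
manual for the specific CPU). Nothing here calls into the kernel; it only
computes the field values.

```python
# Sketch only: computes perf_event_attr field values, makes no syscall.
PERF_TYPE_RAW = 4  # from linux/perf_event.h

# Assumed Nehalem OFFCORE_RSP bit positions; verify against the Intel SDM.
DMND_DATA_RD = 1 << 0   # request type: demand data read
REMOTE_DRAM = 1 << 13   # response type: served from remote DRAM
LOCAL_DRAM = 1 << 14    # response type: served from local DRAM


def raw_config(event_select: int, umask: int) -> int:
    """Pack an x86 event select (bits 0-7) and unit mask (bits 8-15)."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask must each fit in 8 bits")
    return event_select | (umask << 8)


def offcore_attr_fields(response_mask: int) -> dict:
    """Hypothetical helper: attr fields for an OFFCORE_RESPONSE_0 raw event."""
    return {
        "type": PERF_TYPE_RAW,
        "config": raw_config(0xB7, 0x01),  # OFFCORE_RESPONSE_0
        "config1": response_mask,          # value for the response MSR
    }


# Example: demand data reads that were served from remote DRAM.
fields = offcore_attr_fields(DMND_DATA_RD | REMOTE_DRAM)
print({name: hex(value) for name, value in fields.items()})
```

A tool such as PAPI would pass these three values in the attr structure
given to the perf_event_open syscall; with the revert applied, the kernel
no longer accepts the `config1` part for this event.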


2011-04-29 15:27:40

by Andi Kleen

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, Apr 29, 2011 at 11:04:46AM -0400, Vince Weaver wrote:
> Hello Linus
>
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09

Acked-by: Andi Kleen <[email protected]>

(I wrote the original patch)

> (finally!) added in the 2.6.39 merge window. However this is a useful
> feature and many tools (including the PAPI performance counter library
> that I work on) had added support for it in anticipation of the 2.6.39

I also use some tools which benefit from this functionality. The
extended raw events are very useful to analyze NUMA problems, for one.

-Andi

2011-04-29 16:42:50

by Ingo Molnar

Subject: Re: re-enable Nehalem raw Offcore-Events support


* Vince Weaver <[email protected]> wrote:

> Hello Linus
>
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
>
> This removed functionality from perf_events that allowed raw event access
> for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.

I have three major objections/concerns.

Firstly, one technical problem i have with the raw events ABI method is that it
was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the
declared title of the commit, it was not declared in the changelog either and
it was not my intention to offer such an ABI prematurely either - and i noticed
those two lines too late - but still in time to not let this slip into v2.6.39.

Secondly, Peter posted a patch that might resolve this issue in v2.6.40 - but
that patch is not cooked yet and you guys have not helped finish it. I'd like
to see that process play out first - maybe we discover some detail that will
force us to modify the config1/config2 ABI approach - which we cannot do if
this is released into v2.6.39 prematurely.

Thirdly, and this is my most fundamental objection, i also object to the timing
of this offcore raw access ABI, because past experience is that we *really* do
not want to allow raw PMU details without *first* having generic abstractions
and generic events.

The discussion in the "[PATCH 1/1] perf tools: Add missing user space support
for config1/config2" thread on lkml has demonstrated it pretty well: people
only started making serious thoughts about proper structure and abstractions
and easy tooling once they were forced to think about that ...

The thing is, as far as i can see you and Andi are *still* pushing the failed
perfmon and Oprofile ABI and tooling models.

My job as a maintainer is to notice such things and to say 'no' to incomplete
bits.

Basically without proper generalization people get sloppy and go the fast path
and export very low level, opaque, unstructured PMU interfaces to user-space
and repeat the Oprofile and perfmon tooling mistakes again and again.

"Thinking is hard, lets go shopping^W exporting raw ABIs."

So the perf events policy has always been that while we tolerate raw events
(there's nothing bad with offering them once generic events have crystallized
out), we only accept them if the *useful* events are first abstracted and
generalized out.

We put structure, proper abstractions and easy tooling *ahead* of the interests
of a small group of people who'd rather prefer a lowlevel, opaque hardware
channel so that they do not have to *think* about generalization and also
perhaps so they do not have to share their selection of events and analysis
methods with others ...

For the offcore patches this concept of 'abstraction first' has been ignored
entirely, and commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
Nehalem/Westmere") has (without declaring it in the changelog) added a raw ABI
hack to the offcore PMU features without bothering to factor out the useful
events first. This slipped through and i only noticed it when Andi's patch got
to me:

https://lkml.org/lkml/2011/4/22/14

Generalization of offcore, NUMA memory events is very much possible and
desirable, and Peter has posted an RFC patch that implements one form of it:

https://lkml.org/lkml/2011/4/22/281

And with that done raw events can be offered as well.

But it's still work in progress - it might be mergeable in v2.6.40.
Unfortunately neither you nor Andi has actually bothered testing (and
improving) Peter's patch. If we do the raw ABI now i fear you guys will
disappear and won't ever bother with proper generalization.

We want generalization like Peter's patch first - that is what users really
need in the end, and that is the price of us supporting/maintaining this PMU
functionality in the kernel. Once we feel good about it we can expose the raw
bits as well.

Not the other way around.

> To be fair, this is not technically a regression as the feature was only
> (finally!) added in the 2.6.39 merge window. However this is a useful
> feature and many tools (including the PAPI performance counter library that I
> work on) had added support for it in anticipation of the 2.6.39 release.
>
> Ingo's reasons for removing the feature seem to boil down to
> 1. "perf" doesn't use the functionality, and any other userspace
> program that uses the perf_events syscalls doesn't matter
> 2. Users are too stupid to use the raw functionality properly;
> we should only allow a kernel-developer-approved small subset
> of the features provided by the CPU as described in the Intel
> developers manuals.
>
> #2 seems like a gross misinterpretation of the whole "Linux gives you
> enough rope to shoot yourself in the foot" policy from days past, but maybe
> things have moved on.

That is a very unfair and misleading summary that grossly misrepresents my
position. I've made my position very clear to you, multiple times - and Peter
and others have made their similar position on this issue clear as well.

I detailed my concerns in the commit you want reverted and i also repeated it
in the lkml discussion, multiple times, as replies to you. You can also see it
outlined in detail in my reply above.

In light of all that, how you could possibly misrepresent my position in such
an unfair, distorted and manipulative way is beyond me ...

Thanks,

Ingo

2011-04-29 16:50:07

by Ingo Molnar

Subject: Re: re-enable Nehalem raw Offcore-Events support


* Andi Kleen <[email protected]> wrote:

> On Fri, Apr 29, 2011 at 11:04:46AM -0400, Vince Weaver wrote:
> > Hello Linus
> >
> > can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
>
> Acked-by: Andi Kleen <[email protected]>

I outlined my objections in my reply to Vince.

> (I wrote the original patch)
>
> > (finally!) added in the 2.6.39 merge window. However this is a useful
> > feature and many tools (including the PAPI performance counter library
> > that I work on) had added support for it in anticipation of the 2.6.39
>
> I also use some tools which benefit from this functionality. The extended raw
> events are very useful to analyze NUMA problems, for one.

Mind sharing those methods, and helping to generalize them and make them
useful to non-experts? Peter's patch which adds a 'NUMA' level to the cache
event abstractions could be a good start.

Only once generalization has been covered sufficiently, once we are sure we can
stick with the raw ABI, can we push that upstream.

Thanks,

Ingo
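
For comparison, the generic cache events referred to here are encoded as a
(cache, operation, result) triple in `config`, with the event type set to
PERF_TYPE_HW_CACHE. The sketch below shows that packing. The ids for the
existing cache levels follow linux/perf_event.h; the "NODE" id is an
assumption standing in for the 'NUMA' level proposed in Peter's RFC and may
differ from what that patch used. The helper name is hypothetical.

```python
# Sketch only: packs a generic cache event id the way perf_events expects.
PERF_TYPE_HW_CACHE = 3  # from linux/perf_event.h

CACHE_ID = {"L1D": 0, "L1I": 1, "LL": 2, "DTLB": 3, "ITLB": 4, "BPU": 5,
            "NODE": 6}  # NODE is assumed: the proposed 'NUMA' level
CACHE_OP = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
CACHE_RESULT = {"ACCESS": 0, "MISS": 1}


def hw_cache_config(cache: str, op: str, result: str) -> int:
    """config = cache id | (op id << 8) | (result id << 16)."""
    return CACHE_ID[cache] | (CACHE_OP[op] << 8) | (CACHE_RESULT[result] << 16)


llc_load_misses = hw_cache_config("LL", "READ", "MISS")
node_load_misses = hw_cache_config("NODE", "READ", "MISS")
print(hex(llc_load_misses), hex(node_load_misses))
```

The point of the abstraction is that user-space names a concept such as
"node read misses" and the kernel maps it to whatever raw event (and, on
Nehalem, whatever offcore response mask) the CPU needs.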

2011-04-29 17:17:27

by Pekka Enberg

Subject: Re: re-enable Nehalem raw Offcore-Events support

Hi Vince,

On Fri, Apr 29, 2011 at 6:04 PM, Vince Weaver <[email protected]> wrote:
> Hello Linus
>
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
>
> This removed functionality from perf_events that allowed raw event access
> for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.
>
> To be fair, this is not technically a regression as the feature was only
> (finally!) added in the 2.6.39 merge window. However this is a useful
> feature and many tools (including the PAPI performance counter library
> that I work on) had added support for it in anticipation of the 2.6.39
> release.
>
> Ingo's reasons for removing the feature seem to boil down to
>  1.  "perf" doesn't use the functionality, and any other userspace
>      program that uses the perf_events syscalls doesn't matter
>  2.  Users are too stupid to use the raw functionality properly;
>      we should only allow a kernel-developer-approved small subset
>      of the features provided by the CPU as described in the Intel
>      developers manuals.
>
> #2 seems like a gross misinterpretation of the whole "Linux gives you
> enough rope to shoot yourself in the foot" policy from days past, but
> maybe things have moved on.

That's a gross misrepresentation of what Ingo has been saying on LKML.
Really, learn to work with relevant maintainers before you ask Linus
to revert something.

Pekka

2011-04-29 17:25:32

by Andi Kleen

Subject: Re: re-enable Nehalem raw Offcore-Events support

> >  2.  Users are too stupid to use the raw functionality properly;
> >      we should only allow a kernel-developer-approved small subset
> >      of the features provided by the CPU as described in the Intel
> >      developers manuals.
> >
> > #2 seems like a gross misinterpretation of the whole "Linux gives you
> > enough rope to shoot yourself in the foot" policy from days past, but
> > maybe things have moved on.
>
> That's a gross misrepresentation of what Ingo has been saying on LKML.
> Really, learn to work with relevant maintainers before you ask Linus
> to revert something.

Ingo may not have explicitly said (2), but at least his revert (disabling
the raw interface users are asking for) is practically implementing (2).

Actions speak louder than words.

That is, either you have a raw interface, or you only have the cooked
interface, or you have both. Since he reverted raw, only cooked
is left, which is (2).

I agree with Vince it's a bad policy.

-Andi

2011-04-29 17:37:26

by Pekka Enberg

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, Apr 29, 2011 at 8:25 PM, Andi Kleen <[email protected]> wrote:
>> >  2.  Users are too stupid to use the raw functionality properly;
>> >      we should only allow a kernel-developer-approved small subset
>> >      of the features provided by the CPU as described in the Intel
>> >      developers manuals.
>> >
>> > #2 seems like a gross misinterpretation of the whole "Linux gives you
>> > enough rope to shoot yourself in the foot" policy from days past, but
>> > maybe things have moved on.
>>
>> That's a gross misrepresentation of what Ingo has been saying on LKML.
>> Really, learn to work with relevant maintainers before you ask Linus
>> to revert something.
>
> Ingo may not have explicitly said (2), but at least his revert (disabling
> the raw interface users are asking for) is practically implementing (2).
>
> Actions speak louder than words.
>
> That is either you have a raw interface or you only have the cooked
> interface or you have both. Since he reverted raw only cooked
> is left, which is (2)
>
> I agree with Vince it's a bad policy.

So a maintainer reverts an ABI that he thinks needs more thought/work
before it's too late and we're stuck with it forever. Can you please
explain what's the problem here?

Asking Linus to revert the commit is short-sighted and doesn't solve
the problem. Learn to work with the maintainer and save yourself a lot
of trouble.

2011-04-29 17:43:05

by Thomas Gleixner

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Andi Kleen wrote:

> > >  2.  Users are too stupid to use the raw functionality properly;
> > >      we should only allow a kernel-developer-approved small subset
> > >      of the features provided by the CPU as described in the Intel
> > >      developers manuals.
> > >
> > > #2 seems like a gross misinterpretation of the whole "Linux gives you
> > > enough rope to shoot yourself in the foot" policy from days past, but
> > > maybe things have moved on.
> >
> > That's a gross misrepresentation of what Ingo has been saying on LKML.
> > Really, learn to work with relevant maintainers before you ask Linus
> > to revert something.
>
> Ingo may not have explicitly said (2), but at least his revert (disabling
> the raw interface users are asking for) is practically implementing (2).
>
> Actions speak louder than words.
>
> That is either you have a raw interface or you only have the cooked
> interface or you have both. Since he reverted raw only cooked
> is left, which is (2)
>
> I agree with Vince it's a bad policy.

No, it's not. The raw interface will be made available when the proper
set of abstracted functionality has been added and settled down,
simply because it might be necessary to change the way the raw event is
exposed. As long as there are open questions which might have an
influence on the exposure of the raw event, it's completely correct to
keep it disabled.

Though you and Vince ignored Peter's patches and the questions he
raised and just kept harping on your own interests.

That's a bad attitude, but we've been there before.

Thanks,

tglx

2011-04-29 17:47:42

by Vince Weaver

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Pekka Enberg wrote:

> Asking Linus to revert the commit is short-sighted and doesn't solve
> the problem. Learn to work with the maintainer and save yourself a lot
> of trouble.

Work "with" Ingo? That's turned out well so far. I'm sure certain
scheduler people could comment here too on where that gets you.

The kernel I run is "Linux" not "Ingoix". So I await a comment from Linus
on this issue. If it turns out that he's happy with Ingo's work, fine.
It just means I'll have to start maintaining some perf counter related
patches out of tree for those of us who actually like having control on
what we're measuring.

Vince

2011-04-29 18:00:00

by Pekka Enberg

Subject: Re: re-enable Nehalem raw Offcore-Events support

Hi Vince!

On Fri, 29 Apr 2011, Pekka Enberg wrote:
>> Asking Linus to revert the commit is short-sighted and doesn't solve
>> the problem. Learn to work with the maintainer and save yourself a lot
>> of trouble.

On Fri, Apr 29, 2011 at 8:46 PM, Vince Weaver <[email protected]> wrote:
> Work "with" Ingo? That's turned out well so far. I'm sure certain
> scheduler people could comment here too on where that gets you.

Yeah, that Ingo dude is really impossible to work with (as are most
kernel maintainers)! I've personally been so unfortunate that I've
never had any problems but it must be my bad attitude to working with
other people and actually listening to them. :-(

On Fri, Apr 29, 2011 at 8:46 PM, Vince Weaver <[email protected]> wrote:
> The kernel I run is "Linux" not "Ingoix". So I await a comment from Linus
> on this issue. If it turns out that he's happy with Ingo's work, fine.
> It just means I'll have to start maintaining some perf counter related
> patches out of tree for those of us who actually like having control on
> what we're measuring.

Well, it's not Ingoix but Ingo gets to maintain your ABI long after
you're gone while Linus can just sit back, relax, and have a drink. So
I think it'd be fair to at least _pretend_ you care what Ingo thinks
about perf ABIs, no?

Pekka

2011-04-29 18:02:20

by Vince Weaver

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> Firstly, one technical problem i have with the raw events ABI method is that it
> was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
> Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the
> declared title of the commit, it was not declared in the changelog either and
> it was not my intention to offer such an ABI prematurely either - and i noticed
> those two lines too late - but still in time to not let this slip into v2.6.39.

The initial patches from November seem to make it clear what is being done
here. I thought it was pretty obvious to those reviewing those patches
what was involved. How would I have known that OFFCORE_RESPONSE support
was coming if I didn't see the patches obviously float by on linux-kernel?

> Thirdly, and this is my most fundamental objection, i also object to the timing
> of this offcore raw access ABI, because past experience is that we *really* do
> not want to allow raw PMU details without *first* having generic abstractions
> and generic events first.

why? Can you explain this better?

> The thing is, as far as i can see you and Andi are *still* pushing the failed
> perfmon and Oprofile ABI and tooling models.

what ABI? by the way, I hate oprofile and never use it.

perfmon2 and perfctr are very similar to perf_events in that they provide
lightly massaged access to the MSRs so you can program whatever raw event
that you like.

It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things
differently than perf, but that's a *userspace* API, not a kernel ABI.
You seem to keep confusing this.

> We put structure, proper abstractions and easy tooling *ahead* of the interests
> of a small group of people who'd rather prefer a lowlevel, opaque hardware
> channel so that they do not have to *think* about generalization and also
> perhaps so they do not have to share their selection of events and analysis
> methods with others ...

And generalization across platforms (and even across minor chip revisions)
*doesn't work*. It lasted maybe a year in PAPI before it was realized to
be unworkable. Talk to some people from AMD or Intel if you want. It's
not possible to sanely generalize perf counters. They are too tied to
hardware quirks.

Vince
[email protected]

2011-04-29 18:58:03

by Ingo Molnar

Subject: Re: re-enable Nehalem raw Offcore-Events support


* Vince Weaver <[email protected]> wrote:

> On Fri, 29 Apr 2011, Ingo Molnar wrote:
>
> > Firstly, one technical problem i have with the raw events ABI method is that it
> > was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
> > Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the
> > declared title of the commit, it was not declared in the changelog either and
> > it was not my intention to offer such an ABI prematurely either - and i noticed
> > those two lines too late - but still in time to not let this slip into v2.6.39.
>
> The initial patches from November seem to make it clear what is being done
> here. I thought it was pretty obvious to those reviewing those patches what
> was involved. How would I have known that OFFCORE_RESPONSE support was
> coming if I didn't see the patches obviously float by on linux-kernel?

Not really, Peter did a lot of review of those patches and they were changed
beyond recognition from their original form - i think Peter wrote a fair
portion of the supporting cleanups, as Andi seemed uninterested in acting
quickly on review feedback.

> > Thirdly, and this is my most fundamental objection, i also object to the
> > timing of this offcore raw access ABI, because past experience is that we
> > *really* do not want to allow raw PMU details without *first* having
> > generic abstractions and generic events first.
>
> why? Can you explain this better?

Didn't i do that in the rest of my reply? You even quote some of it below.

> > The thing is, as far as i can see you and Andi are *still* pushing the
> > failed perfmon and Oprofile ABI and tooling models.
>
> what ABI?

Well, the raw events ABI reminds me of the perfmon2/perfmon3 ABI: get the raw
PMU to user-space as quickly as possible and leave all the details to
user-space. I do not agree with that model of exposing performance measurement
hardware features.

> [...] by the way, I hate oprofile and never use it.

I don't 'hate' oprofile per se (hey, i still keep pulling and pushing oprofile
bits from Robert), i just find it very unintuitive and cumbersome to use, and i
think it was misdesigned in several ways.

> perfmon2 and perfctr are very similar to perf_events in that they provide
> lightly massaged access to the MSRs so you can program whatever raw event
> that you like.

perf events (the kernel side) has a very, very different design from perfmon2
and perfctr - but judging by your past replies you do not seem to recognize
such design aspects, let alone appreciate them.

> It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things
> differently than perf, but that's a *userspace* API, not a kernel ABI. You
> seem to keep confusing this.

No, i do not think i am confused, i just disagree with you.

> > We put structure, proper abstractions and easy tooling *ahead* of the
> > interests of a small group of people who'd rather prefer a lowlevel, opaque
> > hardware channel so that they do not have to *think* about generalization
> > and also perhaps so they do not have to share their selection of events and
> > analysis methods with others ...
>
> And generalization across platforms (and even across minor chip revisions)
> *doesn't work*.

Why not? We cannot generalize everything, but generalizing the major CPU
concepts works quite well for perf. The thing is, the laws of physics are the
same for all CPUs so they all seem to employ very similar concepts and measure
those concepts in similar ways, with similar events.

But it's more than that, generalization works even on the *hardware* level:

AMD managed to keep a large chunk of their events stable even across very
radical changes of the underlying hardware. I have two AMD systems produced
*10* years apart and they even use the same event encodings for the major
events.

Intel started introducing stable event definitions a couple of years ago as
well.

So i think i can say with fairly high confidence that you simply
do not know what you are talking about.

> [...] It lasted maybe a year in PAPI before it was realized to be
> unworkable. Talk to some people from AMD or Intel if you want. It's not
> possible to sanely generalize perf counters. They are too tied to hardware
> quirks.

I have the exact opposite experience: chip designers we talked to were clearly
supportive of the generalizations perf events offers and clearly both AMD and
Intel chips are moving *towards* more stable, more generic and more flexible
performance event measurement methods.

We are getting more counters and with fewer constraints. Even the hardware is
slowly but surely abstracting things out.

It is in the interest of PMU designers as well that their stuff moves one level
higher within OSs and does not stay at the weird hardware-specific level.
Hardware is getting more complex, measuring it becomes more complex, so making
things more generic certainly helps.

Thanks,

Ingo

2011-04-29 22:16:56

by Borislav Petkov

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, Apr 29, 2011 at 06:42:27PM +0200, Ingo Molnar wrote:

[..]

> Basically without proper generalization people get sloppy and go the fast path
> and export very low level, opaque, unstructured PMU interfaces to user-space
> and repeat the Oprofile and perfmon tooling mistakes again and again.
>
> "Thinking is hard, lets go shopping^W exporting raw ABIs."
>
> So the perf events policy has always been that while we tolerate raw events
> (there's nothing bad with offering them once generic events have crystallized
> out), we only accept them if the *useful* events are first abstracted and
> generalized out.
>
> We put structure, proper abstractions and easy tooling *ahead* of the interests
> of a small group of people who'd rather prefer a lowlevel, opaque hardware
> channel so that they do not have to *think* about generalization and also
> perhaps so they do not have to share their selection of events and analysis
> methods with others ...

Yep, absolutely. Excuse my French but even kernel developers who
can understand perf code don't need to know f*cking magical hex
constants in order to trace a little. And yes, we talk about perf
and say how cool it is but users want to see more examples like on
http://perf.wiki.kernel.org - they want to get to use it first _and_
_then_ maybe look at code/more involved scenarios. Other kernel
developers don't give a rat's ass about the possibility for shooting
themselves in the foot - they want to use this thing without reading
code and CPU documentation for a day first. And I believe I speak for
the majority when I say so.

We're always bitching about Linux usability and now when it comes down
to yet another case where this can be done right for a change, and perf
people are trying to do something productive, you come waving hands
loudly at Linus with revert requests instead of helping. This is as
productive as trying to shoot yourself in the foot.

--
Regards/Gruss,
Boris.

2011-04-30 01:50:08

by Vince Weaver

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Borislav Petkov wrote:

> On Fri, Apr 29, 2011 at 06:42:27PM +0200, Ingo Molnar wrote:
> > "Thinking is hard, lets go shopping^W exporting raw ABIs."

> We're always bitching about Linux usability and now when it comes down
> to yet another case where this can be done right for a change, and perf
> people are trying to do something productive, you come waving hands
> loudly at Linus with revert requests instead of helping. This is as
> productive as trying to shoot yourself in the foot.

Have I proposed that the "perf" tool be changed at all?

No. Never.

I proposed that the interface to allow raw access to offcore events _not_
be disabled so that advanced tools can access it directly.

I don't care how perf works. Nor do I care how many pointless generic
events get added to the kernel (other than being annoyed about it taking
up extra bytes in my kernel image).

Reverting this patch would have absolutely no bearing on "perf", the
usability of perf, or anything that any normal user sees. I'm not sure
how the argument is even getting framed that way.

Vince

2011-04-30 01:53:20

by Vince Weaver

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> Generalization of offcore, NUMA memory events is very much possible and
> desirable, and Peter has posted an RFC patch that implements one form of it:
>
> https://lkml.org/lkml/2011/4/22/281
>

OK, so I "reviewed" this patch.

It creates a "generalized" new event, that is only actually available on
Nehalem and Westmere. It's listed as unavailable for all other known
architectures.

How is this any better than just using the event by its actual name if you
happen to have a Nehalem-esque chip?

This is just pointless kernel bloat.

So here's my review:
NACK

Vince

2011-04-30 02:17:55

by Vince Weaver

Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> > why? Can you explain this better?
>
> Didn't i do that in the rest of my reply? You even quote some of it below.

No.

You have not explained why having "generalized" counter definitions has
anything to do with raw event access.

If your argument was you thought that the values being written to
the config1 and config2 fields of the perf_event_attr structure might need to be
better defined, well that's a better argument and I'd buy that. That's a
valid technical argument for blocking raw event access (though you
probably shouldn't have the fields there at all if you are unsure, they
become ABI pretty quickly).

But your argument isn't that. Your argument is that you're blocking raw
event access as some sort of punishment because us HPC people aren't
providing patches for "generalized" events that we never plan to use.
That's not a technical argument, that's some sort of weird power play.

> Well, the raw events ABI reminds me of the perfmon2/perfmon3 ABI: get the raw
> PMU to user-space as quickly as possible and leave all the details to
> user-space. I do not agree with that model of exposing performance measurement
> hardware features.

well you probably should have thought of that before you enabled raw
events at all then. It's a bit too late now.

> > perfmon2 and perfctr are very similar to perf_events in that they provide
> > lightly massaged access to the MSRs so you can program whatever raw event
> > that you like.
>
> perf events (the kernel side) has a very, very different design from perfmon2
> and perfctr - but judging by your past replies such design aspects you do not
> seem to recognize, let alone appreciate.

I didn't mean the internal designs were similar. There's only so many
sane ways to provide access to perf counters at the kernel level, and all
of them look a lot alike from a high level.

> > It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things
> > differently than perf, but that's a *userspace* API, not a kernel ABI. You
> > seem to keep confusing this.
>
> No, i do not think i am confused, i just disagree with you.

Why does it matter? Why should you as a kernel devel have any say in what
my userspace tool looks like, as long as it is using a published ABI in a
documented manner?

> Why not? We cannot generalize everything, but generalizing the major CPU
> concepts works quite well for perf. The thing is, the laws of physics are the
> same for all CPUs so they all seem to employ very similar concepts and measure
> those concepts in similar ways, with similar events.

Fine. Can we have a document saying what the events measure?

Also can you provide some way to query from userspace what event is being
used so that if someone reports a problem with an event we can figure
out which one it is in the relevant manual?

For cache events:
+ Do they count prefetches? (SW, HW?)
+ Do they count coherency misses or just standard CCC ones?
+ Do they count speculative accesses or only retired accesses?
+ Do they count HW pagetable walks?

For branch events:
+ Are they deterministic?
+ Are they speculative?

For retired instructions:
+ Deterministic?
+ Does it include HW interrupt counts?
+ Are there any errata?
+ Are any counted twice?

> AMD managed to keep a large chunk of their events stable even across very
> radical changes of the underlying hardware. I have two AMD systems produced
> *10* years apart and they even use the same event encodings for the major
> events.

Well guess what, AMD family 15h changes all of that.

And you're not going to like LWP. They got tired of waiting for a
workable kernel perf counter interface and moved it completely to
userspace, and there's nothing you can do about it unless you start
blocking the xsave patches from getting in.


> Intel started introducing stable event definitions a couple of years ago as
> well.

Yes. And just how compatible are they? You might want to discuss that
with some people from intel.

> So i think i can tell it with a fairly high confidence factor that you simply
> do not know what you are talking about.

Really.

> I have the exact opposite experience: chip designers we talked to were clearly
> supportive of the generalizations perf events offers and clearly both AMD and
> Intel chips are moving *towards* more stable, more generic and more flexible
> performance event measurement methods.

You must be talking to different people than I have. Have you looked at
Power6/Power7 or ARM counters?

> We are getting more counters and with less constraints. Even the hardware is
> slowly but surely abstracting things out.

Again... Sandy Bridge? Interlagos? You might want to check that out.


In any case I wish you'd get on the ball with uncore, offcore, etc.

One of the promises made when perf_events was merged was that the kernel
was the place to do all this stuff because it would allow such quick
turnaround on new features.

As it is by the time Nehalem Offcore/Uncore support gets into a kernel
that is picked up by a distro the chips are going to be 3+ years old and
headed to the recycle bin.

Vince

2011-04-30 07:14:34

by Pekka Enberg

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

Hi Vince,

On Sat, Apr 30, 2011 at 5:17 AM, Vince Weaver <[email protected]> wrote:
> But your argument isn't that. Your argument is that you're blocking raw
> event access as some sort of punishment because us HPC people aren't
> providing patches for "generalized" events that we never plan to use.
> That's not a technical argument, that's some sort of weird power play.

That's not his argument at all and if you fail to see that you really
have no idea what the concept "working with the maintainer" means.

Yes, raw event access was reverted from 2.6.39 but that doesn't mean
it's blocked forever. If you want to keep pushing your feature, please
tone down your crazy-talk and start acting like a developer who's
genuinely interested in Linux, not on your own narrow, selfish goals.

I mean really, I haven't even had the pleasure of interacting a lot
with you and while I personally don't see the problem with raw event
access (if done in a well-thought out manner from ABI pov), you've
already managed to convince me that applying _any_ patch from you is a
bad idea because the baggage that comes with it is simply not worth
it.

If you want to alienate other developers, keep doing what you're doing
- otherwise consider changing your tactics. It's boring to watch you
repeat the same mistakes over and over again.

Pekka

2011-04-30 08:11:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, Apr 29, 2011 at 10:17:04PM -0400, Vince Weaver wrote:
> > AMD managed to keep a large chunk of their events stable even across very
> > radical changes of the underlying hardware. I have two AMD systems produced
> > *10* years apart and they even use the same event encodings for the major
> > events.
>
> Well guess what, AMD family 15h changes all of that.

I don't see a big problem here, Robert has a patch that takes care of
counter constraints. It probably needs a bit more work but we'll get
where we need to be.

> And you're not going to like LWP. They got tired of waiting for a
> workable kernel perf counter interface and moved it completely to
> userspace,

I don't know where you get your information but that's absolutely and
completely not nearly even beginning to smell the truth.

> and there's nothing you can do about it unless you start blocking the
> xsave patches from getting in.

Look at tip/x86/xsave, looks like LWP support will most likely be in
2.6.40.

--
Regards/Gruss,
Boris.

2011-04-30 20:06:32

by Corey Ashford

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On 04/29/2011 10:42 AM, Thomas Gleixner wrote:
> On Fri, 29 Apr 2011, Andi Kleen wrote:
>
>>>> 2. Users are too stupid to use the raw functionality properly;
>>>> we should only allow a kernel-developer-approved small subset
>>>> of the features provided by the CPU as described in the intel
>>>> developers manuals.
>>>>
>>>> #2 seems like a gross misinterpretation of the whole "Linux gives you
>>>> enough rope to shoot yourself in the foot" policy from days passed, but
>>>> maybe things have moved on.
>>>
>>> That's a gross misrepresentation of what Ingo has been saying on LKML.
>>> Really, learn to work with relevant maintainers before you ask Linus
>>> to revert something.
>>
>> Ingo may not have explicitly said (2), but at least his revert (disabling
>> the raw interface users are asking for) is practically implementing (2).
>>
>> Actions speak louder than words.
>>
>> That is either you have a raw interface or you only have the cooked
>> interface or you have both. Since he reverted raw only cooked
>> is left, which is (2)
>>
>> I agree with Vince it's a bad policy.
>
> No, it's not. The raw interface will be made available when the proper
> set of abstracted functionality has been added and settled down,
> simply because it might change the way the raw event is exposed. As
> long as there are open questions which might have an influence on the
> exposure of the raw event, it's completely correct to keep it
> disabled.

Carl Love and I recently completed some work to add perf_events support
for the IBM Blue Waters machine's "CPU networking" chip, called the
Torrent chip. We did all of this work based on a RHEL 6 kernel
(2.6.32ish), which doesn't have Peter's more recent multi-PMU support.

I would say that most if not all of the events are not generalizable in
the sense that you are talking about; the events are very specific to
the Torrent chip. For example, the Torrent chip communicates with four
POWER7 chips via a high-speed serial interconnect, called the W, X, Y,
and Z links, and it also has similar links which connect to other
Torrent chips, and to other nodes. The events measure certain types of
activity on these various links, for example "X link receive idle".

So if I'm understanding what you have said correctly, we would not be
able to get a forward port of this code committed without abstracting
these events in a way that's acceptable to the kernel community. Is
that right? If so, this is important for us to know so that we can
correctly size the work effort involved in the forward port.

Thanks for your consideration,

- Corey

2011-04-30 20:48:43

by Vince Weaver

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On Sat, 30 Apr 2011, Pekka Enberg wrote:

> you've
> already managed to convince me that applying _any_ patch from you is a
> bad idea because the baggage that comes with it is simply not worth
> it.

well if it makes you feel better you can do a "git log" and search for my
name and back out all the included perf related patches that have my name
associated with them. Then you can have a "vince-free" kernel without
all the "baggage".

> If you want to alienate other developers, keep doing what you're doing
> - otherwise consider changing your tactics. It's boring to watch you
> repeat the same mistakes over and over again.

I spend a lot of time dealing with developers who use perf-counter related
interfaces all the time. They complain to me *constantly* about the
drawbacks of perf_events, because PAPI is one step up from the kernel.

I try to get them to interact with the kernel people, but they won't. Do
you know why? Because they feel like the perf_events developers are rude
at best, unhelpful in general, and actively anti-anyone-not-using-perf.

Most of them simply think it's not worth dealing with the perf_events
people, even if it means hardship down the road. I keep trying because I
am foolishly idealistic at times. So anyway all of my vitriol is the
combined power of scores of disenfranchised developers, who were happy to
work on kernel problems when the perfmon2 developers were running things,
but now won't touch it with a 10-foot pole.

So make of that what you will, but things go both ways. You can be as
obnoxious as you want as a maintainer but don't expect people to send you
patches if you are.

Vince

2011-04-30 20:58:33

by Vince Weaver

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, 29 Apr 2011, Vince Weaver wrote:

> > https://lkml.org/lkml/2011/4/22/281
> >
>
> So here's my review:
> NACK

so a slightly more useful review on slightly more sleep.

You are doing things backwards with your "generalization first" policy.

The right way to do things is enable raw event support first.

Then you can have users experiment with the feature. Try various events
using their favorite userspace utility (be it libpfm4, PAPI, perf). This
is easy, as choosing a new event is a simple matter of changing the
command line option for your measurement. Once a good event is found for
generalization, *THEN* you add a generalized event that is well tested.

Your way is difficult. Fine, Peter picks some arbitrary event he thinks
works well. I have to download a git kernel and reboot my machine (a
process that takes an hour at best assuming I have root access). Then if
I want to try a new event, since RAW access is blocked, I have to patch
the kernel, recompile, reboot. So at least an hour between tests.

This assumes I can even do that. My only Nehalem machine is at work and
has only a fragile wireless network connection that requires manual
intervention to get going. So I *can't* review a change in general events
with remote access when it lives in the kernel, yet if it was in user
space like it *should be* I could test away all day no problem.

See the problem here? Going general event first makes it seriously
inconvenient to test and so no one is going to do it for you because it's
such a pain. RAW first is the way to go.

Vince

2011-04-30 21:04:22

by Vince Weaver

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On Sat, 30 Apr 2011, Borislav Petkov wrote:
> On Fri, Apr 29, 2011 at 10:17:04PM -0400, Vince Weaver wrote:

> > Well guess what, AMD family 15h changes all of that.
>
> I don't see a big problem here, Robert has a patch that takes care of
> counter constraints. It probably needs a bit more work but we'll get
> where we need to be.

yes, but it's a bit of a change from the PMU of previous AMD chips,
going against Ingo's argument that the featureset of all modern CPUs is
somehow converging.

> > And you're not going to like LWP. They got tired of waiting for a
> > workable kernel perf counter interface and moved it completely to
> > userspace,
>
> I don't know where you get your information but that's absolutely and
> completely not nearly even beginning to smell the truth.

I talked with someone fairly involved in the development of LWP who
implied as much in an off-the-record discussion. You have to admit that
5-6 years ago, when LWP was being planned, it wasn't certain that kernel
support for perf events was *ever* going to make it into Linux.
Though maybe AMD is more concerned about the even worse support in other
OSes. It's true you'd probably know better.

> Look at tip/x86/xsave, looks like LWP support will most likely be in
> 2.6.40.

Really? Does Ingo know yet? I get the impression he doesn't like
perf-event features slipping in under the radar like that.

Vince

2011-04-30 21:08:49

by Alan

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

> See the problem here? Going general event first makes it seriously
> inconvenient to test and so no one is going to do it for you because it's
> such a pain. RAW first is the way to go.

Or you build your patch back in each time.

Lots of us don't run a Linus kernel. Mine gets several patches each
update which mean the disk performance is typically a few percent faster
than the upstream one etc.

Alan

2011-05-01 04:45:56

by Andi Kleen

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

> I would say that most if not all of the events are not generalizable
> in the sense that you are talking about; the events are very
> specific to the Torrent chip. For example, the Torrent chip

It's similar also on Intel chips. There are lots of events
which are useful, but are unlikely to have any equivalents
on other designs (or sometimes not even in later/earlier chip
generations). So such a requirement would make it impossible
to support them.

Granted, a lot of them are obscure, but a lot of others are not
and they can be very useful for specific analyses.
Computers are getting more and more complex and we need all
the help we can get to understand their behaviour.

For example we've been recently using various Nehalem+ events for NUMA
tuning (memory latency and offcore) and it is very useful and
fruitful. But there are a lot of specialities there which do not extend to
other chips.

I've been working around that now by programming the special registers
in user space from special wrapper scripts, but clearly that's not a good
solution and doesn't also work in all cases.

-Andi

2011-05-01 17:55:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support


* Corey Ashford <[email protected]> wrote:

> Carl Love and I recently completed some work to add perf_events support for
> the IBM Blue Waters machine's "CPU networking" chip, called the Torrent chip.
> We did all of this work based on a RHEL 6 kernel (2.6.32ish), which doesn't
> have Peter's more recent multi-PMU support.
>
> I would say that most if not all of the events are not generalizable in the
> sense that you are talking about; the events are very specific to the Torrent
> chip. [...]

That's ok and not a problem.

The issue here are events that *are* generalizable.

> So if I'm understanding what you have said correctly, we would not be able to
> get a forward port of this code committed without abstracting these events in
> a way that's acceptable to the kernel community. [...]

If the number of events worth generalizing is the empty set that's ok.

Thanks,

Ingo

2011-05-01 18:01:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support


* Andi Kleen <[email protected]> wrote:

> > I would say that most if not all of the events are not generalizable
> > in the sense that you are talking about; the events are very
> > specific to the Torrent chip. For example, the Torrent chip
>
> It's similar also on Intel chips. [...]

You seem to be seriously misinformed about Intel CPUs.

There are a fair number of events on Intel CPUs that can be generalized and
which we have already generalized. Here's a selection:

 Performance counter stats for './fill_1b':

       2829.562519 task-clock                #    0.994 CPUs utilized
                27 context-switches          #    0.000 M/sec
                52 CPU-migrations            #    0.000 M/sec
                99 page-faults               #    0.000 M/sec
     8,559,062,611 cycles                    #    3.025 GHz                      (20.02%)
     2,530,761,381 stalled-cycles-frontend   #   29.57% frontend cycles idle     (30.03%)
       423,070,037 stalled-cycles-backend    #    4.94% backend cycles idle      (40.04%)
    18,043,436,126 instructions              #    2.11  insns per cycle
                                             #    0.14  stalled cycles per insn  (50.04%)
     1,007,704,770 branches                  #  356.134 M/sec                    (60.04%)
           521,894 branch-misses             #    0.05% of all branches          (60.02%)
         9,424,849 L1-dcache-loads           #    3.331 M/sec                    (50.03%)
         1,028,884 L1-dcache-load-misses     #   10.92% of all L1-dcache hits    (50.02%)
           490,266 LLC-loads                 #    0.173 M/sec                    (39.99%)
           133,226 LLC-load-misses           #    0.047 M/sec                    (10.01%)

       2.846836822  seconds time elapsed
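
The annotations after each `#` in the output above are ratios of the raw counts. As a rough illustration only, the sketch below recomputes them in Python from the numbers shown. The dictionary keys and the function name `derived_metrics` are made up for this example and are not part of perf.

```python
# Raw counts copied from the perf stat output above.
# Key names are illustrative, not perf identifiers.
counts = {
    "task_clock_ms": 2829.562519,
    "cycles": 8_559_062_611,
    "stalled_frontend": 2_530_761_381,
    "stalled_backend": 423_070_037,
    "instructions": 18_043_436_126,
    "branches": 1_007_704_770,
    "branch_misses": 521_894,
    "l1d_loads": 9_424_849,
    "l1d_load_misses": 1_028_884,
}


def derived_metrics(c):
    """Recompute the ratios that perf stat prints after the '#'."""
    return {
        # cycles per nanosecond of task clock equals GHz
        "ghz": c["cycles"] / (c["task_clock_ms"] * 1e6),
        "insns_per_cycle": c["instructions"] / c["cycles"],
        "frontend_idle_pct": 100.0 * c["stalled_frontend"] / c["cycles"],
        "backend_idle_pct": 100.0 * c["stalled_backend"] / c["cycles"],
        "stalled_cycles_per_insn": c["stalled_frontend"] / c["instructions"],
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
        "l1d_miss_pct": 100.0 * c["l1d_load_misses"] / c["l1d_loads"],
    }


metrics = derived_metrics(counts)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.3f}")
```

Note that the percentages in parentheses in the original output are the fraction of time each counter was actually scheduled (multiplexing), so the counts themselves are scaled estimates.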

Thanks,

Ingo

2011-05-01 18:31:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support


* Vince Weaver <[email protected]> wrote:

> I spend a lot of time dealing with developers who use perf-counter related
> interfaces all the time. They complain to me *constantly* about the
> drawbacks of perf_events, because PAPI is one step up from the kernel.
>
> I try to get them to interact with the kernel people, but they won't. Do you
> know why? Because they feel like the perf_events developers are rude at
> best, unhelpful in general, and actively anti-anyone-not-using-perf.

Arnaldo, the maintainer of perf tooling (and with whom most users complaining
about perf would be interacting) is one of the most responsive maintainers and
developers i've ever seen. I have not seen him brush off a single user
bugreport or complaint, ever - let alone be 'unhelpful' or be anti-anyone.
Ditto for Peter.

They didn't even brush *you* off, ever.

Let me guess, you just made that argument up, right?

Ingo

2011-05-02 18:32:20

by Corey Ashford

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On 05/01/2011 10:55 AM, Ingo Molnar wrote:
>
> * Corey Ashford <[email protected]> wrote:
>
>> Carl Love and I recently completed some work to add perf_events support for
>> the IBM Blue Waters machine's "CPU networking" chip, called the Torrent chip.
>> We did all of this work based on a RHEL 6 kernel (2.6.32ish), which doesn't
>> have Peter's more recent multi-PMU support.
>>
>> I would say that most if not all of the events are not generalizable in the
>> sense that you are talking about; the events are very specific to the Torrent
>> chip. [...]
>
> That's ok and not a problem.
>
> The issue here are events that *are* generalizable.
>
>> So if I'm understanding what you have said correctly, we would not be able to
>> get a forward port of this code committed without abstracting these events in
>> a way that's acceptable to the kernel community. [...]
>
> If the number of events worth generalizing is the empty set that's ok.
>
> Thanks,
>
> Ingo

Great, that's good to hear.

Thanks,

- Corey

2011-05-09 11:01:29

by Stephane Eranian

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support

On Fri, Apr 29, 2011 at 8:57 PM, Ingo Molnar <[email protected]> wrote:
>
> * Vince Weaver <[email protected]> wrote:
>
>> On Fri, 29 Apr 2011, Ingo Molnar wrote:
>>
>> > Firstly, one technical problem i have with the raw events ABI method is that it
>> > was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
>> > Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the
>> > declared title of the commit, it was not declared in the changelog either and
>> > it was not my intention to offer such an ABI prematurely either - and i noticed
>> > those two lines too late - but still in time to not let this slip into v2.6.39.
>>
>> The initial patches from November seem to make it clear what is being done
>> here.  I thought it was pretty obvious to those reviewing those patches what
>> was involved.  How would I have known that OFFCORE_RESPONSE support was
>> coming if I didn't see the patches obviously float by on linux-kernel?
>
> Not really, Peter did a lot of review of those patches and they were changed
> beyond recognition from their original form - i think Peter wrote a fair
> portion of the supporting cleanups, as Andi seemed uninterested in acting
> quickly on review feedback.
>

I did spend quite some time looking at the patch, testing it, debugging it
with Lin Ming. It was all done in the open. We even discussed with Peter
the config1/config2 approach instead of stashing the extra bits in config
due to SandyBridge. During those months, nobody, absolutely nobody,
including YOU, objected to the fact that the patch did not provide a
generic abstraction for the offcore_response events. I find it hard to
believe you overlooked that until the last minute. There was no 'under the
radar' behavior. So please, stick to the facts.

> Secondly, Peter posted a patch that might resolve this issue in v2.6.40 - but
> that patch is not cooked yet and you guys have not helped finish it. I'd like
> to see that process play out first - maybe we discover some detail that will
> force us to modify the config1/config2 ABI approach - which we cannot do if
> this is released into v2.6.39 prematurely.
>

I would think the opposite would happen. The config1 is pretty much all you
need to pass the extra config for this event. The hardware is not going to
change from under us on those processors. Keep in mind that offcore_response
is not an architected event and will never be. I would rather see a situation
where you devise mappings to generic events for v2.6.40 and then later you
realize they are wrong. Now, you've changed the behavior of the kernel, it does
not count the same thing anymore. This has already happened with the existing
generic events and will continue to happen based on my limited understanding
of what they're supposed to count.

> Thirdly, and this is my most fundamental objection, i also object to the timing
> of this offcore raw access ABI, because past experience is that we *really* do
> not want to allow raw PMU details without *first* having generic abstractions
> and generic events first.

I am not opposed to generic events. But I don't think they're the ultimate
solution to all your performance problems: the crystal ball you're trying
to sell.

I don't think users are sloppy either. That's not showing a lot of
consideration for end-users. I also don't quite follow the reasoning here:
"Users are sloppy, therefore push all the complexity into the 'smart'
kernel". What's wrong with having smarter tools to help users? The kernel
is not necessarily the solution to all users' problems. Tool developers are
as talented and innovative as kernel developers.

Performance monitoring is not and never will be a 5-minute thing you do at
the end of the day. Same thing for tools: the fact that you can write a
performance tool in half a day is not necessarily a sign that the tool, or
the kernel API it sits on, is very good. What matters is the quality of the
data it returns, the quality of the interpretation of the data and how it
can be translated into program changes that may eventually lead to
performance improvements. So when I can do a quick:

$ perf stat -e l1-load-misses foo

I want to be sure:
- I understand what I am actually measuring
- I am measuring the same thing on different processors
- what I am measuring does not change at each kernel version

Sure, it spares me the time to read the manual, but I'd like to be sure
I understand what's going on. It is easy to be misled by counts (see below).
As we've discussed earlier, what matters is the ability to associate costs to
events. I think it would be quite hard to associate costs to generic events when
many are just too broad.

Generic events could be a first approximation BUT they need to be very carefully
defined. You need to clearly state what they count. That's really a minimum.
And if they are just approximations, then I need to know to what extent. Those
rules would have to be set across the board. If you start saying that on Intel
these restrictions apply and on AMD another set of restrictions applies, then
what's the point of all of this? "Sloppy" users should not be expected to
sift through the kernel changelog to realize that some generic events have
restrictions or are just vast approximations. Ultimately, the tool has to be
aware of this to warn users. This is the problem with the model, it creates
the illusion of uniformity and stability, when the reality is quite different.

You also need to be more careful in how you map generic events. This goes
back to your "thinking is hard, ..." argument. You do need to think hard
before you come up with an event you think would be valuable as a generic
event. Such an event becomes valuable only if it can be mapped on MORE than
one processor AND measure the SAME thing. Failure to do so means the model
is useless.

A quick reading of the Intel event table to find approximate mappings is
not enough. Given generic events are a center-piece of your design, you
need to be extra cautious when adding mappings. I would expect you'd write
micro-benchmarks to validate that the event counts what its generic mapping
is defined for.

I am afraid your recent series of stalls events is not a perfect
illustration of that. Here is an example:

/* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x1803fb1;
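
To make the `c=1,i=1` annotation on that raw value concrete, here is a minimal sketch that unpacks it, assuming the architectural IA32_PERFEVTSELx bit layout from the Intel manuals (event select in bits 0-7, unit mask in bits 8-15, AnyThread in bit 21, invert in bit 23, counter mask in bits 24-31). The function name `decode_evtsel` is hypothetical and exists only for this illustration.

```python
def decode_evtsel(config):
    """Split a raw x86 event-select value into its main fields.

    Assumes the architectural IA32_PERFEVTSELx layout; this helper
    is illustrative and is not part of perf or the kernel.
    """
    return {
        "event": config & 0xFF,            # bits 0-7: event select
        "umask": (config >> 8) & 0xFF,     # bits 8-15: unit mask
        "edge": bool(config & (1 << 18)),  # bit 18: edge detect
        "any": bool(config & (1 << 21)),   # bit 21: AnyThread
        "inv": bool(config & (1 << 23)),   # bit 23: invert cmask compare
        "cmask": (config >> 24) & 0xFF,    # bits 24-31: counter mask
    }


fields = decode_evtsel(0x1803FB1)
print(
    f"event=0x{fields['event']:02x} umask=0x{fields['umask']:02x} "
    f"c={fields['cmask']} i={int(fields['inv'])} any={int(fields['any'])}"
)
```

With `cmask=1` and `inv=1` the counter increments on cycles where the underlying event count is below 1, which is how an "active cycles" event gets turned into a "stalled cycles" count.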

There is a reason this event is called CORE. When HT is on, it counts
what's going on for the two threads. You're measuring your CPU and the
sibling CPU. If you are stalled and the other thread is not, you will
vastly undercount. This is regardless of the setting of that ANY bit. The
count is wrong when running per-thread mode. At the user level, you think
you're measuring stalls in your thread when the reality is very different.
This example just illustrates the danger of generic events.
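
The undercounting argument can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each hardware thread is represented as a per-cycle busy/idle list, and a core-scoped inverted counter is assumed to tick only when neither thread executes anything. The function name and the sample traces are invented; real hardware behaviour is more subtle.

```python
def count_stalls(thread0_busy, thread1_busy):
    """Compare per-thread stalls with what a core-wide counter would see.

    thread0_busy / thread1_busy: one boolean per cycle, True when that
    hardware thread executed at least one uop (toy model, not hardware).
    """
    # Stalls the user believes are being measured: thread 0 did nothing.
    true_stalls = sum(1 for busy in thread0_busy if not busy)
    # What a core-scoped "no uops executed" count sees: BOTH threads idle.
    core_idle = sum(
        1 for a, b in zip(thread0_busy, thread1_busy) if not (a or b)
    )
    return true_stalls, core_idle


# Thread 0 is stalled in 5 of 8 cycles; its sibling is mostly busy.
t0 = [True, False, False, False, True, False, False, True]
t1 = [True, True, True, False, True, True, False, True]

true_stalls, core_idle = count_stalls(t0, t1)
print(f"thread 0 stalled cycles: {true_stalls}, core-wide count: {core_idle}")
```

In this made-up trace the core-wide count reports 2 stalled cycles while thread 0 was actually stalled for 5, which is the kind of gap described above when the sibling thread keeps the core active.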

Going back to offcore-response, generic events become valuable if you can
map them onto more than one processor. I'd like to understand their
mappings on AMD processors.

As you said, most processors have common micro-architectural components
these days. But that does NOT mean you can measure them the same way. The
Intel and AMD event tables are full of examples of that (LLC misses is
one). I am not necessarily happy about that, but I can understand why this
happens. Many times, it is not possible to compensate in SW for the HW
differences in how an event counts despite its concept being apparently
simple such as with a cache miss.

> But it's more than that, generalization works even on the *hardware* level:
>
> AMD managed to keep a large chunk of their events stable even across very
> radical changes of the underlying hardware. I have two AMD systems produced
> *10* years apart and they even use the same event encodings for the major
> events.
>
> Intel started introducing stable event definitions a couple of years ago as
> well.
>

I don't agree with this statement. It's not happening. The proof is that Intel
came out with the architected events with the Core micro-architecture. Since
then, we've had Nehalem, Westmere, Sandy Bridge and the list of architected
events has NOT been extended. I bet you, it won't with follow on processors.
It does not make sense. The micro-architecture keeps changing. Take the uncore
component. It varies between a single-socket and dual-socket WSM and is
totally different on the EX part. You think you can ever get an architected last
level cache miss event that works across the board? The event definition does
matter and it's not a marginal issue.

As for AMD, yes, it has not changed in 10 years, but that does not mean the
problem is solved and that all events are useful. Furthermore, I am sure
you've seen the AMD patches for Fam15h processors (Bulldozer), they've
added a bunch of event constraints.

> Basically without proper generalization people get sloppy and go the fast path
> and export very low level, opaque, unstructured PMU interfaces to user-space
> and repeat the Oprofile and perfmon tooling mistakes again and again.
>
> "Thinking is hard, lets go shopping^W exporting raw ABIs."
>

What is your proposal for the proper abstraction for AMD IBS, then?


> We put structure, proper abstractions and easy tooling *ahead* of the interests
> of a small group of people who'd rather prefer a lowlevel, opaque hardware
> channel so that they do not have to *think* about generalization and also
> perhaps so they do not have to share their selection of events and analysis
> methods with others ...
>

Now what? A conspiracy theory. You really think that's the goal of those
people (which I bet include myself)? The reality is quite different. Those
people want to help. They have been looking at this for years. They know
where the pitfalls are and they are trying to raise awareness. They also
want to make sure Linux provides them with an infrastructure on which they
can build better tools for advanced analysis.

Don't go claiming those people will run away once they have raw event access.
Have I not contributed patches to perf_events to make it better and that
despite what happened two years ago?

Nobody is trying to conceal events or analysis techniques (see the presentation
below). People are trying to get what they need based on past experience dealing
with PMU hardware and applications.

Related to that, the following statement on Vince:

> So i think i can tell it with a fairly high confidence factor that you simply
> do not know what you are talking about.

I think this is a gratuitous and unfounded statement. I have known Vince for
years. He has been studying the PMU events for years, writing micro-benchmarks
to really understand what they actually count and their differences across
processors. So I think he is fully qualified to comment on events.


As described above, there are lots of pitfalls when using PMU events. I'd like
to have access to the events as described in the processor specs. There is no
harm in doing so. This is a way of validating measurements and also a way of
doing finer grain analysis. The extra 1% of performance does matter for a lot
of applications and for those you need a lot more than the generic events.

Analysis techniques have been published (not concealed). The following
presentation given at CERN a few months back is a good example:

https://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/HPC_Perf_analysis_Xeon_5500_5600_intro.pdf

We believe we can build tools to create that decomposition tree. Such
decomposition needs access to many raw events. Some people have already
prototyped tools based on those analysis techniques:

http://mkortela.web.cern.ch/mkortela/ptuview/

If perf_events does not allow such tools to be built because it is
artificially restricting access to certain hardware features, then people,
incl. myself, may legitimately question its usefulness.

In summary, I am not a believer in generic events, at least not at the
kernel level. That does not mean I am against them. However, I am against
the ideas that there should only be generic events and that generic events
should come first.

2011-05-10 09:36:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: re-enable Nehalem raw Offcore-Events support


* stephane eranian <[email protected]> wrote:

> > Thirdly, and this is my most fundamental objection, i also object to the
> > timing of this offcore raw access ABI, because past experience is that we
> > *really* do not want to allow raw PMU details without *first* having
> > generic abstractions and generic events first.
>
> I am not opposed to generic events. [...]

Ok - and that's the most important point really.

> [...] But I don't think they're the ultimate solution to all your performance
> problems: the crystal ball you're trying to sell.

I do not claim that and i'm not selling a crystal ball either.

I just see that 90%+ of our users use generic events (most in fact just use
whatever comes as a default, which is cycles) and only a tiny niche uses raw
events. I'm responding to that demand.

[ We saw that with Oprofile already: only an exceedingly small minority *ever*
made use of any event but the default Oprofile came with.

So even with our current generalizations we have more than the typical
developer would use for profiling and we try to not define everything and the
kitchen sink but respond to demand in a common sense way as we see it. ]

And note that i have no problems with and no prejudices against crazy niches
(-rt, anyone?), as long as they *know* that they are crazy and as long as they
help the advancement of the common case!

Really, as a Linux kernel maintainer i'm very easily corrupted by niches: if
you want me to care about your niche you only need to bribe me with
improvements to the more common case! :-)

Note that time is running out to get the offcore bits activated even in
v2.6.40: we are at -rc7 and the merge window is getting closer.

So if you guys care about this code please have a look at Peter's patch and
help test/finish it (or provide a detailed and convincing technical review of
his patch to prove why his approach to provide node level events is impossible
to meet).

Arguing in this thread some more won't help get the code changed i'm afraid!

Thanks,

Ingo