2010-11-26 15:05:26

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:perf/core] perf: Ignore non-sampling overflows

Commit-ID: 963988262c3c8f4234f64a0dde59446a295e07bb
Gitweb: http://git.kernel.org/tip/963988262c3c8f4234f64a0dde59446a295e07bb
Author: Peter Zijlstra <[email protected]>
AuthorDate: Wed, 24 Nov 2010 18:55:29 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 26 Nov 2010 15:14:55 +0100

perf: Ignore non-sampling overflows

Some arch implementations call perf_event_overflow() by 'accident',
ignore this.

Reported-by: Francis Moreau <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/perf_event.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 98c5549..af1e63f 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
struct hw_perf_event *hwc = &event->hw;
int ret = 0;

+ /*
+ * Non-sampling counters might still use the PMI to fold short
+ * hardware counters, ignore those.
+ */
+ if (unlikely(!is_sampling_event(event)))
+ return 0;
+
if (!throttle) {
hwc->interrupts++;
} else {


2010-11-26 19:20:37

by Francis Moreau

[permalink] [raw]
Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

Peter,

tip-bot for Peter Zijlstra <[email protected]> writes:
>
> perf: Ignore non-sampling overflows
>
> Some arch implementations call perf_event_overflow() by 'accident',
> ignore this.
>
> Reported-by: Francis Moreau <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/perf_event.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 98c5549..af1e63f 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
> struct hw_perf_event *hwc = &event->hw;
> int ret = 0;
>
> + /*
> + * Non-sampling counters might still use the PMI to fold short
> + * hardware counters, ignore those.
> + */
> + if (unlikely(!is_sampling_event(event)))
> + return 0;
> +
> if (!throttle) {
> hwc->interrupts++;
> } else {

Couldn't we place this in perf_event_overflow() instead, like the
following ?

--8<---------------cut here---------------start------------->8---

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index df27fd8..1dabb54 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4267,7 +4267,9 @@ int perf_event_overflow(struct perf_event *event,
int nmi,
struct perf_sample_data *data,
struct pt_regs *regs)
{
- return __perf_event_overflow(event, nmi, 1, data, regs);
+ if (is_sampling_event(event))
+ return __perf_event_overflow(event, nmi, 1, data, regs);
+ return 0;
}

/*

--8<---------------cut here---------------end--------------->8---

thanks,
--
Francis

Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On 26.11.10 10:04:56, tip-bot for Peter Zijlstra wrote:
> Commit-ID: 963988262c3c8f4234f64a0dde59446a295e07bb
> Gitweb: http://git.kernel.org/tip/963988262c3c8f4234f64a0dde59446a295e07bb
> Author: Peter Zijlstra <[email protected]>
> AuthorDate: Wed, 24 Nov 2010 18:55:29 +0100
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 26 Nov 2010 15:14:55 +0100
>
> perf: Ignore non-sampling overflows
>
> Some arch implementations call perf_event_overflow() by 'accident',
> ignore this.
>
> Reported-by: Francis Moreau <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/perf_event.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 98c5549..af1e63f 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
> struct hw_perf_event *hwc = &event->hw;
> int ret = 0;
>
> + /*
> + * Non-sampling counters might still use the PMI to fold short
> + * hardware counters, ignore those.
> + */
> + if (unlikely(!is_sampling_event(event)))
> + return 0;
> +

Peter,

do you remember the background of this change. This check silently
drops data of non-sampling events. I want to use perf_event_overflow()
to write to the buffer and want to modify the check, but don't see
which 'accidentally' interrupts may occur that must be ignored.

I was thinking of modifying the check to something like:

static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_type != 0;
}

or

if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
...

Thanks,

-Robert

> if (!throttle) {
> hwc->interrupts++;
> } else {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Advanced Micro Devices, Inc.
Operating System Research Center

2011-06-28 11:07:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On Tue, 2011-06-28 at 12:53 +0200, Robert Richter wrote:
> > --- a/kernel/perf_event.c
> > +++ b/kernel/perf_event.c
> > @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
> > struct hw_perf_event *hwc = &event->hw;
> > int ret = 0;
> >
> > + /*
> > + * Non-sampling counters might still use the PMI to fold short
> > + * hardware counters, ignore those.
> > + */
> > + if (unlikely(!is_sampling_event(event)))
> > + return 0;
> > +

> do you remember the background of this change. This check silently
> drops data of non-sampling events. I want to use perf_event_overflow()
> to write to the buffer and want to modify the check, but don't see
> which 'accidentally' interrupts may occur that must be ignored.

IIRC this is because we always program the interrupt bit, such that when
the counter overflows we can account and reprogram the thing. This is
needed because no hardware counter is in fact 64 bits wide. Therefore we
have to program the counter to its max width and properly account the
state and reprogram on overflow.

Imagine a 32bit cycle counter (@1GHz), if we were not to program that as
taking interrupts and nobody would read that counter for about 4.2
seconds, we'd have overflowed and lost the actual count value for the
thing.

So what we do is program is at 31bits (so that the msb can toggle and
trigger the interrupt), and on interrupt add to event->count, and reset
the hardware to start counting again.

Now some arch/*/perf_event.c implementations unconditionally called
perf_event_overflow() from their IRQ handler, even for such non-sampling
counters.

2011-06-28 11:57:56

by Francis Moreau

[permalink] [raw]
Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On Tue, Jun 28, 2011 at 1:05 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2011-06-28 at 12:53 +0200, Robert Richter wrote:
>> > --- a/kernel/perf_event.c
>> > +++ b/kernel/perf_event.c
>> > @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
>> > ? ? struct hw_perf_event *hwc = &event->hw;
>> > ? ? int ret = 0;
>> >
>> > + ? /*
>> > + ? ?* Non-sampling counters might still use the PMI to fold short
>> > + ? ?* hardware counters, ignore those.
>> > + ? ?*/
>> > + ? if (unlikely(!is_sampling_event(event)))
>> > + ? ? ? ? ? return 0;
>> > +
>
>> do you remember the background of this change. This check silently
>> drops data of non-sampling events. I want to use perf_event_overflow()
>> to write to the buffer and want to modify the check, but don't see
>> which 'accidentally' interrupts may occur that must be ignored.
>
> IIRC this is because we always program the interrupt bit, such that when
> the counter overflows we can account and reprogram the thing. This is
> needed because no hardware counter is in fact 64 bits wide. Therefore we
> have to program the counter to its max width and properly account the
> state and reprogram on overflow.
>
> Imagine a 32bit cycle counter (@1GHz), if we were not to program that as
> taking interrupts and nobody would read that counter for about 4.2
> seconds, we'd have overflowed and lost the actual count value for the
> thing.
>
> So what we do is program is at 31bits (so that the msb can toggle and
> trigger the interrupt), and on interrupt add to event->count, and reset
> the hardware to start counting again.
>
> Now some arch/*/perf_event.c implementations unconditionally called
> perf_event_overflow() from their IRQ handler, even for such non-sampling
> counters.

Yes that's what I recall too.

--
Francis

Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On 28.06.11 07:56:03, Francis Moreau wrote:
> On Tue, Jun 28, 2011 at 1:05 PM, Peter Zijlstra <[email protected]> wrote:
> > On Tue, 2011-06-28 at 12:53 +0200, Robert Richter wrote:
> >> > --- a/kernel/perf_event.c
> >> > +++ b/kernel/perf_event.c
> >> > @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
> >> > ? ? struct hw_perf_event *hwc = &event->hw;
> >> > ? ? int ret = 0;
> >> >
> >> > + ? /*
> >> > + ? ?* Non-sampling counters might still use the PMI to fold short
> >> > + ? ?* hardware counters, ignore those.
> >> > + ? ?*/
> >> > + ? if (unlikely(!is_sampling_event(event)))
> >> > + ? ? ? ? ? return 0;
> >> > +
> >
> >> do you remember the background of this change. This check silently
> >> drops data of non-sampling events. I want to use perf_event_overflow()
> >> to write to the buffer and want to modify the check, but don't see
> >> which 'accidentally' interrupts may occur that must be ignored.
> >
> > IIRC this is because we always program the interrupt bit, such that when
> > the counter overflows we can account and reprogram the thing. This is
> > needed because no hardware counter is in fact 64 bits wide. Therefore we
> > have to program the counter to its max width and properly account the
> > state and reprogram on overflow.
> >
> > Imagine a 32bit cycle counter (@1GHz), if we were not to program that as
> > taking interrupts and nobody would read that counter for about 4.2
> > seconds, we'd have overflowed and lost the actual count value for the
> > thing.
> >
> > So what we do is program is at 31bits (so that the msb can toggle and
> > trigger the interrupt), and on interrupt add to event->count, and reset
> > the hardware to start counting again.
> >
> > Now some arch/*/perf_event.c implementations unconditionally called
> > perf_event_overflow() from their IRQ handler, even for such non-sampling
> > counters.

I looked at the interrupt handlers. The events are always determined
from a per-cpu array:

cpuc = &__get_cpu_var(cpu_hw_events);
...
event = cpuc->events[idx];

In case of interrupts the event should then always be a hw event (or
uninitialized). Even if the interrupt was triggered by a different
source, it would always be mapped to the same event and the check
is_sampling_event() would be meaningless.

There are other occurrences of perf_event_overflow() in
kernel/events/core.c for events of type PERF_TYPE_SOFTWARE. These
events are initialized with sample_period set and a check would always
be true too.

For both cases I stil don't see a reason for the check.

Anyway, would the following extentension of the check above ok?

if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
...

With no bits set in attr.sample_type the sample would be empty and
nothing to report. Now, with this change, samples that have data to
report wouldn't be dropped anymore.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2011-06-29 10:52:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On Wed, 2011-06-29 at 12:37 +0200, Robert Richter wrote:

> I looked at the interrupt handlers. The events are always determined
> from a per-cpu array:
>
> cpuc = &__get_cpu_var(cpu_hw_events);
> ...
> event = cpuc->events[idx];
>
> In case of interrupts the event should then always be a hw event (or
> uninitialized). Even if the interrupt was triggered by a different
> source, it would always be mapped to the same event and the check
> is_sampling_event() would be meaningless.

I'm probably not quite getting what you mean, but how is
is_sampling_event() meaningless? the INT bit is enabled for _all_
events, whether they were configured as a sampling event or not.

Its just that for !sampling events we shouldn't attempt to generate any
output.

> There are other occurrences of perf_event_overflow() in
> kernel/events/core.c for events of type PERF_TYPE_SOFTWARE. These
> events are initialized with sample_period set and a check would always
> be true too.

I'm failing to see what you mean, where do we always set
event->attr.sample_period for software events?

> For both cases I stil don't see a reason for the check.

You're going to have to spell things out for me, I'm really not getting
your argument.

> Anyway, would the following extentension of the check above ok?
>
> if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
> ...
>
> With no bits set in attr.sample_type the sample would be empty and
> nothing to report. Now, with this change, samples that have data to
> report wouldn't be dropped anymore.

Also, could you explain in what way data is dropped? Where do
non-sampling events need to write sample data?

Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On 29.06.11 06:50:55, Peter Zijlstra wrote:
> On Wed, 2011-06-29 at 12:37 +0200, Robert Richter wrote:
>
> > I looked at the interrupt handlers. The events are always determined
> > from a per-cpu array:
> >
> > cpuc = &__get_cpu_var(cpu_hw_events);
> > ...
> > event = cpuc->events[idx];
> >
> > In case of interrupts the event should then always be a hw event (or
> > uninitialized). Even if the interrupt was triggered by a different
> > source, it would always be mapped to the same event and the check
> > is_sampling_event() would be meaningless.
>
> I'm probably not quite getting what you mean, but how is
> is_sampling_event() meaningless? the INT bit is enabled for _all_
> events, whether they were configured as a sampling event or not.

Aren't all events that are mapped to counters via cpu_hw_events always
sampling events? Then, when calling perf_event_overflow() from an
interrupt handler there are no other events than sampling events.

>
> Its just that for !sampling events we shouldn't attempt to generate any
> output.

If attr.sample_type is null, there is no output to generate. Better
use this instead of attr.sample_type in is_sampling_event()?
perf_event_overflow() could be used then to generate output also for
samples where no period is specified.

>
> > There are other occurrences of perf_event_overflow() in
> > kernel/events/core.c for events of type PERF_TYPE_SOFTWARE. These
> > events are initialized with sample_period set and a check would always
> > be true too.
>
> I'm failing to see what you mean, where do we always set
> event->attr.sample_period for software events?

Hmm, I read the code wrong and the check in perf_event_overflow()
might be needed for swevents.

>
> > For both cases I stil don't see a reason for the check.
>
> You're going to have to spell things out for me, I'm really not getting
> your argument.

I was thinking about to change this check and haven't seen cases for
that the check is for. What would happen if the check isn't there and
perf_event_overflow() is called from the interrupt handler?

>
> > Anyway, would the following extentension of the check above ok?
> >
> > if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
> > ...
> >
> > With no bits set in attr.sample_type the sample would be empty and
> > nothing to report. Now, with this change, samples that have data to
> > report wouldn't be dropped anymore.
>
> Also, could you explain in what way data is dropped? Where do
> non-sampling events need to write sample data?

I stumbled over this while rebasing my perf ibs patches:

http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-ibs

Hope I could explain this to you better now.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2011-06-29 14:40:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On Wed, 2011-06-29 at 16:10 +0200, Robert Richter wrote:

> > I'm probably not quite getting what you mean, but how is
> > is_sampling_event() meaningless? the INT bit is enabled for _all_
> > events, whether they were configured as a sampling event or not.
>
> Aren't all events that are mapped to counters via cpu_hw_events always
> sampling events?

No. perf stat which only counts (has period==0) uses hardware counters
just fine but doesn't sample anything, yet has the INT bit set (as
explained a few emails back).

> Then, when calling perf_event_overflow() from an
> interrupt handler there are no other events than sampling events.

Thus false. (Also, even if we didn't always set the INT bit, it might
see the overflow of a non-sampling event while dealing with the PMI
triggered by another event).

> > Its just that for !sampling events we shouldn't attempt to generate any
> > output.
>
> If attr.sample_type is null, there is no output to generate.

Arguably true, currently we would still write a rudimentary sample,
consisting of just the header with a 0-sized payload. This is, I'd
rather we do that than add yet another conditional on the sample fast
path. If the user doesn't want samples he should've set period==0, if
the does he had better set a non-zero sample_type.

> Better
> use this instead of attr.sample_type in is_sampling_event()?
> perf_event_overflow() could be used then to generate output also for
> samples where no period is specified.

But what for? period==0 is defined as: does not generate samples.

> > You're going to have to spell things out for me, I'm really not getting
> > your argument.
>
> I was thinking about to change this check and haven't seen cases for
> that the check is for. What would happen if the check isn't there and
> perf_event_overflow() is called from the interrupt handler?

It might generate spurious samples, nothing too bad, just unexpected.

> > > Anyway, would the following extentension of the check above ok?
> > >
> > > if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
> > > ...
> > >
> > > With no bits set in attr.sample_type the sample would be empty and
> > > nothing to report. Now, with this change, samples that have data to
> > > report wouldn't be dropped anymore.
> >
> > Also, could you explain in what way data is dropped? Where do
> > non-sampling events need to write sample data?
>
> I stumbled over this while rebasing my perf ibs patches:
>
> http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-ibs
>
> Hope I could explain this to you better now.

Could you point where exactly in the IBS code this happens? Even for
IBS, if period==0 it should not generate samples. Arguably IBS with
period==0 is pretty pointless, but that's another story.

Subject: Re: [tip:perf/core] perf: Ignore non-sampling overflows

On 29.06.11 10:39:46, Peter Zijlstra wrote:
> On Wed, 2011-06-29 at 16:10 +0200, Robert Richter wrote:
>
> > > I'm probably not quite getting what you mean, but how is
> > > is_sampling_event() meaningless? the INT bit is enabled for _all_
> > > events, whether they were configured as a sampling event or not.
> >
> > Aren't all events that are mapped to counters via cpu_hw_events always
> > sampling events?
>
> No. perf stat which only counts (has period==0) uses hardware counters
> just fine but doesn't sample anything, yet has the INT bit set (as
> explained a few emails back).

Ah, this makes sense.

>
> > Then, when calling perf_event_overflow() from an
> > interrupt handler there are no other events than sampling events.
>
> Thus false. (Also, even if we didn't always set the INT bit, it might
> see the overflow of a non-sampling event while dealing with the PMI
> triggered by another event).
>
> > > Its just that for !sampling events we shouldn't attempt to generate any
> > > output.
> >
> > If attr.sample_type is null, there is no output to generate.
>
> Arguably true, currently we would still write a rudimentary sample,
> consisting of just the header with a 0-sized payload. This is, I'd
> rather we do that than add yet another conditional on the sample fast
> path. If the user doesn't want samples he should've set period==0, if
> the does he had better set a non-zero sample_type.
>
> > Better
> > use this instead of attr.sample_type in is_sampling_event()?
> > perf_event_overflow() could be used then to generate output also for
> > samples where no period is specified.
>
> But what for? period==0 is defined as: does not generate samples.
>
> > > You're going to have to spell things out for me, I'm really not getting
> > > your argument.
> >
> > I was thinking about to change this check and haven't seen cases for
> > that the check is for. What would happen if the check isn't there and
> > perf_event_overflow() is called from the interrupt handler?
>
> It might generate spurious samples, nothing too bad, just unexpected.

Ok, I see its better to check for period==0. I found a solution that
works without changing this check. See below.

>
> > > > Anyway, would the following extentension of the check above ok?
> > > >
> > > > if (unlikely(!is_sampling_event(event) && !event->attr.sample_type))
> > > > ...
> > > >
> > > > With no bits set in attr.sample_type the sample would be empty and
> > > > nothing to report. Now, with this change, samples that have data to
> > > > report wouldn't be dropped anymore.
> > >
> > > Also, could you explain in what way data is dropped? Where do
> > > non-sampling events need to write sample data?
> >
> > I stumbled over this while rebasing my perf ibs patches:
> >
> > http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-ibs
> >
> > Hope I could explain this to you better now.
>
> Could you point where exactly in the IBS code this happens? Even for
> IBS, if period==0 it should not generate samples. Arguably IBS with
> period==0 is pretty pointless, but that's another story.

The original code is this:

http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=blob;f=arch/x86/kernel/cpu/perf_event_amd.c;h=d79d295692d148b4408b26cfeb265cf4dc2e75aa;hb=2ded5ae9883974fb8f0358c6b2434410b5e9e83c#l273

It covers the case there the sampling period may be specified in the
raw config register (line 286). I know, this is not the preferred way,
but this way you can directly set the config value in perf without
bypassing some bits through event->attr. As IBS events are sampling
events I will change the code so that it writes back the specified
period to event->attr.sample_period/event->hw.sample_period. Something
like:

if (event->hw.sample_period) {
...
} else {
max_cnt = event->attr.config & map->cnt_mask;
event->attr.sample_period = max_cnt << 4;
event->hw.sample_period = event->attr.sample_period;
}

This will then proper identify IBS samples as samling event.

Thanks for taking your time.

-Robert

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Advanced Micro Devices, Inc.
Operating System Research Center