2012-02-08 11:32:54

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)


Hi,

On Thu, 26 Jan 2012 17:03:19 +0100
Stephane Eranian <[email protected]> wrote:

> This patch fixes the throttling mechanism. It was broken
> in 3.2.0. Events were not being unthrottled. The unthrottling
> mechanism required that events be checked at each timer tick.

This patch breaks perf on POWER. I haven't had a chance to work out why
yet, but will investigate tomorrow.

Anton

> This patch solves this problem and also separates:
> - unthrottling
> - multiplexing
> - frequency-mode period adjustments
>
> Not all of them need to be executed at each timer tick.
>
> This third version of the patch is based on my original patch +
> PeterZ proposal (https://lkml.org/lkml/2012/1/7/87).
>
> At each timer tick, for each context:
> - if the current CPU has throttled events, we unthrottle events
>
> - if context has frequency-based events, we adjust sampling periods
>
> - if we have reached the jiffies interval, we multiplex (rotate)
>
> We decoupled rotation (multiplexing) from frequency-mode sampling
> period adjustments. They should not necessarily happen at the same
> rate. Multiplexing is subject to jiffies_interval (currently at 1
> but could be higher once the tunable is exposed via sysfs).
>
> We have grouped frequency-mode adjustment and unthrottling into the
> same routine to minimize code duplication. When throttled while in
> frequency mode, we scan the events only once.
>
> We have fixed the threshold enforcement code in
> __perf_event_overflow(). There was a bug whereby it would allow more
> than the authorized rate because an increment of hwc->interrupts was
> not executed at the right place.
>
> The patch was tested with low sampling limit (2000) and fixed periods,
> frequency mode, overcommitted PMU.
>
> On a 2.1GHz AMD CPU:
> $ cat /proc/sys/kernel/perf_event_max_sample_rate
> 2000
>
> We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):
>
> $ perf record -e cycles,cycles -c 700000 noploop 10
> $ perf report -D | tail -21
> Aggregated stats:
> TOTAL events: 80086
> MMAP events: 88
> COMM events: 2
> EXIT events: 4
> THROTTLE events: 19996
> UNTHROTTLE events: 19996
> SAMPLE events: 40000
> cycles stats:
> TOTAL events: 40006
> MMAP events: 5
> COMM events: 1
> EXIT events: 4
> THROTTLE events: 9998
> UNTHROTTLE events: 9998
> SAMPLE events: 20000
> cycles stats:
> TOTAL events: 39996
> THROTTLE events: 9998
> UNTHROTTLE events: 9998
> SAMPLE events: 20000
>
> For 10s, the cap is 2x2000x10 = 40000 samples.
> We get exactly that: 20000 samples/event.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0b91db2..412b790 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -589,6 +589,7 @@ struct hw_perf_event {
> u64 sample_period;
> u64 last_period;
> local64_t period_left;
> + u64 interrupts_seq;
> u64 interrupts;
>
> u64 freq_time_stamp;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index de859fb..7c3b9de 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2300,6 +2300,9 @@ do { \
> return div64_u64(dividend, divisor);
> }
>
> +static DEFINE_PER_CPU(int, perf_throttled_count);
> +static DEFINE_PER_CPU(u64, perf_throttled_seq);
> +
> static void perf_adjust_period(struct perf_event *event, u64 nsec,
> u64 count) {
> struct hw_perf_event *hwc = &event->hw;
> @@ -2325,16 +2328,29 @@ static void perf_adjust_period(struct
> perf_event *event, u64 nsec, u64 count) }
> }
>
> -static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64
> period) +/*
> + * combine freq adjustment with unthrottling to avoid two passes
> over the
> + * events. At the same time, make sure, having freq events does not
> change
> + * the rate of unthrottling as that would introduce bias.
> + */
> +static void perf_adjust_freq_unthr_context(struct perf_event_context
> *ctx,
> + int needs_unthr)
> {
> struct perf_event *event;
> struct hw_perf_event *hwc;
> - u64 interrupts, now;
> + u64 now, period = TICK_NSEC;
> s64 delta;
>
> - if (!ctx->nr_freq)
> + /*
> + * only need to iterate over all events iff:
> + * - context have events in frequency mode (needs freq
> adjust)
> + * - there are events to unthrottle on this cpu
> + */
> + if (!(ctx->nr_freq || needs_unthr))
> return;
>
> + raw_spin_lock(&ctx->lock);
> +
> list_for_each_entry_rcu(event, &ctx->event_list,
> event_entry) { if (event->state != PERF_EVENT_STATE_ACTIVE)
> continue;
> @@ -2344,13 +2360,8 @@ static void perf_ctx_adjust_freq(struct
> perf_event_context *ctx, u64 period)
> hwc = &event->hw;
>
> - interrupts = hwc->interrupts;
> - hwc->interrupts = 0;
> -
> - /*
> - * unthrottle events on the tick
> - */
> - if (interrupts == MAX_INTERRUPTS) {
> + if (needs_unthr && hwc->interrupts ==
> MAX_INTERRUPTS) {
> + hwc->interrupts = 0;
> perf_log_throttle(event, 1);
> event->pmu->start(event, 0);
> }
> @@ -2358,14 +2369,26 @@ static void perf_ctx_adjust_freq(struct
> perf_event_context *ctx, u64 period) if (!event->attr.freq
> || !event->attr.sample_freq) continue;
>
> - event->pmu->read(event);
> + /*
> + * stop the event and update event->count
> + */
> + event->pmu->stop(event, PERF_EF_UPDATE);
> +
> now = local64_read(&event->count);
> delta = now - hwc->freq_count_stamp;
> hwc->freq_count_stamp = now;
>
> + /*
> + * restart the event
> + * reload only if value has changed
> + */
> if (delta > 0)
> perf_adjust_period(event, period, delta);
> +
> + event->pmu->start(event, delta > 0 ?
> PERF_EF_RELOAD : 0); }
> +
> + raw_spin_unlock(&ctx->lock);
> }
>
> /*
> @@ -2388,16 +2411,13 @@ static void rotate_ctx(struct
> perf_event_context *ctx) */
> static void perf_rotate_context(struct perf_cpu_context *cpuctx)
> {
> - u64 interval = (u64)cpuctx->jiffies_interval * TICK_NSEC;
> struct perf_event_context *ctx = NULL;
> - int rotate = 0, remove = 1, freq = 0;
> + int rotate = 0, remove = 1;
>
> if (cpuctx->ctx.nr_events) {
> remove = 0;
> if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
> rotate = 1;
> - if (cpuctx->ctx.nr_freq)
> - freq = 1;
> }
>
> ctx = cpuctx->task_ctx;
> @@ -2405,37 +2425,26 @@ static void perf_rotate_context(struct
> perf_cpu_context *cpuctx) remove = 0;
> if (ctx->nr_events != ctx->nr_active)
> rotate = 1;
> - if (ctx->nr_freq)
> - freq = 1;
> }
>
> - if (!rotate && !freq)
> + if (!rotate)
> goto done;
>
> perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> perf_pmu_disable(cpuctx->ctx.pmu);
>
> - if (freq) {
> - perf_ctx_adjust_freq(&cpuctx->ctx, interval);
> - if (ctx)
> - perf_ctx_adjust_freq(ctx, interval);
> - }
> -
> - if (rotate) {
> - cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
> - if (ctx)
> - ctx_sched_out(ctx, cpuctx, EVENT_FLEXIBLE);
> + cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
> + if (ctx)
> + ctx_sched_out(ctx, cpuctx, EVENT_FLEXIBLE);
>
> - rotate_ctx(&cpuctx->ctx);
> - if (ctx)
> - rotate_ctx(ctx);
> + rotate_ctx(&cpuctx->ctx);
> + if (ctx)
> + rotate_ctx(ctx);
>
> - perf_event_sched_in(cpuctx, ctx, current);
> - }
> + perf_event_sched_in(cpuctx, ctx, current);
>
> perf_pmu_enable(cpuctx->ctx.pmu);
> perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
> -
> done:
> if (remove)
> list_del_init(&cpuctx->rotation_list);
> @@ -2445,10 +2454,22 @@ void perf_event_task_tick(void)
> {
> struct list_head *head = &__get_cpu_var(rotation_list);
> struct perf_cpu_context *cpuctx, *tmp;
> + struct perf_event_context *ctx;
> + int throttled;
>
> WARN_ON(!irqs_disabled());
>
> + __this_cpu_inc(perf_throttled_seq);
> + throttled = __this_cpu_xchg(perf_throttled_count, 0);
> +
> list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
> + ctx = &cpuctx->ctx;
> + perf_adjust_freq_unthr_context(ctx, throttled);
> +
> + ctx = cpuctx->task_ctx;
> + if (ctx)
> + perf_adjust_freq_unthr_context(ctx,
> throttled); +
> if (cpuctx->jiffies_interval == 1 ||
> !(jiffies %
> cpuctx->jiffies_interval)) perf_rotate_context(cpuctx);
> @@ -4514,6 +4535,7 @@ static int __perf_event_overflow(struct
> perf_event *event, {
> int events = atomic_read(&event->event_limit);
> struct hw_perf_event *hwc = &event->hw;
> + u64 seq;
> int ret = 0;
>
> /*
> @@ -4523,14 +4545,20 @@ static int __perf_event_overflow(struct
> perf_event *event, if (unlikely(!is_sampling_event(event)))
> return 0;
>
> - if (unlikely(hwc->interrupts >= max_samples_per_tick)) {
> - if (throttle) {
> + seq = __this_cpu_read(perf_throttled_seq);
> + if (seq != hwc->interrupts_seq) {
> + hwc->interrupts_seq = seq;
> + hwc->interrupts = 1;
> + } else {
> + hwc->interrupts++;
> + if (unlikely(throttle
> + && hwc->interrupts >=
> max_samples_per_tick)) {
> + __this_cpu_inc(perf_throttled_count);
> hwc->interrupts = MAX_INTERRUPTS;
> perf_log_throttle(event, 0);
> ret = 1;
> }
> - } else
> - hwc->interrupts++;
> + }
>
> if (event->attr.freq) {
> u64 now = perf_clock();
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


2012-02-08 11:43:47

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

On Wed, Feb 8, 2012 at 12:32 PM, Anton Blanchard <[email protected]> wrote:
>
> Hi,
>
> On Thu, 26 Jan 2012 17:03:19 +0100
> Stephane Eranian <[email protected]> wrote:
>
>> This patch fixes the throttling mechanism. It was broken
>> in 3.2.0. Events were not being unthrottled. The unthrottling
>> mechanism required that events be checked at each timer tick.
>
> This patch breaks perf on POWER. I haven't had a chance to work out why
> yet, but will investigate tomorrow.
>
Did you apply the subsequent patch I posted yesterday:

https://lkml.org/lkml/2012/2/7/185

> Anton
>
>> This patch solves this problem and also separates:
>>   - unthrottling
>>   - multiplexing
>>   - frequency-mode period adjustments
>>
>> Not all of them need to be executed at each timer tick.
>>
>> This third version of the patch is based on my original patch +
>> PeterZ proposal (https://lkml.org/lkml/2012/1/7/87).
>>
>> At each timer tick, for each context:
>>   - if the current CPU has throttled events, we unthrottle events
>>
>>   - if context has frequency-based events, we adjust sampling periods
>>
>>   - if we have reached the jiffies interval, we multiplex (rotate)
>>
>> We decoupled rotation (multiplexing) from frequency-mode sampling
>> period adjustments.  They should not necessarily happen at the same
>> rate. Multiplexing is subject to jiffies_interval (currently at 1
>> but could be higher once the tunable is exposed via sysfs).
>>
>> We have grouped frequency-mode adjustment and unthrottling into the
>> same routine to minimize code duplication. When throttled while in
>> frequency mode, we scan the events only once.
>>
>> We have fixed the threshold enforcement code in
>> __perf_event_overflow(). There was a bug whereby it would allow more
>> than the authorized rate because an increment of hwc->interrupts was
>> not executed at the right place.
>>
>> The patch was tested with low sampling limit (2000) and fixed periods,
>> frequency mode, overcommitted PMU.
>>
>> On a 2.1GHz AMD CPU:
>> $ cat /proc/sys/kernel/perf_event_max_sample_rate
>> 2000
>>
>> We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):
>>
>> $ perf record -e cycles,cycles -c 700000  noploop 10
>> $ perf report -D | tail -21
>> Aggregated stats:
>>            TOTAL events:      80086
>>             MMAP events:         88
>>             COMM events:          2
>>             EXIT events:          4
>>         THROTTLE events:      19996
>>       UNTHROTTLE events:      19996
>>           SAMPLE events:      40000
>> cycles stats:
>>            TOTAL events:      40006
>>             MMAP events:          5
>>             COMM events:          1
>>             EXIT events:          4
>>         THROTTLE events:       9998
>>       UNTHROTTLE events:       9998
>>           SAMPLE events:      20000
>> cycles stats:
>>            TOTAL events:      39996
>>         THROTTLE events:       9998
>>       UNTHROTTLE events:       9998
>>           SAMPLE events:      20000
>>
>> For 10s, the cap is 2x2000x10 = 40000 samples.
>> We get exactly that: 20000 samples/event.
>>
>> Signed-off-by: Stephane Eranian <[email protected]>
>> ---
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 0b91db2..412b790 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -589,6 +589,7 @@ struct hw_perf_event {
>>       u64                             sample_period;
>>       u64                             last_period;
>>       local64_t                       period_left;
>> +     u64                             interrupts_seq;
>>       u64                             interrupts;
>>
>>       u64                             freq_time_stamp;
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index de859fb..7c3b9de 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -2300,6 +2300,9 @@ do {                                    \
>>       return div64_u64(dividend, divisor);
>>  }
>>
>> +static DEFINE_PER_CPU(int, perf_throttled_count);
>> +static DEFINE_PER_CPU(u64, perf_throttled_seq);
>> +
>>  static void perf_adjust_period(struct perf_event *event, u64 nsec,
>> u64 count) {
>>       struct hw_perf_event *hwc = &event->hw;
>> @@ -2325,16 +2328,29 @@ static void perf_adjust_period(struct
>> perf_event *event, u64 nsec, u64 count) }
>>  }
>>
>> -static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64
>> period) +/*
>> + * combine freq adjustment with unthrottling to avoid two passes
>> over the
>> + * events. At the same time, make sure, having freq events does not
>> change
>> + * the rate of unthrottling as that would introduce bias.
>> + */
>> +static void perf_adjust_freq_unthr_context(struct perf_event_context
>> *ctx,
>> +                                        int needs_unthr)
>>  {
>>       struct perf_event *event;
>>       struct hw_perf_event *hwc;
>> -     u64 interrupts, now;
>> +     u64 now, period = TICK_NSEC;
>>       s64 delta;
>>
>> -     if (!ctx->nr_freq)
>> +     /*
>> +      * only need to iterate over all events iff:
>> +      * - context have events in frequency mode (needs freq
>> adjust)
>> +      * - there are events to unthrottle on this cpu
>> +      */
>> +     if (!(ctx->nr_freq || needs_unthr))
>>               return;
>>
>> +     raw_spin_lock(&ctx->lock);
>> +
>>       list_for_each_entry_rcu(event, &ctx->event_list,
>> event_entry) { if (event->state != PERF_EVENT_STATE_ACTIVE)
>>                       continue;
>> @@ -2344,13 +2360,8 @@ static void perf_ctx_adjust_freq(struct
>> perf_event_context *ctx, u64 period)
>>               hwc = &event->hw;
>>
>> -             interrupts = hwc->interrupts;
>> -             hwc->interrupts = 0;
>> -
>> -             /*
>> -              * unthrottle events on the tick
>> -              */
>> -             if (interrupts == MAX_INTERRUPTS) {
>> +             if (needs_unthr && hwc->interrupts ==
>> MAX_INTERRUPTS) {
>> +                     hwc->interrupts = 0;
>>                       perf_log_throttle(event, 1);
>>                       event->pmu->start(event, 0);
>>               }
>> @@ -2358,14 +2369,26 @@ static void perf_ctx_adjust_freq(struct
>> perf_event_context *ctx, u64 period) if (!event->attr.freq
>> || !event->attr.sample_freq) continue;
>>
>> -             event->pmu->read(event);
>> +             /*
>> +              * stop the event and update event->count
>> +              */
>> +             event->pmu->stop(event, PERF_EF_UPDATE);
>> +
>>               now = local64_read(&event->count);
>>               delta = now - hwc->freq_count_stamp;
>>               hwc->freq_count_stamp = now;
>>
>> +             /*
>> +              * restart the event
>> +              * reload only if value has changed
>> +              */
>>               if (delta > 0)
>>                       perf_adjust_period(event, period, delta);
>> +
>> +             event->pmu->start(event, delta > 0 ?
>> PERF_EF_RELOAD : 0); }
>> +
>> +     raw_spin_unlock(&ctx->lock);
>>  }
>>
>>  /*
>> @@ -2388,16 +2411,13 @@ static void rotate_ctx(struct
>> perf_event_context *ctx) */
>>  static void perf_rotate_context(struct perf_cpu_context *cpuctx)
>>  {
>> -     u64 interval = (u64)cpuctx->jiffies_interval * TICK_NSEC;
>>       struct perf_event_context *ctx = NULL;
>> -     int rotate = 0, remove = 1, freq = 0;
>> +     int rotate = 0, remove = 1;
>>
>>       if (cpuctx->ctx.nr_events) {
>>               remove = 0;
>>               if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
>>                       rotate = 1;
>> -             if (cpuctx->ctx.nr_freq)
>> -                     freq = 1;
>>       }
>>
>>       ctx = cpuctx->task_ctx;
>> @@ -2405,37 +2425,26 @@ static void perf_rotate_context(struct
>> perf_cpu_context *cpuctx) remove = 0;
>>               if (ctx->nr_events != ctx->nr_active)
>>                       rotate = 1;
>> -             if (ctx->nr_freq)
>> -                     freq = 1;
>>       }
>>
>> -     if (!rotate && !freq)
>> +     if (!rotate)
>>               goto done;
>>
>>       perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>>       perf_pmu_disable(cpuctx->ctx.pmu);
>>
>> -     if (freq) {
>> -             perf_ctx_adjust_freq(&cpuctx->ctx, interval);
>> -             if (ctx)
>> -                     perf_ctx_adjust_freq(ctx, interval);
>> -     }
>> -
>> -     if (rotate) {
>> -             cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>> -             if (ctx)
>> -                     ctx_sched_out(ctx, cpuctx, EVENT_FLEXIBLE);
>> +     cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>> +     if (ctx)
>> +             ctx_sched_out(ctx, cpuctx, EVENT_FLEXIBLE);
>>
>> -             rotate_ctx(&cpuctx->ctx);
>> -             if (ctx)
>> -                     rotate_ctx(ctx);
>> +     rotate_ctx(&cpuctx->ctx);
>> +     if (ctx)
>> +             rotate_ctx(ctx);
>>
>> -             perf_event_sched_in(cpuctx, ctx, current);
>> -     }
>> +     perf_event_sched_in(cpuctx, ctx, current);
>>
>>       perf_pmu_enable(cpuctx->ctx.pmu);
>>       perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
>> -
>>  done:
>>       if (remove)
>>               list_del_init(&cpuctx->rotation_list);
>> @@ -2445,10 +2454,22 @@ void perf_event_task_tick(void)
>>  {
>>       struct list_head *head = &__get_cpu_var(rotation_list);
>>       struct perf_cpu_context *cpuctx, *tmp;
>> +     struct perf_event_context *ctx;
>> +     int throttled;
>>
>>       WARN_ON(!irqs_disabled());
>>
>> +     __this_cpu_inc(perf_throttled_seq);
>> +     throttled = __this_cpu_xchg(perf_throttled_count, 0);
>> +
>>       list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
>> +             ctx = &cpuctx->ctx;
>> +             perf_adjust_freq_unthr_context(ctx, throttled);
>> +
>> +             ctx = cpuctx->task_ctx;
>> +             if (ctx)
>> +                     perf_adjust_freq_unthr_context(ctx,
>> throttled); +
>>               if (cpuctx->jiffies_interval == 1 ||
>>                               !(jiffies %
>> cpuctx->jiffies_interval)) perf_rotate_context(cpuctx);
>> @@ -4514,6 +4535,7 @@ static int __perf_event_overflow(struct
>> perf_event *event, {
>>       int events = atomic_read(&event->event_limit);
>>       struct hw_perf_event *hwc = &event->hw;
>> +     u64 seq;
>>       int ret = 0;
>>
>>       /*
>> @@ -4523,14 +4545,20 @@ static int __perf_event_overflow(struct
>> perf_event *event, if (unlikely(!is_sampling_event(event)))
>>               return 0;
>>
>> -     if (unlikely(hwc->interrupts >= max_samples_per_tick)) {
>> -             if (throttle) {
>> +     seq = __this_cpu_read(perf_throttled_seq);
>> +     if (seq != hwc->interrupts_seq) {
>> +             hwc->interrupts_seq = seq;
>> +             hwc->interrupts = 1;
>> +     } else {
>> +             hwc->interrupts++;
>> +             if (unlikely(throttle
>> +                          && hwc->interrupts >=
>> max_samples_per_tick)) {
>> +                     __this_cpu_inc(perf_throttled_count);
>>                       hwc->interrupts = MAX_INTERRUPTS;
>>                       perf_log_throttle(event, 0);
>>                       ret = 1;
>>               }
>> -     } else
>> -             hwc->interrupts++;
>> +     }
>>
>>       if (event->attr.freq) {
>>               u64 now = perf_clock();
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>

2012-02-10 03:17:44

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

Stephane Eranian [[email protected]] wrote:
| On Wed, Feb 8, 2012 at 12:32 PM, Anton Blanchard <[email protected]> wrote:
| >
| > Hi,
| >
| > On Thu, 26 Jan 2012 17:03:19 +0100
| > Stephane Eranian <[email protected]> wrote:
| >
| >> This patch fixes the throttling mechanism. It was broken
| >> in 3.2.0. Events were not being unthrottled. The unthrottling
| >> mechanism required that events be checked at each timer tick.
| >
| > This patch breaks perf on POWER. I haven't had a chance to work out why
| > yet, but will investigate tomorrow.
| >
| Did you apply the subsequent patch I posted yesterday:
|
| https://lkml.org/lkml/2012/2/7/185

I applied both patches to the powerpc.git tree - following hunk already
exists in the powerpc tree (and even Jan 27 mainline) so I skipped this
hunk.

--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -988,6 +988,9 @@ static void x86_pmu_start(struct perf_event *event, int flags)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx = event->hw.idx;

+ if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+ return;
+
if (WARN_ON_ONCE(idx == -1))
return;

I tried this on Power7 system with a simple 'nooploop' program that loops
in while(1) for 10 seconds.

Results before the patches were applied:

# /tmp/perf record -e cycles,cycles /tmp/nooploop 10
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.886 MB perf.data (~38714 samples) ]

# /tmp/perf report -D | tail -15

Aggregated stats:
TOTAL events: 19307
MMAP events: 42
COMM events: 2
EXIT events: 2
SAMPLE events: 19261
cycles stats:
TOTAL events: 9993
MMAP events: 4
COMM events: 1
EXIT events: 2
SAMPLE events: 9986
cycles stats:
TOTAL events: 9275
SAMPLE events: 9275

After applying both patches:

# /tmp/perf record -e cycles,cycles /tmp/nooploop 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB perf.data (~260 samples) ]

# /tmp/perf report -D | tail -15

Aggregated stats:
TOTAL events: 80
MMAP events: 42
COMM events: 2
EXIT events: 2
SAMPLE events: 34
cycles stats:
TOTAL events: 24
MMAP events: 4
COMM events: 1
EXIT events: 2
SAMPLE events: 17

cycles stats:
TOTAL events: 17
SAMPLE events: 17

Sukadev

2012-02-10 06:18:38

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

On Thu, 2012-02-09 at 19:19 -0800, Sukadev Bhattiprolu wrote:
> Stephane Eranian [[email protected]] wrote:
> | On Wed, Feb 8, 2012 at 12:32 PM, Anton Blanchard <[email protected]> wrote:
> | >
> | > Hi,
> | >
> | > On Thu, 26 Jan 2012 17:03:19 +0100
> | > Stephane Eranian <[email protected]> wrote:
> | >
> | >> This patch fixes the throttling mechanism. It was broken
> | >> in 3.2.0. Events were not being unthrottled. The unthrottling
> | >> mechanism required that events be checked at each timer tick.
> | >
> | > This patch breaks perf on POWER. I haven't had a chance to work out why
> | > yet, but will investigate tomorrow.
> | >
> | Did you apply the subsequent patch I posted yesterday:
> |
> | https://lkml.org/lkml/2012/2/7/185
>
> I applied both patches to the powerpc.git tree - following hunk already
> exists in the powerpc tree (and even Jan 27 mainline) so I skipped this
> hunk.

The powerpc .git tree is not useful at this point, I haven't opened
powerpc -next yet so it's just all old stale stuff. Please work with
upstream always unless you have a very good reason to use my tree. It's
really only meant as a conduit between me and Linus or for -next.

Cheers,
Ben.

2012-02-15 02:34:12

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

Benjamin Herrenschmidt [[email protected]] wrote:
| On Thu, 2012-02-09 at 19:19 -0800, Sukadev Bhattiprolu wrote:
| > Stephane Eranian [[email protected]] wrote:
| > | On Wed, Feb 8, 2012 at 12:32 PM, Anton Blanchard <[email protected]> wrote:
| > | >
| > | > Hi,
| > | >
| > | > On Thu, 26 Jan 2012 17:03:19 +0100
| The powerpc .git tree is not useful at this point, I haven't opened
| powerpc -next yet so it's just all old stale stuff. Please work with
| upstream always unless you have a very good reason to use my tree. It's
| really only meant as a conduit between me and Linus or for -next.

Ok.

I tried this on current mainline (commit 13d261932) which now includes both
of Stephane's patches and repeated the nooploop test.

The problem still exists - the number of events reported on Power is far
fewer than without the patches.

Aggregated stats:
TOTAL events: 78
MMAP events: 47
COMM events: 2
EXIT events: 2
SAMPLE events: 27
cycles stats:
TOTAL events: 21
MMAP events: 4
COMM events: 1
EXIT events: 2
SAMPLE events: 14
cycles stats:
TOTAL events: 13
SAMPLE events: 13

Sukadev

2012-02-19 11:26:55

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

Did Anton recent fix solve that problem?


On Wed, Feb 15, 2012 at 3:37 AM, Sukadev Bhattiprolu
<[email protected]> wrote:
> Benjamin Herrenschmidt [[email protected]] wrote:
> | On Thu, 2012-02-09 at 19:19 -0800, Sukadev Bhattiprolu wrote:
> | > Stephane Eranian [[email protected]] wrote:
> | > | On Wed, Feb 8, 2012 at 12:32 PM, Anton Blanchard <[email protected]> wrote:
> | > | >
> | > | > Hi,
> | > | >
> | > | > On Thu, 26 Jan 2012 17:03:19 +0100
> | The powerpc .git tree is not useful at this point, I haven't opened
> | powerpc -next yet so it's just all old stale stuff. Please work with
> | upstream always unless you have a very good reason to use my tree. It's
> | really only meant as a conduit between me and Linus or for -next.
>
> Ok.
>
> I tried this on current mainline (commit 13d261932) which now includes both
> of Stephane's patches and repeated the nooploop test.
>
> The problem still exists - the number of events reported on Power is far
> fewer than without the patches.
>
> Aggregated stats:
>           TOTAL events:         78
>            MMAP events:         47
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:         27
> cycles stats:
>           TOTAL events:         21
>            MMAP events:          4
>            COMM events:          1
>            EXIT events:          2
>          SAMPLE events:         14
> cycles stats:
>           TOTAL events:         13
>          SAMPLE events:         13
>
> Sukadev
>

2012-02-20 19:02:16

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] perf_events: fix broken intr throttling (v3)

Stephane Eranian [[email protected]] wrote:
| Did Anton recent fix solve that problem?

Yes - could not reproduce on -rc4.