2016-03-29 01:33:42

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH] perf/core: Fix time tracking bug with multiplexing

This patch fixes a bug introduced by:

commit 3cbaa59069677920186dcf502632ca1df4329f80
Author: Peter Zijlstra <[email protected]>
Date: Wed Feb 24 18:45:47 2016 +0100

perf: Fix ctx time tracking by introducing EVENT_TIME

This patch introduce a bug in the time tracking of events when multiplexing is used.

The issue is easily reproducible with the following perf run:

$ perf stat -a -C 0 -e branches,branches,branches,branches,branches -I 1000
1.000730239 652,394 branches (66.41%)
1.000730239 597,809 branches (66.41%)
1.000730239 593,870 branches (66.63%)
1.000730239 651,440 branches (67.03%)
1.000730239 656,725 branches (66.96%)
1.000730239 <not counted> branches

One branches event is shown as not having run. Yet, with multiplexing, all events
should run especially with a 1s (-I 1000) interval. The delta for time_running
comes out to 0. Yet, the event has run because the kernel is actually multiplexing
the events. The problem is that the time tracking is the kernel and especially in
ctx_sched_out() is wrong now.

The problem is that in case that the kernel enters ctx_sched_out() with the
following state:
ctx->is_active=0x7 event_type=0x1
Call Trace:
[<ffffffff813ddd41>] dump_stack+0x63/0x82
[<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
[<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
[<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
[<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
[<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
[<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
[<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0

In that case, the test:
if (is_active & EVENT_TIME)

will be false and the time will not be updated. Time must always be updated on
sched out. This patch fixes the problem.

Patch is relative to PeterZ queue.tip perf/core branch.

Signed-off-by: Stephane Eranian <[email protected]>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 339aa46..4ba41f6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2447,7 +2447,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,

is_active ^= ctx->is_active; /* changed bits */

- if (is_active & EVENT_TIME) {
+ if (ctx->is_active & EVENT_TIME) {
/* update (and stop) ctx time */
update_context_time(ctx);
update_cgrp_time_from_cpuctx(cpuctx);
--
2.5.0


2016-03-29 07:27:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf/core: Fix time tracking bug with multiplexing

On Tue, Mar 29, 2016 at 03:33:23AM +0200, Stephane Eranian wrote:
> This patch fixes a bug introduced by:
>
> commit 3cbaa59069677920186dcf502632ca1df4329f80
> Author: Peter Zijlstra <[email protected]>
> Date: Wed Feb 24 18:45:47 2016 +0100
>
> perf: Fix ctx time tracking by introducing EVENT_TIME

Normal quoting style is:

3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")

I have the following git alias to help with that:

one = show -s --pretty='format:%h (\"%s\")'

> The problem is that in case that the kernel enters ctx_sched_out() with the
> following state:
> ctx->is_active=0x7 event_type=0x1
> Call Trace:
> [<ffffffff813ddd41>] dump_stack+0x63/0x82
> [<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
> [<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
> [<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
> [<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
> [<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
> [<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
> [<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
> [<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0
>
> In that case, the test:
> if (is_active & EVENT_TIME)
>
> will be false and the time will not be updated. Time must always be updated on
> sched out. This patch fixes the problem.

Humm, no. It breaks things like ctx_sched_out(.event_type = EVENT_ALL),
which will set ctx->is_active = 0, and then not update time.

> +++ b/kernel/events/core.c
> @@ -2447,7 +2447,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>
> is_active ^= ctx->is_active; /* changed bits */
>
> - if (is_active & EVENT_TIME) {
> + if (ctx->is_active & EVENT_TIME) {
> /* update (and stop) ctx time */
> update_context_time(ctx);
> update_cgrp_time_from_cpuctx(cpuctx);

I think you want something like this.

---
kernel/events/core.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 614614821f00..10ee22b8d2a8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2402,14 +2402,24 @@ static void ctx_sched_out(struct perf_event_context *ctx,
cpuctx->task_ctx = NULL;
}

- is_active ^= ctx->is_active; /* changed bits */
-
+ /*
+ * Always update time if it was set; not only when it changes.
+ * Otherwise we can 'forget' to update time for any but the last
+ * context we sched out. For example:
+ *
+ * ctx_sched_out(.event_type = EVENT_FLEXIBLE)
+ * ctx_sched_out(.event_type = EVENT_PINNED)
+ *
+ * would only update time for the pinned events.
+ */
if (is_active & EVENT_TIME) {
/* update (and stop) ctx time */
update_context_time(ctx);
update_cgrp_time_from_cpuctx(cpuctx);
}

+ is_active ^= ctx->is_active; /* changed bits */
+
if (!ctx->nr_active || !(is_active & EVENT_ALL))
return;


2016-03-29 15:43:45

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH] perf/core: Fix time tracking bug with multiplexing

Em Tue, Mar 29, 2016 at 09:26:44AM +0200, Peter Zijlstra escreveu:
> On Tue, Mar 29, 2016 at 03:33:23AM +0200, Stephane Eranian wrote:
> > This patch fixes a bug introduced by:
> >
> > commit 3cbaa59069677920186dcf502632ca1df4329f80
> > Author: Peter Zijlstra <[email protected]>
> > Date: Wed Feb 24 18:45:47 2016 +0100
> >
> > perf: Fix ctx time tracking by introducing EVENT_TIME
>
> Normal quoting style is:
>
> 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
>
> I have the following git alias to help with that:
>
> one = show -s --pretty='format:%h (\"%s\")'

Cool, for completeness sake:

$ git config --global alias.one "show -s --pretty='format:%h (\"%s\")'"
$ git one 3cbaa59069677920186dcf502632ca1df4329f80
3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")

Make sure you have this in place:

$ git config --global core.abbrev 12

I also have this, FWIW:

$ cat ~/bin/fixes
#!/bin/bash

if [ $# -eq 1 ] ; then
cset=$1
else
read cset
fi
git show --pretty=fuller $cset | grep '^\(Author\|Commit\): ' | sed -r 's/.*: +/Cc: /g'
echo "Fixes: " `git one $cset`
$ vim ~/bin/fixes
$ fixes 3cbaa59069677920186dcf502632ca1df4329f80
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Fixes 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
$

So that, in vim, I can select the changeset, then do: ":'<,'>!fixes", so
that the author and commiter of the fixed cset gets on the CC list.

- Arnaldo

2016-03-29 21:23:38

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH] perf/core: Fix time tracking bug with multiplexing

On Tue, Mar 29, 2016 at 12:26 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Mar 29, 2016 at 03:33:23AM +0200, Stephane Eranian wrote:
> > This patch fixes a bug introduced by:
> >
> > commit 3cbaa59069677920186dcf502632ca1df4329f80
> > Author: Peter Zijlstra <[email protected]>
> > Date: Wed Feb 24 18:45:47 2016 +0100
> >
> > perf: Fix ctx time tracking by introducing EVENT_TIME
>
> Normal quoting style is:
>
> 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
>
> I have the following git alias to help with that:
>
> one = show -s --pretty='format:%h (\"%s\")'
>
> > The problem is that in case that the kernel enters ctx_sched_out() with the
> > following state:
> > ctx->is_active=0x7 event_type=0x1
> > Call Trace:
> > [<ffffffff813ddd41>] dump_stack+0x63/0x82
> > [<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
> > [<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
> > [<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
> > [<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
> > [<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
> > [<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
> > [<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
> > [<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0
> >
> > In that case, the test:
> > if (is_active & EVENT_TIME)
> >
> > will be false and the time will not be updated. Time must always be updated on
> > sched out. This patch fixes the problem.
>
> Humm, no. It breaks things like ctx_sched_out(.event_type = EVENT_ALL),
> which will set ctx->is_active = 0, and then not update time.
>
> > +++ b/kernel/events/core.c
> > @@ -2447,7 +2447,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,
> >
> > is_active ^= ctx->is_active; /* changed bits */
> >
> > - if (is_active & EVENT_TIME) {
> > + if (ctx->is_active & EVENT_TIME) {
> > /* update (and stop) ctx time */
> > update_context_time(ctx);
> > update_cgrp_time_from_cpuctx(cpuctx);
>
> I think you want something like this.
>
Works for me. Thanks.
Tested-by: Stephane Eranian <[email protected]>

> ---
> kernel/events/core.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 614614821f00..10ee22b8d2a8 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2402,14 +2402,24 @@ static void ctx_sched_out(struct perf_event_context *ctx,
> cpuctx->task_ctx = NULL;
> }
>
> - is_active ^= ctx->is_active; /* changed bits */
> -
> + /*
> + * Always update time if it was set; not only when it changes.
> + * Otherwise we can 'forget' to update time for any but the last
> + * context we sched out. For example:
> + *
> + * ctx_sched_out(.event_type = EVENT_FLEXIBLE)
> + * ctx_sched_out(.event_type = EVENT_PINNED)
> + *
> + * would only update time for the pinned events.
> + */
> if (is_active & EVENT_TIME) {
> /* update (and stop) ctx time */
> update_context_time(ctx);
> update_cgrp_time_from_cpuctx(cpuctx);
> }
>
> + is_active ^= ctx->is_active; /* changed bits */
> +
> if (!ctx->nr_active || !(is_active & EVENT_ALL))
> return;
>

Subject: [tip:perf/core] perf/core: Fix time tracking bug with multiplexing

Commit-ID: 8fdc65391c6ad16461526a685f03262b3b01bfde
Gitweb: http://git.kernel.org/tip/8fdc65391c6ad16461526a685f03262b3b01bfde
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 29 Mar 2016 09:26:44 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 31 Mar 2016 09:54:06 +0200

perf/core: Fix time tracking bug with multiplexing

Stephane reported that commit:

3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")

introduced a regression wrt. time tracking, as easily observed by:

> This patch introduce a bug in the time tracking of events when
> multiplexing is used.
>
> The issue is easily reproducible with the following perf run:
>
> $ perf stat -a -C 0 -e branches,branches,branches,branches,branches,branches -I 1000
> 1.000730239 652,394 branches (66.41%)
> 1.000730239 597,809 branches (66.41%)
> 1.000730239 593,870 branches (66.63%)
> 1.000730239 651,440 branches (67.03%)
> 1.000730239 656,725 branches (66.96%)
> 1.000730239 <not counted> branches
>
> One branches event is shown as not having run. Yet, with
> multiplexing, all events should run especially with a 1s (-I 1000)
> interval. The delta for time_running comes out to 0. Yet, the event
> has run because the kernel is actually multiplexing the events. The
> problem is that the time tracking is the kernel and especially in
> ctx_sched_out() is wrong now.
>
> The problem is that in case that the kernel enters ctx_sched_out() with the
> following state:
> ctx->is_active=0x7 event_type=0x1
> Call Trace:
> [<ffffffff813ddd41>] dump_stack+0x63/0x82
> [<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
> [<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
> [<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
> [<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
> [<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
> [<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
> [<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
> [<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0
>
> In that case, the test:
> if (is_active & EVENT_TIME)
>
> will be false and the time will not be updated. Time must always be updated on
> sched out.

Fix this by always updating time if EVENT_TIME was set, as opposed to
only updating time when EVENT_TIME changed.

Reported-by: Stephane Eranian <[email protected]>
Tested-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index de24fbc..8c11388 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2417,14 +2417,24 @@ static void ctx_sched_out(struct perf_event_context *ctx,
cpuctx->task_ctx = NULL;
}

- is_active ^= ctx->is_active; /* changed bits */
-
+ /*
+ * Always update time if it was set; not only when it changes.
+ * Otherwise we can 'forget' to update time for any but the last
+ * context we sched out. For example:
+ *
+ * ctx_sched_out(.event_type = EVENT_FLEXIBLE)
+ * ctx_sched_out(.event_type = EVENT_PINNED)
+ *
+ * would only update time for the pinned events.
+ */
if (is_active & EVENT_TIME) {
/* update (and stop) ctx time */
update_context_time(ctx);
update_cgrp_time_from_cpuctx(cpuctx);
}

+ is_active ^= ctx->is_active; /* changed bits */
+
if (!ctx->nr_active || !(is_active & EVENT_ALL))
return;