Hello
I have a new AMD A8 system and perf multiplexing support is acting weird.
If I run the perf_event_tests check_multiplexing test it fails like this:
Event TotalCount RawCount Scale ScaledCount Error
0 500282143 150403975 -2.00 (c8050b5 fffffffff9c224ef) -300807950 160.13%
So the problem is that the TOTAL_TIME_RUNNING value is negative, which as
far as I know shouldn't be possible. I can't seem to track down
why this would happen, and the same kernel (4.2-rc3) running on a Haswell
machine doesn't have this problem.
I tried to replicate the problem using perf, but for some reason I can't
get perf to multiplex on any of my machines. That used to transparently
work didn't it?
here's the result on the haswell machine
perf stat -e cycles,instructions,cycles,instructions,cycles,instructions,cycles,instructions
sleep 5
Performance counter stats for 'sleep 5':
1,354,527 cycles
1,934,694 instructions # 1.76 insns per cycle
1,521,148 cycles
1,934,694 instructions # 1.76 insns per cycle
1,521,148 cycles
1,223,321 instructions # 1.11 insns per cycle [13.61%]
<not counted> cycles
<not counted> instructions
5.003651781 seconds time elapsed
Vince
On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> Hello
>
> I have a new AMD A8 system and perf multiplexing support is acting weird.
>
> If I run the perf_event_tests check_multiplexing test it fails like this:
>
> Event TotalCount RawCount Scale ScaledCount Error
> 0 500282143 150403975 -2.00 (c8050b5 fffffffff9c224ef) -300807950 160.13%
>
> So the problem is that the TOTAL_TIME_RUNNING value is negative, which as
> far as I know shouldn't be possible.
Correct, that should not happen. I don't have a Fam16 machine, but I can
try and stare at the code a wee bit. Although there's another few perf
issues I am still looking into as well :/
On Tue, 21 Jul 2015, Peter Zijlstra wrote:
> On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> > Hello
> >
> > I have a new AMD A8 system and perf multiplexing support is acting weird.
> >
> > If I run the perf_event_tests check_multiplexing test it fails like this:
> >
> > Event TotalCount RawCount Scale ScaledCount Error
> > 0 500282143 150403975 -2.00 (c8050b5 fffffffff9c224ef) -300807950 160.13%
> >
> > So the problem is that the TOTAL_TIME_RUNNING value is negative, which as
> > far as I know shouldn't be possible.
>
> Correct, that should not happen. I don't have a Fam16 machine, but I can
> try and stare at the code a wee bit. Although there's another few perf
> issues I am still looking into as well :/
well I've traced it back to see that in
update_event_times()
run_end is less than event->tstamp_running, which is why the result
is negative.
[ 1279.857205] VMW: 1 run_end=1279846999345 tstamp_running=1279929016607
[ 1279.857222] VMW: 0 run_end=1279855184453 tstamp_running=1279931111760
It seems to happen both in the PERF_EVENT_STATE_INACTIVE and
!PERF_EVENT_STATE_INACTIVE cases.
tracking back why those values are wrong is proving to be a bit trickier.
Vince
On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> I tried to replicate the problem using perf, but for some reason I can't
> get perf to multiplex on any of my machines. That used to transparently
> work didn't it?
>
> here's the result on the haswell machine
>
> perf stat -e cycles,instructions,cycles,instructions,cycles,instructions,cycles,instructions
> sleep 5
>
> Performance counter stats for 'sleep 5':
>
> 1,354,527 cycles
> 1,934,694 instructions # 1.76 insns per cycle
> 1,521,148 cycles
> 1,934,694 instructions # 1.76 insns per cycle
> 1,521,148 cycles
> 1,223,321 instructions # 1.11 insns per cycle [13.61%]
> <not counted> cycles
> <not counted> instructions
>
> 5.003651781 seconds time elapsed
This is because sleep 5 is idle, and an task that's scheduled-out
doesn't have a rotation timer programmed.
If you use cpu-wide events, or a busy task it works:
# perf stat -ae cycles,instructions,cycles,instructions,cycles,instructions,cycles,instructions -- sleep 5
Performance counter stats for 'system wide':
15,460,117,724 cycles (62.79%)
4,276,675,617 instructions # 0.28 insns per cycle (62.77%)
15,475,455,605 cycles (62.75%)
4,284,683,187 instructions # 0.28 insns per cycle (62.74%)
15,483,305,807 cycles (62.74%)
4,286,820,811 instructions # 0.28 insns per cycle (62.63%)
15,410,655,866 cycles (62.59%)
4,257,624,307 instructions # 0.28 insns per cycle (62.60%)
5.002916366 seconds time elapsed
# perf stat -e cycles,instructions,cycles,instructions,cycles,instructions,cycles,instructions -- ./spin5.sh
Performance counter stats for './spin5.sh':
16,195,427,921 cycles (62.45%)
35,544,317,856 instructions # 2.20 insns per cycle (62.43%)
16,181,167,756 cycles (62.49%)
35,484,446,997 instructions # 2.19 insns per cycle (62.54%)
16,167,563,501 cycles (62.58%)
35,486,346,040 instructions # 2.19 insns per cycle (62.57%)
16,177,845,357 cycles (62.56%)
35,532,260,091 instructions # 2.20 insns per cycle (62.48%)
5.009328353 seconds time elapsed
On Tue, Jul 21, 2015 at 04:08:28PM -0400, Vince Weaver wrote:
> On Tue, 21 Jul 2015, Peter Zijlstra wrote:
>
> > On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> > > Hello
> > >
> > > I have a new AMD A8 system and perf multiplexing support is acting weird.
> > >
> > > If I run the perf_event_tests check_multiplexing test it fails like this:
> > >
> > > Event TotalCount RawCount Scale ScaledCount Error
> > > 0 500282143 150403975 -2.00 (c8050b5 fffffffff9c224ef) -300807950 160.13%
> > >
> > > So the problem is that the TOTAL_TIME_RUNNING value is negative, which as
> > > far as I know shouldn't be possible.
> >
> > Correct, that should not happen. I don't have a Fam16 machine, but I can
> > try and stare at the code a wee bit. Although there's another few perf
> > issues I am still looking into as well :/
>
> well I've traced it back to see that in
> update_event_times()
> run_end is less than event->tstamp_running, which is why the result
> is negative.
>
> [ 1279.857205] VMW: 1 run_end=1279846999345 tstamp_running=1279929016607
> [ 1279.857222] VMW: 0 run_end=1279855184453 tstamp_running=1279931111760
>
> It seems to happen both in the PERF_EVENT_STATE_INACTIVE and
> !PERF_EVENT_STATE_INACTIVE cases.
>
> tracking back why those values are wrong is proving to be a bit trickier.
Can you run with something like the below?
---
kernel/events/core.c | 79 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 58 insertions(+), 21 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b99..cd96bb9b1c72 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1115,15 +1115,26 @@ static void perf_unpin_context(struct perf_event_context *ctx)
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
+static void start_context_time(struct perf_event_context *ctx)
+{
+ u64 now = perf_clock();
+
+ ctx->timestamp = now;
+ trace_printk("ctx: %p now: %Lu\n", ctx, now);
+}
+
/*
* Update the record of the current time in a context.
*/
static void update_context_time(struct perf_event_context *ctx)
{
u64 now = perf_clock();
+ u64 delta = now - ctx->timestamp;
- ctx->time += now - ctx->timestamp;
+ ctx->time += delta;
ctx->timestamp = now;
+ trace_printk("ctx: %p now: %Lu time: %Lu delta: %Lu\n",
+ ctx, now, ctx->time, delta);
}
static u64 perf_event_time(struct perf_event *event)
@@ -1167,6 +1178,10 @@ static void update_event_times(struct perf_event *event)
event->total_time_enabled = run_end - event->tstamp_enabled;
+ trace_printk("event: %p time: %Lu stopped: %Lu end: %Lu enabled: %Lu\n",
+ event, ctx->time, event->tstamp_stopped,
+ run_end, event->tstamp_enabled);
+
if (event->state == PERF_EVENT_STATE_INACTIVE)
run_end = event->tstamp_stopped;
else
@@ -1174,6 +1189,9 @@ static void update_event_times(struct perf_event *event)
event->total_time_running = run_end - event->tstamp_running;
+ trace_printk("event: %p time: %Lu stopped: %Lu end: %Lu running: %Lu\n",
+ event, ctx->time, event->tstamp_stopped,
+ run_end, event->tstamp_running);
}
/*
@@ -1526,21 +1544,25 @@ event_sched_out(struct perf_event *event,
WARN_ON_ONCE(event->ctx != ctx);
lockdep_assert_held(&ctx->lock);
- /*
- * An event which could not be activated because of
- * filter mismatch still needs to have its timings
- * maintained, otherwise bogus information is return
- * via read() for time_enabled, time_running:
- */
- if (event->state == PERF_EVENT_STATE_INACTIVE
- && !event_filter_match(event)) {
- delta = tstamp - event->tstamp_stopped;
- event->tstamp_running += delta;
- event->tstamp_stopped = tstamp;
- }
+ if (event->state != PERF_EVENT_STATE_ACTIVE) {
+ /*
+ * An event which could not be activated because of
+ * filter mismatch still needs to have its timings
+ * maintained, otherwise bogus information is return
+ * via read() for time_enabled, time_running:
+ */
+ if (event->state == PERF_EVENT_STATE_INACTIVE &&
+ !event_filter_match(event)) {
+ delta = tstamp - event->tstamp_stopped;
+ event->tstamp_running += delta;
+ event->tstamp_stopped = tstamp;
+
+ trace_printk("event: %p filter-update: time: %Lu running: %Lu delta: %Lu\n",
+ event, tstamp, event->tstamp_running, delta);
+ }
- if (event->state != PERF_EVENT_STATE_ACTIVE)
return;
+ }
perf_pmu_disable(event->pmu);
@@ -1550,6 +1572,9 @@ event_sched_out(struct perf_event *event,
event->state = PERF_EVENT_STATE_OFF;
}
event->tstamp_stopped = tstamp;
+
+ trace_printk("event: %p time: %Lu\n", event, tstamp);
+
event->pmu->del(event, 0);
event->oncpu = -1;
@@ -1870,6 +1895,9 @@ event_sched_in(struct perf_event *event,
event->tstamp_running += tstamp - event->tstamp_stopped;
+ trace_printk("event: %p time: %Lu stopped: %Lu running: %Lu\n",
+ event, tstamp, event->tstamp_stopped, event->tstamp_running);
+
perf_set_shadow_time(event, ctx, tstamp);
perf_log_itrace_start(event);
@@ -1907,7 +1935,7 @@ group_sched_in(struct perf_event *group_event,
{
struct perf_event *event, *partial_group = NULL;
struct pmu *pmu = ctx->pmu;
- u64 now = ctx->time;
+ u64 tstamp = perf_event_time(group_event);
bool simulate = false;
if (group_event->state == PERF_EVENT_STATE_OFF)
@@ -1954,8 +1982,11 @@ group_sched_in(struct perf_event *group_event,
simulate = true;
if (simulate) {
- event->tstamp_running += now - event->tstamp_stopped;
- event->tstamp_stopped = now;
+ event->tstamp_running += tstamp - event->tstamp_stopped;
+ event->tstamp_stopped = tstamp;
+ trace_printk("event: %p simulate: time: %Lu running: %Lu stopped: %Lu\n",
+ event, tstamp, event->tstamp_running,
+ event->tstamp_stopped);
} else {
event_sched_out(event, cpuctx, ctx);
}
@@ -2010,6 +2041,7 @@ static void add_event_to_ctx(struct perf_event *event,
event->tstamp_enabled = tstamp;
event->tstamp_running = tstamp;
event->tstamp_stopped = tstamp;
+ trace_printk("event: %p time: %Lu\n", event, tstamp);
}
static void task_ctx_sched_out(struct perf_event_context *ctx);
@@ -2153,9 +2185,11 @@ perf_install_in_context(struct perf_event_context *ctx,
/*
* Put a event into inactive state and update time fields.
+ *
* Enabling the leader of a group effectively enables all
* the group members that aren't explicitly disabled, so we
* have to update their ->tstamp_enabled also.
+ *
* Note: this works for group members as well as group leaders
* since the non-leader members' sibling_lists will be empty.
*/
@@ -2166,9 +2200,14 @@ static void __perf_event_mark_enabled(struct perf_event *event)
event->state = PERF_EVENT_STATE_INACTIVE;
event->tstamp_enabled = tstamp - event->total_time_enabled;
+ trace_printk("event: %p time: %Lu enabled: %Lu\n",
+ event, tstamp, event->tstamp_enabled);
list_for_each_entry(sub, &event->sibling_list, group_entry) {
- if (sub->state >= PERF_EVENT_STATE_INACTIVE)
+ if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+ trace_printk("event: %p time: %Lu enabled: %Lu\n",
+ sub, tstamp, sub->tstamp_enabled);
+ }
}
}
@@ -2741,15 +2780,13 @@ ctx_sched_in(struct perf_event_context *ctx,
enum event_type_t event_type,
struct task_struct *task)
{
- u64 now;
int is_active = ctx->is_active;
ctx->is_active |= event_type;
if (likely(!ctx->nr_events))
return;
- now = perf_clock();
- ctx->timestamp = now;
+ start_context_time(ctx);
perf_cgroup_set_timestamp(task, ctx);
/*
* First go through the list and put on any pinned groups
On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> If I run the perf_event_tests check_multiplexing test it fails like this:
FWIW, that contains:
/* we have to close, because PERF_EVENT_IOC_RESET */
/* does not clear multiplexing informatio. */
Do you need that? We could add another reset ioctl with a flags argument
on what all to clear.
On Wed, 22 Jul 2015, Peter Zijlstra wrote:
> On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> > If I run the perf_event_tests check_multiplexing test it fails like this:
>
> FWIW, that contains:
>
> /* we have to close, because PERF_EVENT_IOC_RESET */
> /* does not clear multiplexing informatio. */
>
> Do you need that? We could add another reset ioctl with a flags argument
> on what all to clear.
I remember being surprised by that behavior at the time, but I don't think
I've run into the issue much. PAPI (which has pretty awful multiplexing
support) tends to close and reopen events anytime you change the config,
rather than trying to RESET them.
I'll be rebooting the A8 system with your patch shortly.
$%C&^*# systemd insits on delaying reboots by 90 seconds for obscure
reasons which is slowing down my testing regime a bit.
Vince
On Wed, 22 Jul 2015, Peter Zijlstra wrote:
> On Tue, Jul 21, 2015 at 04:08:28PM -0400, Vince Weaver wrote:
> > On Tue, 21 Jul 2015, Peter Zijlstra wrote:
> >
> > > On Tue, Jul 21, 2015 at 02:14:54PM -0400, Vince Weaver wrote:
> > > > Hello
> > > >
> > > > I have a new AMD A8 system and perf multiplexing support is acting weird.
> > > >
> > > > If I run the perf_event_tests check_multiplexing test it fails like this:
> > > >
> > > > Event TotalCount RawCount Scale ScaledCount Error
> > > > 0 500282143 150403975 -2.00 (c8050b5 fffffffff9c224ef) -300807950 160.13%
> > > >
> > > > So the problem is that the TOTAL_TIME_RUNNING value is negative, which as
> > > > far as I know shouldn't be possible.
> > >
> > > Correct, that should not happen. I don't have a Fam16 machine, but I can
> > > try and stare at the code a wee bit. Although there's another few perf
> > > issues I am still looking into as well :/
> >
> > well I've traced it back to see that in
> > update_event_times()
> > run_end is less than event->tstamp_running, which is why the result
> > is negative.
> >
> > [ 1279.857205] VMW: 1 run_end=1279846999345 tstamp_running=1279929016607
> > [ 1279.857222] VMW: 0 run_end=1279855184453 tstamp_running=1279931111760
> >
> > It seems to happen both in the PERF_EVENT_STATE_INACTIVE and
> > !PERF_EVENT_STATE_INACTIVE cases.
> >
> > tracking back why those values are wrong is proving to be a bit trickier.
>
> Can you run with something like the below?
OK, so I have to admit I'm a bit weak on ftrace. What type of command
line are you interested in seeing? The ones I've tried with trace-cmd
have activated your trace_printks but there's 500MB of other data mixed
in so maybe I am doing something wrong.
Vince
On Wed, Jul 22, 2015 at 10:57:23AM -0400, Vince Weaver wrote:
> On Wed, 22 Jul 2015, Peter Zijlstra wrote:
> > Can you run with something like the below?
>
> OK, so I have to admit I'm a bit weak on ftrace. What type of command
> line are you interested in seeing? The ones I've tried with trace-cmd
> have activated your trace_printks but there's 500MB of other data mixed
> in so maybe I am doing something wrong.
something along the lines of:
echo function > /debug/tracing/current_tracer
echo nop > /debug/tracing/current_tracer
echo 0 > /debug/tracing/trace
./check_multiplexing
echo 0 > /debug/tracing/tracing_on
cat /debug/tracing/trace > ~/trace.log
I've no idea how trace-cmd works, I don't even think I have it :-)
On Wed, 22 Jul 2015, Vince Weaver wrote:
>
> OK, so I have to admit I'm a bit weak on ftrace. What type of command
> line are you interested in seeing? The ones I've tried with trace-cmd
> have activated your trace_printks but there's 500MB of other data mixed
> in so maybe I am doing something wrong.
maybe the "nop" plugin was what I wanted. I've posted the trace with that
here:
http://web.eece.maine.edu/~vweaver/junk/jaguar.mpx.out.gz
Vince
On Wed, Jul 22, 2015 at 11:02:21AM -0400, Vince Weaver wrote:
> On Wed, 22 Jul 2015, Vince Weaver wrote:
> >
> > OK, so I have to admit I'm a bit weak on ftrace. What type of command
> > line are you interested in seeing? The ones I've tried with trace-cmd
> > have activated your trace_printks but there's 500MB of other data mixed
> > in so maybe I am doing something wrong.
>
> maybe the "nop" plugin was what I wanted. I've posted the trace with that
> here:
>
> http://web.eece.maine.edu/~vweaver/junk/jaguar.mpx.out.gz
OK, that looks about right, lemme to stare at that.
On Wed, Jul 22, 2015 at 05:06:24PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 22, 2015 at 11:02:21AM -0400, Vince Weaver wrote:
> > On Wed, 22 Jul 2015, Vince Weaver wrote:
> > >
> > > OK, so I have to admit I'm a bit weak on ftrace. What type of command
> > > line are you interested in seeing? The ones I've tried with trace-cmd
> > > have activated your trace_printks but there's 500MB of other data mixed
> > > in so maybe I am doing something wrong.
> >
> > maybe the "nop" plugin was what I wanted. I've posted the trace with that
> > here:
> >
> > http://web.eece.maine.edu/~vweaver/junk/jaguar.mpx.out.gz
>
> OK, that looks about right, lemme to stare at that.
Does this help?
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b99..10d076b2572c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1868,8 +1868,6 @@ event_sched_in(struct perf_event *event,
perf_pmu_disable(event->pmu);
- event->tstamp_running += tstamp - event->tstamp_stopped;
-
perf_set_shadow_time(event, ctx, tstamp);
perf_log_itrace_start(event);
@@ -1881,6 +1879,8 @@ event_sched_in(struct perf_event *event,
goto out;
}
+ event->tstamp_running += tstamp - event->tstamp_stopped;
+
if (!is_software_event(event))
cpuctx->active_oncpu++;
if (!ctx->nr_active++)
On Wed, 22 Jul 2015, Peter Zijlstra wrote:
> On Wed, Jul 22, 2015 at 05:06:24PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 22, 2015 at 11:02:21AM -0400, Vince Weaver wrote:
> > > On Wed, 22 Jul 2015, Vince Weaver wrote:
> > > >
> > > > OK, so I have to admit I'm a bit weak on ftrace. What type of command
> > > > line are you interested in seeing? The ones I've tried with trace-cmd
> > > > have activated your trace_printks but there's 500MB of other data mixed
> > > > in so maybe I am doing something wrong.
> > >
> > > maybe the "nop" plugin was what I wanted. I've posted the trace with that
> > > here:
> > >
> > > http://web.eece.maine.edu/~vweaver/junk/jaguar.mpx.out.gz
> >
> > OK, that looks about right, lemme to stare at that.
>
> Does this help?
yes, with this applied the check_multiplex test passes.
Vince
>
> ---
> kernel/events/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index d3dae3419b99..10d076b2572c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1868,8 +1868,6 @@ event_sched_in(struct perf_event *event,
>
> perf_pmu_disable(event->pmu);
>
> - event->tstamp_running += tstamp - event->tstamp_stopped;
> -
> perf_set_shadow_time(event, ctx, tstamp);
>
> perf_log_itrace_start(event);
> @@ -1881,6 +1879,8 @@ event_sched_in(struct perf_event *event,
> goto out;
> }
>
> + event->tstamp_running += tstamp - event->tstamp_stopped;
> +
> if (!is_software_event(event))
> cpuctx->active_oncpu++;
> if (!ctx->nr_active++)
>
On Wed, Jul 22, 2015 at 12:56:23PM -0400, Vince Weaver wrote:
> yes, with this applied the check_multiplex test passes.
Awesome, I'll try and write up a proper patch later tonight.
Thanks!
On Wed, 22 Jul 2015, Peter Zijlstra wrote:
> On Wed, Jul 22, 2015 at 12:56:23PM -0400, Vince Weaver wrote:
> > yes, with this applied the check_multiplex test passes.
>
> Awesome, I'll try and write up a proper patch later tonight.
I'm just curious, it seems like sort of a generic fix, is there a reaon I
was only hitting the bug on AMD fam16h?
Vince
On Wed, Jul 22, 2015 at 03:59:28PM -0400, Vince Weaver wrote:
> On Wed, 22 Jul 2015, Peter Zijlstra wrote:
>
> > On Wed, Jul 22, 2015 at 12:56:23PM -0400, Vince Weaver wrote:
> > > yes, with this applied the check_multiplex test passes.
> >
> > Awesome, I'll try and write up a proper patch later tonight.
>
> I'm just curious, it seems like sort of a generic fix, is there a reaon I
> was only hitting the bug on AMD fam16h?
Not that I can think of.. maybe just consistently lucky on timings :/
>From what I can tell of the thing it _should_ fail on pretty much
everything.