Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752986AbdHBRLC (ORCPT ); Wed, 2 Aug 2017 13:11:02 -0400 Received: from merlin.infradead.org ([205.233.59.134]:52202 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752787AbdHBRLA (ORCPT ); Wed, 2 Aug 2017 13:11:00 -0400 Date: Wed, 2 Aug 2017 19:10:51 +0200 From: Peter Zijlstra To: Vince Weaver Cc: linux-kernel@vger.kernel.org, Ingo Molnar , Arnaldo Carvalho de Melo , Stephane Eranian Subject: Re: [patch] perf wrong enabled time after attach/enable/enable Message-ID: <20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4597 Lines: 169 On Tue, Jul 25, 2017 at 09:47:52PM -0400, Vince Weaver wrote: > The problem happens with the following: > fork a child, put it to sleep with SIGSTOP > perf_event_open(), attaching to the pid of the child > ENABLE/DISABLE/ENABLE/DISABLE/ENABLE > read the results with the enabled/running multiplexing fields > > Your "enabled" time will be a huge negative number, which will confuse any > code depening on it (especially the rdpmc scaling code). > > From what I can tell, when you DISABLE an event that's in a > PERF_EVENT_STATE_INACTIVE state, currently it exits early without > updating event->tstamp_stopped. So later when calculating the > enabled time it is wrong and you get the negative number. > The following quick hack fixes things, it just abuses the existing code > that was there to handle a similar case when a filter failed. The proper > fix might be to just put the event->tstamp_stopped update before the > early out code, but the commit log for the change that introduced that > makes it sound a bit scary. fa66f07aa1f0 ("perf_events: Fix time tracking for events with pid != -1 and cpu != -1") is the commit that introduced it (the scary one just moved the && to the right place). > Anyway a reproducible test case for this is in my pref_event_tests tree, > the "tests/rdpmc/rdpmc_attach_multi_enable" test. > > Vince > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 426c2ffba16d..18eadc3b37d5 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event, > * maintained, otherwise bogus information is return > * via read() for time_enabled, time_running: > */ > - if (event->state == PERF_EVENT_STATE_INACTIVE && > - !event_filter_match(event)) { > + if (event->state == PERF_EVENT_STATE_INACTIVE) { > delta = tstamp - event->tstamp_stopped; > event->tstamp_running += delta; > event->tstamp_stopped = tstamp; It doesn't look entirely crazy.. just trying to understand why Stephane added that filter clause. Argh, that time accounting always hurts my brain, its so weird. I never dared rewrite it for risk of breaking it all again, but maybe, just maybe we should have.. :/ init(now): enabled = stopped = running = now; in(now): running += now - stopped; out(now): stopped = now; read: *enabled = stopped - enabled; *running = stopped - running; Simple example: init(10): e = s = r = 10 in(11): r += 11 - 10; r=11 out(12): s = 12 read: *e = 12-10 = 2 *r = 12-11 = 1 in(14): r += 14 - 12; r=13 out(16): s = 16 read: *e = 16 - 10 = 6 *r = 16 - 13 = 3 which confusingly works :-) So lets break that by out not updating stopped, then we get: init(10): e = s = r = 10 in(11): r += 11 - 10; r=11 read: *e = 10-10 = 2 *r = 10-11 = -1 in(14): r += 14 - 10; r=15 read: *e = 10 - 10 = 0 *r = 10 - 15 = -5 That is negative for 'running', not enabled. But your test really does report a negative 'enabled'. Playing with that test it really is the IOC_DISABLE while STOP'ed that messes things up. Ah.. magic.. the below seems to fix things, hopefully it doesn't break anything else. --- kernel/events/core.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 426c2ffba16d..5dbbd1f90b4f 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2458,6 +2458,20 @@ perf_install_in_context(struct perf_event_context *ctx, } /* + * reverse update_event_times() + */ +void __perf_event_enable_time(struct perf_event *event, u64 tstamp) +{ + /* + * total_time_enabled = tstamp_stopped - tstamp_enabled + * total_time_running = tstamp_stopped - tstamp_running; + */ + event->tstamp_stopped = tstamp; + event->tstamp_enabled = tstamp - event->total_time_enabled; + event->tstamp_running = tstamp - event->total_time_running; +} + +/* * Put a event into inactive state and update time fields. * Enabling the leader of a group effectively enables all * the group members that aren't explicitly disabled, so we @@ -2471,10 +2485,11 @@ static void __perf_event_mark_enabled(struct perf_event *event) u64 tstamp = perf_event_time(event); event->state = PERF_EVENT_STATE_INACTIVE; - event->tstamp_enabled = tstamp - event->total_time_enabled; + __perf_event_enable_time(event, tstamp); + list_for_each_entry(sub, &event->sibling_list, group_entry) { if (sub->state >= PERF_EVENT_STATE_INACTIVE) - sub->tstamp_enabled = tstamp - sub->total_time_enabled; + __perf_event_enable_time(sub, tstamp); } }