Date: Thu, 3 Aug 2017 16:00:16 +0200
From: Peter Zijlstra
To: Alexey Budankov
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Andi Kleen, Kan Liang, Dmitri Prokhorov, Valery Cherepennikov,
	Mark Rutland, Stephane Eranian, David Carrillo-Cisneros,
	linux-kernel
Subject: Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt
Message-ID: <20170803140016.otzlyyszgpznksto@hirez.programming.kicks-ass.net>
References: <96c7776f-1f17-a39e-23e9-658596216d6b@linux.intel.com>

On Wed, Aug 02, 2017 at 11:15:39AM +0300, Alexey Budankov wrote:
> +struct perf_event_tstamp {
> +	/*
> +	 * These are timestamps used for computing total_time_enabled
> +	 * and total_time_running when the event is in INACTIVE or
> +	 * ACTIVE state, measured in nanoseconds from an arbitrary point
> +	 * in time.
> +	 * enabled: the notional time when the event was enabled
> +	 * running: the notional time when the event was scheduled on
> +	 * stopped: in INACTIVE state, the notional time when the
> +	 *	event was scheduled off.
> +	 */
> +	u64 enabled;
> +	u64 running;
> +	u64 stopped;
> +};

So I have the below (untested) patch, also see:

  https://lkml.kernel.org/r/20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net

And I don't think I fully agree with your description of running.
Despite its name, tstamp_running is not in fact a timestamp, afaict.
It's more like an accumulator of running time, but offset by stopped.

I'm always completely confused by the way this timekeeping is done.
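To keep it straight in my own head, here is a toy userspace model of
those rules (purely illustrative: the field names mirror struct
perf_event, but the helpers and numbers are made up, and it only
models the ACTIVE <-> OFF transitions discussed below):

#include <stdio.h>

typedef unsigned long long u64;

struct ev {
	u64 tstamp_enabled;	/* notional enable time */
	u64 tstamp_running;	/* now - accumulated runtime; an offset, not a stamp */
	u64 tstamp_stopped;	/* last sched-out time */
	u64 total_time_enabled;
	u64 total_time_running;
};

/* the update_event_times() rules quoted in the patch comment below */
static void sched_out(struct ev *e, u64 now)
{
	e->tstamp_stopped = now;
	e->total_time_enabled = e->tstamp_stopped - e->tstamp_enabled;
	e->total_time_running = e->tstamp_stopped - e->tstamp_running;
}

/*
 * The proposed __perf_event_enable_time(): rebase all stamps at @now,
 * discarding the window spent in OFF.
 */
static void enable_time(struct ev *e, u64 now)
{
	e->tstamp_stopped = now;
	e->tstamp_enabled = now - e->total_time_enabled;
	e->tstamp_running = now - e->total_time_running;
}

int main(void)
{
	struct ev e = { 10, 10, 10, 0, 0 };	/* enabled and scheduled in at t=10 */

	sched_out(&e, 15);			/* ran 10..15 */
	printf("enabled=%llu running=%llu\n",
	       e.total_time_enabled, e.total_time_running);	/* 5 5 */

	/* IOC_DISABLE at 15, IOC_ENABLE at 40: the 25ns OFF must not count */
	enable_time(&e, 40);
	sched_out(&e, 45);			/* ran 40..45 */
	printf("enabled=%llu running=%llu\n",
	       e.total_time_enabled, e.total_time_running);	/* 10 10 */
	return 0;
}

That is, total_time_running is always tstamp_stopped - tstamp_running,
so tstamp_running only ever holds 'now minus accumulated runtime'.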
---
Subject: perf: Fix time on IOC_ENABLE
From: Peter Zijlstra
Date: Thu Aug 3 15:42:09 CEST 2017

Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
is in a SIGSTOP'ed state, the timestamps go wobbly.

It turns out we indeed fail to correctly account time while in 'OFF'
state, and doing IOC_ENABLE without getting scheduled in exposes the
problem.

Further thinking about this problem, it occurred to me that we can
suffer a similar fate when we migrate an uncore event between CPUs.
The perf_event_install() on the 'new' CPU will do add_event_to_ctx(),
which will reset all the time stamps, resulting in a subsequent
update_event_times() overwriting the total_time_* fields with smaller
values.

Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/events/core.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
 	return can_add_hw;
 }
 
+/*
+ * Complement to update_event_times(). This computes the tstamp_* values to
+ * continue 'enabled' state from @now. And effectively discards the time
+ * between the prior tstamp_stopped and now (as we were in the OFF state, or
+ * just switched (context) time base).
+ *
+ * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
+ * cannot have been scheduled in yet. And going into INACTIVE state means
+ * '@event->tstamp_stopped = @now'.
+ *
+ * Thus given the rules of update_event_times():
+ *
+ *   total_time_enabled = tstamp_stopped - tstamp_enabled
+ *   total_time_running = tstamp_stopped - tstamp_running
+ *
+ * We can insert 'tstamp_stopped == now' and reverse them to compute new
+ * tstamp_* values.
+ */
+static void __perf_event_enable_time(struct perf_event *event, u64 now)
+{
+	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
+
+	event->tstamp_stopped = now;
+	event->tstamp_enabled = now - event->total_time_enabled;
+	event->tstamp_running = now - event->total_time_running;
+}
+
 static void add_event_to_ctx(struct perf_event *event,
 			     struct perf_event_context *ctx)
 {
@@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
 
 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	__perf_event_enable_time(event, tstamp);
 }
 
 static void ctx_sched_out(struct perf_event_context *ctx,
@@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		/* XXX should not be > INACTIVE if event isn't */
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
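And for completeness, the migration case from the changelog in the
same toy model (again made-up userspace code, not the kernel's),
showing why resetting the stamps in add_event_to_ctx() makes the
totals go backwards, while the rebase keeps them monotonic:

#include <stdio.h>

typedef unsigned long long u64;

struct ev {
	u64 tstamp_enabled, tstamp_running, tstamp_stopped;
	u64 total_time_enabled, total_time_running;
};

/* again the update_event_times() rules from the comment above */
static void update_times(struct ev *e, u64 now)
{
	e->tstamp_stopped = now;
	e->total_time_enabled = now - e->tstamp_enabled;
	e->total_time_running = now - e->tstamp_running;
}

int main(void)
{
	/* event has accumulated 100ns enabled/running when migrated at t=1000 */
	struct ev old = { .total_time_enabled = 100, .total_time_running = 100 };
	struct ev new = old;

	/* old add_event_to_ctx(): tstamp_* = tstamp; history is lost */
	old.tstamp_enabled = old.tstamp_running = old.tstamp_stopped = 1000;
	update_times(&old, 1005);
	printf("reset:  enabled=%llu\n", old.total_time_enabled);	/* 5: went backwards */

	/* __perf_event_enable_time(): rebase; history is kept */
	new.tstamp_stopped = 1000;
	new.tstamp_enabled = 1000 - new.total_time_enabled;
	new.tstamp_running = 1000 - new.total_time_running;
	update_times(&new, 1005);
	printf("rebase: enabled=%llu\n", new.total_time_enabled);	/* 105 */

	return 0;
}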