Subject: Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt
From: Alexey Budankov
Organization: Intel Corp.
Date: Thu, 3 Aug 2017 18:58:41 +0300
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen, Kan Liang, Dmitri Prokhorov, Valery Cherepennikov, Mark Rutland, Stephane Eranian, David Carrillo-Cisneros, linux-kernel
Message-ID: <06dc5615-a3ba-cc7b-d172-a13601ba4d4d@linux.intel.com>
In-Reply-To: <20170803140016.otzlyyszgpznksto@hirez.programming.kicks-ass.net>
References: <96c7776f-1f17-a39e-23e9-658596216d6b@linux.intel.com> <20170803140016.otzlyyszgpznksto@hirez.programming.kicks-ass.net>

On 03.08.2017 17:00, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 11:15:39AM +0300, Alexey Budankov wrote:
>> +struct perf_event_tstamp {
>> +	/*
>> +	 * These are timestamps used for computing total_time_enabled
>> +	 * and total_time_running when the event is in INACTIVE or
>> +	 * ACTIVE state, measured in nanoseconds from an arbitrary point
>> +	 * in time.
>> +	 * enabled: the notional time when the event was enabled
>> +	 * running: the notional time when the event was scheduled on
>> +	 * stopped: in INACTIVE state, the notional time when the
>> +	 *	event was scheduled off.
>> +	 */
>> +	u64 enabled;
>> +	u64 running;
>> +	u64 stopped;
>> +};
> 
> 
> So I have the below (untested) patch, also see:
> 
>   https://lkml.kernel.org/r/20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net
> 
> And I don't think I fully agree with your description of running.

I copied this comment verbatim from its previous location.

> Despite its name tstamp_running is not in fact a time stamp afaict. It's
> more like an accumulator of running, but with an offset of stopped.

I see tstamp_running as a value that is subtracted from the timestamp
(e.g. when update_context_time() is called) to get the event's correct
total times:

  total_time_enabled = timestamp - enabled
  total_time_running = timestamp - running

For example, take a single thread with a single event, running on a
dual-core machine for 10 ticks and spending half of that time on each
core. For the event instance on the first core:

  10 = total_time_enabled = timestamp[110] - enabled[100]
   5 = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1]

Each "+ 1" above is added for a tick in which the event instance does
not get through perf_event_filter(), in particular when the instance is
bound to a CPU other than the one scheduling it.

So 5/10 = 0.5, i.e. the event was running 50% of the time on the first
core. The same holds for the second core. Summing the instances' times
gives the value reported to the user:

  50% (first core) + 50% (second core) = 100% of event run time,

i.e. the no-multiplexing case.
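To make the arithmetic above easy to re-check, here is a toy userspace model of the per-CPU accounting as described in this mail; it is a sketch, not the kernel's implementation. All identifiers (struct toy_tstamp, toy_read(), toy_run(), on_cpu0() and so on) are hypothetical, time is counted in abstract ticks, and the perf_event_filter() check is reduced to a predicate saying whether the thread ran on the instance's CPU in a given tick.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the per-CPU accounting described above; not kernel code.
 * All identifiers are hypothetical and time is counted in abstract ticks.
 *
 * enabled: tick at which the instance was enabled.
 * running: starts at 'enabled' and is advanced by 1 for every tick in
 *          which the instance is filtered out because the thread runs
 *          on another CPU, i.e. it acts as a "tstamp_eligible_to_run".
 */
struct toy_tstamp {
	uint64_t enabled;
	uint64_t running;
};

struct toy_totals {
	uint64_t total_time_enabled;
	uint64_t total_time_running;
};

/* total_time_* = timestamp - tstamp_* */
static struct toy_totals toy_read(const struct toy_tstamp *ts,
				  uint64_t timestamp)
{
	struct toy_totals t;

	assert(ts->enabled <= ts->running && ts->running <= timestamp);
	t.total_time_enabled = timestamp - ts->enabled;
	t.total_time_running = timestamp - ts->running;
	return t;
}

/*
 * Run one per-CPU instance for 'ticks' ticks starting at 'start'.
 * here(i) reports whether the thread ran on this instance's CPU in
 * tick i; every other tick contributes one "+ 1" to 'running'.
 */
static struct toy_totals toy_run(uint64_t start, int ticks, int (*here)(int))
{
	struct toy_tstamp ts = { .enabled = start, .running = start };

	for (int i = 0; i < ticks; i++)
		if (!here(i))
			ts.running += 1;
	return toy_read(&ts, start + (uint64_t)ticks);
}

/* Thread alternating between CPU0 (even ticks) and CPU1 (odd ticks). */
static int on_cpu0(int tick) { return tick % 2 == 0; }
static int on_cpu1(int tick) { return tick % 2 != 0; }

/* Thread pinned to this instance's CPU, or never on it. */
static int always_here(int tick) { (void)tick; return 1; }
static int never_here(int tick) { (void)tick; return 0; }
```

For instance, toy_run(100, 10, on_cpu0) and toy_run(100, 10, on_cpu1) both yield total_time_enabled == 10 and total_time_running == 5, and the two running times add up to the full 10 ticks. Passing always_here() or never_here() instead models a thread that never migrates: 10/10 on one core, 0/10 on the other, still 100% in total.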
Without thread migration we would have, for the first core (the one
the thread runs on):

  10 = total_time_enabled = timestamp[110] - enabled[100]
  10 = total_time_running = timestamp[110] - running[100]
  10/10 = 1, i.e. 100%

For the second core:

  10 = total_time_enabled = timestamp[110] - enabled[100]
   0 = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1]
   0/10 = 0, i.e. 0%

  100% + 0% == 100% of event run time

From this perspective the tstamp_running field does accumulate some
time, but it is more like a tstamp_eligible_to_run, so:

  total_time_running == timestamp - tstamp_eligible_to_run

> 
> I'm always completely confused by the way this timekeeping is done.
> 
> ---
> Subject: perf: Fix time on IOC_ENABLE
> From: Peter Zijlstra
> Date: Thu Aug 3 15:42:09 CEST 2017
> 
> Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
> is in SIGSTOP'ed state the timestamps go wobbly.
> 
> It turns out we indeed fail to correctly account time while in 'OFF'
> state and doing IOC_ENABLE without getting scheduled in exposes the
> problem.
> 
> Further thinking about this problem, it occurred to me that we can
> suffer a similar fate when we migrate an uncore event between CPUs.
> The perf_event_install() on the 'new' CPU will do add_event_to_ctx()
> which will reset all the time stamps, resulting in a subsequent
> update_event_times() to overwrite the total_time_* fields with smaller
> values.
> 
> Reported-by: Vince Weaver
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  kernel/events/core.c | 36 +++++++++++++++++++++++++++++++-----
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
>  	return can_add_hw;
>  }
>  
> +/*
> + * Complement to update_event_times(). This computes the tstamp_* values to
> + * continue 'enabled' state from @now. And effectively discards the time
> + * between the prior tstamp_stopped and now (as we were in the OFF state, or
> + * just switched (context) time base).
> + *
> + * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
> + * cannot have been scheduled in yet. And going into INACTIVE state means
> + * '@event->tstamp_stopped = @now'.
> + *
> + * Thus given the rules of update_event_times():
> + *
> + *   total_time_enabled = tstamp_stopped - tstamp_enabled
> + *   total_time_running = tstamp_stopped - tstamp_running
> + *
> + * We can insert 'tstamp_stopped == now' and reverse them to compute new
> + * tstamp_* values.
> + */
> +static void __perf_event_enable_time(struct perf_event *event, u64 now)
> +{
> +	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
> +
> +	event->tstamp_stopped = now;
> +	event->tstamp_enabled = now - event->total_time_enabled;
> +	event->tstamp_running = now - event->total_time_running;
> +}
> +
>  static void add_event_to_ctx(struct perf_event *event,
>  			       struct perf_event_context *ctx)
>  {
> @@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
>  
>  	list_add_event(event, ctx);
>  	perf_group_attach(event);
> -	event->tstamp_enabled = tstamp;
> -	event->tstamp_running = tstamp;
> -	event->tstamp_stopped = tstamp;
> +	__perf_event_enable_time(event, tstamp);
>  }
>  
>  static void ctx_sched_out(struct perf_event_context *ctx,
> @@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
>  	u64 tstamp = perf_event_time(event);
>  
>  	event->state = PERF_EVENT_STATE_INACTIVE;
> -	event->tstamp_enabled = tstamp - event->total_time_enabled;
> +	__perf_event_enable_time(event, tstamp);
>  	list_for_each_entry(sub, &event->sibling_list, group_entry) {
> +		/* XXX should not be > INACTIVE if event isn't */
>  		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
> -			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
> +			__perf_event_enable_time(sub, tstamp);
>  	}
>  }
>  
>
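The effect of __perf_event_enable_time() versus the old reset in add_event_to_ctx() can be checked with a small userspace model. This is a sketch under stated assumptions: struct toy_event and the toy_* helpers are hypothetical, only the INACTIVE-state rules quoted in the patch comment are modelled (the kernel's update_event_times() handles more cases than this), and the WARN_ON_ONCE() state check is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct perf_event. */
struct toy_event {
	uint64_t tstamp_enabled;
	uint64_t tstamp_running;
	uint64_t tstamp_stopped;
	uint64_t total_time_enabled;
	uint64_t total_time_running;
};

/*
 * INACTIVE-state rules quoted in the patch comment:
 *   total_time_enabled = tstamp_stopped - tstamp_enabled
 *   total_time_running = tstamp_stopped - tstamp_running
 */
static void toy_update_times_inactive(struct toy_event *e)
{
	e->total_time_enabled = e->tstamp_stopped - e->tstamp_enabled;
	e->total_time_running = e->tstamp_stopped - e->tstamp_running;
}

/*
 * Same arithmetic as __perf_event_enable_time(): the rules above solved
 * for tstamp_* with tstamp_stopped == now, so the totals carry over.
 */
static void toy_enable_time(struct toy_event *e, uint64_t now)
{
	/* Guard against unsigned wrap-around in this toy model. */
	assert(e->total_time_running <= e->total_time_enabled);
	assert(e->total_time_enabled <= now);

	e->tstamp_stopped = now;
	e->tstamp_enabled = now - e->total_time_enabled;
	e->tstamp_running = now - e->total_time_running;
}

/* Pre-patch add_event_to_ctx() behaviour: all three stamps reset to now. */
static void toy_reset_time(struct toy_event *e, uint64_t now)
{
	e->tstamp_enabled = now;
	e->tstamp_running = now;
	e->tstamp_stopped = now;
}
```

Starting from total_time_enabled == 30 and total_time_running == 20, toy_reset_time() followed by toy_update_times_inactive() reports 0 and 0, i.e. the smaller values the commit message warns about, whereas toy_enable_time() followed by the same update reports 30 and 20 again.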