Date: Fri, 4 Aug 2017 13:22:53 +0200
From: Peter Zijlstra
To: Vince Weaver
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian
Subject: Re: [patch] perf wrong enabled time after attach/enable/enable
Message-ID: <20170804112253.vk4rg334732f4s7x@hirez.programming.kicks-ass.net>
References: <20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net>

On Thu, Aug 03, 2017 at 02:07:53PM -0400, Vince Weaver wrote:
> On Wed, 2 Aug 2017, Peter Zijlstra wrote:
>
> > Playing with that test it really is the IOC_DISABLE while STOP'ed that
> > messes things up.
> >
> > Ah.. magic.. the below seems to fix things, hopefully it doesn't break
> > anything else.
>
> yes, I've tested this and it seems to fix things.

New version below.. is a bit different and should handle another case.
I'll try and run more tests on it, because while it _SHOULD_ not trigger
that WARN_ON_ONCE() you know how that goes..

> With both this and the other patch applied PAPI can finally pass all of
> its validation tests when running with RDPMC support enabled.

I get:

rdpmc# (for i in *.c ; do ./${i%.c}; done) | grep "^Testing"
Testing if rdpmc attach works... PASSED
Testing if minimized rdpmc papi-multi-attach works... PASSED
Testing if we can rdpmc in execed process... PASSED
Testing if we can rdpmc in execed process (PAPI) ... PASSED
Testing if rdpmc works with event groups... PASSED
-bash: ./rdpmc_lib: No such file or directory
Testing if rdpmc multi-attach works... PASSED
Testing if rdpmc papi-multi-attach works... PASSED
Testing if userspace rdpmc multiplexing works... PASSED
Testing if rdpmc with pthreads works... PASSED
Testing if rdpmc with pthreads works... PASSED
Testing if resetting while using rdpmc works... PASSED
Testing if userspace rdpmc reads are supported... NEW BEHAVIOR
Testing if rdpmc fallback works on sw events... PASSED
Testing if userspace rdpmc reads give expected results... PASSED

Is that 'NEW BEHAVIOR' thing something I should worry about? Full
output for that test is:

rdpmc# ./rdpmc_support
This test checks if userspace rdpmc() style reads work.
total start/read/stop latency: 52668 cycles
Event 0 -- count: 63 running: 0
Testing if userspace rdpmc reads are supported... NEW BEHAVIOR

Unpatched kernel also does that.

---
Subject: perf: Fix time on IOC_ENABLE
From: Peter Zijlstra
Date: Thu Aug 3 15:42:09 CEST 2017

Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task is
in SIGSTOP'ed state the timestamps go wobbly.

It turns out we indeed fail to correctly account time while in 'OFF'
state, and doing IOC_ENABLE without getting scheduled in exposes the
problem.

Further thinking about this problem, it occurred to me that we can
suffer a similar fate when we migrate an uncore event between CPUs. The
perf_event_install() on the 'new' CPU will do add_event_to_ctx(), which
will reset all the time stamps, resulting in a subsequent
update_event_times() overwriting the total_time_* fields with smaller
values.

Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/events/core.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
 	return can_add_hw;
 }
 
+/*
+ * Complement to update_event_times(). This computes the tstamp_* values to
+ * continue 'enabled' state from @now, and effectively discards the time
+ * between the prior tstamp_stopped and now (as we were in the OFF state, or
+ * just switched (context) time base).
+ *
+ * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
+ * cannot have been scheduled in yet. And going into INACTIVE state means
+ * '@event->tstamp_stopped = @now'.
+ *
+ * Thus given the rules of update_event_times():
+ *
+ *   total_time_enabled = tstamp_stopped - tstamp_enabled
+ *   total_time_running = tstamp_stopped - tstamp_running
+ *
+ * We can insert 'tstamp_stopped == now' and reverse them to compute new
+ * tstamp_* values.
+ */
+static void __perf_event_enable_time(struct perf_event *event, u64 now)
+{
+	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
+
+	event->tstamp_stopped = now;
+	event->tstamp_enabled = now - event->total_time_enabled;
+	event->tstamp_running = now - event->total_time_running;
+}
+
 static void add_event_to_ctx(struct perf_event *event,
 			     struct perf_event_context *ctx)
 {
@@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	__perf_event_enable_time(event, tstamp);
 }
 
 static void ctx_sched_out(struct perf_event_context *ctx,
@@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		/* XXX should not be > INACTIVE if event isn't */
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }