Date: Wed, 28 Jun 2017 11:56:01 +0100
From: Mark Rutland <mark.rutland@arm.com>
To: Kyle Huey <me@kylehuey.com>
Cc: "Jin, Yao" <yao.jin@linux.intel.com>, Ingo Molnar <mingo@kernel.org>,
        "Peter Zijlstra (Intel)" <peterz@infradead.org>,
        stable@vger.kernel.org,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Arnaldo Carvalho de Melo <acme@redhat.com>,
        Jiri Olsa <jolsa@redhat.com>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Namhyung Kim <namhyung@kernel.org>,
        Stephane Eranian <eranian@google.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Vince Weaver <vincent.weaver@maine.edu>, acme@kernel.org,
        jolsa@kernel.org, kan.liang@intel.com,
        Will Deacon <will.deacon@arm.com>, gregkh@linuxfoundation.org,
        "Robert O'Callahan" <robert@ocallahan.org>,
        open list <linux-kernel@vger.kernel.org>
Subject: [PATCH] perf/core: generate overflow signal when samples are dropped
 (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the
 kernel in the "skid" region)
Message-ID: <20170628105600.GC5981@leverpostej>
References: <CAP045Ap8cMx6mzSgcQ3n3bnh_8GJuCp7_KZe_5ZTCR_K6cPTLw@mail.gmail.com>
 <CAP045Ao2=6A7GEhBfmY0gZ2g=urL5U4R5ywLsTo4Yb6Tk1M8Cw@mail.gmail.com>
 <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com>
 <CAP045AryPCUTO92S3hmrkag3D7NqgJM-hK82a7iKm9s-rWdn1w@mail.gmail.com>
 <20170628101248.GB5981@leverpostej>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170628101248.GB5981@leverpostej>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5481
Lines: 154

On Wed, Jun 28, 2017 at 11:12:48AM +0100, Mark Rutland wrote:
> On Tue, Jun 27, 2017 at 09:51:00PM -0700, Kyle Huey wrote:

> As we're trying to avoid smapling state, I think we can move the check
> into perf_prepare_sample() or __perf_event_output(), where that state is
> actually sampled. I'll take a look at that momentarily.
> 
> Just to clarify, you don't care about the sample state at all? i.e. you
> don't need the user program counter?
> 
> Is that signal delivered to the tracee, or to a different process that
> traces it? If the latter, what ensures that the task is stopped
> sufficiently quickly?
> 
> > It seems to me that it might be reasonable to ignore the interrupt if
> > the purpose of the interrupt is to trigger sampling of the CPUs
> > register state.  But if the interrupt will trigger some other
> > operation, such as a signal on an fd, then there's no reason to drop
> > it.
> 
> Agreed. I'll try to have a patch for this soon.
> 
> I just need to figure out exactly where that overflow signal is
> generated by the perf core.

I've figured that out now. That's handled by perf_pending_event(), whcih
is the irq_work we schedule in __perf_event_overflow().

Does the below patch work for you?

---->8----
>From bb1f99dace508ce34ab0818f91d59e73450fa142 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Wed, 28 Jun 2017 11:39:25 +0100
Subject: [PATCH] perf/core: generate overflow signal when samples are dropped

We recently tried to kill an information leak where users could receive
kernel samples due to skid on the PMU interrupt. To block this, we
bailed out early in perf_event_overflow(), as we do for non-sampling
events.

This broke rr, which uses sampling events to receive a signal on
overflow (but does not care about the contents of the sample). These
signals are critical to the correct operation of rr.

Instead of bailing out early in perf_event_overflow, we can bail prior
to performing the actual sampling in __perf_event_output(). This avoids
the information leak, but preserves the generation of the signal.

Since we don't place any sample data into the ring buffer, the signal is
arguably spurious. However, a userspace ringbuffer consumer can already
consume data prior to taking the associated signals, and therefore must
handle spurious signals to operate correctly. Thus, this signal
shouldn't be harmful.

Fixes: cc1582c231ea041f ("perf/core: Drop kernel samples even though :u is specified")
Reported-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523..6b263f3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6090,6 +6090,21 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 }
 
+static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
+{
+	/*
+	 * Due to interrupt latency (AKA "skid"), we may enter the
+	 * kernel before taking an overflow, even if the PMU is only
+	 * counting user events.
+	 * To avoid leaking information to userspace, we must always
+	 * reject kernel samples when exclude_kernel is set.
+	 */
+	if (event->attr.exclude_kernel && !user_mode(regs))
+		return false;
+
+	return true;
+}
+
 static void __always_inline
 __perf_event_output(struct perf_event *event,
 		    struct perf_sample_data *data,
@@ -6101,6 +6116,12 @@ void perf_prepare_sample(struct perf_event_header *header,
 	struct perf_output_handle handle;
 	struct perf_event_header header;
 
+	/*
+	 * For security, drop the skid kernel samples if necessary.
+	 */
+	if (!sample_is_allowed(event, regs))
+		return ret;
+
 	/* protect the callchain buffers */
 	rcu_read_lock();
 
@@ -7316,21 +7337,6 @@ int perf_event_account_interrupt(struct perf_event *event)
 	return __perf_event_account_interrupt(event, 1);
 }
 
-static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
-{
-	/*
-	 * Due to interrupt latency (AKA "skid"), we may enter the
-	 * kernel before taking an overflow, even if the PMU is only
-	 * counting user events.
-	 * To avoid leaking information to userspace, we must always
-	 * reject kernel samples when exclude_kernel is set.
-	 */
-	if (event->attr.exclude_kernel && !user_mode(regs))
-		return false;
-
-	return true;
-}
-
 /*
  * Generic event overflow handling, sampling.
  */
@@ -7352,12 +7358,6 @@ static int __perf_event_overflow(struct perf_event *event,
 	ret = __perf_event_account_interrupt(event, throttle);
 
 	/*
-	 * For security, drop the skid kernel samples if necessary.
-	 */
-	if (!sample_is_allowed(event, regs))
-		return ret;
-
-	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
 	 */
@@ -7372,6 +7372,10 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	READ_ONCE(event->overflow_handler)(event, data, regs);
 
+	/*
+	 * We must generate a wakeup regardless of whether we actually
+	 * generated a sample. This is relied upon by rr.
+	 */
 	if (*perf_event_fasync(event) && event->pending_kill) {
 		event->pending_wakeup = 1;
 		irq_work_queue(&event->pending);
-- 
1.9.1