Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752481AbdF3RoV (ORCPT ); Fri, 30 Jun 2017 13:44:21 -0400 Received: from mail-yb0-f195.google.com ([209.85.213.195]:33985 "EHLO mail-yb0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751657AbdF3RoT (ORCPT ); Fri, 30 Jun 2017 13:44:19 -0400 MIME-Version: 1.0 In-Reply-To: References: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com> <20170628101248.GB5981@leverpostej> <20170628105600.GC5981@leverpostej> <20170628174900.GG8252@leverpostej> From: Kyle Huey Date: Fri, 30 Jun 2017 10:44:17 -0700 Message-ID: Subject: Re: [PATCH] perf/core: generate overflow signal when samples are dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region) To: Mark Rutland Cc: Vince Weaver , "Jin, Yao" , Ingo Molnar , "Peter Zijlstra (Intel)" , stable@vger.kernel.org, Alexander Shishkin , Arnaldo Carvalho de Melo , Jiri Olsa , Linus Torvalds , Namhyung Kim , Stephane Eranian , Thomas Gleixner , acme@kernel.org, jolsa@kernel.org, kan.liang@intel.com, Will Deacon , gregkh@linuxfoundation.org, "Robert O'Callahan" , open list Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2434 Lines: 61 On Wed, Jun 28, 2017 at 3:55 PM, Kyle Huey wrote: > On Wed, Jun 28, 2017 at 10:49 AM, Mark Rutland wrote: >> On Wed, Jun 28, 2017 at 09:48:27AM -0700, Kyle Huey wrote: >>> On Wed, Jun 28, 2017 at 3:56 AM, Mark Rutland wrote: >>> > @@ -6101,6 +6116,12 @@ void perf_prepare_sample(struct perf_event_header *header, >>> > struct perf_output_handle handle; >>> > struct perf_event_header header; >>> > >>> > + /* >>> > + * For security, drop the skid kernel samples if necessary. >>> > + */ >>> > + if (!sample_is_allowed(event, regs)) >>> > + return ret; >>> >>> Just a bare return here. >> >> Ugh, yes. Sorry about that. I'll fix that up. >> >> [...] >> >>> I can confirm that with that fixed to compile, this patch fixes rr. >> >> Thanks for giving this a go. >> >> Having thought about this some more, I think Vince does make a good >> point that throwing away samples is liable to break stuff, e.g. that >> which only relies on (non-sensitive) samples. >> >> It still seems wrong to make up data, though. >> >> Maybe for exclude_kernel && !exclude_user events we can always generate >> samples from the user regs, rather than the exception regs. That's going >> to be closer to what the user wants, regardless. I'll take a look >> tomorrow. > > I'm not very familiar with the kernel internals, but the reason I > didn't suggest this originally is it seems like it will be difficult > to determine what the "correct" userspace registers are. For example, > what happens if a performance counter is fixed to a given tid, the > interrupt fires during a context switch from that task to another that > is not being monitored, and the kernel is far enough along in the > context switch that the current task struct has been switched out? > Reporting the new task's registers seems as bad as reporting the > kernel's registers. But maybe this is easier than I imagine for > whatever reason. > > Something to think about. > > - Kyle We've noticed that the regressing changeset made it into stable branches that distros are shipping now[0], and we're starting to receive bug reports from our users. We would really like to get this patch or something similar into 4.12 and the next stable releases if at all possible. Thanks! - Kyle [0] Full list in https://mail.mozilla.org/pipermail/rr-dev/2017-June/000500.html