Date: Tue, 4 Jul 2017 11:03:13 +0200
From: Peter Zijlstra
To: Kyle Huey
Cc: Mark Rutland, Vince Weaver, "Jin, Yao", Ingo Molnar,
	stable@vger.kernel.org, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Jiri Olsa, Linus Torvalds, Namhyung Kim,
	Stephane Eranian, Thomas Gleixner, acme@kernel.org, jolsa@kernel.org,
	kan.liang@intel.com, Will Deacon, gregkh@linuxfoundation.org,
	"Robert O'Callahan", open list
Subject: Re: [PATCH] perf/core: generate overflow signal when samples are
	dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we
	entered the kernel in the "skid" region)
Message-ID: <20170704090313.xyb5lntyy55ga7dm@hirez.programming.kicks-ass.net>
References: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com>
	<20170628101248.GB5981@leverpostej>
	<20170628105600.GC5981@leverpostej>
	<20170628174900.GG8252@leverpostej>

On Wed, Jun 28, 2017 at 03:55:07PM -0700, Kyle Huey wrote:
> > Having thought about this some more, I think Vince does make a good
> > point that throwing away samples is liable to break stuff, e.g. that
> > which only relies on (non-sensitive) samples.
> >
> > It still seems wrong to make up data, though.

It is something we do in other places as well though. For example the
printk() %pK thing fakes NULL pointers when kptr_restrict is set.

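As a rough userspace model of that %pK policy (this is not the kernel's
vsprintf code; model_pK() and its parameters are made-up names, and the
kptr_restrict == 0 case is simplified to "print the value as-is"):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the %pK decision, for illustration only.
 *   kptr_restrict == 0: no NULL-faking (modelled here as the raw value)
 *   kptr_restrict == 1: zeros unless the reader is privileged
 *   kptr_restrict >= 2: zeros for everyone
 */
static uintptr_t model_pK(uintptr_t ptr, int kptr_restrict, bool privileged)
{
	assert(kptr_restrict >= 0);

	if (kptr_restrict == 0)
		return ptr;
	if (kptr_restrict == 1 && privileged)
		return ptr;

	/* Report a fake NULL instead of the real address. */
	return 0;
}
```

The point of the comparison is only that the reader gets a well-formed
but fake value rather than no value at all.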

Faking data gets a wee bit tricky in how much data we need to clear
through; it's not only IP, pretty much everything we get from the
interrupt context, like the branch stack and registers, is also suspect.

> > Maybe for exclude_kernel && !exclude_user events we can always generate
> > samples from the user regs, rather than the exception regs. That's going
> > to be closer to what the user wants, regardless. I'll take a look
> > tomorrow.
>
> I'm not very familiar with the kernel internals, but the reason I
> didn't suggest this originally is it seems like it will be difficult
> to determine what the "correct" userspace registers are. For example,
> what happens if a performance counter is fixed to a given tid, the
> interrupt fires during a context switch from that task to another that
> is not being monitored, and the kernel is far enough along in the
> context switch that the current task struct has been switched out?
> Reporting the new task's registers seems as bad as reporting the
> kernel's registers. But maybe this is easier than I imagine for
> whatever reason.

If the counter is fixed to a task then it's scheduled along with the
task. We'll schedule out the event before doing the actual task switch
and switch in the new event after.

That said, with a per-cpu event the TID sample value is indeed subject
to skid like you describe.
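
For reference, a minimal sketch of how userspace asks for the two kinds
of event being contrasted here, with exclude_kernel set and exclude_user
clear. The helper names init_user_only_attr() and perf_open() are
invented for this example, and the event type, sample fields and period
are arbitrary choices, not anything taken from the patch under
discussion:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Fill in a sampling event that counts user space only (hypothetical helper). */
static void init_user_only_attr(struct perf_event_attr *attr, uint64_t period)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_INSTRUCTIONS;
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->exclude_kernel = 1;	/* exclude_user stays 0 */
	attr->exclude_hv = 1;
	attr->disabled = 1;		/* caller enables it later */
}

/* Thin wrapper; glibc provides no perf_event_open() function. */
static inline int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

With this, perf_open(&attr, tid, -1) requests an event bound to one
task on whichever CPU it runs, while perf_open(&attr, -1, cpu) requests
a per-cpu event covering whatever runs on that CPU, which usually needs
extra privileges. The second form is the one where the sampled TID can
skid as described above.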