Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S941373AbcKQRRh (ORCPT ); Thu, 17 Nov 2016 12:17:37 -0500 Received: from merlin.infradead.org ([205.233.59.134]:43484 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S938936AbcKQRRe (ORCPT ); Thu, 17 Nov 2016 12:17:34 -0500 Date: Thu, 17 Nov 2016 18:17:31 +0100 From: Peter Zijlstra To: Josh Poimboeuf Cc: Vince Weaver , "linux-kernel@vger.kernel.org" , Ingo Molnar , Arnaldo Carvalho de Melo , "davej@codemonkey.org.uk" , "dvyukov@google.com" , Stephane Eranian Subject: Re: perf: fuzzer KASAN unwind_get_return_address Message-ID: <20161117171731.GV3157@twins.programming.kicks-ass.net> References: <20161115205748.xtroftp55igs55bz@treble> <20161116130337.GT3142@twins.programming.kicks-ass.net> <20161116143746.zoxdxrfqvmx35wln@treble> <20161116144943.GB3117@twins.programming.kicks-ass.net> <20161116145849.GR3157@twins.programming.kicks-ass.net> <20161117044828.vedc3whqkuki624r@treble> <20161117090446.GC3142@twins.programming.kicks-ass.net> <20161117151848.7sdss3g4waanxfsk@treble> <20161117160700.GF3117@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161117160700.GF3117@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4221 Lines: 111 On Thu, Nov 17, 2016 at 05:07:00PM +0100, Peter Zijlstra wrote: > On Thu, Nov 17, 2016 at 09:18:48AM -0600, Josh Poimboeuf wrote: > > Thanks, I just waded through this and it turned up some good clues. And > > according to 'git blame', you might be able to help :-) > > > > It's not stack corruption. Instead it looks like > > __intel_pmu_pebs_event() is creating a bad or stale pt_regs which gets > > passed to the unwinder. Specifically, regs->bp points to a seemingly > > random address on the NMI stack. Which seems odd, considering the code > > itself is running on the same NMI stack. > > > > I don't know much about the PEBS code but it seems like it's passing > > some stale data. Either that or there's some NMI nesting going on. So the puzzle was BP,SP pointing into the NMI stack at random spots. But I think I can explain this; if the event has a very _very_ short period, then the tail __intel_pmu_enable_all() call from the PMI handler will 'insta' trigger a record and raise another PMI. We then get back-to-back NMIs with a record pointing to a now overwritten stack. The other scenario, where there is an (i)ret between the record and the interrupt would be less confusing but still wrong. Solve this by always using iregs->{bp,sp} for callchains. The below patch still copies the record BP,SP when !CALLCHAINS && SAMPLE_REGS; does this make sense? The fuzzer is still running with this patch applied.. I'll let it run for a while. --- arch/x86/events/intel/ds.c | 35 +++++++++++++++++++++++------------ arch/x86/events/perf_event.h | 2 +- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 0319311dbdbb..be202390bbd3 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1108,20 +1108,20 @@ static void setup_pebs_sample_data(struct perf_event *event, } /* - * We use the interrupt regs as a base because the PEBS record - * does not contain a full regs set, specifically it seems to - * lack segment descriptors, which get used by things like - * user_mode(). + * We use the interrupt regs as a base because the PEBS record does not + * contain a full regs set, specifically it seems to lack segment + * descriptors, which get used by things like user_mode(). * - * In the simple case fix up only the IP and BP,SP regs, for - * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly. - * A possible PERF_SAMPLE_REGS will have to transfer all regs. + * In the simple case fix up only the IP for PERF_SAMPLE_IP. + * + * We must however always use BP,SP from iregs for the unwinder to stay + * sane; the record BP,SP can point into thin air when the record is + * from a previous PMI context or an (I)RET happend between the record + * and PMI. */ *regs = *iregs; regs->flags = pebs->flags; set_linear_ip(regs, pebs->ip); - regs->bp = pebs->bp; - regs->sp = pebs->sp; if (sample_type & PERF_SAMPLE_REGS_INTR) { regs->ax = pebs->ax; @@ -1130,10 +1130,21 @@ static void setup_pebs_sample_data(struct perf_event *event, regs->dx = pebs->dx; regs->si = pebs->si; regs->di = pebs->di; - regs->bp = pebs->bp; - regs->sp = pebs->sp; - regs->flags = pebs->flags; + /* + * Per the above; only set BP,SP if we don't need callchains. + * + * XXX: does this make sense? + */ + if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) { + regs->bp = pebs->bp; + regs->sp = pebs->sp; + } + + /* + * Preserve PERF_EFLAGS_VM from set_linear_ip(). + */ + regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM); #ifndef CONFIG_X86_32 regs->r8 = pebs->r8; regs->r9 = pebs->r9; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 5874d8de1f8d..a77ee026643d 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -113,7 +113,7 @@ struct debug_store { * Per register state. */ struct er_account { - raw_spinlock_t lock; /* per-core: protect structure */ + raw_spinlock_t lock; /* per-core: protect structure */ u64 config; /* extra MSR config */ u64 reg; /* extra MSR number */ atomic_t ref; /* reference count */