Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754176AbcKQEsb (ORCPT );
	Wed, 16 Nov 2016 23:48:31 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56000 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753265AbcKQEsa (ORCPT );
	Wed, 16 Nov 2016 23:48:30 -0500
Date: Wed, 16 Nov 2016 22:48:28 -0600
From: Josh Poimboeuf
To: Peter Zijlstra
Cc: Vince Weaver , "linux-kernel@vger.kernel.org" , Ingo Molnar ,
	Arnaldo Carvalho de Melo , "davej@codemonkey.org.uk" ,
	"dvyukov@google.com" , Stephane Eranian
Subject: Re: perf: fuzzer KASAN unwind_get_return_address
Message-ID: <20161117044828.vedc3whqkuki624r@treble>
References: <20161115185756.GL3142@twins.programming.kicks-ass.net>
	<20161115205748.xtroftp55igs55bz@treble>
	<20161116130337.GT3142@twins.programming.kicks-ass.net>
	<20161116143746.zoxdxrfqvmx35wln@treble>
	<20161116144943.GB3117@twins.programming.kicks-ass.net>
	<20161116145849.GR3157@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20161116145849.GR3157@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.6.0.1 (2016-04-01)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.38]); Thu, 17 Nov 2016 04:48:29 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2339
Lines: 68

On Wed, Nov 16, 2016 at 03:58:49PM +0100, Peter Zijlstra wrote:
> 3BUG: KASAN: stack-out-of-bounds in unwind_get_return_address+0x1fb/0x220 at addr ffff88042f88bba0

So I dug through the disassembly (thanks for the vmlinux), and I'm
pretty sure the stack-out-of-bounds address is on the NMI stack, in the
kasan redzone in the stack frame of intel_pmu_handle_irq().

What's weird though is that perf_callchain_kernel() passes the pt_regs
from the IRQ, not from the NMI. The unwinder should have started from
the IRQ stack.
But somehow it ended up unwinding to the middle of the NMI stack.

So it seems like stack corruption in the IRQ or task stack, with a frame
pointer that points back to the middle of the NMI stack for some reason.
But then again, the kasan error report dumped the stack fine. So that
would seem to rule out stack corruption...

So I have no idea what's going on. I got perf_fuzzer running and tried
to recreate, but no luck.

Peter or Vince, can you try to recreate with this patch? It dumps the
raw stack contents during a stack dump. Hopefully that would give a
clue about what's going wrong.

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 499aa6f..67ff3ac 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -48,6 +48,30 @@ static void printk_stack_address(unsigned long address, int reliable,
 	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
+static void raw_stack_dump(struct stack_info *info)
+{
+	unsigned long *s, word[4];
+	int skip = 0;
+
+	for (s = info->begin; s < info->end; s += 4) {
+		word[0] = READ_ONCE_NOCHECK(s[0]);
+		word[1] = READ_ONCE_NOCHECK(s[1]);
+		word[2] = READ_ONCE_NOCHECK(s[2]);
+		word[3] = READ_ONCE_NOCHECK(s[3]);
+
+		if (!word[0] && !word[1] && !word[2] && !word[3]) {
+			if (!skip)
+				printk("%p: %016x ...\n", s, 0);
+			skip = 1;
+			continue;
+		}
+
+		skip = 0;
+		printk("%p: %016lx %016lx %016lx %016lx\n",
+		       s, word[0], word[1], word[2], word[3]);
+	}
+}
+
 void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unsigned long *stack, char *log_lvl)
 {
@@ -156,6 +180,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 		if (str_end)
 			printk("%s <%s>\n", log_lvl, str_end);
+
+		raw_stack_dump(&stack_info);
 	}
 }
 
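
To get a feel for what the raw dump output would look like before
booting a patched kernel, the same loop can be tried in userspace. This
is only a rough sketch under some assumptions: struct stack_range and
raw_dump() are hypothetical stand-ins (not kernel APIs), plain loads
replace READ_ONCE_NOCHECK() since there is no KASAN instrumentation to
bypass here, the zero placeholder is printed with %016lx, and the loop
stops before a trailing partial row instead of reading past the end.

```c
#include <stdio.h>

/* Hypothetical stand-in for the begin/end bounds of the kernel's
 * struct stack_info; nothing else from that struct is modelled. */
struct stack_range {
	const unsigned long *begin;
	const unsigned long *end;
};

/*
 * Print the range four words per row. A run of consecutive all-zero
 * rows is collapsed into a single "..." row, as in the patch.
 * Returns the number of rows actually printed.
 */
static int raw_dump(const struct stack_range *r, FILE *out)
{
	const unsigned long *s;
	int skip = 0, rows = 0;

	/* Only visit complete four-word rows. */
	for (s = r->begin; r->end - s >= 4; s += 4) {
		if (!s[0] && !s[1] && !s[2] && !s[3]) {
			/* Print the placeholder once per zero run. */
			if (!skip) {
				fprintf(out, "%p: %016lx ...\n",
					(const void *)s, 0UL);
				rows++;
			}
			skip = 1;
			continue;
		}

		skip = 0;
		fprintf(out, "%p: %016lx %016lx %016lx %016lx\n",
			(const void *)s, s[0], s[1], s[2], s[3]);
		rows++;
	}

	return rows;
}
```

Calling raw_dump() on a buffer with a nonzero first row, two zero rows
and a nonzero last row prints three rows: the first row, one "..."
placeholder for the zero run, and the last row. Returning the row count
is just a convenience for checking the collapsing logic; the kernel
version returns nothing.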