Date: Tue, 29 Nov 2016 17:12:46 +0100
From: Petr Mladek
To: Josh Poimboeuf
Cc: "Paul E. McKenney", Peter Zijlstra, Vince Weaver,
	"linux-kernel@vger.kernel.org", Ingo Molnar,
	Arnaldo Carvalho de Melo, "dvyukov@google.com"
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start
Message-ID: <20161129161246.GB24060@pathway.suse.cz>
In-Reply-To: <20161129150917.tk5xkl7teveybaxa@treble>

On Tue 2016-11-29 09:09:17, Josh Poimboeuf wrote:
> On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > > We used to do that, but the resulting NMIs were problematic on some
> > > > > platforms.  Perhaps things have gotten better?
> > > >
> > > > Did a little digging on git blame and found the following commit (which
> > > > seems to be the cause of the KASAN warning and missing stack dump):
> > > >
> > > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > >
> > > > I presume this commit is still needed because of the NMI printk deadlock
> > > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > > to be sorted out before the above commit can be reverted.
> > >
> > > so printk should more or less work from NMI, esp. after:
> > >
> > >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> >
> > And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> > below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> > needing more work.  Has that happened?
>
> Petr M, any idea?

These two architectures do not support the safe printk in NMI.
But they also do not implement trigger_all_cpu_backtrace() or the
other trigger_*_backtrace() functions, so these functions return
false there.

In fact, only very few architectures implement
trigger_*_backtrace(), and only a few of those use NMI (x86, arm, tile).
I have just double-checked that all of these use the safe printk in NMI.

In other words, if trigger_all_cpu_backtrace() or
trigger_single_cpu_backtrace() returns true, it should be
NMI safe and you could use it here.

> > But I really like the fact that RCU CPU stall warnings dump only those
> > stacks that are likely to be involved, and the patch below goes back
> > to dumping everyone.  Shouldn't be that hard to fix, though...
>
> There's a new trigger_single_cpu_backtrace() function which can be used
> for that.

There is now also trigger_cpumask_backtrace(struct cpumask *mask),
which lets you select more CPUs using the mask, in case that is of
any help.

Best Regards,
Petr
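
For illustration only, here is a minimal userspace sketch of the calling
pattern described above: try the backtrace trigger first and fall back
only when it returns false. This is not kernel code. Every model_* name,
MODEL_NR_CPUS, the plain bitmask standing in for struct cpumask, and the
fallback dump are assumptions made up for this sketch; only the
"returns true when the architecture handled it" convention comes from
the mail.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8	/* hypothetical CPU count for the model */

/* Hypothetical switch: true models an architecture that implements the
 * trigger_*_backtrace() hooks, false models one that does not. */
bool model_arch_has_backtrace = true;

/* Stand-in for trigger_single_cpu_backtrace(): true only if handled. */
bool model_trigger_single_cpu_backtrace(int cpu)
{
	if (!model_arch_has_backtrace)
		return false;
	printf("NMI backtrace requested for CPU %d\n", cpu);
	return true;
}

/* Stand-in for trigger_cpumask_backtrace(); a bitmask replaces struct cpumask. */
bool model_trigger_cpumask_backtrace(unsigned long mask)
{
	if (!model_arch_has_backtrace)
		return false;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			printf("NMI backtrace requested for CPU %d\n", cpu);
	return true;
}

/* Hypothetical non-NMI fallback used when the trigger is unavailable. */
void model_fallback_dump(int cpu)
{
	printf("fallback stack dump for CPU %d\n", cpu);
}

/* Dump one stalled CPU; returns true if the NMI path was used. */
bool dump_stalled_cpu(int cpu)
{
	if (model_trigger_single_cpu_backtrace(cpu))
		return true;
	model_fallback_dump(cpu);
	return false;
}

/* Dump only the CPUs selected in mask; returns true if the NMI path was used. */
bool dump_stalled_cpus(unsigned long mask)
{
	if (model_trigger_cpumask_backtrace(mask))
		return true;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			model_fallback_dump(cpu);
	return false;
}
```

With the switch set to true, dump_stalled_cpus(0x5UL) requests
backtraces for CPUs 0 and 2 only, which mirrors the wish to dump just
the stacks that are likely to be involved. With it set to false, both
helpers take the fallback path and return false.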