Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933935AbcK2PJv (ORCPT ); Tue, 29 Nov 2016 10:09:51 -0500
Received: from mx1.redhat.com ([209.132.183.28]:53022 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S933710AbcK2PJT (ORCPT ); Tue, 29 Nov 2016 10:09:19 -0500
Date: Tue, 29 Nov 2016 09:09:17 -0600
From: Josh Poimboeuf
To: "Paul E. McKenney"
Cc: Peter Zijlstra, Vince Weaver, "linux-kernel@vger.kernel.org",
        Ingo Molnar, Arnaldo Carvalho de Melo, "dvyukov@google.com",
        pmladek@suse.com
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start
Message-ID: <20161129150917.tk5xkl7teveybaxa@treble>
References: <20161128215411.fkis7bbimjy4v4j7@treble>
        <20161129004021.GL3924@linux.vnet.ibm.com>
        <20161129055241.6dy2dt4q4ptazk2s@treble>
        <20161129091650.GA3092@twins.programming.kicks-ass.net>
        <20161129140734.GQ3924@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20161129140734.GQ3924@linux.vnet.ibm.com>
User-Agent: Mutt/1.6.0.1 (2016-04-01)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.28]); Tue, 29 Nov 2016 15:09:19 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3318
Lines: 80

On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > We used to do that, but the resulting NMIs were problematic on some
> > > > platforms. Perhaps things have gotten better?
> > >
> > > Did a little digging on git blame and found the following commit (which
> > > seems to be the cause of the KASAN warning and missing stack dump):
> > >
> > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > >
> > > I presume this commit is still needed because of the NMI printk deadlock
> > > issues which were discussed at Kernel Summit. I guess those issues need
> > > to be sorted out before the above commit can be reverted.
> >
> > so printk should more or less work from NMI, esp. after:
> >
> >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
>
> And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> below. Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> needing more work. Has that happened?

Petr M, any idea?

> But I really like the fact that RCU CPU stall warnings dump only those
> stacks that are likely to be involved, and the patch below goes back
> to dumping everyone. Shouldn't be that hard to fix, though...

There's a new trigger_single_cpu_backtrace() function which can be used
for that.

> ------------------------------------------------------------------------
>
> commit e7c9d76ed508fe978c6657e33f4de1b160ee4efe
> Author: Paul E. McKenney
> Date:   Tue Nov 29 05:49:06 2016 -0800
>
>     rcu: Once again use NMI-based stack traces in stall warnings
>
>     This commit is for all intents and purposes a revert of bc1dce514e9b
>     ("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to
>     suppose that this can now safely be reverted is the presence of
>     42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"),
>     which is said to have made NMI-based stack dumps safe.
>
>     Not-yet-signed-off-by: Paul E. McKenney
>     Cc: Petr Mladek
>     Cc: Josh Poimboeuf
>     Cc: Peter Zijlstra
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 91a68e4e6671..d73ccd4bed86 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1396,7 +1396,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
>  }
>
>  /*
> - * Dump stacks of all tasks running on stalled CPUs.
> + * Dump stacks of all tasks running on stalled CPUs. First try using
> + * NMIs, but fall back to manual remote stack tracing on architectures
> + * that don't support NMI-based stack dumps. The NMI-triggered stack
> + * traces are more accurate because they are printed by the target CPU.
>   */
>  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
>  {
> @@ -1404,6 +1407,8 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
>  	unsigned long flags;
>  	struct rcu_node *rnp;
>
> +	if (trigger_all_cpu_backtrace())
> +		return;
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		if (rnp->qsmask != 0) {
>

-- 
Josh