Date: Tue, 29 Nov 2016 06:07:34 -0800
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Josh Poimboeuf, Vince Weaver, "linux-kernel@vger.kernel.org", Ingo Molnar, Arnaldo Carvalho de Melo, "dvyukov@google.com"
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20161129140734.GQ3924@linux.vnet.ibm.com>
In-Reply-To: <20161129091650.GA3092@twins.programming.kicks-ass.net>

On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > We used to do that, but the resulting NMIs were problematic on some
> > > platforms.  Perhaps things have gotten better?
> >
> > Did a little digging on git blame and found the following commit (which
> > seems to be the cause of the KASAN warning and missing stack dump):
> >
> >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> >
> > I presume this commit is still needed because of the NMI printk deadlock
> > issues which were discussed at Kernel Summit.  I guess those issues need
> > to be sorted out before the above commit can be reverted.
>
> so printk should more or less work from NMI, esp. after:
>
>   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")

And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
needing more work.  Has that happened?

But I really like the fact that RCU CPU stall warnings dump only those
stacks that are likely to be involved, and the patch below goes back to
dumping everyone.  Shouldn't be that hard to fix, though...

							Thanx, Paul

------------------------------------------------------------------------

commit e7c9d76ed508fe978c6657e33f4de1b160ee4efe
Author: Paul E. McKenney
Date:   Tue Nov 29 05:49:06 2016 -0800

    rcu: Once again use NMI-based stack traces in stall warnings

    This commit is for all intents and purposes a revert of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to suppose
    that this can now safely be reverted is the presence of 42a0bb3f7138
    ("printk/nmi: generic solution for safe printk in NMI"), which is said
    to have made NMI-based stack dumps safe.

    Not-yet-signed-off-by: Paul E. McKenney
    Cc: Petr Mladek
    Cc: Josh Poimboeuf
    Cc: Peter Zijlstra

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 91a68e4e6671..d73ccd4bed86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1396,7 +1396,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.
+ * Dump stacks of all tasks running on stalled CPUs.  First try using
+ * NMIs, but fall back to manual remote stack tracing on architectures
+ * that don't support NMI-based stack dumps.  The NMI-triggered stack
+ * traces are more accurate because they are printed by the target CPU.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1404,6 +1407,8 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 	unsigned long flags;
 	struct rcu_node *rnp;
 
+	if (trigger_all_cpu_backtrace())
+		return;
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {
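
For anyone wanting to poke at the control flow outside the kernel, here is
a rough userspace sketch of the "try NMIs first, otherwise walk the stalled
CPUs by hand" pattern from the patch above.  Everything in it is a stand-in
invented for illustration: NR_CPUS_SKETCH, arch_has_nmi_backtrace,
mock_trigger_all_cpu_backtrace(), mock_dump_cpu_task(), and the bitmask
argument are not kernel interfaces, and the counters exist only so that the
two paths can be told apart.  It also shows the concern raised above: the
NMI path dumps every CPU, while the fallback dumps only the stalled ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Hypothetical knob: pretend the architecture can (or cannot) send NMIs. */
static bool arch_has_nmi_backtrace;

/* Counters recording which path printed how many stacks. */
static int nmi_dumps;
static int manual_dumps;

/* Stand-in for trigger_all_cpu_backtrace(): true means NMIs were sent. */
static bool mock_trigger_all_cpu_backtrace(void)
{
	if (!arch_has_nmi_backtrace)
		return false;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		printf("NMI backtrace for cpu %d\n", cpu);
		nmi_dumps++;
	}
	return true;
}

/* Stand-in for the manual remote stack trace of a single CPU. */
static void mock_dump_cpu_task(int cpu)
{
	printf("Remote stack dump for stalled cpu %d\n", cpu);
	manual_dumps++;
}

/*
 * Bit n of stalled_mask set means CPU n is considered stalled.  Try the
 * NMI path first; if it is unavailable, dump only the stalled CPUs.
 */
static void dump_stalled_cpu_stacks(unsigned long stalled_mask)
{
	assert((stalled_mask >> NR_CPUS_SKETCH) == 0);

	if (mock_trigger_all_cpu_backtrace())
		return;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (stalled_mask & (1UL << cpu))
			mock_dump_cpu_task(cpu);
}
```

Calling dump_stalled_cpu_stacks(0x5UL) with arch_has_nmi_backtrace set
prints one NMI line per CPU; with it clear, only CPUs 0 and 2 are dumped.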