Date: Tue, 29 Nov 2016 17:12:46 +0100
From: Petr Mladek <pmladek@suse.com>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Vince Weaver <vincent.weaver@maine.edu>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Ingo Molnar <mingo@redhat.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        "dvyukov@google.com" <dvyukov@google.com>
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start
Message-ID: <20161129161246.GB24060@pathway.suse.cz>
References: <alpine.DEB.2.20.1611241229180.25241@macbook-air>
 <20161128215411.fkis7bbimjy4v4j7@treble>
 <20161129004021.GL3924@linux.vnet.ibm.com>
 <20161129055241.6dy2dt4q4ptazk2s@treble>
 <20161129091650.GA3092@twins.programming.kicks-ass.net>
 <20161129140734.GQ3924@linux.vnet.ibm.com>
 <20161129150917.tk5xkl7teveybaxa@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161129150917.tk5xkl7teveybaxa@treble>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2283
Lines: 52

On Tue 2016-11-29 09:09:17, Josh Poimboeuf wrote:
> On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > > We used to do that, but the resulting NMIs were problematic on some
> > > > > platforms.  Perhaps things have gotten better?
> > > > 
> > > > Did a little digging on git blame and found the following commit (which
> > > > seems to be the cause of the KASAN warning and missing stack dump):
> > > > 
> > > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > > 
> > > > I presume this commit is still needed because of the NMI printk deadlock
> > > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > > to be sorted out before the above commit can be reverted.
> > > 
> > > so printk should more or less work from NMI, esp. after:
> > > 
> > >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> > 
> > And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> > below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> > needing more work.  Has that happened?
> 
> Petr M, any idea?

These two architectures do not support safe printk in NMI context. But
they also do not implement trigger_all_cpu_backtrace() or the other
trigger_*_backtrace() functions, so these functions return false there.

In fact, only a few architectures implement trigger_*_backtrace(),
and only some of those use NMI (x86, arm, tile). I have just
double-checked that all of these use the safe printk in NMI.

In other words, if trigger_all_cpu_backtrace() or
trigger_single_cpu_backtrace() returns true, it should be NMI safe
and you could use it here.
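
A minimal userspace sketch of that calling pattern, not actual kernel
code. The trigger_single_cpu_backtrace() below is a stub standing in for
the kernel helper; arch_has_nmi_backtrace, dump_remote_stack_fallback()
and dump_cpu_stack() are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: models whether the architecture implements the trigger. */
static bool arch_has_nmi_backtrace = true;

/* Stub for the kernel helper: returns false when the architecture
 * provides no backtrace trigger, true once a backtrace was requested. */
static bool trigger_single_cpu_backtrace(int cpu)
{
	if (!arch_has_nmi_backtrace)
		return false;
	printf("NMI backtrace requested for CPU %d\n", cpu);
	return true;
}

/* Hypothetical fallback for architectures without the trigger. */
static void dump_remote_stack_fallback(int cpu)
{
	printf("fallback stack dump for CPU %d\n", cpu);
}

/* Use the trigger when it reports success, otherwise fall back.
 * Returns true if the NMI-based path was taken. */
static bool dump_cpu_stack(int cpu)
{
	if (trigger_single_cpu_backtrace(cpu))
		return true;
	dump_remote_stack_fallback(cpu);
	return false;
}
```

The point is that the return value alone decides which path is taken,
so the caller does not need any per-architecture knowledge.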


> > But I really like the fact that RCU CPU stall warnings dump only those
> > stacks that are likely to be involved, and the patch below goes back
> > to dumping everyone.  Shouldn't be that hard to fix, though...
> 
> There's a new trigger_single_cpu_backtrace() function which can be used
> for that.

There is now also trigger_cpumask_backtrace(struct cpumask *mask),
which lets you select several CPUs using the mask, if that is of any help.
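
A rough userspace sketch of how a caller might collect only the
interesting CPUs into one mask. The struct cpumask and the cpumask_*()
helpers here are simplified stand-ins for the kernel ones, the
trigger_cpumask_backtrace() is a stub, and dump_stalled_cpus() and
backtraces_requested are hypothetical names:

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

/* Simplified stand-in for the kernel's struct cpumask (assumption). */
struct cpumask {
	unsigned long bits;
};

static void cpumask_set_cpu(unsigned int cpu, struct cpumask *mask)
{
	mask->bits |= 1UL << cpu;
}

static bool cpumask_test_cpu(int cpu, const struct cpumask *mask)
{
	return (mask->bits >> cpu) & 1UL;
}

/* Counts requests so the stub's effect can be observed. */
static int backtraces_requested;

/* Stub for the kernel helper: "sends" a backtrace request to each
 * CPU set in the mask and reports success. */
static bool trigger_cpumask_backtrace(struct cpumask *mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpumask_test_cpu(cpu, mask)) {
			printf("NMI backtrace requested for CPU %d\n", cpu);
			backtraces_requested++;
		}
	}
	return true;
}

/* Hypothetical caller: dump only the CPUs listed in stalled[]. */
static bool dump_stalled_cpus(const int *stalled, int n)
{
	struct cpumask mask = { 0 };
	int i;

	for (i = 0; i < n; i++) {
		if (stalled[i] >= 0 && stalled[i] < NR_CPUS)
			cpumask_set_cpu(stalled[i], &mask);
	}
	return trigger_cpumask_backtrace(&mask);
}
```

This keeps the "dump only the likely culprits" behavior while issuing a
single call for the whole set.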

Best Regards,
Petr