Date: Fri, 18 Mar 2016 16:54:45 -0700
From: "Paul E. McKenney"
To: Daniel Thompson
Cc: Chris Metcalf, Peter Zijlstra, Russell King, Thomas Gleixner,
	Aaron Tomlin, Ingo Molnar, Andrew Morton, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods
Message-ID: <20160318235445.GG4287@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <1458147733-29338-1-git-send-email-cmetcalf@mellanox.com>
	<1458147733-29338-2-git-send-email-cmetcalf@mellanox.com>
	<20160317193600.GY6344@twins.programming.kicks-ass.net>
	<20160317225557.GA4287@linux.vnet.ibm.com>
	<56EB4937.1010404@mellanox.com>
	<20160318003322.GC4287@linux.vnet.ibm.com>
	<56EBCD09.2000400@linaro.org>
In-Reply-To: <56EBCD09.2000400@linaro.org>

On Fri, Mar 18, 2016 at 09:40:25AM +0000, Daniel Thompson wrote:
> On 18/03/16 00:33, Paul E. McKenney wrote:
> >On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
> >>On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> >>>The RCU stall-warn stack traces can be ugly, agreed.
> >>>
> >>>That said, RCU used to use NMI-based stack traces, but switched to the
> >>>current scheme due to the NMIs having the unfortunate habit of locking
> >>>things up, which IIRC often meant no stack traces at all.  If I recall
> >>>correctly, one of the problems was self-deadlock in printk().
> >>
> >>Steven Rostedt enabled the per_cpu printk func support in June 2014, and
> >>the nmi_backtrace code uses it to just capture printk output to percpu
> >>buffers, so I think it's going to be a lot more robust than earlier attempts.
> >
> >That would be a very good thing, give or take the "I think" qualifier.
> >And assuming that the target CPU is healthy enough to find its way back
> >to some place that can dump the per-CPU printk buffer.  I might well
> >be overly paranoid, but I have to suspect that the probability of that
> >buffer getting dumped is reduced greatly on a CPU that isn't healthy
> >enough to respond to RCU, though.
>
> The target CPU doesn't dump the buffer. It "just" fields the NMI,
> stores the backtrace and sets a flag.
>
> The buffer is dumped to console by the requesting CPU, either when
> all backtraces have come back or when a timeout is reached.

That does sound a bit more robust, good!

> >But it seems like enabling the experiment might be useful.
> >
> >"Try enabling the NMI version.  If that doesn't get you your RCU CPU
> >stall warning stack trace, try the remote-print variant."
> >
> >Or I suppose we could just do both in succession, just in case their
> >console was a serial port.  ;-)
>
> I guess both might be needed but only when the target CPU is dead
> enough to fail to respond to NMI. In principle, we could exploit the
> timeout in the NMI backtrace logic and only issue the missing
> backtraces.

It would be really nice if I could call one function that used the best
strategy for getting information (including stack trace) about a specified
CPU.
Ditto for getting information about a specified task, which might be
running or might be preempted at the time.

							Thanx, Paul
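
For illustration only, the request/collect scheme discussed above (the
target stores its backtrace and sets a flag; the requester dumps the
buffers when all have come back or a timeout expires; only the missing
CPUs fall back to the remote-print variant) might be modelled in user
space roughly as follows. This is a toy sketch in Python, not kernel code:
the Cpu class, nmi_backtraces(), remote_dump() and
best_effort_backtraces() are hypothetical names invented for this
example, threads stand in for NMI delivery, and the 0.5 second timeout is
an arbitrary assumption.

```python
import threading
import time


class Cpu:
    """Hypothetical stand-in for one CPU; not a kernel structure."""

    def __init__(self, cpu_id, responsive=True):
        self.cpu_id = cpu_id
        self.responsive = responsive
        self.buffer = []               # models the per-CPU printk buffer
        self.done = threading.Event()  # models the "backtrace stored" flag

    def handle_nmi(self):
        # The target only records its trace and sets a flag; it never prints.
        if not self.responsive:
            return                     # a wedged CPU never fields the NMI
        self.buffer.append(f"CPU {self.cpu_id}: <backtrace>")
        self.done.set()


def nmi_backtraces(cpus, timeout):
    """Requester side: 'send' NMIs, then wait for all flags or the timeout."""
    for cpu in cpus:
        threading.Thread(target=cpu.handle_nmi, daemon=True).start()
    deadline = time.monotonic() + timeout
    for cpu in cpus:
        cpu.done.wait(max(0.0, deadline - time.monotonic()))
    dumped, missing = [], []
    for cpu in cpus:
        if cpu.done.is_set():
            dumped.extend(cpu.buffer)  # the requester dumps the buffer
        else:
            missing.append(cpu)        # no answer before the timeout
    return dumped, missing


def remote_dump(cpu):
    """Placeholder for the remote-print variant (assumed, not a real API)."""
    return f"CPU {cpu.cpu_id}: <remote stack dump>"


def best_effort_backtraces(cpus, timeout=0.5):
    """Try the NMI path first, then fall back only for the missing CPUs."""
    dumped, missing = nmi_backtraces(cpus, timeout)
    return dumped + [remote_dump(cpu) for cpu in missing]


if __name__ == "__main__":
    demo = [Cpu(0), Cpu(1), Cpu(2, responsive=False)]
    for line in best_effort_backtraces(demo):
        print(line)
```

The point of the sketch is the split of responsibilities: the target side
does the minimum, and the single entry point decides per CPU which
strategy actually produced output.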