Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods
To: paulmck@linux.vnet.ibm.com, Chris Metcalf
References: <1458147733-29338-1-git-send-email-cmetcalf@mellanox.com>
 <1458147733-29338-2-git-send-email-cmetcalf@mellanox.com>
 <20160317193600.GY6344@twins.programming.kicks-ass.net>
 <20160317225557.GA4287@linux.vnet.ibm.com>
 <56EB4937.1010404@mellanox.com>
 <20160318003322.GC4287@linux.vnet.ibm.com>
Cc: Peter Zijlstra, Russell King, Thomas Gleixner, Aaron Tomlin,
 Ingo Molnar, Andrew Morton, x86@kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
From: Daniel Thompson
Message-ID: <56EBCD09.2000400@linaro.org>
Date: Fri, 18 Mar 2016 09:40:25 +0000
In-Reply-To: <20160318003322.GC4287@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 18/03/16 00:33, Paul E. McKenney wrote:
> On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
>> On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
>>> The RCU stall-warn stack traces can be ugly, agreed.
>>>
>>> That said, RCU used to use NMI-based stack traces, but switched to the
>>> current scheme due to the NMIs having the unfortunate habit of locking
>>> things up, which IIRC often meant no stack traces at all. If I recall
>>> correctly, one of the problems was self-deadlock in printk().
>>
>> Steven Rostedt enabled the per_cpu printk func support in June 2014, and
>> the nmi_backtrace code uses it to just capture printk output to percpu
>> buffers, so I think it's going to be a lot more robust than earlier attempts.
>
> That would be a very good thing, give or take the "I think" qualifier.
> And assuming that the target CPU is healthy enough to find its way back
> to some place that can dump the per-CPU printk buffer. I might well
> be overly paranoid, but I have to suspect that the probability of that
> buffer getting dumped is reduced greatly on a CPU that isn't healthy
> enough to respond to RCU, though.

The target CPU doesn't dump the buffer. It "just" fields the NMI, stores
the backtrace and sets a flag. The buffer is dumped to console by the
requesting CPU, either when all backtraces have come back or when a
timeout is reached.

> But it seems like enabling the experiment might be useful.
>
> "Try enabling the NMI version. If that doesn't get you your RCU CPU
> stall warning stack trace, try the remote-print variant."
>
> Or I suppose we could just do both in succession, just in case their
> console was a serial port. ;-)

I guess both might be needed but only when the target CPU is dead enough
to fail to respond to NMI. In principle, we could exploit the timeout in
the NMI backtrace logic and only issue the missing backtraces.

Daniel.
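
The flow described in this message (the target only stores its backtrace and
clears a flag; the requester dumps the buffers once everything is back or a
timeout expires) can be modelled in user space roughly as below. This is a toy
Python sketch, not the kernel's nmi_backtrace code. Every name in it (Cpu,
latency, trigger_backtrace, timeout_ticks) is hypothetical, and a tick counter
stands in for real NMI delivery and delays.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Cpu:
    """Hypothetical stand-in for one target CPU."""
    cpu_id: int
    # Ticks until this CPU fields the NMI; None models a CPU that never responds.
    latency: Optional[int] = 0
    # Stands in for the per-CPU buffer that captures printk output.
    buffer: List[str] = field(default_factory=list)


def trigger_backtrace(cpus: List[Cpu], timeout_ticks: int = 10) -> Tuple[List[str], List[int]]:
    """Requester side: wait for targets, then dump what came back.

    Returns (console_lines, missing_cpu_ids).
    """
    pending = {c.cpu_id for c in cpus}      # one "flag" per target CPU
    tick = 0
    while pending and tick <= timeout_ticks:
        for c in cpus:
            if c.cpu_id in pending and c.latency is not None and c.latency <= tick:
                # Target side: store the backtrace and clear the flag, nothing more.
                c.buffer.append(f"backtrace for CPU {c.cpu_id}")
                pending.discard(c.cpu_id)
        tick += 1

    console: List[str] = []
    for c in cpus:                          # the requester prints, not the target
        console.extend(c.buffer)
        c.buffer.clear()
    return console, sorted(pending)


cpus = [Cpu(0), Cpu(1, latency=None), Cpu(2, latency=3)]
console, missing = trigger_backtrace(cpus)
print(console)   # backtraces from the CPUs that responded before the timeout
print(missing)   # CPUs that would still need the remote-print variant
```

In this model the list of CPUs still pending at the timeout is what a caller
could use to issue only the missing backtraces by another method.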