Date: Wed, 18 Jun 2014 16:53:14 +0200 (CEST)
From: Jiri Kosina
To: "Paul E. McKenney"
cc: Linus Torvalds, Frederic Weisbecker, Petr Mladek, Andrew Morton,
    Steven Rostedt, Dave Anderson, Kay Sievers, Michal Hocko, Jan Kara,
    Linux Kernel Mailing List
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context
In-Reply-To: <20140618144457.GF4669@linux.vnet.ibm.com>
References: <1399626665-29817-1-git-send-email-pmladek@suse.cz>
    <20140529000909.GC6507@localhost.localdomain>
    <20140610164641.GD1951@localhost.localdomain>
    <20140618143612.GC4669@linux.vnet.ibm.com>
    <20140618144457.GF4669@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > > >   seen it happening for real) cause a complete, undebuggable, silent hang
> > > >   of machine (deadlock in NMI context)
> > >
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack.  Would that help?
> >
> > Well, that would make unfortunately the information provided by RCU stall
> > detector rather useless ... workqueue-based stack dumping is very unlikely
> > to point its finger to the real offender, as it'd be coming way too late.
>
> I would not use workqueues, but rather have the CPU detecting the
> stall grovel through the other CPUs' stacks, which is what I do now for
> architectures that don't support NMI-based stack dumps.
> Would that be a reasonable approach?

That would indeed solve the lockups induced by the RCU stall detector (and
we should convert the sysrq stack dumping code to use the same mechanism
afterwards).

But then, the kernel is still polluted by quite a few instances of

	WARN_ON(in_nmi())

	BUG_ON(in_nmi())

	if (in_nmi())
		printk(....)

which need to be fixed separately afterwards anyway.

Thanks,

--
Jiri Kosina
SUSE Labs