Date: Thu, 23 Jul 2015 17:20:50 -0400
From: Steven Rostedt
To: Andy Lutomirski
Cc: X86 ML, linux-kernel@vger.kernel.org, Willy Tarreau, Borislav Petkov,
    Thomas Gleixner, Peter Zijlstra, Linus Torvalds, Brian Gerst
Subject: Re: Dealing with the NMI mess
Message-ID: <20150723172050.1e1821e1@gandalf.local.home>

On Thu, 23 Jul 2015 13:21:16 -0700
Andy Lutomirski wrote:

> 3. Forbid faults (other than MCE) inside NMI.
>
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).

What about int3, which is needed to make ftrace work? This was a
requirement to get rid of stomp-machine when updating ftrace functions,
as well as the rationale for doing the whole NMI nesting work in the
first place.

>
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
>
> a) vmalloc fault.  I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
>
> b) User memory access faults.

c) stack tracing faults

I would have NMIs debug deadlocks by printing stack traces. The stack
tracer can page fault, and before the NMI nesting code, while debugging
machines, these stack dumps would randomly reboot the box. While writing
the NMI nesting code I realized why those reboots happened: the stack
trace faulted, and the printk from NMI was slow enough to have another
NMI go off and stomp over the outer NMI's stack. Which led to triple
faults and such.

-- Steve
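
The failure mode in c) can be illustrated with a toy C model. This is only a sketch, not kernel code. It assumes two x86-64 behaviours: the CPU blocks further NMIs only until the next IRET (including the IRET that ends a fault handler), and every NMI entry starts on the same fixed stack. The names `struct cpu`, `deliver_nmi`, `iret` and `outer_frame_survives` are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model of one CPU; hypothetical, for illustration only. */
struct cpu {
    bool nmi_blocked;   /* set on NMI entry, cleared by any IRET */
    char nmi_frame[16]; /* the single fixed stack slot every NMI entry writes */
};

/* Try to deliver an NMI; it is dropped while NMIs are blocked. */
static bool deliver_nmi(struct cpu *c, const char *tag)
{
    if (c->nmi_blocked)
        return false;
    c->nmi_blocked = true;
    /* Entry always starts at the same stack top, overwriting what was there. */
    snprintf(c->nmi_frame, sizeof c->nmi_frame, "%s", tag);
    return true;
}

/* Any IRET unblocks NMIs, including one that returns from a fault. */
static void iret(struct cpu *c)
{
    c->nmi_blocked = false;
}

/* Returns true if the outer NMI's frame is intact after a second NMI fires. */
static bool outer_frame_survives(bool fault_in_nmi)
{
    struct cpu c = { .nmi_blocked = false, .nmi_frame = "" };
    bool delivered = deliver_nmi(&c, "outer");

    assert(delivered);
    (void)delivered;

    if (fault_in_nmi)
        iret(&c);            /* e.g. the stack tracer page faults and returns */

    deliver_nmi(&c, "nested"); /* second NMI arrives while the outer one runs */
    return strcmp(c.nmi_frame, "outer") == 0;
}
```

In this model, `outer_frame_survives(false)` holds because the second NMI is dropped, while `outer_frame_survives(true)` does not, because the fault's IRET lets the second NMI overwrite the outer frame.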