Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754299AbbGWVsQ (ORCPT ); Thu, 23 Jul 2015 17:48:16 -0400 Received: from mail-ig0-f175.google.com ([209.85.213.175]:38724 "EHLO mail-ig0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753967AbbGWVsO (ORCPT ); Thu, 23 Jul 2015 17:48:14 -0400 MIME-Version: 1.0 In-Reply-To: <20150723173105.6795c0dc@gandalf.local.home> References: <20150723173105.6795c0dc@gandalf.local.home> Date: Thu, 23 Jul 2015 14:48:13 -0700 X-Google-Sender-Auth: 1GjK8kyXe0-Zal7QivDoP-LKN0g Message-ID: Subject: Re: Dealing with the NMI mess From: Linus Torvalds To: Steven Rostedt Cc: Andy Lutomirski , X86 ML , "linux-kernel@vger.kernel.org" , Willy Tarreau , Borislav Petkov , Thomas Gleixner , Peter Zijlstra , Brian Gerst Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2392 Lines: 55 On Thu, Jul 23, 2015 at 2:31 PM, Steven Rostedt wrote: > > Let me get this straight. The idea is in the #DB handler to detect that > it was triggered in NMI context, and if so, simply disarm that > breakpoint permanently, right? No, for simplicity, I'd make it cover not just NMI code, but any "kernel code with interrupts disabled". Because that's the test we'd use for "use ret instead of iret". And that wider test is exactly because it's so damn hard to get the exact instruction boundaries right. Let's *not* go down the path (again) of having to get the whole %rip range and "magic stack pointer values" etc. Make it simple and completely unambiguous. The rule really would be: - if we return to kernel space and interrupts are disabled, we will use "ret" rather than "iret" Hard rule. Simple. Straightforward. No random %rip values. No random %rsp values. NO CRAP. - but because we use "ret" rather than "iret" we can't get RF semantics, it means that #DB is special. RF is supposed to make us make forward progress So for that reason, #DB just says "if the breakpoint happened during that interrupts-ff reghion, I will clear %dr7 to guarantee forward progress" So those would be the two main rules. Very simple, and avoiding all nasty cases. Now, I'd be willing to then hide the "oops, we clear dr7 very agrressively" issue by having a few additional _heuristics_. But I call them "heuristics" because unlike the current NMI nesting games, they aren't about core stability. They are about "ok, maybe somebody wants to trigger those faults, and we'll be _nice_ and try to make it easy for them", but nothing more. So for example, if that "#DB clears %dr7" happened, it sounds easy to set _TIF_USER_WORK_MASK, and just force %dr7 to be re-loaded from a cached value, so that if we disabled things because of some user stack trace access, it will be re-enabled by the time we return to user space. I think that sounds reasonable, but it's not something the core low-level entry x86 assembly code needs to even care about. It's not that level of "core", it's just being polite. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/