From: Andy Lutomirski
Date: Thu, 23 Jul 2015 13:49:16 -0700
Subject: Re: Dealing with the NMI mess
To: Linus Torvalds
Cc: X86 ML, "linux-kernel@vger.kernel.org", Willy Tarreau, Borislav Petkov, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Brian Gerst

On Thu, Jul 23, 2015 at 1:38 PM, Linus Torvalds wrote:
> On Thu, Jul 23, 2015 at 1:21 PM, Andy Lutomirski wrote:
>>
>> 2. Forbid IRET inside NMIs.  Doable but maybe not that pretty.
>>
>> We haven't considered:
>>
>> 3. Forbid faults (other than MCE) inside NMI.
>
> I'd really prefer #2. #3 depends on us getting many things right, and
> never introducing new cases in the future.
>
> #2, in contrast, seems to be fairly localized. Yes, RF is an issue,
> but returning to user space with RF clear doesn't really seem to be
> all that problematic.
>
> The point of RF is to make forward progress in the face of debug
> register faults, but I don't see what was wrong with the whole
> "disable any debug events that happen with interrupts disabled".
>
> And no, I do *not* believe that we should disable debug faults ahead
> of time. We should take them, disable them, and return with 'ret'. No
> complex "you can't put breakpoints in this region" crap, no magic
> rules, no subtle issues.
>
> I really think your "disallow #DB" is pointless. I think your "prevent
> instruction breakpoints in NMI" is wrong. Let them happen. Take them
> and disable them.
> Return with RF clear. Go on with your life.
>
> And the "take them and disable them" is really simple. No "am I in an
> NMI context" thing (because that leads to the whole question about
> "what is NMI context"). That's not the real rule anyway.
>
> No, make it very simple and straightforward. Make the test be "uhhuh,
> I got a #DB in kernel mode, and interrupts were disabled - I know I'm
> going to return with "ret", so I'm just going to have to disable this
> breakpoint".
>
> Nothing clever. Nothing subtle. Nothing that needs "this range of
> instructions is magical". No. Just a very simple rule: if the context
> we return to is kernel mode and interrupts are disabled, we're using
> 'ret', so we cannot suppress debug faults.

There are some subtleties in here.

Issue A: to return with RF clear, we need to disarm the breakpoint.
If it's limited to the duration of the NMI, that's easy.  If not, when
do we re-arm?  New prepare_exit_to_usermode hook?  Hmm, setting ti
flags during context switch may target the wrong task.

Issue B: single-step exception after SYSENTER.  The patches I just
sent fix that, though.

Issue C: #DB with invalid stack pointer (can happen due to watchpoints
during SYSCALL entry or SYSRET exit).  I guess we need to ban such
watchpoints.

Issue D: debug exception inside EFI (especially mixed-mode EFI).  We
can't return using RET, so we need to catch that case.

These issues mostly go away if we preemptively disarm DR7 early in NMI
processing and rearm it at the end.

--Andy
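
For illustration only, here is a minimal user-space sketch of the two
approaches discussed above. It is not kernel code and it ignores issues
A through D entirely. The names struct trap_frame, handle_db, nmi_disarm
and nmi_rearm are hypothetical; only the bit layouts (RFLAGS.IF at bit 9,
DR6.B0-B3 in bits 0-3, DR7 Ln/Gn enable pairs in bits 0-7) are
architectural. handle_db models the "take them and disable them" rule;
nmi_disarm/nmi_rearm model preemptively clearing the DR7 enable bits for
the duration of the NMI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF   UINT64_C(0x200) /* RFLAGS.IF, bit 9 */
#define DR7_ENABLE_MASK UINT64_C(0xff)  /* L0/G0 .. L3/G3 */

/* Hypothetical, heavily simplified view of the interrupted context. */
struct trap_frame {
    uint64_t flags; /* saved RFLAGS */
    uint16_t cs;    /* saved CS; the low two bits are the privilege level */
};

/* The rule from the thread: kernel mode with interrupts disabled means
 * the return path uses 'ret', so RF cannot be used to skip the fault. */
static bool returns_with_ret(const struct trap_frame *f)
{
    bool kernel_mode = (f->cs & 3) == 0;
    bool irqs_off = (f->flags & X86_EFLAGS_IF) == 0;
    return kernel_mode && irqs_off;
}

/* Clear both the local and global enable bit of one breakpoint slot. */
static uint64_t dr7_disable_slot(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    return dr7 & ~(UINT64_C(3) << (2 * slot));
}

/* "Take them and disable them": returns the DR7 value to install
 * before returning from the #DB handler. */
uint64_t handle_db(const struct trap_frame *f, uint64_t dr6, uint64_t dr7)
{
    if (!returns_with_ret(f))
        return dr7; /* return path can set RF; leave breakpoints armed */

    for (unsigned slot = 0; slot < 4; slot++)
        if (dr6 & (UINT64_C(1) << slot)) /* DR6.Bn: slot n triggered */
            dr7 = dr7_disable_slot(dr7, slot);
    return dr7;
}

/* Alternative: disarm every breakpoint on NMI entry, restore on exit. */
uint64_t nmi_disarm(uint64_t *dr7)
{
    uint64_t saved = *dr7;
    *dr7 &= ~DR7_ENABLE_MASK;
    return saved;
}

void nmi_rearm(uint64_t *dr7, uint64_t saved)
{
    *dr7 = saved;
}
```

In this model, a #DB taken in kernel mode with IF clear only loses the
breakpoint slots that DR6 reports as triggered, while the NMI variant
drops all four for the duration of the handler and restores them
verbatim. When (or whether) a slot disabled by handle_db gets re-armed
is exactly issue A and is not modeled here.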