Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753612AbbGXRLI (ORCPT ); Fri, 24 Jul 2015 13:11:08 -0400
Received: from wtarreau.pck.nerim.net ([62.212.114.60]:9588 "EHLO 1wt.eu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752134AbbGXRLH (ORCPT ); Fri, 24 Jul 2015 13:11:07 -0400
Date: Fri, 24 Jul 2015 19:10:18 +0200
From: Willy Tarreau
To: Andy Lutomirski
Cc: Peter Zijlstra, Linus Torvalds, Steven Rostedt, X86 ML,
	"linux-kernel@vger.kernel.org", Borislav Petkov, Thomas Gleixner,
	Brian Gerst
Subject: Re: Dealing with the NMI mess
Message-ID: <20150724171018.GH3612@1wt.eu>
References: <20150723173105.6795c0dc@gandalf.local.home>
	<20150724081326.GO25159@twins.programming.kicks-ass.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2548
Lines: 54

On Fri, Jul 24, 2015 at 08:48:57AM -0700, Andy Lutomirski wrote:
> So by the time we detect that we've hit a watchpoint, the instruction
> that tripped it is done and we don't need RF.  Furthermore, after
> reading 17.3.1.1: I *think* that regs->flags will have RF *clear* if
> we hit a watchpoint.  So this might be as simple as:
>
> if ((dr6 & (0xf * DR_TRAP0)) && (regs->flags & (X86_EFLAGS_RF |
>     X86_EFLAGS_IF)) == X86_EFLAGS_RF && !user_mode(regs))
>         for (i = 0; i < 4; i++)
>                 if (dr6 & (DR_TRAP0 << i)) {
>                         /* hit a kernel breakpoint with IF clear */
>                         dr7 &= ~(DR_GLOBAL_ENABLE << (i * DR_ENABLE_SHIFT));
>                 }
>
> I'm not saying that your code is wrong, but I think this is simpler
> and avoids poking at yet more per-cpu state from NMI context, which is
> kind of nice.
>
> If you don't like the RF games above, it would also be straightforward
> to parse dr0..dr3 for each DR_TRAP bit that's set and see if it's a
> breakpoint.

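To make the quoted check easier to reason about, here is a minimal
user-space sketch of it. It is not kernel code: mask_kernel_breakpoints()
is a hypothetical helper name, the first test is read as a bitwise mask
of the four DR_TRAP bits, and DR_ENABLE_SHIFT is assumed to be 2 (one
local and one global enable bit per breakpoint in DR7). The flag and
debug-register bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; names mirror the quoted snippet. */
#define X86_EFLAGS_IF    0x00000200UL /* interrupt enable flag, bit 9 */
#define X86_EFLAGS_RF    0x00010000UL /* resume flag, bit 16 */
#define DR_TRAP0         0x1UL        /* DR6 bit B0; B1..B3 follow it */
#define DR_GLOBAL_ENABLE 0x2UL        /* DR7 bit G0 */
#define DR_ENABLE_SHIFT  2            /* assumption: L and G bit per breakpoint */

/*
 * Hypothetical helper: return dr7 with the global-enable bit cleared for
 * every breakpoint reported in dr6, but only when the trap came from
 * kernel mode with RF set and IF clear. In all other cases dr7 is
 * returned unchanged.
 */
static uint64_t mask_kernel_breakpoints(uint64_t dr6, uint64_t dr7,
                                        uint64_t flags, bool from_user)
{
    int i;

    /* No breakpoint bit set in DR6: nothing to disable. */
    if (!(dr6 & (0xf * DR_TRAP0)))
        return dr7;

    /* Require RF set and IF clear, and a kernel-mode context. */
    if ((flags & (X86_EFLAGS_RF | X86_EFLAGS_IF)) != X86_EFLAGS_RF || from_user)
        return dr7;

    for (i = 0; i < 4; i++) {
        if (dr6 & (DR_TRAP0 << i)) {
            /* hit a kernel breakpoint with IF clear */
            dr7 &= ~(DR_GLOBAL_ENABLE << (i * DR_ENABLE_SHIFT));
        }
    }
    return dr7;
}
```

With all four global-enable bits set (dr7 = 0xaa), a hit on breakpoint 0
with RF set and IF clear leaves 0xa8; the same hit with IF set, or coming
from user mode, leaves dr7 untouched.
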
Andy, section 5.8 of the SDM makes me think we could possibly abuse
SYSRET to emulate IRET, and then possibly simplify the flags processing.
It says that it takes the CPL3 code segment, but nowhere does it say that
the target is validated as actually being userland, and it further
suggests that it doesn't validate anything:

  "It is the responsibility of the OS to ensure the descriptors in the
   GDT/LDT correspond to the selectors loaded by SYSCALL/SYSRET
   (consistent with the base, limit, and attribute values forced by the
   instructions)."

The OS has to set RSP by itself before doing SYSRET, which opens a race
between "mov rsp" and "sysret", but if we only take that path once we
have figured out that we come from NMI (using just IF+RSP), we know that
IRQs and NMIs are still disabled and cannot strike at this instant.
Maybe MCEs can, but they would execute on the NMI's stack just as if
they had been triggered inside the NMI itself, so I don't see a problem
here.

I tried to imagine a case where the kernel page faults, then an NMI
comes in, then debug strikes and we have to return from debug to the
NMI and then to the fault handler, and I don't think we break the chain.
Of course there are many subtleties I can't grasp because I don't
understand all the details.

Do you think that could simplify things, or is it another stupid idea?

Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/