Date: Fri, 24 Jul 2015 13:06:39 +0200
From: Peter Zijlstra
To: Linus Torvalds
Cc: Andy Lutomirski, X86 ML, "linux-kernel@vger.kernel.org", Willy Tarreau, Borislav Petkov, Thomas Gleixner, Steven Rostedt, Brian Gerst
Subject: Re: Dealing with the NMI mess
Message-ID: <20150724110639.GG19282@twins.programming.kicks-ass.net>
References: <20150723212042.GN25159@twins.programming.kicks-ass.net>

On Thu, Jul 23, 2015 at 02:54:54PM -0700, Linus Torvalds wrote:
> On Thu, Jul 23, 2015 at 2:45 PM, Andy Lutomirski wrote:
> >
> > Or we just re-enable them on the way out of NMI (i.e. the very last
> > thing we do in the NMI handler).  I don't want to break regular
> > userspace gdb when perf is running.
>
> I'd really prefer it if we don't touch NMI code in those kinds of
> ways. The NMI code is fragile as hell. All the problems we have with
> it is exactly due to "where is the boundary" issues.
>
> That's why I *don't* want NMI code to do magic crap. Anything that
> says "disable this during this magic window" is broken. The problems
> we've had are exactly about atomicity of the entry/exit conditions,
> and there is no really good way to get them right.
>
> I'd be much happier with a _TIF_USER_WORK_MASK approach exactly
> because it's so *obvious* that it's not a boundary condition.
>
> I dislike the "disable and re-enable dr7 in the NMI handler" exactly
> because it smells like "we can only handle faults in _this_ region".
> It may be true, but it's also what I want us to get away from. I'd
> much rather have the "big picture" be that we can take faults anywhere
> at all (*), and that none of the core code really cares. Then we "fix
> up" user space.

A wee bit something like so?

We need the intermediate self-IPI because NMI/MCE etc do not deal with
TIF flags.

I further cleared all of DR7 in an attempt at reducing the amount of
state tracked. And it doesn't distinguish between kernel/user
watchpoints because the kernel can touch both from !IF.

---
 arch/x86/kernel/traps.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8e65d8a9b8db..e8308e9c2b1e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -570,6 +570,33 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 NOKPROBE_SYMBOL(fixup_bad_iret);
 #endif
 
+struct do_debug_state {
+	unsigned long		dr7;
+	struct irq_work		irq_work;
+	struct callback_head	task_work;
+};
+
+static void __debug_irq_trampoline(struct irq_work *work)
+{
+	struct do_debug_state *dds =
+		container_of(work, struct do_debug_state, irq_work);
+
+	task_work_add(current, &dds->task_work, true);
+}
+
+static void __debug_restore_dr7(struct callback_head *work)
+{
+	struct do_debug_state *dds =
+		container_of(work, struct do_debug_state, task_work);
+
+	set_debugreg(dds->dr7, 7);
+}
+
+static DEFINE_PER_CPU(struct do_debug_state, do_debug_state) = {
+	.irq_work = { .func = __debug_irq_trampoline, },
+	.task_work = { .func = __debug_restore_dr7, },
+};
+
 /*
  * Our handling of the processor debug registers is non-trivial.
  * We do not clear them on entry and exit from the kernel. Therefore
@@ -603,6 +630,16 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 
 	ist_enter(regs);
 
+	if (arch_irqs_disabled_flags(regs->flags)) {
+		struct do_debug_state *dds = this_cpu_ptr(&do_debug_state);
+
+		get_debugreg(dds->dr7, 7);
+		set_debugreg(0, 7);
+		irq_work_queue(&dds->irq_work);
+
+		goto exit;
+	}
+
 	get_debugreg(dr6, 6);
 
 	/* Filter out all the reserved bits which are preset to 1 */
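
For anyone who wants to poke at the ordering without booting a kernel,
the deferral chain in the patch (debug trap with IRQs off, then
irq_work, then task_work, then the DR7 restore on the way back to user
space) can be modelled in plain user-space C. This is only a rough
sketch: struct sim_cpu and every sim_* function are hypothetical names
made up for illustration, not kernel APIs, and the booleans merely
stand in for a queued irq_work and a queued task_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; none of these names exist in the kernel. */
struct sim_cpu {
	unsigned long dr7;	/* simulated DR7 register */
	unsigned long saved_dr7;	/* value stashed by the trap path */
	bool irq_work_pending;	/* stands in for a queued irq_work (self-IPI) */
	bool task_work_pending;	/* stands in for task_work queued on current */
};

/* Debug trap taken with IRQs off: stash DR7, clear it, queue the irq_work. */
static void sim_debug_trap_irqs_off(struct sim_cpu *cpu)
{
	cpu->saved_dr7 = cpu->dr7;
	cpu->dr7 = 0;
	cpu->irq_work_pending = true;
}

/* Runs once IRQs are enabled again: hand the restore over to task_work. */
static void sim_run_irq_work(struct sim_cpu *cpu)
{
	if (!cpu->irq_work_pending)
		return;
	cpu->irq_work_pending = false;
	cpu->task_work_pending = true;
}

/* Return to user space: run the pending task_work, which restores DR7. */
static void sim_return_to_user(struct sim_cpu *cpu)
{
	if (!cpu->task_work_pending)
		return;
	cpu->task_work_pending = false;
	cpu->dr7 = cpu->saved_dr7;
}

/* Walk the whole chain once and return the DR7 value user space sees. */
static unsigned long sim_demo(unsigned long initial_dr7)
{
	struct sim_cpu cpu = { .dr7 = initial_dr7 };

	sim_debug_trap_irqs_off(&cpu);
	assert(cpu.dr7 == 0);	/* breakpoints are off after the trap */

	sim_run_irq_work(&cpu);
	assert(cpu.dr7 == 0);	/* still off; only the task_work is queued */

	sim_return_to_user(&cpu);
	return cpu.dr7;
}
```

Calling sim_demo() with any value hands the same value back, and DR7
stays zero at every step in between. The model deliberately keeps the
two-stage hand-off, since the trap context itself cannot set the
per-task flag directly.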