Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755901AbdIGSjS (ORCPT ); Thu, 7 Sep 2017 14:39:18 -0400 Received: from mail.kernel.org ([198.145.29.99]:50156 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755873AbdIGSjQ (ORCPT ); Thu, 7 Sep 2017 14:39:16 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4CFE021BB7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: AOwi7QAil3PjrEln+yLsiqHlD4BMFa3DRpMA6izzwT/GLhNGFM4OqCKZctFKSf6LtOrJ3qHCzNxkrI3SII6PfeeJV0Q= MIME-Version: 1.0 In-Reply-To: References: From: Andy Lutomirski Date: Thu, 7 Sep 2017 11:38:55 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [RFC 08/17] x86/asm/64: De-Xen-ify our NMI code To: Juergen Gross Cc: Andy Lutomirski , X86 ML , "linux-kernel@vger.kernel.org" , Borislav Petkov , Brian Gerst , Andrew Cooper , Boris Ostrovsky , Kees Cook Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4479 Lines: 129 On Thu, Sep 7, 2017 at 2:34 AM, Juergen Gross wrote: > On 06/09/17 23:36, Andy Lutomirski wrote: >> Xen PV is fundamentally incompatible with our fancy NMI code: it >> doesn't use IST at all, and Xen entries clobber two stack slots >> below the hardware frame. >> >> Drop Xen PV support from our NMI code entirely. >> >> XXX: Juergen: could you write and test the tiny patch needed to >> make Xen PV have a xen_nmi entry that handles NMIs? I don't know >> how to test it. > > You mean something like the attached one? Yes. Mind if I add it to my series? > > Seems to work at least for the "normal" case of a NMI coming in at > a random point in time. 
> > Regarding testing: in case you have a Xen setup you can easily send > an NMI to a domain from dom0: > > xl trigger nmi Thanks! > > > Juergen >> >> Cc: Juergen Gross >> Cc: Boris Ostrovsky >> Signed-off-by: Andy Lutomirski >> --- >> arch/x86/entry/entry_64.S | 41 +++++++++++++++++------------------------ >> 1 file changed, 17 insertions(+), 24 deletions(-) >> >> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >> index a9e318f7cc9b..c81e05fb999e 100644 >> --- a/arch/x86/entry/entry_64.S >> +++ b/arch/x86/entry/entry_64.S >> @@ -1171,21 +1171,12 @@ ENTRY(error_exit) >> jmp retint_user >> END(error_exit) >> >> -/* Runs on exception stack */ >> +/* >> + * Runs on exception stack. Xen PV does not go through this path at all, >> + * so we can use real assembly here. >> + */ >> ENTRY(nmi) >> /* >> - * Fix up the exception frame if we're on Xen. >> - * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most >> - * one value to the stack on native, so it may clobber the rdx >> - * scratch slot, but it won't clobber any of the important >> - * slots past it. >> - * >> - * Xen is a different story, because the Xen frame itself overlaps >> - * the "NMI executing" variable. >> - */ >> - PARAVIRT_ADJUST_EXCEPTION_FRAME >> - >> - /* >> * We allow breakpoints in NMIs. If a breakpoint occurs, then >> * the iretq it performs will take us out of NMI context. >> * This means that we can have nested NMIs where the next >> @@ -1240,7 +1231,7 @@ ENTRY(nmi) >> * stacks lest we corrupt the "NMI executing" variable. >> */ >> >> - SWAPGS_UNSAFE_STACK >> + swapgs >> cld >> movq %rsp, %rdx >> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp >> @@ -1402,7 +1393,7 @@ nested_nmi_out: >> popq %rdx >> >> /* We are returning to kernel mode, so this cannot result in a fault. */ >> - INTERRUPT_RETURN >> + iretq >> >> first_nmi: >> /* Restore rdx. 
*/ >> @@ -1432,7 +1423,7 @@ first_nmi: >> pushfq /* RFLAGS */ >> pushq $__KERNEL_CS /* CS */ >> pushq $1f /* RIP */ >> - INTERRUPT_RETURN /* continues at repeat_nmi below */ >> + iretq /* continues at repeat_nmi below */ >> 1: >> #endif >> >> @@ -1502,20 +1493,22 @@ nmi_restore: >> /* >> * Clear "NMI executing". Set DF first so that we can easily >> * distinguish the remaining code between here and IRET from >> - * the SYSCALL entry and exit paths. On a native kernel, we >> - * could just inspect RIP, but, on paravirt kernels, >> - * INTERRUPT_RETURN can translate into a jump into a >> - * hypercall page. >> + * the SYSCALL entry and exit paths. >> + * >> + * We arguably should just inspect RIP instead, but I (Andy) wrote >> + * this code when I had the misapprehension that Xen PV supported >> + * NMIs, and Xen PV would break that approach. >> */ >> std >> movq $0, 5*8(%rsp) /* clear "NMI executing" */ >> >> /* >> - * INTERRUPT_RETURN reads the "iret" frame and exits the NMI >> - * stack in a single instruction. We are returning to kernel >> - * mode, so this cannot result in a fault. >> + * iretq reads the "iret" frame and exits the NMI stack in a >> + * single instruction. We are returning to kernel mode, so this >> + * cannot result in a fault. Similarly, we don't need to worry >> + * about espfix64 on the way back to kernel mode. >> */ >> - INTERRUPT_RETURN >> + iretq >> END(nmi) >> >> ENTRY(ignore_sysret) >> >