References: <20200319091407.1481-1-joro@8bytes.org> <20200319091407.1481-71-joro@8bytes.org>
In-Reply-To: <20200319091407.1481-71-joro@8bytes.org>
From: Andy Lutomirski
Date: Thu, 19 Mar 2020 08:35:59 -0700
Subject: Re: [PATCH 70/70] x86/sev-es: Add NMI state tracking
To: Joerg Roedel
Cc: X86 ML, "H. Peter Anvin", Andy Lutomirski, Dave Hansen, Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization, Joerg Roedel
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 19, 2020 at 2:14 AM Joerg Roedel wrote:
>
> From: Joerg Roedel
>
> Keep NMI state in SEV-ES code so the kernel can re-enable NMIs for the
> vCPU when it reaches IRET.

IIRC I suggested just re-enabling NMI in C from do_nmi(). What was
wrong with that approach?

> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +SYM_CODE_START(sev_es_iret_user)
> +	UNWIND_HINT_IRET_REGS offset=8
> +	/*
> +	 * The kernel jumps here directly from
> +	 * swapgs_restore_regs_and_return_to_usermode. %rsp points already to
> +	 * trampoline stack, but %cr3 is still from kernel. User-regs are live
> +	 * except %rdi. Switch to user CR3, restore user %rdi and user gs_base
> +	 * and single-step over IRET
> +	 */
> +	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
> +	popq	%rdi
> +	SWAPGS
> +	/*
> +	 * Enable single-stepping and execute IRET. When IRET is
> +	 * finished the resulting #DB exception will cause a #VC
> +	 * exception to be raised. The #VC exception handler will send a
> +	 * NMI-complete message to the hypervisor to re-open the NMI
> +	 * window.

This is distressing to say the least. The sequence of events is, roughly:

1. We're here with NMI masking in an unknown state because do_nmi()
and any nested faults could have done IRET, at least architecturally.
NMI could occur or it could not. I suppose that, on SEV-ES, at least
on current CPUs, NMI is definitely masked. What about on newer CPUs?
What if we migrate?

> +	 */
> +sev_es_iret_kernel:
> +	pushf
> +	btsq $X86_EFLAGS_TF_BIT, (%rsp)
> +	popf

Now we have TF on, NMIs (architecturally) in unknown state.

> +	iretq

This causes us to pop the NMI frame off the stack. Assuming the NMI
restart logic is invoked (which is maybe impossible?), we get #DB,
which presumably is actually delivered. And we end up on the #DB
stack, which might already have been in use, so we have a potential
increase in nesting. Also, #DB may be called from an unexpected
context.

Now somehow #DB is supposed to invoke #VC, which is supposed to do the
magic hypercall, and all of this is supposed to be safe? Or is #DB
unconditionally redirected to #VC? What happens if we had no stack
(e.g. we interrupted SYSCALL) or we were already in #VC to begin with?

I think there are two credible ways to approach this:

1. Just put the NMI unmask in do_nmi(). The kernel *already* knows
how to handle running do_nmi() with NMIs unmasked. This is much, much
simpler than your code.

2. Have an entirely separate NMI path for the
SEV-ES-on-misdesigned-CPU case. And have very clear documentation for
what prevents this code from being executed on future CPUs (Zen3?)
that have this issue fixed for real?

This hybrid code is no good.

--Andy
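
Illustration only, not part of the original mail: option 1 can be
modelled in user-space C roughly as below. This is a sketch under
stated assumptions, not kernel code. sev_es_nmi_complete(),
hv_nmi_blocked, queued_nmis and simulate() are hypothetical names
invented for the model, and the three-state latch only imitates the
general idea of latching a nested NMI and re-running the handler. It
says nothing about the entry-stack handling that the real NMI entry
code has to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of "unmask NMIs from do_nmi()". All names are hypothetical. */
enum { NMI_NOT_RUNNING = 0, NMI_EXECUTING = 1, NMI_LATCHED = 2 };

static int nmi_state;        /* per-CPU in a real kernel; one global here */
static bool hv_nmi_blocked;  /* models the hypervisor holding NMIs back */
static int queued_nmis;      /* NMIs the model hypervisor still wants to inject */
static int handler_passes;   /* how many times the NMI work actually ran */

static void do_nmi(void);

/* Models the "NMI complete" message: the hypervisor reopens the NMI window
 * and, in this model, immediately injects one queued NMI if there is one. */
static void sev_es_nmi_complete(void)
{
	hv_nmi_blocked = false;
	if (queued_nmis > 0) {
		queued_nmis--;
		hv_nmi_blocked = true;  /* injection blocks further NMIs again */
		do_nmi();               /* nested NMI arrives right away */
	}
}

static void do_nmi(void)
{
	if (nmi_state != NMI_NOT_RUNNING) {
		/* Nested NMI: remember it, reopen the window, and leave. */
		nmi_state = NMI_LATCHED;
		sev_es_nmi_complete();
		return;
	}

	nmi_state = NMI_EXECUTING;
	sev_es_nmi_complete();  /* unmask in C, no single-stepped IRET needed */

	do {
		handler_passes++;   /* stand-in for the real NMI work */
		/* LATCHED -> EXECUTING reruns once; EXECUTING -> NOT_RUNNING exits */
	} while (--nmi_state != NMI_NOT_RUNNING);
}

/* Deliver one NMI with `queued` further NMIs waiting behind it and
 * return the number of handler passes that ran. */
int simulate(int queued)
{
	assert(queued >= 0);
	nmi_state = NMI_NOT_RUNNING;
	handler_passes = 0;
	queued_nmis = queued;
	hv_nmi_blocked = true;  /* the first NMI was just injected */
	do_nmi();
	return handler_passes;
}
```

In this model simulate(0) runs the handler once, and any number of
NMIs arriving while the handler is active collapse into a single
extra pass, because a nested NMI only sets the latch. The point of
the sketch is that the unmask lives in one C function and does not
depend on TF, #DB or #VC being delivered around IRET.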