Date: Thu, 26 Oct 2023 12:30:55 -0700
In-Reply-To: <20231025-delay-verw-v3-6-52663677ee35@linux.intel.com>
References: <20231025-delay-verw-v3-0-52663677ee35@linux.intel.com>
 <20231025-delay-verw-v3-6-52663677ee35@linux.intel.com>
Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation
From: Sean Christopherson
To: Pawan Gupta
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Peter Zijlstra, Josh Poimboeuf,
 Andy Lutomirski, Jonathan Corbet, Paolo Bonzini, tony.luck@intel.com,
 ak@linux.intel.com, tim.c.chen@linux.intel.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 kvm@vger.kernel.org, Alyssa Milburn, Daniel Sneddon,
 antonio.gomez.iglesias@linux.intel.com

On Wed, Oct 25, 2023, Pawan Gupta wrote:
> During VMentry VERW is executed to mitigate MDS. After VERW, any memory
> access like a register push onto the stack may put host data in MDS-affected
> CPU buffers. A guest can then use MDS to sample host data.
>
> Although the likelihood of secrets surviving in registers at the current
> VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
> by moving the VERW mitigation late in the VMentry path.
>
> Note that VERW for the MMIO Stale Data mitigation is unchanged because of
> the complexity of per-guest conditional VERW, which is not easy to handle
> that late in asm with no GPRs available. If the CPU is also affected by
> MDS, VERW is unconditionally executed late in asm regardless of the guest
> having MMIO access.
>
> Signed-off-by: Pawan Gupta
> ---
>  arch/x86/kvm/vmx/vmenter.S |  3 +++
>  arch/x86/kvm/vmx/vmx.c     | 10 +++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index b3b13ec04bac..139960deb736 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	/* Load guest RAX. This kills the @regs pointer! */
>  	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> +	/* Clobbers EFLAGS.ZF */
> +	CLEAR_CPU_BUFFERS
> +
>  	/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
>  	jnc .Lvmlaunch
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e8694b83fc..2d149589cf5b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>
>  	guest_state_enter_irqoff();
>
> -	/* L1D Flush includes CPU buffer clear to mitigate MDS */
> +	/*
> +	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> +	 * mitigation for MDS is done late in VMentry and is still
> +	 * executed inspite of L1D Flush. This is because an extra VERW

  in spite

> +	 * should not matter much after the big hammer L1D Flush.
> +	 */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);

There's an existing bug here.  vmx_l1d_flush() is not guaranteed to do a flush
in "conditional mode", and is not guaranteed to do a ucode-based flush (though
I can't tell if it's possible for the VERW magic to exist without
X86_FEATURE_FLUSH_L1D).  If we care, something like the diff at the bottom is
probably needed.
> -	else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> -		mds_clear_cpu_buffers();
>  	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
>  		 kvm_arch_has_assigned_device(vcpu->kvm))
> +		/* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

Please don't put comments inside an if/elif without curly braces (and I don't
want to add curly braces).  Though I think that's a moot point if we first fix
the conditional L1D flush issue.  E.g. when the dust settles we can end up
with:

	/*
	 * Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
	 * manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
	 * But an L1D Flush is not guaranteed for "conditional mode", and the
	 * cost of an extra VERW after a full L1D flush is negligible.
	 */
	if (static_branch_unlikely(&vmx_l1d_should_flush))
		cpu_buffers_flushed = vmx_l1d_flush(vcpu);

	/*
	 * The MMIO stale data vulnerability is a subset of the general MDS
	 * vulnerability, i.e. this is mutually exclusive with the VERW that's
	 * done just before VM-Enter.  The vulnerability requires the attacker,
	 * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
	 */
	if (static_branch_unlikely(&mmio_stale_data_clear) &&
	    !cpu_buffers_flushed &&
	    kvm_arch_has_assigned_device(vcpu->kvm))
		mds_clear_cpu_buffers();

>  		mds_clear_cpu_buffers();
>
>  	vmx_disable_fb_clear(vmx);

LOL, nice.  IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to
X86_BUG_MDS.  But the checks aren't identical, which makes this _look_ sketchy.
Can you do something like this to ensure we don't accidentally neuter the late
VERW?
static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
	vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

	if (vmx->disable_fb_clear &&
	    WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
		vmx->disable_fb_clear = false;

	...
}

--
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e502ba93141..cf6e06bb8310 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6606,8 +6606,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
  * is not exactly LRU. This could be sized at runtime via topology
  * information but as all relevant affected CPUs have 32KiB L1D cache size
  * there is no point in doing so.
+ *
+ * Returns %true if CPU buffers were cleared, i.e. if a microcode-based L1D
+ * flush was executed (which also clears CPU buffers).
  */
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
 
@@ -6634,14 +6637,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 		kvm_clear_cpu_l1tf_flush_l1d();
 		if (!flush_l1d)
-			return;
+			return false;
 	}
 
 	vcpu->stat.l1d_flush++;
 
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
-		return;
+		return true;
 	}
 
 	asm volatile(
@@ -6665,6 +6668,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 		:: [flush_pages] "r" (vmx_l1d_flush_pages),
 		   [size] "r" (size)
 		: "eax", "ebx", "ecx", "edx");
+
+	return false;
 }
 
 static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7222,16 +7227,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					 unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool cpu_buffers_flushed = false;
 
 	guest_state_enter_irqoff();
 
-	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
-		vmx_l1d_flush(vcpu);
-	else if (static_branch_unlikely(&mds_user_clear))
-		mds_clear_cpu_buffers();
-	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
-		 kvm_arch_has_assigned_device(vcpu->kvm))
+		cpu_buffers_flushed = vmx_l1d_flush(vcpu);
+
+	if ((static_branch_unlikely(&mds_user_clear) ||
+	     (static_branch_unlikely(&mmio_stale_data_clear) &&
+	      kvm_arch_has_assigned_device(vcpu->kvm))) &&
+	    !cpu_buffers_flushed)
 		mds_clear_cpu_buffers();
 
 	vmx_disable_fb_clear(vmx);