Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp1341443rwd; Thu, 18 May 2023 10:45:30 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4bImDYXiU3kggRatXDnBuSY2Fi3aNox7LwVQZEMXnUTFAOaGsZGgTh1czSfrNf2fpMZXof X-Received: by 2002:a17:902:aa8a:b0:1a9:b0a3:f03a with SMTP id d10-20020a170902aa8a00b001a9b0a3f03amr3098388plr.9.1684431930238; Thu, 18 May 2023 10:45:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684431930; cv=none; d=google.com; s=arc-20160816; b=fk2musuDd9MGmnsLETPAsxREywogn6L6hcPJgyejonFfjik5ww4LG+b4lCuA3Czl4p iDo4dA34/db6Ez/BFmBuTlAtbGPAqdO6rNZlg7sIl1nljrvTPGy0ykTd/AP4miGGUUEw 7pOt9pC7C4wI1EQemz5JNHCP844W8wbVlyyQKHgDnArH1FcyLVOTOhocnYWaSqxvrlgC 4NGiR70k0pd+l4Xdct5Bedsd6OAxUVINxcibTfz4N8KqGMZn51+th1OzvyvVJSQI1F22 MZd6xZLR8hSsnx96SXMeFxd0O7TJfWrwA2UXS+dkGjhyMBJVtoh9BYMtcBch+rowf+42 ZABQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=8cEl1ydTMl5q3gHf+vzUZhqFLws3YlkU0676DjuVAEc=; b=ytm8swylsdVtTYBD8T1Olm/WSrarESoFAwi3z9rCpM79cVUMkYWLCVtWFFPjSiMOXs L0/q71Awp8y6xwNRY+6CG2uzzc091oQsAGxxME5CyQOxf32RdpOpqkfLqCfaDd1UBKPn O8Fmzx4dZWyQzG+bF3ejqaDM2uU2BQiLkWEn57OutuuH7ebFTPWAyOZGUTJoDDF30AR0 k0g7ivrWQNIw8rLvWByOy+wQTY8Kl9hbA/BYDiurv5mFj/u1wNxmbsg91wSpdYLCCyp+ 6FIXb1MP71Tx8w/ldRBMcepglCmhD4afW5AROgvyB/1HLpcY+BmauyPg2LhdJ1RWnWQ3 /8Rg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=BKULirzL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u2-20020a170902e80200b001ae38227983si1923782plg.199.2023.05.18.10.45.17; Thu, 18 May 2023 10:45:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=BKULirzL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229948AbjERRd2 (ORCPT + 99 others); Thu, 18 May 2023 13:33:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229922AbjERRdY (ORCPT ); Thu, 18 May 2023 13:33:24 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B96310C6 for ; Thu, 18 May 2023 10:33:18 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-ba2526a8918so4176356276.1 for ; Thu, 18 May 2023 10:33:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684431198; x=1687023198; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=8cEl1ydTMl5q3gHf+vzUZhqFLws3YlkU0676DjuVAEc=; b=BKULirzLQS6eZU2YCMK6cPgVX6t43Y+ZWllWKSO0hfYQazIpj8NW0eV6zU3q766It9 w7vRgme9vwtezARoQpPcNqTT2qvuusR243H2BX2Bg11/cQa/6u+KVtjbTPkdgRZOiL0/ E3AIukIhXubYQV+mzsndFfzl+ISjTU46r5MpvM6v1yKTs39dOaIsEo41jCrUm135Fdm/ wlRVbMAWoMdExKceIndrby8bV5wvuT9esReEpiRU6NyllmKilS8evoCVWCl4YwANzrsw ozzsurn6tZv1LMXAr4nKnZqEFn8huB+EPpPz6S11Xaeftd23J/LSUb5qoP01URtDL6R7 nROQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684431198; x=1687023198; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8cEl1ydTMl5q3gHf+vzUZhqFLws3YlkU0676DjuVAEc=; b=kbxCcwLCuq+ADtZKS+AFyYBS9b048fiJnWwqLfcGqkqVoUR/9arrelKumiKaSlWGrW ouHQROw9FE+i9WkgDNtMqcVxPXc/jtizeHeYv8Ldz+s+j8IGikUZkcBGdhaR8ZgpkRXn CrU0l+Z+f+BNYKZDSFgsXWgzfr7iXVfB7RQML5G0BB2TaZPNI2qC7uJiQpxqGTGu0W+a QSdk7lpoWsDKdbzOIIHTkn/Rl608t0ZTWQbsFX5yYmP+VVxtkoLiwf+95/L4y9hAqE5G 9cUyGdheOud7DtKq1L7meAlmWzshl0gbZE3nZDfLWhggLnP9OvtryrDSbZFfULG0DrJv BOPQ== X-Gm-Message-State: AC+VfDwFP1cNikAJ9K/kCJgwo04cI8QlYVik8pUQXWwWFtlqb8mHIynS 73jTH97WpCTYj1qyGSWJtkw+WG+jQ0k= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:e785:0:b0:ba8:3e1a:3fd6 with SMTP id e127-20020a25e785000000b00ba83e1a3fd6mr1111536ybh.3.1684431197829; Thu, 18 May 2023 10:33:17 -0700 (PDT) Date: Thu, 18 May 2023 10:33:15 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230506030435.80262-1-chao.gao@intel.com> Message-ID: Subject: Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps From: Sean Christopherson To: Xiaoyao Li Cc: Chao Gao , kvm@vger.kernel.org, Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , linux-kernel@vger.kernel.org, Jim Mattson Content-Type: text/plain; charset="us-ascii" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org +Jim On Thu, May 18, 2023, Xiaoyao Li wrote: > On 5/6/2023 11:04 AM, Chao Gao wrote: > > to avoid computing the supported value at runtime every time. > > > > No functional change intended. > > the value of kvm_get_arch_capabilities() can be changed due to > > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER) > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH; > > and l1tf_vmx_mitigation can be runtime changed by vmentry_l1d_flush module > param. Nice catch! > We need a detailed analysis that in no real case can > ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit change runtime. No, the fact that it _can_ be modified by a writable module param is enough to make this patch buggy. I do like snapshotting and then updating the value, even though there's likely no meaningful performance benefit, as that would provide a place to document that the "supported" value is dynamic. Though the fact that it's dynamic is arguably a bug in its own right, e.g. if userspace isn't careful, a VM can have vCPUs with different values for ARCH_CAPABILITIES. But fixing that is probably a fool's errand. So I vote to snapshot the value and toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation is modified. On a somewhat related topic, what in the absolute #$#$ is going on with FB_CLEAR_DIS!?!? I made the mistake of digging into why KVM doesn't advertise ARCH_CAP_FB_CLEAR_CTRL... 1. I see *nothing* in commit 027bbb884be0 ("KVM: x86/speculation: Disable Fill buffer clear within guests") that justifies 1x RDMSR and 2x WRMSR on every entry+exit. 2. I'm pretty sure conditioning mmio_stale_data_clear on kvm_arch_has_assigned_device() is a bug. AIUI, the vulnerability applies to _any_ MMIO accesses. Assigning a device is necessary to let the device DMA into the guest, but it's not necessary to let the guest access MMIO addresses, that's done purely via memslots. 3. Irrespective of whether or not there is a performance benefit, toggling the MSR on every entry+exit is completely unnecessary if KVM won't do VERW before VM-Enter, i.e. if (!mds_user_clear && !mmio_stale_data_clear), then the toggling can be done in vmx_prepare_switch_to_{guest,host}(). This probably isn't worth pursuing though, as #4 below is more likely, especially since X86_BUG_MSBDS_ONLY is limited to Atom (and MIC, lol) CPUs. 4. If the host will will _never_ do VERW, i.e. #3 + !X86_BUG_MSBDS_ONLY, then KVM just needs to context switch the MSR between guests since the value that's loaded while running in the host is irrelevant. E.g. use a percpu cache to track the current value. 5. MSR_IA32_MCU_OPT_CTRL is not modified by the host after a CPU is brought up, i.e. the host's desired value is effectively static post-boot, and barring a buggy configuration (running KVM as a guest), the boot CPU's value will be the same as every other CPU. 6. Performance aside, KVM should not be speculating (ha!) on what the guest will and will not do, and should instead honor whatever behavior is presented to the guest. If the guest CPU model indicates that VERW flushes buffers, then KVM damn well needs to let VERW flush buffers. 7. Why on earth did Intel provide a knob that affects both the host and guest, since AFAICT the intent of the MSR is purely to suppress FB clearing for an unsuspecting (or misconfigured?) guest!?!?! FWIW, this trainwreck is another reason why I'm not going to look at the proposed "Intel IA32_SPEC_CTRL Virtualization" crud until external forces dictate that I do so. I have zero confidence that a paravirt interface defined by hardware vendors to fiddle with mitigations will be sane, flexible, and extensible. Anyways, can someone from Intel do some basic performance analysis to justify doing RDMSR + WRMSRx2 on every transition? Unless someone provides numbers that show a clear, meaningful benefit to the aggressive toggling, I'm inclined to have KVM do #4, e.g. end up with something like: /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) { vmx_l1d_flush(vcpu); } else if (static_branch_unlikely(&mds_user_clear) || static_branch_unlikely(&mmio_stale_data_clear)) { mds_clear_cpu_buffers(); } else if (static_branch_unlikely(&kvm_toggle_fb_clear) { bool enable_fb_clear = !!(vcpu->arch.arch_capabilities & ARCH_CAP_FB_CLEAR); if (this_cpu_read(kvm_fb_clear_enabled) != enable_fb_clear) { u64 mcu_opt_ctrl = host_mcu_opt_ctrl; if (enable_fb_clear) mcu_opt_ctrl &= ~FB_CLEAR_DIS; else mcu_opt_ctrl |= FB_CLEAR_DIS; native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_opt_ctrl); this_cpu_write(kvm_fb_clear_enabled, enable_fb_clear); } }