Date: Mon, 22 May 2023 20:34:05 -0700
From: Pawan Gupta
To: Xiaoyao Li
Cc: Sean Christopherson, Chao Gao, kvm@vger.kernel.org, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org,
	Jim Mattson, antonio.gomez.iglesias@linux.intel.com, Daniel Sneddon
Subject: Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
Message-ID: <20230523033405.dr2ci7h3ol5se64o@desk>
References: <20230506030435.80262-1-chao.gao@intel.com>
	<20230520010237.3tepk3q44j52leuk@desk>
	<20230522212328.uwyvp3hpwvte6t6g@desk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 23, 2023 at 09:00:50AM +0800, Xiaoyao Li wrote:
> On 5/23/2023 5:23 AM, Pawan Gupta wrote:
> > On Tue, May 23, 2023 at 03:31:44AM +0800, Xiaoyao Li wrote:
> > > On 5/23/2023 1:43 AM, Sean Christopherson wrote:
> > > > > > 6. Performance aside, KVM should not be speculating (ha!) on
> > > > > > what the guest will and will not do, and should instead honor
> > > > > > whatever behavior is presented to the guest.
> > > > > > If the guest CPU model indicates that VERW flushes buffers,
> > > > > > then KVM damn well needs to let VERW flush buffers.
> > > > >
> > > > > The current implementation allows guests to have VERW flush
> > > > > buffers when they enumerate FB_CLEAR. It only restricts the
> > > > > flush behavior when the guest is trying to mitigate against a
> > > > > vulnerability (like MDS) on hardware that is not affected. I
> > > > > guess it's common for guests to be running with an older-gen
> > > > > configuration on newer hardware.
> > > >
> > > > Right, I'm saying that that behavior is wrong. KVM shouldn't
> > > > assume the guest will do things a certain way and should instead
> > > > honor the "architectural" definition, in quotes because I realize
> > > > there probably is no architectural definition for any of this.
> > > >
> > > > It might be that the code does (unintentionally?) honor the
> > > > "architecture", i.e. this code might actually be accurate with
> > > > respect to when the guest can expect VERW to flush buffers. But
> > > > the comment is so, so wrong.
> > >
> > > The comment is wrong, and the code is wrong in some cases as well.
> > >
> > > If none of ARCH_CAP_FB_CLEAR, ARCH_CAP_MDS_NO, ARCH_CAP_TAA_NO,
> > > ARCH_CAP_PSDP_NO, ARCH_CAP_FBSDP_NO and ARCH_CAP_SBDR_SSDP_NO are
> > > exposed to the VM, the VM is of the type "affected by MDS".
> > >
> > > And according to the page
> > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html
> > > if the VM enumerates support for both L1D_FLUSH and MD_CLEAR, it
> > > implicitly enumerates FB_CLEAR as part of its MD_CLEAR support.
> >
> > This is the excerpt from the link that you mentioned:
> >
> >   "For processors that are affected by MDS and support L1D_FLUSH
> >   operations and MD_CLEAR operations, the VERW instruction flushes
> >   fill buffers."
> >
> > You are missing an important piece of information here: "For the
> > processors _affected_ by MDS". On such processors ...
> >
> > > However, the code will leave vmx->disable_fb_clear as 1 if hardware
> > > supports it, and the VERW instruction doesn't clear the FB in the
> > > VM, which conflicts with the "architectural" definition.
> >
> > ... Fill buffer clear is not enabled at all:
> >
> >   vmx_setup_fb_clear_ctrl()
> >   {
> >           u64 msr;
> >
> >           if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
> >               !boot_cpu_has_bug(X86_BUG_MDS) &&
> >               !boot_cpu_has_bug(X86_BUG_TAA)) {
> >                   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
> >                   if (msr & ARCH_CAP_FB_CLEAR_CTRL)
> >                           vmx_fb_clear_ctrl_available = true;
> >           }
> >   }
>
> This is the check on bare metal, while the check in
> vmx_update_fb_clear_dis() is for the guest VM.
>
> For example, if the hardware (host) enumerates ARCH_CAP_TAA_NO,
> ARCH_CAP_MDS_NO, ARCH_CAP_PSDP_NO, ARCH_CAP_FBSDP_NO,
> ARCH_CAP_SBDR_SSDP_NO, ARCH_CAP_FB_CLEAR, and ARCH_CAP_FB_CLEAR_CTRL,
> then VERW on this hardware clears the Fill Buffer (if FB_CLEAR_DIS is
> not enabled in MSR_IA32_MCU_OPT_CTRL). vmx_setup_fb_clear_ctrl() does
> set vmx_fb_clear_ctrl_available to true.
>
> If a guest is exposed without ARCH_CAP_TAA_NO, ARCH_CAP_MDS_NO,
> ARCH_CAP_PSDP_NO, ARCH_CAP_FBSDP_NO, ARCH_CAP_SBDR_SSDP_NO and
> ARCH_CAP_FB_CLEAR, vmx_update_fb_clear_dis() will leave
> vmx->disable_fb_clear as true. So VERW doesn't clear the Fill Buffer
> for the guest, but in the view of the guest, it expects VERW to clear
> the Fill Buffer.

That is correct, but whether VERW clears the CPU buffers also depends on
whether the hardware is affected or not; enumerating MD_CLEAR alone does
not guarantee that VERW will flush the CPU buffers. This was true even
before MMIO Stale Data was discovered.
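For concreteness, the guest-side decision described above can be
sketched as a stand-alone model. fb_clear_disabled() below is a
hypothetical helper that only approximates the vmx_update_fb_clear_dis()
logic discussed in this thread (it is not the kernel code); the bit
positions follow the IA32_ARCH_CAPABILITIES MSR layout:

```c
#include <stdint.h>

/* Bit positions per the IA32_ARCH_CAPABILITIES MSR layout. */
#define ARCH_CAP_MDS_NO		(1ULL << 5)
#define ARCH_CAP_TAA_NO		(1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO	(1ULL << 13)
#define ARCH_CAP_FBSDP_NO	(1ULL << 14)
#define ARCH_CAP_PSDP_NO	(1ULL << 15)
#define ARCH_CAP_FB_CLEAR	(1ULL << 17)

/*
 * Hypothetical approximation of vmx_update_fb_clear_dis(): returns 1
 * when KVM would leave FB_CLEAR_DIS set for the guest, i.e. VERW in the
 * guest is NOT guaranteed to flush fill buffers. fb_clear_ctrl_avail
 * stands in for vmx_fb_clear_ctrl_available on the host.
 */
static int fb_clear_disabled(int fb_clear_ctrl_avail, uint64_t guest_caps)
{
	int disable = fb_clear_ctrl_avail;

	/*
	 * Keep the VERW flush if the guest sees FB_CLEAR, or if the
	 * guest is told it is unaffected by MDS/TAA/MMIO Stale Data
	 * entirely (all of the *_NO bits exposed).
	 */
	if ((guest_caps & ARCH_CAP_FB_CLEAR) ||
	    ((guest_caps & ARCH_CAP_MDS_NO) &&
	     (guest_caps & ARCH_CAP_TAA_NO) &&
	     (guest_caps & ARCH_CAP_PSDP_NO) &&
	     (guest_caps & ARCH_CAP_FBSDP_NO) &&
	     (guest_caps & ARCH_CAP_SBDR_SSDP_NO)))
		disable = 0;

	return disable;
}
```

In this model, a guest exposed none of the *_NO bits and no FB_CLEAR on
FB_CLEAR_CTRL-capable hardware keeps the flush disabled, which is
exactly the case raised above.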
If the host (hardware) enumerates:

  MD_CLEAR | MDS_NO | VERW behavior
  ---------|--------|-------------------
      1    |   0    | Clears CPU buffers

But on MDS-mitigated hardware (MDS_NO=1), if the guest enumerates:

  MD_CLEAR | MDS_NO | VERW behavior
  ---------|--------|-------------------------------------
      1    |   0    | Not guaranteed to clear CPU buffers

After MMIO Stale Data, FB_CLEAR_DIS was introduced to keep this behavior
intact (for hardware that is not affected by MDS/TAA). If userspace
truly wants the guest to have the VERW flush behavior, it can export
FB_CLEAR.

I see your point that, from the guest's perspective, it is being lied to
about the VERW behavior. OTOH, I am not sure that is a good enough
reason for mitigated hardware to keep the overhead of clearing
micro-architectural buffers for generations of CPUs.
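The two tables above can be encoded as one small check.
guest_verw_flushes() is a hypothetical illustration (not a kernel
function), assuming the FB_CLEAR_DIS behavior described in this thread:

```c
/*
 * Hypothetical helper encoding the tables above: returns 1 when VERW
 * executed by the guest is guaranteed to flush CPU buffers.
 * Illustration only, under the assumptions stated in this thread.
 */
static int guest_verw_flushes(int host_mds_no, int guest_md_clear,
			      int guest_fb_clear)
{
	/* Userspace exported FB_CLEAR: KVM must honor the VERW flush. */
	if (guest_fb_clear)
		return 1;

	/* Hardware itself affected by MDS: MD_CLEAR means a real flush. */
	if (!host_mds_no)
		return guest_md_clear;

	/* Mitigated hardware: guest MD_CLEAR alone is no guarantee. */
	return 0;
}
```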