Received: by 2002:a05:7412:5112:b0:fa:6e18:a558 with SMTP id fm18csp1447047rdb; Wed, 24 Jan 2024 16:15:01 -0800 (PST) X-Google-Smtp-Source: AGHT+IHKrOWleVLRp5RmQyGXg7EgzpusBiGy6u6t1TU9//sw7MFwHuAhFZvdhhI7QuD0YRlkW++o X-Received: by 2002:a05:620a:17ab:b0:783:b62c:3b7a with SMTP id ay43-20020a05620a17ab00b00783b62c3b7amr712049qkb.14.1706141701731; Wed, 24 Jan 2024 16:15:01 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706141701; cv=pass; d=google.com; s=arc-20160816; b=jqMEvMGUyvKfU0qL360L/VdfJAkwmGjTO0vQw/79/w0dhw/E+NFplVQxy2ZvqXf5jf wfELEg3ahLQuzC/LKVHvcQMe1QcADpoLSUD55qEBVGJo39vTejQVF2hMTEicDbjmYuYm BjtUoijLqA3Mjme0y5P9UBbcAkwh0/w+kPYaOmsVDw4Ojs87ckrIbaeQWwUQuLb8X3AQ m/um9SovxunQ8a0UKVATyStN1uZLtBcqfpQxlHBjuv/BvS0JOM76fZXeN2RL5p1VtrIj K6f0iTNpEXPgFSylvsq0t1Yf/OSSGt9g/SYeFABfjEl5EZaGcN5ygMuDeue1cmA8Kky3 103g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:message-id:subject:cc:to:from:date:dkim-signature; bh=Td59QsnOnNAIxqMoEJwJRDHKQ4MnG7nuO2f/+7fni28=; fh=IODi27EZIV34nFPj50awIM2dbsuWYzk8dX5Hlm2fpRg=; b=RZ/+833cYmbJ4evHJNNC+IAuhrJ6L7xb08oCkPLt4tYWPjuSqauTcsBY1zBDvAw1KN vfHaJIdSZXETb0X92MTFRirx3zsFm8WPOCSpzqmA/QcvtQj2qv6zNnqATcaxCN90Z/7s KJDRsZsQ2XI0PS3N/RYEiIUFQHgqv9CLu/DlncdK+6e4Sgqh4+KXexroJA3vqvc0x7g8 1nLXKEkLwLUHJFWOJcfMU0oZMjJ3gErYdTQwvrVUG6TgPkcmBAL3SLVhm7xnSkF0PZAK VxLh71xF4CY36tcUCzg0z/p4PlAlBF+t2iL4751UCDjdPwn6Wqn/e7Iy6Q65BbqyakJp 9MVA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=EhVaID8U; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-37816-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-37816-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id v25-20020a05620a0a9900b0078172064464si11560841qkg.180.2024.01.24.16.15.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jan 2024 16:15:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-37816-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=EhVaID8U; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-37816-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-37816-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 66E841C21ADE for ; Thu, 25 Jan 2024 00:15:01 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id AB98480B; Thu, 25 Jan 2024 00:14:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="EhVaID8U" Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44B6263AC for ; Thu, 25 Jan 2024 00:14:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.54 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706141688; cv=none; b=O65oIGIOIFkpGc8fn/QhRPr4P4F44nblQEtFvTAdeZf8Bv8nBilBEaUqxZZTapzlecqY2FkbGBXURtJehhtEOvm8D2q5e0xZuUNdiejObyLhozy2TbSd2YWPDQqjZfeU6odbnGZhF/3QW0Wcga6cID1XXnLI3a1pQLU/fDJQ34E= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706141688; c=relaxed/simple; bh=YOAcjvoGiuZ6mQwS+jVJ9CpTyuzi9sgrbhGFuSRZiwE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=lZ5bupzAWxXJ8lBjatbHBAZXzzJqZdKIkfiWhUENDMmbfyd2nZqJpIjkCDqv+AQX5KAYMEBWCCYsYndqhWtwMqETJ3KUmIoqYfUS+tY6NMM6KOIhIx4TKK62ShPr6RWZMGvWI++11sGNs/iaw2e3l8v0uT2Dlw0nexNCQ6UF3PU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=EhVaID8U; arc=none smtp.client-ip=209.85.216.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-pj1-f54.google.com with SMTP id 98e67ed59e1d1-290b1e19101so166054a91.0 for ; Wed, 24 Jan 2024 16:14:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706141686; x=1706746486; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=Td59QsnOnNAIxqMoEJwJRDHKQ4MnG7nuO2f/+7fni28=; b=EhVaID8UEHVHk0M1jQu/LpF3WTuj4AgIymLbl49IMh0OTMIyoh9PuF7D2V/oUmII9v glsYzdx36Abuq8uucK0sfl8giZiQJf8IkQ8fkzndRq+YQYtjvI2lXBilxCPwSZFoLF0L jlandM7ZEKGveNdc9MwkGASDQ6dXkUUefDBw+x1ZJSmLdMQM+TqhTJ+gt7c/7OAFWx7H edLlEdimgF0eTstTb025+JKJF4AzNGy+zKyTN1FGxyVh4YrjN3BbzM2wqApMFBTTRrOL VGQrPIcjoboccb32ya0klA3mknMrHraFKH9qWL5JpDhZ27Pwznn0j3f7QAkQ63L/bj+5 bg9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706141686; x=1706746486; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Td59QsnOnNAIxqMoEJwJRDHKQ4MnG7nuO2f/+7fni28=; b=bKu1Tao9xrvCSx6u5XO2hd0YOFY9YzbdSdDcCr/f0CKSlOhbk6jrZdvDQAiLfJYOTp v5Noa21r/Tb2l39P08H2WYfz6oNNNnEE33gWf5bakY+VSHKxFfqlW0+jVXGUye3GX8pr Hq1gR++jsmHD+m3HeNu2T28Ald/7DSdwDmwaelgU3FIx8Z0O7rfbw8NU5cJiYT7a+MBj 5dcKNBddS2uHqPSg9qF1ZEuPiiTjJbTDDtrr1Hu2f6Wb8k+vWJbhfJGxTf9dctcJDJdf UxPOFQm7l2P3+XhG40cgys3SdEAWmTxvRcvNfIlRvknZM6oj8lBkr7xSHR9jDvjZqiCn CsHA== X-Gm-Message-State: AOJu0Yzz+i3M1Bg4JIcRg3zcrvs/ghKh0eoY2xIt11x0PJtjHvrioja2 QfBSLLAvQ8YV0UXCaSKv+6MVj6t0wdmJlwuRh/0vPxzVIfHw15ID0bvA2eRE8w== X-Received: by 2002:a17:90a:4093:b0:290:98a7:20d with SMTP id l19-20020a17090a409300b0029098a7020dmr139522pjg.28.1706141686292; Wed, 24 Jan 2024 16:14:46 -0800 (PST) Received: from google.com (176.13.105.34.bc.googleusercontent.com. [34.105.13.176]) by smtp.gmail.com with ESMTPSA id v13-20020a170903238d00b001d5e03543dcsm11041859plh.38.2024.01.24.16.14.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jan 2024 16:14:45 -0800 (PST) Date: Thu, 25 Jan 2024 00:14:42 +0000 From: Mingwei Zhang To: Sean Christopherson Cc: Aaron Lewis , Paolo Bonzini , "H. Peter Anvin" , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] KVM: x86/pmu: Reset perf_capabilities in vcpu to 0 if PDCM is disabled Message-ID: References: <20240124003858.3954822-1-mizhang@google.com> <20240124003858.3954822-2-mizhang@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed, Jan 24, 2024, Sean Christopherson wrote: > On Wed, Jan 24, 2024, Mingwei Zhang wrote: > > On Wed, Jan 24, 2024, Sean Christopherson wrote: > > > On Wed, Jan 24, 2024, Aaron Lewis wrote: > > > > On Wed, Jan 24, 2024 at 7:49 AM Sean Christopherson wrote: > > > > > > > > > > On Wed, Jan 24, 2024, Mingwei Zhang wrote: > > > > > No, this is just papering over the underlying bug. KVM shouldn't be stuffing > > > > > vcpu->arch.perf_capabilities without explicit writes from host userspace. E.g > > > > > KVM_SET_CPUID{,2} is allowed multiple times, at which point KVM could clobber a > > > > > host userspace write to MSR_IA32_PERF_CAPABILITIES. It's unlikely any userspace > > > > > actually does something like that, but KVM overwriting guest state is almost > > > > > never a good thing. > > > > > > > > > > I've been meaning to send a patch for a long time (IIRC, Aaron also ran into this?). > > > > > KVM needs to simply not stuff vcpu->arch.perf_capabilities. I believe we are > > > > > already fudging around this in our internal kernels, so I don't think there's a > > > > > need to carry a hack-a-fix for the destination kernel. > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > index 27e23714e960..fdef9d706d61 100644 > > > > > --- a/arch/x86/kvm/x86.c > > > > > +++ b/arch/x86/kvm/x86.c > > > > > @@ -12116,7 +12116,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) > > > > > > > > > > kvm_async_pf_hash_reset(vcpu); > > > > > > > > > > - vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap; > > > > > > > > Yeah, that will fix the issue we are seeing. The only thing that's > > > > not clear to me is if userspace should expect KVM to set this or if > > > > KVM should expect userspace to set this. How is that generally > > > > decided? > > > > > > By "this", you mean the effective RESET value for vcpu->arch.perf_capabilities? > > > To be consistent with KVM's CPUID module at vCPU creation, which is completely > > > empty (vCPU has no PMU and no PDCM support) KVM *must* zero > > > vcpu->arch.perf_capabilities. > > > > > > If userspace wants a non-zero value, then userspace needs to set CPUID to enable > > > PDCM and set MSR_IA32_PERF_CAPABILITIES. > > > > > > MSR_IA32_ARCH_CAPABILITIES is in the same boat, e.g. a vCPU without > > > X86_FEATURE_ARCH_CAPABILITIES can end up seeing a non-zero MSR value. That too > > > should be excised. > > > > > hmm, does that mean KVM just allows an invalid vcpu state exist from > > host point of view? > > Yes. > > https://lore.kernel.org/all/ZC4qF90l77m3X1Ir@google.com > > > I think this makes a lot of confusions on migration where VMM on the source > > believes that a non-zero value from KVM_GET_MSRS is valid and the VMM on the > > target will find it not true. > > Yes, but seeing a non-zero value is a KVM bug that should be fixed. > How about adding an entry in vmx_get_msr() for MSR_IA32_PERF_CAPABILITIES and check pmu_version? This basically pairs with the implementation in vmx_set_msr() for MSR_IA32_PERF_CAPABILITIES. Doing so allows KVM_GET_MSRS return 0 for the MSR instead of returning the initial permitted value. The benefit is that it is not enforcing the VMM to explicitly set the value. In fact, there are several platform MSRs which has initial value that VMM may rely on instead of explicitly setting. MSR_IA32_PERF_CAPABILITIES is only one of them. > > If we follow the suggestion by removing the initial value at vCPU > > creation time, then I think it breaks the existing VMM code, since that > > requires VMM to explicitly set the MSR, which I am not sure we do today. > > Yeah, I'm hoping we can squeak by without breaking existing setups. > > I'm 99% certain QEMU is ok, as QEMU has explicitly set MSR_IA32_PERF_CAPABILITIES > since support for PDCM/PERF_CAPABILITIES was added by commit ea39f9b643 > ("target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIES"). > > Frankly, if our VMM doesn't do the same, then it's wildly busted. Relying on > KVM to define the vCPU is irresponsible, to put it nicely. > > > The following code below is different. The key difference is that the > > following code preserves a valid value, but this case is to not preserve > > an invalid value. > > But it's a completely different fix. I referenced that commit to call out that > the need for the commit and changelog suggests that someone (*cough* us) is relying > on KVM to initialize MSR_PLATFORM_INFO, and has been doing so for a very long time. > That doesn't mean it's the correct KVM behavior, just that it's much riskier to > change.