Date: Mon, 15 Apr 2024 18:49:31 -0700
From: Isaku Yamahata
To: Sean Christopherson
Cc: Rick P Edgecombe, kvm@vger.kernel.org, Isaku Yamahata, Kai Huang, federico.parola@polito.it, linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, dmatlack@google.com, michael.roth@amd.com, pbonzini@redhat.com, isaku.yamahata@linux.intel.com
Subject: Re: [PATCH v2 07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY
Message-ID: <20240416014931.GW3039520@ls.amr.corp.intel.com>
References: <2f1de1b7b6512280fae4ac05e77ced80a585971b.1712785629.git.isaku.yamahata@intel.com> <116179545fafbf39ed01e1f0f5ac76e0467fc09a.camel@intel.com>

On Mon, Apr 15, 2024 at 02:17:02PM -0700, Sean Christopherson wrote:
> > > - Return error on guest mode or SMM mode:  Without this patch.
> > >   Pros: No additional patch.
> > >   Cons: Difficult to use.
> >
> > Hmm...
> > For the non-TDX use cases this is just an optimization, right? For TDX
> > there shouldn't be an issue. If so, maybe this last one is not so horrible.
>
> And the fact there are so variables to control (MAXPHADDR, SMM, and guest_mode)
> basically invalidates the argument that returning an error makes the ioctl() hard
> to use. I can imagine it might be hard to squeeze this ioctl() into QEMU's
> existing code, but I don't buy that the ioctl() itself is hard to use.
>
> Literally the only thing userspace needs to do is set CPUID to implicitly select
> between 4-level and 5-level paging. If userspace wants to pre-map memory during
> live migration, or when jump-starting the guest with pre-defined state, simply
> pre-map memory before stuffing guest state. In and of itself, that doesn't seem
> difficult, e.g. at a quick glance, QEMU could add a hook somewhere in
> kvm_vcpu_thread_fn() without too much trouble (though that comes with a huge
> disclaimer that I only know enough about how QEMU manages vCPUs to be dangerous).
>
> I would describe the overall cons for this patch versus returning an error
> differently. Switching MMU state puts the complexity in the kernel. Returning
> an error punts any complexity to userspace. Specifically, anything that KVM can
> do regarding vCPU state to get the right MMU, userspace can do too.
>
> Add on that silently doing things that effectively ignore guest state usually
> ends badly, and I don't see a good argument for this patch (or any variant
> thereof).

OK, here is an experimental patch on top of 7/10 that returns an error instead.
Is this the right direction, or do we want to invoke the KVM page fault handler
without any check?

I can see the following options.

- Error if the vCPU is in SMM or guest mode: this patch.
  Defer the decision until use cases come up. We can utilize
  KVM_CAP_MAP_MEMORY and struct kvm_map_memory.flags for future enhancement.
  Pro: Keeps room for future enhancement, deferring the decision while the
  use cases are unclear.
  Con: The userspace VMM has to check/switch the vCPU mode.

- No check of the vCPU mode; just go ahead.
  Pro: It works.
  Con: It's unclear what the uAPI should look like without concrete use cases.

- Always populate with L1 GPA:
  This is a bad idea.

---
 arch/x86/kvm/x86.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ba9c1720ac9..2f3ceda5c225 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5871,10 +5871,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 			     struct kvm_memory_mapping *mapping)
 {
-	struct kvm_mmu *mmu = NULL, *walk_mmu = NULL;
 	u64 end, error_code = 0;
 	u8 level = PG_LEVEL_4K;
-	bool is_smm;
 	int r;
 
 	/*
@@ -5884,25 +5882,21 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 	if (!tdp_enabled)
 		return -EOPNOTSUPP;
 
-	/* Force to use L1 GPA despite of vcpu MMU mode. */
-	is_smm = !!(vcpu->arch.hflags & HF_SMM_MASK);
-	if (is_smm ||
-	    vcpu->arch.mmu != &vcpu->arch.root_mmu ||
-	    vcpu->arch.walk_mmu != &vcpu->arch.root_mmu) {
-		vcpu->arch.hflags &= ~HF_SMM_MASK;
-		mmu = vcpu->arch.mmu;
-		walk_mmu = vcpu->arch.walk_mmu;
-		vcpu->arch.mmu = &vcpu->arch.root_mmu;
-		vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
+	/*
+	 * SMM mode results in populating SMM memory space with memslots id = 1.
+	 * guest mode results in populating with L2 GPA.
+	 * Don't support those cases for now and punt them for the future
+	 * discussion.
+	 */
+	if (is_smm(vcpu) || is_guest_mode(vcpu))
+		return -EOPNOTSUPP;
 
 	/* reload is optimized for repeated call. */
 	kvm_mmu_reload(vcpu);
 
 	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
 	if (r)
-		goto out;
+		return r;
 
 	/* mapping->base_address is not necessarily aligned to level-hugepage. */
 	end = (mapping->base_address & KVM_HPAGE_MASK(level)) +
@@ -5910,14 +5904,6 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 	mapping->size -= end - mapping->base_address;
 	mapping->base_address = end;
 
-out:
-	/* Restore MMU state. */
-	if (is_smm || mmu) {
-		vcpu->arch.hflags |= is_smm ? HF_SMM_MASK : 0;
-		vcpu->arch.mmu = mmu;
-		vcpu->arch.walk_mmu = walk_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
 	return r;
 }
 
-- 
2.43.2


-- 
Isaku Yamahata
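The trade-off between "switch MMU state in the kernel" and "return an error to userspace" is easier to see from the userspace side. Below is a minimal, userspace-only C sketch of the semantics of kvm_arch_vcpu_map_memory() with this experimental patch applied, under simplifying assumptions: each call refuses with -EOPNOTSUPP while the vCPU is in SMM or guest mode; otherwise it pretends the page containing base_address was mapped at some level and advances base_address/size to the end of that page, which is what the VMM's retry loop sees. All names here (model_vcpu, model_mapping, model_map_once(), model_prefault_range(), hpage_size()) are hypothetical; nothing issues a real KVM ioctl, and the page-size helper assumes the usual x86 4 KiB / 2 MiB / 1 GiB levels.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Page size for an x86 paging level: 1 = 4 KiB, 2 = 2 MiB, 3 = 1 GiB. */
static uint64_t hpage_size(int level)
{
	assert(level >= 1 && level <= 3);
	return UINT64_C(1) << (12 + 9 * (level - 1));
}

/* Hypothetical stand-in for the two mapping fields the patch updates. */
struct model_mapping {
	uint64_t base_address;
	uint64_t size;
};

/* Hypothetical stand-in for the vCPU state the patch now checks. */
struct model_vcpu {
	bool in_smm;		/* models is_smm(vcpu) */
	bool in_guest_mode;	/* models is_guest_mode(vcpu) */
	int mapped_level;	/* level the simulated fault handler maps at */
};

/*
 * One call: refuse SMM/guest mode with -EOPNOTSUPP; otherwise pretend the
 * page containing base_address was mapped at mapped_level and advance the
 * range to the end of that (possibly huge) page.
 */
static int model_map_once(const struct model_vcpu *vcpu,
			  struct model_mapping *m)
{
	uint64_t psize, end, advanced;

	if (vcpu->in_smm || vcpu->in_guest_mode)
		return -EOPNOTSUPP;

	psize = hpage_size(vcpu->mapped_level);
	/* base_address need not be aligned to the huge page size. */
	end = (m->base_address & ~(psize - 1)) + psize;
	advanced = end - m->base_address;
	/* Clamp so size cannot wrap when the page extends past the range. */
	m->size = advanced < m->size ? m->size - advanced : 0;
	m->base_address = end;
	return 0;
}

/*
 * Userspace-side loop: call until the range is fully covered. On error the
 * VMM gets the negative errno back and must decide what to do, e.g. pre-map
 * before loading SMM/nested state. *calls counts successful calls.
 */
static int model_prefault_range(const struct model_vcpu *vcpu,
				uint64_t base, uint64_t size, int *calls)
{
	struct model_mapping m = { .base_address = base, .size = size };
	int r;

	*calls = 0;
	while (m.size) {
		r = model_map_once(vcpu, &m);
		if (r)
			return r;
		(*calls)++;
	}
	return 0;
}
```

In this model a VMM that gets -EOPNOTSUPP would either pre-map before stuffing SMM or nested guest state, as suggested in the quoted reply, or move the vCPU out of that mode and retry. The clamp on size is specific to this sketch; the patch itself does `mapping->size -= end - mapping->base_address;`.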