Received: by 2002:ab2:b82:0:b0:1f3:401:3cfb with SMTP id 2csp218119lqh; Wed, 27 Mar 2024 22:34:47 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWn9QGMjmlPczkhk+BODkojFVSBiKpvDI6SEk7+E6quWg1CafQ7pumq5y3bslf0tFMTpV6/r92DRg+rEFEMXxXHMY9Zp2XG/g0qZk3p8g== X-Google-Smtp-Source: AGHT+IEpo8MrHK2DKXsU82sUQR5hk79cPzrk8hO+6too999M5oZgwt9sUQ3fLU6n7TpxAVMK5uak X-Received: by 2002:ac8:5fc5:0:b0:432:b423:9700 with SMTP id k5-20020ac85fc5000000b00432b4239700mr1039857qta.19.1711604086867; Wed, 27 Mar 2024 22:34:46 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711604086; cv=pass; d=google.com; s=arc-20160816; b=0BuAKTNbuJolJCouoasl3Bwew/msDZU83V0X03sS/dAQmqo4Z0aRQgua6KJYpf1I7u PQrVBsaRF8cZ+yR4Wxz4EE7kRLGVfI2QuVjXkHGz81rOO28RWv9A9HLFXH4yzHtDyv0F imCwP26ETn3EraX7ZuEunfaq+gDllSnpFn1BZpuotogj5prvGHoDuX9cwT4rzMiAYk0r bQ4cwrO8NM0MihJwS9Bhq6Y61BD/5DygG2lqB9zN4Ruey3P5oEM0Vc6Q/dwlCgGBeXtu hjk1EATuITgsvcowRtJflyVUJFpdD8BTEWqvTJ/Ru6PBy2Gx9FEBQ6EsFqBvMSSSF/VS 2URA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Gu7I+3e6fz8kjszhCRtmnDHXTZdeUH2EvTVEJcS54LM=; fh=yLzbud3IZI49Gt7UZs4eV2gcs4vbVkk4Y9I6HEQkyDg=; b=xsEHc6EnOCbH2YMm3GNJFmvyNqls3VP1xkHW5ay5SzV+M5ErKnNjGcVUEB+mPDiqZM g6ollDGhp48LfmvehpzwvxKTQ9X3OXvzGzJCKOe782nL6N1rI95Yn1+45oDIqek0y+Tl wV7KPZCJp2/BOY7pp93ZRw1i2aSg5IZ4PWxROgQ7bIOhtEEB4/QUdACwHL198UOh5fuZ 93IvVD6SiXLGxbluRfK9RfDuumKgSxKAtF0fcyDHrfTEiUICEc1nB4xOFzwpI59RBg7M q9+iYM3edNvckOC2KF971w3zJIaPErCoK1EhtWJEr/f6OPoaeKSR+V2cDqBWJuNQg+Mk NEEA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=SuoXD+Wl; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-122426-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-122426-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id c19-20020a05622a059300b00431258790d8si708131qtb.224.2024.03.27.22.34.46 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Mar 2024 22:34:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-122426-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=SuoXD+Wl; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-122426-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-122426-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id B8A5C1C26DB7 for ; Thu, 28 Mar 2024 05:34:45 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E7E744597D; Thu, 28 Mar 2024 05:34:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SuoXD+Wl" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C2EA29CFB; Thu, 28 Mar 2024 05:34:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.14 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711604076; cv=none; b=jyv2o5J7QLdKWWuTv66a0gsOrplZOFl2PGiSxerml8KuGVGaT3HExkxkaYXjDraMTJnZFgaOhOX0/R6DTRBp0aV/Mhef6DP6+Xpe/Gx5pdUCMysPsXaLGRdRMmGxsnJ/O5CZqdCDPYDT63fXH7bUqd6QopiCjXvGxpZKHPNxZWY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711604076; c=relaxed/simple; bh=mD9mu8ljlH9TWKDCVXvuWCIuqCU25f2hMJJw0FVXMn0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=fWUgAl6uCBSxk2lfVHWs/lA3iRBKH4UuJzh/YG+VWtI6GvJt8Z53bUkEy1aFADlCx8Z6MOTDyMRLuDUQrhj3bVBs+mE/79DM6fDjS9WrgBK+hCo7YCN8Vwe090ZiAo3d1CaadRZ3wRPwITeskKREXKyW+DtdHkoai8FWgnW8kZk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SuoXD+Wl; arc=none smtp.client-ip=198.175.65.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1711604075; x=1743140075; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=mD9mu8ljlH9TWKDCVXvuWCIuqCU25f2hMJJw0FVXMn0=; b=SuoXD+Wlqu53QvAVOBdkay6Ld6oCb/D9+/MjzCFFPy66aNZd8goWfCR1 KJhPnyFJwok+up1RMzKn3MglI8sbCqwwGI5c814RnmSZhMc8tyWeiNBmt Ai0U0bFvB9OaiJFrphT1aojG6PD2I8ek383pcxua2fcvG4JpnbYRrJieb njDlj7HfpJFKmrKNcOF5SNcnpL7rfKsoj1e0PVQ2DllBhnTtZBBa7/ZKC 62rm5+X8IFrXBds8xgHcgFa0+WB//BUoDPfLJYgWXCDikUZfJQpE7ezQU QuG4j/N1ZBH9neaciq7ZHFLmaKhFyVnw4r04ID/QGO1UtrRoLb6PKJejM g==; X-CSE-ConnectionGUID: O829oumkQDGS3vy26fujCg== X-CSE-MsgGUID: ZahUq5iwTdWYm5jvYMnpaw== X-IronPort-AV: E=McAfee;i="6600,9927,11026"; a="10532019" X-IronPort-AV: E=Sophos;i="6.07,160,1708416000"; d="scan'208";a="10532019" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2024 22:34:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,160,1708416000"; d="scan'208";a="16554966" Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2024 22:34:33 -0700 Date: Wed, 27 Mar 2024 22:34:32 -0700 From: Isaku Yamahata To: "Huang, Kai" Cc: Isaku Yamahata , "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "isaku.yamahata@gmail.com" , Paolo Bonzini , "Aktas, Erdem" , Sean Christopherson , Sagi Shahar , "Chen, Bo2" , "Yuan, Hang" , "Zhang, Tina" , Sean Christopherson , isaku.yamahata@linux.intel.com Subject: Re: [PATCH v19 038/130] KVM: TDX: create/destroy VM structure Message-ID: <20240328053432.GO2444378@ls.amr.corp.intel.com> References: <7a508f88e8c8b5199da85b7a9959882ddf390796.1708933498.git.isaku.yamahata@intel.com> <20240327225337.GF2444378@ls.amr.corp.intel.com> <4d925a79-d3cf-4555-9c00-209be445310d@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <4d925a79-d3cf-4555-9c00-209be445310d@intel.com> On Thu, Mar 28, 2024 at 02:49:56PM +1300, "Huang, Kai" wrote: > > > On 28/03/2024 11:53 am, Isaku Yamahata wrote: > > On Tue, Mar 26, 2024 at 02:43:54PM +1300, > > "Huang, Kai" wrote: > > > > > ... continue the previous review ... > > > > > > > + > > > > +static void tdx_reclaim_control_page(unsigned long td_page_pa) > > > > +{ > > > > + WARN_ON_ONCE(!td_page_pa); > > > > > > From the name 'td_page_pa' we cannot tell whether it is a control page, but > > > this function is only intended for control page AFAICT, so perhaps a more > > > specific name. > > > > > > > + > > > > + /* > > > > + * TDCX are being reclaimed. TDX module maps TDCX with HKID > > > > > > "are" -> "is". > > > > > > Are you sure it is TDCX, but not TDCS? > > > > > > AFAICT TDCX is the control structure for 'vcpu', but here you are handling > > > the control structure for the VM. > > > > TDCS, TDVPR, and TDCX. Will update the comment. > > But TDCX, TDVPR are vcpu-scoped. Do you want to mention them _here_? So I'll make the patch that frees TDVPR, TDCX will change this comment. > Otherwise you will have to explain them. > > [...] > > > > > + > > > > +void tdx_mmu_release_hkid(struct kvm *kvm) > > > > +{ > > > > + bool packages_allocated, targets_allocated; > > > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > > > > + cpumask_var_t packages, targets; > > > > + u64 err; > > > > + int i; > > > > + > > > > + if (!is_hkid_assigned(kvm_tdx)) > > > > + return; > > > > + > > > > + if (!is_td_created(kvm_tdx)) { > > > > + tdx_hkid_free(kvm_tdx); > > > > + return; > > > > + } > > > > > > I lost tracking what does "td_created()" mean. > > > > > > I guess it means: KeyID has been allocated to the TDX guest, but not yet > > > programmed/configured. > > > > > > Perhaps add a comment to remind the reviewer? > > > > As Chao suggested, will introduce state machine for vm and vcpu. > > > > https://lore.kernel.org/kvm/ZfvI8t7SlfIsxbmT@chao-email/ > > Could you elaborate what will the state machine look like? > > I need to understand it. Not yet. Chao only propose to introduce state machine. Right now it's just an idea. > > How about this? > > > > /* > > * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(), and > > * TDH.MNG.KEY.FREEID() to free the HKID. > > * Other threads can remove pages from TD. When the HKID is assigned, we need > > * to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). > > * TDH.PHYMEM.PAGE.RECLAIM() is needed when the HKID is free. Get lock to not > > * present transient state of HKID. > > */ > > Could you elaborate why it is still possible to have other thread removing > pages from TD? > > I am probably missing something, but the thing I don't understand is why > this function is triggered by MMU release? All the things done in this > function don't seem to be related to MMU at all. The KVM releases EPT pages on MMU notifier release. kvm_mmu_zap_all() does. If we follow that way, kvm_mmu_zap_all() zaps all the Secure-EPTs by TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). Because TDH.MEM.{SEPT, PAGE}.REMOVE() is slow, we can free HKID before kvm_mmu_zap_all() to use TDH.PHYMEM.PAGE.RECLAIM(). > IIUC, by reaching here, you must already have done VPFLUSHDONE, which should > be called when you free vcpu? Not necessarily. > Freeing vcpus is done in > kvm_arch_destroy_vm(), which is _after_ mmu_notifier->release(), in which > this tdx_mmu_release_keyid() is called? guest memfd complicates things. The race is between guest memfd release and mmu notifier release. kvm_arch_destroy_vm() is called after closing all kvm fds including guest memfd. Here is the example. Let's say, we have fds for vhost, guest_memfd, kvm vcpu, and kvm vm. The process is exiting. Please notice vhost increments the reference of the mmu to access guest (shared) memory. exit_mmap(): Usually mmu notifier release is fired. But not yet because of vhost. exit_files() close vhost fd. vhost starts timer to issue mmput(). close guest_memfd. kvm_gmem_release() calls kvm_mmu_unmap_gfn_range(). kvm_mmu_unmap_gfn_range() eventually this calls TDH.MEM.SEPT.REMOVE() and TDH.MEM.PAGE.REMOVE(). This takes time because it processes whole guest memory. Call kvm_put_kvm() at last. During unmapping on behalf of guest memfd, the timer of vhost fires to call mmput(). It triggers mmu notifier release. Close kvm vcpus/vm. they call kvm_put_kvm(). The last one calls kvm_destroy_vm(). It's ideal to free HKID first for efficiency. But KVM doesn't have control on the order of fds. > But here we are depending vcpus to be freed before tdx_mmu_release_hkid()? Not necessarily. -- Isaku Yamahata