Date: Wed, 3 Apr 2024 15:00:23 -0700
From: Isaku Yamahata
To: Sean Christopherson
Cc: Isaku Yamahata, David Matlack, kvm@vger.kernel.org, isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Paolo Bonzini, Michael Roth, Federico Parola, isaku.yamahata@linux.intel.com
Subject: Re: [RFC PATCH 0/8] KVM: Prepopulate guest memory API
Message-ID: <20240403220023.GL2444378@ls.amr.corp.intel.com>
References: <20240307020954.GG368614@ls.amr.corp.intel.com> <20240319163309.GG1645738@ls.amr.corp.intel.com>

On Wed, Apr 03, 2024 at 11:30:21AM -0700, Sean Christopherson wrote:

> On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> > On Wed, Mar 06, 2024 at 06:09:54PM -0800,
> > Isaku Yamahata wrote:
> >
> > > On Wed, Mar 06, 2024 at 04:53:41PM -0800,
> > > David Matlack wrote:
> > >
> > > > On 2024-03-01 09:28 AM, isaku.yamahata@intel.com wrote:
> > > > > From: Isaku Yamahata
> > > > >
> > > > > Implementation:
> > > > > - x86 KVM MMU
> > > > > In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault(). It's not confined to
> > > > > KVM TDP MMU.
> > > > > We can restrict it to KVM TDP MMU and introduce an optimized
> > > > > version.
> > > >
> > > > Restricting to TDP MMU seems like a good idea. But I'm not quite sure
> > > > how to reliably do that from a vCPU context. Checking for TDP being
> > > > enabled is easy, but what if the vCPU is in guest-mode?
> > >
> > > As you pointed out in other mail, legacy KVM MMU support or guest-mode will be
> > > troublesome.
>
> Why is shadow paging troublesome? I don't see any obvious issues with effectively
> prefetching into a shadow MMU with read fault semantics. It might be pointless
> and wasteful, as the guest PTEs need to be in place, but that's userspace's problem.

With shadow paging, the address being populated is a GVA, not a GPA. I'm not sure
that's what userspace wants. If that's userspace's problem, I'm fine with it.

> Testing is the biggest gap I see, as using the ioctl() for shadow paging will
> essentially require a live guest, but that doesn't seem like it'd be too hard to
> validate. And unless we lock down the ioctl() to only be allowed on vCPUs that
> have never done KVM_RUN, we need that test coverage anyways.

So far I've only tried the TDP MMU case. I can try the other MMU types.

> And I don't think it makes sense to try and lock down the ioctl(), because for
> the enforcement to have any meaning, KVM would need to reject the ioctl() if *any*
> vCPU has run, and adding that code would likely add more complexity than it solves.
>
> > > The use case I supposed is pre-population before guest runs, the guest-mode
> > > wouldn't matter. I didn't add explicit check for it, though.
>
> KVM shouldn't have an explicit is_guest_mode() check, the support should be a
> property of the underlying MMU, and KVM can use the TDP MMU for L2 (if L1 is
> using legacy shadow paging, not TDP).

I see. So the type of the address being populated can vary depending on the vCPU
mode, and it's userspace's problem which kind of address (GVA, L1 GPA, or L2 GPA)
it passes in.

> > > Any use case while vcpus running?
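To make the GVA / L1 GPA / L2 GPA distinction concrete, the following C sketch models which address space a pre-populate request would be interpreted in for a given vCPU configuration. It is a hypothetical illustration, not KVM code: `enum prepop_addr_type`, `struct vcpu_mmu_state`, and `prepop_input_space()` are invented names, and the case analysis assumes only the configurations named in this thread (host TDP or shadow paging, L1 or L2 running, and whether L1 itself uses TDP for L2).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not KVM code: the address space in which a
 * pre-populate request is interpreted, for the cases named in the thread.
 */
enum prepop_addr_type {
	PREPOP_L1_GVA,	/* host shadow paging, L1 running */
	PREPOP_L2_GVA,	/* host shadow paging, L2 running */
	PREPOP_L1_GPA,	/* TDP MMU translating L1 GPAs */
	PREPOP_L2_GPA,	/* nested TDP: L0 shadows L1's EPT/NPT tables for L2 */
};

/* Hypothetical summary of the vCPU's MMU configuration. */
struct vcpu_mmu_state {
	bool tdp_enabled;	/* host uses TDP (EPT/NPT) */
	bool guest_mode;	/* vCPU is currently running L2 */
	bool l1_uses_tdp;	/* L1 gives L2 EPT/NPT; only meaningful in guest mode */
};

static enum prepop_addr_type prepop_input_space(const struct vcpu_mmu_state *s)
{
	assert(s);

	if (!s->tdp_enabled)
		/* Shadow paging: the input is a virtual address of whichever
		 * guest level is currently running. */
		return s->guest_mode ? PREPOP_L2_GVA : PREPOP_L1_GVA;

	if (s->guest_mode && s->l1_uses_tdp)
		/* L0 shadows L1's TDP tables, so the input is an L2 GPA. */
		return PREPOP_L2_GPA;

	/* L1 running, or L2 running while L1 uses shadow paging: the TDP
	 * MMU handles the fault and the input is an L1 GPA. */
	return PREPOP_L1_GPA;
}
```

In this model the answer falls out of the MMU configuration rather than a standalone guest-mode check in the ioctl path, in the spirit of Sean's comment; `guest_mode` appears only as part of describing which MMU is active.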
> > >
> > > >
> > > > Perhaps we can just return an error out to userspace if the vCPU is in
> > > > guest-mode or TDP is disabled, and make it userspace's problem to do
> > > > memory mapping before loading any vCPU state.
> > >
> > > If the use case for default VM or sw-protected VM is to avoid excessive kvm page
> > > fault at guest boot, error on guest-mode or disabled TDP wouldn't matter.
> >
> > Any input? If no further input, I assume the primary use case is pre-population
> > before guest running.
>
> Pre-populating is the primary use case, but that could happen if L2 is active,
> e.g. after live migration.
>
> I'm not necessarily opposed to initially adding support only for the TDP MMU, but
> if the delta to also support the shadow MMU is relatively small, my preference
> would be to add the support right away. E.g. to give us confidence that the uAPI
> can work for multiple MMUs, and so that we don't have to write documentation for
> x86 to explain exactly when it's legal to use the ioctl().

If we call kvm_mmu.page_fault() without caring what kind of address is being
populated, I don't see a big difference.

-- 
Isaku Yamahata
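Isaku's closing point, that pre-population could simply call the current MMU's kvm_mmu.page_fault() hook without caring what kind of address it is given, can be modeled as a loop that issues one read fault per page through a function pointer. This is a hypothetical sketch that compiles in userspace: `struct toy_mmu`, `toy_prepopulate()`, `toy_fault()`, and `PREPOP_PAGE_SIZE` are invented stand-ins, and the real KVM fault path, signatures, and error codes differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PREPOP_PAGE_SIZE 4096ULL	/* hypothetical: 4 KiB pages only */

/* Hypothetical stand-in for struct kvm_mmu: only the fault hook matters. */
struct toy_mmu {
	/* Returns 0 on success, negative on failure. 'write' is false for
	 * the read-fault semantics discussed in the thread. */
	int (*page_fault)(struct toy_mmu *mmu, uint64_t addr, bool write);
	uint64_t nr_mapped;	/* demo only: pages "installed" so far */
	uint64_t limit;		/* demo only: addresses >= limit fail */
};

/*
 * Pre-populate [start, start + size) by faulting in one page at a time.
 * The loop never looks at whether 'start' is a GVA or a GPA; that is
 * decided by whichever MMU the callback belongs to.
 * Returns the number of bytes populated before the first failure.
 */
static uint64_t toy_prepopulate(struct toy_mmu *mmu, uint64_t start, uint64_t size)
{
	uint64_t addr;

	assert(start % PREPOP_PAGE_SIZE == 0 && size % PREPOP_PAGE_SIZE == 0);

	for (addr = start; addr < start + size; addr += PREPOP_PAGE_SIZE) {
		if (mmu->page_fault(mmu, addr, false) < 0)
			break;
	}
	return addr - start;
}

/* Demo fault handler: "maps" any page below mmu->limit. */
static int toy_fault(struct toy_mmu *mmu, uint64_t addr, bool write)
{
	(void)write;
	if (addr >= mmu->limit)
		return -1;
	mmu->nr_mapped++;
	return 0;
}
```

Returning the number of bytes populated before the first failure, rather than all-or-nothing, is an assumption of this sketch; it lets a caller resume from where the loop stopped.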