Date: Thu, 7 Mar 2024 18:19:41 -0800
From: Isaku Yamahata
To: Sean Christopherson
Cc: David Matlack, Kai Huang, Isaku Yamahata, "kvm@vger.kernel.org",
    Isaku Yamahata, "federico.parola@polito.it", "pbonzini@redhat.com",
    "linux-kernel@vger.kernel.org", "isaku.yamahata@gmail.com",
    "michael.roth@amd.com"
Subject: Re: [RFC PATCH 1/8] KVM: Document KVM_MAP_MEMORY ioctl
Message-ID: <20240308021941.GM368614@ls.amr.corp.intel.com>

On Thu, Mar 07, 2024 at 05:28:20PM -0800, Sean Christopherson wrote:

> On Thu, Mar 07, 2024, David Matlack wrote:
> > On 2024-03-08 01:20 PM, Huang, Kai wrote:
> > > > > > +:Parameters: struct kvm_memory_mapping(in/out)
> > > > > > +:Returns: 0 on success, <0 on error
> > > > > > +
> > > > > > +KVM_MAP_MEMORY populates guest memory without running vcpu.
> > > > > > +
> > > > > > +::
> > > > > > +
> > > > > > +  struct kvm_memory_mapping {
> > > > > > +        __u64 base_gfn;
> > > > > > +        __u64 nr_pages;
> > > > > > +        __u64 flags;
> > > > > > +        __u64 source;
> > > > > > +  };
> > > > > > +
> > > > > > +  /* For kvm_memory_mapping:: flags */
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_WRITE         _BITULL(0)
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_EXEC          _BITULL(1)
> > > > > > +  #define KVM_MEMORY_MAPPING_FLAG_USER          _BITULL(2)
> > > > >
> > > > > I am not sure what's the good of having "FLAG_USER"?
> > > > >
> > > > > This ioctl is called from userspace, thus I think we can just treat this always
> > > > > as user-fault?
> > > >
> > > > The point is how to emulate kvm page fault as if vcpu caused the kvm page
> > > > fault. Not we call the ioctl as user context.
> > >
> > > Sorry I don't quite follow. What's wrong if KVM just append the #PF USER
> > > error bit before it calls into the fault handler?
> > >
> > > My question is, since this is ABI, you have to tell how userspace is
> > > supposed to use this. Maybe I am missing something, but I don't see how
> > > USER should be used here.
> >
> > If we restrict this API to the TDP MMU then KVM_MEMORY_MAPPING_FLAG_USER
> > is meaningless, PFERR_USER_MASK is only relevant for shadow paging.
>
> +1
>
> > KVM_MEMORY_MAPPING_FLAG_WRITE seems useful to allow memslots to be
> > populated with writes (which avoids just faulting in the zero-page for
> > anon or tmpfs backed memslots), while also allowing populating read-only
> > memslots.
> >
> > I don't really see a use-case for KVM_MEMORY_MAPPING_FLAG_EXEC.
>
> It would midly be interesting for something like the NX hugepage mitigation.
>
> For the initial implementation, I don't think the ioctl() should specify
> protections, period.
>
> VMA-based mappings, i.e. !guest_memfd, already have a way to specify protections.
> And for guest_memfd, finer grained control in general, and long term compatibility
> with other features that are in-flight or proposed, I would rather userspace specify
> RWX protections via KVM_SET_MEMORY_ATTRIBUTES. Oh, and dirty logging would be a
> pain too.
>
> KVM doesn't currently support execute-only (XO) or !executable (RW), so I think
> we can simply define KVM_MAP_MEMORY to behave like a read fault. E.g. map RX,
> and add W if all underlying protections allow it.
>
> That way we can defer dealing with things like XO and RW *if* KVM ever does gain
> support for specifying those combinations via KVM_SET_MEMORY_ATTRIBUTES, which
> will likely be per-arch/vendor and non-trivial, e.g. AMD's NPT doesn't even allow
> for XO memory.
>
> And we shouldn't need to do anything for KVM_MAP_MEMORY in particular if
> KVM_SET_MEMORY_ATTRIBUTES gains support for RWX protections the existing RWX and
> RX combinations, e.g. if there's a use-case for write-protecting guest_memfd
> regions.
>
> We can always expand the uAPI, but taking away functionality is much harder, if
> not impossible.

Ok, let me drop all the flags. Here is the updated one.

4.143 KVM_MAP_MEMORY
------------------------

:Capability: KVM_CAP_MAP_MEMORY
:Architectures: none
:Type: vcpu ioctl
:Parameters: struct kvm_memory_mapping(in/out)
:Returns: 0 on success, < 0 on error

Errors:

  ====== =============================================================
  EINVAL vcpu state is not in TDP MMU mode or is in guest mode.
         Currently, this ioctl is restricted to TDP MMU.
  EAGAIN The region is only processed partially. The caller should
         issue the ioctl with the updated parameters.
  EINTR  An unmasked signal is pending. The region may be processed
         partially. If `nr_pages` > 0, the caller should issue the
         ioctl with the updated parameters.
  ====== =============================================================

KVM_MAP_MEMORY populates guest memory before the VM starts to run.
Multiple vcpus can call this ioctl simultaneously.
Concurrent calls may race, in which case the ioctl can fail with EAGAIN.

::

  struct kvm_memory_mapping {
        __u64 base_gfn;
        __u64 nr_pages;
        __u64 flags;
        __u64 source;
  };

KVM_MAP_MEMORY populates guest memory at the specified range (`base_gfn`,
`nr_pages`) in the underlying mapping. `source` is an optional user pointer.
If `source` is not NULL and the underlying technology supports it, the memory
contents of `source` are copied into the guest memory. The backend may
encrypt it.

`flags` must be zero. It's reserved for future use.

When the ioctl returns, the input values are updated. If `nr_pages` is large,
the ioctl may return EAGAIN, or EINTR when a signal is pending, after updating
`base_gfn` and `nr_pages` (and `source`, if it is not zero) to point to the
remaining range.

-- 
Isaku Yamahata
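
For illustration only, here is a minimal sketch of how userspace might consume the EAGAIN/EINTR semantics described above. It is not part of the patch. The struct mirrors the proposed layout, and everything else is an assumption made up for the example: `map_memory_all`, the `map_memory_fn` hook that stands in for `ioctl(vcpu_fd, KVM_MAP_MEMORY, &m)`, and the simulated backend `fake_map_memory` with its chunk size and 4 KiB page size.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the struct proposed in this thread; not from a released header. */
struct kvm_memory_mapping {
	uint64_t base_gfn;
	uint64_t nr_pages;
	uint64_t flags;		/* must be zero */
	uint64_t source;	/* optional user pointer, 0 if unused */
};

/* Hypothetical hook standing in for ioctl(vcpu_fd, KVM_MAP_MEMORY, m). */
typedef int (*map_memory_fn)(int vcpu_fd, struct kvm_memory_mapping *m);

/*
 * Reissue the call until the whole range is populated.  On EAGAIN or EINTR
 * the struct already points at the remaining range, so it is passed back
 * unchanged.  Returns 0 on success, -1 with errno set on any other error.
 */
static int map_memory_all(int vcpu_fd, struct kvm_memory_mapping *m,
			  map_memory_fn do_map)
{
	while (m->nr_pages > 0) {
		if (do_map(vcpu_fd, m) == 0)
			return 0;
		if (errno != EAGAIN && errno != EINTR)
			return -1;
	}
	return 0;
}

/* Assumptions for the simulation only: 4 pages per call, 4 KiB pages. */
#define FAKE_CHUNK_PAGES 4
#define FAKE_PAGE_SIZE   4096

/* Simulated backend: handles one chunk, then reports EAGAIN if pages remain. */
static int fake_map_memory(int vcpu_fd, struct kvm_memory_mapping *m)
{
	uint64_t done = m->nr_pages < FAKE_CHUNK_PAGES ? m->nr_pages
						       : FAKE_CHUNK_PAGES;

	(void)vcpu_fd;
	if (m->flags) {
		errno = EINVAL;
		return -1;
	}
	m->base_gfn += done;
	m->nr_pages -= done;
	if (m->source)
		m->source += done * FAKE_PAGE_SIZE;
	if (m->nr_pages > 0) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

In real use the hook would wrap the actual ioctl once its number is defined. The loop itself does no bookkeeping of its own, since the kernel is expected to update `base_gfn`, `nr_pages` and `source` in place.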