Date: Fri, 27 Aug 2021 10:31:50 +0800
From: Yu Zhang
To: David Hildenbrand
Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, Borislav Petkov, Andy Lutomirski,
 Andrew Morton, Joerg Roedel, Andi Kleen, David Rientjes,
 Vlastimil Babka, Tom Lendacky, Thomas Gleixner, Peter Zijlstra,
 Ingo Molnar, Varad Gautam, Dario Faggioli, x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev,
 "Kirill A . Shutemov", "Kirill A . Shutemov",
 Kuppuswamy Sathyanarayanan, Dave Hansen
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20210827023150.jotwvom7mlsawjh4@linux.intel.com>
References: <20210824005248.200037-1-seanjc@google.com>
 <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
In-Reply-To: <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 26, 2021 at 12:15:48PM +0200, David Hildenbrand wrote:
> On 24.08.21 02:52, Sean Christopherson wrote:
> > The goal of this RFC is to try and align KVM, mm, and anyone else with skin in the
> > game, on an acceptable direction for supporting guest private memory, e.g. for
> > Intel's TDX. The TDX architectural effectively allows KVM guests to crash the
> > host if guest private memory is accessible to host userspace, and thus does not
> > play nice with KVM's existing approach of pulling the pfn and mapping level from
> > the host page tables.
> >
> > This is by no means a complete patch; it's a rough sketch of the KVM changes that
> > would be needed. The kernel side of things is completely omitted from the patch;
> > the design concept is below.
> >
> > There's also fair bit of hand waving on implementation details that shouldn't
> > fundamentally change the overall ABI, e.g. how the backing store will ensure
> > there are no mappings when "converting" to guest private.
> >
>
> This is a lot of complexity and rather advanced approaches (not saying they
> are bad, just that we try to teach the whole stack something completely
> new).
>
>
> What I think would really help is a list of requirements, such that
> everybody is aware of what we actually want to achieve.
> Let me start:
>
> GFN: Guest Frame Number
> EPFN: Encrypted Physical Frame Number
>
>
> 1) An EPFN must not get mapped into more than one VM: it belongs exactly to
> one VM. It must neither be shared between VMs between processes nor between
> VMs within a processes.
>
>
> 2) User space (well, and actually the kernel) must never access an EPFN:
>
> - If we go for an fd, essentially all operations (read/write) have to
>   fail.
> - If we have to map an EPFN into user space page tables (e.g., to
>   simplify KVM), we could only allow fake swap entries such that "there
>   is something" but it cannot be accessed and is flagged accordingly.
> - /proc/kcore and friends have to be careful as well and should not read
>   this memory. So there has to be a way to flag these pages.
>
> 3) We need a way to express the GFN<->EPFN mapping and essentially assign an
> EPFN to a GFN.
>
>
> 4) Once we assigned a EPFN to a GFN, that assignment must not longer change.
> Further, an EPFN must not get assigned to multiple GFNs.
>
>
> 5) There has to be a way to "replace" encrypted parts by "shared" parts
> and the other way around.
>
> What else?

Thanks a lot for this summary. A question about the requirements: do we
plan to support assigning devices to the protected VM?

If yes, the fd-based solution may need to change the VFIO interface as well
(though the fake swap entry solution would need to mess with VFIO too),
because:

1> KVM uses VFIO when assigning devices to a VM.

2> Not knowing which GPA ranges the VM may use as DMA buffers, all guest
   pages have to be mapped in the host IOMMU page table to host pages,
   which are pinned for the whole life cycle of the VM.

3> The IOMMU mapping is done at VM creation time by VFIO and the IOMMU
   driver, in vfio_dma_do_map().

4> However, vfio_dma_do_map() needs the HVA to perform a GUP to get the HPA
   and pin the page. But if we are using the fd-based solution, not every
   GPA has an HVA, so the current VFIO interface to map and pin the
   GPA (IOVA) won't work. And I doubt VFIO can be modified to support
   this easily.

B.R.
Yu
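
To make points 1> to 4> above concrete, here is a toy Python model of an
HVA-keyed map-and-pin interface. It is only a sketch and not the real VFIO
code: GuestRange, ToyIommu, dma_map() and map_all_guest_memory() are
hypothetical names, the addresses are made up, and the "GUP" is just a
dictionary lookup. The one thing it tries to show is that a guest range
without an HVA gives such an interface nothing to pin.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SIZE = 4096


@dataclass
class GuestRange:
    """One guest-physical range (hypothetical stand-in for a memslot)."""
    gpa: int              # guest physical address, used as the IOVA
    size: int             # length in bytes
    hva: Optional[int]    # host virtual address, None if only fd-backed


class ToyIommu:
    """Toy model of an HVA-keyed map-and-pin interface (an assumption)."""

    def __init__(self, host_page_table: Dict[int, int]):
        # HVA page -> HPA page: what a GUP walk would consult.
        self.host_page_table = host_page_table
        # IOVA page -> pinned HPA page.
        self.iommu_table: Dict[int, int] = {}

    def dma_map(self, iova: int, hva: Optional[int], size: int) -> None:
        if hva is None:
            # No userspace mapping exists, so there is nothing to GUP.
            raise ValueError("no HVA for IOVA 0x%x: cannot pin" % iova)
        for off in range(0, size, PAGE_SIZE):
            hpa = self.host_page_table[hva + off]   # "GUP": HVA -> HPA
            self.iommu_table[iova + off] = hpa      # stays pinned


def map_all_guest_memory(iommu: ToyIommu, ranges: List[GuestRange]) -> List[int]:
    """Map every range up front (DMA targets are unknown); return failed GPAs."""
    failed = []
    for r in ranges:
        try:
            iommu.dma_map(r.gpa, r.hva, r.size)
        except ValueError:
            failed.append(r.gpa)
    return failed


# Two pages of ordinary memory with an HVA, one fd-backed page without.
host_pt = {0x7F0000000000: 0x100000, 0x7F0000001000: 0x101000}
iommu = ToyIommu(host_pt)
ranges = [
    GuestRange(gpa=0x0, size=2 * PAGE_SIZE, hva=0x7F0000000000),
    GuestRange(gpa=0x100000, size=PAGE_SIZE, hva=None),
]
failed = map_all_guest_memory(iommu, ranges)
```

In this model the first range is mapped page by page, while the fd-backed
range ends up in "failed" because the interface only accepts an HVA.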