Message-Id: <3b99f157-0f30-4b30-8399-dd659250ab8d@www.fastmail.com>
In-Reply-To: <20220422105612.GB61987@chaop.bj.intel.com>
References: <80aad2f9-9612-4e87-a27a-755d3fa97c92@www.fastmail.com>
 <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
 <20220422105612.GB61987@chaop.bj.intel.com>
Date: Sun, 24 Apr 2022 09:59:37 -0700
From: "Andy Lutomirski"
To: "Chao Peng", "Sean Christopherson"
Cc: "Quentin Perret", "Steven Price", "kvm list",
 "Linux Kernel Mailing List", linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, "Linux API", qemu-devel@nongnu.org,
 "Paolo Bonzini", "Jonathan Corbet", "Vitaly Kuznetsov", "Wanpeng Li",
 "Jim Mattson", "Joerg Roedel", "Thomas Gleixner", "Ingo Molnar",
 "Borislav Petkov", "the arch/x86 maintainers", "H. Peter Anvin",
 "Hugh Dickins", "Jeff Layton", "J . Bruce Fields", "Andrew Morton",
 "Mike Rapoport", "Maciej S . Szmigiero", "Vlastimil Babka",
 "Vishal Annapurve", "Yu Zhang", "Kirill A. Shutemov", "Nakajima, Jun",
 "Dave Hansen", "Andi Kleen", "David Hildenbrand", "Marc Zyngier",
 "Will Deacon"
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 22, 2022, at 3:56 AM, Chao Peng wrote:
> On Tue, Apr 05, 2022 at 06:03:21PM +0000, Sean Christopherson wrote:
>> On Tue, Apr 05, 2022, Quentin Perret wrote:
>> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:

> Only when the register succeeds, the fd is
> converted into a private fd, before that, the fd is just a normal (shared)
> one. During this conversion, the previous data is preserved so you can put
> some initial data in guest pages (whether the architecture allows this is
> architecture-specific and out of the scope of this patch).

I think this can be made to work, but it will be awkward. On TDX, for
example, what exactly are the semantics supposed to be? An error code if
the memory isn't all zero? An error code if it has ever been written?

Fundamentally, I think this is because your proposed lifecycle for these
memfiles results in a lightweight API but is awkward for the intended use
cases. You're proposing, roughly:

1. Create a memfile.
   Now it's in a shared state with an unknown virt technology. It can be
   read and written. Let's call this state BRAND_NEW.

2. Bind to a VM.

   Now it's in a bound state. For TDX, for example, let's call the new
   state BOUND_TDX. In this state, the TDX rules are followed (private
   memory can't be converted, etc.).

The problem here is that the BRAND_NEW state allows things that are
nonsensical in TDX, and the binding step needs to invent some kind of
semantics for what happens when binding a nonempty memfile.

So I would propose a somewhat different order:

1. Create a memfile. It's in the UNBOUND state and no operations
   whatsoever are allowed except binding or closing.

2. Bind the memfile to a VM (or at least to a VM technology). Now it's in
   the initial state appropriate for that VM.

For TDX, this completely bypasses the cases where the data is prepopulated
and TDX can't handle it cleanly. For SEV, it bypasses a situation in which
data might be written to the memory before we find out whether that data
will be unreclaimable or unmovable.

----------------------------------------------

Now I have a question, since I don't think anyone has really answered it:
how does this all work with SEV- or pKVM-like technologies in which
private and shared pages share the same address space? It sounds like
you're proposing to have a big memfile that contains private and shared
pages and to use that same memfile as pages are converted back and forth.
IO and even real physical DMA could be done on that memfile. Am I
understanding correctly?

If so, I think this makes sense, but I'm wondering if the actual memslot
setup should be different. For TDX, private memory lives in a logically
separate memslot space. For SEV and pKVM, it doesn't. I assume the API
can reflect this straightforwardly.

And the corresponding TDX question: is the intent still that shared pages
aren't allowed at all in a TDX memfile? If so, that would be the most
direct mapping to what the hardware actually does.

--Andy
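
For illustration only, the create-then-bind order proposed above can be
modeled as a tiny state machine. This is a rough sketch, not kernel code
and not part of the patch series: the Memfile class, its methods, and the
BOUND_SEV name are hypothetical, and what happens after binding is
deliberately left unmodeled because it is technology-specific.

```python
from enum import Enum, auto


class State(Enum):
    UNBOUND = auto()    # freshly created: only bind or close are allowed
    BOUND_TDX = auto()  # bound to TDX: TDX rules apply from the start
    BOUND_SEV = auto()  # hypothetical name for an SEV-bound memfile
    CLOSED = auto()


# Hypothetical mapping from a technology name to its initial bound state.
BOUND_STATE = {"tdx": State.BOUND_TDX, "sev": State.BOUND_SEV}


class Memfile:
    """Toy model of the proposed lifecycle: create, then bind, then use."""

    def __init__(self):
        self.state = State.UNBOUND
        self.pages = {}  # page index -> data; stays empty while UNBOUND

    def write(self, index, data):
        if self.state is State.UNBOUND:
            # Nothing can be stored before the technology is known.
            raise PermissionError("only bind or close are allowed before binding")
        # Access rules after binding depend on the technology; not modeled.
        raise NotImplementedError("post-bind access is technology-specific")

    def bind(self, tech):
        if self.state is not State.UNBOUND:
            raise PermissionError("memfile is already bound or closed")
        if tech not in BOUND_STATE:
            raise ValueError("unknown VM technology: " + tech)
        # Invariant of this order: bind never sees a nonempty memfile,
        # so no semantics for prepopulated data need to be invented.
        assert not self.pages
        self.state = BOUND_STATE[tech]

    def close(self):
        self.state = State.CLOSED
```

In this model the binding step never has to decide what a nonempty memfile
means, because the UNBOUND state refuses every write.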