References: <88620519-029e-342b-0a85-ce2a20eaf41b@arm.com>
    <80aad2f9-9612-4e87-a27a-755d3fa97c92@www.fastmail.com>
    <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
Date: Mon, 04 Apr 2022 15:04:17 -0700
From: "Andy Lutomirski"
To: "Sean Christopherson", "Quentin Perret"
Cc: "Steven Price", "Chao Peng", "kvm list",
    "Linux Kernel Mailing List", linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, "Linux API", qemu-devel@nongnu.org,
    "Paolo Bonzini", "Jonathan Corbet", "Vitaly Kuznetsov", "Wanpeng Li",
    "Jim Mattson", "Joerg Roedel", "Thomas Gleixner", "Ingo Molnar",
    "Borislav Petkov", "the arch/x86 maintainers", "H. Peter Anvin",
    "Hugh Dickins", "Jeff Layton", "J . Bruce Fields", "Andrew Morton",
    "Mike Rapoport", "Maciej S . Szmigiero", "Vlastimil Babka",
    "Vishal Annapurve", "Yu Zhang", "Kirill A. Shutemov", "Nakajima, Jun",
    "Dave Hansen", "Andi Kleen", "David Hildenbrand", "Marc Zyngier",
    "Will Deacon"
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting
    KVM guest private memory
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 4, 2022, at 10:06 AM, Sean Christopherson wrote:
> On Mon, Apr 04, 2022, Quentin Perret wrote:
>> On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
>> FWIW, there are a couple of reasons why I'd like to have in-place
>> conversions:
>>
>>  - one goal of pKVM is to migrate some things away from the Arm
>>    Trustzone environment (e.g. DRM and the likes) and into protected VMs
>>    instead. This will give Linux a fighting chance to defend itself
>>    against these things -- they currently have access to _all_ memory.
>>    And transitioning pages between Linux and Trustzone (donations and
>>    shares) is fast and non-destructive, so we really do not want pKVM to
>>    regress by requiring the hypervisor to memcpy things;
>
> Is there actually a _need_ for the conversion to be non-destructive?
> E.g. I assume the "trusted" side of things will need to be reworked to
> run as a pKVM guest, at which point reworking its logic to understand
> that conversions are destructive and slow-ish doesn't seem too onerous.
>
>>  - it can be very useful for protected VMs to do shared=>private
>>    conversions. Think of a VM receiving some data from the host in a
>>    shared buffer, and then it wants to operate on that buffer without
>>    risking to leak confidential informations in a transient state. In
>>    that case the most logical thing to do is to convert the buffer back
>>    to private, do whatever needs to be done on that buffer (decrypting a
>>    frame, ...), and then share it back with the host to consume it;
>
> If performance is a motivation, why would the guest want to do two
> conversions instead of just doing internal memcpy() to/from a private
> page?  I would be quite surprised if multiple exits and TLB shootdowns
> is actually faster, especially at any kind of scale where zapping
> stage-2 PTEs will cause lock contention and IPIs.

I don't know the numbers or all the details, but this is arm64, which
is a rather better architecture than x86 in this regard.  So maybe it's
not so bad, at least in very simple cases, ignoring all implementation
details.  (But see below.)  Also the systems in question tend to have
fewer CPUs than some of the massive x86 systems out there.

If we actually wanted to support transitioning the same page between
shared and private, though, we have a bit of an awkward situation.
Private to shared is conceptually easy -- do some bookkeeping,
reconstitute the direct map entry, and it's done.  The other direction
is a mess: all existing uses of the page need to be torn down.  If the
page has been recently used for DMA, this includes IOMMU entries.

Quentin: let's ignore any API issues for now.  Do you have a concept of
how a nondestructive shared -> private transition could work well, even
in principle?
The best I can come up with is a special type of shared page that is
not GUP-able and maybe not even mmappable, having a clear option for
transitions to fail, and generally preventing the nasty cases from
happening in the first place.

Maybe there could be a special mode for the private memory fds in which
specific pages are marked as "managed by this fd but actually shared".
pread() and pwrite() would work on those pages, but not mmap().  (Or
maybe mmap() but the resulting mappings would not permit GUP.)  And
transitioning them would be a special operation on the fd that is
specific to pKVM and wouldn't work on TDX or SEV.

Hmm.  Sean and Chao, are we making a bit of a mistake by making these
fds technology-agnostic?  That is, would we want to distinguish between
a TDX backing fd, a SEV backing fd, a software-based backing fd, etc?
API-wise this could work by requiring the fd to be bound to a KVM VM
instance and possibly even configured a bit before any other operations
would be allowed.

(Destructive transitions nicely avoid all the nasty cases.  If
something is still pinning a shared page when it's "transitioned" to
private (really just replaced with a new page), then the old page
continues existing for as long as needed as a separate object.)
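
To make the two kinds of transition concrete, here is a toy user-space
model in Python.  It is only a sketch of the semantics discussed above,
not a kernel or KVM interface: BackingFd, pin(), convert_in_place() and
convert_destructive() are hypothetical names, and a plain pin counter
stands in for GUP/DMA/IOMMU references.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SHARED = "shared"
    PRIVATE = "private"


@dataclass
class Page:
    data: bytearray
    state: State = State.SHARED
    pins: int = 0  # stand-in for GUP/DMA/IOMMU references held by the host


class BackingFd:
    """Toy model of a per-VM backing object (hypothetical, not a kernel API)."""

    def __init__(self, npages: int, page_size: int = 4096) -> None:
        self.page_size = page_size
        self.pages = [Page(bytearray(page_size)) for _ in range(npages)]

    def _shared(self, index: int) -> Page:
        # Host-side access is only allowed while the page is shared.
        page = self.pages[index]
        if page.state is not State.SHARED:
            raise PermissionError("host access to a private page")
        return page

    def pwrite(self, index: int, payload: bytes) -> None:
        if len(payload) > self.page_size:
            raise ValueError("payload larger than a page")
        self._shared(index).data[: len(payload)] = payload

    def pread(self, index: int, length: int) -> bytes:
        return bytes(self._shared(index).data[:length])

    def pin(self, index: int) -> Page:
        # Models a long-lived host reference to an ordinary shared page.
        page = self._shared(index)
        page.pins += 1
        return page

    def convert_in_place(self, index: int) -> bool:
        # Non-destructive shared -> private: contents survive, but the
        # transition must be allowed to fail while the page is pinned.
        page = self._shared(index)
        if page.pins:
            return False
        page.state = State.PRIVATE
        return True

    def convert_destructive(self, index: int) -> Page:
        # Destructive shared -> private: a fresh zeroed page takes the slot;
        # the old page lives on as a separate object for whoever pins it.
        old = self._shared(index)
        self.pages[index] = Page(bytearray(self.page_size), State.PRIVATE)
        return old

    def share(self, index: int) -> None:
        # Private -> shared is pure bookkeeping in this model.
        self.pages[index].state = State.SHARED
```

In this model an unpinned page converts in place and keeps its
contents, a pinned page makes convert_in_place() return False, and
convert_destructive() always succeeds because the pinned old page is
simply detached.  A page type that can never be pinned (the "not
GUP-able" variant above) would be one where pin() does not exist, so
the failure path could not be reached.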