Date: Tue, 5 Apr 2022 18:30:35 +0000
From: Sean Christopherson
To: Andy Lutomirski
Cc: Quentin Perret, Steven Price, Chao Peng, kvm list,
 Linux Kernel Mailing List, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
 Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
 Mike Rapoport, "Maciej S . Szmigiero", Vlastimil Babka,
 Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
 Dave Hansen, Andi Kleen, David Hildenbrand, Marc Zyngier, Will Deacon
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM
 guest private memory
References: <80aad2f9-9612-4e87-a27a-755d3fa97c92@www.fastmail.com>
 <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
 <54acbba9-f4fd-48c1-9028-d596d9f63069@www.fastmail.com>
In-Reply-To: <54acbba9-f4fd-48c1-9028-d596d9f63069@www.fastmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 05, 2022, Andy Lutomirski wrote:
> On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> >> The best I can come up with is a special type of shared page that is not
> >> GUP-able and maybe not even mmappable, having a clear option for
> >> transitions to fail, and generally preventing the nasty cases from
> >> happening in the first place.
> >
> > Right, that sounds reasonable to me.
>
> At least as a v1, this is probably more straightforward than allowing mmap().
> Also, there's much to be said for a simpler, limited API, to be expanded if
> genuinely needed, as opposed to starting out with a very featureful API.

Regarding "genuinely needed", IMO the same applies to supporting this at all.
Without numbers from something at least approximating a real use case, we're
just speculating on which will be the most performant approach.

> >> Maybe there could be a special mode for the private memory fds in which
> >> specific pages are marked as "managed by this fd but actually shared".
> >> pread() and pwrite() would work on those pages, but not mmap(). (Or maybe
> >> mmap() but the resulting mappings would not permit GUP.) And
> >> transitioning them would be a special operation on the fd that is specific
> >> to pKVM and wouldn't work on TDX or SEV.
> >
> > Aha, didn't think of pread()/pwrite(). Very interesting.
>
> There are plenty of use cases for which pread()/pwrite()/splice() will be as
> fast or even much faster than mmap()+memcpy().

...

> resume guest
> *** host -> hypervisor -> guest ***
> Guest unshares the page.
> *** guest -> hypervisor ***
> Hypervisor removes PTE. TLBI.
> *** hypervisor -> guest ***
>
> Obviously considerable cleverness is needed to make a virt IOMMU like this
> work well, but still.
>
> Anyway, my suggestion is that the fd backing proposal get slightly modified
> to get it ready for multiple subtypes of backing object, which should be a
> pretty minimal change. Then, if someone actually needs any of this
> cleverness, it can be added later. In the mean time, the
> pread()/pwrite()/splice() scheme is pretty good.

Tangentially related to getting private-fd ready for multiple things, what
about implementing the pread()/pwrite()/splice() scheme in pKVM itself? I.e.
read() on the VM fd, with the offset corresponding to gfn in some way. Ditto
for mmap() on the VM fd, though that would require additional changes outside
of pKVM.

That would allow pKVM to support in-place conversions without the private-fd
having to differentiate between the type of protected VM, and without having
to provide new APIs from the private-fd. TDX, SNP, etc... Just Work by not
supporting the pKVM APIs.

And assuming we get multiple consumers down the road, pKVM will need to be
able to communicate the "true" state of a page to other consumers, because in
addition to being a consumer, pKVM is also an owner/enforcer analogous to the
TDX Module and the SEV PSP.
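
To make the "read() on the VM fd, offset derived from gfn" idea a bit more
concrete, below is a rough userspace-only sketch in C. Nothing in it is an
existing KVM or pKVM interface: the sketch_* names, the fixed 4KiB page size,
the "offset = gfn << page shift" mapping, the -EPERM return for private pages,
and the in-memory state table standing in for the owner's "true" view of each
page are all assumptions made for illustration. An ordinary file descriptor
plays the role of the VM fd.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NR_GFNS		16

/* Hypothetical per-page state as seen by the owner/enforcer. */
enum sketch_page_state { SKETCH_PRIVATE = 0, SKETCH_SHARED = 1 };

struct sketch_vm {
	int fd;		/* stand-in for the VM fd (a plain file here) */
	enum sketch_page_state state[SKETCH_NR_GFNS];
};

/* Assumed mapping: the file offset is simply gfn * page size. */
static off_t sketch_gfn_to_offset(uint64_t gfn)
{
	return (off_t)(gfn << SKETCH_PAGE_SHIFT);
}

/* The owner reports the "true" state; out-of-range gfns count as private. */
static bool sketch_page_is_shared(const struct sketch_vm *vm, uint64_t gfn)
{
	return gfn < SKETCH_NR_GFNS && vm->state[gfn] == SKETCH_SHARED;
}

/*
 * Read the page at @gfn into @buf (SKETCH_PAGE_SIZE bytes).
 * Returns 0 on success, -EPERM if the page is not shared, and -EIO on a
 * failed or short read.
 */
static int sketch_read_guest_page(const struct sketch_vm *vm, uint64_t gfn,
				  void *buf)
{
	ssize_t n;

	assert(buf != NULL);

	if (!sketch_page_is_shared(vm, gfn))
		return -EPERM;

	n = pread(vm->fd, buf, SKETCH_PAGE_SIZE, sketch_gfn_to_offset(gfn));
	return n == (ssize_t)SKETCH_PAGE_SIZE ? 0 : -EIO;
}
```

A caller would mark a gfn as SKETCH_SHARED in the table and then call
sketch_read_guest_page(); reads of private or out-of-range gfns fail up front
rather than touching the backing fd. A write path would presumably mirror this
with pwrite(), but how conversions between the two states would actually be
requested and refused is exactly the open question in this thread.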