Date: Sat, 8 Oct 2022 19:15:13 +0300
From: Jarkko Sakkinen
To: Sean Christopherson
Cc: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
 Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
 Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
 "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
 "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
 dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret,
 Michael Roth, mhocko@suse.com, Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory
References: <20220915142913.2213336-1-chao.p.peng@linux.intel.com>
 <20220915142913.2213336-3-chao.p.peng@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote:
> On Fri, Oct 07, 2022 at 02:58:54PM +0000, Sean Christopherson wrote:
> > On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
> > > On Thu, Oct 06, 2022 at 03:34:58PM +0000, Sean Christopherson wrote:
> > > > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > > > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > > > > This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
> > > > > > > additional KVM memslot fields, private_fd/private_offset, to allow
> > > > > > > userspace to specify that guest private memory is provided from the
> > > > > > > private_fd, with guest_phys_addr mapped at the private_offset of the
> > > > > > > private_fd, spanning a range of memory_size.
> > > > > > >
> > > > > > > The extended memslot can still have the userspace_addr (hva). When used, a
> > > > > > > single memslot can maintain both private memory through the private
> > > > > > > fd (private_fd/private_offset) and shared memory through the
> > > > > > > hva (userspace_addr). Whether the private or shared part is visible to
> > > > > > > the guest is maintained by other KVM code.
> > > > > >
> > > > > > What is the appeal of the private_offset field anyway, instead of having just
> > > > > > a 1:1 association between regions and files, i.e. one memfd per region?
> > > >
> > > > Modifying memslots is slow, both in KVM and in QEMU (not sure about Google's VMM).
> > > > E.g. if a vCPU converts a single page, it will be forced to wait until all other
> > > > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is faulting in
> > > > memory. KVM's memslot updates also hold a mutex for the entire duration of the
> > > > update, i.e. conversions on different vCPUs would be fully serialized, exacerbating
> > > > the SRCU problem.
> > > >
> > > > KVM also has historical baggage where it "needs" to zap _all_ SPTEs when any
> > > > memslot is deleted.
> > > >
> > > > Taking both a private_fd and a shared userspace address allows userspace to convert
> > > > between private and shared without having to manipulate memslots.
> > >
> > > Right, this was a really good explanation, thank you.
> > >
> > > Still wondering whether this could possibly work (or not):
> > >
> > > 1. Union userspace_addr and private_fd.
> >
> > No, because userspace needs to be able to provide both userspace_addr (shared
> > memory) and private_fd (private memory) for a single memslot.
>
> Got it, thanks for clearing up my misunderstandings on this topic, and it
> is quite obviously visible in 5/8 and 7/8. I.e. if I got it right, a
> memblock can be partially private, and you dig the shared holes with
> KVM_MEMORY_ENCRYPT_UNREG_REGION. We (in Enarx) currently have one memblock
> per host mmap, and I was looking at this through that mindset, but it
> definitely makes sense to support that.

For me, the most useful reference for this feature is the kvm_set_phys_mem()
implementation in the privmem-v8 branch. It took a while to find because I did
not have much experience with the QEMU code base. I'd even recommend
mentioning that function in the cover letter, because it is a really good
reference on how this feature is supposed to be used.

BR, Jarkko