Date: Fri, 14 Jul 2023 17:30:37 -0700
Subject: Re: Rename restrictedmem => guardedmem?
 (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM)
From: Sean Christopherson
To: Vishal Annapurve
Cc: Ackerley Tng, david@redhat.com, chao.p.peng@linux.intel.com,
	pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com,
	joro@8bytes.org, mail@maciej.szmigiero.name, vbabka@suse.cz,
	yu.c.zhang@linux.intel.com, kirill.shutemov@linux.intel.com,
	dhildenb@redhat.com, qperret@google.com, tabba@google.com,
	michael.roth@amd.com, wei.w.wang@intel.com, rppt@kernel.org,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com, jarkko@kernel.org,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hughd@google.com,
	brauner@kernel.org

On Fri, Jul 14, 2023, Vishal Annapurve wrote:
> On Fri, Jul 14, 2023 at 12:29 PM Sean Christopherson wrote:
> > ...
> > And _if_ there is a VMM that instantiates memory before KVM_CREATE_VM, IMO
> > making the ioctl() /dev/kvm scoped would have no meaningful impact on
> > adapting userspace to play nice with the required ordering.  If userspace
> > can get at /dev/kvm, then it can do KVM_CREATE_VM, because the only input
> > to KVM_CREATE_VM is the type, i.e. the only dependencies for KVM_CREATE_VM
> > should be known/resolved long before the VMM knows it wants to use gmem.
>
> I am not sure about the benefits of tying gmem creation to any given
> kvm instance.

IMO, making gmem->kvm immutable is very nice to have, e.g. gmem->kvm will
always be valid and the refcounting rules are fairly straightforward.

> I think the most important requirement here is that a given gmem range
> is always tied to a single VM

I'm not convinced that that requirement will always hold true (see below).

> This can be enforced when memslots are bound to the gmem files.

Yeah, but TBH, waiting until the guest faults in memory to detect an invalid
memslot is gross.  And looking more closely, taking filemap_invalidate_lock(),
i.e. taking a semaphore for write, in the page fault path is a complete
non-starter.  The "if (existing_slot == slot)" check is likely a non-starter,
because KVM handles FLAGS_ONLY memslot updates, e.g. toggling dirty logging,
by duplicating and replacing the memslot, not by updating the live memslot.

> I believe "Required ordering" is that gmem files are created first and
> then supplied while creating the memslots whose gpa ranges can
> generate private memory accesses.
> Is there any other ordering we want to enforce here?

I wasn't talking about enforcing arbitrary ordering, I was simply talking
about what userspace literally needs to be able to do KVM_CREATE_GUEST_MEMFD.
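
For concreteness, a minimal sketch of that flow.  Caveat: this mail predates
the RFC, so the struct layout and ioctl shape below are assumptions (they
follow the uAPI that eventually landed upstream), and error handling is
omitted:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int main(void)
  {
          /* If userspace can get at /dev/kvm... */
          int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

          /* ...then it can do KVM_CREATE_VM; the only input is the type
           * (which must be a type that supports gmem). */
          int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

          /*
           * VM-scoped creation: the gmem file is tied to vm_fd from birth,
           * i.e. gmem->kvm is immutable and always valid.
           */
          struct kvm_create_guest_memfd gmem = {
                  .size = 0x200000,       /* 2MiB of guest-private memory */
          };
          int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

          /* Binding to a gpa range happens later, via a memslot. */
          return gmem_fd < 0;
  }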
> > Practically, I think that gives us a clean, intuitive way to handle
> > intra-host migration.  Rather than transfer ownership of the file,
> > instantiate a new file for the target VM, using the gmem inode from the
> > source VM, i.e. create a hard link.  That'd probably require new uAPI,
> > but I don't think that will be hugely problematic.  KVM would need to
> > ensure the new VM's guest_memfd can't be mapped until
> > KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM (which would also need to verify the
> > memslots/bindings are identical), but that should be easy enough to
> > enforce.
> >
> > That way, a VM, its memslots, and its SPTEs are tied to the file, while
> > allowing the memory and the *contents* of memory to outlive the VM, i.e.
> > be effectively transferred to the new target VM.  And we'll maintain the
> > invariant that each guest_memfd is bound 1:1 with a single VM.
> >
> > As above, that should also help us draw the line between mapping memory
> > into a VM (file), and freeing/reclaiming the memory (inode).
> >
> > There will be extra complexity/overhead as we'll have to play nice with
> > the possibility of multiple files per inode, e.g. to zap mappings across
> > all files when punching a hole, but the extra complexity is quite small,
> > e.g. we can use address_space.private_list to keep track of the
> > guest_memfd instances associated with the inode.
>
> Are we talking about a different use case of sharing gmem fd across VMs
> other than intra-host migration?

Well, I am :-)  I don't want to build all of this on an assumption that we'll
never ever want to share a guest_memfd across multiple VMs.  E.g. SEV (and
SEV-ES?) already have the migration helper concept, and I've heard more than
a few rumblings of TDX utilizing helper TDs.  IMO, it's not far-fetched at
all to think that there will eventually be a need to let multiple VMs share
a guest_memfd.

> If not, ideally only one of the files should be catering to the guest
> memory mappings at any given time, i.e. any inode should ideally be
> bound to (through the file) a single kvm instance, as we are planning
> to ensure that guest_memfd can't be mapped until
> KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is invoked on the target side.

Why?  Honest question, what does it buy us?

For TDX and SNP intra-host migration, it should be easy enough to ensure the
new VM can't create mappings before migration, and that the old VM can't
create mappings or run after migration.  I don't see that being any harder
if the source and dest use different files.

FWIW, it might be easier to hold off on this discussion until I post the RFC
(which is going to happen on Monday at this point), as then we'll have actual
code to discuss.
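
For reference, the address_space.private_list bookkeeping mentioned above
amounts to something like the below.  This is a sketch only: kvm_gmem and
kvm_gmem_zap_range() are placeholder names invented here for illustration,
nothing is final until there's real code to discuss.

  /* One instance per file; multiple files may share a single inode. */
  struct kvm_gmem {
          struct kvm *kvm;                /* immutable, set at creation */
          struct list_head entry;         /* on the inode's private_list */
  };

  /*
   * Punching a hole must zap mappings in every VM that has a view of the
   * inode, i.e. across all files.  Assumes the caller holds
   * filemap_invalidate_lock(); kvm_gmem_zap_range() stands in for
   * whatever the real invalidation hook ends up being.
   */
  static void kvm_gmem_punch_hole(struct inode *inode, pgoff_t start,
                                  pgoff_t end)
  {
          struct kvm_gmem *gmem;

          list_for_each_entry(gmem, &inode->i_mapping->private_list, entry)
                  kvm_gmem_zap_range(gmem->kvm, start, end);
  }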