Date: Fri, 11 Aug 2023 10:44:11 -0700
References: <20230718234512.1690985-13-seanjc@google.com>
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
From: Sean Christopherson
To: Vishal Annapurve
Cc: Ackerley Tng, pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, willy@infradead.org, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, chao.p.peng@linux.intel.com, tabba@google.com, jarkko@kernel.org, yu.c.zhang@linux.intel.com, mail@maciej.szmigiero.name, vbabka@suse.cz, david@redhat.com, qperret@google.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 10, 2023, Vishal Annapurve wrote:
> On Tue, Aug 8, 2023 at 2:13 PM Sean Christopherson wrote:
> > ...
>
> > > + When binding a memslot to the file, if a kvm pointer exists, it must
> > >   be the same kvm as the one in this binding
> > > + When the binding to the last memslot is removed from a file, NULL the
> > >   kvm pointer.
> >
> > Nullifying the KVM pointer isn't sufficient, because without additional actions
> > userspace could extract data from a VM by deleting its memslots and then binding
> > the guest_memfd to an attacker controlled VM.  Or more likely with TDX and SNP,
> > induce badness by coercing KVM into mapping memory into a guest with the wrong
> > ASID/HKID.
> >
>
> TDX/SNP have mechanisms i.e. PAMT/RMP tables to ensure that the same
> memory is not assigned to two different VMs.

One of the main reasons we pivoted away from using a flag in "struct page" to
indicate that a page was private was so that KVM could enforce 1:1 VM:page
ownership *without* relying on hardware.

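The software-enforced ownership rule above can be illustrated with a toy
userspace model.  This is only a sketch under stated assumptions, not kernel
code: the class GuestMemfd, its fields, and its methods are hypothetical names,
and the model assumes the owning VM is fixed when the file is created and is
never reset, which is one way to close the "delete memslots, then rebind to
another VM" hole described in the quoted text.

```python
class GuestMemfd:
    """Toy stand-in for a guest_memfd file (hypothetical, not a kernel struct)."""

    def __init__(self, owner_vm):
        # Assumption: ownership is fixed at creation and never cleared,
        # even after every memslot binding has been removed.
        self.owner_vm = owner_vm
        self.bound_slots = set()

    def bind(self, vm, slot_id):
        # The check is done purely in software; no hardware table
        # (PAMT, RMP) is consulted.
        if vm is not self.owner_vm:
            raise PermissionError("guest_memfd belongs to a different VM")
        self.bound_slots.add(slot_id)

    def unbind(self, slot_id):
        # Removing the last binding deliberately leaves owner_vm untouched.
        self.bound_slots.discard(slot_id)
```

In this model, a file created for one VM can be unbound from all of its
memslots and still refuses a later bind() from any other VM, which is the
property that merely NULLing the kvm pointer would lose.
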
And FWIW, the PAMT provides no protection in this specific case because KVM does
TDH.MEM.PAGE.REMOVE when zapping S-EPT entries, and that marks the page clear in
the PAMT.  The danger there is that physical memory is still encrypted with the
guest's HKID, and so mapping the memory into a different VM, which might not be
a TDX guest!, could lead to corruption and/or poison #MCs.

The HKID issues wouldn't be a problem if v15 is merged as-is, because zapping
S-EPT entries also fully purges and reclaims the page, but as we discussed in
one of the many threads, reclaiming physical memory should be tied to the inode,
i.e. to memory truly being freed, and not to S-EPTs being zapped.  And there is
a very good reason for wanting to do that, as it allows KVM to do the expensive
cache flush + clear outside of mmu_lock.

> Deleting memslots should also clear out the contents of the memory as the EPT
> tables will be zapped in the process

No, deleting a memslot should not clear memory.  As I said in my previous
response, the fact that zapping S-EPT entries is destructive is a limitation of
TDX, not a feature we want to apply to other VM types.  And that's not even a
fundamental property of TDX, e.g. TDX could remove the limitation, at the cost
of consuming quite a bit more memory, by tracking the exact owner by HKID in the
PAMT and decoupling S-EPT entries from page ownership.

Or in theory, KVM could work around the limitation by only doing
TDH.MEM.RANGE.BLOCK when zapping S-EPTs.  Hmm, that might actually be worth
looking at.

> and the host will reclaim the memory.

There are no guarantees that the host will reclaim the memory.  E.g. QEMU will
delete and re-create memslots for "regular" VMs when emulating option ROMs.
Even if that use case is nonsensical for confidential VMs (and it probably is
nonsensical), I don't want to define KVM's ABI based on what we *think*
userspace will do.
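
The distinction drawn above, that zapping mappings is separate from freeing
memory, can be sketched with a toy userspace model.  It shows the behaviour the
message argues for (memslot deletion preserves contents, reclaim happens only
when the backing memory is truly freed); it does not model TDX as described in
the message, where zapping S-EPT entries is destructive.  BackingInode and all
of its methods are hypothetical names introduced for illustration.

```python
class BackingInode:
    """Toy model of the memory behind a guest_memfd (hypothetical names)."""

    def __init__(self):
        self.pages = {}        # page index -> contents
        self.mapped = set()    # indices currently mapped into the guest
        self.reclaimed = []    # indices purged because memory was freed

    def populate(self, index, data):
        # Allocate a page, fill it, and map it into the guest.
        self.pages[index] = data
        self.mapped.add(index)

    def delete_memslot(self):
        # Zap every mapping, but leave page contents alone.
        self.mapped.clear()

    def recreate_memslot(self):
        # Userspace re-creates the memslot; existing pages are mappable again.
        self.mapped.update(self.pages)

    def truncate(self, index):
        # Freeing the memory is the only place a page is reclaimed.
        self.mapped.discard(index)
        if self.pages.pop(index, None) is not None:
            self.reclaimed.append(index)
```

A delete_memslot() followed by recreate_memslot() leaves every page's contents
intact, mirroring the delete and re-create pattern mentioned for option ROM
emulation; only truncate() adds an entry to reclaimed.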