Date: Mon, 28 Mar 2022 17:13:10 +0000
From: Sean Christopherson
To: Quentin Perret
Cc: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
	Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
	"J . Bruce Fields", Andrew Morton, Mike Rapoport, Steven Price,
	"Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
	"Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
	dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
	maz@kernel.org, will@kernel.org
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 24, 2022, Quentin Perret wrote:
> For Protected KVM (and I suspect most other confidential computing
> solutions), guests have the ability to share some of their pages back
> with the host kernel using a dedicated hypercall. This is necessary
> for e.g. virtio communications, so these shared pages need to be mapped
> back into the VMM's address space. I'm a bit confused about how that
> would work with the approach proposed here. What is going to be the
> approach for TDX?
>
> It feels like the most 'natural' thing would be to have a KVM exit
> reason describing which pages have been shared back by the guest, and to
> then allow the VMM to mmap those specific pages in response in the
> memfd. Is this something that has been discussed or considered?

The proposed solution is to exit to userspace with a new exit reason,
KVM_EXIT_MEMORY_ERROR, when the guest makes the hypercall to request
conversion[1]. The private fd itself will never allow mapping memory into
userspace, instead userspace will need to punch a hole in the private fd
backing store.
The absence of a valid mapping in the private fd is how KVM detects that a
pfn is "shared" (memslots without a private fd are always shared)[2].

The key point is that KVM never decides to convert between shared and private,
it's always a userspace decision. Like normal memslots, where userspace has
full control over which gfns are valid, this gives userspace full control over
whether a gfn is shared or private at any given time.

Another important detail is that this approach means the kernel and KVM treat
the shared backing store and private backing store as independent, albeit
related, entities. This is very deliberate as it makes it easier to reason
about what is and isn't allowed/required. E.g. the kernel only needs to handle
freeing private memory, there is no special handling for conversion to shared
because no such path exists as far as host pfns are concerned. And userspace
doesn't need any new "rules" for protecting itself against a malicious guest,
e.g. userspace already needs to ensure that it has a valid mapping prior to
accessing guest memory (or be able to handle any resulting signals). A
malicious guest can DoS itself by instructing userspace to communicate over
memory that is currently mapped private, but there are no novel attack vectors
from the host's perspective, as coercing the host into accessing an invalid
mapping after shared=>private conversion is just a variant of a use-after-free.

One potential conversion that's TBD (at least, I think it is, I haven't read
through this most recent version) is how to support populating guest private
memory with non-zero data, e.g. to allow in-place conversion of the initial
guest firmware instead of having to do an extra memcpy().

[1] KVM will also exit to userspace with the same info on "implicit"
    conversions, i.e. if the guest accesses the "wrong" GPA. Neither SEV-SNP
    nor TDX mandates explicit conversions in their guest<->host ABIs, so KVM
    has to support implicit conversions :-/

[2] Ideally (IMO), KVM would require userspace to completely remove the private
    memslot, but that's too slow due to use of SRCU in both KVM and userspace
    (QEMU at least uses SRCU for memslot changes).
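
For illustration only, a minimal userspace sketch of the "punch a hole in the
private fd" step described above. It is not code from the series. A plain
memfd stands in for the private fd (the real private fd in the patches is a
special memfd that cannot be mapped into userspace), and the helper names
private_fd_create() and convert_to_shared() are hypothetical. The only real
APIs used are memfd_create(2), ftruncate(2) and fallocate(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for memfd_create(), fallocate(), FALLOC_FL_* */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a stand-in for the private fd and fully
 * allocate its backing store, i.e. every offset starts out "private".
 * Assumption: an ordinary memfd is used purely for demonstration.
 */
int private_fd_create(off_t size)
{
	int fd = memfd_create("guest-private", MFD_CLOEXEC);

	if (fd < 0)
		return -1;

	if (ftruncate(fd, size) < 0 || fallocate(fd, 0, 0, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: private=>shared conversion as seen from userspace.
 * Punching a hole drops the backing pages for [offset, offset + len), so
 * the private fd no longer has a valid mapping for that range.
 * FALLOC_FL_KEEP_SIZE is mandatory alongside FALLOC_FL_PUNCH_HOLE.
 */
int convert_to_shared(int private_fd, off_t offset, off_t len)
{
	/* A negative offset or empty range is a caller bug in this sketch. */
	assert(offset >= 0 && len > 0);

	return fallocate(private_fd,
			 FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}
```

In a real VMM, convert_to_shared() would presumably be called from the
KVM_EXIT_MEMORY_ERROR handler, with the offset and length derived from the
GPA range reported in the exit. That plumbing is left out here because the
exact layout of the exit info is defined by the series, not by this sketch.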