Date: Fri, 1 Apr 2022 18:24:44 +0000
From: Sean Christopherson
To: Quentin Perret
Cc: Andy Lutomirski, Steven Price, Chao Peng, kvm list,
    Linux Kernel Mailing List, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
    Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
    Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
    Mike Rapoport, "Maciej S . Szmigiero", Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
    Dave Hansen, Andi Kleen, David Hildenbrand, Marc Zyngier, Will Deacon
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 01, 2022, Quentin Perret wrote:
> On Friday 01 Apr 2022 at 17:14:21 (+0000), Sean Christopherson wrote:
> > On Fri, Apr 01, 2022, Quentin Perret wrote:
> > I assume there is a scenario where a page can be converted from shared=>private?
> > If so, is there a use case where that happens post-boot _and_ the contents of the
> > page are preserved?
>
> I think most of our use-cases are private=>shared, but how is that
> different?

Ah, it's not really different. What I was really trying to understand is whether
there are post-boot conversions that preserve data. I asked about shared=>private
because there are known pre-boot conversions, e.g. populating the initial guest
image, but AFAIK there are no use cases for post-boot conversions, which might be
more demanding in terms of performance.

> > > We currently don't allow the host to punch holes in the guest IPA space.
> >
> > The hole doesn't get punched in guest IPA space, it gets punched in the private
> > backing store, which is host PA space.
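For reference, punching a hole is an fallocate() operation on offsets in the backing fd, which is why it frees host-side memory rather than touching guest IPA space. A minimal sketch, assuming a plain memfd stands in for the series' private fd (no KVM, notifier, or stage-2 code is exercised): it fills two 4 KiB pages, punches out the second with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, and checks that the file size is unchanged while the punched range reads back as zeros. The helper name punch_hole_demo() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SZ 4096

/*
 * Hypothetical demo: a plain memfd stands in for the private backing store.
 * Returns 1 if the punched range reads back as zeros with the file size
 * unchanged, 0 on a mismatch, -1 if a syscall fails.
 */
int punch_hole_demo(void)
{
    unsigned char buf[PAGE_SZ];
    int fd = memfd_create("private-backing-demo", 0);
    if (fd < 0)
        return -1;

    /* Populate backing-store offsets 0 and PAGE_SZ (not guest IPAs). */
    memset(buf, 0xAA, sizeof(buf));
    if (pwrite(fd, buf, PAGE_SZ, 0) != PAGE_SZ ||
        pwrite(fd, buf, PAGE_SZ, PAGE_SZ) != PAGE_SZ) {
        close(fd);
        return -1;
    }

    /* Free the memory behind the second page; KEEP_SIZE preserves i_size. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  PAGE_SZ, PAGE_SZ) != 0) {
        close(fd);
        return -1;
    }

    int ok = 1;
    /* File size is still two pages. */
    if (lseek(fd, 0, SEEK_END) != 2 * PAGE_SZ)
        ok = 0;
    /* First page keeps its contents. */
    if (pread(fd, buf, PAGE_SZ, 0) != PAGE_SZ || buf[0] != 0xAA)
        ok = 0;
    /* Punched page reads back as zeros. */
    if (pread(fd, buf, PAGE_SZ, PAGE_SZ) != PAGE_SZ)
        ok = 0;
    for (size_t i = 0; ok && i < PAGE_SZ; i++)
        if (buf[i] != 0)
            ok = 0;

    close(fd);
    return ok;
}
```

On a Linux host, punch_hole_demo() should return 1: the punched range no longer has memory behind it and reads back as zeros, while the offset range itself (the analogue of the guest-visible mapping) is still valid.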
>
> Hmm, in a previous message I thought that you mentioned when a hole
> gets punched in the fd, KVM will go and unmap the page in the private
> SPTEs, which will cause a fatal error for any subsequent access from the
> guest to the corresponding IPA?

Oooh, that was in the context of TDX. Mixing VMX and arm64 terminology... TDX has
two separate stage-2 roots, one for private IPAs and one for shared IPAs. The
guest selects private/shared by toggling a bit stolen from the guest IPA space.
Upon conversion, KVM will remove the mapping from one stage-2 tree and insert it
into the other.

But even then, subsequent accesses to the wrong IPA won't be fatal, as KVM will
treat them as implicit conversions. I wish they could be fatal, but that's not
"allowed" given the guest/host contract dictated by the TDX specs.

> If that's correct, I meant that we currently don't support that - the
> host can't unmap anything from the guest stage-2, it can only tear it
> down entirely. But again, I'm not too worried about that, we could
> certainly implement that part without too many issues.

I believe for the pKVM case it wouldn't be unmapping, it would be a PFN change.

> > > Once it has donated a page to a guest, it can't have it back until the
> > > guest has been entirely torn down (at which point all of memory is
> > > poisoned by the hypervisor obviously).
> >
> > The guest doesn't have to know that it was handed back a different page. It will
> > require defining the semantics to state that the trusted hypervisor will clear
> > that page on conversion, but IMO the trusted hypervisor should be doing that
> > anyway. IMO, relying on the guest to correctly zero pages on conversion is
> > unnecessarily risky because converting private=>shared and preserving the contents
> > should be a very, very rare scenario, i.e. it's just one more thing for the guest
> > to get wrong.
>
> I'm not sure I agree.
> The guest is going to communicate with an untrusted entity via that
> shared page, so it had better be careful. Guest hardening in general is
> a major topic, and of all problems, zeroing the page before sharing is
> probably one of the simplest to solve.

Yes, for private=>shared you're correct, the guest needs to be paranoid as there
are no guarantees as to what data may be in the shared page. I was thinking more
in the context of shared=>private conversions, e.g. the guest is done sharing a
page and wants it back. In that case, forcing the guest to zero the private page
upon re-acceptance is dicey.

Hmm, but if the guest needs to explicitly re-accept the page, then putting the
onus on the guest to zero the page isn't a big deal. The pKVM contract would just
need to make it clear that the guest cannot make any assumptions about the state
of private data.

Oh, now I remember why I'm biased toward the trusted entity doing the work. IIRC,
thanks to TDX's lovely memory poisoning and cache aliasing behavior, the guest
can't be trusted to properly initialize private memory with the guest key, i.e.
the guest could induce a #MC and crash the host.

Anywho, I agree that for performance reasons, requiring the guest to zero private
pages is preferable so long as the guest must explicitly accept/initiate
conversions.

> Also, note that in pKVM all the hypervisor code at EL2 runs with
> preemption disabled, which is a strict constraint. As such, one of the
> main goals is to spend as little time as possible in that context.
> We're trying hard to keep the amount of zeroing/memcpy-ing to an
> absolute minimum. And that's especially true as we introduce support for
> huge pages. So, we'll take every opportunity we get to have the guest
> or the host do that work.

FWIW, TDX has the exact same constraints (they're actually worse as the trusted
entity runs with _all_ interrupts blocked). And yeah, it needs to be careful when
dealing with huge pages, e.g. many flows force the guest/host to do 512 * 4KiB
operations.
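The 512 figure is the ratio of a 2 MiB huge page to 4 KiB base pages (2 MiB / 4 KiB = 512). A minimal, hypothetical sketch of why that matters when the trusted side runs with preemption (or all interrupts) disabled: zero a huge page as 512 separate 4 KiB chunks, with a per-call budget so a caller could stop after a bounded amount of work and resume later. The function name, the budget parameter, and the resume-by-returning-an-index scheme are illustrative assumptions, not how pKVM or the TDX module actually structure this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SZ_4K (4UL * 1024)
#define SZ_2M (2UL * 1024 * 1024)

/* Number of 4 KiB operations needed to cover one 2 MiB huge page (512). */
#define OPS_PER_HUGE_PAGE (SZ_2M / SZ_4K)

/*
 * Hypothetical helper: zero a 2 MiB page one 4 KiB chunk at a time,
 * starting at chunk 'start' and doing at most 'budget' chunks per call,
 * so a non-preemptible context could bail out and resume later.
 * Returns the next chunk index; OPS_PER_HUGE_PAGE means done.
 */
size_t zero_huge_page_chunked(unsigned char *page, size_t start, size_t budget)
{
    size_t i = start;

    assert(start <= OPS_PER_HUGE_PAGE);
    for (; i < OPS_PER_HUGE_PAGE && budget > 0; i++, budget--)
        memset(page + i * SZ_4K, 0, SZ_4K);
    return i;
}
```

For example, with a budget of 64 a caller loops `next = zero_huge_page_chunked(page, next, 64);` until `next` reaches OPS_PER_HUGE_PAGE, which takes 8 calls; a smaller budget bounds the time spent per call at the cost of more round trips.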