Received: by 2002:a05:6358:a55:b0:ec:fcf4:3ecf with SMTP id 21csp3335187rwb; Fri, 20 Jan 2023 14:43:56 -0800 (PST) X-Google-Smtp-Source: AMrXdXt5KLNipYHnjngaG+2F+GXW8Pf1C/MTr3z03hrMfFJpExWV7FnrQTuXNHhLWx2q+PXvcyze X-Received: by 2002:a17:906:6819:b0:872:23b8:d6f1 with SMTP id k25-20020a170906681900b0087223b8d6f1mr18366633ejr.14.1674254636112; Fri, 20 Jan 2023 14:43:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674254636; cv=none; d=google.com; s=arc-20160816; b=VyT6mW9YWbaIovbRtmvmw4SDuvt67rfGm7PUZiKFCX4xSkkE+daoz7bNsq7EKUbUMf XcHihqjwBM3L8ZCWvCN8sVIvilHI+n3D+4tbWepL04tkS5EyFR28UcE5IbvRJ7X8Uxwa Q/sPziuqFI1sagzimPHBuk5kMB8cZC1BBEs6+KEc1WW2mJeHtVhBSyCGpH7mMHQ24qUA ZzcnNQ6UoQklqgbDWbJgXDCuzb/MA8R4Ed3FtHoMao1C41ClW5yB1SOVlMxinIuJnnHZ 2iaAPFErcUiuNHY8TO+NFO4ptBwXPtY1o3cdwCJh4eUQYtInRdtwruFegOGPkU8EDUZP VL0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=aUZ0ffAJxXSAs75mrnI1JYQKtxulHE6LLibfAEIYCUE=; b=vhJAxmDLQkJqoNOL2nbI1+7gtfDHzhLHzNGOJXk3PhnCjCl8DR98wnu8kC74OdidPh 2rGKalHLAIMRckiqvTYvLT4H6kbkJHFwzupTeeZP8fHauKk0vHNDQOS+BXVj4aTF6xRi INpmyNYHzY4efdGjJsbqENvoLlkImmc9z5VDoEz+1QClvEREcxMDlveJp74jN+lInvlt 6SGSJH7C/LPyu62EHpt3YHbpvyDKZlnPl6TLSBHFGPhi16n+IvC6y6j1h0Kq1f4Nh0t/ mYf9ZDSLMVL9nC8hWwleVPiGhhMq+q7m49sz6UjYi0/SXcYfMI1gSMKt5QOT7KuymV9f sqhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Hp9pWr8v; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xj1-20020a170906db0100b007c1753251a4si19937713ejb.928.2023.01.20.14.43.42; Fri, 20 Jan 2023 14:43:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Hp9pWr8v; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229991AbjATWWC (ORCPT + 50 others); Fri, 20 Jan 2023 17:22:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229556AbjATWWA (ORCPT ); Fri, 20 Jan 2023 17:22:00 -0500 Received: from mail-yw1-x112f.google.com (mail-yw1-x112f.google.com [IPv6:2607:f8b0:4864:20::112f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A404D6FD11 for ; Fri, 20 Jan 2023 14:21:58 -0800 (PST) Received: by mail-yw1-x112f.google.com with SMTP id 00721157ae682-5018be4ae8eso29354847b3.4 for ; Fri, 20 Jan 2023 14:21:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=aUZ0ffAJxXSAs75mrnI1JYQKtxulHE6LLibfAEIYCUE=; b=Hp9pWr8vPaWaHT+lrAMRQ2ucltxYK2bG/0HMjvFmx3K6MDIeTUplEsmdQbjt4FgodK PrAUBGgKy9MnLhlVVEPXB1G2qw5pC/mgmQ3kDbMFNr+T6lwuuu6/xbPfjv7QXrA4BTRI Kuga8w97T3EIpIO96C3x8Y1VQT7bioTSMyBKrcwydWpRy4cZkn0qDM2yGNYY4vs4dHhr 24ziDrJuOztpNU3hRnEqYJhQLzAvvRY7Oz7fi7HVA3ITzX4BSU1xXOYprlZZcQ/DHRyM KRYh/85BN6IavNe8kucJOQzQfX0xX28K08m5VF1bzzuteRw29hJ5+WlP0eows+mZyfW/ guIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=aUZ0ffAJxXSAs75mrnI1JYQKtxulHE6LLibfAEIYCUE=; b=7OWhsXZx/pR1B52RBdxq8CnaE1y0doiRQfHPO2H5VOFYSULogpRM6Eew7IMYk+foci 7ElxlDf763ZnXbQJeFRt/sR6KX3/YZ2ek+Tx9k9mDK5h86lUXZs0ZLLSRp/uMaXezH98 wStY8Dlu+IcLD6dmY0kHSTWiPEz4faLNY9hgn3IzwLddvVhmSf9ndVhLXm8k6g5lfNw7 x2Puc0yb6P2rVoEjC75/TaAaX1uTLo1ikOIyWk+fjUQxWAiqc+/W8nC5vM+mpkDklAG9 6ah+PGzwvFRo9dtsJN/Ri+6tuoV8MyKxETfIeWLpJoPicpiKMUits6yXRRnSb8/FcTf6 9RSw== X-Gm-Message-State: AFqh2koGSrXk5or2wQ0wOgm1hp0rRklvPaMTfTOvn443c/24wSgeUrfV xXgi7Ij+Ox7QNjwKtzscGzjKOwTaKpSRUquv+coTEw== X-Received: by 2002:a0d:e611:0:b0:4ff:b3a2:5962 with SMTP id p17-20020a0de611000000b004ffb3a25962mr467741ywe.111.1674253317753; Fri, 20 Jan 2023 14:21:57 -0800 (PST) MIME-Version: 1.0 References: <20230114111621.00001840@gmail.com> <20230117214414.00003229@gmail.com> <02b0e551647beed9ec3a2fefd3b659eb52c4846c.camel@intel.com> <1c71eda35e03372f29162c6a5286f5b4d1e1d7e1.camel@intel.com> In-Reply-To: From: David Matlack Date: Fri, 20 Jan 2023 14:21:31 -0800 Message-ID: Subject: Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure To: Sean Christopherson Cc: "Huang, Kai" , "sean.j.christopherson@intel.com" , "Shahar, Sagi" , "isaku.yamahata@gmail.com" , "Aktas, Erdem" , "kvm@vger.kernel.org" , "pbonzini@redhat.com" , "zhi.wang.linux@gmail.com" , "linux-kernel@vger.kernel.org" , "Yamahata, Isaku" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 19, 2023 at 4:16 PM Sean Christopherson wrote: > > On Thu, Jan 19, 2023, Huang, Kai wrote: > > On Thu, 2023-01-19 at 21:36 +0000, Sean Christopherson wrote: > > > The least invasive idea I have is expand the TDP MMU's concept of "frozen" SPTEs > > > and freeze (a.k.a. lock) the SPTE (KVM's mirror) until the corresponding S-EPT > > > update completes. > > > > This will introduce another "having-to-wait while SPTE is frozen" problem I > > think, which IIUC means (one way is) you have to do some loop and retry, perhaps > > similar to yield_safe. > > Yes, but because the TDP MMU already freezes SPTEs (just for a shorter duration), > I'm 99% sure all of the affected flows already know how to yield/bail when necessary. > > The problem with the zero-step mitigation is that it could (theoretically) cause > a "busy" error on literally any accesses, which makes it infeasible for KVM to have > sane behavior. E.g. freezing SPTEs to avoid the ordering issues isn't necessary > when holding mmu_lock for write, whereas the zero-step madness brings everything > into play. (I'm still ramping up on TDX so apologies in advance if the following is totally off base.) The complexity, and to a lesser extent the memory overhead, of mirroring Secure EPT tables with the TDP MMU makes me wonder if it is really worth it. Especially since the initial TDX support has so many constraints that would seem to allow a simpler implementation: all private memory is pinned, no live migration support, no test/clear young notifiers, etc. For the initial version of KVM TDX support, what if we implemented the Secure EPT management entirely off to the side? i.e. Not on top of the TDP MMU. For example, write TDX-specific routines for: - Fully populating the Secure EPT tree some time during VM creation. - Tearing down the Secure EPT tree during VM destruction. - Support for unmapping/mapping specific regions of the Secure EPT tree for private<->shared conversions. With that in place, KVM never would need to handle a fault on a Secure EPT mapping. Any fault (e.g. due to an in-progress private<->shared conversion) can just return back to the guest to retry the memory access until the operation is complete. If we start with only supporting 4K pages in the Secure EPT, the Secure EPT routines described above would be almost trivial to implement. Huge Pages would add some complexity, but I don't think it would be terrible. Concurrency can be handled with a single lock since we don't have to worry about concurrent faulting. This would avoid having TDX add a bunch of complexity to the TDP MMU (which would only be used for shared mappings). If and when we want to have more complicated memory management for TDX private mappings, we could revisit TDP MMU integration. But I think this design could even get us to the point of supporting Dirty Logging (where the only fault KVM would have to handle for TDX private mappings would be write-protection faults). I'm not sure it would work for Demand-Paging (at least the performance would not be great behind a single lock), but we can cross that bridge when we get there.