From: David Matlack
Date: Wed, 25 May 2022 14:45:01 -0700
Subject: Re: [PATCH] kvm: x86/svm/nested: Cache PDPTEs for nested NPT in PAE paging mode
To: Sean Christopherson
Cc: Lai Jiangshan, LKML, Lai Jiangshan, Paolo Bonzini, Vitaly Kuznetsov,
    Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, X86 ML, "H. Peter Anvin", Marcelo Tosatti,
    Avi Kivity, kvm list
References: <20220415103414.86555-1-jiangshanlai@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 16, 2022 at 2:06 PM Sean Christopherson wrote:
>
> On Fri, Apr 15, 2022, Lai Jiangshan wrote:
> > From: Lai Jiangshan
> >
> > When NPT enabled L1 is PAE paging, vcpu->arch.mmu->get_pdptrs() which
> > is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from memory
> > unconditionally for each call.
> >
> > The guest PAE root page is not write-protected.
> >
> > The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get different
> > values every time or it is different from the return value of
> > mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> >
> > And it will cause FNAME(fetch) installs the spte in a wrong sp
> > or links a sp to a wrong parent since FNAME(gpte_changed) can't
> > check these kinds of changes.
> >
> > Cache the PDPTEs and the problem is resolved. The guest is responsible
> > to inform the host if its PAE root page is updated which will cause
> > nested vmexit and the host updates the cache when next nested run.
>
> Hmm, no, the guest is responsible for invalidating translations that can be
> cached in the TLB, but the guest is not responsible for a full reload of PDPTEs.
> Per the APM, the PDPTEs can be cached like regular PTEs:
>
> Under SVM, however, when the processor is in guest mode with PAE enabled, the
> guest PDPT entries are not cached or validated at this point, but instead are
> loaded and checked on demand in the normal course of address translation, just
> like page directory and page table entries. Any reserved bit violations are
> detected at the point of use, and result in a page-fault (#PF) exception rather
> than a general-protection (#GP) exception.

This paragraph from the APM describes the behavior of CR3 loads while
in SVM guest-mode. But this patch is changing how KVM emulates SVM
host-mode (i.e. L1), right? It seems like AMD makes no guarantee
whether or not CR3 loads pre-load PDPTEs while in SVM host-mode.
(Although the APM does say that "modern processors" do not pre-load
PDPTEs.)

>
> So if L1 modifies a PDPTE from !PRESENT (or RESERVED) to PRESENT (and valid), then
> any active L2 vCPUs should recognize the new PDPTE without a nested VM-Exit because
> the old entry can't have been cached in the TLB.
>
> In practice, snapshotting at nested VMRUN would likely work, but architecturally
> it's wrong and could cause problems if L1+L2 are engaged in paravirt shenanigans,
> e.g. async #PF comes to mind.
>
> I believe the correct way to fix this is to write-protect nNPT PDPTEs like all other
> shadow pages, which shouldn't be too awful to do as part of your series to route
> PDPTEs through kvm_mmu_get_page().
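
To make the two concerns in this thread concrete, here is a toy
user-space model in C. It is only a sketch and not KVM code: every
toy_* name, the struct layouts, and the PDPTE values are hypothetical
and chosen for illustration. The first two helpers model the problem
the patch describes (two uncached reads of the same PDPTE can disagree
if the guest writes in between, while a snapshot stays self-consistent).
The third models the objection quoted above (a snapshot does not notice
a !PRESENT to PRESENT change until it is refreshed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_PDPTES 4     /* PAE paging has four PDPTEs */
#define TOY_PRESENT   1ULL  /* bit 0: entry is present */

/* Hypothetical stand-in for the guest's (unprotected) PAE root page. */
struct toy_guest_mem {
	uint64_t pdpt[TOY_NR_PDPTES];
};

/* Hypothetical snapshot of the PDPTEs; not KVM's real data structure. */
struct toy_pdpte_cache {
	uint64_t pdpte[TOY_NR_PDPTES];
};

/* Uncached read: always goes to "guest memory". */
uint64_t toy_read_pdpte(const struct toy_guest_mem *mem, int index)
{
	assert(index >= 0 && index < TOY_NR_PDPTES);
	return mem->pdpt[index];
}

/* Copy all four PDPTEs once, e.g. at a (modeled) nested VMRUN. */
void toy_snapshot_pdptes(struct toy_pdpte_cache *cache,
			 const struct toy_guest_mem *mem)
{
	memcpy(cache->pdpte, mem->pdpt, sizeof(cache->pdpte));
}

/* Cached read: served from the snapshot, never from guest memory. */
uint64_t toy_cached_pdpte(const struct toy_pdpte_cache *cache, int index)
{
	assert(index >= 0 && index < TOY_NR_PDPTES);
	return cache->pdpte[index];
}

/* Two uncached reads with a guest write in between: do they differ? */
bool toy_uncached_reads_diverge(void)
{
	struct toy_guest_mem mem = { .pdpt = { 0x1000 | TOY_PRESENT } };
	uint64_t at_root_alloc = toy_read_pdpte(&mem, 0);

	mem.pdpt[0] = 0x2000 | TOY_PRESENT;	/* guest rewrites the entry */

	uint64_t at_walk = toy_read_pdpte(&mem, 0);
	return at_root_alloc != at_walk;
}

/* Same sequence, but both reads are served from one snapshot. */
bool toy_cached_reads_diverge(void)
{
	struct toy_guest_mem mem = { .pdpt = { 0x1000 | TOY_PRESENT } };
	struct toy_pdpte_cache cache;

	toy_snapshot_pdptes(&cache, &mem);
	uint64_t at_root_alloc = toy_cached_pdpte(&cache, 0);

	mem.pdpt[0] = 0x2000 | TOY_PRESENT;	/* guest rewrites the entry */

	uint64_t at_walk = toy_cached_pdpte(&cache, 0);
	return at_root_alloc != at_walk;
}

/* The guest flips an entry from !PRESENT to PRESENT after the snapshot. */
bool toy_snapshot_misses_new_pdpte(void)
{
	struct toy_guest_mem mem = { .pdpt = { 0 } };
	struct toy_pdpte_cache cache;

	toy_snapshot_pdptes(&cache, &mem);
	mem.pdpt[0] = 0x3000 | TOY_PRESENT;

	bool memory_present = (toy_read_pdpte(&mem, 0) & TOY_PRESENT) != 0;
	bool cache_present = (toy_cached_pdpte(&cache, 0) & TOY_PRESENT) != 0;
	return memory_present && !cache_present;
}
```

In this model the snapshot removes the inconsistency between two
reads, but it goes stale as soon as the guest makes a non-present
entry present. Write-protecting the root page, as suggested above,
would aim to catch the guest write itself instead; that path is not
modeled here.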