From: Lai Jiangshan
Date: Tue, 17 May 2022 09:10:50 +0800
Subject: Re: [PATCH] kvm: x86/svm/nested: Cache PDPTEs for nested NPT in PAE paging mode
To: Sean Christopherson
Cc: LKML, Lai Jiangshan, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, X86 ML, "H. Peter Anvin", Marcelo Tosatti, Avi Kivity,
    "open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)"
References: <20220415103414.86555-1-jiangshanlai@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 17, 2022 at 9:02 AM Lai Jiangshan wrote:
>
> On Tue, May 17, 2022 at 4:45 AM Sean Christopherson wrote:
> >
> > On Fri, Apr 15, 2022, Lai Jiangshan wrote:
> > > From: Lai Jiangshan
> > >
> > > When NPT enabled L1 is PAE paging, vcpu->arch.mmu->get_pdptrs() which
> > > is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from memory
> > > unconditionally for each call.
> > >
> > > The guest PAE root page is not write-protected.
> > >
> > > The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get different
> > > values every time or it is different from the return value of
> > > mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> > >
> > > And it will cause FNAME(fetch) to install the spte in a wrong sp
> > > or link a sp to a wrong parent since FNAME(gpte_changed) can't
> > > check this kind of change.
> > >
> > > Cache the PDPTEs and the problem is resolved. The guest is responsible
> > > for informing the host if its PAE root page is updated, which will cause
> > > a nested vmexit, and the host updates the cache on the next nested run.
> >
> > Hmm, no, the guest is responsible for invalidating translations that can be
> > cached in the TLB, but the guest is not responsible for a full reload of PDPTEs.
> > Per the APM, the PDPTEs can be cached like regular PTEs:
> >
> >   Under SVM, however, when the processor is in guest mode with PAE enabled, the
> >   guest PDPT entries are not cached or validated at this point, but instead are
> >   loaded and checked on demand in the normal course of address translation, just
> >   like page directory and page table entries. Any reserved bit violations are
> >   detected at the point of use, and result in a page-fault (#PF) exception rather
> >   than a general-protection (#GP) exception.
> >
> > So if L1 modifies a PDPTE from !PRESENT (or RESERVED) to PRESENT (and valid), then
> > any active L2 vCPUs should recognize the new PDPTE without a nested VM-Exit because
> > the old entry can't have been cached in the TLB.
>
> In this case, it is still !PRESENT in the shadow page, and it will cause
> a vmexit when L2 tries to use the translation. I can't see anything wrong
> in the TLB or vTLB (shadow pages).
>
> But I think some code is needed to reload the cached PDPTEs
> when guest_mmu->get_pdptrs() returns !PRESENT and to reload the mmu if
> the cache is changed (and to add code to avoid an infinite loop).
>
> The patch fails to reload the mmu if the cache is changed, which
> leaves the problem described in the changelog only partially
> resolved.
>
> Maybe we need to add the mmu parameter back in load_pdptrs() for it.
>

It is still too complicated, I will try to do a direct check in FNAME(fetch)
instead of (re-)using the cache.

> >
> > In practice, snapshotting at nested VMRUN would likely work, but architecturally
> > it's wrong and could cause problems if L1+L2 are engaged in paravirt shenanigans,
> > e.g. async #PF comes to mind.
> >
> > I believe the correct way to fix this is to write-protect nNPT PDPTEs like all other
> > shadow pages, which shouldn't be too awful to do as part of your series to route
> > PDPTEs through kvm_mmu_get_page().
>
> In the one-off special shadow page (will be renamed to one-off local
> shadow page) patchsets, PAE PDPTEs are not write-protected. Write-protecting
> them causes nasty code.
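
For readers following the thread, here is a minimal sketch in plain userspace C
(not kernel code) of the hazard the changelog describes: an uncached read of a
PDPTE from a root page that is not write-protected can return different values
on successive calls, while a snapshot stays stable but can go stale. Every name
below (struct guest_pae_root, struct pdpte_cache, snapshot_pdptes(), and so on)
is hypothetical and chosen for illustration; none of it is KVM's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDPTE_PRESENT 0x1ULL
#define NR_PDPTES     4

/* Hypothetical stand-in for the guest's PAE root page (not write-protected). */
struct guest_pae_root {
	uint64_t pdpte[NR_PDPTES];
};

/* Hypothetical snapshot of the four PDPTEs, taken once per nested run. */
struct pdpte_cache {
	uint64_t pdpte[NR_PDPTES];
	bool valid;
};

/* Uncached read: every call goes back to guest memory. */
static uint64_t read_pdpte_uncached(const struct guest_pae_root *root, int idx)
{
	return root->pdpte[idx];
}

/* Copy all PDPTEs into the cache so later walks see one consistent view. */
static void snapshot_pdptes(struct pdpte_cache *cache,
			    const struct guest_pae_root *root)
{
	for (int i = 0; i < NR_PDPTES; i++)
		cache->pdpte[i] = root->pdpte[i];
	cache->valid = true;
}

/* Cached read: returns the snapshot value, whatever the guest wrote since. */
static uint64_t read_pdpte_cached(const struct pdpte_cache *cache, int idx)
{
	assert(cache->valid);
	return cache->pdpte[idx];
}

/* True if guest memory no longer matches the snapshot. */
static bool pdpte_cache_stale(const struct pdpte_cache *cache,
			      const struct guest_pae_root *root)
{
	for (int i = 0; i < NR_PDPTES; i++)
		if (cache->pdpte[i] != root->pdpte[i])
			return true;
	return false;
}
```

In this toy model, a guest write between two calls to read_pdpte_uncached()
makes them disagree, which mirrors the mismatch between the two get_pdptrs()
callers named in the changelog. The snapshot avoids that mismatch, but
pdpte_cache_stale() then becomes true, which is the case the replies above say
still needs a reload (or, alternatively, write-protection or a direct check).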