From: Ben Gardon
Date: Thu, 26 May 2022 08:57:43 -0700
Subject: Re: [PATCH] KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
To: Yuan Yao
Cc: kvm, Paolo Bonzini, LKML, Peter Xu, Sean Christopherson, David Matlack, Jim Mattson, David Dunn, Jing Zhang, Junaid Shahid
References: <20220525230904.1584480-1-bgardon@google.com> <20220526013010.ag4jzs7bbt5mudrg@yy-desk-7060>
In-Reply-To: <20220526013010.ag4jzs7bbt5mudrg@yy-desk-7060>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 25, 2022 at 6:30 PM Yuan Yao wrote:
>
> On Wed, May 25, 2022 at 11:09:04PM +0000, Ben Gardon wrote:
> > When disabling dirty logging, zap non-leaf parent entries to allow
> > replacement with huge pages instead of recursing and zapping all of the
> > child, leaf entries. This reduces the number of TLB flushes required.
> >
> > Currently disabling dirty logging with the TDP MMU is extremely slow.
> > On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
> > to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
> > the shadow MMU. This patch reduces the disable dirty log time with the
> > TDP MMU to ~3 seconds.
> >
> > Testing:
> > Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
> > patch introduced no new failures.
> >
> > Signed-off-by: Ben Gardon
> > ---
> >  arch/x86/kvm/mmu/tdp_iter.c |  9 +++++++++
> >  arch/x86/kvm/mmu/tdp_iter.h |  1 +
> >  arch/x86/kvm/mmu/tdp_mmu.c  | 38 +++++++++++++++++++++++++++++++------
> >  3 files changed, 42 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
> > index 6d3b3e5a5533..ee4802d7b36c 100644
> > --- a/arch/x86/kvm/mmu/tdp_iter.c
> > +++ b/arch/x86/kvm/mmu/tdp_iter.c
> > @@ -145,6 +145,15 @@ static bool try_step_up(struct tdp_iter *iter)
> >          return true;
> >  }
> >
> > +/*
> > + * Step the iterator back up a level in the paging structure. Should only be
> > + * used when the iterator is below the root level.
> > + */
> > +void tdp_iter_step_up(struct tdp_iter *iter)
> > +{
> > +        WARN_ON(!try_step_up(iter));
> > +}
> > +
> >  /*
> >   * Step to the next SPTE in a pre-order traversal of the paging structure.
> >   * To get to the next SPTE, the iterator either steps down towards the goal
> > diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> > index f0af385c56e0..adfca0cf94d3 100644
> > --- a/arch/x86/kvm/mmu/tdp_iter.h
> > +++ b/arch/x86/kvm/mmu/tdp_iter.h
> > @@ -114,5 +114,6 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> >                      int min_level, gfn_t next_last_level_gfn);
> >  void tdp_iter_next(struct tdp_iter *iter);
> >  void tdp_iter_restart(struct tdp_iter *iter);
> > +void tdp_iter_step_up(struct tdp_iter *iter);
> >
> >  #endif /* __KVM_X86_MMU_TDP_ITER_H */
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 841feaa48be5..7b9265d67131 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1742,12 +1742,12 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
> >          gfn_t start = slot->base_gfn;
> >          gfn_t end = start + slot->npages;
> >          struct tdp_iter iter;
> > +        int max_mapping_level;
> >          kvm_pfn_t pfn;
> >
> >          rcu_read_lock();
> >
> >          tdp_root_for_each_pte(iter, root, start, end) {
> > -retry:
> >                  if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
> >                          continue;
> >
> > @@ -1755,15 +1755,41 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
> >                      !is_last_spte(iter.old_spte, iter.level))
> >                          continue;
> >
> > +                /*
> > +                 * This is a leaf SPTE. Check if the PFN it maps can
> > +                 * be mapped at a higher level.
> > +                 */
> >                  pfn = spte_to_pfn(iter.old_spte);
> > -                if (kvm_is_reserved_pfn(pfn) ||
> > -                    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> > -                                                            pfn, PG_LEVEL_NUM))
> > +
> > +                if (kvm_is_reserved_pfn(pfn))
> >                          continue;
> >
> > +                max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
> > +                                iter.gfn, pfn, PG_LEVEL_NUM);
> > +
> > +                WARN_ON(max_mapping_level < iter.level);
> > +
> > +                /*
> > +                 * If this page is already mapped at the highest
> > +                 * viable level, there's nothing more to do.
> > +                 */
> > +                if (max_mapping_level == iter.level)
> > +                        continue;
> > +
> > +                /*
> > +                 * The page can be remapped at a higher level, so step
> > +                 * up to zap the parent SPTE.
> > +                 */
> > +                while (max_mapping_level > iter.level)
> > +                        tdp_iter_step_up(&iter);
>
> So the benefit from this is:
> Before: zapping 512 PTEs in a 4K-level page table does 512 TLB flushes.
> Now: zapping one higher-level (2MB) PTE does a single TLB flush, even
> though it also handles all 512 lower-level 4K PTEs; those are just atomic
> operations, see handle_removed_pt().
>
> Is my understanding correct?

Yes, that's exactly right.

>
> > +
> >                  /* Note, a successful atomic zap also does a remote TLB flush. */
> > -                if (tdp_mmu_zap_spte_atomic(kvm, &iter))
> > -                        goto retry;
> > +                tdp_mmu_zap_spte_atomic(kvm, &iter);
> > +
> > +                /*
> > +                 * If the atomic zap fails, the iter will recurse back into
> > +                 * the same subtree to retry.
> > +                 */
> >          }
> >
> >          rcu_read_unlock();
> > --
> > 2.36.1.124.g0e6072fb45-goog
> >
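
For a sense of scale, the flush arithmetic confirmed above can be sketched
as a small userspace C model. This is only an illustration, not kernel
code. The function names and the level numbering (1 = 4K, 2 = 2M, 3 = 1G)
are made up for the example, and it assumes exactly one remote TLB flush
per successful atomic zap, as in the discussion. 512 entries per table is
the x86-64 value.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64: each paging-structure page holds 512 entries. */
#define ENTRIES_PER_TABLE 512ULL

/*
 * Number of 4K leaf SPTEs covered by a single SPTE at `level`.
 * Hypothetical level numbering for this sketch: 1 = 4K, 2 = 2M, 3 = 1G.
 */
static uint64_t leaves_under(int level)
{
	uint64_t n = 1;

	assert(level >= 1 && level <= 3);
	for (int l = 1; l < level; l++)
		n *= ENTRIES_PER_TABLE;
	return n;
}

/*
 * Model of the old behaviour: every 4K leaf under the region is zapped
 * atomically on its own, and each zap is assumed to cost one remote flush.
 */
static uint64_t flushes_zapping_leaves(int max_mapping_level)
{
	return leaves_under(max_mapping_level);
}

/*
 * Model of the patched behaviour: step up to max_mapping_level and zap
 * the single parent SPTE, so one remote flush covers the whole region.
 */
static uint64_t flushes_zapping_parent(int max_mapping_level)
{
	(void)max_mapping_level; /* the count does not depend on the level */
	return 1;
}
```

Under these assumptions, a region that can be remapped at 2M goes from 512
flushes to 1, and a region that can be remapped at 1G goes from 512 * 512 =
262144 flushes to 1, which is consistent in direction with the ~200 second
to ~3 second improvement reported for the gigabyte-backed VM.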