MIME-Version: 1.0
References: <20220525230904.1584480-1-bgardon@google.com>
In-Reply-To: <20220525230904.1584480-1-bgardon@google.com>
From: Ben Gardon
Date: Wed, 25 May 2022 16:12:28 -0700
Subject: Re: [PATCH] KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
To: kvm, Paolo Bonzini
Cc: LKML, Peter Xu, Sean Christopherson, David Matlack, Jim Mattson, David Dunn, Jing Zhang, Junaid Shahid
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 25, 2022 at 4:09 PM Ben Gardon wrote:
>
> When disabling dirty logging, zap non-leaf parent entries to allow
> replacement with huge pages instead of recursing and zapping all of the
> child, leaf entries. This reduces the number of TLB flushes required.
>
> Currently disabling dirty logging with the TDP MMU is extremely slow.
> On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
> to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
> the shadow MMU. This patch reduces the disable dirty log time with the
> TDP MMU to ~3 seconds.

After Sean pointed out that I was changing the zapping scheme to zap
non-leaf SPTEs in my in-place promotion series, I started to wonder if
that would provide good-enough performance without the complexity of
in-place promotion. It turns out it does! This relatively simple patch
gives essentially the same disable time as the in-place promotion series.
The main downside to this approach is that it does cause all the vCPUs to
take page faults, so it may still be worth investigating in-place
promotion.

> Testing:
> Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
> patch introduced no new failures.
>
> Signed-off-by: Ben Gardon
> ---
>  arch/x86/kvm/mmu/tdp_iter.c |  9 +++++++++
>  arch/x86/kvm/mmu/tdp_iter.h |  1 +
>  arch/x86/kvm/mmu/tdp_mmu.c  | 38 +++++++++++++++++++++++++++++++------
>  3 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
> index 6d3b3e5a5533..ee4802d7b36c 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.c
> +++ b/arch/x86/kvm/mmu/tdp_iter.c
> @@ -145,6 +145,15 @@ static bool try_step_up(struct tdp_iter *iter)
>  	return true;
>  }
>
> +/*
> + * Step the iterator back up a level in the paging structure. Should only be
> + * used when the iterator is below the root level.
> + */
> +void tdp_iter_step_up(struct tdp_iter *iter)
> +{
> +	WARN_ON(!try_step_up(iter));
> +}
> +
>  /*
>   * Step to the next SPTE in a pre-order traversal of the paging structure.
>   * To get to the next SPTE, the iterator either steps down towards the goal
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index f0af385c56e0..adfca0cf94d3 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -114,5 +114,6 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
>  		    int min_level, gfn_t next_last_level_gfn);
>  void tdp_iter_next(struct tdp_iter *iter);
>  void tdp_iter_restart(struct tdp_iter *iter);
> +void tdp_iter_step_up(struct tdp_iter *iter);
>
>  #endif /* __KVM_X86_MMU_TDP_ITER_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 841feaa48be5..7b9265d67131 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1742,12 +1742,12 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  	gfn_t start = slot->base_gfn;
>  	gfn_t end = start + slot->npages;
>  	struct tdp_iter iter;
> +	int max_mapping_level;
>  	kvm_pfn_t pfn;
>
>  	rcu_read_lock();
>
>  	tdp_root_for_each_pte(iter, root, start, end) {
> -retry:
>  		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
>  			continue;
>
> @@ -1755,15 +1755,41 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>
> +		/*
> +		 * This is a leaf SPTE. Check if the PFN it maps can
> +		 * be mapped at a higher level.
> +		 */
>  		pfn = spte_to_pfn(iter.old_spte);
> -		if (kvm_is_reserved_pfn(pfn) ||
> -		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> -							    pfn, PG_LEVEL_NUM))
> +
> +		if (kvm_is_reserved_pfn(pfn))
>  			continue;
>
> +		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
> +				iter.gfn, pfn, PG_LEVEL_NUM);
> +
> +		WARN_ON(max_mapping_level < iter.level);
> +
> +		/*
> +		 * If this page is already mapped at the highest
> +		 * viable level, there's nothing more to do.
> +		 */
> +		if (max_mapping_level == iter.level)
> +			continue;
> +
> +		/*
> +		 * The page can be remapped at a higher level, so step
> +		 * up to zap the parent SPTE.
> +		 */
> +		while (max_mapping_level > iter.level)
> +			tdp_iter_step_up(&iter);
> +
>  		/* Note, a successful atomic zap also does a remote TLB flush. */
> -		if (tdp_mmu_zap_spte_atomic(kvm, &iter))
> -			goto retry;
> +		tdp_mmu_zap_spte_atomic(kvm, &iter);
> +
> +		/*
> +		 * If the atomic zap fails, the iter will recurse back into
> +		 * the same subtree to retry.
> +		 */
>  	}
>
>  	rcu_read_unlock();
> --
> 2.36.1.124.g0e6072fb45-goog
>
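For readers following the discussion outside the kernel tree, the core idea of the patch, stepping the iterator up from a leaf SPTE to the highest mappable level before zapping once, can be modeled in a few lines of standalone C. This is a simplified sketch: `toy_iter`, `step_up_to_max_level`, and the level numbering are illustrative stand-ins for the kernel's `struct tdp_iter` and `PG_LEVEL_*` values, not real KVM API.

```c
#include <assert.h>

/*
 * Toy model of the paging-structure iterator. Levels are numbered as in
 * x86: 1 = 4K leaf, 2 = 2M, 3 = 1G (illustrative, not kernel constants).
 */
struct toy_iter {
	int level;	/* current level in the paging structure */
};

/* Mirrors tdp_iter_step_up(): move one level toward the root. */
static void toy_iter_step_up(struct toy_iter *iter)
{
	iter->level++;
}

/*
 * Mirrors the while loop added to zap_collapsible_spte_range(): from a
 * leaf SPTE, walk up until the iterator sits at the highest level the
 * backing page could be mapped at. Zapping that single parent SPTE
 * (one atomic zap, one remote TLB flush) then replaces zapping every
 * child leaf individually, which is where the speedup comes from.
 */
static int step_up_to_max_level(struct toy_iter *iter, int max_mapping_level)
{
	while (max_mapping_level > iter->level)
		toy_iter_step_up(iter);
	return iter->level;
}
```

Note that when the leaf is already at the maximum viable level the loop takes zero steps, matching the early `continue` in the patch.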