Date: Thu, 26 May 2022 16:40:23 +0000
From: David Matlack
To: Ben Gardon
Cc: kvm@vger.kernel.org, Paolo Bonzini, linux-kernel@vger.kernel.org,
    Peter Xu, Sean Christopherson, Jim Mattson, David Dunn, Jing Zhang,
    Junaid Shahid
Subject: Re: [PATCH] KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
References: <20220525230904.1584480-1-bgardon@google.com>
In-Reply-To: <20220525230904.1584480-1-bgardon@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 25, 2022 at 11:09:04PM +0000, Ben Gardon wrote:
> When disabling dirty logging, zap non-leaf parent entries to allow
> replacement with huge pages instead of recursing and zapping all of the
> child, leaf entries. This reduces the number of TLB flushes required.
>
> Currently disabling dirty logging with the TDP MMU is extremely slow.
> On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
> to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
> the shadow MMU. This patch reduces the disable dirty log time with the
> TDP MMU to ~3 seconds.

Nice!

It'd be good to also mention the new WARN. e.g.

  Opportunistically add a WARN() to catch GFNS that are mapped at a
  higher level than their max level.

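For a rough sense of why zapping the parent helps, here is a
back-of-the-envelope sketch. It is not kernel code. It assumes that every
4K page of the 96G guest is mapped by its own leaf SPTE while dirty
logging is enabled, and that each successful zap costs one remote TLB
flush; the helper name toy_entries_to_cover() is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of page-table entries of size page_bytes
 * needed to cover mem_bytes of guest memory. Assumes mem_bytes is a
 * multiple of page_bytes.
 */
static uint64_t toy_entries_to_cover(uint64_t mem_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0);
	return mem_bytes / page_bytes;
}
```

Under those assumptions, toy_entries_to_cover(96ULL << 30, 4096) gives
25165824 leaf zaps, while toy_entries_to_cover(96ULL << 30, 1ULL << 30)
gives 96 zaps at the 1G level. The real counts depend on how much of the
guest was actually mapped at 4K when logging is disabled.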
>
> Testing:
> Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
> patch introduced no new failures.
>
> Signed-off-by: Ben Gardon

Reviewed-by: David Matlack

> ---
>  arch/x86/kvm/mmu/tdp_iter.c |  9 +++++++++
>  arch/x86/kvm/mmu/tdp_iter.h |  1 +
>  arch/x86/kvm/mmu/tdp_mmu.c  | 38 +++++++++++++++++++++++++++++++------
>  3 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
> index 6d3b3e5a5533..ee4802d7b36c 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.c
> +++ b/arch/x86/kvm/mmu/tdp_iter.c
> @@ -145,6 +145,15 @@ static bool try_step_up(struct tdp_iter *iter)
>  	return true;
>  }
>  
> +/*
> + * Step the iterator back up a level in the paging structure. Should only be
> + * used when the iterator is below the root level.
> + */
> +void tdp_iter_step_up(struct tdp_iter *iter)
> +{
> +	WARN_ON(!try_step_up(iter));
> +}
> +
>  /*
>   * Step to the next SPTE in a pre-order traversal of the paging structure.
>   * To get to the next SPTE, the iterator either steps down towards the goal
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index f0af385c56e0..adfca0cf94d3 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -114,5 +114,6 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
>  		    int min_level, gfn_t next_last_level_gfn);
>  void tdp_iter_next(struct tdp_iter *iter);
>  void tdp_iter_restart(struct tdp_iter *iter);
> +void tdp_iter_step_up(struct tdp_iter *iter);
>  
>  #endif /* __KVM_X86_MMU_TDP_ITER_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 841feaa48be5..7b9265d67131 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1742,12 +1742,12 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  	gfn_t start = slot->base_gfn;
>  	gfn_t end = start + slot->npages;
>  	struct tdp_iter iter;
> +	int max_mapping_level;
>  	kvm_pfn_t pfn;
>  
>  	rcu_read_lock();
>  
>  	tdp_root_for_each_pte(iter, root, start, end) {
> -retry:
>  		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
>  			continue;
>  
> @@ -1755,15 +1755,41 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>  
> +		/*
> +		 * This is a leaf SPTE. Check if the PFN it maps can
> +		 * be mapped at a higher level.
> +		 */
>  		pfn = spte_to_pfn(iter.old_spte);
> -		if (kvm_is_reserved_pfn(pfn) ||
> -		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> -							    pfn, PG_LEVEL_NUM))
> +
> +		if (kvm_is_reserved_pfn(pfn))
>  			continue;
>  
> +		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
> +				iter.gfn, pfn, PG_LEVEL_NUM);
> +
> +		WARN_ON(max_mapping_level < iter.level);
> +
> +		/*
> +		 * If this page is already mapped at the highest
> +		 * viable level, there's nothing more to do.
> +		 */
> +		if (max_mapping_level == iter.level)
> +			continue;
> +
> +		/*
> +		 * The page can be remapped at a higher level, so step
> +		 * up to zap the parent SPTE.
> +		 */
> +		while (max_mapping_level > iter.level)
> +			tdp_iter_step_up(&iter);
> +
>  		/* Note, a successful atomic zap also does a remote TLB flush. */
> -		if (tdp_mmu_zap_spte_atomic(kvm, &iter))
> -			goto retry;
> +		tdp_mmu_zap_spte_atomic(kvm, &iter);
> +
> +		/*
> +		 * If the atomic zap fails, the iter will recurse back into
> +		 * the same subtree to retry.
> +		 */
>  	}
>  
>  	rcu_read_unlock();
> -- 
> 2.36.1.124.g0e6072fb45-goog
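
To illustrate the step-up loop in the last hunk, here is a toy userspace
model. It is a sketch only, not the kernel's tdp_iter: the struct, the
toy_* names, the fixed 4-level root and the 9-bits-per-level layout are
all assumptions made for the example. It shows how climbing from a 4K
leaf to max_level lands on the parent entry whose zap covers the whole
huge-page-sized range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ROOT_LEVEL     4	/* assumption: 4-level paging */
#define TOY_BITS_PER_LEVEL 9	/* assumption: 512 entries per table */

/* Hypothetical, stripped-down stand-in for struct tdp_iter. */
struct toy_iter {
	int level;	/* 1 = 4K, 2 = 2M, 3 = 1G */
	uint64_t gfn;	/* first GFN covered by the current entry */
};

/* Number of 4K GFNs covered by one entry at the given level. */
static uint64_t toy_pages_per_entry(int level)
{
	return 1ULL << ((level - 1) * TOY_BITS_PER_LEVEL);
}

/* Move to the parent entry; returns false if already at the root. */
static bool toy_try_step_up(struct toy_iter *it)
{
	if (it->level >= TOY_ROOT_LEVEL)
		return false;

	it->level++;
	/* Align the GFN down to the start of the parent's range. */
	it->gfn &= ~(toy_pages_per_entry(it->level) - 1);
	return true;
}

/*
 * Same shape as the loop in the patch: climb until the iterator sits at
 * max_level. Returns false when there is nothing to zap (the leaf is
 * already at max_level or above) or when the climb would pass the root.
 */
static bool toy_find_zap_target(struct toy_iter *it, int max_level)
{
	if (max_level <= it->level)
		return false;

	while (max_level > it->level) {
		if (!toy_try_step_up(it))
			return false;
	}
	return true;
}
```

Starting from a 4K leaf at GFN 0x40323, a max_level of 2 ends on the 2M
entry at GFN 0x40200, and a max_level of 3 ends on the 1G entry at GFN
0x40000. In the patch the entry found this way is then passed to
tdp_mmu_zap_spte_atomic().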