Date: Wed, 25 May 2022 23:09:04 +0000
Message-Id: <20220525230904.1584480-1-bgardon@google.com>
Subject: [PATCH] KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
From: Ben Gardon
To: kvm@vger.kernel.org, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, Peter Xu, Sean Christopherson, David Matlack, Jim Mattson, David Dunn, Jing Zhang, Junaid Shahid, Ben Gardon

When disabling dirty logging, zap non-leaf parent entries to allow
replacement with huge pages instead of recursing and zapping all of the
child, leaf entries. This reduces the number of TLB flushes required.
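To make the TLB-flush argument concrete, here is a back-of-the-envelope model in C. It assumes the worst case, where every page of a guest backed by 1G pages has been split down to 4K SPTEs while dirty logging was enabled, and it counts one remote TLB flush per successful atomic zap, as the comment in zap_collapsible_spte_range() notes. The helper names are hypothetical; this is arithmetic, not KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Each x86-64 page-table page holds 512 entries (9 index bits per level). */
#define ENTRIES_PER_TABLE 512ULL

/* Illustrative level numbering, matching KVM's PG_LEVEL_4K/2M/1G = 1/2/3. */
enum { LVL_4K = 1, LVL_2M = 2, LVL_1G = 3 };

/* Number of leaf entries at leaf_level mapped beneath one entry at level. */
static uint64_t leaves_under(int level, int leaf_level)
{
	uint64_t n = 1;

	assert(level >= leaf_level);
	for (int l = level; l > leaf_level; l--)
		n *= ENTRIES_PER_TABLE;
	return n;
}

/*
 * Leaf-by-leaf zapping (worst case assumed above): one successful atomic
 * zap, and therefore one remote TLB flush, per 4K SPTE in each 1G region.
 */
static uint64_t flushes_zapping_leaves(uint64_t guest_gib)
{
	return guest_gib * leaves_under(LVL_1G, LVL_4K);
}

/*
 * Parent zapping: step up and zap the single 1G-level entry that covers
 * each 1G region, so one flush per GiB of guest memory.
 */
static uint64_t flushes_zapping_parents(uint64_t guest_gib)
{
	return guest_gib;
}
```

For a 96G guest, this worst-case model gives 25,165,824 leaf zaps versus 96 parent zaps. It bounds the flush count only; it does not predict the measured wall-clock times.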
Currently, disabling dirty logging with the TDP MMU is extremely slow. On a
96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds to
disable dirty logging with the TDP MMU, as opposed to ~4 seconds with the
shadow MMU. This patch reduces the time to disable dirty logging with the
TDP MMU to ~3 seconds.

Testing:
Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
patch introduced no new failures.

Signed-off-by: Ben Gardon
---
 arch/x86/kvm/mmu/tdp_iter.c |  9 +++++++++
 arch/x86/kvm/mmu/tdp_iter.h |  1 +
 arch/x86/kvm/mmu/tdp_mmu.c  | 38 +++++++++++++++++++++++++++++++------
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 6d3b3e5a5533..ee4802d7b36c 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -145,6 +145,15 @@ static bool try_step_up(struct tdp_iter *iter)
 	return true;
 }
 
+/*
+ * Step the iterator back up a level in the paging structure. Should only be
+ * used when the iterator is below the root level.
+ */
+void tdp_iter_step_up(struct tdp_iter *iter)
+{
+	WARN_ON(!try_step_up(iter));
+}
+
 /*
  * Step to the next SPTE in a pre-order traversal of the paging structure.
  * To get to the next SPTE, the iterator either steps down towards the goal
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..adfca0cf94d3 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -114,5 +114,6 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 		    int min_level, gfn_t next_last_level_gfn);
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
+void tdp_iter_step_up(struct tdp_iter *iter);
 
 #endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 841feaa48be5..7b9265d67131 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1742,12 +1742,12 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 	gfn_t start = slot->base_gfn;
 	gfn_t end = start + slot->npages;
 	struct tdp_iter iter;
+	int max_mapping_level;
 	kvm_pfn_t pfn;
 
 	rcu_read_lock();
 
 	tdp_root_for_each_pte(iter, root, start, end) {
-retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
@@ -1755,15 +1755,41 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
+		/*
+		 * This is a leaf SPTE. Check if the PFN it maps can
+		 * be mapped at a higher level.
+		 */
 		pfn = spte_to_pfn(iter.old_spte);
-		if (kvm_is_reserved_pfn(pfn) ||
-		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
-							    pfn, PG_LEVEL_NUM))
+
+		if (kvm_is_reserved_pfn(pfn))
 			continue;
 
+		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
+				iter.gfn, pfn, PG_LEVEL_NUM);
+
+		WARN_ON(max_mapping_level < iter.level);
+
+		/*
+		 * If this page is already mapped at the highest
+		 * viable level, there's nothing more to do.
+		 */
+		if (max_mapping_level == iter.level)
+			continue;
+
+		/*
+		 * The page can be remapped at a higher level, so step
+		 * up to zap the parent SPTE.
+		 */
+		while (max_mapping_level > iter.level)
+			tdp_iter_step_up(&iter);
+
 		/* Note, a successful atomic zap also does a remote TLB flush. */
-		if (tdp_mmu_zap_spte_atomic(kvm, &iter))
-			goto retry;
+		tdp_mmu_zap_spte_atomic(kvm, &iter);
+
+		/*
+		 * If the atomic zap fails, the iter will recurse back into
+		 * the same subtree to retry.
+		 */
 	}
 
 	rcu_read_unlock();
-- 
2.36.1.124.g0e6072fb45-goog
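The heart of the new loop is the climb from a collapsible leaf to the entry that should be zapped. The standalone C sketch below models only that step, assuming a toy iterator that tracks just a level and a gfn: stepping up increments the level and rounds the gfn down to the start of the region covered at the new level, in the spirit of try_step_up(). The struct and function names are hypothetical. Unlike the patch, which WARNs if max_mapping_level is below the current level, the toy simply returns false in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy iterator: only the fields needed to show the step-up. */
struct toy_iter {
	int level;      /* 1 = 4K, 2 = 2M, 3 = 1G, ... */
	int root_level; /* e.g. 4 for 4-level paging */
	uint64_t gfn;   /* first guest frame covered by the current entry */
};

/* Guest pages covered by one entry at level: 512^(level - 1). */
static uint64_t pages_per_entry(int level)
{
	assert(level >= 1);
	return 1ULL << ((level - 1) * 9);
}

/* Align gfn down to the first page covered by an entry at level. */
static uint64_t round_gfn_for_level(uint64_t gfn, int level)
{
	return gfn & ~(pages_per_entry(level) - 1);
}

/* Step up one level; fails only when already at the root. */
static bool toy_try_step_up(struct toy_iter *iter)
{
	if (iter->level == iter->root_level)
		return false;
	iter->level++;
	iter->gfn = round_gfn_for_level(iter->gfn, iter->level);
	return true;
}

/*
 * Starting at a leaf, climb until the iterator sits at the highest level
 * the page could be mapped at; that entry is the one to zap. Returns false
 * if the leaf is already at (or above) its maximum mapping level.
 */
static bool toy_find_zap_target(struct toy_iter *iter, int max_mapping_level)
{
	if (max_mapping_level <= iter->level)
		return false;
	while (max_mapping_level > iter->level) {
		if (!toy_try_step_up(iter)) {
			fprintf(stderr, "cannot step above root level\n");
			return false;
		}
	}
	return true;
}
```

Starting from a 4K leaf at gfn 0x40323 with a maximum mapping level of 1G, the toy iterator passes through the 2M entry at gfn 0x40200 and stops at the 1G entry at gfn 0x40000, which covers the whole region.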