Date: Fri, 11 Aug 2023 04:51:13 +0000
Message-ID: <20230811045127.3308641-1-rananta@google.com>
Subject: [PATCH v9 00/14] KVM: arm64: Add support for FEAT_TLBIRANGE
From: Raghavendra Rao Ananta <rananta@google.com>
To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
Cc: Paolo Bonzini, Sean Christopherson, Huacai Chen, Zenghui Yu,
    Anup Patel, Atish Patra, Jing Zhang, Reiji Watanabe, Colton Lewis,
    Raghavendra Rao Anata, David Matlack, Fuad Tabba,
    linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
    kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org

In certain code paths, KVM/ARM currently invalidates the entire VM's
page-tables instead of just the necessary range. For example, when
collapsing a table PTE into a block PTE, instead of iterating over each
PTE and flushing it, KVM uses the 'vmalls12e1is' TLBI operation to
flush all the entries. This is inefficient since the guest has to
refill the TLBs again, even for the addresses that aren't covered by
the table entry. The performance impact scales poorly when many
addresses in the VM go through this remapping.

For architectures that implement FEAT_TLBIRANGE, KVM can replace such
inefficient paths by performing the invalidations only on the range of
addresses that are in scope. This series does that for the stage-2 map,
unmap, and write-protect paths.

As suggested by Oliver on v5 of the series [1], I'm posting the series
by including v2 of David Matlack's 'KVM: Add a common API for
range-based TLB invalidation' series [2].

Patches 1-6 include David M.'s patches 1, 2, 6, and 7 from [2], with
minor modifications as per upstream comments. Patch-7 refactors the
core arm64 __flush_tlb_range() to be usable by other entities, and
patch-8 introduces a wrapper over it, __flush_s2_tlb_range_op(), more
suited for stage-2 flushes. Patches 9 and 10 add a range-based TLBI
mechanism for KVM (VHE and nVHE). Patch-11 implements
kvm_arch_flush_remote_tlbs_range() for arm64. Patch-12 flushes only the
memslot that undergoes a write-protect, instead of the entire VM.
Patch-13 changes stage2_try_break_pte() to use the range-based TLBI
instructions when collapsing a table entry; the map path is the
immediate consumer of this when KVM remaps a table entry into a block.
Patch-14 modifies the stage-2 unmap path so that, if the system
supports FEAT_TLBIRANGE, the TLB invalidations are skipped during the
page-table walk and instead done in one go after the entire walk is
finished.

The series is based on upstream v6.5-rc1.
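To make the range-flush decision concrete, here is a minimal,
self-contained userspace C model of what kvm_tlb_flush_vmid_range() is
described to do in this series (see also the v3/v5 changelogs below):
fall back to a full-VMID invalidation when FEAT_TLBIRANGE is absent,
otherwise split the request so no single operation exceeds
MAX_TLBI_RANGE_PAGES. The helpers and printf() bodies are hypothetical
stand-ins for the real __kvm_tlb_flush_vmid()/__kvm_tlb_flush_vmid_range()
plumbing, not the kernel code:

/* Userspace model only -- NOT the kernel implementation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT		12
#define MAX_TLBI_RANGE_PAGES	(1ULL << 21)	/* assumed arm64 limit */

static bool have_feat_tlbirange(void)
{
	return true;	/* assumption: the CPU implements FEAT_TLBIRANGE */
}

static void flush_vmid_all(uint16_t vmid)
{
	printf("vmid %u: invalidate entire stage-2 (vmalls12e1is)\n", vmid);
}

static void flush_vmid_range(uint16_t vmid, uint64_t ipa, uint64_t pages)
{
	printf("vmid %u: range op at ipa 0x%llx covering %llu pages\n",
	       vmid, (unsigned long long)ipa, (unsigned long long)pages);
}

/* Models the decision kvm_tlb_flush_vmid_range() makes in the series. */
static void tlb_flush_vmid_range(uint16_t vmid, uint64_t ipa,
				 uint64_t nr_pages)
{
	if (!have_feat_tlbirange()) {
		flush_vmid_all(vmid);	/* no range support: flush it all */
		return;
	}

	while (nr_pages) {		/* split over-large requests */
		uint64_t chunk = nr_pages < MAX_TLBI_RANGE_PAGES ?
				 nr_pages : MAX_TLBI_RANGE_PAGES;

		flush_vmid_range(vmid, ipa, chunk);
		ipa += chunk << PAGE_SHIFT;
		nr_pages -= chunk;
	}
}

int main(void)
{
	/* e.g. collapsing 512 4K leaf PTEs into a single 2M block */
	tlb_flush_vmid_range(1, 0x40000000ULL, 512);
	return 0;
}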
The performance evaluation was done on hardware that supports
FEAT_TLBIRANGE, in a VHE configuration, using a modified
kvm_page_table_test. The modified version updates the guest code in the
ADJUST_MAPPINGS case to not only access a given page but also to access
up to 512 pages backwards for every new page it iterates through. This
is done to test the effect of TLBI misses after KVM has handled a
fault. The series captures the impact in the map and unmap paths as
described above.

$ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i

+--------+------------------------------+------------------------------+
| mem_sz |     ADJUST_MAPPINGS (s)      |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
+--------+----------+-------------------+----------+-------------------+
|   1    |   3.33   |       3.22        |  0.009   |       0.005       |
|   2    |   7.39   |       7.32        |  0.012   |       0.006       |
|   4    |  13.49   |      10.50        |  0.017   |       0.008       |
|   8    |  21.60   |      21.50        |  0.027   |       0.011       |
|  16    |  57.02   |      43.63        |  0.046   |       0.018       |
|  32    |  95.92   |      83.26        |  0.087   |       0.030       |
|  64    |  199.57  |      165.14       |  0.146   |       0.055       |
|  128   |  423.65  |      349.37       |  0.280   |       0.100       |
+--------+----------+-------------------+----------+-------------------+

$ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i

+--------+------------------------------+
| vCPUs  |     ADJUST_MAPPINGS (s)      |
|        | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    |  111.44  |      114.63       |
|   2    |  102.88  |       74.64       |
|   4    |  134.83  |       98.78       |
|   8    |   98.81  |       95.01       |
|  16    |  127.41  |       99.05       |
|  32    |  105.35  |       91.75       |
|  64    |  201.13  |      163.63       |
|  128   |  423.65  |      349.37       |
+--------+----------+-------------------+

For the ADJUST_MAPPINGS cases, which map the 4K table entries back into
2M hugepages, the series sees an average improvement of ~15%. For
unmapping 2M hugepages, we see a gain of 2x to 3x.

$ kvm_page_table_test -m 2 -b $i

+--------+------------------------------+
| mem_sz |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    |   0.54   |       0.13        |
|   2    |   1.07   |       0.25        |
|   4    |   2.10   |       0.47        |
|   8    |   4.19   |       0.92        |
|  16    |   8.35   |       1.92        |
|  32    |  16.66   |       3.61        |
|  64    |  32.36   |       7.62        |
|  128   |  64.65   |      14.39        |
+--------+----------+-------------------+

The series sees an average gain of 4x when the guest is backed by
PAGE_SIZE (4K) pages.
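The unmap gains above come from patch-14's deferral: instead of one
TLBI per cleared PTE during the walk, a single range-based operation is
issued once the walk finishes. Below is a toy, userspace-compilable C
model of that idea, assuming FEAT_TLBIRANGE is present; pte[],
tlbi_ipa() and tlbi_range() are invented for illustration, and the real
logic lives in the stage-2 unmap walker
(stage2_unmap_put_pte()/kvm_pgtable_stage2_unmap()):

/* Toy model of the deferred-flush unmap walk -- NOT the kernel code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PTES   512
#define PAGE_SIZE 4096ULL

static uint64_t pte[NR_PTES];	/* toy stage-2 leaf entries */

/* Analogue of stage2_unmap_defer_tlb_flush(); the real check also
 * involves FWB, and is fixed via alternative patching. */
static bool defer_tlb_flush(void)
{
	return true;	/* assumption: FEAT_TLBIRANGE is supported */
}

static void tlbi_ipa(uint64_t ipa)
{
	printf("per-PTE TLBI at ipa 0x%llx\n", (unsigned long long)ipa);
}

static void tlbi_range(uint64_t ipa, uint64_t nr_pages)
{
	printf("one range TLBI at ipa 0x%llx, %llu pages\n",
	       (unsigned long long)ipa, (unsigned long long)nr_pages);
}

static void unmap_range(uint64_t base_ipa)
{
	bool defer = defer_tlb_flush();

	for (int i = 0; i < NR_PTES; i++) {
		pte[i] = 0;	/* clear the mapping */
		if (!defer)	/* old behaviour: invalidate per PTE */
			tlbi_ipa(base_ipa + i * PAGE_SIZE);
	}

	if (defer)		/* new: one range op after the walk */
		tlbi_range(base_ipa, NR_PTES);
}

int main(void)
{
	unmap_range(0x40000000ULL);
	return 0;
}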
Other testing:
- Booted on x86_64 and ran KVM selftests.
- Build tested for MIPS and RISCV architectures with
  malta_kvm_defconfig and defconfig, respectively.

Cc: David Matlack

v8: https://lore.kernel.org/all/20230808231330.3855936-1-rananta@google.com/
Thanks to Gavin, Sean, and Shaoqin for the suggestions
- Fix the build error reported by the kernel test robot in patch-11/14.
- Fix typo in patch-9/14. (Gavin)
- Renamed 'start_gfn' argument to 'gfn' in patch-5/14. (Gavin)

v7: https://lore.kernel.org/all/20230722022251.3446223-1-rananta@google.com/
Thanks, Marc and Sean, for the reviews and suggestions
- Made the function declaration for kvm_arch_flush_remote_tlbs()
  global. (Marc, Sean)
- Renamed 'pages' to 'nr_pages' in the TLBI functions introduced in the
  series. (Sean)
- Defined __flush_s2_tlb_range_op() as a wrapper over
  __flush_tlb_range_op(). (Marc)
- Corrected/improved the comments as suggested throughout the series.
  (Marc)
- Got rid of the WARN_ON() check and the 'struct stage2_unmap_data' as
  they were not necessary. (Marc)

v6: https://lore.kernel.org/all/20230715005405.3689586-1-rananta@google.com/
Thank you, Philippe and Shaoqin, for the reviews and suggestions
- Split patch-2/11 to separate the removal of
  CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL from arm64's switch to
  kvm_arch_flush_remote_tlbs(). (Philippe)
- Align the 'pages' argument with 'kvm' in patch-3/11. (Shaoqin)
- Call __tlb_switch_to_guest() before __flush_tlb_range_op() in the
  VHE implementation of __kvm_tlb_flush_vmid_range(). (Shaoqin)

v5 (RESEND): https://lore.kernel.org/all/20230621175002.2832640-1-rananta@google.com/
Thanks, Gavin, for the suggestions:
- Adjusted the comment on patch-2 to align with the code.
- Fixed the checkpatch.pl warning on patch-5.

v5: https://lore.kernel.org/all/20230606192858.3600174-1-rananta@google.com/
Thank you, Marc and Oliver, for the comments
- Introduced a helper, kvm_tlb_flush_vmid_range(), to handle the
  decision of using range-based TLBI instructions or invalidating the
  entire VMID, rather than depending on __kvm_tlb_flush_vmid_range()
  for it.
- kvm_tlb_flush_vmid_range() splits the range-based invalidations if
  the requested range exceeds MAX_TLBI_RANGE_PAGES.
- All the users that need to invalidate the TLB for a range now depend
  on kvm_tlb_flush_vmid_range() rather than directly on
  __kvm_tlb_flush_vmid_range().
- stage2_unmap_defer_tlb_flush() introduces a WARN_ON() to track any
  change in TLBIRANGE or FWB support during the unmap process, as the
  features are based on alternative patching and the TLBI operations
  solely depend on this check.
- Corrected an incorrect hunk present in v4's patch-3.
- Updated the patch changelogs and code comments as per the
  suggestions.

v4: https://lore.kernel.org/all/20230519005231.3027912-1-rananta@google.com/
Thanks again, Oliver, for all the comments
- Updated the __kvm_tlb_flush_vmid_range() implementation for nVHE to
  adjust to the modified __tlb_switch_to_guest() that accepts a new
  'bool nsh' arg.
- Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed the
  'skip_flush' argument.
- Defined stage2_unmap_defer_tlb_flush() to check if the PTE flushes
  can be deferred during the unmap table walk. It's called from
  stage2_unmap_put_pte() and kvm_pgtable_stage2_unmap().
- Got rid of the 'struct stage2_unmap_data'.

v3: https://lore.kernel.org/all/20230414172922.812640-1-rananta@google.com/
Thanks, Oliver, for all the suggestions.
- The core flush API (__kvm_tlb_flush_vmid_range()) now checks whether
  the system supports FEAT_TLBIRANGE, eliminating the redundancy in the
  upper layers.
- If FEAT_TLBIRANGE is not supported, the implementation falls back to
  invalidating all the TLB entries with the VMID, instead of doing an
  iterative flush over the range.
- kvm_arch_flush_remote_tlbs_range() doesn't return -EOPNOTSUPP if the
  system doesn't implement FEAT_TLBIRANGE. It relies on
  __kvm_tlb_flush_vmid_range() to take care of the decision and returns
  0 regardless of the underlying feature support.
- __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
  calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
- The fast unmap path is eliminated. Instead, the existing unmap walker
  is modified to skip the TLBIs during the walk and do them all at
  once after the walk, using the range-based instructions.

v2: https://lore.kernel.org/all/20230206172340.2639971-1-rananta@google.com/
- Rebased the series on top of David Matlack's series for a common TLB
  invalidation API [1].
- Implemented kvm_arch_flush_remote_tlbs_range() for arm64 by extending
  the support introduced by [1].
- Used kvm_flush_remote_tlbs_memslot() introduced by [1] to flush only
  the current memslot after write-protect.
- Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
  argument to calculate the 'stride' instead of just using PAGE_SIZE.
- Split the patch that introduces the range-based TLBI to KVM and the
  implementation of IPA-based invalidation into separate patches.
- Dropped the patch that tries to optimize the mmu notifier paths.
- Renamed kvm_table_pte_flush() to kvm_pgtable_stage2_flush_range(),
  and accept the range of addresses to flush. [Oliver]
- Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
  pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]

v1: https://lore.kernel.org/all/20230109215347.3119271-1-rananta@google.com/

Thank you.
Raghavendra

David Matlack (3):
  KVM: Rename kvm_arch_flush_remote_tlb() to
    kvm_arch_flush_remote_tlbs()
  KVM: Allow range-based TLB invalidation from common code
  KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code

Raghavendra Rao Ananta (11):
  KVM: Declare kvm_arch_flush_remote_tlbs() globally
  KVM: arm64: Use kvm_arch_flush_remote_tlbs()
  KVM: Remove CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
  arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
  arm64: tlb: Implement __flush_s2_tlb_range_op()
  KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
  KVM: arm64: Define kvm_tlb_flush_vmid_range()
  KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  KVM: arm64: Flush only the memslot after write-protect
  KVM: arm64: Invalidate the table entries upon a range
  KVM: arm64: Use TLBI range-based instructions for unmap

 arch/arm64/include/asm/kvm_asm.h     |   3 +
 arch/arm64/include/asm/kvm_host.h    |   4 +
 arch/arm64/include/asm/kvm_pgtable.h |  10 +++
 arch/arm64/include/asm/tlbflush.h    | 124 +++++++++++++++------------
 arch/arm64/kvm/Kconfig               |   1 -
 arch/arm64/kvm/arm.c                 |   6 --
 arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  11 +++
 arch/arm64/kvm/hyp/nvhe/tlb.c        |  30 +++++++
 arch/arm64/kvm/hyp/pgtable.c         |  63 ++++++++++++--
 arch/arm64/kvm/hyp/vhe/tlb.c         |  28 ++++++
 arch/arm64/kvm/mmu.c                 |  16 +++-
 arch/mips/include/asm/kvm_host.h     |   3 +-
 arch/mips/kvm/mips.c                 |  12 +--
 arch/riscv/kvm/mmu.c                 |   6 --
 arch/x86/include/asm/kvm_host.h      |   6 +-
 arch/x86/kvm/mmu/mmu.c               |  26 ++----
 arch/x86/kvm/mmu/mmu_internal.h      |   3 -
 arch/x86/kvm/x86.c                   |   2 +-
 include/linux/kvm_host.h             |  24 ++++--
 virt/kvm/Kconfig                     |   3 -
 virt/kvm/kvm_main.c                  |  35 ++++++--
 21 files changed, 286 insertions(+), 130 deletions(-)

-- 
2.41.0.640.ga95def55d0-goog