2020-10-14 09:26:26

by Kalesh Singh

Subject: [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD

HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
source and destination addresses are PMD-aligned.
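
To make the alignment condition concrete, here is a minimal sketch in Python rather than kernel code. It assumes 4 KiB base pages, so one PMD entry maps 2 MiB; the helper names are hypothetical, and the remaining-length check is an assumption not stated in this commit message.

```python
# Assumption: 4 KiB base pages, so one PMD entry covers 512 PTEs = 2 MiB.
PAGE_SIZE = 4 * 1024
PTRS_PER_PMD = 512
PMD_SIZE = PAGE_SIZE * PTRS_PER_PMD  # 2 MiB under the assumption above


def is_pmd_aligned(addr: int) -> bool:
    """True when addr sits exactly on a PMD boundary."""
    return addr & (PMD_SIZE - 1) == 0


def can_move_at_pmd_level(old_addr: int, new_addr: int, remaining: int) -> bool:
    """Hypothetical helper: both addresses must be PMD-aligned.

    The remaining-length check is an illustrative assumption: a whole
    PMD entry can only be moved if at least PMD_SIZE bytes are left.
    """
    return (
        is_pmd_aligned(old_addr)
        and is_pmd_aligned(new_addr)
        and remaining >= PMD_SIZE
    )


# For a 1 GiB region, count the entries touched at each level.
REGION = 1 << 30
pmd_moves = REGION // PMD_SIZE   # entries moved at the PMD level
pte_moves = REGION // PAGE_SIZE  # entries moved one PTE at a time
print(pmd_moves, pte_moves)
```

Under these assumptions, a fully aligned 1 GiB move touches 512 times fewer page-table entries at the PMD level than at the PTE level.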

HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
introduced this config did not enable it on arm64 at the time because
of performance issues with flushing the TLB on every PMD move. These
issues have since been addressed in more recent releases with
improvements to the arm64 TLB invalidation and core mmu_gather code as
Will Deacon mentioned in [2].

The data below show an approximately 8x performance improvement when
HAVE_MOVE_PMD is enabled on arm64.

--------- Test Results ----------

The following results were obtained on an arm64 device running a 5.4
kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
destination. The results from 10 iterations of the test are given below.
All times are in nanoseconds.

Control        HAVE_MOVE_PMD

9220833        1247761
9002552        1219896
9254115        1094792
8725885        1227760
9308646        1043698
9001667        1101771
8793385        1159896
8774636        1143594
9553125        1025833
9374010        1078125

9100885.4      1134312.6    <-- Mean Time in nanoseconds

Total mremap time for a 1GB PMD-aligned region drops from
~9.1 milliseconds to ~1.1 milliseconds (~8x speedup).
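
The means and the speedup figure can be rechecked from the ten samples listed in the test results. This is a small Python sketch of that arithmetic; the variable names are illustrative only.

```python
from statistics import mean

# Per-iteration mremap times in nanoseconds, copied from the test results.
control_ns = [
    9220833, 9002552, 9254115, 8725885, 9308646,
    9001667, 8793385, 8774636, 9553125, 9374010,
]
move_pmd_ns = [
    1247761, 1219896, 1094792, 1227760, 1043698,
    1101771, 1159896, 1143594, 1025833, 1078125,
]

control_mean = mean(control_ns)
move_pmd_mean = mean(move_pmd_ns)
speedup = control_mean / move_pmd_mean

print(f"control mean:       {control_mean:.1f} ns")
print(f"HAVE_MOVE_PMD mean: {move_pmd_mean:.1f} ns")
print(f"speedup:            {speedup:.2f}x")
```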

[1] https://lore.kernel.org/r/[email protected]
[2] https://www.mail-archive.com/[email protected]/msg140837.html

Signed-off-by: Kalesh Singh <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andrew Morton <[email protected]>
---
Changes in v4:
- Add Kirill's Acked-by.

arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4b136e923ccb..434d6791e869 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -123,6 +123,7 @@ config ARM64
 	select GENERIC_VDSO_TIME_NS
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
+	select HAVE_MOVE_PMD
 	select HAVE_PCI
 	select HAVE_ACPI_APEI if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
--
2.28.0.1011.ga647a8990f-goog


2020-10-15 12:08:40

by Will Deacon

Subject: Re: [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD

On Wed, Oct 14, 2020 at 12:53:07AM +0000, Kalesh Singh wrote:
> HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
> source and destination addresses are PMD-aligned.
>
> HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
> introduced this config did not enable it on arm64 at the time because
> of performance issues with flushing the TLB on every PMD move. These
> issues have since been addressed in more recent releases with
> improvements to the arm64 TLB invalidation and core mmu_gather code as
> Will Deacon mentioned in [2].
>
> The data below show an approximately 8x performance improvement when
> HAVE_MOVE_PMD is enabled on arm64.
>
> --------- Test Results ----------
>
> The following results were obtained on an arm64 device running a 5.4
> kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
> destination. The results from 10 iterations of the test are given below.
> All times are in nanoseconds.
>
> Control        HAVE_MOVE_PMD
>
> 9220833        1247761
> 9002552        1219896
> 9254115        1094792
> 8725885        1227760
> 9308646        1043698
> 9001667        1101771
> 8793385        1159896
> 8774636        1143594
> 9553125        1025833
> 9374010        1078125
>
> 9100885.4      1134312.6    <-- Mean Time in nanoseconds
>
> Total mremap time for a 1GB PMD-aligned region drops from
> ~9.1 milliseconds to ~1.1 milliseconds (~8x speedup).
>
> [1] https://lore.kernel.org/r/[email protected]
> [2] https://www.mail-archive.com/[email protected]/msg140837.html
>
> Signed-off-by: Kalesh Singh <[email protected]>
> Acked-by: Kirill A. Shutemov <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> Changes in v4:
> - Add Kirill's Acked-by.

Argh, I thought we already enabled this for PMDs back in 2018! Looks like
we forgot to actually do that after I improved the performance of the TLB
invalidation.

I'll pick this one patch up for 5.10.

Will