2018-06-29 06:41:07

by Jia He

Subject: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 introduces a new config option to make the code more generic
Patch 2 retains memblock_next_valid_pfn() on arm and arm64
Patch 3 optimizes memblock_next_valid_pfn()
Patches 4-6 optimize early_pfn_valid()

As for the performance improvement, with this series applied the time
spent in memmap_init() drops from 27956us to 13537us on my armv8a
server (QDF2400 with 96G memory, 64k page size).

The memblock region information from my server is attached below.
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000000200000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x00000017ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000200000-0x000000000021ffff]
[ 0.000000] node 0: [mem 0x0000000000820000-0x000000000307ffff]
[ 0.000000] node 0: [mem 0x0000000003080000-0x000000000308ffff]
[ 0.000000] node 0: [mem 0x0000000003090000-0x00000000031fffff]
[ 0.000000] node 0: [mem 0x0000000003200000-0x00000000033fffff]
[ 0.000000] node 0: [mem 0x0000000003410000-0x00000000034fffff]
[ 0.000000] node 0: [mem 0x0000000003500000-0x000000000351ffff]
[ 0.000000] node 0: [mem 0x0000000003520000-0x000000000353ffff]
[ 0.000000] node 0: [mem 0x0000000003540000-0x0000000003e3ffff]
[ 0.000000] node 0: [mem 0x0000000003e40000-0x0000000003e7ffff]
[ 0.000000] node 0: [mem 0x0000000003e80000-0x0000000003ecffff]
[ 0.000000] node 0: [mem 0x0000000003ed0000-0x0000000003ed5fff]
[ 0.000000] node 0: [mem 0x0000000003ed6000-0x0000000006eeafff]
[ 0.000000] node 0: [mem 0x0000000006eeb000-0x000000000710ffff]
[ 0.000000] node 0: [mem 0x0000000007110000-0x0000000007f0ffff]
[ 0.000000] node 0: [mem 0x0000000007f10000-0x0000000007faffff]
[ 0.000000] node 0: [mem 0x0000000007fb0000-0x000000000806ffff]
[ 0.000000] node 0: [mem 0x0000000008070000-0x00000000080affff]
[ 0.000000] node 0: [mem 0x00000000080b0000-0x000000000832ffff]
[ 0.000000] node 0: [mem 0x0000000008330000-0x000000000836ffff]
[ 0.000000] node 0: [mem 0x0000000008370000-0x000000000838ffff]
[ 0.000000] node 0: [mem 0x0000000008390000-0x00000000083a9fff]
[ 0.000000] node 0: [mem 0x00000000083aa000-0x00000000083bbfff]
[ 0.000000] node 0: [mem 0x00000000083bc000-0x00000000083fffff]
[ 0.000000] node 0: [mem 0x0000000008400000-0x000000000841ffff]
[ 0.000000] node 0: [mem 0x0000000008420000-0x000000000843ffff]
[ 0.000000] node 0: [mem 0x0000000008440000-0x000000000865ffff]
[ 0.000000] node 0: [mem 0x0000000008660000-0x000000000869ffff]
[ 0.000000] node 0: [mem 0x00000000086a0000-0x00000000086affff]
[ 0.000000] node 0: [mem 0x00000000086b0000-0x00000000086effff]
[ 0.000000] node 0: [mem 0x00000000086f0000-0x0000000008b6ffff]
[ 0.000000] node 0: [mem 0x0000000008b70000-0x0000000008bbffff]
[ 0.000000] node 0: [mem 0x0000000008bc0000-0x0000000008edffff]
[ 0.000000] node 0: [mem 0x0000000008ee0000-0x0000000008ee0fff]
[ 0.000000] node 0: [mem 0x0000000008ee1000-0x0000000008ee2fff]
[ 0.000000] node 0: [mem 0x0000000008ee3000-0x000000000decffff]
[ 0.000000] node 0: [mem 0x000000000ded0000-0x000000000defffff]
[ 0.000000] node 0: [mem 0x000000000df00000-0x000000000fffffff]
[ 0.000000] node 0: [mem 0x0000000010800000-0x0000000017feffff]
[ 0.000000] node 0: [mem 0x000000001c000000-0x000000001c00ffff]
[ 0.000000] node 0: [mem 0x000000001c010000-0x000000001c7fffff]
[ 0.000000] node 0: [mem 0x000000001c810000-0x000000007efbffff]
[ 0.000000] node 0: [mem 0x000000007efc0000-0x000000007efdffff]
[ 0.000000] node 0: [mem 0x000000007efe0000-0x000000007efeffff]
[ 0.000000] node 0: [mem 0x000000007eff0000-0x000000007effffff]
[ 0.000000] node 0: [mem 0x000000007f000000-0x00000017ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[ 0.000000] On node 0 totalpages: 25145296
[ 0.000000] DMA32 zone: 16376 pages used for memmap
[ 0.000000] DMA32 zone: 0 pages reserved
[ 0.000000] DMA32 zone: 1028048 pages, LIFO batch:31
[ 0.000000] Normal zone: 376832 pages used for memmap
[ 0.000000] Normal zone: 24117248 pages, LIFO batch:31

Changelog:
V9: - rebase to mmotm master, refine the log description. No major changes
V8: - introduce new config and move generic code to early_pfn.h
    - optimize memblock_next_valid_pfn as suggested by Matthew Wilcox
V7: - fix i386 compilation error. refine the commit description
V6: - simplify the code, move arm/arm64 common code to one file.
    - refine patches as suggested by Daniel Vacek and Ard Biesheuvel
V5: - further refinement as suggested by Daniel Vacek. Make the
      arm/arm64 code more arch specific
V4: - refine patches as suggested by Daniel Vacek and Wei Yang
    - optimized on arm besides arm64
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to mmotm latest
    - retain memblock_next_valid_pfn on arm64
    - refine memblock_search_pfn_regions and pfn_valid_region

Jia He (6):
arm: arm64: introduce CONFIG_HAVE_MEMBLOCK_PFN_VALID
mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64
arm: arm64: page_alloc: reduce unnecessary binary search in
memblock_next_valid_pfn()
mm/memblock: introduce memblock_search_pfn_regions()
arm: arm64: introduce pfn_valid_region()
mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

arch/arm/Kconfig | 4 +++
arch/arm/mm/init.c | 1 +
arch/arm64/Kconfig | 4 +++
arch/arm64/mm/init.c | 1 +
include/linux/early_pfn.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/memblock.h | 2 ++
include/linux/mmzone.h | 18 ++++++++++-
mm/Kconfig | 3 ++
mm/memblock.c | 9 ++++++
mm/page_alloc.c | 5 ++-
10 files changed, 124 insertions(+), 2 deletions(-)
create mode 100644 include/linux/early_pfn.h

--
1.8.3.1



2018-06-29 06:42:16

by Jia He

Subject: [PATCH v9 1/6] arm: arm64: introduce CONFIG_HAVE_MEMBLOCK_PFN_VALID

Add a new config option, CONFIG_HAVE_MEMBLOCK_PFN_VALID, so that
memblock_next_valid_pfn() can move to a generic code file. All the later
optimizations in this series depend on this option.

The memblock initialization time on arm/arm64 can benefit from this.

Signed-off-by: Jia He <[email protected]>
---
arch/arm/Kconfig | 4 ++++
arch/arm64/Kconfig | 4 ++++
mm/Kconfig | 3 +++
3 files changed, 11 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 843edfd..7ea2636 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1642,6 +1642,10 @@ config ARCH_SELECT_MEMORY_MODEL
config HAVE_ARCH_PFN_VALID
def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM

+config HAVE_MEMBLOCK_PFN_VALID
+ def_bool y
+ depends on HAVE_ARCH_PFN_VALID
+
config HAVE_GENERIC_GUP
def_bool y
depends on ARM_LPAE
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090c..26d75f4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -778,6 +778,10 @@ config ARCH_SELECT_MEMORY_MODEL
config HAVE_ARCH_PFN_VALID
def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM

+config HAVE_MEMBLOCK_PFN_VALID
+ def_bool y
+ depends on HAVE_ARCH_PFN_VALID
+
config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU
diff --git a/mm/Kconfig b/mm/Kconfig
index ce95491..2c38080a5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
config HAVE_MEMBLOCK_PHYS_MAP
bool

+config HAVE_MEMBLOCK_PFN_VALID
+ bool
+
config HAVE_GENERIC_GUP
bool

--
1.8.3.1


2018-06-29 06:42:16

by Jia He

Subject: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it could
cause a panic, so Daniel Vacek reverted it later.

But as suggested by Daniel Vacek, it is fine to use memblock to skip
gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.

On arm and arm64, memblock is used by default. But the generic version of
pfn_valid() is based on mem sections, and memblock_next_valid_pfn() does
not always return the next valid pfn but can skip further, resulting in
some valid frames being skipped (as if they were invalid). That is why the
kernel was eventually crashing on some !arm machines.

And as verified by Eugeniu Rosca, arm can benefit from commit
b92df1de5d28. So it would be better to retain
memblock_next_valid_pfn() on arm/arm64 and move the related code to
one file, include/linux/early_pfn.h.

Suggested-by: Daniel Vacek <[email protected]>
Signed-off-by: Jia He <[email protected]>
---
arch/arm/mm/init.c | 1 +
arch/arm64/mm/init.c | 1 +
include/linux/early_pfn.h | 34 ++++++++++++++++++++++++++++++++++
include/linux/mmzone.h | 11 +++++++++++
mm/page_alloc.c | 5 ++++-
5 files changed, 51 insertions(+), 1 deletion(-)
create mode 100644 include/linux/early_pfn.h

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c186474..aa99f4d 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -25,6 +25,7 @@
#include <linux/dma-contiguous.h>
#include <linux/sizes.h>
#include <linux/stop_machine.h>
+#include <linux/early_pfn.h>

#include <asm/cp15.h>
#include <asm/mach-types.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 325cfb3..495e299 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -40,6 +40,7 @@
#include <linux/mm.h>
#include <linux/kexec.h>
#include <linux/crash_dump.h>
+#include <linux/early_pfn.h>

#include <asm/boot.h>
#include <asm/fixmap.h>
diff --git a/include/linux/early_pfn.h b/include/linux/early_pfn.h
new file mode 100644
index 0000000..1b001c7
--- /dev/null
+++ b/include/linux/early_pfn.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2018 HXT-semitech Corp. */
+#ifndef __EARLY_PFN_H
+#define __EARLY_PFN_H
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
+{
+ struct memblock_type *type = &memblock.memory;
+ unsigned int right = type->cnt;
+ unsigned int mid, left = 0;
+ phys_addr_t addr = PFN_PHYS(++pfn);
+
+ do {
+ mid = (right + left) / 2;
+
+ if (addr < type->regions[mid].base)
+ right = mid;
+ else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+ left = mid + 1;
+ else {
+ /* addr is within the region, so pfn is valid */
+ return pfn;
+ }
+ } while (left < right);
+
+ if (right == type->cnt)
+ return -1UL;
+ else
+ return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /*CONFIG_HAVE_MEMBLOCK_PFN_VALID*/
+#endif /*__EARLY_PFN_H*/
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..57cdc42 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1241,6 +1241,8 @@ static inline int pfn_valid(unsigned long pfn)
return 0;
return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}
+
+#define next_valid_pfn(pfn) (pfn + 1)
#endif

static inline int pfn_present(unsigned long pfn)
@@ -1266,6 +1268,10 @@ static inline int pfn_present(unsigned long pfn)
#endif

#define early_pfn_valid(pfn) pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern ulong memblock_next_valid_pfn(ulong pfn);
+#define next_valid_pfn(pfn) memblock_next_valid_pfn(pfn)
+#endif
void sparse_init(void);
#else
#define sparse_init() do {} while (0)
@@ -1287,6 +1293,11 @@ struct mminit_pfnnid_cache {
#define early_pfn_valid(pfn) (1)
#endif

+/* fallback to default definitions*/
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn) (pfn + 1)
+#endif
+
void memory_present(int nid, unsigned long start, unsigned long end);

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd3c7b9..607deff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5485,8 +5485,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;

- if (!early_pfn_valid(pfn))
+ if (!early_pfn_valid(pfn)) {
+ pfn = next_valid_pfn(pfn) - 1;
continue;
+ }
+
if (!early_pfn_in_nid(pfn, nid))
continue;
if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
--
1.8.3.1
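
For readers following along, here is a minimal userspace sketch of the idea in patch 2/6, not the kernel code itself. It assumes a simplified `struct region` holding PFN bounds instead of `struct memblock_region`, and the names `next_valid_pfn_sketch()`, `walk_zone()`, `pfn_in_regions()` and `demo_regions` are hypothetical. It shows how the binary search finds the next valid pfn and how the `pfn = next_valid_pfn(pfn) - 1; continue;` pattern lets the loop step over a whole gap at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region, expressed in PFNs. */
struct region {
    unsigned long start_pfn;    /* first PFN of the region */
    unsigned long end_pfn;      /* one past the last PFN */
};

#define NO_VALID_PFN (~0UL)

/* Illustrative layout: two regions with gaps before, between and after. */
static const struct region demo_regions[] = { { 2, 4 }, { 10, 12 } };

/*
 * Return pfn + 1 if it lies inside a region, otherwise the start of the
 * first region above it, or NO_VALID_PFN if there is none.
 * r[] must be sorted and non-overlapping.
 */
static unsigned long next_valid_pfn_sketch(const struct region *r, size_t cnt,
                                           unsigned long pfn)
{
    size_t left = 0, right = cnt;
    unsigned long next = pfn + 1;

    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (next < r[mid].start_pfn)
            right = mid;
        else if (next >= r[mid].end_pfn)
            left = mid + 1;
        else
            return next;        /* pfn + 1 is inside r[mid] */
    }
    /* next fell into a gap: r[right] is the first region above it */
    return right == cnt ? NO_VALID_PFN : r[right].start_pfn;
}

/* Simple linear validity check, standing in for early_pfn_valid(). */
static int pfn_in_regions(const struct region *r, size_t cnt,
                          unsigned long pfn)
{
    size_t i;

    for (i = 0; i < cnt; i++)
        if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
            return 1;
    return 0;
}

/* Count valid PFNs in [start, end); *steps counts loop iterations. */
static unsigned long walk_zone(const struct region *r, size_t cnt,
                               unsigned long start, unsigned long end,
                               unsigned long *steps)
{
    unsigned long pfn, valid = 0;

    assert(end != NO_VALID_PFN);    /* sentinel must end the loop */
    *steps = 0;
    for (pfn = start; pfn < end; pfn++) {
        (*steps)++;
        if (!pfn_in_regions(r, cnt, pfn)) {
            /* the loop's pfn++ then lands on the next valid PFN */
            pfn = next_valid_pfn_sketch(r, cnt, pfn) - 1;
            continue;
        }
        valid++;
    }
    return valid;
}
```

With the two demo regions and a walk over PFNs 0 to 15, each gap costs one loop iteration instead of one iteration per invalid pfn.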


2018-06-29 06:43:04

by Jia He

Subject: [PATCH v9 3/6] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply do pfn++ instead of doing the binary search
in memblock_next_valid_pfn(). Furthermore, if the pfn is in a *gap* between
two memory regions, we can skip to the next region directly if possible.

Signed-off-by: Jia He <[email protected]>
---
include/linux/early_pfn.h | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/early_pfn.h b/include/linux/early_pfn.h
index 1b001c7..f9e40c3 100644
--- a/include/linux/early_pfn.h
+++ b/include/linux/early_pfn.h
@@ -3,31 +3,52 @@
#ifndef __EARLY_PFN_H
#define __EARLY_PFN_H
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+static int early_region_idx __init_memblock = -1;
ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
{
struct memblock_type *type = &memblock.memory;
- unsigned int right = type->cnt;
- unsigned int mid, left = 0;
+ struct memblock_region *regions = type->regions;
+ uint right = type->cnt;
+ uint mid, left = 0;
+ ulong start_pfn, end_pfn, next_start_pfn;
phys_addr_t addr = PFN_PHYS(++pfn);

+ /* fast path, return pfn+1 if next pfn is in the same region */
+ if (early_region_idx != -1) {
+ start_pfn = PFN_DOWN(regions[early_region_idx].base);
+ end_pfn = PFN_DOWN(regions[early_region_idx].base +
+ regions[early_region_idx].size);
+
+ if (pfn >= start_pfn && pfn < end_pfn)
+ return pfn;
+
+ early_region_idx++;
+ next_start_pfn = PFN_DOWN(regions[early_region_idx].base);
+
+ if (pfn >= end_pfn && pfn <= next_start_pfn)
+ return next_start_pfn;
+ }
+
+ /* slow path, do the binary searching */
do {
mid = (right + left) / 2;

- if (addr < type->regions[mid].base)
+ if (addr < regions[mid].base)
right = mid;
- else if (addr >= (type->regions[mid].base +
- type->regions[mid].size))
+ else if (addr >= (regions[mid].base + regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn is valid */
+ early_region_idx = mid;
return pfn;
}
} while (left < right);

if (right == type->cnt)
return -1UL;
- else
- return PHYS_PFN(type->regions[right].base);
+
+ early_region_idx = right;
+
+ return PHYS_PFN(regions[early_region_idx].base);
}
EXPORT_SYMBOL(memblock_next_valid_pfn);
#endif /*CONFIG_HAVE_MEMBLOCK_PFN_VALID*/
--
1.8.3.1
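
The fast path described in patch 3/6 can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch: the `struct pfn_cursor` type, its `searches` counter and the function name `cursor_next_valid_pfn()` are hypothetical, and the cached index lives in a caller-owned struct rather than in a file-scope variable. The sketch also checks that a following region exists before reading it, which is this sketch's own addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: PFNs in [start, end). */
struct region {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

#define NO_VALID_PFN (~0UL)

/* Caller-owned lookup state; idx is -1 until a region has been cached. */
struct pfn_cursor {
    const struct region *regions;
    size_t cnt;
    long idx;
    unsigned long searches;     /* number of slow-path binary searches */
};

static const struct region demo_regions[] = {
    { 2, 4 }, { 10, 12 }, { 20, 21 },
};

static unsigned long cursor_next_valid_pfn(struct pfn_cursor *c,
                                           unsigned long pfn)
{
    size_t left = 0, right = c->cnt;
    unsigned long next = pfn + 1;

    assert(c->idx < (long)c->cnt);
    if (c->idx >= 0) {
        const struct region *cur = &c->regions[c->idx];

        /* fast path 1: pfn + 1 is still inside the cached region */
        if (next >= cur->start_pfn && next < cur->end_pfn)
            return next;

        /* fast path 2: pfn + 1 is in the gap before the next region */
        if ((size_t)c->idx + 1 < c->cnt) {
            const struct region *nxt = cur + 1;

            if (next >= cur->end_pfn && next <= nxt->start_pfn) {
                c->idx++;
                return nxt->start_pfn;
            }
        }
    }

    /* slow path: binary search over all regions */
    c->searches++;
    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (next < c->regions[mid].start_pfn) {
            right = mid;
        } else if (next >= c->regions[mid].end_pfn) {
            left = mid + 1;
        } else {
            c->idx = (long)mid;
            return next;
        }
    }
    if (right == c->cnt)
        return NO_VALID_PFN;

    c->idx = (long)right;
    return c->regions[right].start_pfn;
}
```

On a sequential walk, only the first lookup and the lookup past the last region fall through to the binary search; every other call is answered from the cached index.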


2018-06-29 06:43:04

by Jia He

Subject: [PATCH v9 6/6] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check whether pfn++ is still in the same
region.

Currently this only improves performance on arm/arm64 and has no
impact on other arches.

For the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 27956us to 13537us in my
armv8a server(QDF2400 with 96G memory, pagesize 64k).

Signed-off-by: Jia He <[email protected]>
---
include/linux/mmzone.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 57cdc42..ac34238 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,11 +1267,16 @@ static inline int pfn_present(unsigned long pfn)
#define pfn_to_nid(pfn) (0)
#endif

-#define early_pfn_valid(pfn) pfn_valid(pfn)
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
extern ulong memblock_next_valid_pfn(ulong pfn);
#define next_valid_pfn(pfn) memblock_next_valid_pfn(pfn)
-#endif
+
+extern int pfn_valid_region(ulong pfn);
+#define early_pfn_valid(pfn) pfn_valid_region(pfn)
+#else
+#define early_pfn_valid(pfn) pfn_valid(pfn)
+#endif /*CONFIG_HAVE_MEMBLOCK_PFN_VALID*/
+
void sparse_init(void);
#else
#define sparse_init() do {} while (0)
--
1.8.3.1


2018-06-29 06:43:27

by Jia He

Subject: [PATCH v9 4/6] mm/memblock: introduce memblock_search_pfn_regions()

This helper finds the index of the memory region that contains the input pfn.

Signed-off-by: Jia He <[email protected]>
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 9 +++++++++
2 files changed, 11 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index ca59883..b0f0307 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

+int memblock_search_pfn_regions(unsigned long pfn);
+
/**
* for_each_free_mem_range - iterate through free memblock areas
* @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index 611a970..3a4d251 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1625,6 +1625,15 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
return -1;
}

+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+ struct memblock_type *type = &memblock.memory;
+ int mid = memblock_search(type, PFN_PHYS(pfn));
+
+ return mid;
+}
+
bool __init memblock_is_reserved(phys_addr_t addr)
{
return memblock_search(&memblock.reserved, addr) != -1;
--
1.8.3.1


2018-06-29 06:44:05

by Jia He

Subject: [PATCH v9 5/6] arm: arm64: introduce pfn_valid_region()

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), we can record
the last returned memblock region. If the current pfn and the last pfn are
in the same memory region, we can skip the binary search because
memblock_is_nomap() returns the same result for the whole memory region.

Signed-off-by: Jia He <[email protected]>
---
include/linux/early_pfn.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/include/linux/early_pfn.h b/include/linux/early_pfn.h
index f9e40c3..9609391 100644
--- a/include/linux/early_pfn.h
+++ b/include/linux/early_pfn.h
@@ -51,5 +51,29 @@ ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
return PHYS_PFN(regions[early_region_idx].base);
}
EXPORT_SYMBOL(memblock_next_valid_pfn);
+
+int pfn_valid_region(ulong pfn)
+{
+ ulong start_pfn, end_pfn;
+ struct memblock_type *type = &memblock.memory;
+ struct memblock_region *regions = type->regions;
+
+ if (early_region_idx != -1) {
+ start_pfn = PFN_DOWN(regions[early_region_idx].base);
+ end_pfn = PFN_DOWN(regions[early_region_idx].base +
+ regions[early_region_idx].size);
+
+ if (pfn >= start_pfn && pfn < end_pfn)
+ return !memblock_is_nomap(
+ &regions[early_region_idx]);
+ }
+
+ early_region_idx = memblock_search_pfn_regions(pfn);
+ if (early_region_idx == -1)
+ return false;
+
+ return !memblock_is_nomap(&regions[early_region_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
#endif /*CONFIG_HAVE_MEMBLOCK_PFN_VALID*/
#endif /*__EARLY_PFN_H*/
--
1.8.3.1
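
The caching idea behind pfn_valid_region() in patches 4/6 and 5/6 can be sketched in userspace like this. It is a simplified model under stated assumptions: `struct region` carries a plain `nomap` flag instead of memblock flags, and `struct valid_cache`, `search_region()` and `pfn_valid_cached()` are hypothetical names. Because the nomap attribute is per region, a hit on the cached region answers the query without a search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: PFNs in [start, end). */
struct region {
    unsigned long start_pfn;
    unsigned long end_pfn;
    bool nomap;                 /* models the MEMBLOCK_NOMAP attribute */
};

/* Caller-owned cache of the last region index; -1 means "none". */
struct valid_cache {
    const struct region *regions;
    size_t cnt;
    long idx;
    unsigned long searches;     /* number of binary searches performed */
};

static const struct region demo_regions[] = {
    { 0, 4, false }, { 4, 8, true }, { 16, 32, false },
};

/* Binary search: index of the region containing pfn, or -1 if none. */
static long search_region(const struct region *r, size_t cnt,
                          unsigned long pfn)
{
    size_t left = 0, right = cnt;

    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (pfn < r[mid].start_pfn)
            right = mid;
        else if (pfn >= r[mid].end_pfn)
            left = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}

/* A pfn is valid if it lies in a region that is not marked nomap. */
static bool pfn_valid_cached(struct valid_cache *c, unsigned long pfn)
{
    assert(c->idx < (long)c->cnt);
    if (c->idx >= 0) {
        const struct region *cur = &c->regions[c->idx];

        /* same region as last time: the nomap answer is unchanged */
        if (pfn >= cur->start_pfn && pfn < cur->end_pfn)
            return !cur->nomap;
    }

    c->searches++;
    c->idx = search_region(c->regions, c->cnt, pfn);
    if (c->idx < 0)
        return false;

    return !c->regions[c->idx].nomap;
}
```

Consecutive pfns inside one region cost a single search in total; a search is repeated only when the pfn leaves the cached region.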


2018-06-29 17:39:00

by Pavel Tatashin

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

On Thu, Jun 28, 2018 at 10:30 PM Jia He <[email protected]> wrote:
>
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic bug. So Daniel Vacek reverted it later.
>
> But as suggested by Daniel Vacek, it is fine to using memblock to skip
> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>
> On arm and arm64, memblock is used by default. But generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more resulting in some
> valid frames to be skipped (as if they were invalid). And that's why
> kernel was eventually crashing on some !arm machines.

Hi Jia,

Is this a bug? Should we make other arches that support memblock
use memblock_is_map_memory()? It is more expensive, but if the
default is broken, maybe it makes sense to change?

Thank you,
Pavel

2018-06-29 18:50:40

by Pavel Tatashin

Subject: Re: [PATCH v9 1/6] arm: arm64: introduce CONFIG_HAVE_MEMBLOCK_PFN_VALID

On Thu, Jun 28, 2018 at 10:30 PM Jia He <[email protected]> wrote:
>
> Make CONFIG_HAVE_MEMBLOCK_PFN_VALID a new config option so it can move
> memblock_next_valid_pfn to generic code file. All the latter optimizations
> are based on this config.
>
> The memblock initialization time on arm/arm64 can benefit from this.
>
> Signed-off-by: Jia He <[email protected]>

Reviewed-by: Pavel Tatashin <[email protected]>

2018-06-29 20:40:53

by Pavel Tatashin

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

> +++ b/include/linux/early_pfn.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2018 HXT-semitech Corp. */
> +#ifndef __EARLY_PFN_H
> +#define __EARLY_PFN_H
> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> +ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
> +{
> + struct memblock_type *type = &memblock.memory;

Why put it in a header file and not in some C file? In my opinion it
is confusing to have non-inline functions in header files. Basically,
you can include this header file in exactly one C file without
breaking compilation.

2018-07-02 11:54:39

by Michal Hocko

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

On Fri 29-06-18 14:13:08, Pavel Tatashin wrote:
> > +++ b/include/linux/early_pfn.h
> > @@ -0,0 +1,34 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/* Copyright (C) 2018 HXT-semitech Corp. */
> > +#ifndef __EARLY_PFN_H
> > +#define __EARLY_PFN_H
> > +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> > +ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
> > +{
> > + struct memblock_type *type = &memblock.memory;
>
> Why put it in a header file and not in some C file? In my opinion it
> is confusing to have non-line functions in header files. Basically,
> you can include this header file in exactly one C file without
> breaking compilation.

It is not confusing. It is outright broken.

--
Michal Hocko
SUSE Labs

2018-07-02 12:14:41

by Pavel Tatashin

Subject: Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

On Mon, Jul 2, 2018 at 7:40 AM Michal Hocko <[email protected]> wrote:
>
> On Fri 29-06-18 10:29:17, Jia He wrote:
> > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> > where possible") tried to optimize the loop in memmap_init_zone(). But
> > there is still some room for improvement.
>
> It would be great to shortly describe those optimization from high level
> POV.
>
> >
> > Patch 1 introduce new config to make codes more generic
> > Patch 2 remain the memblock_next_valid_pfn on arm and arm64
> > Patch 3 optimizes the memblock_next_valid_pfn()
> > Patch 4~6 optimizes the early_pfn_valid()
> >
> > As for the performance improvement, after this set, I can see the time
> > overhead of memmap_init() is reduced from 27956us to 13537us in my
> > armv8a server(QDF2400 with 96G memory, pagesize 64k).
>
> So this is 13ms saving when booting 96G machine. Is this really worth
> the additional code? Are there any other benefits?

While 0.0144s for 96G is definitely small, I think the time is
proportional to the number of pages since memmap_init() loops through
all the pages. If base pages were changed to 4K, I bet the time would
increase 16 times: 0.23s on the given machine, in other words around 2s
per 1T of memory.

I agree, a high level description of optimization is needed, and also
an explanation of why it would not work on other arches that support
memblock.

Pavel
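
The estimate above can be reproduced with a few lines of C. This is only a worked-arithmetic sketch: it assumes, as the message does, that the saving scales linearly with the number of pages, and the helper names `saved_seconds()` and `scale_linear()` are hypothetical.

```c
#include <assert.h>

/* Time saved by the series, in seconds, from the two measurements. */
static double saved_seconds(unsigned long before_us, unsigned long after_us)
{
    assert(after_us <= before_us);
    return (double)(before_us - after_us) / 1e6;
}

/*
 * Scale a time linearly with the page count, where the page count is
 * memory size divided by page size. Assumes strict proportionality.
 */
static double scale_linear(double seconds,
                           double page_kb_from, double page_kb_to,
                           double mem_gb_from, double mem_gb_to)
{
    return seconds * (page_kb_from / page_kb_to) * (mem_gb_to / mem_gb_from);
}
```

With the figures from the cover letter, `saved_seconds(27956, 13537)` is 0.014419s. Going from 64K to 4K pages on the same 96G machine multiplies that by 16, giving roughly 0.23s, and scaling further to 1T (1024G) gives roughly 2.46s under the same linear assumption.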

2018-07-02 12:44:42

by Michal Hocko

Subject: Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

On Fri 29-06-18 10:29:17, Jia He wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") tried to optimize the loop in memmap_init_zone(). But
> there is still some room for improvement.

It would be great to briefly describe those optimizations from a high-level
POV.

>
> Patch 1 introduce new config to make codes more generic
> Patch 2 remain the memblock_next_valid_pfn on arm and arm64
> Patch 3 optimizes the memblock_next_valid_pfn()
> Patch 4~6 optimizes the early_pfn_valid()
>
> As for the performance improvement, after this set, I can see the time
> overhead of memmap_init() is reduced from 27956us to 13537us in my
> armv8a server(QDF2400 with 96G memory, pagesize 64k).

So this is a 13ms saving when booting a 96G machine. Is this really worth
the additional code? Are there any other benefits?
[...]
> arch/arm/Kconfig | 4 +++
> arch/arm/mm/init.c | 1 +
> arch/arm64/Kconfig | 4 +++
> arch/arm64/mm/init.c | 1 +
> include/linux/early_pfn.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/memblock.h | 2 ++
> include/linux/mmzone.h | 18 ++++++++++-
> mm/Kconfig | 3 ++
> mm/memblock.c | 9 ++++++
> mm/page_alloc.c | 5 ++-
> 10 files changed, 124 insertions(+), 2 deletions(-)
> create mode 100644 include/linux/early_pfn.h

--
Michal Hocko
SUSE Labs

2018-07-03 01:56:32

by Jia He

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

Hi, Pavel
Thanks for the comments.

On 6/30/2018 2:13 AM, Pavel Tatashin Wrote:
>> +++ b/include/linux/early_pfn.h
>> @@ -0,0 +1,34 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright (C) 2018 HXT-semitech Corp. */
>> +#ifndef __EARLY_PFN_H
>> +#define __EARLY_PFN_H
>> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
>> +ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
>> +{
>> + struct memblock_type *type = &memblock.memory;
>
> Why put it in a header file and not in some C file? In my opinion it
> is confusing to have non-line functions in header files. Basically,
> you can include this header file in exactly one C file without
> breaking compilation.
>
My original intent was to make the helper memblock_next_valid_pfn()
a common API between the arm64 and arm arches, since both arches will
enable CONFIG_HAVE_MEMBLOCK_PFN_VALID by default.

Do you think it looks OK if I add the inline prefix?

--
Cheers,
Jia

2018-07-03 02:01:54

by Jia He

Subject: Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

Hi Michal
Thanks for the comments

On 7/2/2018 7:40 PM, Michal Hocko Wrote:
> On Fri 29-06-18 10:29:17, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") tried to optimize the loop in memmap_init_zone(). But
>> there is still some room for improvement.
>
> It would be great to shortly describe those optimization from high level
> POV.

Ok

>
>>
>> Patch 1 introduce new config to make codes more generic
>> Patch 2 remain the memblock_next_valid_pfn on arm and arm64
>> Patch 3 optimizes the memblock_next_valid_pfn()
>> Patch 4~6 optimizes the early_pfn_valid()
>>
>> As for the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 27956us to 13537us in my
>> armv8a server(QDF2400 with 96G memory, pagesize 64k).
>
> So this is 13ms saving when booting 96G machine. Is this really worth
> the additional code? Are there any other benefits?

Hmm, currently my answer is no.
But I believe it can shorten the boot time when the memory is larger than n TBs.
--
Cheers,
Jia

> [...]
>> arch/arm/Kconfig | 4 +++
>> arch/arm/mm/init.c | 1 +
>> arch/arm64/Kconfig | 4 +++
>> arch/arm64/mm/init.c | 1 +
>> include/linux/early_pfn.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++
>> include/linux/memblock.h | 2 ++
>> include/linux/mmzone.h | 18 ++++++++++-
>> mm/Kconfig | 3 ++
>> mm/memblock.c | 9 ++++++
>> mm/page_alloc.c | 5 ++-
>> 10 files changed, 124 insertions(+), 2 deletions(-)
>> create mode 100644 include/linux/early_pfn.h
>


2018-07-03 02:13:08

by Jia He

Subject: Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64



On 7/2/2018 7:40 PM, Michal Hocko Wrote:
> On Fri 29-06-18 10:29:17, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") tried to optimize the loop in memmap_init_zone(). But
>> there is still some room for improvement.
>
> It would be great to shortly describe those optimization from high level
> POV.
>
>>
>> Patch 1 introduce new config to make codes more generic
>> Patch 2 remain the memblock_next_valid_pfn on arm and arm64
>> Patch 3 optimizes the memblock_next_valid_pfn()
>> Patch 4~6 optimizes the early_pfn_valid()
>>
>> As for the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 27956us to 13537us in my
>> armv8a server(QDF2400 with 96G memory, pagesize 64k).
>
> So this is 13ms saving when booting 96G machine. Is this really worth
> the additional code? Are there any other benefits?
Sorry, Michal
I missed one thing.
This 13ms optimization is merely the result of my patches 3~6.
Patch 1 originated from Paul Burton's commit b92df1de5d289.
In its description,
===
James said "I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map. The kernel boot time drops from 109 to
62 seconds. "
===

--
Cheers,
Jia
> [...]
>> arch/arm/Kconfig | 4 +++
>> arch/arm/mm/init.c | 1 +
>> arch/arm64/Kconfig | 4 +++
>> arch/arm64/mm/init.c | 1 +
>> include/linux/early_pfn.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++
>> include/linux/memblock.h | 2 ++
>> include/linux/mmzone.h | 18 ++++++++++-
>> mm/Kconfig | 3 ++
>> mm/memblock.c | 9 ++++++
>> mm/page_alloc.c | 5 ++-
>> 10 files changed, 124 insertions(+), 2 deletions(-)
>> create mode 100644 include/linux/early_pfn.h
>



2018-07-03 03:04:44

by Pavel Tatashin

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

Can you put it into memblock.c?

> Do you think it looks ok if I add the inline prefix?

I would say no, this function is too complex, and is not in some
critical path to be always inlined.

I would put it into memblock.c, and have #ifdef
CONFIG_HAVE_MEMBLOCK_PFN_VALID around it.

Thank you,
Pavel
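
The layout suggested above can be sketched in a single file as follows. This is a userspace illustration, not the kernel change: the `#define CONFIG_HAVE_MEMBLOCK_PFN_VALID` line stands in for the Kconfig symbol, the function body is a placeholder with one hard-coded region, and the two commented sections mark what would go into a header and what would go into one C file such as memblock.c.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol; normally set by the build system. */
#define CONFIG_HAVE_MEMBLOCK_PFN_VALID 1

/* ---- header part: a declaration and macros only, safe to include anywhere ---- */
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
unsigned long memblock_next_valid_pfn(unsigned long pfn);
#define next_valid_pfn(pfn) memblock_next_valid_pfn(pfn)
#else
#define next_valid_pfn(pfn) ((pfn) + 1)
#endif

/* ---- C file part: exactly one definition, guarded by the same symbol ---- */
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
unsigned long memblock_next_valid_pfn(unsigned long pfn)
{
    /* Placeholder body: a single hard-coded region of PFNs [8, 16). */
    const unsigned long region_start = 8, region_end = 16;
    unsigned long next;

    assert(pfn != ~0UL);        /* pfn + 1 must not wrap around */
    next = pfn + 1;

    if (next < region_start)
        return region_start;    /* skip the gap below the region */
    if (next < region_end)
        return next;            /* still inside the region */
    return ~0UL;                /* nothing valid above */
}
#endif
```

Because the header part contains only a declaration, any number of C files can include it without producing duplicate definitions at link time, which is the problem raised earlier in the thread about defining the function in early_pfn.h.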

2018-07-03 07:29:50

by Michal Hocko

Subject: Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

On Tue 03-07-18 10:11:11, Jia He wrote:
> On 7/2/2018 7:40 PM, Michal Hocko Wrote:
[...]
> > So this is 13ms saving when booting 96G machine. Is this really worth
> > the additional code? Are there any other benefits?
> Sorry, Michal
> I missed one thing.
> This 13ms optimization is merely the result of my patch 3~6
> Patch 1 is originated by Paul Burton in commit b92df1de5d289.
> In its description,
> ===
> James said "I have tested this patch on a virtual model of a Samurai CPU
> with a sparse memory map. The kernel boot time drops from 109 to
> 62 seconds. "
> ===

Those numbers should be in the changelog.
--
Michal Hocko
SUSE Labs

2018-07-06 01:39:42

by Jia He

Subject: Re: [PATCH v9 2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64


Hi Pavel, sorry for the late reply

On 6/30/2018 1:07 AM, Pavel Tatashin Wrote:
> On Thu, Jun 28, 2018 at 10:30 PM Jia He <[email protected]> wrote:
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic bug. So Daniel Vacek reverted it later.
>>
>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>
>> On arm and arm64, memblock is used by default. But generic version of
>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>> not always return the next valid one but skips more resulting in some
>> valid frames to be skipped (as if they were invalid). And that's why
>> kernel was eventually crashing on some !arm machines.
>
> Hi Jia,
>
> Is this a bug? Should we make other arches that support memblock to
> use memblock_is_map_memory() ? it is more expensive, but if the
> default is broken, maybe it makes sense to change?
>
IIUC, the bug is in memblock_next_valid_pfn instead of pfn_valid.
memblock_next_valid_pfn will return the incorrect next valid pfn on
!arm arches (e.g. X86). Please refer to b92df1de5.

Currently only arm/arm64 use MEMBLOCK_NOMAP, it is really beyond my
power to implement it on all other arches ;-)


--
Cheers,
Jia