Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.
Patch 1 remain the memblock_next_valid_pfn when CONFIG_HAVE_ARCH_PFN_VALID
is enabled
Patch 2 optimizes the memblock_next_valid_pfn()
Patch 3~5 optimizes the early_pfn_valid(), I have to split it into parts
because the changes are located across subsystems.
I tested the pfn loop process in memmap_init(), the same as before.
As for the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 41313 us to 24345 us in my
armv8a server(QDF2400 with 96G memory).
Attached the memblock region information in my server.
[ 86.956758] Zone ranges:
[ 86.959452] DMA [mem 0x0000000000200000-0x00000000ffffffff]
[ 86.966041] Normal [mem 0x0000000100000000-0x00000017ffffffff]
[ 86.972631] Movable zone start for each node
[ 86.977179] Early memory node ranges
[ 86.980985] node 0: [mem 0x0000000000200000-0x000000000021ffff]
[ 86.987666] node 0: [mem 0x0000000000820000-0x000000000307ffff]
[ 86.994348] node 0: [mem 0x0000000003080000-0x000000000308ffff]
[ 87.001029] node 0: [mem 0x0000000003090000-0x00000000031fffff]
[ 87.007710] node 0: [mem 0x0000000003200000-0x00000000033fffff]
[ 87.014392] node 0: [mem 0x0000000003410000-0x000000000563ffff]
[ 87.021073] node 0: [mem 0x0000000005640000-0x000000000567ffff]
[ 87.027754] node 0: [mem 0x0000000005680000-0x00000000056dffff]
[ 87.034435] node 0: [mem 0x00000000056e0000-0x00000000086fffff]
[ 87.041117] node 0: [mem 0x0000000008700000-0x000000000871ffff]
[ 87.047798] node 0: [mem 0x0000000008720000-0x000000000894ffff]
[ 87.054479] node 0: [mem 0x0000000008950000-0x0000000008baffff]
[ 87.061161] node 0: [mem 0x0000000008bb0000-0x0000000008bcffff]
[ 87.067842] node 0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
[ 87.074524] node 0: [mem 0x0000000008c50000-0x0000000008e2ffff]
[ 87.081205] node 0: [mem 0x0000000008e30000-0x0000000008e4ffff]
[ 87.087886] node 0: [mem 0x0000000008e50000-0x0000000008fcffff]
[ 87.094568] node 0: [mem 0x0000000008fd0000-0x000000000910ffff]
[ 87.101249] node 0: [mem 0x0000000009110000-0x00000000092effff]
[ 87.107930] node 0: [mem 0x00000000092f0000-0x000000000930ffff]
[ 87.114612] node 0: [mem 0x0000000009310000-0x000000000963ffff]
[ 87.121293] node 0: [mem 0x0000000009640000-0x000000000e61ffff]
[ 87.127975] node 0: [mem 0x000000000e620000-0x000000000e64ffff]
[ 87.134657] node 0: [mem 0x000000000e650000-0x000000000fffffff]
[ 87.141338] node 0: [mem 0x0000000010800000-0x0000000017feffff]
[ 87.148019] node 0: [mem 0x000000001c000000-0x000000001c00ffff]
[ 87.154701] node 0: [mem 0x000000001c010000-0x000000001c7fffff]
[ 87.161383] node 0: [mem 0x000000001c810000-0x000000007efbffff]
[ 87.168064] node 0: [mem 0x000000007efc0000-0x000000007efdffff]
[ 87.174746] node 0: [mem 0x000000007efe0000-0x000000007efeffff]
[ 87.181427] node 0: [mem 0x000000007eff0000-0x000000007effffff]
[ 87.188108] node 0: [mem 0x000000007f000000-0x00000017ffffffff]
[ 87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
Without this patchset:
[ 117.106153] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[ 117.113677] before memmap_init
[ 117.118195] after memmap_init
>>> memmap_init takes 4518 us
[ 117.121446] before memmap_init
[ 117.154992] after memmap_init
>>> memmap_init takes 33546 us
[ 117.158241] before memmap_init
[ 117.161490] after memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us
With this patchset:
[ 87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[ 87.202314] before memmap_init
[ 87.206164] after memmap_init
>>> memmap_init takes 3850 us
[ 87.209416] before memmap_init
[ 87.226662] after memmap_init
>>> memmap_init takes 17246 us
[ 87.229911] before memmap_init
[ 87.233160] after memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us
Changelog:
V2: - rebase to mmotm latest
- remain memblock_next_valid_pfn on arm64
- refine memblock_search_pfn_regions and pfn_valid_region
Jia He (5):
mm: page_alloc: remain memblock_next_valid_pfn() when
CONFIG_HAVE_ARCH_PFN_VALID is enable
mm: page_alloc: reduce unnecessary binary search in
memblock_next_valid_pfn()
mm/memblock: introduce memblock_search_pfn_regions()
arm64: introduce pfn_valid_region()
mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
arch/arm64/include/asm/page.h | 3 ++-
arch/arm64/mm/init.c | 25 ++++++++++++++++++-
arch/x86/include/asm/mmzone_32.h | 2 +-
include/linux/memblock.h | 6 +++++
include/linux/mmzone.h | 12 ++++++---
mm/memblock.c | 53 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 12 ++++++++-
7 files changed, 106 insertions(+), 7 deletions(-)
--
2.7.4
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic bug. So Daniel Vacek reverted it later.
But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
enabled. And as verified by Eugeniu Rosca, arm can benifit from this
commit. So remain the memblock_next_valid_pfn.
Signed-off-by: Jia He <[email protected]>
---
include/linux/memblock.h | 4 ++++
mm/memblock.c | 29 +++++++++++++++++++++++++++++
mm/page_alloc.c | 11 ++++++++++-
3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0257aee..efbbe4b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#endif
+
/**
* for_each_free_mem_range - iterate through free memblock areas
* @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index ba7c878..bea5a9c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
*out_nid = r->nid;
}
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+ struct memblock_type *type = &memblock.memory;
+ unsigned int right = type->cnt;
+ unsigned int mid, left = 0;
+ phys_addr_t addr = PFN_PHYS(++pfn);
+
+ do {
+ mid = (right + left) / 2;
+
+ if (addr < type->regions[mid].base)
+ right = mid;
+ else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+ left = mid + 1;
+ else {
+ /* addr is within the region, so pfn is valid */
+ return pfn;
+ }
+ } while (left < right);
+
+ if (right == type->cnt)
+ return -1UL;
+ else
+ return PHYS_PFN(type->regions[right].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
/**
* memblock_set_node - set node ID on memblock regions
* @base: base of area to set node ID for
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c19f5ac..2a967f7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;
- if (!early_pfn_valid(pfn))
+ if (!early_pfn_valid(pfn)) {
+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
+ /*
+ * Skip to the pfn preceding the next valid one (or
+ * end_pfn), such that we hit a valid pfn (or end_pfn)
+ * on our next iteration of the loop.
+ */
+ pfn = memblock_next_valid_pfn(pfn) - 1;
+#endif
continue;
+ }
if (!early_pfn_in_nid(pfn, nid))
continue;
if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
--
2.7.4
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply pfn++ instead of doing the binary search
in memblock_next_valid_pfn.
Signed-off-by: Jia He <[email protected]>
---
include/linux/memblock.h | 2 +-
mm/memblock.c | 73 +++++++++++++++++++++++++++++-------------------
mm/page_alloc.c | 3 +-
3 files changed, 47 insertions(+), 31 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index efbbe4b..a8fb2ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long memblock_next_valid_pfn(unsigned long pfn);
+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
#endif
/**
diff --git a/mm/memblock.c b/mm/memblock.c
index bea5a9c..06c1a08 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
*out_nid = r->nid;
}
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
-{
- struct memblock_type *type = &memblock.memory;
- unsigned int right = type->cnt;
- unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(++pfn);
-
- do {
- mid = (right + left) / 2;
-
- if (addr < type->regions[mid].base)
- right = mid;
- else if (addr >= (type->regions[mid].base +
- type->regions[mid].size))
- left = mid + 1;
- else {
- /* addr is within the region, so pfn is valid */
- return pfn;
- }
- } while (left < right);
-
- if (right == type->cnt)
- return -1UL;
- else
- return PHYS_PFN(type->regions[right].base);
-}
-#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-
/**
* memblock_set_node - set node ID on memblock regions
* @base: base of area to set node ID for
@@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
+ int *last_idx)
+{
+ struct memblock_type *type = &memblock.memory;
+ unsigned int right = type->cnt;
+ unsigned int mid, left = 0;
+ unsigned long start_pfn, end_pfn;
+ phys_addr_t addr = PFN_PHYS(++pfn);
+
+ /* fast path, return pfh+1 if next pfn is in the same region */
+ if (*last_idx != -1) {
+ start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+ end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+ type->regions[*last_idx].size);
+
+ if (pfn < end_pfn && pfn > start_pfn)
+ return pfn;
+ }
+
+ /* slow path, do the binary searching */
+ do {
+ mid = (right + left) / 2;
+
+ if (addr < type->regions[mid].base)
+ right = mid;
+ else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+ left = mid + 1;
+ else {
+ *last_idx = mid;
+ return pfn;
+ }
+ } while (left < right);
+
+ if (right == type->cnt)
+ return -1UL;
+
+ *last_idx = right;
+
+ return PHYS_PFN(type->regions[*last_idx].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2a967f7..0bb0274 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
unsigned long end_pfn = start_pfn + size;
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
+ int idx = -1;
unsigned long nr_initialised = 0;
struct page *page;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* end_pfn), such that we hit a valid pfn (or end_pfn)
* on our next iteration of the loop.
*/
- pfn = memblock_next_valid_pfn(pfn) - 1;
+ pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
#endif
continue;
}
--
2.7.4
This api is the preparation for further optimizing early_pfn_valid
Signed-off-by: Jia He <[email protected]>
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a8fb2ab..104bca6 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -207,6 +207,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
#endif
+int memblock_search_pfn_regions(unsigned long pfn);
+
/**
* for_each_free_mem_range - iterate through free memblock areas
* @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index 06c1a08..15fcde2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1661,6 +1661,15 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
return -1;
}
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+ struct memblock_type *type = &memblock.memory;
+ int mid = memblock_search(type, PFN_PHYS(pfn));
+
+ return mid;
+}
+
bool __init memblock_is_reserved(phys_addr_t addr)
{
return memblock_search(&memblock.reserved, addr) != -1;
--
2.7.4
This is the preparation for further optimizing in early_pfn_valid
on arm64.
Signed-off-by: Jia He <[email protected]>
---
arch/arm64/include/asm/page.h | 3 ++-
arch/arm64/mm/init.c | 25 ++++++++++++++++++++++++-
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 60d02c8..da2cba3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,7 +38,8 @@ extern void clear_page(void *to);
typedef struct page *pgtable_t;
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
+extern int pfn_valid(unsigned long pfn);
+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
#endif
#include <asm/memory.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..9122102 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -290,7 +290,30 @@ int pfn_valid(unsigned long pfn)
return memblock_is_map_memory(pfn << PAGE_SHIFT);
}
EXPORT_SYMBOL(pfn_valid);
-#endif
+
+int pfn_valid_region(unsigned long pfn, int *last_idx)
+{
+ unsigned long start_pfn, end_pfn;
+ struct memblock_type *type = &memblock.memory;
+
+ if (*last_idx != -1) {
+ start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+ end_pfn= PFN_DOWN(type->regions[*last_idx].base +
+ type->regions[*last_idx].size);
+
+ if (pfn >= start_pfn && end_pfn < end_pfn)
+ return !memblock_is_nomap(
+ &memblock.memory.regions[*last_idx]);
+ }
+
+ *last_idx = memblock_search_pfn_regions(pfn);
+ if (*last_idx == -1)
+ return false;
+
+ return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
#ifndef CONFIG_SPARSEMEM
static void __init arm64_memory_present(void)
--
2.7.4
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check if pfn++ is still in the same region.
Currently it only improve the performance on arm64 and will have no
impact on other arches.
Signed-off-by: Jia He <[email protected]>
---
arch/x86/include/asm/mmzone_32.h | 2 +-
include/linux/mmzone.h | 12 +++++++++---
mm/page_alloc.c | 2 +-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73d8dd1..329d3ba 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
return 0;
}
-#define early_pfn_valid(pfn) pfn_valid((pfn))
+#define early_pfn_valid(pfn, last_region_idx) pfn_valid((pfn))
#endif /* CONFIG_DISCONTIGMEM */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..3a686af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
})
#else
#define pfn_to_nid(pfn) (0)
-#endif
+#endif /*CONFIG_NUMA*/
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+#define early_pfn_valid(pfn, last_region_idx) \
+ pfn_valid_region(pfn, last_region_idx)
+#else
+#define early_pfn_valid(pfn, last_region_idx) pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-#define early_pfn_valid(pfn) pfn_valid(pfn)
void sparse_init(void);
#else
#define sparse_init() do {} while (0)
@@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
};
#ifndef early_pfn_valid
-#define early_pfn_valid(pfn) (1)
+#define early_pfn_valid(pfn, last_region_idx) (1)
#endif
void memory_present(int nid, unsigned long start, unsigned long end);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0bb0274..68aef71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5484,8 +5484,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;
- if (!early_pfn_valid(pfn)) {
#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
+ if (!early_pfn_valid(pfn, &idx)) {
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
--
2.7.4
Hi Jia,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on mmotm/master]
[also build test ERROR on next-20180323]
[cannot apply to v4.16-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Jia-He/optimize-memblock_next_valid_pfn-and-early_pfn_valid/20180325-175026
base: git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-x013-201812 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
mm/page_alloc.c: In function 'memmap_init_zone':
>> mm/page_alloc.c:5499:4: error: continue statement not within a loop
continue;
^~~~~~~~
>> mm/page_alloc.c:5501:4: error: break statement not within loop or switch
break;
^~~~~
mm/page_alloc.c:5520:5: error: continue statement not within a loop
continue;
^~~~~~~~
mm/page_alloc.c:5462:6: warning: unused variable 'idx' [-Wunused-variable]
int idx = -1;
^~~
mm/page_alloc.c: At top level:
>> mm/page_alloc.c:5551:1: error: expected identifier or '(' before '}' token
}
^
vim +5499 mm/page_alloc.c
^1da177e4 Linus Torvalds 2005-04-16 5468
22b31eec6 Hugh Dickins 2009-01-06 5469 if (highest_memmap_pfn < end_pfn - 1)
22b31eec6 Hugh Dickins 2009-01-06 5470 highest_memmap_pfn = end_pfn - 1;
22b31eec6 Hugh Dickins 2009-01-06 5471
4b94ffdc4 Dan Williams 2016-01-15 5472 /*
4b94ffdc4 Dan Williams 2016-01-15 5473 * Honor reservation requested by the driver for this ZONE_DEVICE
4b94ffdc4 Dan Williams 2016-01-15 5474 * memory
4b94ffdc4 Dan Williams 2016-01-15 5475 */
4b94ffdc4 Dan Williams 2016-01-15 5476 if (altmap && start_pfn == altmap->base_pfn)
4b94ffdc4 Dan Williams 2016-01-15 5477 start_pfn += altmap->reserve;
4b94ffdc4 Dan Williams 2016-01-15 5478
cbe8dd4af Greg Ungerer 2006-01-12 5479 for (pfn = start_pfn; pfn < end_pfn; pfn++) {
a2f3aa025 Dave Hansen 2007-01-10 5480 /*
b72d0ffb5 Andrew Morton 2016-03-15 5481 * There can be holes in boot-time mem_map[]s handed to this
b72d0ffb5 Andrew Morton 2016-03-15 5482 * function. They do not exist on hotplugged memory.
a2f3aa025 Dave Hansen 2007-01-10 5483 */
b72d0ffb5 Andrew Morton 2016-03-15 5484 if (context != MEMMAP_EARLY)
b72d0ffb5 Andrew Morton 2016-03-15 5485 goto not_early;
b72d0ffb5 Andrew Morton 2016-03-15 5486
c0b211780 Jia He 2018-03-24 5487 #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
94200be7f Jia He 2018-03-24 5488 if (!early_pfn_valid(pfn, &idx)) {
c0b211780 Jia He 2018-03-24 5489 /*
c0b211780 Jia He 2018-03-24 5490 * Skip to the pfn preceding the next valid one (or
c0b211780 Jia He 2018-03-24 5491 * end_pfn), such that we hit a valid pfn (or end_pfn)
c0b211780 Jia He 2018-03-24 5492 * on our next iteration of the loop.
c0b211780 Jia He 2018-03-24 5493 */
5ce6c7e68 Jia He 2018-03-24 5494 pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
c0b211780 Jia He 2018-03-24 5495 #endif
d41dee369 Andy Whitcroft 2005-06-23 5496 continue;
c0b211780 Jia He 2018-03-24 5497 }
751679573 Andy Whitcroft 2006-10-21 5498 if (!early_pfn_in_nid(pfn, nid))
751679573 Andy Whitcroft 2006-10-21 @5499 continue;
b72d0ffb5 Andrew Morton 2016-03-15 5500 if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
3a80a7fa7 Mel Gorman 2015-06-30 @5501 break;
342332e6a Taku Izumi 2016-03-15 5502
342332e6a Taku Izumi 2016-03-15 5503 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
342332e6a Taku Izumi 2016-03-15 5504 /*
b72d0ffb5 Andrew Morton 2016-03-15 5505 * Check given memblock attribute by firmware which can affect
b72d0ffb5 Andrew Morton 2016-03-15 5506 * kernel memory layout. If zone==ZONE_MOVABLE but memory is
b72d0ffb5 Andrew Morton 2016-03-15 5507 * mirrored, it's an overlapped memmap init. skip it.
342332e6a Taku Izumi 2016-03-15 5508 */
342332e6a Taku Izumi 2016-03-15 5509 if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
b72d0ffb5 Andrew Morton 2016-03-15 5510 if (!r || pfn >= memblock_region_memory_end_pfn(r)) {
342332e6a Taku Izumi 2016-03-15 5511 for_each_memblock(memory, tmp)
342332e6a Taku Izumi 2016-03-15 5512 if (pfn < memblock_region_memory_end_pfn(tmp))
342332e6a Taku Izumi 2016-03-15 5513 break;
342332e6a Taku Izumi 2016-03-15 5514 r = tmp;
342332e6a Taku Izumi 2016-03-15 5515 }
342332e6a Taku Izumi 2016-03-15 5516 if (pfn >= memblock_region_memory_base_pfn(r) &&
342332e6a Taku Izumi 2016-03-15 5517 memblock_is_mirror(r)) {
342332e6a Taku Izumi 2016-03-15 5518 /* already initialized as NORMAL */
342332e6a Taku Izumi 2016-03-15 5519 pfn = memblock_region_memory_end_pfn(r);
342332e6a Taku Izumi 2016-03-15 @5520 continue;
342332e6a Taku Izumi 2016-03-15 5521 }
342332e6a Taku Izumi 2016-03-15 5522 }
342332e6a Taku Izumi 2016-03-15 5523 #endif
ac5d2539b Mel Gorman 2015-06-30 5524
b72d0ffb5 Andrew Morton 2016-03-15 5525 not_early:
d08f92e7d Pavel Tatashin 2018-03-23 5526 page = pfn_to_page(pfn);
d08f92e7d Pavel Tatashin 2018-03-23 5527 __init_single_page(page, pfn, zone, nid);
d08f92e7d Pavel Tatashin 2018-03-23 5528 if (context == MEMMAP_HOTPLUG)
d08f92e7d Pavel Tatashin 2018-03-23 5529 SetPageReserved(page);
d08f92e7d Pavel Tatashin 2018-03-23 5530
ac5d2539b Mel Gorman 2015-06-30 5531 /*
ac5d2539b Mel Gorman 2015-06-30 5532 * Mark the block movable so that blocks are reserved for
ac5d2539b Mel Gorman 2015-06-30 5533 * movable at startup. This will force kernel allocations
ac5d2539b Mel Gorman 2015-06-30 5534 * to reserve their blocks rather than leaking throughout
ac5d2539b Mel Gorman 2015-06-30 5535 * the address space during boot when many long-lived
974a786e6 Mel Gorman 2015-11-06 5536 * kernel allocations are made.
ac5d2539b Mel Gorman 2015-06-30 5537 *
ac5d2539b Mel Gorman 2015-06-30 5538 * bitmap is created for zone's valid pfn range. but memmap
ac5d2539b Mel Gorman 2015-06-30 5539 * can be created for invalid pages (for alignment)
ac5d2539b Mel Gorman 2015-06-30 5540 * check here not to call set_pageblock_migratetype() against
ac5d2539b Mel Gorman 2015-06-30 5541 * pfn out of zone.
9bb5a391f Michal Hocko 2018-01-31 5542 *
9bb5a391f Michal Hocko 2018-01-31 5543 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
9bb5a391f Michal Hocko 2018-01-31 5544 * because this is done early in sparse_add_one_section
ac5d2539b Mel Gorman 2015-06-30 5545 */
ac5d2539b Mel Gorman 2015-06-30 5546 if (!(pfn & (pageblock_nr_pages - 1))) {
ac5d2539b Mel Gorman 2015-06-30 5547 set_pageblock_migratetype(page, MIGRATE_MOVABLE);
9b6e63cbf Michal Hocko 2017-10-03 5548 cond_resched();
^1da177e4 Linus Torvalds 2005-04-16 5549 }
^1da177e4 Linus Torvalds 2005-04-16 5550 }
ac5d2539b Mel Gorman 2015-06-30 @5551 }
^1da177e4 Linus Torvalds 2005-04-16 5552
:::::: The code at line 5499 was first introduced by commit
:::::: 7516795739bd53175629b90fab0ad488d7a6a9f7 [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc
:::::: TO: Andy Whitcroft <[email protected]>
:::::: CC: Linus Torvalds <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Jia,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on mmotm/master]
[cannot apply to v4.16-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Jia-He/optimize-memblock_next_valid_pfn-and-early_pfn_valid/20180325-175026
base: git://git.cmpxchg.org/linux-mmotm.git master
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64
All warnings (new ones prefixed by >>):
arch/arm64/mm/init.c: In function 'pfn_valid_region':
>> arch/arm64/mm/init.c:304:35: warning: self-comparison always evaluates to false [-Wtautological-compare]
if (pfn >= start_pfn && end_pfn < end_pfn)
^
vim +304 arch/arm64/mm/init.c
293
294 int pfn_valid_region(unsigned long pfn, int *last_idx)
295 {
296 unsigned long start_pfn, end_pfn;
297 struct memblock_type *type = &memblock.memory;
298
299 if (*last_idx != -1) {
300 start_pfn = PFN_DOWN(type->regions[*last_idx].base);
301 end_pfn= PFN_DOWN(type->regions[*last_idx].base +
302 type->regions[*last_idx].size);
303
> 304 if (pfn >= start_pfn && end_pfn < end_pfn)
305 return !memblock_is_nomap(
306 &memblock.memory.regions[*last_idx]);
307 }
308
309 *last_idx = memblock_search_pfn_regions(pfn);
310 if (*last_idx == -1)
311 return false;
312
313 return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);
314 }
315 EXPORT_SYMBOL(pfn_valid_region);
316 #endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
317
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Sat, Mar 24, 2018 at 1:24 PM, Jia He <[email protected]> wrote:
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic bug. So Daniel Vacek reverted it later.
>
> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
> enabled. And as verified by Eugeniu Rosca, arm can benifit from this
> commit. So remain the memblock_next_valid_pfn.
It is not dependent on CONFIG_HAVE_ARCH_PFN_VALID option but on
arm(64) implementation of pfn_valid() function, IIUC. So it should
really be moved from generic source file to arm specific location. I'd
say somewhere close to the pfn_valid() implementation. Such as to
arch/arm{,64}/mm/ init.c-ish?
--nX
> Signed-off-by: Jia He <[email protected]>
> ---
> include/linux/memblock.h | 4 ++++
> mm/memblock.c | 29 +++++++++++++++++++++++++++++
> mm/page_alloc.c | 11 ++++++++++-
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 0257aee..efbbe4b 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +unsigned long memblock_next_valid_pfn(unsigned long pfn);
> +#endif
> +
> /**
> * for_each_free_mem_range - iterate through free memblock areas
> * @i: u64 used as loop variable
> diff --git a/mm/memblock.c b/mm/memblock.c
> index ba7c878..bea5a9c 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
> *out_nid = r->nid;
> }
>
> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
> +{
> + struct memblock_type *type = &memblock.memory;
> + unsigned int right = type->cnt;
> + unsigned int mid, left = 0;
> + phys_addr_t addr = PFN_PHYS(++pfn);
> +
> + do {
> + mid = (right + left) / 2;
> +
> + if (addr < type->regions[mid].base)
> + right = mid;
> + else if (addr >= (type->regions[mid].base +
> + type->regions[mid].size))
> + left = mid + 1;
> + else {
> + /* addr is within the region, so pfn is valid */
> + return pfn;
> + }
> + } while (left < right);
> +
> + if (right == type->cnt)
> + return -1UL;
> + else
> + return PHYS_PFN(type->regions[right].base);
> +}
> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
> +
> /**
> * memblock_set_node - set node ID on memblock regions
> * @base: base of area to set node ID for
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c19f5ac..2a967f7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> if (context != MEMMAP_EARLY)
> goto not_early;
>
> - if (!early_pfn_valid(pfn))
> + if (!early_pfn_valid(pfn)) {
> +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
> + /*
> + * Skip to the pfn preceding the next valid one (or
> + * end_pfn), such that we hit a valid pfn (or end_pfn)
> + * on our next iteration of the loop.
> + */
> + pfn = memblock_next_valid_pfn(pfn) - 1;
> +#endif
> continue;
> + }
> if (!early_pfn_in_nid(pfn, nid))
> continue;
> if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
> --
> 2.7.4
>
On 3/28/2018 12:52 AM, Daniel Vacek Wrote:
> On Sat, Mar 24, 2018 at 1:24 PM, Jia He <[email protected]> wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic bug. So Daniel Vacek reverted it later.
>>
>> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>> enabled. And as verified by Eugeniu Rosca, arm can benifit from this
>> commit. So remain the memblock_next_valid_pfn.
> It is not dependent on CONFIG_HAVE_ARCH_PFN_VALID option but on
> arm(64) implementation of pfn_valid() function, IIUC. So it should
> really be moved from generic source file to arm specific location. I'd
> say somewhere close to the pfn_valid() implementation. Such as to
> arch/arm{,64}/mm/ init.c-ish?
>
> --nX
Ok, thanks for your suggestions.
I will try to move theĀ related codes to arm arch directory.
Cheer,
Jia
On 3/28/2018 12:52 AM, Daniel Vacek Wrote:
> On Sat, Mar 24, 2018 at 1:24 PM, Jia He <[email protected]> wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic bug. So Daniel Vacek reverted it later.
>>
>> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>> enabled. And as verified by Eugeniu Rosca, arm can benifit from this
>> commit. So remain the memblock_next_valid_pfn.
> It is not dependent on CONFIG_HAVE_ARCH_PFN_VALID option but on
> arm(64) implementation of pfn_valid() function, IIUC. So it should
> really be moved from generic source file to arm specific location. I'd
> say somewhere close to the pfn_valid() implementation. Such as to
> arch/arm{,64}/mm/ init.c-ish?
>
> --nX
Hi Daniel
I didn't catch the reason why "It is not dependent on
CONFIG_HAVE_ARCH_PFN_VALID option but
on arm(64) implementation of pfn_valid() function"? Can you explain more
about it? Thanks
What's your thought if I changed the codes as followed?
in include/linux/memblock.h
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
#else
#define memblock_next_valid_pfn(pfn) (pfn + 1)
#endif
Cheers,
Jia
>
>> Signed-off-by: Jia He <[email protected]>
>> ---
>> include/linux/memblock.h | 4 ++++
>> mm/memblock.c | 29 +++++++++++++++++++++++++++++
>> mm/page_alloc.c | 11 ++++++++++-
>> 3 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index 0257aee..efbbe4b 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>> i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +#endif
>> +
>> /**
>> * for_each_free_mem_range - iterate through free memblock areas
>> * @i: u64 used as loop variable
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index ba7c878..bea5a9c 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>> *out_nid = r->nid;
>> }
>>
>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>> +{
>> + struct memblock_type *type = &memblock.memory;
>> + unsigned int right = type->cnt;
>> + unsigned int mid, left = 0;
>> + phys_addr_t addr = PFN_PHYS(++pfn);
>> +
>> + do {
>> + mid = (right + left) / 2;
>> +
>> + if (addr < type->regions[mid].base)
>> + right = mid;
>> + else if (addr >= (type->regions[mid].base +
>> + type->regions[mid].size))
>> + left = mid + 1;
>> + else {
>> + /* addr is within the region, so pfn is valid */
>> + return pfn;
>> + }
>> + } while (left < right);
>> +
>> + if (right == type->cnt)
>> + return -1UL;
>> + else
>> + return PHYS_PFN(type->regions[right].base);
>> +}
>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>> +
>> /**
>> * memblock_set_node - set node ID on memblock regions
>> * @base: base of area to set node ID for
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c19f5ac..2a967f7 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> if (context != MEMMAP_EARLY)
>> goto not_early;
>>
>> - if (!early_pfn_valid(pfn))
>> + if (!early_pfn_valid(pfn)) {
>> +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
>> + /*
>> + * Skip to the pfn preceding the next valid one (or
>> + * end_pfn), such that we hit a valid pfn (or end_pfn)
>> + * on our next iteration of the loop.
>> + */
>> + pfn = memblock_next_valid_pfn(pfn) - 1;
>> +#endif
>> continue;
>> + }
>> if (!early_pfn_in_nid(pfn, nid))
>> continue;
>> if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>> --
>> 2.7.4
>>
--
Cheers,
Jia
On Sat, Mar 24, 2018 at 05:24:38AM -0700, Jia He wrote:
>Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>where possible") optimized the loop in memmap_init_zone(). But it causes
>possible panic bug. So Daniel Vacek reverted it later.
>
Why this has a bug? Do you have some link about it?
If the audience could know the potential risk, it would be helpful to review
the code and decide whether to take it back.
>But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>enabled. And as verified by Eugeniu Rosca, arm can benifit from this
>commit. So remain the memblock_next_valid_pfn.
>
>Signed-off-by: Jia He <[email protected]>
>---
> include/linux/memblock.h | 4 ++++
> mm/memblock.c | 29 +++++++++++++++++++++++++++++
> mm/page_alloc.c | 11 ++++++++++-
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
>diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>index 0257aee..efbbe4b 100644
>--- a/include/linux/memblock.h
>+++ b/include/linux/memblock.h
>@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
>+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>+unsigned long memblock_next_valid_pfn(unsigned long pfn);
>+#endif
>+
> /**
> * for_each_free_mem_range - iterate through free memblock areas
> * @i: u64 used as loop variable
>diff --git a/mm/memblock.c b/mm/memblock.c
>index ba7c878..bea5a9c 100644
>--- a/mm/memblock.c
>+++ b/mm/memblock.c
>@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
> *out_nid = r->nid;
> }
>
>+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>+{
>+ struct memblock_type *type = &memblock.memory;
>+ unsigned int right = type->cnt;
>+ unsigned int mid, left = 0;
>+ phys_addr_t addr = PFN_PHYS(++pfn);
>+
>+ do {
>+ mid = (right + left) / 2;
>+
>+ if (addr < type->regions[mid].base)
>+ right = mid;
>+ else if (addr >= (type->regions[mid].base +
>+ type->regions[mid].size))
>+ left = mid + 1;
>+ else {
>+ /* addr is within the region, so pfn is valid */
>+ return pfn;
>+ }
>+ } while (left < right);
>+
>+ if (right == type->cnt)
>+ return -1UL;
>+ else
>+ return PHYS_PFN(type->regions[right].base);
>+}
>+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>+
> /**
> * memblock_set_node - set node ID on memblock regions
> * @base: base of area to set node ID for
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index c19f5ac..2a967f7 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> if (context != MEMMAP_EARLY)
> goto not_early;
>
>- if (!early_pfn_valid(pfn))
>+ if (!early_pfn_valid(pfn)) {
>+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
In commit b92df1de5d28, it use CONFIG_HAVE_MEMBLOCK_NODE_MAP.
Not get the point of your change.
>+ /*
>+ * Skip to the pfn preceding the next valid one (or
>+ * end_pfn), such that we hit a valid pfn (or end_pfn)
>+ * on our next iteration of the loop.
>+ */
>+ pfn = memblock_next_valid_pfn(pfn) - 1;
>+#endif
> continue;
>+ }
> if (!early_pfn_in_nid(pfn, nid))
> continue;
> if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>--
>2.7.4
--
Wei Yang
Help you, Help me
On Wed, Mar 28, 2018 at 11:26 AM, Jia He <[email protected]> wrote:
>
>
> On 3/28/2018 12:52 AM, Daniel Vacek Wrote:
>>
>> On Sat, Mar 24, 2018 at 1:24 PM, Jia He <[email protected]> wrote:
>>>
>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>> possible panic bug. So Daniel Vacek reverted it later.
>>>
>>> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>>> enabled. And as verified by Eugeniu Rosca, arm can benifit from this
>>> commit. So remain the memblock_next_valid_pfn.
>>
>> It is not dependent on CONFIG_HAVE_ARCH_PFN_VALID option but on
>> arm(64) implementation of pfn_valid() function, IIUC. So it should
>> really be moved from generic source file to arm specific location. I'd
>> say somewhere close to the pfn_valid() implementation. Such as to
>> arch/arm{,64}/mm/ init.c-ish?
>>
>> --nX
>
> Hi Daniel
> I didn't catch the reason why "It is not dependent on
> CONFIG_HAVE_ARCH_PFN_VALID option but
> on arm(64) implementation of pfn_valid() function"? Can you explain more
> about it? Thanks
Arm implementation of pfn_valid() is actually based on memblock as
HAVE_MEMBLOCK is mandatory for arm so memblock is guaranteed to always
be available, IIUC. Correct me if I am wrong here.
With that said, you are fine with using memblock to skip gaps and
finding next valid frame.
Though the generic version of pfn_valid() is based on mem sections and
memblock_next_valid_pfn() does not always return the next valid one
but skips more resulting in some valid frames to be skipped (as if
they were invalid). And that's why kernel was eventually crashing on
some !arm machines.
Now, if any other architecture defines CONFIG_HAVE_ARCH_PFN_VALID and
implements it's own version of pfn_valid(), there is no guarantee that
it will be based on memblock data or somehow equivalent to the arm
implementation, right?
At this moment only arm implements CONFIG_HAVE_ARCH_PFN_VALID. Maybe
it could be generalized to something like CONFIG_MEMBLOCK_PFN_VALID
and moved to generic code. And then you can base your changes on that.
But I am not sure if that is possible.
> What's your thought if I changed the codes as followed?
> in include/linux/memblock.h
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
> #else
> #define memblock_next_valid_pfn(pfn) (pfn + 1)
> #endif
I think I'd rather do something like this:
if (!early_pfn_valid(pfn)) {
pfn = skip_to_last_invalid_pfn(pfn);
continue;
}
And than for arm define:
#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
#define skip_to_last_invalid_pfn(pfn) (memblock_next_valid_pfn(pfn,
&early_idx) - 1)
#endif
And for the generic fallback:
#ifndef skip_to_last_invalid_pfn
#define skip_to_last_invalid_pfn(pfn) (pfn)
#endif
--nX
> Cheers,
> Jia
>
>>
>>> Signed-off-by: Jia He <[email protected]>
>>> ---
>>> include/linux/memblock.h | 4 ++++
>>> mm/memblock.c | 29 +++++++++++++++++++++++++++++
>>> mm/page_alloc.c | 11 ++++++++++-
>>> 3 files changed, 43 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>> index 0257aee..efbbe4b 100644
>>> --- a/include/linux/memblock.h
>>> +++ b/include/linux/memblock.h
>>> @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid,
>>> unsigned long *out_start_pfn,
>>> i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end,
>>> p_nid))
>>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>
>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>> +unsigned long memblock_next_valid_pfn(unsigned long pfn);
>>> +#endif
>>> +
>>> /**
>>> * for_each_free_mem_range - iterate through free memblock areas
>>> * @i: u64 used as loop variable
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index ba7c878..bea5a9c 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int
>>> *idx, int nid,
>>> *out_nid = r->nid;
>>> }
>>>
>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>>> +{
>>> + struct memblock_type *type = &memblock.memory;
>>> + unsigned int right = type->cnt;
>>> + unsigned int mid, left = 0;
>>> + phys_addr_t addr = PFN_PHYS(++pfn);
>>> +
>>> + do {
>>> + mid = (right + left) / 2;
>>> +
>>> + if (addr < type->regions[mid].base)
>>> + right = mid;
>>> + else if (addr >= (type->regions[mid].base +
>>> + type->regions[mid].size))
>>> + left = mid + 1;
>>> + else {
>>> + /* addr is within the region, so pfn is valid */
>>> + return pfn;
>>> + }
>>> + } while (left < right);
>>> +
>>> + if (right == type->cnt)
>>> + return -1UL;
>>> + else
>>> + return PHYS_PFN(type->regions[right].base);
>>> +}
>>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>>> +
>>> /**
>>> * memblock_set_node - set node ID on memblock regions
>>> * @base: base of area to set node ID for
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index c19f5ac..2a967f7 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long
>>> size, int nid, unsigned long zone,
>>> if (context != MEMMAP_EARLY)
>>> goto not_early;
>>>
>>> - if (!early_pfn_valid(pfn))
>>> + if (!early_pfn_valid(pfn)) {
>>> +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined
>>> CONFIG_HAVE_ARCH_PFN_VALID)
>>> + /*
>>> + * Skip to the pfn preceding the next valid one
>>> (or
>>> + * end_pfn), such that we hit a valid pfn (or
>>> end_pfn)
>>> + * on our next iteration of the loop.
>>> + */
>>> + pfn = memblock_next_valid_pfn(pfn) - 1;
>>> +#endif
>>> continue;
>>> + }
>>> if (!early_pfn_in_nid(pfn, nid))
>>> continue;
>>> if (!update_defer_init(pgdat, pfn, end_pfn,
>>> &nr_initialised))
>>> --
>>> 2.7.4
>>>
>
> --
> Cheers,
> Jia
>