2020-07-28 05:12:45

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 00/15] memblock: seasonal cleaning^w cleanup

From: Mike Rapoport <[email protected]>

Hi,

These patches simplify several uses of memblock iterators and hide some of
the memblock implementation details from the rest of the system.

The patches are on top of v5.8-rc7 + cherry-pick of "mm/sparse: cleanup the
code surrounding memory_present()" [1] from mmotm tree.

[1] http://lkml.kernel.org/r/[email protected]

Mike Rapoport (15):
KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
dma-contiguous: simplify cma_early_percent_memory()
arm, xtensa: simplify initialization of high memory pages
arm64: numa: simplify dummy_numa_init()
h8300, nds32, openrisc: simplify detection of memory extents
powerpc: fadamp: simplify fadump_reserve_crash_area()
riscv: drop unneeded node initialization
mircoblaze: drop unneeded NUMA and sparsemem initializations
memblock: make for_each_memblock_type() iterator private
memblock: make memblock_debug and related functionality private
memblock: reduce number of parameters in for_each_mem_range()
arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
arch, drivers: replace for_each_membock() with for_each_mem_range()
x86/numa: remove redundant iteration over memblock.reserved
memblock: remove 'type' parameter from for_each_memblock()

.clang-format | 1 +
arch/arm/kernel/setup.c | 18 +++++---
arch/arm/mm/init.c | 59 +++++-------------------
arch/arm/mm/mmu.c | 39 ++++++----------
arch/arm/mm/pmsa-v7.c | 20 ++++----
arch/arm/mm/pmsa-v8.c | 17 ++++---
arch/arm/xen/mm.c | 7 +--
arch/arm64/kernel/machine_kexec_file.c | 6 +--
arch/arm64/kernel/setup.c | 2 +-
arch/arm64/mm/init.c | 11 ++---
arch/arm64/mm/kasan_init.c | 8 ++--
arch/arm64/mm/mmu.c | 11 ++---
arch/arm64/mm/numa.c | 15 +++---
arch/c6x/kernel/setup.c | 9 ++--
arch/h8300/kernel/setup.c | 8 ++--
arch/microblaze/mm/init.c | 24 ++--------
arch/mips/cavium-octeon/dma-octeon.c | 12 ++---
arch/mips/kernel/setup.c | 31 ++++++-------
arch/mips/netlogic/xlp/setup.c | 2 +-
arch/nds32/kernel/setup.c | 8 +---
arch/openrisc/kernel/setup.c | 9 +---
arch/openrisc/mm/init.c | 8 ++--
arch/powerpc/kernel/fadump.c | 58 ++++++++---------------
arch/powerpc/kvm/book3s_hv_builtin.c | 11 +----
arch/powerpc/mm/book3s64/hash_utils.c | 16 +++----
arch/powerpc/mm/book3s64/radix_pgtable.c | 11 ++---
arch/powerpc/mm/kasan/kasan_init_32.c | 8 ++--
arch/powerpc/mm/mem.c | 33 +++++++------
arch/powerpc/mm/numa.c | 7 +--
arch/powerpc/mm/pgtable_32.c | 8 ++--
arch/riscv/mm/init.c | 33 ++++---------
arch/riscv/mm/kasan_init.c | 10 ++--
arch/s390/kernel/crash_dump.c | 8 ++--
arch/s390/kernel/setup.c | 31 ++++++++-----
arch/s390/mm/page-states.c | 6 +--
arch/s390/mm/vmem.c | 16 ++++---
arch/sh/mm/init.c | 9 ++--
arch/sparc/mm/init_64.c | 12 ++---
arch/x86/mm/numa.c | 26 ++++-------
arch/xtensa/mm/init.c | 55 ++++------------------
drivers/bus/mvebu-mbus.c | 12 ++---
drivers/s390/char/zcore.c | 9 ++--
include/linux/memblock.h | 45 +++++++++---------
kernel/dma/contiguous.c | 11 +----
mm/memblock.c | 28 +++++++----
mm/page_alloc.c | 11 ++---
mm/sparse.c | 10 ++--
47 files changed, 324 insertions(+), 485 deletions(-)

--
2.26.2


2020-07-28 05:13:06

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 02/15] dma-contiguous: simplify cma_early_percent_memory()

From: Mike Rapoport <[email protected]>

The memory size calculation in cma_early_percent_memory() traverses
memblock.memory rather than simply call memblock_phys_mem_size(). The
comment in that function suggests that at some point there should have been
call to memblock_analyze() before memblock_phys_mem_size() could be used.
As of now, there is no memblock_analyze() at all and
memblock_phys_mem_size() can be used as soon as cold-plug memory is
registerd with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Signed-off-by: Mike Rapoport <[email protected]>
---
kernel/dma/contiguous.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 15bc5026c485..1992afd8ca7b 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -73,16 +73,7 @@ early_param("cma", early_cma);

static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
{
- struct memblock_region *reg;
- unsigned long total_pages = 0;
-
- /*
- * We cannot use memblock_phys_mem_size() here, because
- * memblock_analyze() has not been called yet.
- */
- for_each_memblock(memory, reg)
- total_pages += memblock_region_memory_end_pfn(reg) -
- memblock_region_memory_base_pfn(reg);
+ unsigned long total_pages = PHYS_PFN(memblock_phys_mem_size());

return (total_pages * CONFIG_CMA_SIZE_PERCENTAGE / 100) << PAGE_SHIFT;
}
--
2.26.2

2020-07-28 05:13:17

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 03/15] arm, xtensa: simplify initialization of high memory pages

From: Mike Rapoport <[email protected]>

The function free_highpages() in both arm and xtensa essentially open-code
for_each_free_mem_range() loop to detect high memory pages that were not
reserved and that should be initialized and passed to the buddy allocator.

Replace open-coded implementation of for_each_free_mem_range() with usage
of memblock API to simplify the code.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm/mm/init.c | 48 +++++++------------------------------
arch/xtensa/mm/init.c | 55 ++++++++-----------------------------------
2 files changed, 18 insertions(+), 85 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 01e18e43b174..626af348eb8f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -352,61 +352,29 @@ static void __init free_unused_memmap(void)
#endif
}

-#ifdef CONFIG_HIGHMEM
-static inline void free_area_high(unsigned long pfn, unsigned long end)
-{
- for (; pfn < end; pfn++)
- free_highmem_page(pfn_to_page(pfn));
-}
-#endif
-
static void __init free_highpages(void)
{
#ifdef CONFIG_HIGHMEM
unsigned long max_low = max_low_pfn;
- struct memblock_region *mem, *res;
+ phys_addr_t range_start, range_end;
+ u64 i;

/* set highmem page free */
- for_each_memblock(memory, mem) {
- unsigned long start = memblock_region_memory_base_pfn(mem);
- unsigned long end = memblock_region_memory_end_pfn(mem);
+ for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
+ &range_start, &range_end, NULL) {
+ unsigned long start = PHYS_PFN(range_start);
+ unsigned long end = PHYS_PFN(range_end);

/* Ignore complete lowmem entries */
if (end <= max_low)
continue;

- if (memblock_is_nomap(mem))
- continue;
-
/* Truncate partial highmem entries */
if (start < max_low)
start = max_low;

- /* Find and exclude any reserved regions */
- for_each_memblock(reserved, res) {
- unsigned long res_start, res_end;
-
- res_start = memblock_region_reserved_base_pfn(res);
- res_end = memblock_region_reserved_end_pfn(res);
-
- if (res_end < start)
- continue;
- if (res_start < start)
- res_start = start;
- if (res_start > end)
- res_start = end;
- if (res_end > end)
- res_end = end;
- if (res_start != start)
- free_area_high(start, res_start);
- start = res_end;
- if (start == end)
- break;
- }
-
- /* And now free anything which remains */
- if (start < end)
- free_area_high(start, end);
+ for (; start < end; start++)
+ free_highmem_page(pfn_to_page(start));
}
#endif
}
diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
index a05b306cf371..ad9d59d93f39 100644
--- a/arch/xtensa/mm/init.c
+++ b/arch/xtensa/mm/init.c
@@ -79,67 +79,32 @@ void __init zones_init(void)
free_area_init(max_zone_pfn);
}

-#ifdef CONFIG_HIGHMEM
-static void __init free_area_high(unsigned long pfn, unsigned long end)
-{
- for (; pfn < end; pfn++)
- free_highmem_page(pfn_to_page(pfn));
-}
-
static void __init free_highpages(void)
{
+#ifdef CONFIG_HIGHMEM
unsigned long max_low = max_low_pfn;
- struct memblock_region *mem, *res;
+ phys_addr_t range_start, range_end;
+ u64 i;

- reset_all_zones_managed_pages();
/* set highmem page free */
- for_each_memblock(memory, mem) {
- unsigned long start = memblock_region_memory_base_pfn(mem);
- unsigned long end = memblock_region_memory_end_pfn(mem);
+ for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
+ &range_start, &range_end, NULL) {
+ unsigned long start = PHYS_PFN(range_start);
+ unsigned long end = PHYS_PFN(range_end);

/* Ignore complete lowmem entries */
if (end <= max_low)
continue;

- if (memblock_is_nomap(mem))
- continue;
-
/* Truncate partial highmem entries */
if (start < max_low)
start = max_low;

- /* Find and exclude any reserved regions */
- for_each_memblock(reserved, res) {
- unsigned long res_start, res_end;
-
- res_start = memblock_region_reserved_base_pfn(res);
- res_end = memblock_region_reserved_end_pfn(res);
-
- if (res_end < start)
- continue;
- if (res_start < start)
- res_start = start;
- if (res_start > end)
- res_start = end;
- if (res_end > end)
- res_end = end;
- if (res_start != start)
- free_area_high(start, res_start);
- start = res_end;
- if (start == end)
- break;
- }
-
- /* And now free anything which remains */
- if (start < end)
- free_area_high(start, end);
+ for (; start < end; start++)
+ free_highmem_page(pfn_to_page(start));
}
-}
-#else
-static void __init free_highpages(void)
-{
-}
#endif
+}

/*
* Initialize memory pages.
--
2.26.2

2020-07-28 05:13:46

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()

From: Mike Rapoport <[email protected]>

dummy_numa_init() loops over memblock.memory and passes nid=0 to
numa_add_memblk() which essentially wraps memblock_set_node(). However,
memblock_set_node() can cope with entire memory span itself, so the loop
over memblock.memory regions is redundant.

Replace the loop with a single call to memblock_set_node() to the entire
memory.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm64/mm/numa.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index aafcee3e3f7e..0cbdbcc885fb 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -423,19 +423,16 @@ static int __init numa_init(int (*init_func)(void))
*/
static int __init dummy_numa_init(void)
{
+ phys_addr_t start = memblock_start_of_DRAM();
+ phys_addr_t end = memblock_end_of_DRAM();
int ret;
- struct memblock_region *mblk;

if (numa_off)
pr_info("NUMA disabled\n"); /* Forced off on command line. */
- pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
- memblock_start_of_DRAM(), memblock_end_of_DRAM() - 1);
-
- for_each_memblock(memory, mblk) {
- ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
- if (!ret)
- continue;
+ pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);

+ ret = numa_add_memblk(0, start, end);
+ if (ret) {
pr_err("NUMA init failed\n");
return ret;
}
--
2.26.2

2020-07-28 05:13:55

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 05/15] h8300, nds32, openrisc: simplify detection of memory extents

From: Mike Rapoport <[email protected]>

Instead of traversing memblock.memory regions to find memory_start and
memory_end, simply query memblock_{start,end}_of_DRAM().

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/h8300/kernel/setup.c | 8 +++-----
arch/nds32/kernel/setup.c | 8 ++------
arch/openrisc/kernel/setup.c | 9 ++-------
3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/h8300/kernel/setup.c b/arch/h8300/kernel/setup.c
index 28ac88358a89..0281f92eea3d 100644
--- a/arch/h8300/kernel/setup.c
+++ b/arch/h8300/kernel/setup.c
@@ -74,17 +74,15 @@ static void __init bootmem_init(void)
memory_end = memory_start = 0;

/* Find main memory where is the kernel */
- for_each_memblock(memory, region) {
- memory_start = region->base;
- memory_end = region->base + region->size;
- }
+ memory_start = memblock_start_of_DRAM();
+ memory_end = memblock_end_of_DRAM();

if (!memory_end)
panic("No memory!");

/* setup bootmem globals (we use no_bootmem, but mm still depends on this) */
min_low_pfn = PFN_UP(memory_start);
- max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());
+ max_low_pfn = PFN_DOWN(memory_end);
max_pfn = max_low_pfn;

memblock_reserve(__pa(_stext), _end - _stext);
diff --git a/arch/nds32/kernel/setup.c b/arch/nds32/kernel/setup.c
index a066efbe53c0..c356e484dcab 100644
--- a/arch/nds32/kernel/setup.c
+++ b/arch/nds32/kernel/setup.c
@@ -249,12 +249,8 @@ static void __init setup_memory(void)
memory_end = memory_start = 0;

/* Find main memory where is the kernel */
- for_each_memblock(memory, region) {
- memory_start = region->base;
- memory_end = region->base + region->size;
- pr_info("%s: Memory: 0x%x-0x%x\n", __func__,
- memory_start, memory_end);
- }
+ memory_start = memblock_start_of_DRAM();
+ memory_end = memblock_end_of_DRAM();

if (!memory_end) {
panic("No memory!");
diff --git a/arch/openrisc/kernel/setup.c b/arch/openrisc/kernel/setup.c
index 8aa438e1f51f..c5706153d3b6 100644
--- a/arch/openrisc/kernel/setup.c
+++ b/arch/openrisc/kernel/setup.c
@@ -48,17 +48,12 @@ static void __init setup_memory(void)
unsigned long ram_start_pfn;
unsigned long ram_end_pfn;
phys_addr_t memory_start, memory_end;
- struct memblock_region *region;

memory_end = memory_start = 0;

/* Find main memory where is the kernel, we assume its the only one */
- for_each_memblock(memory, region) {
- memory_start = region->base;
- memory_end = region->base + region->size;
- printk(KERN_INFO "%s: Memory: 0x%x-0x%x\n", __func__,
- memory_start, memory_end);
- }
+ memory_start = memblock_start_of_DRAM();
+ memory_end = memblock_end_of_DRAM();

if (!memory_end) {
panic("No memory!");
--
2.26.2

2020-07-28 05:14:14

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

From: Mike Rapoport <[email protected]>

fadump_reserve_crash_area() reserves memory from a specified base address
till the end of the RAM.

Replace iteration through the memblock.memory with a single call to
memblock_reserve() with appropriate that will take care of proper memory
reservation.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/powerpc/kernel/fadump.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 78ab9a6ee6ac..2446a61e3c25 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
/* Preserve everything above the base address */
static void __init fadump_reserve_crash_area(u64 base)
{
- struct memblock_region *reg;
- u64 mstart, msize;
-
- for_each_memblock(memory, reg) {
- mstart = reg->base;
- msize = reg->size;
-
- if ((mstart + msize) < base)
- continue;
-
- if (mstart < base) {
- msize -= (base - mstart);
- mstart = base;
- }
-
- pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
- (msize >> 20), mstart);
- memblock_reserve(mstart, msize);
- }
+ memblock_reserve(base, memblock_end_of_DRAM() - base);
}

unsigned long __init arch_reserved_kernel_pages(void)
--
2.26.2

2020-07-28 05:14:20

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 07/15] riscv: drop unneeded node initialization

From: Mike Rapoport <[email protected]>

RISC-V does not (yet) support NUMA and for UMA architectures node 0 is
used implicitly during early memory initialization.

There is no need to call memblock_set_node(), remove this call and the
surrounding code.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/riscv/mm/init.c | 9 ---------
1 file changed, 9 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 79e9d55bdf1a..7440ba2cdaaa 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -191,15 +191,6 @@ void __init setup_bootmem(void)
early_init_fdt_scan_reserved_mem();
memblock_allow_resize();
memblock_dump_all();
-
- for_each_memblock(memory, reg) {
- unsigned long start_pfn = memblock_region_memory_base_pfn(reg);
- unsigned long end_pfn = memblock_region_memory_end_pfn(reg);
-
- memblock_set_node(PFN_PHYS(start_pfn),
- PFN_PHYS(end_pfn - start_pfn),
- &memblock.memory, 0);
- }
}

#ifdef CONFIG_MMU
--
2.26.2

2020-07-28 05:14:56

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 10/15] memblock: make memblock_debug and related functionality private

From: Mike Rapoport <[email protected]>

The only user of memblock_dbg() outside memblock was s390 setup code and it
is converted to use pr_debug() instead.
This allows to stop exposing memblock_debug and memblock_dbg() to the rest
of the kernel.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/s390/kernel/setup.c | 4 ++--
include/linux/memblock.h | 12 +-----------
mm/memblock.c | 13 +++++++++++--
3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 07aa15ba43b3..8b284cf6e199 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -776,8 +776,8 @@ static void __init memblock_add_mem_detect_info(void)
unsigned long start, end;
int i;

- memblock_dbg("physmem info source: %s (%hhd)\n",
- get_mem_info_source(), mem_detect.info_source);
+ pr_debug("physmem info source: %s (%hhd)\n",
+ get_mem_info_source(), mem_detect.info_source);
/* keep memblock lists close to the kernel */
memblock_set_bottom_up(true);
for_each_mem_detect_block(i, &start, &end) {
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 220b5f0dad42..e6a23b3db696 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -90,7 +90,6 @@ struct memblock {
};

extern struct memblock memblock;
-extern int memblock_debug;

#ifndef CONFIG_ARCH_KEEP_MEMBLOCK
#define __init_memblock __meminit
@@ -102,9 +101,6 @@ void memblock_discard(void);
static inline void memblock_discard(void) {}
#endif

-#define memblock_dbg(fmt, ...) \
- if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
-
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align);
void memblock_allow_resize(void);
@@ -456,13 +452,7 @@ bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
bool memblock_is_reserved(phys_addr_t addr);
bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);

-extern void __memblock_dump_all(void);
-
-static inline void memblock_dump_all(void)
-{
- if (memblock_debug)
- __memblock_dump_all();
-}
+void memblock_dump_all(void);

/**
* memblock_set_current_limit - Set the current allocation limit to allow
diff --git a/mm/memblock.c b/mm/memblock.c
index a5b9b3df81fc..824938849f6d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -134,7 +134,10 @@ struct memblock memblock __initdata_memblock = {
i < memblock_type->cnt; \
i++, rgn = &memblock_type->regions[i])

-int memblock_debug __initdata_memblock;
+#define memblock_dbg(fmt, ...) \
+ if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+static int memblock_debug __initdata_memblock;
static bool system_has_some_mirror __initdata_memblock = false;
static int memblock_can_resize __initdata_memblock;
static int memblock_memory_in_slab __initdata_memblock = 0;
@@ -1919,7 +1922,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type)
}
}

-void __init_memblock __memblock_dump_all(void)
+static void __init_memblock __memblock_dump_all(void)
{
pr_info("MEMBLOCK configuration:\n");
pr_info(" memory size = %pa reserved size = %pa\n",
@@ -1933,6 +1936,12 @@ void __init_memblock __memblock_dump_all(void)
#endif
}

+void __init_memblock memblock_dump_all(void)
+{
+ if (memblock_debug)
+ __memblock_dump_all();
+}
+
void __init memblock_allow_resize(void)
{
memblock_can_resize = 1;
--
2.26.2

2020-07-28 05:15:02

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 01/15] KVM: PPC: Book3S HV: simplify kvm_cma_reserve()

From: Mike Rapoport <[email protected]>

The memory size calculation in kvm_cma_reserve() traverses memblock.memory
rather than simply call memblock_phys_mem_size(). The comment in that
function suggests that at some point there should have been call to
memblock_analyze() before memblock_phys_mem_size() could be used.
As of now, there is no memblock_analyze() at all and
memblock_phys_mem_size() can be used as soon as cold-plug memory is
registerd with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cd3cf3d366b..56ab0d28de2a 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -95,22 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
void __init kvm_cma_reserve(void)
{
unsigned long align_size;
- struct memblock_region *reg;
- phys_addr_t selected_size = 0;
+ phys_addr_t selected_size;

/*
* We need CMA reservation only when we are in HV mode
*/
if (!cpu_has_feature(CPU_FTR_HVMODE))
return;
- /*
- * We cannot use memblock_phys_mem_size() here, because
- * memblock_analyze() has not been called yet.
- */
- for_each_memblock(memory, reg)
- selected_size += memblock_region_memory_end_pfn(reg) -
- memblock_region_memory_base_pfn(reg);

+ selected_size = PHYS_PFN(memblock_phys_mem_size());
selected_size = (selected_size * kvm_cma_resv_ratio / 100) << PAGE_SHIFT;
if (selected_size) {
pr_debug("%s: reserving %ld MiB for global area\n", __func__,
--
2.26.2

2020-07-28 05:15:04

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 12/15] arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()

From: Mike Rapoport <[email protected]>

There are several occurrences of the following pattern:

for_each_memblock(memory, reg) {
start_pfn = memblock_region_memory_base_pfn(reg);
end_pfn = memblock_region_memory_end_pfn(reg);

/* do something with start_pfn and end_pfn */
}

Rather than iterate over all memblock.memory regions and each time query
for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
simpler and clearer code.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm/mm/init.c | 11 ++++-------
arch/arm64/mm/init.c | 11 ++++-------
arch/powerpc/kernel/fadump.c | 11 ++++++-----
arch/powerpc/mm/mem.c | 15 ++++++++-------
arch/powerpc/mm/numa.c | 7 ++-----
arch/s390/mm/page-states.c | 6 ++----
arch/sh/mm/init.c | 9 +++------
mm/memblock.c | 6 ++----
mm/sparse.c | 10 ++++------
9 files changed, 35 insertions(+), 51 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 626af348eb8f..bb56668b4f54 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -304,16 +304,14 @@ free_memmap(unsigned long start_pfn, unsigned long end_pfn)
*/
static void __init free_unused_memmap(void)
{
- unsigned long start, prev_end = 0;
- struct memblock_region *reg;
+ unsigned long start, end, prev_end = 0;
+ int i;

/*
* This relies on each bank being in address order.
* The banks are sorted previously in bootmem_init().
*/
- for_each_memblock(memory, reg) {
- start = memblock_region_memory_base_pfn(reg);
-
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start, &end, NULL) {
#ifdef CONFIG_SPARSEMEM
/*
* Take care not to free memmap entries that don't exist
@@ -341,8 +339,7 @@ static void __init free_unused_memmap(void)
* memmap entries are valid from the bank end aligned to
* MAX_ORDER_NR_PAGES.
*/
- prev_end = ALIGN(memblock_region_memory_end_pfn(reg),
- MAX_ORDER_NR_PAGES);
+ prev_end = ALIGN(end, MAX_ORDER_NR_PAGES);
}

#ifdef CONFIG_SPARSEMEM
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1e93cfc7c47a..271a8ea32482 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -473,12 +473,10 @@ static inline void free_memmap(unsigned long start_pfn, unsigned long end_pfn)
*/
static void __init free_unused_memmap(void)
{
- unsigned long start, prev_end = 0;
- struct memblock_region *reg;
-
- for_each_memblock(memory, reg) {
- start = __phys_to_pfn(reg->base);
+ unsigned long start, end, prev_end = 0;
+ int i;

+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start, &end, NULL) {
#ifdef CONFIG_SPARSEMEM
/*
* Take care not to free memmap entries that don't exist due
@@ -498,8 +496,7 @@ static void __init free_unused_memmap(void)
* memmap entries are valid from the bank end aligned to
* MAX_ORDER_NR_PAGES.
*/
- prev_end = ALIGN(__phys_to_pfn(reg->base + reg->size),
- MAX_ORDER_NR_PAGES);
+ prev_end = ALIGN(end, MAX_ORDER_NR_PAGES);
}

#ifdef CONFIG_SPARSEMEM
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2446a61e3c25..fdbafe417139 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1216,14 +1216,15 @@ static void fadump_free_reserved_memory(unsigned long start_pfn,
*/
static void fadump_release_reserved_area(u64 start, u64 end)
{
- u64 tstart, tend, spfn, epfn;
- struct memblock_region *reg;
+ u64 tstart, tend, spfn, epfn, reg_spfn, reg_epfn, i;

spfn = PHYS_PFN(start);
epfn = PHYS_PFN(end);
- for_each_memblock(memory, reg) {
- tstart = max_t(u64, spfn, memblock_region_memory_base_pfn(reg));
- tend = min_t(u64, epfn, memblock_region_memory_end_pfn(reg));
+
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &reg_spfn, &reg_epfn, NULL) {
+ tstart = max_t(u64, spfn, reg_spfn);
+ tend = min_t(u64, epfn, reg_epfn);
+
if (tstart < tend) {
fadump_free_reserved_memory(tstart, tend);

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c2c11eb8dcfc..38d1acd7c8ef 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -192,15 +192,16 @@ void __init initmem_init(void)
/* mark pages that don't exist as nosave */
static int __init mark_nonram_nosave(void)
{
- struct memblock_region *reg, *prev = NULL;
+ unsigned long spfn, epfn, prev = 0;
+ int i;

- for_each_memblock(memory, reg) {
- if (prev &&
- memblock_region_memory_end_pfn(prev) < memblock_region_memory_base_pfn(reg))
- register_nosave_region(memblock_region_memory_end_pfn(prev),
- memblock_region_memory_base_pfn(reg));
- prev = reg;
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &spfn, &epfn, NULL) {
+ if (prev && prev < spfn)
+ register_nosave_region(prev, spfn);
+
+ prev = epfn;
}
+
return 0;
}
#else /* CONFIG_NEED_MULTIPLE_NODES */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..53254afae725 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -800,17 +800,14 @@ static void __init setup_nonnuma(void)
unsigned long total_ram = memblock_phys_mem_size();
unsigned long start_pfn, end_pfn;
unsigned int nid = 0;
- struct memblock_region *reg;
+ int i;

printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
top_of_ram, total_ram);
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
(top_of_ram - total_ram) >> 20);

- for_each_memblock(memory, reg) {
- start_pfn = memblock_region_memory_base_pfn(reg);
- end_pfn = memblock_region_memory_end_pfn(reg);
-
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start_pfn, &end_pfn, NULL) {
fake_numa_create_new_node(end_pfn, &nid);
memblock_set_node(PFN_PHYS(start_pfn),
PFN_PHYS(end_pfn - start_pfn),
diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
index fc141893d028..8909f7b7b053 100644
--- a/arch/s390/mm/page-states.c
+++ b/arch/s390/mm/page-states.c
@@ -183,9 +183,9 @@ static void mark_kernel_pgd(void)

void __init cmma_init_nodat(void)
{
- struct memblock_region *reg;
struct page *page;
unsigned long start, end, ix;
+ int i;

if (cmma_flag < 2)
return;
@@ -193,9 +193,7 @@ void __init cmma_init_nodat(void)
mark_kernel_pgd();

/* Set all kernel pages not used for page tables to stable/no-dat */
- for_each_memblock(memory, reg) {
- start = memblock_region_memory_base_pfn(reg);
- end = memblock_region_memory_end_pfn(reg);
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start, &end, NULL) {
page = pfn_to_page(start);
for (ix = start; ix < end; ix++, page++) {
if (__test_and_clear_bit(PG_arch_1, &page->flags))
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 62b8f03ffc80..398ee363e3e3 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -224,15 +224,12 @@ void __init allocate_pgdat(unsigned int nid)

static void __init do_init_bootmem(void)
{
- struct memblock_region *reg;
+ unsigned long start_pfn, end_pfn;
+ int i;

/* Add active regions with valid PFNs. */
- for_each_memblock(memory, reg) {
- unsigned long start_pfn, end_pfn;
- start_pfn = memblock_region_memory_base_pfn(reg);
- end_pfn = memblock_region_memory_end_pfn(reg);
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start_pfn, &end_pfn, NULL)
__add_active_range(0, start_pfn, end_pfn);
- }

/* All of system RAM sits in node 0 for the non-NUMA case */
allocate_pgdat(0);
diff --git a/mm/memblock.c b/mm/memblock.c
index 824938849f6d..2ad5e6e47215 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1659,12 +1659,10 @@ phys_addr_t __init_memblock memblock_reserved_size(void)
phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
{
unsigned long pages = 0;
- struct memblock_region *r;
unsigned long start_pfn, end_pfn;
+ int i;

- for_each_memblock(memory, r) {
- start_pfn = memblock_region_memory_base_pfn(r);
- end_pfn = memblock_region_memory_end_pfn(r);
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start_pfn, &end_pfn, NULL) {
start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
pages += end_pfn - start_pfn;
diff --git a/mm/sparse.c b/mm/sparse.c
index b2b9a3e34696..8bdaddb40453 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -292,13 +292,11 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
*/
void __init memblocks_present(void)
{
- struct memblock_region *reg;
+ unsigned long start, end;
+ int i, nid;

- for_each_memblock(memory, reg) {
- memory_present(memblock_get_region_node(reg),
- memblock_region_memory_base_pfn(reg),
- memblock_region_memory_end_pfn(reg));
- }
+ for_each_mem_pfn_range(i, NUMA_NO_NODE, &start, &end, &nid)
+ memory_present(nid, start, end);
}

/*
--
2.26.2

2020-07-28 05:15:13

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 13/15] arch, drivers: replace for_each_membock() with for_each_mem_range()

From: Mike Rapoport <[email protected]>

There are several occurrences of the following pattern:

for_each_memblock(memory, reg) {
start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));

/* do something with start and end */
}

Using for_each_mem_range() iterator is more appropriate in such cases and
allows simpler and cleaner code.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm/kernel/setup.c | 18 +++++++----
arch/arm/mm/mmu.c | 39 ++++++++----------------
arch/arm/mm/pmsa-v7.c | 20 ++++++------
arch/arm/mm/pmsa-v8.c | 17 +++++------
arch/arm/xen/mm.c | 7 +++--
arch/arm64/mm/kasan_init.c | 8 ++---
arch/arm64/mm/mmu.c | 11 ++-----
arch/c6x/kernel/setup.c | 9 +++---
arch/microblaze/mm/init.c | 9 +++---
arch/mips/cavium-octeon/dma-octeon.c | 12 ++++----
arch/mips/kernel/setup.c | 31 +++++++++----------
arch/openrisc/mm/init.c | 8 +++--
arch/powerpc/kernel/fadump.c | 27 +++++++---------
arch/powerpc/mm/book3s64/hash_utils.c | 16 +++++-----
arch/powerpc/mm/book3s64/radix_pgtable.c | 11 +++----
arch/powerpc/mm/kasan/kasan_init_32.c | 8 ++---
arch/powerpc/mm/mem.c | 16 ++++++----
arch/powerpc/mm/pgtable_32.c | 8 ++---
arch/riscv/mm/init.c | 24 ++++++---------
arch/riscv/mm/kasan_init.c | 10 +++---
arch/s390/kernel/setup.c | 27 ++++++++++------
arch/s390/mm/vmem.c | 16 +++++-----
arch/sparc/mm/init_64.c | 12 +++-----
drivers/bus/mvebu-mbus.c | 12 ++++----
drivers/s390/char/zcore.c | 9 +++---
25 files changed, 187 insertions(+), 198 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index d8e18cdd96d3..3f65d0ac9f63 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -843,19 +843,25 @@ early_param("mem", early_mem);

static void __init request_standard_resources(const struct machine_desc *mdesc)
{
- struct memblock_region *region;
+ phys_addr_t start, end, res_end;
struct resource *res;
+ u64 i;

kernel_code.start = virt_to_phys(_text);
kernel_code.end = virt_to_phys(__init_begin - 1);
kernel_data.start = virt_to_phys(_sdata);
kernel_data.end = virt_to_phys(_end - 1);

- for_each_memblock(memory, region) {
- phys_addr_t start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
- phys_addr_t end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+ for_each_mem_range(i, &start, &end) {
unsigned long boot_alias_start;

+ /*
+ * In memblock, end points to the first byte after the
+ * range while in resourses, end points to the last byte in
+ * the range.
+ */
+ res_end = end - 1;
+
/*
* Some systems have a special memory alias which is only
* used for booting. We need to advertise this region to
@@ -869,7 +875,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
__func__, sizeof(*res));
res->name = "System RAM (boot alias)";
res->start = boot_alias_start;
- res->end = phys_to_idmap(end);
+ res->end = phys_to_idmap(res_end);
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
request_resource(&iomem_resource, res);
}
@@ -880,7 +886,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
sizeof(*res));
res->name = "System RAM";
res->start = start;
- res->end = end;
+ res->end = res_end;
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

request_resource(&iomem_resource, res);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 628028bfbb92..a149d9cb4fdb 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1155,9 +1155,8 @@ phys_addr_t arm_lowmem_limit __initdata = 0;

void __init adjust_lowmem_bounds(void)
{
- phys_addr_t memblock_limit = 0;
- u64 vmalloc_limit;
- struct memblock_region *reg;
+ phys_addr_t block_start, block_end, memblock_limit = 0;
+ u64 vmalloc_limit, i;
phys_addr_t lowmem_limit = 0;

/*
@@ -1173,26 +1172,18 @@ void __init adjust_lowmem_bounds(void)
* The first usable region must be PMD aligned. Mark its start
* as MEMBLOCK_NOMAP if it isn't
*/
- for_each_memblock(memory, reg) {
- if (!memblock_is_nomap(reg)) {
- if (!IS_ALIGNED(reg->base, PMD_SIZE)) {
- phys_addr_t len;
+ for_each_mem_range(i, &block_start, &block_end) {
+ if (!IS_ALIGNED(block_start, PMD_SIZE)) {
+ phys_addr_t len;

- len = round_up(reg->base, PMD_SIZE) - reg->base;
- memblock_mark_nomap(reg->base, len);
- }
- break;
+ len = round_up(block_start, PMD_SIZE) - block_start;
+ memblock_mark_nomap(block_start, len);
}
+ break;
}

- for_each_memblock(memory, reg) {
- phys_addr_t block_start = reg->base;
- phys_addr_t block_end = reg->base + reg->size;
-
- if (memblock_is_nomap(reg))
- continue;
-
- if (reg->base < vmalloc_limit) {
+ for_each_mem_range(i, &block_start, &block_end) {
+ if (block_start < vmalloc_limit) {
if (block_end > lowmem_limit)
/*
* Compare as u64 to ensure vmalloc_limit does
@@ -1441,19 +1432,15 @@ static void __init kmap_init(void)

static void __init map_lowmem(void)
{
- struct memblock_region *reg;
phys_addr_t kernel_x_start = round_down(__pa(KERNEL_START), SECTION_SIZE);
phys_addr_t kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);
+ phys_addr_t start, end;
+ u64 i;

/* Map all the lowmem memory banks. */
- for_each_memblock(memory, reg) {
- phys_addr_t start = reg->base;
- phys_addr_t end = start + reg->size;
+ for_each_mem_range(i, &start, &end) {
struct map_desc map;

- if (memblock_is_nomap(reg))
- continue;
-
if (end > arm_lowmem_limit)
end = arm_lowmem_limit;
if (start >= end)
diff --git a/arch/arm/mm/pmsa-v7.c b/arch/arm/mm/pmsa-v7.c
index 699fa2e88725..44b7644a4237 100644
--- a/arch/arm/mm/pmsa-v7.c
+++ b/arch/arm/mm/pmsa-v7.c
@@ -231,10 +231,9 @@ static int __init allocate_region(phys_addr_t base, phys_addr_t size,
void __init pmsav7_adjust_lowmem_bounds(void)
{
phys_addr_t specified_mem_size = 0, total_mem_size = 0;
- struct memblock_region *reg;
- bool first = true;
phys_addr_t mem_start;
phys_addr_t mem_end;
+ phys_addr_t reg_start, reg_end;
unsigned int mem_max_regions;
int num, i;

@@ -262,20 +261,19 @@ void __init pmsav7_adjust_lowmem_bounds(void)
mem_max_regions -= num;
#endif

- for_each_memblock(memory, reg) {
- if (first) {
+ for_each_mem_range(i, &reg_start, &reg_end) {
+ if (i == 0) {
phys_addr_t phys_offset = PHYS_OFFSET;

/*
* Initially only use memory continuous from
* PHYS_OFFSET */
- if (reg->base != phys_offset)
+ if (reg_start != phys_offset)
panic("First memory bank must be contiguous from PHYS_OFFSET");

- mem_start = reg->base;
- mem_end = reg->base + reg->size;
- specified_mem_size = reg->size;
- first = false;
+ mem_start = reg_start;
+ mem_end = reg_end
+ specified_mem_size = mem_end - mem_start;
} else {
/*
* memblock auto merges contiguous blocks, remove
@@ -283,8 +281,8 @@ void __init pmsav7_adjust_lowmem_bounds(void)
* blocks separately while iterating)
*/
pr_notice("Ignoring RAM after %pa, memory at %pa ignored\n",
- &mem_end, &reg->base);
- memblock_remove(reg->base, 0 - reg->base);
+ &mem_end, &reg_start);
+ memblock_remove(reg_start, 0 - reg_start);
break;
}
}
diff --git a/arch/arm/mm/pmsa-v8.c b/arch/arm/mm/pmsa-v8.c
index 0d7d5fb59247..b39e74b48437 100644
--- a/arch/arm/mm/pmsa-v8.c
+++ b/arch/arm/mm/pmsa-v8.c
@@ -94,20 +94,19 @@ static __init bool is_region_fixed(int number)
void __init pmsav8_adjust_lowmem_bounds(void)
{
phys_addr_t mem_end;
- struct memblock_region *reg;
- bool first = true;
+ phys_addr_t reg_start, reg_end;
+ int i;

- for_each_memblock(memory, reg) {
- if (first) {
+ for_each_mem_range(i, &reg_start, &reg_end) {
+ if (i == 0) {
phys_addr_t phys_offset = PHYS_OFFSET;

/*
* Initially only use memory continuous from
* PHYS_OFFSET */
- if (reg->base != phys_offset)
+ if (reg_start != phys_offset)
panic("First memory bank must be contiguous from PHYS_OFFSET");
- mem_end = reg->base + reg->size;
- first = false;
+ mem_end = reg_end;
} else {
/*
* memblock auto merges contiguous blocks, remove
@@ -115,8 +114,8 @@ void __init pmsav8_adjust_lowmem_bounds(void)
* blocks separately while iterating)
*/
pr_notice("Ignoring RAM after %pa, memory at %pa ignored\n",
- &mem_end, &reg->base);
- memblock_remove(reg->base, 0 - reg->base);
+ &mem_end, &reg_start);
+ memblock_remove(reg_start, 0 - reg_start);
break;
}
}
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index d40e9e5fc52b..05f24ff41e36 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -24,11 +24,12 @@

unsigned long xen_get_swiotlb_free_pages(unsigned int order)
{
- struct memblock_region *reg;
+ phys_addr_t base;
gfp_t flags = __GFP_NOWARN|__GFP_KSWAPD_RECLAIM;
+ u64 i;

- for_each_memblock(memory, reg) {
- if (reg->base < (phys_addr_t)0xffffffff) {
+ for_each_mem_range(i, &base, NULL) {
+ if (base < (phys_addr_t)0xffffffff) {
if (IS_ENABLED(CONFIG_ZONE_DMA32))
flags |= __GFP_DMA32;
else
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 7291b26ce788..1faa086f9193 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -212,7 +212,7 @@ void __init kasan_init(void)
{
u64 kimg_shadow_start, kimg_shadow_end;
u64 mod_shadow_start, mod_shadow_end;
- struct memblock_region *reg;
+ phys_addr_t _start, _end;
int i;

kimg_shadow_start = (u64)kasan_mem_to_shadow(_text) & PAGE_MASK;
@@ -246,9 +246,9 @@ void __init kasan_init(void)
kasan_populate_early_shadow((void *)mod_shadow_end,
(void *)kimg_shadow_start);

- for_each_memblock(memory, reg) {
- void *start = (void *)__phys_to_virt(reg->base);
- void *end = (void *)__phys_to_virt(reg->base + reg->size);
+ for_each_mem_range(i, &start, &end) {
+ void *_start = (void *)__phys_to_virt(_start);
+ void *end = (void *)__phys_to_virt(_end);

if (start >= end)
break;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1df25f26571d..327264fb83fb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -461,8 +461,9 @@ static void __init map_mem(pgd_t *pgdp)
{
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
- struct memblock_region *reg;
+ phys_addr_t start, end;
int flags = 0;
+ u64 i;

if (rodata_full || debug_pagealloc_enabled())
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
@@ -481,15 +482,9 @@ static void __init map_mem(pgd_t *pgdp)
#endif

/* map all the memory banks */
- for_each_memblock(memory, reg) {
- phys_addr_t start = reg->base;
- phys_addr_t end = start + reg->size;
-
+ for_each_mem_range(i, &start, &end) {
if (start >= end)
break;
- if (memblock_is_nomap(reg))
- continue;
-
__map_memblock(pgdp, start, end, PAGE_KERNEL, flags);
}

diff --git a/arch/c6x/kernel/setup.c b/arch/c6x/kernel/setup.c
index 8ef35131f999..9254c3b794a5 100644
--- a/arch/c6x/kernel/setup.c
+++ b/arch/c6x/kernel/setup.c
@@ -287,7 +287,8 @@ notrace void __init machine_init(unsigned long dt_ptr)

void __init setup_arch(char **cmdline_p)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
+ u64 i;

printk(KERN_INFO "Initializing kernel\n");

@@ -351,9 +352,9 @@ void __init setup_arch(char **cmdline_p)
disable_caching(ram_start, ram_end - 1);

/* Set caching of external RAM used by Linux */
- for_each_memblock(memory, reg)
- enable_caching(CACHE_REGION_START(reg->base),
- CACHE_REGION_START(reg->base + reg->size - 1));
+ for_each_mem_range(i, &start, &end)
+ enable_caching(CACHE_REGION_START(start),
+ CACHE_REGION_START(end - 1));

#ifdef CONFIG_BLK_DEV_INITRD
/*
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index 49e0c241f9b1..15403b5adfcf 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -106,13 +106,14 @@ static void __init paging_init(void)
void __init setup_memory(void)
{
#ifndef CONFIG_MMU
- struct memblock_region *reg;
u32 kernel_align_start, kernel_align_size;
+ phys_addr_t start, end;
+ u64 i;

/* Find main memory where is the kernel */
- for_each_memblock(memory, reg) {
- memory_start = (u32)reg->base;
- lowmem_size = reg->size;
+ for_each_mem_range(i, &start, &end) {
+ memory_start = start;
+ lowmem_size = end - start;
if ((memory_start <= (u32)_text) &&
((u32)_text <= (memory_start + lowmem_size - 1))) {
memory_size = lowmem_size;
diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
index 14ea680d180e..d938c1f7c1e1 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -190,25 +190,25 @@ char *octeon_swiotlb;

void __init plat_swiotlb_setup(void)
{
- struct memblock_region *mem;
+ phys_addr_t start, end;
phys_addr_t max_addr;
phys_addr_t addr_size;
size_t swiotlbsize;
unsigned long swiotlb_nslabs;
+ u64 i;

max_addr = 0;
addr_size = 0;

- for_each_memblock(memory, mem) {
+ for_each_mem_range(i, &start, &end) {
/* These addresses map low for PCI. */
if (mem->base > 0x410000000ull && !OCTEON_IS_OCTEON2())
continue;

- addr_size += mem->size;
-
- if (max_addr < mem->base + mem->size)
- max_addr = mem->base + mem->size;
+ addr_size += (end - start);

+ if (max_addr < end)
+ max_addr = end;
}

swiotlbsize = PAGE_SIZE;
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 7b537fa2035d..eaac1b66026d 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -300,8 +300,9 @@ static void __init bootmem_init(void)

static void __init bootmem_init(void)
{
- struct memblock_region *mem;
phys_addr_t ramstart, ramend;
+ phys_addr_t start, end;
+ u64 i;

ramstart = memblock_start_of_DRAM();
ramend = memblock_end_of_DRAM();
@@ -338,18 +339,13 @@ static void __init bootmem_init(void)

min_low_pfn = ARCH_PFN_OFFSET;
max_pfn = PFN_DOWN(ramend);
- for_each_memblock(memory, mem) {
- unsigned long start = memblock_region_memory_base_pfn(mem);
- unsigned long end = memblock_region_memory_end_pfn(mem);
-
+ for_each_mem_range(i, &start, &end) {
/*
* Skip highmem here so we get an accurate max_low_pfn if low
* memory stops short of high memory.
* If the region overlaps HIGHMEM_START, end is clipped so
* max_pfn excludes the highmem portion.
*/
- if (memblock_is_nomap(mem))
- continue;
if (start >= PFN_DOWN(HIGHMEM_START))
continue;
if (end > PFN_DOWN(HIGHMEM_START))
@@ -458,13 +454,12 @@ early_param("memmap", early_parse_memmap);
unsigned long setup_elfcorehdr, setup_elfcorehdr_size;
static int __init early_parse_elfcorehdr(char *p)
{
- struct memblock_region *mem;
+ phys_addr_t start, end;
+ u64 i;

setup_elfcorehdr = memparse(p, &p);

- for_each_memblock(memory, mem) {
- unsigned long start = mem->base;
- unsigned long end = start + mem->size;
+ for_each_mem_range(i, &start, &end) {
if (setup_elfcorehdr >= start && setup_elfcorehdr < end) {
/*
* Reserve from the elf core header to the end of
@@ -728,7 +723,8 @@ static void __init arch_mem_init(char **cmdline_p)

static void __init resource_init(void)
{
- struct memblock_region *region;
+ phys_addr_t start, end;
+ u64 i;

if (UNCAC_BASE != IO_BASE)
return;
@@ -740,9 +736,7 @@ static void __init resource_init(void)
bss_resource.start = __pa_symbol(&__bss_start);
bss_resource.end = __pa_symbol(&__bss_stop) - 1;

- for_each_memblock(memory, region) {
- phys_addr_t start = PFN_PHYS(memblock_region_memory_base_pfn(region));
- phys_addr_t end = PFN_PHYS(memblock_region_memory_end_pfn(region)) - 1;
+ for_each_mem_range(i, &start, &end) {
struct resource *res;

res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
@@ -751,7 +745,12 @@ static void __init resource_init(void)
sizeof(struct resource));

res->start = start;
- res->end = end;
+ /*
+ * In memblock, end points to the first byte after the
+ * range while in resourses, end points to the last byte in
+ * the range.
+ */
+ res->end = end - 1;
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
res->name = "System RAM";

diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
index 3d7c79c7745d..8348feaaf46e 100644
--- a/arch/openrisc/mm/init.c
+++ b/arch/openrisc/mm/init.c
@@ -64,6 +64,7 @@ extern const char _s_kernel_ro[], _e_kernel_ro[];
*/
static void __init map_ram(void)
{
+ phys_addr_t start, end;
unsigned long v, p, e;
pgprot_t prot;
pgd_t *pge;
@@ -71,6 +72,7 @@ static void __init map_ram(void)
pud_t *pue;
pmd_t *pme;
pte_t *pte;
+ u64 i;
/* These mark extents of read-only kernel pages...
* ...from vmlinux.lds.S
*/
@@ -78,9 +80,9 @@ static void __init map_ram(void)

v = PAGE_OFFSET;

- for_each_memblock(memory, region) {
- p = (u32) region->base & PAGE_MASK;
- e = p + (u32) region->size;
+ for_each_mem_range(i, &start, &end) {
+ p = (u32) start & PAGE_MASK;
+ e = (u32) end;

v = (u32) __va(p);
pge = pgd_offset_k(v);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index fdbafe417139..435b98d069eb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -180,13 +180,13 @@ int is_fadump_active(void)
*/
static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end)
{
- struct memblock_region *reg;
+ phys_addr_t reg_start, reg_end;
bool ret = false;
- u64 start, end;
+ u64 i, start, end;

- for_each_memblock(memory, reg) {
- start = max_t(u64, d_start, reg->base);
- end = min_t(u64, d_end, (reg->base + reg->size));
+ for_each_mem_range(i, &reg_start, &reg_end) {
+ start = max_t(u64, d_start, reg_start);
+ end = min_t(u64, d_end, reg_end));
if (d_start < end) {
/* Memory hole from d_start to start */
if (start > d_start)
@@ -413,7 +413,7 @@ static int __init fadump_get_boot_mem_regions(void)
{
unsigned long base, size, cur_size, hole_size, last_end;
unsigned long mem_size = fw_dump.boot_memory_size;
- struct memblock_region *reg;
+ phys_addr_t reg_start, reg_end;
int ret = 1;

fw_dump.boot_mem_regs_cnt = 0;
@@ -421,9 +421,8 @@ static int __init fadump_get_boot_mem_regions(void)
last_end = 0;
hole_size = 0;
cur_size = 0;
- for_each_memblock(memory, reg) {
- base = reg->base;
- size = reg->size;
+ for_each_mem_range(i, &reg_start, &reg_end) {
+ size = reg_end - reg_start;
hole_size += (base - last_end);

if ((cur_size + size) >= mem_size) {
@@ -959,9 +958,8 @@ static int fadump_init_elfcore_header(char *bufp)
*/
static int fadump_setup_crash_memory_ranges(void)
{
- struct memblock_region *reg;
- u64 start, end;
- int i, ret;
+ u64 i, start, end;
+ int ret;

pr_debug("Setup crash memory ranges.\n");
crash_mrange_info.mem_range_cnt = 0;
@@ -979,10 +977,7 @@ static int fadump_setup_crash_memory_ranges(void)
return ret;
}

- for_each_memblock(memory, reg) {
- start = (u64)reg->base;
- end = start + (u64)reg->size;
-
+ for_each_mem_range(i, &start, end) {
/*
* skip the memory chunk that is already added
* (0 through boot_memory_top).
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 468169e33c86..9ba76b075b11 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -7,7 +7,7 @@
*
* SMP scalability work:
* Copyright (C) 2001 Anton Blanchard <[email protected]>, IBM
- *
+ *
* Module name: htab.c
*
* Description:
@@ -862,8 +862,8 @@ static void __init htab_initialize(void)
unsigned long table;
unsigned long pteg_count;
unsigned long prot;
- unsigned long base = 0, size = 0;
- struct memblock_region *reg;
+ phys_addr_t base = 0, size = 0, end;
+ u64 i;

DBG(" -> htab_initialize()\n");

@@ -879,7 +879,7 @@ static void __init htab_initialize(void)
/*
* Calculate the required size of the htab. We want the number of
* PTEGs to equal one half the number of real pages.
- */
+ */
htab_size_bytes = htab_get_table_size();
pteg_count = htab_size_bytes >> 7;

@@ -889,7 +889,7 @@ static void __init htab_initialize(void)
firmware_has_feature(FW_FEATURE_PS3_LV1)) {
/* Using a hypervisor which owns the htab */
htab_address = NULL;
- _SDR1 = 0;
+ _SDR1 = 0;
#ifdef CONFIG_FA_DUMP
/*
* If firmware assisted dump is active firmware preserves
@@ -955,9 +955,9 @@ static void __init htab_initialize(void)
#endif /* CONFIG_DEBUG_PAGEALLOC */

/* create bolted the linear mapping in the hash table */
- for_each_memblock(memory, reg) {
- base = (unsigned long)__va(reg->base);
- size = reg->size;
+ for_each_mem_range(i, &base, &end) {
+ size = end - base;
+ base = (unsigned long)__va(base);

DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
base, size, prot);
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index bb00e0cba119..65657b920847 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -318,28 +318,27 @@ static int __meminit create_physical_mapping(unsigned long start,
static void __init radix_init_pgtable(void)
{
unsigned long rts_field;
- struct memblock_region *reg;
+ phys_addr_t start, end;
+ u64 i;

/* We don't support slb for radix */
mmu_slb_size = 0;
/*
* Create the linear mapping, using standard page size for now
*/
- for_each_memblock(memory, reg) {
+ for_each_mem_range(i, &start, &end) {
/*
* The memblock allocator is up at this point, so the
* page tables will be allocated within the range. No
* need or a node (which we don't have yet).
*/

- if ((reg->base + reg->size) >= RADIX_VMALLOC_START) {
+ if (end >= RADIX_VMALLOC_START) {
pr_warn("Outside the supported range\n");
continue;
}

- WARN_ON(create_physical_mapping(reg->base,
- reg->base + reg->size,
- -1, PAGE_KERNEL));
+ WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL));
}

/* Find out how many PID bits are supported */
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 0760e1e754e4..6e73434e4e41 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -120,11 +120,11 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
static void __init kasan_mmu_init(void)
{
int ret;
- struct memblock_region *reg;
+ phys_addr_t base, end;
+ u64 i;

- for_each_memblock(memory, reg) {
- phys_addr_t base = reg->base;
- phys_addr_t top = min(base + reg->size, total_lowmem);
+ for_each_mem_range(i, &base, &end) {
+ phys_addr_t top = min(end, total_lowmem);

if (base >= top)
continue;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 38d1acd7c8ef..0248b6d58fcd 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -593,20 +593,24 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
*/
static int __init add_system_ram_resources(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
+ u64 i;

- for_each_memblock(memory, reg) {
+ for_each_mem_range(i, &start, &end) {
struct resource *res;
- unsigned long base = reg->base;
- unsigned long size = reg->size;

res = kzalloc(sizeof(struct resource), GFP_KERNEL);
WARN_ON(!res);

if (res) {
res->name = "System RAM";
- res->start = base;
- res->end = base + size - 1;
+ res->start = start;
+ /*
+ * In memblock, end points to the first byte after
+ * the range while in resourses, end points to the
+ * last byte in the range.
+ */
+ res->end = end - 1;
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
WARN_ON(request_resource(&iomem_resource, res) < 0);
}
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 6eb4eab79385..079159e97bca 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -123,11 +123,11 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)

void __init mapin_ram(void)
{
- struct memblock_region *reg;
+ phys_addr_t base, end;
+ u64 i;

- for_each_memblock(memory, reg) {
- phys_addr_t base = reg->base;
- phys_addr_t top = min(base + reg->size, total_lowmem);
+ for_each_mem_range(i, &base, &end) {
+ phys_addr_t top = min(end, total_lowmem);

if (base >= top)
continue;
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 7440ba2cdaaa..2abe1165fe56 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -145,21 +145,22 @@ static phys_addr_t dtb_early_pa __initdata;

void __init setup_bootmem(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
phys_addr_t mem_size = 0;
phys_addr_t total_mem = 0;
phys_addr_t mem_start, end = 0;
phys_addr_t vmlinux_end = __pa_symbol(&_end);
phys_addr_t vmlinux_start = __pa_symbol(&_start);
+ u64 i;

/* Find the memory region containing the kernel */
- for_each_memblock(memory, reg) {
- end = reg->base + reg->size;
+ for_each_mem_range(i, &start, &end) {
+ phys_addr_t size = end - start;
if (!total_mem)
- mem_start = reg->base;
- if (reg->base <= vmlinux_start && vmlinux_end <= end)
- BUG_ON(reg->size == 0);
- total_mem = total_mem + reg->size;
+ mem_start = start;
+ if (start <= vmlinux_start && vmlinux_end <= end)
+ BUG_ON(size == 0);
+ total_mem = total_mem + size;
}

/*
@@ -456,7 +457,7 @@ static void __init setup_vm_final(void)
{
uintptr_t va, map_size;
phys_addr_t pa, start, end;
- struct memblock_region *reg;
+ u64 i;

/* Set mmu_enabled flag */
mmu_enabled = true;
@@ -467,14 +468,9 @@ static void __init setup_vm_final(void)
PGDIR_SIZE, PAGE_TABLE);

/* Map all memory banks */
- for_each_memblock(memory, reg) {
- start = reg->base;
- end = start + reg->size;
-
+ for_each_mem_range(i, &start, &end) {
if (start >= end)
break;
- if (memblock_is_nomap(reg))
- continue;
if (start <= __pa(PAGE_OFFSET) &&
__pa(PAGE_OFFSET) < end)
start = __pa(PAGE_OFFSET);
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 87b4ab3d3c77..12ddd1f6bf70 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -85,16 +85,16 @@ static void __init populate(void *start, void *end)

void __init kasan_init(void)
{
- struct memblock_region *reg;
- unsigned long i;
+ phys_addr_t _start, _end;
+ u64 i;

kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
(void *)kasan_mem_to_shadow((void *)
VMALLOC_END));

- for_each_memblock(memory, reg) {
- void *start = (void *)__va(reg->base);
- void *end = (void *)__va(reg->base + reg->size);
+ for_each_mem_range(i, &_start, &_end) {
+ void *start = (void *)_start;
+ void *end = (void *)_end;

if (start >= end)
break;
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 8b284cf6e199..b6c4a0c5ff86 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -198,7 +198,7 @@ static void __init conmode_default(void)
cpcmd("QUERY TERM", query_buffer, 1024, NULL);
ptr = strstr(query_buffer, "CONMODE");
/*
- * Set the conmode to 3215 so that the device recognition
+ * Set the conmode to 3215 so that the device recognition
* will set the cu_type of the console to 3215. If the
* conmode is 3270 and we don't set it back then both
* 3215 and the 3270 driver will try to access the console
@@ -258,7 +258,7 @@ static inline void setup_zfcpdump(void) {}

/*
* Reboot, halt and power_off stubs. They just call _machine_restart,
- * _machine_halt or _machine_power_off.
+ * _machine_halt or _machine_power_off.
*/

void machine_restart(char *command)
@@ -484,8 +484,9 @@ static struct resource __initdata *standard_resources[] = {
static void __init setup_resources(void)
{
struct resource *res, *std_res, *sub_res;
- struct memblock_region *reg;
+ phys_addr_t start, end;
int j;
+ u64 i;

code_resource.start = (unsigned long) _text;
code_resource.end = (unsigned long) _etext - 1;
@@ -494,7 +495,7 @@ static void __init setup_resources(void)
bss_resource.start = (unsigned long) __bss_start;
bss_resource.end = (unsigned long) __bss_stop - 1;

- for_each_memblock(memory, reg) {
+ for_each_mem_range(i, &start, &end) {
res = memblock_alloc(sizeof(*res), 8);
if (!res)
panic("%s: Failed to allocate %zu bytes align=0x%x\n",
@@ -502,8 +503,13 @@ static void __init setup_resources(void)
res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM;

res->name = "System RAM";
- res->start = reg->base;
- res->end = reg->base + reg->size - 1;
+ res->start = start;
+ /*
+ * In memblock, end points to the first byte after the
+ * range while in resourses, end points to the last byte in
+ * the range.
+ */
+ res->end = end - 1;
request_resource(&iomem_resource, res);

for (j = 0; j < ARRAY_SIZE(standard_resources); j++) {
@@ -819,14 +825,15 @@ static void __init reserve_kernel(void)

static void __init setup_memory(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
+ u64 i;

/*
* Init storage key for present memory
*/
- for_each_memblock(memory, reg) {
- storage_key_init_range(reg->base, reg->base + reg->size);
- }
+ for_each_mem_range(i, &start, &end)
+ storage_key_init_range(start, end);
+
psw_set_key(PAGE_DEFAULT_KEY);

/* Only cosmetics */
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 8b6282cf7d13..30076ecc3eb7 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -399,10 +399,11 @@ int vmem_add_mapping(unsigned long start, unsigned long size)
*/
void __init vmem_map_init(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
+ u64 i;

- for_each_memblock(memory, reg)
- vmem_add_mem(reg->base, reg->size);
+ for_each_mem_range(i, &start, &end)
+ vmem_add_mem(start, end - start);
__set_memory((unsigned long)_stext,
(unsigned long)(_etext - _stext) >> PAGE_SHIFT,
SET_MEMORY_RO | SET_MEMORY_X);
@@ -428,16 +429,17 @@ void __init vmem_map_init(void)
*/
static int __init vmem_convert_memory_chunk(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
struct memory_segment *seg;
+ u64 i;

mutex_lock(&vmem_mutex);
- for_each_memblock(memory, reg) {
+ for_each_mem_range(i, &start, &end) {
seg = kzalloc(sizeof(*seg), GFP_KERNEL);
if (!seg)
panic("Out of memory...\n");
- seg->start = reg->base;
- seg->size = reg->size;
+ seg->start = start;
+ seg->size = end - start;
insert_memory_segment(seg);
}
mutex_unlock(&vmem_mutex);
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 02e6e5e0f106..de63c002638e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1192,18 +1192,14 @@ int of_node_to_nid(struct device_node *dp)

static void __init add_node_ranges(void)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
unsigned long prev_max;
+ u64 i;

memblock_resized:
prev_max = memblock.memory.max;

- for_each_memblock(memory, reg) {
- unsigned long size = reg->size;
- unsigned long start, end;
-
- start = reg->base;
- end = start + size;
+ for_each_mem_range(i, &start, &end) {
while (start < end) {
unsigned long this_end;
int nid;
@@ -1211,7 +1207,7 @@ static void __init add_node_ranges(void)
this_end = memblock_nid_range(start, end, &nid);

numadbg("Setting memblock NUMA node nid[%d] "
- "start[%lx] end[%lx]\n",
+ "start[%llx] end[%lx]\n",
nid, start, this_end);

memblock_set_node(start, this_end - start,
diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 5b2a11a88951..2519ceede64b 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -610,23 +610,23 @@ static unsigned int armada_xp_mbus_win_remap_offset(int win)
static void __init
mvebu_mbus_find_bridge_hole(uint64_t *start, uint64_t *end)
{
- struct memblock_region *r;
- uint64_t s = 0;
+ phys_addr_t reg_start, reg_end;
+ uint64_t i, s = 0;

- for_each_memblock(memory, r) {
+ for_each_mem_range(i, &reg_start, &reg_end) {
/*
* This part of the memory is above 4 GB, so we don't
* care for the MBus bridge hole.
*/
- if (r->base >= 0x100000000ULL)
+ if (reg_start >= 0x100000000ULL)
continue;

/*
* The MBus bridge hole is at the end of the RAM under
* the 4 GB limit.
*/
- if (r->base + r->size > s)
- s = r->base + r->size;
+ if (reg_end > s)
+ s = reg_end;
}

*start = s;
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 08f812475f5e..484b1ec9a1bc 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -148,18 +148,19 @@ static ssize_t zcore_memmap_read(struct file *filp, char __user *buf,

static int zcore_memmap_open(struct inode *inode, struct file *filp)
{
- struct memblock_region *reg;
+ phys_addr_t start, end;
char *buf;
int i = 0;
+ u64 r;

buf = kcalloc(memblock.memory.cnt, CHUNK_INFO_SIZE, GFP_KERNEL);
if (!buf) {
return -ENOMEM;
}
- for_each_memblock(memory, reg) {
+ for_each_mem_range(r, &start, &end) {
sprintf(buf + (i++ * CHUNK_INFO_SIZE), "%016llx %016llx ",
- (unsigned long long) reg->base,
- (unsigned long long) reg->size);
+ (unsigned long long) start,
+ (unsigned long long) (end - start));
}
filp->private_data = buf;
return nonseekable_open(inode, filp);
--
2.26.2

2020-07-28 05:16:28

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 08/15] mircoblaze: drop unneeded NUMA and sparsemem initializations

From: Mike Rapoport <[email protected]>

microblaze does not support neither NUMA not SPARSMEM, so there is no point
to call memblock_set_node() and sparse_memory_present_with_active_regions()
functions during microblaze memory initialization.

Remove these calls and the surrounding code.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/microblaze/mm/init.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index 521b59ba716c..49e0c241f9b1 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -105,9 +105,8 @@ static void __init paging_init(void)

void __init setup_memory(void)
{
- struct memblock_region *reg;
-
#ifndef CONFIG_MMU
+ struct memblock_region *reg;
u32 kernel_align_start, kernel_align_size;

/* Find main memory where is the kernel */
@@ -161,20 +160,6 @@ void __init setup_memory(void)
pr_info("%s: max_low_pfn: %#lx\n", __func__, max_low_pfn);
pr_info("%s: max_pfn: %#lx\n", __func__, max_pfn);

- /* Add active regions with valid PFNs */
- for_each_memblock(memory, reg) {
- unsigned long start_pfn, end_pfn;
-
- start_pfn = memblock_region_memory_base_pfn(reg);
- end_pfn = memblock_region_memory_end_pfn(reg);
- memblock_set_node(start_pfn << PAGE_SHIFT,
- (end_pfn - start_pfn) << PAGE_SHIFT,
- &memblock.memory, 0);
- }
-
- /* XXX need to clip this if using highmem? */
- sparse_memory_present_with_active_regions(0);
-
paging_init();
}

--
2.26.2

2020-07-28 05:16:40

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 09/15] memblock: make for_each_memblock_type() iterator private

From: Mike Rapoport <[email protected]>

for_each_memblock_type() is not used outside mm/memblock.c, move it there
from include/linux/memblock.h

Signed-off-by: Mike Rapoport <[email protected]>
---
include/linux/memblock.h | 5 -----
mm/memblock.c | 5 +++++
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 017fae833d4a..220b5f0dad42 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -532,11 +532,6 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
region++)

-#define for_each_memblock_type(i, memblock_type, rgn) \
- for (i = 0, rgn = &memblock_type->regions[0]; \
- i < memblock_type->cnt; \
- i++, rgn = &memblock_type->regions[i])
-
extern void *alloc_large_system_hash(const char *tablename,
unsigned long bucketsize,
unsigned long numentries,
diff --git a/mm/memblock.c b/mm/memblock.c
index 39aceafc57f6..a5b9b3df81fc 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -129,6 +129,11 @@ struct memblock memblock __initdata_memblock = {
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};

+#define for_each_memblock_type(i, memblock_type, rgn) \
+ for (i = 0, rgn = &memblock_type->regions[0]; \
+ i < memblock_type->cnt; \
+ i++, rgn = &memblock_type->regions[i])
+
int memblock_debug __initdata_memblock;
static bool system_has_some_mirror __initdata_memblock = false;
static int memblock_can_resize __initdata_memblock;
--
2.26.2

2020-07-28 05:16:59

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 11/15] memblock: reduce number of parameters in for_each_mem_range()

From: Mike Rapoport <[email protected]>

Currently for_each_mem_range() iterator is the most generic way to traverse
memblock regions. As such, it has 8 parameters and it is hardly convenient
to users. Most users choose to utilize one of its wrappers and the only
user that actually needs most of the parameters outside memblock is s390
crash dump implementation.

To avoid yet another naming for memblock iterators, rename the existing
for_each_mem_range() to __for_each_mem_range() and add a new
for_each_mem_range() wrapper with only index, start and end parameters.

The new wrapper nicely fits into init_unavailable_mem() and will be used in
upcoming changes to simplify memblock traversals.

Signed-off-by: Mike Rapoport <[email protected]>
---
.clang-format | 1 +
arch/arm64/kernel/machine_kexec_file.c | 6 ++----
arch/s390/kernel/crash_dump.c | 8 ++++----
include/linux/memblock.h | 18 ++++++++++++++----
mm/page_alloc.c | 3 +--
5 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/.clang-format b/.clang-format
index a0a96088c74f..52ededab25ce 100644
--- a/.clang-format
+++ b/.clang-format
@@ -205,6 +205,7 @@ ForEachMacros:
- 'for_each_memblock_type'
- 'for_each_memcg_cache_index'
- 'for_each_mem_pfn_range'
+ - '__for_each_mem_range'
- 'for_each_mem_range'
- 'for_each_mem_range_rev'
- 'for_each_migratetype_order'
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 361a1143e09e..5b0e67b93cdc 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -215,8 +215,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
phys_addr_t start, end;

nr_ranges = 1; /* for exclusion of crashkernel region */
- for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
- MEMBLOCK_NONE, &start, &end, NULL)
+ for_each_mem_range(i, &start, &end)
nr_ranges++;

cmem = kmalloc(struct_size(cmem, ranges, nr_ranges), GFP_KERNEL);
@@ -225,8 +224,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)

cmem->max_nr_ranges = nr_ranges;
cmem->nr_ranges = 0;
- for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
- MEMBLOCK_NONE, &start, &end, NULL) {
+ for_each_mem_range(i, &start, &end) {
cmem->ranges[cmem->nr_ranges].start = start;
cmem->ranges[cmem->nr_ranges].end = end - 1;
cmem->nr_ranges++;
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index f96a5857bbfd..e28085c725ff 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -549,8 +549,8 @@ static int get_mem_chunk_cnt(void)
int cnt = 0;
u64 idx;

- for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
- MEMBLOCK_NONE, NULL, NULL, NULL)
+ __for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
+ MEMBLOCK_NONE, NULL, NULL, NULL)
cnt++;
return cnt;
}
@@ -563,8 +563,8 @@ static void loads_init(Elf64_Phdr *phdr, u64 loads_offset)
phys_addr_t start, end;
u64 idx;

- for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
- MEMBLOCK_NONE, &start, &end, NULL) {
+ __for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
+ MEMBLOCK_NONE, &start, &end, NULL) {
phdr->p_filesz = end - start;
phdr->p_type = PT_LOAD;
phdr->p_offset = start;
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e6a23b3db696..d70c2835e913 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -142,7 +142,7 @@ void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
void __memblock_free_late(phys_addr_t base, phys_addr_t size);

/**
- * for_each_mem_range - iterate through memblock areas from type_a and not
+ * __for_each_mem_range - iterate through memblock areas from type_a and not
* included in type_b. Or just type_a if type_b is NULL.
* @i: u64 used as loop variable
* @type_a: ptr to memblock_type to iterate
@@ -153,7 +153,7 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
* @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @p_nid: ptr to int for nid of the range, can be %NULL
*/
-#define for_each_mem_range(i, type_a, type_b, nid, flags, \
+#define __for_each_mem_range(i, type_a, type_b, nid, flags, \
p_start, p_end, p_nid) \
for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b, \
p_start, p_end, p_nid); \
@@ -182,6 +182,16 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
__next_mem_range_rev(&i, nid, flags, type_a, type_b, \
p_start, p_end, p_nid))

+/**
+ * for_each_mem_range - iterate through memory areas.
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ */
+#define for_each_mem_range(i, p_start, p_end) \
+ __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, \
+ MEMBLOCK_NONE, p_start, p_end, NULL)
+
/**
* for_each_reserved_mem_region - iterate over all reserved memblock areas
* @i: u64 used as loop variable
@@ -287,8 +297,8 @@ int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask);
* soon as memblock is initialized.
*/
#define for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid) \
- for_each_mem_range(i, &memblock.memory, &memblock.reserved, \
- nid, flags, p_start, p_end, p_nid)
+ __for_each_mem_range(i, &memblock.memory, &memblock.reserved, \
+ nid, flags, p_start, p_end, p_nid)

/**
* for_each_free_mem_range_reverse - rev-iterate through free memblock areas
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87ce294..95af111d69d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6972,8 +6972,7 @@ static void __init init_unavailable_mem(void)
* Loop through unavailable ranges not covered by memblock.memory.
*/
pgcnt = 0;
- for_each_mem_range(i, &memblock.memory, NULL,
- NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
+ for_each_mem_range(i, &start, &end) {
if (next < start)
pgcnt += init_unavailable_range(PFN_DOWN(next),
PFN_UP(start));
--
2.26.2

2020-07-28 05:17:07

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

From: Mike Rapoport <[email protected]>

numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
regions to set node ID in memblock.reserved and than traverses
memblock.reserved to update reserved_nodemask to include node IDs that were
set in the first loop.

Remove redundant traversal over memblock.reserved and update
reserved_nodemask while iterating over numa_meminfo.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/x86/mm/numa.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 8ee952038c80..4078abd33938 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -498,31 +498,25 @@ static void __init numa_clear_kernel_node_hotplug(void)
* and use those ranges to set the nid in memblock.reserved.
* This will split up the memblock regions along node
* boundaries and will set the node IDs as well.
+ *
+ * The nid will also be set in reserved_nodemask which is later
+ * used to clear MEMBLOCK_HOTPLUG flag.
+ *
+ * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
+ * numa_meminfo might not include all memblock.reserved
+ * memory ranges, because quirks such as trim_snb_memory()
+ * reserve specific pages for Sandy Bridge graphics.
+ * These ranges will remain with nid == MAX_NUMNODES. ]
*/
for (i = 0; i < numa_meminfo.nr_blks; i++) {
struct numa_memblk *mb = numa_meminfo.blk + i;
int ret;

ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid);
+ node_set(mb->nid, reserved_nodemask);
WARN_ON_ONCE(ret);
}

- /*
- * Now go over all reserved memblock regions, to construct a
- * node mask of all kernel reserved memory areas.
- *
- * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
- * numa_meminfo might not include all memblock.reserved
- * memory ranges, because quirks such as trim_snb_memory()
- * reserve specific pages for Sandy Bridge graphics. ]
- */
- for_each_memblock(reserved, mb_region) {
- int nid = memblock_get_region_node(mb_region);
-
- if (nid != MAX_NUMNODES)
- node_set(nid, reserved_nodemask);
- }
-
/*
* Finally, clear the MEMBLOCK_HOTPLUG flag for all memory
* belonging to the reserved node mask.
--
2.26.2

2020-07-28 05:17:15

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH 15/15] memblock: remove 'type' parameter from for_each_memblock()

From: Mike Rapoport <[email protected]>

for_each_memblock() is used exclusively to iterate over memblock.memory in
a few places that use data from memblock_region rather than the memory
ranges.

Remove type parameter from the for_each_memblock() iterator to improve
encapsulation of memblock internals from its users.

Signed-off-by: Mike Rapoport <[email protected]>
---
arch/arm64/kernel/setup.c | 2 +-
arch/arm64/mm/numa.c | 2 +-
arch/mips/netlogic/xlp/setup.c | 2 +-
include/linux/memblock.h | 10 +++++++---
mm/memblock.c | 4 ++--
mm/page_alloc.c | 8 ++++----
6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 93b3844cf442..23da7908cbed 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -217,7 +217,7 @@ static void __init request_standard_resources(void)
if (!standard_resources)
panic("%s: Failed to allocate %zu bytes\n", __func__, res_size);

- for_each_memblock(memory, region) {
+ for_each_memblock(region) {
res = &standard_resources[i++];
if (memblock_is_nomap(region)) {
res->name = "reserved";
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 0cbdbcc885fb..08721d2c0b79 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -350,7 +350,7 @@ static int __init numa_register_nodes(void)
struct memblock_region *mblk;

/* Check that valid nid is set to memblks */
- for_each_memblock(memory, mblk) {
+ for_each_memblock(mblk) {
int mblk_nid = memblock_get_region_node(mblk);

if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) {
diff --git a/arch/mips/netlogic/xlp/setup.c b/arch/mips/netlogic/xlp/setup.c
index 1a0fc5b62ba4..e69d9fc468cf 100644
--- a/arch/mips/netlogic/xlp/setup.c
+++ b/arch/mips/netlogic/xlp/setup.c
@@ -70,7 +70,7 @@ static void nlm_fixup_mem(void)
const int pref_backup = 512;
struct memblock_region *mem;

- for_each_memblock(memory, mem) {
+ for_each_memblock(mem) {
memblock_remove(mem->base + mem->size - pref_backup,
pref_backup);
}
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d70c2835e913..c901cb8ecf92 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -527,9 +527,13 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
return PFN_UP(reg->base + reg->size);
}

-#define for_each_memblock(memblock_type, region) \
- for (region = memblock.memblock_type.regions; \
- region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
+/**
+ * for_each_memblock - itereate over registered memory regions
+ * @region: loop variable
+ */
+#define for_each_memblock(region) \
+ for (region = memblock.memory.regions; \
+ region < (memblock.memory.regions + memblock.memory.cnt); \
region++)

extern void *alloc_large_system_hash(const char *tablename,
diff --git a/mm/memblock.c b/mm/memblock.c
index 2ad5e6e47215..550bb72cf6cb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1694,7 +1694,7 @@ static phys_addr_t __init_memblock __find_max_addr(phys_addr_t limit)
* the memory memblock regions, if the @limit exceeds the total size
* of those regions, max_addr will keep original value PHYS_ADDR_MAX
*/
- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
if (limit <= r->size) {
max_addr = r->base + limit;
break;
@@ -1864,7 +1864,7 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
phys_addr_t start, end, orig_start, orig_end;
struct memblock_region *r;

- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
orig_start = r->base;
orig_end = r->base + r->size;
start = round_up(orig_start, align);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95af111d69d3..8a19f46dc86e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5927,7 +5927,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)

if (mirrored_kernelcore && zone == ZONE_MOVABLE) {
if (!r || *pfn >= memblock_region_memory_end_pfn(r)) {
- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
if (*pfn < memblock_region_memory_end_pfn(r))
break;
}
@@ -6528,7 +6528,7 @@ static unsigned long __init zone_absent_pages_in_node(int nid,
unsigned long start_pfn, end_pfn;
struct memblock_region *r;

- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
start_pfn = clamp(memblock_region_memory_base_pfn(r),
zone_start_pfn, zone_end_pfn);
end_pfn = clamp(memblock_region_memory_end_pfn(r),
@@ -7122,7 +7122,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
* options.
*/
if (movable_node_is_enabled()) {
- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
if (!memblock_is_hotpluggable(r))
continue;

@@ -7143,7 +7143,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (mirrored_kernelcore) {
bool mem_below_4gb_not_mirrored = false;

- for_each_memblock(memory, r) {
+ for_each_memblock(r) {
if (memblock_is_mirror(r))
continue;

--
2.26.2

2020-07-28 06:41:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/15] dma-contiguous: simplify cma_early_percent_memory()

On Tue, Jul 28, 2020 at 08:11:40AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The memory size calculation in cma_early_percent_memory() traverses
> memblock.memory rather than simply call memblock_phys_mem_size(). The
> comment in that function suggests that at some point there should have been
> call to memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport <[email protected]>

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2020-07-28 08:10:49

by Max Filippov

[permalink] [raw]
Subject: Re: [PATCH 03/15] arm, xtensa: simplify initialization of high memory pages

On Mon, Jul 27, 2020 at 10:12 PM Mike Rapoport <[email protected]> wrote:
>
> From: Mike Rapoport <[email protected]>
>
> The function free_highpages() in both arm and xtensa essentially open-code
> for_each_free_mem_range() loop to detect high memory pages that were not
> reserved and that should be initialized and passed to the buddy allocator.
>
> Replace open-coded implementation of for_each_free_mem_range() with usage
> of memblock API to simplify the code.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/arm/mm/init.c | 48 +++++++------------------------------
> arch/xtensa/mm/init.c | 55 ++++++++-----------------------------------
> 2 files changed, 18 insertions(+), 85 deletions(-)

For the xtensa part:
Reviewed-by: Max Filippov <[email protected]>
Tested-by: Max Filippov <[email protected]>

--
Thanks.
-- Max

2020-07-28 10:46:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved


* Mike Rapoport <[email protected]> wrote:

> From: Mike Rapoport <[email protected]>
>
> numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> regions to set node ID in memblock.reserved and than traverses
> memblock.reserved to update reserved_nodemask to include node IDs that were
> set in the first loop.
>
> Remove redundant traversal over memblock.reserved and update
> reserved_nodemask while iterating over numa_meminfo.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/x86/mm/numa.c | 26 ++++++++++----------------
> 1 file changed, 10 insertions(+), 16 deletions(-)

I suspect you'd like to carry this in the -mm tree?

Acked-by: Ingo Molnar <[email protected]>

Thanks,

Ingo

2020-07-28 10:57:11

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

On Tue, Jul 28, 2020 at 12:44:40PM +0200, Ingo Molnar wrote:
>
> * Mike Rapoport <[email protected]> wrote:
>
> > From: Mike Rapoport <[email protected]>
> >
> > numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> > regions to set node ID in memblock.reserved and than traverses
> > memblock.reserved to update reserved_nodemask to include node IDs that were
> > set in the first loop.
> >
> > Remove redundant traversal over memblock.reserved and update
> > reserved_nodemask while iterating over numa_meminfo.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > arch/x86/mm/numa.c | 26 ++++++++++----------------
> > 1 file changed, 10 insertions(+), 16 deletions(-)
>
> I suspect you'd like to carry this in the -mm tree?

Yes.

> Acked-by: Ingo Molnar <[email protected]>

Thanks!

> Thanks,
>
> Ingo

--
Sincerely yours,
Mike.

2020-07-28 11:04:32

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> regions to set node ID in memblock.reserved and than traverses
> memblock.reserved to update reserved_nodemask to include node IDs that were
> set in the first loop.
>
> Remove redundant traversal over memblock.reserved and update
> reserved_nodemask while iterating over numa_meminfo.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/x86/mm/numa.c | 26 ++++++++++----------------
> 1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 8ee952038c80..4078abd33938 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -498,31 +498,25 @@ static void __init numa_clear_kernel_node_hotplug(void)
> * and use those ranges to set the nid in memblock.reserved.
> * This will split up the memblock regions along node
> * boundaries and will set the node IDs as well.
> + *
> + * The nid will also be set in reserved_nodemask which is later
> + * used to clear MEMBLOCK_HOTPLUG flag.
> + *
> + * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> + * numa_meminfo might not include all memblock.reserved
> + * memory ranges, because quirks such as trim_snb_memory()
> + * reserve specific pages for Sandy Bridge graphics.
> + * These ranges will remain with nid == MAX_NUMNODES. ]
> */
> for (i = 0; i < numa_meminfo.nr_blks; i++) {
> struct numa_memblk *mb = numa_meminfo.blk + i;
> int ret;
>
> ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid);
> + node_set(mb->nid, reserved_nodemask);

Really? This will set all node id into reserved_nodemask. But in the
current code, it's setting nid into memblock reserved region which
interleaves with numa_memoinfo, then get those nid and set it in
reserved_nodemask. This is so different, with my understanding. Please
correct me if I am wrong.

Thanks
Baoquan

> WARN_ON_ONCE(ret);
> }
>
> - /*
> - * Now go over all reserved memblock regions, to construct a
> - * node mask of all kernel reserved memory areas.
> - *
> - * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> - * numa_meminfo might not include all memblock.reserved
> - * memory ranges, because quirks such as trim_snb_memory()
> - * reserve specific pages for Sandy Bridge graphics. ]
> - */
> - for_each_memblock(reserved, mb_region) {
> - int nid = memblock_get_region_node(mb_region);
> -
> - if (nid != MAX_NUMNODES)
> - node_set(nid, reserved_nodemask);
> - }
> -
> /*
> * Finally, clear the MEMBLOCK_HOTPLUG flag for all memory
> * belonging to the reserved node mask.
> --
> 2.26.2
>
>

2020-07-28 11:32:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved


* Mike Rapoport <[email protected]> wrote:

> On Tue, Jul 28, 2020 at 12:44:40PM +0200, Ingo Molnar wrote:
> >
> > * Mike Rapoport <[email protected]> wrote:
> >
> > > From: Mike Rapoport <[email protected]>
> > >
> > > numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> > > regions to set node ID in memblock.reserved and than traverses
> > > memblock.reserved to update reserved_nodemask to include node IDs that were
> > > set in the first loop.
> > >
> > > Remove redundant traversal over memblock.reserved and update
> > > reserved_nodemask while iterating over numa_meminfo.
> > >
> > > Signed-off-by: Mike Rapoport <[email protected]>
> > > ---
> > > arch/x86/mm/numa.c | 26 ++++++++++----------------
> > > 1 file changed, 10 insertions(+), 16 deletions(-)
> >
> > I suspect you'd like to carry this in the -mm tree?
>
> Yes.
>
> > Acked-by: Ingo Molnar <[email protected]>
>
> Thanks!

Assuming it is correct and works. :-)

Thanks,

Ingo

2020-07-28 14:16:40

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

On Tue, Jul 28, 2020 at 07:02:54PM +0800, Baoquan He wrote:
> On 07/28/20 at 08:11am, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> > regions to set node ID in memblock.reserved and than traverses
> > memblock.reserved to update reserved_nodemask to include node IDs that were
> > set in the first loop.
> >
> > Remove redundant traversal over memblock.reserved and update
> > reserved_nodemask while iterating over numa_meminfo.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > arch/x86/mm/numa.c | 26 ++++++++++----------------
> > 1 file changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 8ee952038c80..4078abd33938 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -498,31 +498,25 @@ static void __init numa_clear_kernel_node_hotplug(void)
> > * and use those ranges to set the nid in memblock.reserved.
> > * This will split up the memblock regions along node
> > * boundaries and will set the node IDs as well.
> > + *
> > + * The nid will also be set in reserved_nodemask which is later
> > + * used to clear MEMBLOCK_HOTPLUG flag.
> > + *
> > + * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> > + * numa_meminfo might not include all memblock.reserved
> > + * memory ranges, because quirks such as trim_snb_memory()
> > + * reserve specific pages for Sandy Bridge graphics.
> > + * These ranges will remain with nid == MAX_NUMNODES. ]
> > */
> > for (i = 0; i < numa_meminfo.nr_blks; i++) {
> > struct numa_memblk *mb = numa_meminfo.blk + i;
> > int ret;
> >
> > ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid);
> > + node_set(mb->nid, reserved_nodemask);
>
> Really? This will set all node id into reserved_nodemask. But in the
> current code, it's setting nid into memblock reserved region which
> interleaves with numa_memoinfo, then get those nid and set it in
> reserved_nodemask. This is so different, with my understanding. Please
> correct me if I am wrong.

You are right, I've missed the intersections of numa_meminfo with
memblock.reserved.

x86 interaction with membock is so, hmm, interesting...

> Thanks
> Baoquan
>
> > WARN_ON_ONCE(ret);
> > }
> >
> > - /*
> > - * Now go over all reserved memblock regions, to construct a
> > - * node mask of all kernel reserved memory areas.
> > - *
> > - * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> > - * numa_meminfo might not include all memblock.reserved
> > - * memory ranges, because quirks such as trim_snb_memory()
> > - * reserve specific pages for Sandy Bridge graphics. ]
> > - */
> > - for_each_memblock(reserved, mb_region) {
> > - int nid = memblock_get_region_node(mb_region);
> > -
> > - if (nid != MAX_NUMNODES)
> > - node_set(nid, reserved_nodemask);
> > - }
> > -
> > /*
> > * Finally, clear the MEMBLOCK_HOTPLUG flag for all memory
> > * belonging to the reserved node mask.
> > --
> > 2.26.2
> >
> >
>

--
Sincerely yours,
Mike.

2020-07-28 14:25:24

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

On 07/28/20 at 05:15pm, Mike Rapoport wrote:
> On Tue, Jul 28, 2020 at 07:02:54PM +0800, Baoquan He wrote:
> > On 07/28/20 at 08:11am, Mike Rapoport wrote:
> > > From: Mike Rapoport <[email protected]>
> > >
> > > numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> > > regions to set node ID in memblock.reserved and than traverses
> > > memblock.reserved to update reserved_nodemask to include node IDs that were
> > > set in the first loop.
> > >
> > > Remove redundant traversal over memblock.reserved and update
> > > reserved_nodemask while iterating over numa_meminfo.
> > >
> > > Signed-off-by: Mike Rapoport <[email protected]>
> > > ---
> > > arch/x86/mm/numa.c | 26 ++++++++++----------------
> > > 1 file changed, 10 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > index 8ee952038c80..4078abd33938 100644
> > > --- a/arch/x86/mm/numa.c
> > > +++ b/arch/x86/mm/numa.c
> > > @@ -498,31 +498,25 @@ static void __init numa_clear_kernel_node_hotplug(void)
> > > * and use those ranges to set the nid in memblock.reserved.
> > > * This will split up the memblock regions along node
> > > * boundaries and will set the node IDs as well.
> > > + *
> > > + * The nid will also be set in reserved_nodemask which is later
> > > + * used to clear MEMBLOCK_HOTPLUG flag.
> > > + *
> > > + * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> > > + * numa_meminfo might not include all memblock.reserved
> > > + * memory ranges, because quirks such as trim_snb_memory()
> > > + * reserve specific pages for Sandy Bridge graphics.
> > > + * These ranges will remain with nid == MAX_NUMNODES. ]
> > > */
> > > for (i = 0; i < numa_meminfo.nr_blks; i++) {
> > > struct numa_memblk *mb = numa_meminfo.blk + i;
> > > int ret;
> > >
> > > ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid);
> > > + node_set(mb->nid, reserved_nodemask);
> >
> > Really? This will set all node id into reserved_nodemask. But in the
> > current code, it's setting nid into memblock reserved region which
> > interleaves with numa_memoinfo, then get those nid and set it in
> > reserved_nodemask. This is so different, with my understanding. Please
> > correct me if I am wrong.
>
> You are right, I've missed the intersections of numa_meminfo with
> memblock.reserved.
>
> x86 interaction with membock is so, hmm, interesting...

Yeah, numa_clear_kernel_node_hotplug() intends to find out any memory node
which has reserved memory, then make it as unmovable. Setting all node
id into reserved_nodemask will break the use case of hot removing hotpluggable
boot memory after system bootup.

2020-07-29 08:34:28

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()

On Tue, 28 Jul 2020 08:11:42 +0300
Mike Rapoport <[email protected]> wrote:

> From: Mike Rapoport <[email protected]>
>
> dummy_numa_init() loops over memblock.memory and passes nid=0 to
> numa_add_memblk() which essentially wraps memblock_set_node(). However,
> memblock_set_node() can cope with entire memory span itself, so the loop
> over memblock.memory regions is redundant.
>
> Replace the loop with a single call to memblock_set_node() to the entire
> memory.

Hi Mike,

I had a similar patch I was going to post shortly so can add a bit more
on the advantages of this one.

Beyond cleaning up, it also fixes an issue with a buggy ACPI firmware in which the SRAT
table covers some but not all of the memory in the EFI memory map. Stealing bits
from the draft cover letter I had for that...

> This issue can be easily triggered by having an SRAT table which fails
> to cover all elements of the EFI memory map.
>
> This firmware error is detected and a warning printed. e.g.
> "NUMA: Warning: invalid memblk node 64 [mem 0x240000000-0x27fffffff]"
> At that point we fall back to dummy_numa_init().
>
> However, the failed ACPI init has left us with our memblocks all broken
> up as we split them when trying to assign them to NUMA nodes.
>
> We then iterate over the memblocks and add them to node 0.
>
> for_each_memblock(memory, mblk) {
> ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> if (!ret)
> continue;
> pr_err("NUMA init failed\n");
> return ret;
> }
>
> numa_add_memblk() calls memblock_set_node() which merges regions that
> were previously split up during the earlier attempt to add them to different
> nodes during parsing of SRAT.
>
> This means elements are moved in the memblock array and we can end up
> in a different memblock after the call to numa_add_memblk().
> Result is:
>
> Unable to handle kernel paging request at virtual address 0000000000003a40
> Mem abort info:
> ESR = 0x96000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000004
> CM = 0, WnR = 0
> [0000000000003a40] user address but active_mm is swapper
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
>
> ...
>
> Call trace:
> sparse_init_nid+0x5c/0x2b0
> sparse_init+0x138/0x170
> bootmem_init+0x80/0xe0
> setup_arch+0x2a0/0x5fc
> start_kernel+0x8c/0x648
>
> As an illustrative example:
> EFI table has one block of memory.
> memblks[0] = [0...0x2f] so we start with a single memblock.
>
> SRAT has
> [0x00...0x0f] in node 0
> [0x10...0x1f] in node 1
> but no entry covering
> [0x20...0x2f].
>
> Whilst parsing SRAT the single memblock is broken into 3.
> memblks[0] = [0x00...0x0f] in node 0
> memblks[1] = [0x10...0x1f] in node 1
> memblks[2] = [0x20...0x2f] in node MAX_NUM_NODES (invalid value)
>
> A sanity check parse then detects the invalid section and acpi_numa_init
> fails. We then fall back to the dummy path.
>
> That iterates over the memblocks. We'll use i an index in the array of memblocks
>
> i = 0;
> memblks[0] = [0x00...0x0f] set to node0.
> merge doesn't do anything because the neighbouring memblock is still in node1.
>
> i = 1
> memblks[1] = [0x10...0x1f] set to node 0.
> merge combines memblock 0 and 1 to give a new set of memblocks.
>
> memblks[0] = [0x00..0x1f] in node 0
> memblks[1] = [0x20..0x2f] in node MAX_NUM_NODES.
>
> i = 2 off the end of the now reduced array of memblocks, so exit the loop.
> (if we restart the loop here everything will be fine).
>
> Later sparse_init_nid tries to use the node of the second memblock to index
> somethings and boom.


>
> Signed-off-by: Mike Rapoport <[email protected]>

Acked-by: Jonathan Cameron <[email protected]>

> ---
> arch/arm64/mm/numa.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..0cbdbcc885fb 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -423,19 +423,16 @@ static int __init numa_init(int (*init_func)(void))
> */
> static int __init dummy_numa_init(void)
> {
> + phys_addr_t start = memblock_start_of_DRAM();
> + phys_addr_t end = memblock_end_of_DRAM();
> int ret;
> - struct memblock_region *mblk;
>
> if (numa_off)
> pr_info("NUMA disabled\n"); /* Forced off on command line. */
> - pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> - memblock_start_of_DRAM(), memblock_end_of_DRAM() - 1);
> -
> - for_each_memblock(memory, mblk) {
> - ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> - if (!ret)
> - continue;
> + pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);
>
> + ret = numa_add_memblk(0, start, end);
> + if (ret) {
> pr_err("NUMA init failed\n");
> return ret;
> }


2020-07-29 11:43:00

by Stafford Horne

[permalink] [raw]
Subject: Re: [PATCH 05/15] h8300, nds32, openrisc: simplify detection of memory extents

On Tue, Jul 28, 2020 at 08:11:43AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> Instead of traversing memblock.memory regions to find memory_start and
> memory_end, simply query memblock_{start,end}_of_DRAM().
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/h8300/kernel/setup.c | 8 +++-----
> arch/nds32/kernel/setup.c | 8 ++------
> arch/openrisc/kernel/setup.c | 9 ++-------
> 3 files changed, 7 insertions(+), 18 deletions(-)

Hi Mike,

For the openrisc part:

Acked-by: Stafford Horne <[email protected]>

> --- a/arch/openrisc/kernel/setup.c
> +++ b/arch/openrisc/kernel/setup.c
> @@ -48,17 +48,12 @@ static void __init setup_memory(void)
> unsigned long ram_start_pfn;
> unsigned long ram_end_pfn;
> phys_addr_t memory_start, memory_end;
> - struct memblock_region *region;
>
> memory_end = memory_start = 0;
>
> /* Find main memory where is the kernel, we assume its the only one */
> - for_each_memblock(memory, region) {
> - memory_start = region->base;
> - memory_end = region->base + region->size;
> - printk(KERN_INFO "%s: Memory: 0x%x-0x%x\n", __func__,
> - memory_start, memory_end);
> - }
> + memory_start = memblock_start_of_DRAM();
> + memory_end = memblock_end_of_DRAM();
>
> if (!memory_end) {
> panic("No memory!");
> --
> 2.26.2
>

2020-07-30 01:53:13

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 09/15] memblock: make for_each_memblock_type() iterator private

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> for_each_memblock_type() is not used outside mm/memblock.c, move it there
> from include/linux/memblock.h
>
> ---
> include/linux/memblock.h | 5 -----
> mm/memblock.c | 5 +++++
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 017fae833d4a..220b5f0dad42 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -532,11 +532,6 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
> region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
> region++)
>
> -#define for_each_memblock_type(i, memblock_type, rgn) \
> - for (i = 0, rgn = &memblock_type->regions[0]; \
> - i < memblock_type->cnt; \
> - i++, rgn = &memblock_type->regions[i])
> -
> extern void *alloc_large_system_hash(const char *tablename,
> unsigned long bucketsize,
> unsigned long numentries,
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 39aceafc57f6..a5b9b3df81fc 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -129,6 +129,11 @@ struct memblock memblock __initdata_memblock = {
> .current_limit = MEMBLOCK_ALLOC_ANYWHERE,
> };
>
> +#define for_each_memblock_type(i, memblock_type, rgn) \
> + for (i = 0, rgn = &memblock_type->regions[0]; \
> + i < memblock_type->cnt; \
> + i++, rgn = &memblock_type->regions[i])
> +

Reviewed-by: Baoquan He <[email protected]>

2020-07-30 01:55:11

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 10/15] memblock: make memblock_debug and related functionality private

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The only user of memblock_dbg() outside memblock was s390 setup code and it
> is converted to use pr_debug() instead.
> This allows to stop exposing memblock_debug and memblock_dbg() to the rest
> of the kernel.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/s390/kernel/setup.c | 4 ++--
> include/linux/memblock.h | 12 +-----------
> mm/memblock.c | 13 +++++++++++--
> 3 files changed, 14 insertions(+), 15 deletions(-)

Nice clean up.

Reviewed-by: Baoquan He <[email protected]>

>
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index 07aa15ba43b3..8b284cf6e199 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -776,8 +776,8 @@ static void __init memblock_add_mem_detect_info(void)
> unsigned long start, end;
> int i;
>
> - memblock_dbg("physmem info source: %s (%hhd)\n",
> - get_mem_info_source(), mem_detect.info_source);
> + pr_debug("physmem info source: %s (%hhd)\n",
> + get_mem_info_source(), mem_detect.info_source);
> /* keep memblock lists close to the kernel */
> memblock_set_bottom_up(true);
> for_each_mem_detect_block(i, &start, &end) {
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 220b5f0dad42..e6a23b3db696 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,7 +90,6 @@ struct memblock {
> };
>
> extern struct memblock memblock;
> -extern int memblock_debug;
>
> #ifndef CONFIG_ARCH_KEEP_MEMBLOCK
> #define __init_memblock __meminit
> @@ -102,9 +101,6 @@ void memblock_discard(void);
> static inline void memblock_discard(void) {}
> #endif
>
> -#define memblock_dbg(fmt, ...) \
> - if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> -
> phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
> phys_addr_t size, phys_addr_t align);
> void memblock_allow_resize(void);
> @@ -456,13 +452,7 @@ bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);
> bool memblock_is_reserved(phys_addr_t addr);
> bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);
>
> -extern void __memblock_dump_all(void);
> -
> -static inline void memblock_dump_all(void)
> -{
> - if (memblock_debug)
> - __memblock_dump_all();
> -}
> +void memblock_dump_all(void);
>
> /**
> * memblock_set_current_limit - Set the current allocation limit to allow
> diff --git a/mm/memblock.c b/mm/memblock.c
> index a5b9b3df81fc..824938849f6d 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -134,7 +134,10 @@ struct memblock memblock __initdata_memblock = {
> i < memblock_type->cnt; \
> i++, rgn = &memblock_type->regions[i])
>
> -int memblock_debug __initdata_memblock;
> +#define memblock_dbg(fmt, ...) \
> + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +
> +static int memblock_debug __initdata_memblock;
> static bool system_has_some_mirror __initdata_memblock = false;
> static int memblock_can_resize __initdata_memblock;
> static int memblock_memory_in_slab __initdata_memblock = 0;
> @@ -1919,7 +1922,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type)
> }
> }
>
> -void __init_memblock __memblock_dump_all(void)
> +static void __init_memblock __memblock_dump_all(void)
> {
> pr_info("MEMBLOCK configuration:\n");
> pr_info(" memory size = %pa reserved size = %pa\n",
> @@ -1933,6 +1936,12 @@ void __init_memblock __memblock_dump_all(void)
> #endif
> }
>
> +void __init_memblock memblock_dump_all(void)
> +{
> + if (memblock_debug)
> + __memblock_dump_all();
> +}
> +
> void __init memblock_allow_resize(void)
> {
> memblock_can_resize = 1;
> --
> 2.26.2
>
>

2020-07-30 02:24:59

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 11/15] memblock: reduce number of parameters in for_each_mem_range()

On 07/28/20 at 08:11am, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> Currently for_each_mem_range() iterator is the most generic way to traverse
> memblock regions. As such, it has 8 parameters and it is hardly convenient
> to users. Most users choose to utilize one of its wrappers and the only
> user that actually needs most of the parameters outside memblock is s390
> crash dump implementation.
>
> To avoid yet another naming for memblock iterators, rename the existing
> for_each_mem_range() to __for_each_mem_range() and add a new
> for_each_mem_range() wrapper with only index, start and end parameters.
>
> The new wrapper nicely fits into init_unavailable_mem() and will be used in
> upcoming changes to simplify memblock traversals.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> .clang-format | 1 +
> arch/arm64/kernel/machine_kexec_file.c | 6 ++----
> arch/s390/kernel/crash_dump.c | 8 ++++----
> include/linux/memblock.h | 18 ++++++++++++++----
> mm/page_alloc.c | 3 +--
> 5 files changed, 22 insertions(+), 14 deletions(-)

Reviewed-by: Baoquan He <[email protected]>

>
> diff --git a/.clang-format b/.clang-format
> index a0a96088c74f..52ededab25ce 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -205,6 +205,7 @@ ForEachMacros:
> - 'for_each_memblock_type'
> - 'for_each_memcg_cache_index'
> - 'for_each_mem_pfn_range'
> + - '__for_each_mem_range'
> - 'for_each_mem_range'
> - 'for_each_mem_range_rev'
> - 'for_each_migratetype_order'
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 361a1143e09e..5b0e67b93cdc 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -215,8 +215,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
> phys_addr_t start, end;
>
> nr_ranges = 1; /* for exclusion of crashkernel region */
> - for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> - MEMBLOCK_NONE, &start, &end, NULL)
> + for_each_mem_range(i, &start, &end)
> nr_ranges++;
>
> cmem = kmalloc(struct_size(cmem, ranges, nr_ranges), GFP_KERNEL);
> @@ -225,8 +224,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>
> cmem->max_nr_ranges = nr_ranges;
> cmem->nr_ranges = 0;
> - for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> - MEMBLOCK_NONE, &start, &end, NULL) {
> + for_each_mem_range(i, &start, &end) {
> cmem->ranges[cmem->nr_ranges].start = start;
> cmem->ranges[cmem->nr_ranges].end = end - 1;
> cmem->nr_ranges++;
> diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> index f96a5857bbfd..e28085c725ff 100644
> --- a/arch/s390/kernel/crash_dump.c
> +++ b/arch/s390/kernel/crash_dump.c
> @@ -549,8 +549,8 @@ static int get_mem_chunk_cnt(void)
> int cnt = 0;
> u64 idx;
>
> - for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> - MEMBLOCK_NONE, NULL, NULL, NULL)
> + __for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> + MEMBLOCK_NONE, NULL, NULL, NULL)
> cnt++;
> return cnt;
> }
> @@ -563,8 +563,8 @@ static void loads_init(Elf64_Phdr *phdr, u64 loads_offset)
> phys_addr_t start, end;
> u64 idx;
>
> - for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> - MEMBLOCK_NONE, &start, &end, NULL) {
> + __for_each_mem_range(idx, &memblock.physmem, &oldmem_type, NUMA_NO_NODE,
> + MEMBLOCK_NONE, &start, &end, NULL) {
> phdr->p_filesz = end - start;
> phdr->p_type = PT_LOAD;
> phdr->p_offset = start;
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index e6a23b3db696..d70c2835e913 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -142,7 +142,7 @@ void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
> void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>
> /**
> - * for_each_mem_range - iterate through memblock areas from type_a and not
> + * __for_each_mem_range - iterate through memblock areas from type_a and not
> * included in type_b. Or just type_a if type_b is NULL.
> * @i: u64 used as loop variable
> * @type_a: ptr to memblock_type to iterate
> @@ -153,7 +153,7 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
> * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> * @p_nid: ptr to int for nid of the range, can be %NULL
> */
> -#define for_each_mem_range(i, type_a, type_b, nid, flags, \
> +#define __for_each_mem_range(i, type_a, type_b, nid, flags, \
> p_start, p_end, p_nid) \
> for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b, \
> p_start, p_end, p_nid); \
> @@ -182,6 +182,16 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size);
> __next_mem_range_rev(&i, nid, flags, type_a, type_b, \
> p_start, p_end, p_nid))
>
> +/**
> + * for_each_mem_range - iterate through memory areas.
> + * @i: u64 used as loop variable
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + */
> +#define for_each_mem_range(i, p_start, p_end) \
> + __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, \
> + MEMBLOCK_NONE, p_start, p_end, NULL)
> +
> /**
> * for_each_reserved_mem_region - iterate over all reserved memblock areas
> * @i: u64 used as loop variable
> @@ -287,8 +297,8 @@ int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask);
> * soon as memblock is initialized.
> */
> #define for_each_free_mem_range(i, nid, flags, p_start, p_end, p_nid) \
> - for_each_mem_range(i, &memblock.memory, &memblock.reserved, \
> - nid, flags, p_start, p_end, p_nid)
> + __for_each_mem_range(i, &memblock.memory, &memblock.reserved, \
> + nid, flags, p_start, p_end, p_nid)
>
> /**
> * for_each_free_mem_range_reverse - rev-iterate through free memblock areas
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e028b87ce294..95af111d69d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6972,8 +6972,7 @@ static void __init init_unavailable_mem(void)
> * Loop through unavailable ranges not covered by memblock.memory.
> */
> pgcnt = 0;
> - for_each_mem_range(i, &memblock.memory, NULL,
> - NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
> + for_each_mem_range(i, &start, &end) {
> if (next < start)
> pgcnt += init_unavailable_range(PFN_DOWN(next),
> PFN_UP(start));
> --
> 2.26.2
>
>

2020-07-30 12:04:33

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()

On Tue, Jul 28, 2020 at 08:11:42AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> dummy_numa_init() loops over memblock.memory and passes nid=0 to
> numa_add_memblk() which essentially wraps memblock_set_node(). However,
> memblock_set_node() can cope with entire memory span itself, so the loop
> over memblock.memory regions is redundant.
>
> Replace the loop with a single call to memblock_set_node() to the entire
> memory.
>
> Signed-off-by: Mike Rapoport <[email protected]>

Acked-by: Catalin Marinas <[email protected]>

2020-07-30 12:18:00

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

Mike Rapoport <[email protected]> writes:
> From: Mike Rapoport <[email protected]>
>
> fadump_reserve_crash_area() reserves memory from a specified base address
> till the end of the RAM.
>
> Replace iteration through the memblock.memory with a single call to
> memblock_reserve() with appropriate that will take care of proper memory
^
parameters?
> reservation.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/powerpc/kernel/fadump.c | 20 +-------------------
> 1 file changed, 1 insertion(+), 19 deletions(-)

I think this looks OK to me, but I don't have a setup to test it easily.
I've added Hari to Cc who might be able to.

But I'll give you an ack in the hope that it works :)

Acked-by: Michael Ellerman <[email protected]>


> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 78ab9a6ee6ac..2446a61e3c25 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> /* Preserve everything above the base address */
> static void __init fadump_reserve_crash_area(u64 base)
> {
> - struct memblock_region *reg;
> - u64 mstart, msize;
> -
> - for_each_memblock(memory, reg) {
> - mstart = reg->base;
> - msize = reg->size;
> -
> - if ((mstart + msize) < base)
> - continue;
> -
> - if (mstart < base) {
> - msize -= (base - mstart);
> - mstart = base;
> - }
> -
> - pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> - (msize >> 20), mstart);
> - memblock_reserve(mstart, msize);
> - }
> + memblock_reserve(base, memblock_end_of_DRAM() - base);
> }
>
> unsigned long __init arch_reserved_kernel_pages(void)
> --
> 2.26.2

2020-08-01 10:20:17

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
> Mike Rapoport <[email protected]> writes:
> > From: Mike Rapoport <[email protected]>
> >
> > fadump_reserve_crash_area() reserves memory from a specified base address
> > till the end of the RAM.
> >
> > Replace iteration through the memblock.memory with a single call to
> > memblock_reserve() with appropriate that will take care of proper memory
> ^
> parameters?
> > reservation.
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> > arch/powerpc/kernel/fadump.c | 20 +-------------------
> > 1 file changed, 1 insertion(+), 19 deletions(-)
>
> I think this looks OK to me, but I don't have a setup to test it easily.
> I've added Hari to Cc who might be able to.
>
> But I'll give you an ack in the hope that it works :)

Actually, I did some digging in the git log and the traversal was added
there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
memory holes while reserving memory in second kernel")
Presuming this is still reqruired I'm going to drop this patch and will
simply replace for_each_memblock() with for_each_mem_range() in v2.

> Acked-by: Michael Ellerman <[email protected]>
>
>
> > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > index 78ab9a6ee6ac..2446a61e3c25 100644
> > --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
> > /* Preserve everything above the base address */
> > static void __init fadump_reserve_crash_area(u64 base)
> > {
> > - struct memblock_region *reg;
> > - u64 mstart, msize;
> > -
> > - for_each_memblock(memory, reg) {
> > - mstart = reg->base;
> > - msize = reg->size;
> > -
> > - if ((mstart + msize) < base)
> > - continue;
> > -
> > - if (mstart < base) {
> > - msize -= (base - mstart);
> > - mstart = base;
> > - }
> > -
> > - pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
> > - (msize >> 20), mstart);
> > - memblock_reserve(mstart, msize);
> > - }
> > + memblock_reserve(base, memblock_end_of_DRAM() - base);
> > }
> >
> > unsigned long __init arch_reserved_kernel_pages(void)
> > --
> > 2.26.2

--
Sincerely yours,
Mike.

2020-08-01 10:55:51

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()



On 01/08/20 3:48 pm, Mike Rapoport wrote:
> On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
>> Mike Rapoport <[email protected]> writes:
>>> From: Mike Rapoport <[email protected]>
>>>
>>> fadump_reserve_crash_area() reserves memory from a specified base address
>>> till the end of the RAM.
>>>
>>> Replace iteration through the memblock.memory with a single call to
>>> memblock_reserve() with appropriate that will take care of proper memory
>> ^
>> parameters?
>>> reservation.
>>>
>>> Signed-off-by: Mike Rapoport <[email protected]>
>>> ---
>>> arch/powerpc/kernel/fadump.c | 20 +-------------------
>>> 1 file changed, 1 insertion(+), 19 deletions(-)
>>
>> I think this looks OK to me, but I don't have a setup to test it easily.
>> I've added Hari to Cc who might be able to.
>>
>> But I'll give you an ack in the hope that it works :)
>
> Actually, I did some digging in the git log and the traversal was added
> there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
> memory holes while reserving memory in second kernel")

I was about to comment on the same :)
memblock_reserve() was being used until we ran into the issue talked
about in the above commit...

> Presuming this is still reqruired I'm going to drop this patch and will

Yeah, it is still required..

> simply replace for_each_memblock() with for_each_mem_range() in v2.

Sounds right.

>
>> Acked-by: Michael Ellerman <[email protected]>
>>
>>
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index 78ab9a6ee6ac..2446a61e3c25 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
>>> @@ -1658,25 +1658,7 @@ int __init fadump_reserve_mem(void)
>>> /* Preserve everything above the base address */
>>> static void __init fadump_reserve_crash_area(u64 base)
>>> {
>>> - struct memblock_region *reg;
>>> - u64 mstart, msize;
>>> -
>>> - for_each_memblock(memory, reg) {
>>> - mstart = reg->base;
>>> - msize = reg->size;
>>> -
>>> - if ((mstart + msize) < base)
>>> - continue;
>>> -
>>> - if (mstart < base) {
>>> - msize -= (base - mstart);
>>> - mstart = base;
>>> - }
>>> -
>>> - pr_info("Reserving %lluMB of memory at %#016llx for preserving crash data",
>>> - (msize >> 20), mstart);
>>> - memblock_reserve(mstart, msize);
>>> - }
>>> + memblock_reserve(base, memblock_end_of_DRAM() - base);
>>> }
>>>
>>> unsigned long __init arch_reserved_kernel_pages(void)
>>> --
>>> 2.26.2
>

Thanks
Hari

2020-08-02 13:18:11

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 06/15] powerpc: fadamp: simplify fadump_reserve_crash_area()

Mike Rapoport <[email protected]> writes:
> On Thu, Jul 30, 2020 at 10:15:13PM +1000, Michael Ellerman wrote:
>> Mike Rapoport <[email protected]> writes:
>> > From: Mike Rapoport <[email protected]>
>> >
>> > fadump_reserve_crash_area() reserves memory from a specified base address
>> > till the end of the RAM.
>> >
>> > Replace iteration through the memblock.memory with a single call to
>> > memblock_reserve() with appropriate that will take care of proper memory
>> ^
>> parameters?
>> > reservation.
>> >
>> > Signed-off-by: Mike Rapoport <[email protected]>
>> > ---
>> > arch/powerpc/kernel/fadump.c | 20 +-------------------
>> > 1 file changed, 1 insertion(+), 19 deletions(-)
>>
>> I think this looks OK to me, but I don't have a setup to test it easily.
>> I've added Hari to Cc who might be able to.
>>
>> But I'll give you an ack in the hope that it works :)
>
> Actually, I did some digging in the git log and the traversal was added
> there on purpose by the commit b71a693d3db3 ("powerpc/fadump: exclude
> memory holes while reserving memory in second kernel")
> Presuming this is still reqruired I'm going to drop this patch and will
> simply replace for_each_memblock() with for_each_mem_range() in v2.

Thanks.

cheers