Even though it's common to see the architecture code using both
bootmem and memblock early memory allocators, it's not good for
multiple reasons. First of all, it's redundant to have two
early memory allocator while one would be more than enough from
functionality and stability points of view. Secondly, some new
features introduced in the kernel utilize the methods of the most
modern allocator ignoring the older one. It means the architecture
code must keep the both subsystems up synchronized with information
about memory regions and reservations, which leads to the code
complexity increase, that obviously increases bugs probability.
Finally it's better to keep all the architectures code unified for
better readability and code simplification. All these reasons lead
to one conclusion - arch code should use just one memory allocator,
which is supposed to be memblock as the most modern and already
utilized by the most of the kernel platforms. This patchset is
mostly about it.
One more reason why the MIPS arch code should finally move to
memblock is a BUG somewhere in the initialization process, when
CMA is activated:
[ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
[ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
[ 0.263172] flags: 0x40000000()
[ 0.266723] page dumped because: nonzero mapcount
[ 0.272049] Modules linked in:
[ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
[ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 0.323148] Call Trace:
[ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
[ 0.330859] [<80404814>] dump_stack+0xd4/0x110
[ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
[ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
[ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
[ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
[ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
[ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
[ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
[ 0.376828] [<808faca4>] kernel_init+0x14/0x104
[ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
The bugus pfn seems to be the one allocated for bootmem allocator
pages and hasn't been freed before letting the CMA working with its
areas. Anyway the bug is solved by this patchset.
Another reason why this patchset is useful is that it fixes the fdt
reserved-memory nodes functionality for MIPS. Really it's bug to have
the fdt reserved nodes scanning before the memblock is
fully initialized (calling early_init_fdt_scan_reserved_mem before
bootmem_init is called). Additionally no-map flag of the
reserved-memory node hasn't been taking into account. This patchset
fixes all of these.
As you probably remember I already did another attempt to merge a
similar functionality into the kernel. This time the patchset got
to be less complex (14 patches vs 21 last time) and fixes the
platform code like SGI IP27 and Loongson3, which due to being
NUMA introduce its own memory initialization process. Although
I have much doubt in SGI IP27 code operability in the first place,
since it got prom_meminit() method of early memory initialization,
which hasn't been called at any other place in the kernel. It must
have been left there unrenamed after arch/mips/mips-boards/generic
code had been discarded.
Here are the list of folks, who agreed to perform some tests of
the patchset:
Alexander Sverdlin <[email protected]> - Octeon2
Matt Redfearn <[email protected]> - Loongson3, etc
Joshua Kinard <[email protected]> - IP27
Marcin Nowakowski <[email protected]>
Thanks to you all in regards and for everybody, who will be involved
in reviewing and testing.
The patchset is applied on top of kernel 4.15-rc8 and can be found
submitted at my repo:
https://github.com/fancer/Linux-kernel-MIPS-memblock-project
Signed-off-by: Serge Semin <[email protected]>
Serge Semin (14):
MIPS: memblock: Add RESERVED_NOMAP memory flag
MIPS: memblock: Surely map BSS kernel memory section
MIPS: memblock: Reserve initrd memory in memblock
MIPS: memblock: Discard bootmem initialization
MIPS: memblock: Add reserved memory regions to memblock
MIPS: memblock: Reserve kdump/crash regions in memblock
MIPS: memblock: Mark present sparsemem sections
MIPS: memblock: Simplify DMA contiguous reservation
MIPS: memblock: Allow memblock regions resize
MIPS: memblock: Perform early low memory test
MIPS: memblock: Print out kernel virtual mem layout
MIPS: memblock: Discard bootmem from Loongson3 code
MIPS: memblock: Discard bootmem from SGI IP27 code
MIPS: memblock: Deactivate bootmem allocator
arch/mips/Kconfig | 2 +-
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 8 +-
arch/mips/kernel/setup.c | 218 +++++++++------------
arch/mips/loongson64/loongson-3/numa.c | 16 +-
arch/mips/mm/init.c | 47 +++++
arch/mips/sgi-ip27/ip27-memory.c | 9 +-
7 files changed, 153 insertions(+), 148 deletions(-)
--
2.12.0
The current MIPS code makes sure the kernel code/data/init
sections are in the maps, but BSS should also be there.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 76e9e2075..0d21c9e04 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
BOOT_MEM_INIT_RAM);
+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+ BOOT_MEM_RAM);
pr_info("Determined physical RAM map:\n");
print_memory_map();
--
2.12.0
Loongson64/3 runs its own code to initialize memory allocator in
case of NUMA configuration is selected. So in order to move to the
pure memblock utilization we discard the bootmem allocator usage
and insert the memblock reservation method for kernel/addrspace_offset
memory regions.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/loongson64/loongson-3/numa.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/mips/loongson64/loongson-3/numa.c b/arch/mips/loongson64/loongson-3/numa.c
index 282c5a8c2..902843516 100644
--- a/arch/mips/loongson64/loongson-3/numa.c
+++ b/arch/mips/loongson64/loongson-3/numa.c
@@ -180,7 +180,6 @@ static void __init szmem(unsigned int node)
static void __init node_mem_init(unsigned int node)
{
- unsigned long bootmap_size;
unsigned long node_addrspace_offset;
unsigned long start_pfn, end_pfn, freepfn;
@@ -197,26 +196,21 @@ static void __init node_mem_init(unsigned int node)
__node_data[node] = prealloc__node_data + node;
- NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
- bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
- start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
if (node == 0) /* used by finalize_initrd() */
max_low_pfn = end_pfn;
- /* This is reserved for the kernel and bdata->node_bootmem_map */
- reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
- ((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
- BOOTMEM_DEFAULT);
+ /* This is reserved for the kernel only */
+ if (node == 0)
+ memblock_reserve(start_pfn << PAGE_SHIFT,
+ ((freepfn - start_pfn) << PAGE_SHIFT));
if (node == 0 && node_end_pfn(0) >= (0xffffffff >> PAGE_SHIFT)) {
/* Reserve 0xfe000000~0xffffffff for RS780E integrated GPU */
- reserve_bootmem_node(NODE_DATA(node),
- (node_addrspace_offset | 0xfe000000),
- 32 << 20, BOOTMEM_DEFAULT);
+ memblock_reserve(node_addrspace_offset | 0xfe000000, 32 << 20);
}
sparse_memory_present_with_active_regions(node);
--
2.12.0
Memblock allocator can be successfully used from now for early
memory management
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 725b5ece7..a6c4fb6b6 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,7 +4,6 @@ config MIPS
default y
select ARCH_BINFMT_ELF_STATE
select ARCH_CLOCKSOURCE_DATA
- select ARCH_DISCARD_MEMBLOCK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_MIGHT_HAVE_PC_PARPORT
@@ -57,6 +57,7 @@ config MIPS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KPROBES
select HAVE_KRETPROBES
+ select NO_BOOTMEM
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MOD_ARCH_SPECIFIC
--
2.12.0
SGI IP27 got its own code to set the early memory allocator up since it's
NUMA-based system. So in order to be compatible with NO_BOOTMEM config
we need to discard the bootmem allocator initialization and insert the
memblock reservation method. Although in my opinion the code isn't
working anyway since I couldn't find a place where prom_meminit() called
and kernel memory isn't reserved. It must have been untested since the
time the arch/mips/mips-boards/generic code was in the kernel.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/sgi-ip27/ip27-memory.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
index 8d0eb2643..d25758e25 100644
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -389,7 +389,6 @@ static void __init node_mem_init(cnodeid_t node)
{
unsigned long slot_firstpfn = slot_getbasepfn(node, 0);
unsigned long slot_freepfn = node_getfirstfree(node);
- unsigned long bootmap_size;
unsigned long start_pfn, end_pfn;
get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
@@ -400,7 +399,6 @@ static void __init node_mem_init(cnodeid_t node)
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE);
- NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
@@ -409,12 +407,9 @@ static void __init node_mem_init(cnodeid_t node)
slot_freepfn += PFN_UP(sizeof(struct pglist_data) +
sizeof(struct hub_data));
- bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
- start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
- reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
- ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
- BOOTMEM_DEFAULT);
+ memblock_reserve(slot_firstpfn << PAGE_SHIFT,
+ ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT));
sparse_memory_present_with_active_regions(node);
}
--
2.12.0
CMA reserves it areas in the memblock allocator. Since we aren't
using bootmem anymore, the reservations copying should be discarded.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 6df1eaf38..e0ca0d2bc 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -869,10 +869,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
- /* Tell bootmem about cma reserved memblock section */
- for_each_memblock(reserved, reg)
- if (reg->size != 0)
- reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
}
static void __init resource_init(void)
--
2.12.0
When all the main reservations are done the memblock regions
can be dynamically resized. Additionally it would be useful to have
memblock regions dumped on debug at this point.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index e0ca0d2bc..82c6b77f6 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -869,6 +869,10 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
+
+ memblock_allow_resize();
+
+ memblock_dump_all();
}
static void __init resource_init(void)
--
2.12.0
Low memory can be tested at this point, since all the
reservations have just been finished without much of
additional allocations.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 82c6b77f6..b65047d85 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -873,6 +873,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_allow_resize();
memblock_dump_all();
+
+ early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
}
static void __init resource_init(void)
--
2.12.0
It is useful to have the kernel virtual memory layout printed
at boot time so to have the full information about the booted
kernel. In some cases it might be unsafe to have virtual
addresses freely visible in logs, so the %pK format is used if
one want to hide them.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/mm/init.c | 47 +++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 15040266b..d3e6bb531 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -32,6 +32,7 @@
#include <linux/kcore.h>
#include <linux/export.h>
#include <linux/initrd.h>
+#include <linux/sizes.h>
#include <asm/asm-offsets.h>
#include <asm/bootinfo.h>
@@ -60,6 +61,51 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
EXPORT_SYMBOL(zero_page_mask);
/*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init __maybe_unused mem_print_kmap_info(void)
+{
+ pr_notice("Kernel virtual memory layout:\n"
+ " lowmem : 0x%pK - 0x%pK (%4ld MB)\n"
+ " .text : 0x%pK - 0x%pK (%4td kB)\n"
+ " .data : 0x%pK - 0x%pK (%4td kB)\n"
+ " .init : 0x%pK - 0x%pK (%4td kB)\n"
+ " .bss : 0x%pK - 0x%pK (%4td kB)\n"
+ " vmalloc : 0x%pK - 0x%pK (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ " pkmap : 0x%pK - 0x%pK (%4ld MB)\n"
+#endif
+ " fixmap : 0x%pK - 0x%pK (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+ /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+ BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+ BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+ BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+ BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+ (unsigned long)high_memory);
+#endif
+ BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+ BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
+/*
* Not static inline because used by IP27 special magic initialization code
*/
void setup_zero_pages(void)
@@ -468,6 +514,7 @@ void __init mem_init(void)
free_all_bootmem();
setup_zero_pages(); /* Setup zeroed pages. */
mem_init_free_highmem();
+ mem_print_kmap_info();
mem_init_print_info(NULL);
#ifdef CONFIG_64BIT
--
2.12.0
If sparsemem is activated all sections with present pages must
be accordingly marked after memblock is fully initialized.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b121fa702..6df1eaf38 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -778,7 +778,7 @@ static void __init request_crashkernel(struct resource *res)
static void __init arch_mem_init(char **cmdline_p)
{
- struct memblock_region *reg;
+ struct memblock_region *reg __maybe_unused;
extern void plat_mem_setup(void);
/* call board setup routine */
@@ -860,6 +860,11 @@ static void __init arch_mem_init(char **cmdline_p)
crashk_res.end - crashk_res.start + 1);
#endif
device_tree_init();
+#ifdef CONFIG_SPARSEMEM
+ for_each_memblock(memory, reg)
+ memory_present(0, memblock_region_memory_base_pfn(reg),
+ memblock_region_memory_end_pfn(reg));
+#endif /* CONFIG_SPARSEMEM */
sparse_init();
plat_swiotlb_setup();
--
2.12.0
Even if nomap flag is specified the reserved memory declared in dts
isn't really discarded from the buddy allocator in the current code.
We'll fix it by adding the no-map MIPS memory flag. Additionally
lets add the RESERVED_NOMAP memory regions handling to the methods,
which aren't going to be changed in the further patches.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 8 ++++++--
arch/mips/kernel/setup.c | 8 ++++++++
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index b603804ca..f7be3148a 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -90,6 +90,7 @@ extern unsigned long mips_machtype;
#define BOOT_MEM_ROM_DATA 2
#define BOOT_MEM_RESERVED 3
#define BOOT_MEM_INIT_RAM 4
+#define BOOT_MEM_RESERVED_NOMAP 5
/*
* A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 0dbcd152a..b123eb827 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,7 @@ char *mips_get_machine_name(void)
#ifdef CONFIG_USE_OF
void __init early_init_dt_add_memory_arch(u64 base, u64 size)
{
- return add_memory_region(base, size, BOOT_MEM_RAM);
+ add_memory_region(base, size, BOOT_MEM_RAM);
}
void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
@@ -52,7 +52,11 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
{
- add_memory_region(base, size, BOOT_MEM_RESERVED);
+ if (!nomap)
+ add_memory_region(base, size, BOOT_MEM_RESERVED);
+ else
+ add_memory_region(base, size, BOOT_MEM_RESERVED_NOMAP);
+
return 0;
}
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 4020d8f98..76e9e2075 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -172,6 +172,7 @@ bool __init memory_region_available(phys_addr_t start, phys_addr_t size)
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+ case BOOT_MEM_RESERVED_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -207,6 +208,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ printk(KERN_CONT "(reserved nomap)\n");
+ break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -955,9 +969,13 @@ static void __init resource_init(void)
res->name = "System RAM";
res->flags |= IORESOURCE_SYSRAM;
break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ res->name = "reserved nomap";
+ break;
case BOOT_MEM_RESERVED:
default:
res->name = "reserved";
+ break;
}
request_resource(&iomem_resource, res);
--
2.12.0
The memory reservation has to be performed for all the crucial
objects like kernel itself, it data and fdt blob. FDT reserved-memory
nodes should also be scanned to declare or discard reserved memory
regions, but it has to be done after the memblock is fully initialized
with low/high RAM (see the function description/code).
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 96 +++++++++++++++++-------------
1 file changed, 54 insertions(+), 42 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0ffbc3bb5..9e14d9833 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -362,6 +362,10 @@ static unsigned long __init init_initrd(void)
static void __init bootmem_init(void)
{
init_initrd();
+}
+
+static void __init reservation_init(void)
+{
finalize_initrd();
}
@@ -478,54 +482,58 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+}
+
+static void __init reservation_init(void)
+{
+ phys_addr_t size;
+ int i;
/*
- * Register fully available low RAM pages with the bootmem allocator.
+ * Reserve memory occupied by the kernel and it data
*/
- for (i = 0; i < boot_mem_map.nr_map; i++) {
- unsigned long start, end, size;
+ size = __pa_symbol(&_end) - __pa_symbol(&_text);
+ memblock_reserve(__pa_symbol(&_text), size);
- start = PFN_UP(boot_mem_map.map[i].addr);
- end = PFN_DOWN(boot_mem_map.map[i].addr
- + boot_mem_map.map[i].size);
+ /*
+ * Handle FDT and it reserved-memory nodes now
+ */
+ early_init_fdt_reserve_self();
+ early_init_fdt_scan_reserved_mem();
- /*
- * Reserve usable memory.
- */
- switch (boot_mem_map.map[i].type) {
- case BOOT_MEM_RAM:
- break;
- case BOOT_MEM_INIT_RAM:
- memory_present(0, start, end);
- continue;
- default:
- /* Not usable memory */
- if (start > min_low_pfn && end < max_low_pfn)
- reserve_bootmem(boot_mem_map.map[i].addr,
- boot_mem_map.map[i].size,
- BOOTMEM_DEFAULT);
- continue;
- }
+ /*
+ * Reserve requested memory ranges with the memblock allocator.
+ */
+ for (i = 0; i < boot_mem_map.nr_map; i++) {
+ phys_addr_t start, end;
- /*
- * We are rounding up the start address of usable memory
- * and at the end of the usable range downwards.
- */
- if (start >= max_low_pfn)
+ if (boot_mem_map.map[i].type == BOOT_MEM_RAM)
continue;
- if (end > max_low_pfn)
- end = max_low_pfn;
+
+ start = boot_mem_map.map[i].addr;
+ end = boot_mem_map.map[i].addr + boot_mem_map.map[i].size;
+ size = boot_mem_map.map[i].size;
/*
- * ... finally, is the area going away?
+ * Make sure the region isn't already reserved
*/
- if (end <= start)
+ if (memblock_is_region_reserved(start, size)) {
+ pr_warn("Reserved region %08zx @ %pa already in-use\n",
+ (size_t)size, &start);
continue;
- size = end - start;
+ }
- /* Register lowmem ranges */
- free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
- memory_present(0, start, end);
+ switch (boot_mem_map.map[i].type) {
+ case BOOT_MEM_ROM_DATA:
+ case BOOT_MEM_RESERVED:
+ case BOOT_MEM_INIT_RAM:
+ memblock_reserve(start, size);
+ break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ default:
+ memblock_remove(start, size);
+ break;
+ }
}
#ifdef CONFIG_RELOCATABLE
@@ -555,6 +563,12 @@ static void __init bootmem_init(void)
* Reserve initrd memory if needed.
*/
finalize_initrd();
+
+ /*
+ * Reserve for hibernation
+ */
+ size = __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin);
+ memblock_reserve(__pa_symbol(&__nosave_begin), size);
}
#endif /* CONFIG_SGI_IP27 */
@@ -569,6 +583,7 @@ static void __init bootmem_init(void)
* kernel but generic memory management system is still entirely uninitialized.
*
* o bootmem_init()
+ * o reservation_init()
* o sparse_init()
* o paging_init()
* o dma_contiguous_reserve()
@@ -826,10 +841,10 @@ static void __init arch_mem_init(char **cmdline_p)
print_memory_map();
}
- early_init_fdt_reserve_self();
- early_init_fdt_scan_reserved_mem();
-
bootmem_init();
+
+ reservation_init();
+
#ifdef CONFIG_PROC_VMCORE
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
@@ -855,9 +870,6 @@ static void __init arch_mem_init(char **cmdline_p)
for_each_memblock(reserved, reg)
if (reg->size != 0)
reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
-
- reserve_bootmem_region(__pa_symbol(&__nosave_begin),
- __pa_symbol(&__nosave_end)); /* Reserve for hibernation */
}
static void __init resource_init(void)
--
2.12.0
Kdump/crashkernel memory regions should be reserved in the
memblock allocator so they wouldn't be occupied by any further
allocations.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 9e14d9833..b121fa702 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -849,17 +849,15 @@ static void __init arch_mem_init(char **cmdline_p)
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
setup_elfcorehdr, setup_elfcorehdr_size);
- reserve_bootmem(setup_elfcorehdr, setup_elfcorehdr_size,
- BOOTMEM_DEFAULT);
+ memblock_reserve(setup_elfcorehdr, setup_elfcorehdr_size);
}
#endif
mips_parse_crashkernel();
#ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
- reserve_bootmem(crashk_res.start,
- crashk_res.end - crashk_res.start + 1,
- BOOTMEM_DEFAULT);
+ memblock_reserve(crashk_res.start,
+ crashk_res.end - crashk_res.start + 1);
#endif
device_tree_init();
sparse_init();
--
2.12.0
Since memblock is going to be used for the early memory allocation
lets discard the bootmem node setup and all the related free-space
search code. Low/high PFN extremums should be still calculated
since they are needed on the paging_init stage. Since the current
code is already doing memblock regions initialization the only thing
left is to set the upper allocation limit to be up to the max low
memory PFN, so the memblock API can be fully used from now.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 86 ++++--------------------------
1 file changed, 11 insertions(+), 75 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 1b8246e6c..0ffbc3bb5 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -367,29 +367,15 @@ static void __init bootmem_init(void)
#else /* !CONFIG_SGI_IP27 */
-static unsigned long __init bootmap_bytes(unsigned long pages)
-{
- unsigned long bytes = DIV_ROUND_UP(pages, 8);
-
- return ALIGN(bytes, sizeof(long));
-}
-
static void __init bootmem_init(void)
{
- unsigned long reserved_end;
- unsigned long mapstart = ~0UL;
- unsigned long bootmap_size;
- bool bootmap_valid = false;
int i;
/*
- * Sanity check any INITRD first. We don't take it into account
- * for bootmem setup initially, rely on the end-of-kernel-code
- * as our memory range starting point. Once bootmem is inited we
+ * Sanity check any INITRD first. Once memblock is inited we
* will reserve the area used for the initrd.
*/
init_initrd();
- reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
/*
* max_low_pfn is not a number of pages. The number of pages
@@ -428,16 +414,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
- if (end <= reserved_end)
- continue;
-#ifdef CONFIG_BLK_DEV_INITRD
- /* Skip zones before initrd and initrd itself */
- if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
- continue;
-#endif
- if (start >= mapstart)
- continue;
- mapstart = max(reserved_end, start);
}
if (min_low_pfn >= max_low_pfn)
@@ -463,53 +439,19 @@ static void __init bootmem_init(void)
#endif
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
-
-#ifdef CONFIG_BLK_DEV_INITRD
- /*
- * mapstart should be after initrd_end
- */
- if (initrd_end)
- mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
+#ifdef CONFIG_HIGHMEM
+ pr_info("PFNs: low min %lu, low max %lu, high start %lu, high end %lu,"
+ "max %lu\n",
+ min_low_pfn, max_low_pfn, highstart_pfn, highend_pfn, max_pfn);
+#else
+ pr_info("PFNs: low min %lu, low max %lu, max %lu\n",
+ min_low_pfn, max_low_pfn, max_pfn);
#endif
/*
- * check that mapstart doesn't overlap with any of
- * memory regions that have been reserved through eg. DTB
- */
- bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
-
- bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
- bootmap_size);
- for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
- unsigned long mapstart_addr;
-
- switch (boot_mem_map.map[i].type) {
- case BOOT_MEM_RESERVED:
- mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
- boot_mem_map.map[i].size);
- if (PHYS_PFN(mapstart_addr) < mapstart)
- break;
-
- bootmap_valid = memory_region_available(mapstart_addr,
- bootmap_size);
- if (bootmap_valid)
- mapstart = PHYS_PFN(mapstart_addr);
- break;
- default:
- break;
- }
- }
-
- if (!bootmap_valid)
- panic("No memory area to place a bootmap bitmap");
-
- /*
- * Initialize the boot-time allocator with low memory only.
+ * Initialize the boot-time allocator with low/high memory, but
+ * set the allocation limit to low memory only
*/
- if (bootmap_size != init_bootmem_node(NODE_DATA(0), mapstart,
- min_low_pfn, max_low_pfn))
- panic("Unexpected memory size required for bootmap");
-
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;
@@ -535,6 +477,7 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
+ memblock_set_current_limit(PFN_PHYS(max_low_pfn));
/*
* Register fully available low RAM pages with the bootmem allocator.
@@ -570,8 +513,6 @@ static void __init bootmem_init(void)
*/
if (start >= max_low_pfn)
continue;
- if (start < reserved_end)
- start = reserved_end;
if (end > max_low_pfn)
end = max_low_pfn;
@@ -587,11 +528,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}
- /*
- * Reserve the bootmap memory.
- */
- reserve_bootmem(PFN_PHYS(mapstart), bootmap_size, BOOTMEM_DEFAULT);
-
#ifdef CONFIG_RELOCATABLE
/*
* The kernel reserves all memory below its _end symbol as bootmem,
--
2.12.0
There is no reserve_bootmem() method in the nobootmem interface,
so we need to replace it with memblock-specific one.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0d21c9e04..1b8246e6c 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -330,7 +330,7 @@ static void __init finalize_initrd(void)
maybe_bswap_initrd();
- reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
+ memblock_reserve(__pa(initrd_start), size);
initrd_below_start_ok = 1;
pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
--
2.12.0
On 01/17/2018 02:23 PM, Serge Semin wrote:
> It is useful to have the kernel virtual memory layout printed
> at boot time so to have the full information about the booted
> kernel. In some cases it might be unsafe to have virtual
> addresses freely visible in logs, so the %pK format is used if
> one want to hide them.
>
> Signed-off-by: Serge Semin <[email protected]>
I personally like having that information because that helps debug and
have a quick reference, but there appears to be a trend to remove this
in the name of security:
https://patchwork.kernel.org/patch/10124007/
maybe hide this behind a configuration option?
--
Florian
On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
> On 01/17/2018 02:23 PM, Serge Semin wrote:
> > It is useful to have the kernel virtual memory layout printed
> > at boot time so to have the full information about the booted
> > kernel. In some cases it might be unsafe to have virtual
> > addresses freely visible in logs, so the %pK format is used if
> > one want to hide them.
> >
> > Signed-off-by: Serge Semin <[email protected]>
>
> I personally like having that information because that helps debug and
> have a quick reference, but there appears to be a trend to remove this
> in the name of security:
>
> https://patchwork.kernel.org/patch/10124007/
>
> maybe hide this behind a configuration option?
Yeah, arm code was the place I picked the function up.) But in my case
I've used %pK so the pointers would disappear from logging when
kptr_restrict sysctl is 1 or 2.
I agree, that we might need to make the printouts optional. If there is
any kernel config, which for instance increases the kernel security we
could also use it or anything else to discard the printouts at compile
time.
> --
> Florian
Hi Serge,
On 18/01/18 20:18, Serge Semin wrote:
> On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
>> On 01/17/2018 02:23 PM, Serge Semin wrote:
>>> It is useful to have the kernel virtual memory layout printed
>>> at boot time so to have the full information about the booted
>>> kernel. In some cases it might be unsafe to have virtual
>>> addresses freely visible in logs, so the %pK format is used if
>>> one want to hide them.
>>>
>>> Signed-off-by: Serge Semin <[email protected]>
>>
>> I personally like having that information because that helps debug and
>> have a quick reference, but there appears to be a trend to remove this
>> in the name of security:
>>
>> https://patchwork.kernel.org/patch/10124007/
>>
>> maybe hide this behind a configuration option?
>
> Yeah, arm code was the place I picked the function up.) But in my case
> I've used %pK so the pointers would disappear from logging when
> kptr_restrict sysctl is 1 or 2.
> I agree, that we might need to make the printouts optional. If there is
> any kernel config, which for instance increases the kernel security we
> could also use it or anything else to discard the printouts at compile
> time.
Certainly, when KASLR is active it would be preferable to hide this
information, so you could use CONFIG_RELOCATABLE. The existing KASLR
stuff additionally hides this kind of information behind
CONFIG_DEBUG_KERNEL, so that only people actively debugging the kernel
see it:
http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
Thanks,
Matt
>
>> --
>> Florian
On Fri, Jan 19, 2018 at 07:59:43AM +0000, Matt Redfearn <[email protected]> wrote:
Hello Matt,
> Hi Serge,
>
>
>
> On 18/01/18 20:18, Serge Semin wrote:
> >On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
> >>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>It is useful to have the kernel virtual memory layout printed
> >>>at boot time so to have the full information about the booted
> >>>kernel. In some cases it might be unsafe to have virtual
> >>>addresses freely visible in logs, so the %pK format is used if
> >>>one want to hide them.
> >>>
> >>>Signed-off-by: Serge Semin <[email protected]>
> >>
> >>I personally like having that information because that helps debug and
> >>have a quick reference, but there appears to be a trend to remove this
> >>in the name of security:
> >>
> >>https://patchwork.kernel.org/patch/10124007/
> >>
> >>maybe hide this behind a configuration option?
> >
> >Yeah, arm code was the place I picked the function up.) But in my case
> >I've used %pK so the pointers would disappear from logging when
> >kptr_restrict sysctl is 1 or 2.
> >I agree, that we might need to make the printouts optional. If there is
> >any kernel config, which for instance increases the kernel security we
> >could also use it or anything else to discard the printouts at compile
> >time.
>
>
> Certainly, when KASLR is active it would be preferable to hide this
> information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> that only people actively debugging the kernel see it:
>
> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
Ok. I'll hide the printouts behind both of that config macros in the next patchset
version.
Regards,
-Sergey
>
> Thanks,
> Matt
>
> >
> >>--
> >>Florian
Hi Serge,
On 17/01/18 22:23, Serge Semin wrote:
> The current MIPS code makes sure the kernel code/data/init
> sections are in the maps, but BSS should also be there.
Quite right - it should. But this was protected against by reserving all
bootmem up to the _end symbol here:
http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
Which you remove in the next patch in this series. I'm not sure it is
worth disentangling the reserved_end stuff from the next patch to make
this into a single logical change of reserving just .bss rather than
everything below _end.
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 76e9e2075..0d21c9e04 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> BOOT_MEM_INIT_RAM);
> + arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> + PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> + BOOT_MEM_RAM);
>
> pr_info("Determined physical RAM map:\n");
> print_memory_map();
>
Hi Serge,
On 17/01/18 22:22, Serge Semin wrote:
> Even though it's common to see the architecture code using both
> bootmem and memblock early memory allocators, it's not good for
> multiple reasons. First of all, it's redundant to have two
> early memory allocator while one would be more than enough from
> functionality and stability points of view. Secondly, some new
> features introduced in the kernel utilize the methods of the most
> modern allocator ignoring the older one. It means the architecture
> code must keep the both subsystems up synchronized with information
> about memory regions and reservations, which leads to the code
> complexity increase, that obviously increases bugs probability.
> Finally it's better to keep all the architectures code unified for
> better readability and code simplification. All these reasons lead
> to one conclusion - arch code should use just one memory allocator,
> which is supposed to be memblock as the most modern and already
> utilized by the most of the kernel platforms. This patchset is
> mostly about it.
>
> One more reason why the MIPS arch code should finally move to
> memblock is a BUG somewhere in the initialization process, when
> CMA is activated:
>
> [ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
> [ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
> [ 0.263172] flags: 0x40000000()
> [ 0.266723] page dumped because: nonzero mapcount
> [ 0.272049] Modules linked in:
> [ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
> [ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
> 86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
> 80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
> 80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ...
> [ 0.323148] Call Trace:
> [ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
> [ 0.330859] [<80404814>] dump_stack+0xd4/0x110
> [ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
> [ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
> [ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
> [ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
> [ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
> [ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
> [ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
> [ 0.376828] [<808faca4>] kernel_init+0x14/0x104
> [ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
>
> The bugus pfn seems to be the one allocated for bootmem allocator
> pages and hasn't been freed before letting the CMA working with its
> areas. Anyway the bug is solved by this patchset.
>
> Another reason why this patchset is useful is that it fixes the fdt
> reserved-memory nodes functionality for MIPS. Really it's bug to have
> the fdt reserved nodes scanning before the memblock is
> fully initialized (calling early_init_fdt_scan_reserved_mem before
> bootmem_init is called). Additionally no-map flag of the
> reserved-memory node hasn't been taking into account. This patchset
> fixes all of these.
>
> As you probably remember I already did another attempt to merge a
> similar functionality into the kernel. This time the patchset got
> to be less complex (14 patches vs 21 last time) and fixes the
> platform code like SGI IP27 and Loongson3, which due to being
> NUMA introduce its own memory initialization process. Although
> I have much doubt in SGI IP27 code operability in the first place,
> since it got prom_meminit() method of early memory initialization,
> which hasn't been called at any other place in the kernel. It must
> have been left there unrenamed after arch/mips/mips-boards/generic
> code had been discarded.
>
> Here are the list of folks, who agreed to perform some tests of
> the patchset:
> Alexander Sverdlin <[email protected]> - Octeon2
> Matt Redfearn <[email protected]> - Loongson3, etc
I have applied and tested these patches on various platforms that we
have available here, and the kernel appears to boot and get to userspace
as normal on the following platforms:
UTM8 (Cavium Octeon III)
Creator CI20
Creator CI40
Loongson3a
MIPS Boston
MIPS Malta
MIPS SEAD3
Aside from the CONFIG_RELOCATABLE stuff, this looks pretty tidy to me.
Thanks,
Matt
> Joshua Kinard <[email protected]> - IP27
> Marcin Nowakowski <[email protected]>
> Thanks to you all in regards and for everybody, who will be involved
> in reviewing and testing.
>
> The patchset is applied on top of kernel 4.15-rc8 and can be found
> submitted at my repo:
> https://github.com/fancer/Linux-kernel-MIPS-memblock-project
>
> Signed-off-by: Serge Semin <[email protected]>
>
> Serge Semin (14):
> MIPS: memblock: Add RESERVED_NOMAP memory flag
> MIPS: memblock: Surely map BSS kernel memory section
> MIPS: memblock: Reserve initrd memory in memblock
> MIPS: memblock: Discard bootmem initialization
> MIPS: memblock: Add reserved memory regions to memblock
> MIPS: memblock: Reserve kdump/crash regions in memblock
> MIPS: memblock: Mark present sparsemem sections
> MIPS: memblock: Simplify DMA contiguous reservation
> MIPS: memblock: Allow memblock regions resize
> MIPS: memblock: Perform early low memory test
> MIPS: memblock: Print out kernel virtual mem layout
> MIPS: memblock: Discard bootmem from Loongson3 code
> MIPS: memblock: Discard bootmem from SGI IP27 code
> MIPS: memblock: Deactivate bootmem allocator
>
> arch/mips/Kconfig | 2 +-
> arch/mips/include/asm/bootinfo.h | 1 +
> arch/mips/kernel/prom.c | 8 +-
> arch/mips/kernel/setup.c | 218 +++++++++------------
> arch/mips/loongson64/loongson-3/numa.c | 16 +-
> arch/mips/mm/init.c | 47 +++++
> arch/mips/sgi-ip27/ip27-memory.c | 9 +-
> 7 files changed, 153 insertions(+), 148 deletions(-)
>
On Mon, Jan 22, 2018 at 04:36:40PM +0000, Matt Redfearn <[email protected]> wrote:
Hello Matt,
> Hi Serge,
>
> On 17/01/18 22:22, Serge Semin wrote:
> >Even though it's common to see the architecture code using both
> >bootmem and memblock early memory allocators, it's not good for
> >multiple reasons. First of all, it's redundant to have two
> >early memory allocator while one would be more than enough from
> >functionality and stability points of view. Secondly, some new
> >features introduced in the kernel utilize the methods of the most
> >modern allocator ignoring the older one. It means the architecture
> >code must keep the both subsystems up synchronized with information
> >about memory regions and reservations, which leads to the code
> >complexity increase, that obviously increases bugs probability.
> >Finally it's better to keep all the architectures code unified for
> >better readability and code simplification. All these reasons lead
> >to one conclusion - arch code should use just one memory allocator,
> >which is supposed to be memblock as the most modern and already
> >utilized by the most of the kernel platforms. This patchset is
> >mostly about it.
> >
> >One more reason why the MIPS arch code should finally move to
> >memblock is a BUG somewhere in the initialization process, when
> >CMA is activated:
> >
> >[ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
> >[ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
> >[ 0.263172] flags: 0x40000000()
> >[ 0.266723] page dumped because: nonzero mapcount
> >[ 0.272049] Modules linked in:
> >[ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
> >[ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
> > 86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
> > 80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
> > 80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
> > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > ...
> >[ 0.323148] Call Trace:
> >[ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
> >[ 0.330859] [<80404814>] dump_stack+0xd4/0x110
> >[ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
> >[ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
> >[ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
> >[ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
> >[ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
> >[ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
> >[ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
> >[ 0.376828] [<808faca4>] kernel_init+0x14/0x104
> >[ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
> >
> >The bugus pfn seems to be the one allocated for bootmem allocator
> >pages and hasn't been freed before letting the CMA working with its
> >areas. Anyway the bug is solved by this patchset.
> >
> >Another reason why this patchset is useful is that it fixes the fdt
> >reserved-memory nodes functionality for MIPS. Really it's bug to have
> >the fdt reserved nodes scanning before the memblock is
> >fully initialized (calling early_init_fdt_scan_reserved_mem before
> >bootmem_init is called). Additionally no-map flag of the
> >reserved-memory node hasn't been taking into account. This patchset
> >fixes all of these.
> >
> >As you probably remember I already did another attempt to merge a
> >similar functionality into the kernel. This time the patchset got
> >to be less complex (14 patches vs 21 last time) and fixes the
> >platform code like SGI IP27 and Loongson3, which due to being
> >NUMA introduce its own memory initialization process. Although
> >I have much doubt in SGI IP27 code operability in the first place,
> >since it got prom_meminit() method of early memory initialization,
> >which hasn't been called at any other place in the kernel. It must
> >have been left there unrenamed after arch/mips/mips-boards/generic
> >code had been discarded.
> >
> >Here are the list of folks, who agreed to perform some tests of
> >the patchset:
> >Alexander Sverdlin <[email protected]> - Octeon2
> >Matt Redfearn <[email protected]> - Loongson3, etc
>
>
> I have applied and tested these patches on various platforms that we have
> available here, and the kernel appears to boot and get to userspace as
> normal on the following platforms:
>
> UTM8 (Cavium Octeon III)
> Creator CI20
> Creator CI40
> Loongson3a
> MIPS Boston
> MIPS Malta
> MIPS SEAD3
>
> Aside from the CONFIG_RELOCATABLE stuff, this looks pretty tidy to me.
Thank you very much for the tests. It is great to see interest from
the community. From my side I've tested the patchset on:
Baikal-T (MIPS Warrior p5600)
Regards,
-Sergey
>
> Thanks,
> Matt
>
>
> >Joshua Kinard <[email protected]> - IP27
> >Marcin Nowakowski <[email protected]>
> >Thanks to you all in regards and for everybody, who will be involved
> >in reviewing and testing.
> >
> >The patchset is applied on top of kernel 4.15-rc8 and can be found
> >submitted at my repo:
> >https://github.com/fancer/Linux-kernel-MIPS-memblock-project
> >
> >Signed-off-by: Serge Semin <[email protected]>
> >
> >Serge Semin (14):
> > MIPS: memblock: Add RESERVED_NOMAP memory flag
> > MIPS: memblock: Surely map BSS kernel memory section
> > MIPS: memblock: Reserve initrd memory in memblock
> > MIPS: memblock: Discard bootmem initialization
> > MIPS: memblock: Add reserved memory regions to memblock
> > MIPS: memblock: Reserve kdump/crash regions in memblock
> > MIPS: memblock: Mark present sparsemem sections
> > MIPS: memblock: Simplify DMA contiguous reservation
> > MIPS: memblock: Allow memblock regions resize
> > MIPS: memblock: Perform early low memory test
> > MIPS: memblock: Print out kernel virtual mem layout
> > MIPS: memblock: Discard bootmem from Loongson3 code
> > MIPS: memblock: Discard bootmem from SGI IP27 code
> > MIPS: memblock: Deactivate bootmem allocator
> >
> > arch/mips/Kconfig | 2 +-
> > arch/mips/include/asm/bootinfo.h | 1 +
> > arch/mips/kernel/prom.c | 8 +-
> > arch/mips/kernel/setup.c | 218 +++++++++------------
> > arch/mips/loongson64/loongson-3/numa.c | 16 +-
> > arch/mips/mm/init.c | 47 +++++
> > arch/mips/sgi-ip27/ip27-memory.c | 9 +-
> > 7 files changed, 153 insertions(+), 148 deletions(-)
> >
Hello Matt,
On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn <[email protected]> wrote:
> Hi Serge,
>
> On 17/01/18 22:23, Serge Semin wrote:
> >The current MIPS code makes sure the kernel code/data/init
> >sections are in the maps, but BSS should also be there.
>
> Quite right - it should. But this was protected against by reserving all
> bootmem up to the _end symbol here:
> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> Which you remove in the next patch in this series. I'm not sure it is worth
Right. Missed that part. The old code just doesn't set the kernel memory free
calling the free_bootmem() method for non-reserved parts below reserved_end.
> disentangling the reserved_end stuff from the next patch to make this into a
> single logical change of reserving just .bss rather than everything below
> _end.
Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
Add reserved memory regions to memblock". It logically belongs to that place.
Since basically by the arch_mem_addpart() calls we reserve all the kernel
memory now I'd also merged them into a single call for the range [_text, _end].
What do you think?
Regards,
-Sergey
>
> Reviewed-by: Matt Redfearn <[email protected]>
>
> Thanks,
> Matt
>
> >
> >Signed-off-by: Serge Semin <[email protected]>
> >---
> > arch/mips/kernel/setup.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >index 76e9e2075..0d21c9e04 100644
> >--- a/arch/mips/kernel/setup.c
> >+++ b/arch/mips/kernel/setup.c
> >@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> > arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> > PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> > BOOT_MEM_INIT_RAM);
> >+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> >+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> >+ BOOT_MEM_RAM);
> > pr_info("Determined physical RAM map:\n");
> > print_memory_map();
> >
Hi Serge,
On 22/01/18 21:47, Serge Semin wrote:
> Hello Matt,
>
> On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn <[email protected]> wrote:
>> Hi Serge,
>>
>> On 17/01/18 22:23, Serge Semin wrote:
>>> The current MIPS code makes sure the kernel code/data/init
>>> sections are in the maps, but BSS should also be there.
>>
>> Quite right - it should. But this was protected against by reserving all
>> bootmem up to the _end symbol here:
>> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
>> Which you remove in the next patch in this series. I'm not sure it is worth
>
> Right. Missed that part. The old code just doesn't set the kernel memory free
> calling the free_bootmem() method for non-reserved parts below reserved_end.
>
>> disentangling the reserved_end stuff from the next patch to make this into a
>> single logical change of reserving just .bss rather than everything below
>> _end.
>
> Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
> Add reserved memory regions to memblock". It logically belongs to that place.
> Since basically by the arch_mem_addpart() calls we reserve all the kernel
Actually I was wrong - it's not this sequence of arch_mem_addpart's that
reserves the kernels memory. At least on DT based systems, it's pretty
likely that these regions will overlap with the system memory already added.
of_scan_flat_dt will look for the memory node and add it via
early_init_dt_add_memory_arch.
These calls to add the kernel text, init and bss detect that they
overlap with the already present system memory, so don't get added, here:
http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
As such, when we print out the content of boot_mem_map, we only have a
single entry:
[ 0.000000] Determined physical RAM map:
[ 0.000000] memory: 10000000 @ 00000000 (usable)
> memory now I'd also merged them into a single call for the range [_text, _end].
> What do you think?
I think that this patch makes sense in case the .bss is for some reason
not covered by an existing entry, but I would leave it as a separate patch.
Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to
memblock is actually self-contained since it replaces reserving all
memory up to _end with the single reservation of the kernel's whole size
+ size = __pa_symbol(&_end) - __pa_symbol(&_text);
+ memblock_reserve(__pa_symbol(&_text), size);
Which I think is definitely an improvement since it is much clearer.
Thanks,
Matt
>
> Regards,
> -Sergey
>
>>
>> Reviewed-by: Matt Redfearn <[email protected]>
>>
>> Thanks,
>> Matt
>>
>>>
>>> Signed-off-by: Serge Semin <[email protected]>
>>> ---
>>> arch/mips/kernel/setup.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
>>> index 76e9e2075..0d21c9e04 100644
>>> --- a/arch/mips/kernel/setup.c
>>> +++ b/arch/mips/kernel/setup.c
>>> @@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
>>> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
>>> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
>>> BOOT_MEM_INIT_RAM);
>>> + arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
>>> + PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
>>> + BOOT_MEM_RAM);
>>> pr_info("Determined physical RAM map:\n");
>>> print_memory_map();
>>>
Hi Matt,
On Mon, Jan 22, 2018 at 5:36 PM, Matt Redfearn <[email protected]> wrote:
>
> Hi Serge,
>
>
> On 17/01/18 22:22, Serge Semin wrote:
>>
>> Even though it's common to see the architecture code using both
>> bootmem and memblock early memory allocators, it's not good for
>> multiple reasons. First of all, it's redundant to have two
>> early memory allocator while one would be more than enough from
>> functionality and stability points of view. Secondly, some new
>> features introduced in the kernel utilize the methods of the most
>> modern allocator ignoring the older one. It means the architecture
>> code must keep the both subsystems up synchronized with information
>> about memory regions and reservations, which leads to the code
>> complexity increase, that obviously increases bugs probability.
>> Finally it's better to keep all the architectures code unified for
>> better readability and code simplification. All these reasons lead
>> to one conclusion - arch code should use just one memory allocator,
>> which is supposed to be memblock as the most modern and already
>> utilized by the most of the kernel platforms. This patchset is
>> mostly about it.
>>
>> One more reason why the MIPS arch code should finally move to
>> memblock is a BUG somewhere in the initialization process, when
>> CMA is activated:
>>
>> [ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
>> [ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
>> [ 0.263172] flags: 0x40000000()
>> [ 0.266723] page dumped because: nonzero mapcount
>> [ 0.272049] Modules linked in:
>> [ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
>> [ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
>> 86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
>> 80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
>> 80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
>> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> ...
>> [ 0.323148] Call Trace:
>> [ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
>> [ 0.330859] [<80404814>] dump_stack+0xd4/0x110
>> [ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
>> [ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
>> [ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
>> [ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
>> [ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
>> [ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
>> [ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
>> [ 0.376828] [<808faca4>] kernel_init+0x14/0x104
>> [ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
>>
>> The bugus pfn seems to be the one allocated for bootmem allocator
>> pages and hasn't been freed before letting the CMA working with its
>> areas. Anyway the bug is solved by this patchset.
>>
>> Another reason why this patchset is useful is that it fixes the fdt
>> reserved-memory nodes functionality for MIPS. Really it's bug to have
>> the fdt reserved nodes scanning before the memblock is
>> fully initialized (calling early_init_fdt_scan_reserved_mem before
>> bootmem_init is called). Additionally no-map flag of the
>> reserved-memory node hasn't been taking into account. This patchset
>> fixes all of these.
>>
>> As you probably remember I already did another attempt to merge a
>> similar functionality into the kernel. This time the patchset got
>> to be less complex (14 patches vs 21 last time) and fixes the
>> platform code like SGI IP27 and Loongson3, which due to being
>> NUMA introduce its own memory initialization process. Although
>> I have much doubt in SGI IP27 code operability in the first place,
>> since it got prom_meminit() method of early memory initialization,
>> which hasn't been called at any other place in the kernel. It must
>> have been left there unrenamed after arch/mips/mips-boards/generic
>> code had been discarded.
>>
>> Here are the list of folks, who agreed to perform some tests of
>> the patchset:
>> Alexander Sverdlin <[email protected]> - Octeon2
>> Matt Redfearn <[email protected]> - Loongson3, etc
>
>
>
> I have applied and tested these patches on various platforms that we have available here, and the kernel appears to boot and get to userspace as normal on the following platforms:
>
> UTM8 (Cavium Octeon III)
> Creator CI20
A bit off-topic, but could you please Acked one of Marcin's patch that
I re-submitted:
https://patchwork.linux-mips.org/patch/17986/
I believe CI20 wont boot from upstream kernel, since you are testing
patch I am guessing you have a running system (unless of course you
tweaked your u-boot env vars or use another different patch).
Thanks for your help
Hi Mathieu,
On 23/01/18 11:29, Mathieu Malaterre wrote:
> Hi Matt,
>
> On Mon, Jan 22, 2018 at 5:36 PM, Matt Redfearn <[email protected]> wrote:
>>
>> Hi Serge,
>>
>>
>> On 17/01/18 22:22, Serge Semin wrote:
>>>
>>> Even though it's common to see the architecture code using both
>>> bootmem and memblock early memory allocators, it's not good for
>>> multiple reasons. First of all, it's redundant to have two
>>> early memory allocator while one would be more than enough from
>>> functionality and stability points of view. Secondly, some new
>>> features introduced in the kernel utilize the methods of the most
>>> modern allocator ignoring the older one. It means the architecture
>>> code must keep the both subsystems up synchronized with information
>>> about memory regions and reservations, which leads to the code
>>> complexity increase, that obviously increases bugs probability.
>>> Finally it's better to keep all the architectures code unified for
>>> better readability and code simplification. All these reasons lead
>>> to one conclusion - arch code should use just one memory allocator,
>>> which is supposed to be memblock as the most modern and already
>>> utilized by the most of the kernel platforms. This patchset is
>>> mostly about it.
>>>
>>> One more reason why the MIPS arch code should finally move to
>>> memblock is a BUG somewhere in the initialization process, when
>>> CMA is activated:
>>>
>>> [ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
>>> [ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
>>> [ 0.263172] flags: 0x40000000()
>>> [ 0.266723] page dumped because: nonzero mapcount
>>> [ 0.272049] Modules linked in:
>>> [ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
>>> [ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
>>> 86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
>>> 80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
>>> 80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
>>> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> ...
>>> [ 0.323148] Call Trace:
>>> [ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
>>> [ 0.330859] [<80404814>] dump_stack+0xd4/0x110
>>> [ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
>>> [ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
>>> [ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
>>> [ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
>>> [ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
>>> [ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
>>> [ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
>>> [ 0.376828] [<808faca4>] kernel_init+0x14/0x104
>>> [ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
>>>
>>> The bugus pfn seems to be the one allocated for bootmem allocator
>>> pages and hasn't been freed before letting the CMA working with its
>>> areas. Anyway the bug is solved by this patchset.
>>>
>>> Another reason why this patchset is useful is that it fixes the fdt
>>> reserved-memory nodes functionality for MIPS. Really it's bug to have
>>> the fdt reserved nodes scanning before the memblock is
>>> fully initialized (calling early_init_fdt_scan_reserved_mem before
>>> bootmem_init is called). Additionally no-map flag of the
>>> reserved-memory node hasn't been taking into account. This patchset
>>> fixes all of these.
>>>
>>> As you probably remember I already did another attempt to merge a
>>> similar functionality into the kernel. This time the patchset got
>>> to be less complex (14 patches vs 21 last time) and fixes the
>>> platform code like SGI IP27 and Loongson3, which due to being
>>> NUMA introduce its own memory initialization process. Although
>>> I have much doubt in SGI IP27 code operability in the first place,
>>> since it got prom_meminit() method of early memory initialization,
>>> which hasn't been called at any other place in the kernel. It must
>>> have been left there unrenamed after arch/mips/mips-boards/generic
>>> code had been discarded.
>>>
>>> Here are the list of folks, who agreed to perform some tests of
>>> the patchset:
>>> Alexander Sverdlin <[email protected]> - Octeon2
>>> Matt Redfearn <[email protected]> - Loongson3, etc
>>
>>
>>
>> I have applied and tested these patches on various platforms that we have available here, and the kernel appears to boot and get to userspace as normal on the following platforms:
>>
>> UTM8 (Cavium Octeon III)
>> Creator CI20
>
> A bit off-topic, but could you please Acked one of Marcin's patch that
> I re-submitted:
>
> https://patchwork.linux-mips.org/patch/17986/
>
> I believe CI20 wont boot from upstream kernel, since you are testing
> patch I am guessing you have a running system (unless of course you
> tweaked your u-boot env vars or use another different patch).
>
Indeed, with the factory default arguments the kernel does not boot
since 73fbc1eba7ff. Those arguments are redundant though since the
memory is discovered via device tree. Testing locally we do not use
those arguments. Anyway, the patch mentioned does not seem to break
anything and does fix this issue so I've replied with my Tested-by.
Thanks,
Matt
> Thanks for your help
>
在 2018-01-18四的 01:23 +0300,Serge Semin写道:
Hi Serge
> Loongson64/3 runs its own code to initialize memory allocator in
> case of NUMA configuration is selected. So in order to move to the
> pure memblock utilization we discard the bootmem allocator usage
> and insert the memblock reservation method for
> kernel/addrspace_offset
> memory regions.
Thanks for your patch. However, In my test, the system didn't boot
anymore with your patch. Since I don't have any lowlevel debug
instuments(ejtag or something). I can't provide any detail about the
problem. Just let you know that we have a problem here.
--
Jiaxun Yang
Hi Serge,
On 19/01/18 14:27, Serge Semin wrote:
> On Fri, Jan 19, 2018 at 07:59:43AM +0000, Matt Redfearn <[email protected]> wrote:
>
> Hello Matt,
>
>> Hi Serge,
>>
>>
>>
>> On 18/01/18 20:18, Serge Semin wrote:
>>> On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
>>>> On 01/17/2018 02:23 PM, Serge Semin wrote:
>>>>> It is useful to have the kernel virtual memory layout printed
>>>>> at boot time so to have the full information about the booted
>>>>> kernel. In some cases it might be unsafe to have virtual
>>>>> addresses freely visible in logs, so the %pK format is used if
>>>>> one want to hide them.
>>>>>
>>>>> Signed-off-by: Serge Semin <[email protected]>
>>>>
>>>> I personally like having that information because that helps debug and
>>>> have a quick reference, but there appears to be a trend to remove this
>>>> in the name of security:
>>>>
>>>> https://patchwork.kernel.org/patch/10124007/
>>>>
>>>> maybe hide this behind a configuration option?
>>>
>>> Yeah, arm code was the place I picked the function up.) But in my case
>>> I've used %pK so the pointers would disappear from logging when
>>> kptr_restrict sysctl is 1 or 2.
>>> I agree, that we might need to make the printouts optional. If there is
>>> any kernel config, which for instance increases the kernel security we
>>> could also use it or anything else to discard the printouts at compile
>>> time.
>>
>>
>> Certainly, when KASLR is active it would be preferable to hide this
>> information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
>> additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
>> that only people actively debugging the kernel see it:
>>
>> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
>
> Ok. I'll hide the printouts behind both of that config macros in the next patchset
> version.
Another thing to note - since ad67b74d2469d ("printk: hash addresses
printed with %p") %pK at this time in the boot process is useless since
the RNG is not sufficiently initialised and all prints end up being
"(ptrval)". Hence after v4.15-rc2 we end up with output like:
[ 0.000000] Kernel virtual memory layout:
[ 0.000000] lowmem : 0x(ptrval) - 0x(ptrval) ( 256 MB)
[ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (7374 kB)
[ 0.000000] .data : 0x(ptrval) - 0x(ptrval) (1901 kB)
[ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (1600 kB)
[ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 415 kB)
[ 0.000000] vmalloc : 0x(ptrval) - 0x(ptrval) (1023 MB)
[ 0.000000] fixmap : 0x(ptrval) - 0x(ptrval) ( 68 kB)
The %px format specifier was added for cases such as this, where we
really want to print the unmodified address. And as long as this
function is suitably guarded to only do this when KASLR is deactivated /
CONFIG_DEBUG_KERNEL is activated, etc, then we are not unwittingly
leaking information - we are deliberately making it available.
Thanks,
Matt
>
> Regards,
> -Sergey
>
>>
>> Thanks,
>> Matt
>>
>>>
>>>> --
>>>> Florian
Hello Matt,
On Tue, Jan 23, 2018 at 03:35:14PM +0000, Matt Redfearn <[email protected]> wrote:
> Hi Serge,
>
> On 19/01/18 14:27, Serge Semin wrote:
> >On Fri, Jan 19, 2018 at 07:59:43AM +0000, Matt Redfearn <[email protected]> wrote:
> >
> >Hello Matt,
> >
> >>Hi Serge,
> >>
> >>
> >>
> >>On 18/01/18 20:18, Serge Semin wrote:
> >>>On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
> >>>>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>>>It is useful to have the kernel virtual memory layout printed
> >>>>>at boot time so to have the full information about the booted
> >>>>>kernel. In some cases it might be unsafe to have virtual
> >>>>>addresses freely visible in logs, so the %pK format is used if
> >>>>>one want to hide them.
> >>>>>
> >>>>>Signed-off-by: Serge Semin <[email protected]>
> >>>>
> >>>>I personally like having that information because that helps debug and
> >>>>have a quick reference, but there appears to be a trend to remove this
> >>>>in the name of security:
> >>>>
> >>>>https://patchwork.kernel.org/patch/10124007/
> >>>>
> >>>>maybe hide this behind a configuration option?
> >>>
> >>>Yeah, arm code was the place I picked the function up.) But in my case
> >>>I've used %pK so the pointers would disappear from logging when
> >>>kptr_restrict sysctl is 1 or 2.
> >>>I agree, that we might need to make the printouts optional. If there is
> >>>any kernel config, which for instance increases the kernel security we
> >>>could also use it or anything else to discard the printouts at compile
> >>>time.
> >>
> >>
> >>Certainly, when KASLR is active it would be preferable to hide this
> >>information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> >>additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> >>that only people actively debugging the kernel see it:
> >>
> >>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
> >
> >Ok. I'll hide the printouts behind both of that config macros in the next patchset
> >version.
>
>
> Another thing to note - since ad67b74d2469d ("printk: hash addresses printed
> with %p") %pK at this time in the boot process is useless since the RNG is
> not sufficiently initialised and all prints end up being "(ptrval)". Hence
> after v4.15-rc2 we end up with output like:
>
> [ 0.000000] Kernel virtual memory layout:
> [ 0.000000] lowmem : 0x(ptrval) - 0x(ptrval) ( 256 MB)
> [ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (7374 kB)
> [ 0.000000] .data : 0x(ptrval) - 0x(ptrval) (1901 kB)
> [ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (1600 kB)
> [ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 415 kB)
> [ 0.000000] vmalloc : 0x(ptrval) - 0x(ptrval) (1023 MB)
> [ 0.000000] fixmap : 0x(ptrval) - 0x(ptrval) ( 68 kB)
>
It must be some bug in the algo. What point in the %pK then? According to
the documentation the only way to see the pointers is when (kptr_restrict == 0).
But if it is we don't get into the restricted_pointer() method at all:
http://elixir.free-electrons.com/linux/v4.15-rc9/source/lib/vsprintf.c#L1934
In this case the vsprintf() executes the method ptr_to_id(), which of course
default to _not_ leak addresses, and hash it before printing.
Really %pK isn't supposed to be dependent from RNG at all since kptr_restrict
doesn't do any value randomization.
>
> The %px format specifier was added for cases such as this, where we really
> want to print the unmodified address. And as long as this function is
> suitably guarded to only do this when KASLR is deactivated /
> CONFIG_DEBUG_KERNEL is activated, etc, then we are not unwittingly leaking
> information - we are deliberately making it available.
>
If %pK would work as it's stated by the kernel documentation:
https://www.kernel.org/doc/Documentation/printk-formats.txt
then the only change I'd suggest to have here is to close the kernel memory
layout printout method by the CONFIG_DEBUG_KERNEL ifdef-macro. The kptr_restrict
should default to 1/2 if the KASLR is activated:
https://lwn.net/Articles/444556/
Regards,
-Sergey
> Thanks,
> Matt
>
> >
> >Regards,
> >-Sergey
> >
> >>
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>>--
> >>>>Florian
Hello Matt,
On Tue, Jan 23, 2018 at 11:03:27AM +0000, Matt Redfearn <[email protected]> wrote:
> Hi Serge,
>
> On 22/01/18 21:47, Serge Semin wrote:
> >Hello Matt,
> >
> >On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn <[email protected]> wrote:
> >>Hi Serge,
> >>
> >>On 17/01/18 22:23, Serge Semin wrote:
> >>>The current MIPS code makes sure the kernel code/data/init
> >>>sections are in the maps, but BSS should also be there.
> >>
> >>Quite right - it should. But this was protected against by reserving all
> >>bootmem up to the _end symbol here:
> >>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> >>Which you remove in the next patch in this series. I'm not sure it is worth
> >
> >Right. Missed that part. The old code just doesn't set the kernel memory free
> >calling the free_bootmem() method for non-reserved parts below reserved_end.
> >
> >>disentangling the reserved_end stuff from the next patch to make this into a
> >>single logical change of reserving just .bss rather than everything below
> >>_end.
> >
> >Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
> >Add reserved memory regions to memblock". It logically belongs to that place.
> >Since basically by the arch_mem_addpart() calls we reserve all the kernel
>
>
> Actually I was wrong - it's not this sequence of arch_mem_addpart's that
> reserves the kernels memory. At least on DT based systems, it's pretty
> likely that these regions will overlap with the system memory already added.
> of_scan_flat_dt will look for the memory node and add it via
> early_init_dt_add_memory_arch.
> These calls to add the kernel text, init and bss detect that they overlap
> with the already present system memory, so don't get added, here:
> http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
>
> As such, when we print out the content of boot_mem_map, we only have a
> single entry:
>
> [ 0.000000] Determined physical RAM map:
> [ 0.000000] memory: 10000000 @ 00000000 (usable)
>
>
> >memory now I'd also merged them into a single call for the range [_text, _end].
> >What do you think?
>
>
> I think that this patch makes sense in case the .bss is for some reason not
> covered by an existing entry, but I would leave it as a separate patch.
>
> Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
> is actually self-contained since it replaces reserving all memory up to _end
> with the single reservation of the kernel's whole size
>
> + size = __pa_symbol(&_end) - __pa_symbol(&_text);
> + memblock_reserve(__pa_symbol(&_text), size);
>
>
> Which I think is definitely an improvement since it is much clearer.
>
Alright lets sum it up. First of all, yeah, you are right, arch_mem_addpart()
is created to make sure the kernel memory is added to the memblock/bootmem pool.
The previous arch code was leaving such the memory range non-freed since it was
higher the reserved_end, so to make sure the early memory allocations wouldn't
be made from the pages, where kernel actually resides.
In my code I still wanted to make sure the kernel memory is in the memblock pool.
But I also noticed, that .bss memory range wouldn't be added to the pool if neither
dts nor platform-specific code added any memory to the boot_mem_map pool. So I
decided to fix it. The actual kernel memory reservation is performed after all
the memory regions are declared by the code you cited. It's essential to do
the [_text, _end] memory range reservation there, otherwise memblock may
allocate from the memory range occupied by the kernel code/data.
Since you agree with leaving it in the separate patch, I'd only suggest to
call the arch_mem_addpart() method for just one range [_text, _end] instead of
doing it three times for a separate _text, _data and bss sections. What do you
think?
Regards,
-Sergey
> Thanks,
> Matt
>
> >
> >Regards,
> >-Sergey
> >
> >>
> >>Reviewed-by: Matt Redfearn <[email protected]>
> >>
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>Signed-off-by: Serge Semin <[email protected]>
> >>>---
> >>> arch/mips/kernel/setup.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>>diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >>>index 76e9e2075..0d21c9e04 100644
> >>>--- a/arch/mips/kernel/setup.c
> >>>+++ b/arch/mips/kernel/setup.c
> >>>@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> >>> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> >>> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> >>> BOOT_MEM_INIT_RAM);
> >>>+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> >>>+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> >>>+ BOOT_MEM_RAM);
> >>> pr_info("Determined physical RAM map:\n");
> >>> print_memory_map();
> >>>
Hello Jiaxun,
On Tue, Jan 23, 2018 at 10:28:44PM +0000, Jiaxun Yang <[email protected]> wrote:
> 在 2018-01-18四的 01:23 +0300,Serge Semin写道:
> Hi Serge
>
> > Loongson64/3 runs its own code to initialize memory allocator in
> > case of NUMA configuration is selected. So in order to move to the
> > pure memblock utilization we discard the bootmem allocator usage
> > and insert the memblock reservation method for
> > kernel/addrspace_offset
> > memory regions.
>
> Thanks for your patch. However, In my test, the system didn't boot
> anymore with your patch. Since I don't have any lowlevel debug
> instuments(ejtag or something). I can't provide any detail about the
> problem. Just let you know that we have a problem here.
>
Thanks for performing the tests of the patchset. I really appreciate this.
Regarding the problems you got. You must be doing something wrong, since
Matt Redfearn already did the tests on Loongson3:
https://lkml.org/lkml/2018/1/22/610
and the kernel turns out to be booting without troubles.
So could you make sure, that you did everything right? Particularly, you
said the patch (singular) isn't working. But this patch functionality depends
on the whole patchset. Did you apply all the patches I sent and fully rebuild
the kernel then?
Regards,
-Sergey
>
> --
> Jiaxun Yang
On Thu, Jan 18, 2018 at 01:23:12AM +0300, Serge Semin wrote:
> Memblock allocator can be successfully used from now for early
> memory management
>
> Signed-off-by: Serge Semin <[email protected]>
Am I correct that intermediate commits in this patchset (i.e. bisection)
may not work correctly, since bootmem will have been stripped out but
NO_BOOTMEM=n and memblock may not be properly operational yet?
If so, is there a way to switch without breaking bisection that doesn't
involve squashing most of the series into a single atomic commit?
Cheers
James
> ---
> arch/mips/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 725b5ece7..a6c4fb6b6 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -4,7 +4,6 @@ config MIPS
> default y
> select ARCH_BINFMT_ELF_STATE
> select ARCH_CLOCKSOURCE_DATA
> - select ARCH_DISCARD_MEMBLOCK
> select ARCH_HAS_ELF_RANDOMIZE
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_MIGHT_HAVE_PC_PARPORT
> @@ -57,6 +57,7 @@ config MIPS
> select HAVE_IRQ_TIME_ACCOUNTING
> select HAVE_KPROBES
> select HAVE_KRETPROBES
> + select NO_BOOTMEM
> select HAVE_MEMBLOCK
> select HAVE_MEMBLOCK_NODE_MAP
> select HAVE_MOD_ARCH_SPECIFIC
> --
> 2.12.0
>
Hi Serge,
On 17.01.2018 23:23, Serge Semin wrote:
> If sparsemem is activated all sections with present pages must
> be accordingly marked after memblock is fully initialized.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index b121fa702..6df1eaf38 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -778,7 +778,7 @@ static void __init request_crashkernel(struct resource *res)
>
> static void __init arch_mem_init(char **cmdline_p)
> {
> - struct memblock_region *reg;
> + struct memblock_region *reg __maybe_unused;
nit: reg is used here. It becomes __maybe_unused in patch 8.
Marcin
Hello Marcin
On Wed, Jan 24, 2018 at 07:13:03AM +0100, Marcin Nowakowski <[email protected]> wrote:
> Hi Serge,
>
> On 17.01.2018 23:23, Serge Semin wrote:
> >If sparsemem is activated all sections with present pages must
> >be accordingly marked after memblock is fully initialized.
> >
> >Signed-off-by: Serge Semin <[email protected]>
> >---
> > arch/mips/kernel/setup.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> >diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >index b121fa702..6df1eaf38 100644
> >--- a/arch/mips/kernel/setup.c
> >+++ b/arch/mips/kernel/setup.c
> >@@ -778,7 +778,7 @@ static void __init request_crashkernel(struct resource *res)
> > static void __init arch_mem_init(char **cmdline_p)
> > {
> >- struct memblock_region *reg;
> >+ struct memblock_region *reg __maybe_unused;
>
> nit: reg is used here. It becomes __maybe_unused in patch 8.
>
Right. I'll move __maybe_unused change to the patch 8.
-Sergey
>
> Marcin
On Tue, Jan 23, 2018 at 11:59:35PM +0000, James Hogan <[email protected]> wrote:
> On Thu, Jan 18, 2018 at 01:23:12AM +0300, Serge Semin wrote:
> > Memblock allocator can be successfully used from now for early
> > memory management
> >
> > Signed-off-by: Serge Semin <[email protected]>
>
> Am I correct that intermediate commits in this patchset (i.e. bisection)
> may not work correctly, since bootmem will have been stripped out but
> NO_BOOTMEM=n and memblock may not be properly operational yet?
>
Yes. You're absolutely right. The kernel will be buildable, but most
likely isn't operable until the PATCH 14 deactivates bootmem allocator.
> If so, is there a way to switch without breaking bisection that doesn't
> involve squashing most of the series into a single atomic commit?
>
I don't think so. There is no way to switch without squashing at all,
at least since the alteration involves arch and platforms code, which
all relied on the bootmem allocator. Here is the list of patches, which
need to be combined to have the bisection unbroken:
[PATCH 03/14] MIPS: memblock: Reserve initrd memory in memblock
[PATCH 04/14] MIPS: memblock: Discard bootmem initialization
[PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
[PATCH 06/14] MIPS: memblock: Reserve kdump/crash regions in memblock
[PATCH 07/14] MIPS: memblock: Mark present sparsemem sections
[PATCH 08/14] MIPS: memblock: Simplify DMA contiguous reservation
[PATCH 09/14] MIPS: memblock: Allow memblock regions resize
[PATCH 12/14] MIPS: memblock: Discard bootmem from Loongson3 code
[PATCH 13/14] MIPS: memblock: Discard bootmem from SGI IP27 code
[PATCH 14/14] MIPS: memblock: Deactivate bootmem allocator
So the patches 03-09 imply the functional alterations so the arch code
would work correctly with memblock, the patches 13-14 alter the
platforms code of the specific NUMA devices like Loongson and
SGI IP27. After it's done the bootmem can be finally deactivated.
Regards,
-Sergey
> Cheers
> James
>
> > ---
> > arch/mips/Kconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> > index 725b5ece7..a6c4fb6b6 100644
> > --- a/arch/mips/Kconfig
> > +++ b/arch/mips/Kconfig
> > @@ -4,7 +4,6 @@ config MIPS
> > default y
> > select ARCH_BINFMT_ELF_STATE
> > select ARCH_CLOCKSOURCE_DATA
> > - select ARCH_DISCARD_MEMBLOCK
> > select ARCH_HAS_ELF_RANDOMIZE
> > select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> > select ARCH_MIGHT_HAVE_PC_PARPORT
> > @@ -57,6 +57,7 @@ config MIPS
> > select HAVE_IRQ_TIME_ACCOUNTING
> > select HAVE_KPROBES
> > select HAVE_KRETPROBES
> > + select NO_BOOTMEM
> > select HAVE_MEMBLOCK
> > select HAVE_MEMBLOCK_NODE_MAP
> > select HAVE_MOD_ARCH_SPECIFIC
> > --
> > 2.12.0
> >
Hi Serge,
On 23/01/18 19:10, Serge Semin wrote:
> Hello Matt,
>
> On Tue, Jan 23, 2018 at 03:35:14PM +0000, Matt Redfearn <[email protected]> wrote:
>> Hi Serge,
>>
>> On 19/01/18 14:27, Serge Semin wrote:
>>> On Fri, Jan 19, 2018 at 07:59:43AM +0000, Matt Redfearn <[email protected]> wrote:
>>>
>>> Hello Matt,
>>>
>>>> Hi Serge,
>>>>
>>>>
>>>>
>>>> On 18/01/18 20:18, Serge Semin wrote:
>>>>> On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
>>>>>> On 01/17/2018 02:23 PM, Serge Semin wrote:
>>>>>>> It is useful to have the kernel virtual memory layout printed
>>>>>>> at boot time so to have the full information about the booted
>>>>>>> kernel. In some cases it might be unsafe to have virtual
>>>>>>> addresses freely visible in logs, so the %pK format is used if
>>>>>>> one want to hide them.
>>>>>>>
>>>>>>> Signed-off-by: Serge Semin <[email protected]>
>>>>>>
>>>>>> I personally like having that information because that helps debug and
>>>>>> have a quick reference, but there appears to be a trend to remove this
>>>>>> in the name of security:
>>>>>>
>>>>>> https://patchwork.kernel.org/patch/10124007/
>>>>>>
>>>>>> maybe hide this behind a configuration option?
>>>>>
>>>>> Yeah, arm code was the place I picked the function up.) But in my case
>>>>> I've used %pK so the pointers would disappear from logging when
>>>>> kptr_restrict sysctl is 1 or 2.
>>>>> I agree, that we might need to make the printouts optional. If there is
>>>>> any kernel config, which for instance increases the kernel security we
>>>>> could also use it or anything else to discard the printouts at compile
>>>>> time.
>>>>
>>>>
>>>> Certainly, when KASLR is active it would be preferable to hide this
>>>> information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
>>>> additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
>>>> that only people actively debugging the kernel see it:
>>>>
>>>> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
>>>
>>> Ok. I'll hide the printouts behind both of that config macros in the next patchset
>>> version.
>>
>>
>> Another thing to note - since ad67b74d2469d ("printk: hash addresses printed
>> with %p") %pK at this time in the boot process is useless since the RNG is
>> not sufficiently initialised and all prints end up being "(ptrval)". Hence
>> after v4.15-rc2 we end up with output like:
>>
>> [ 0.000000] Kernel virtual memory layout:
>> [ 0.000000] lowmem : 0x(ptrval) - 0x(ptrval) ( 256 MB)
>> [ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (7374 kB)
>> [ 0.000000] .data : 0x(ptrval) - 0x(ptrval) (1901 kB)
>> [ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (1600 kB)
>> [ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 415 kB)
>> [ 0.000000] vmalloc : 0x(ptrval) - 0x(ptrval) (1023 MB)
>> [ 0.000000] fixmap : 0x(ptrval) - 0x(ptrval) ( 68 kB)
>>
>
> It must be some bug in the algo. What point in the %pK then? According to
> the documentation the only way to see the pointers is when (kptr_restrict == 0).
> But if it is we don't get into the restricted_pointer() method at all:
> http://elixir.free-electrons.com/linux/v4.15-rc9/source/lib/vsprintf.c#L1934
> In this case the vsprintf() executes the method ptr_to_id(), which of course
> default to _not_ leak addresses, and hash it before printing.
>
> Really %pK isn't supposed to be dependent from RNG at all since kptr_restrict
> doesn't do any value randomization.
That was true until v4.15-rc2. The behavior of %pK was changed without
that being reflected in the documentation. A patch
(https://patchwork.kernel.org/patch/10124413/) is in progress to update
this.
>
>>
>> The %px format specifier was added for cases such as this, where we really
>> want to print the unmodified address. And as long as this function is
>> suitably guarded to only do this when KASLR is deactivated /
>> CONFIG_DEBUG_KERNEL is activated, etc, then we are not unwittingly leaking
>> information - we are deliberately making it available.
>>
>
> If %pK would work as it's stated by the kernel documentation:
> https://www.kernel.org/doc/Documentation/printk-formats.txt
> then the only change I'd suggest to have here is to close the kernel memory
> layout printout method by the CONFIG_DEBUG_KERNEL ifdef-macro. The kptr_restrict
> should default to 1/2 if the KASLR is activated:
> https://lwn.net/Articles/444556/
Yeah, again, the documentation is no longer correct, and %pK will always
be hashed, and before the RNG is initialized it does not even hash it,
just returning "(ptrval)". So I'd recommend guarding with
CONFIG_DEBUG_KERNEL and switching the format specifier to %px.
Thanks,
Matt
>
> Regards,
> -Sergey
>
>> Thanks,
>> Matt
>>
>>>
>>> Regards,
>>> -Sergey
>>>
>>>>
>>>> Thanks,
>>>> Matt
>>>>
>>>>>
>>>>>> --
>>>>>> Florian
Hi Serge,
On 23/01/18 19:27, Serge Semin wrote:
> Hello Matt,
>
> On Tue, Jan 23, 2018 at 11:03:27AM +0000, Matt Redfearn <[email protected]> wrote:
>> Hi Serge,
>>
>> On 22/01/18 21:47, Serge Semin wrote:
>>> Hello Matt,
>>>
>>> On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn <[email protected]> wrote:
>>>> Hi Serge,
>>>>
>>>> On 17/01/18 22:23, Serge Semin wrote:
>>>>> The current MIPS code makes sure the kernel code/data/init
>>>>> sections are in the maps, but BSS should also be there.
>>>>
>>>> Quite right - it should. But this was protected against by reserving all
>>>> bootmem up to the _end symbol here:
>>>> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
>>>> Which you remove in the next patch in this series. I'm not sure it is worth
>>>
>>> Right. Missed that part. The old code just doesn't set the kernel memory free
>>> calling the free_bootmem() method for non-reserved parts below reserved_end.
>>>
>>>> disentangling the reserved_end stuff from the next patch to make this into a
>>>> single logical change of reserving just .bss rather than everything below
>>>> _end.
>>>
>>> Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
>>> Add reserved memory regions to memblock". It logically belongs to that place.
>>> Since basically by the arch_mem_addpart() calls we reserve all the kernel
>>
>>
>> Actually I was wrong - it's not this sequence of arch_mem_addpart's that
>> reserves the kernels memory. At least on DT based systems, it's pretty
>> likely that these regions will overlap with the system memory already added.
>> of_scan_flat_dt will look for the memory node and add it via
>> early_init_dt_add_memory_arch.
>> These calls to add the kernel text, init and bss detect that they overlap
>> with the already present system memory, so don't get added, here:
>> http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
>>
>> As such, when we print out the content of boot_mem_map, we only have a
>> single entry:
>>
>> [ 0.000000] Determined physical RAM map:
>> [ 0.000000] memory: 10000000 @ 00000000 (usable)
>>
>>
>>> memory now I'd also merged them into a single call for the range [_text, _end].
>>> What do you think?
>>
>>
>> I think that this patch makes sense in case the .bss is for some reason not
>> covered by an existing entry, but I would leave it as a separate patch.
>>
>> Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
>> is actually self-contained since it replaces reserving all memory up to _end
>> with the single reservation of the kernel's whole size
>>
>> + size = __pa_symbol(&_end) - __pa_symbol(&_text);
>> + memblock_reserve(__pa_symbol(&_text), size);
>>
>>
>> Which I think is definitely an improvement since it is much clearer.
>>
>
> Alright lets sum it up. First of all, yeah, you are right, arch_mem_addpart()
> is created to make sure the kernel memory is added to the memblock/bootmem pool.
> The previous arch code was leaving such the memory range non-freed since it was
> higher the reserved_end, so to make sure the early memory allocations wouldn't
> be made from the pages, where kernel actually resides.
>
> In my code I still wanted to make sure the kernel memory is in the memblock pool.
> But I also noticed, that .bss memory range wouldn't be added to the pool if neither
> dts nor platform-specific code added any memory to the boot_mem_map pool. So I
> decided to fix it. The actual kernel memory reservation is performed after all
> the memory regions are declared by the code you cited. It's essential to do
> the [_text, _end] memory range reservation there, otherwise memblock may
> allocate from the memory range occupied by the kernel code/data.
>
> Since you agree with leaving it in the separate patch, I'd only suggest to
> call the arch_mem_addpart() method for just one range [_text, _end] instead of
> doing it three times for a separate _text, _data and bss sections. What do you
> think?
I think it's best left as 3 separate reservations, mainly due to the
different attribute used for the init section. So all in all, I like
this patch as it is.
Thanks,
Matt
>
> Regards,
> -Sergey
>
>> Thanks,
>> Matt
>>
>>>
>>> Regards,
>>> -Sergey
>>>
>>>>
>>>> Reviewed-by: Matt Redfearn <[email protected]>
>>>>
>>>> Thanks,
>>>> Matt
>>>>
>>>>>
>>>>> Signed-off-by: Serge Semin <[email protected]>
>>>>> ---
>>>>> arch/mips/kernel/setup.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
>>>>> index 76e9e2075..0d21c9e04 100644
>>>>> --- a/arch/mips/kernel/setup.c
>>>>> +++ b/arch/mips/kernel/setup.c
>>>>> @@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
>>>>> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
>>>>> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
>>>>> BOOT_MEM_INIT_RAM);
>>>>> + arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
>>>>> + PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
>>>>> + BOOT_MEM_RAM);
>>>>> pr_info("Determined physical RAM map:\n");
>>>>> print_memory_map();
>>>>>
Hello Matt,
On Wed, Jan 24, 2018 at 09:49:31AM +0000, Matt Redfearn <[email protected]> wrote:
> Hi Serge,
>
> On 23/01/18 19:27, Serge Semin wrote:
> >Hello Matt,
> >
> >On Tue, Jan 23, 2018 at 11:03:27AM +0000, Matt Redfearn <[email protected]> wrote:
> >>Hi Serge,
> >>
> >>On 22/01/18 21:47, Serge Semin wrote:
> >>>Hello Matt,
> >>>
> >>>On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn <[email protected]> wrote:
> >>>>Hi Serge,
> >>>>
> >>>>On 17/01/18 22:23, Serge Semin wrote:
> >>>>>The current MIPS code makes sure the kernel code/data/init
> >>>>>sections are in the maps, but BSS should also be there.
> >>>>
> >>>>Quite right - it should. But this was protected against by reserving all
> >>>>bootmem up to the _end symbol here:
> >>>>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> >>>>Which you remove in the next patch in this series. I'm not sure it is worth
> >>>
> >>>Right. Missed that part. The old code just doesn't set the kernel memory free
> >>>calling the free_bootmem() method for non-reserved parts below reserved_end.
> >>>
> >>>>disentangling the reserved_end stuff from the next patch to make this into a
> >>>>single logical change of reserving just .bss rather than everything below
> >>>>_end.
> >>>
> >>>Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
> >>>Add reserved memory regions to memblock". It logically belongs to that place.
> >>>Since basically by the arch_mem_addpart() calls we reserve all the kernel
> >>
> >>
> >>Actually I was wrong - it's not this sequence of arch_mem_addpart's that
> >>reserves the kernels memory. At least on DT based systems, it's pretty
> >>likely that these regions will overlap with the system memory already added.
> >>of_scan_flat_dt will look for the memory node and add it via
> >>early_init_dt_add_memory_arch.
> >>These calls to add the kernel text, init and bss detect that they overlap
> >>with the already present system memory, so don't get added, here:
> >>http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
> >>
> >>As such, when we print out the content of boot_mem_map, we only have a
> >>single entry:
> >>
> >>[ 0.000000] Determined physical RAM map:
> >>[ 0.000000] memory: 10000000 @ 00000000 (usable)
> >>
> >>
> >>>memory now I'd also merged them into a single call for the range [_text, _end].
> >>>What do you think?
> >>
> >>
> >>I think that this patch makes sense in case the .bss is for some reason not
> >>covered by an existing entry, but I would leave it as a separate patch.
> >>
> >>Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
> >>is actually self-contained since it replaces reserving all memory up to _end
> >>with the single reservation of the kernel's whole size
> >>
> >>+ size = __pa_symbol(&_end) - __pa_symbol(&_text);
> >>+ memblock_reserve(__pa_symbol(&_text), size);
> >>
> >>
> >>Which I think is definitely an improvement since it is much clearer.
> >>
> >
> >Alright lets sum it up. First of all, yeah, you are right, arch_mem_addpart()
> >is created to make sure the kernel memory is added to the memblock/bootmem pool.
> >The previous arch code was leaving such the memory range non-freed since it was
> >higher the reserved_end, so to make sure the early memory allocations wouldn't
> >be made from the pages, where kernel actually resides.
> >
> >In my code I still wanted to make sure the kernel memory is in the memblock pool.
> >But I also noticed, that .bss memory range wouldn't be added to the pool if neither
> >dts nor platform-specific code added any memory to the boot_mem_map pool. So I
> >decided to fix it. The actual kernel memory reservation is performed after all
> >the memory regions are declared by the code you cited. It's essential to do
> >the [_text, _end] memory range reservation there, otherwise memblock may
> >allocate from the memory range occupied by the kernel code/data.
> >
> >Since you agree with leaving it in the separate patch, I'd only suggest to
> >call the arch_mem_addpart() method for just one range [_text, _end] instead of
> >doing it three times for a separate _text, _data and bss sections. What do you
> >think?
>
> I think it's best left as 3 separate reservations, mainly due to the
> different attribute used for the init section. So all in all, I like this
> patch as it is.
>
Alright. I'll leave as is. Lets see what others think about it.
Regards,
-Sergey
> Thanks,
> Matt
>
> >
> >Regards,
> >-Sergey
> >
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>Regards,
> >>>-Sergey
> >>>
> >>>>
> >>>>Reviewed-by: Matt Redfearn <[email protected]>
> >>>>
> >>>>Thanks,
> >>>>Matt
> >>>>
> >>>>>
> >>>>>Signed-off-by: Serge Semin <[email protected]>
> >>>>>---
> >>>>> arch/mips/kernel/setup.c | 3 +++
> >>>>> 1 file changed, 3 insertions(+)
> >>>>>
> >>>>>diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >>>>>index 76e9e2075..0d21c9e04 100644
> >>>>>--- a/arch/mips/kernel/setup.c
> >>>>>+++ b/arch/mips/kernel/setup.c
> >>>>>@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> >>>>> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> >>>>> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> >>>>> BOOT_MEM_INIT_RAM);
> >>>>>+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> >>>>>+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> >>>>>+ BOOT_MEM_RAM);
> >>>>> pr_info("Determined physical RAM map:\n");
> >>>>> print_memory_map();
> >>>>>
Hello Matt,
On Wed, Jan 24, 2018 at 09:46:07AM +0000, Matt Redfearn <[email protected]> wrote:
> Hi Serge,
>
> On 23/01/18 19:10, Serge Semin wrote:
> >Hello Matt,
> >
> >On Tue, Jan 23, 2018 at 03:35:14PM +0000, Matt Redfearn <[email protected]> wrote:
> >>Hi Serge,
> >>
> >>On 19/01/18 14:27, Serge Semin wrote:
> >>>On Fri, Jan 19, 2018 at 07:59:43AM +0000, Matt Redfearn <[email protected]> wrote:
> >>>
> >>>Hello Matt,
> >>>
> >>>>Hi Serge,
> >>>>
> >>>>
> >>>>
> >>>>On 18/01/18 20:18, Serge Semin wrote:
> >>>>>On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli <[email protected]> wrote:
> >>>>>>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>>>>>It is useful to have the kernel virtual memory layout printed
> >>>>>>>at boot time so to have the full information about the booted
> >>>>>>>kernel. In some cases it might be unsafe to have virtual
> >>>>>>>addresses freely visible in logs, so the %pK format is used if
> >>>>>>>one want to hide them.
> >>>>>>>
> >>>>>>>Signed-off-by: Serge Semin <[email protected]>
> >>>>>>
> >>>>>>I personally like having that information because that helps debug and
> >>>>>>have a quick reference, but there appears to be a trend to remove this
> >>>>>>in the name of security:
> >>>>>>
> >>>>>>https://patchwork.kernel.org/patch/10124007/
> >>>>>>
> >>>>>>maybe hide this behind a configuration option?
> >>>>>
> >>>>>Yeah, arm code was the place I picked the function up.) But in my case
> >>>>>I've used %pK so the pointers would disappear from logging when
> >>>>>kptr_restrict sysctl is 1 or 2.
> >>>>>I agree, that we might need to make the printouts optional. If there is
> >>>>>any kernel config, which for instance increases the kernel security we
> >>>>>could also use it or anything else to discard the printouts at compile
> >>>>>time.
> >>>>
> >>>>
> >>>>Certainly, when KASLR is active it would be preferable to hide this
> >>>>information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> >>>>additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> >>>>that only people actively debugging the kernel see it:
> >>>>
> >>>>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
> >>>
> >>>Ok. I'll hide the printouts behind both of that config macros in the next patchset
> >>>version.
> >>
> >>
> >>Another thing to note - since ad67b74d2469d ("printk: hash addresses printed
> >>with %p") %pK at this time in the boot process is useless since the RNG is
> >>not sufficiently initialised and all prints end up being "(ptrval)". Hence
> >>after v4.15-rc2 we end up with output like:
> >>
> >>[ 0.000000] Kernel virtual memory layout:
> >>[ 0.000000] lowmem : 0x(ptrval) - 0x(ptrval) ( 256 MB)
> >>[ 0.000000] .text : 0x(ptrval) - 0x(ptrval) (7374 kB)
> >>[ 0.000000] .data : 0x(ptrval) - 0x(ptrval) (1901 kB)
> >>[ 0.000000] .init : 0x(ptrval) - 0x(ptrval) (1600 kB)
> >>[ 0.000000] .bss : 0x(ptrval) - 0x(ptrval) ( 415 kB)
> >>[ 0.000000] vmalloc : 0x(ptrval) - 0x(ptrval) (1023 MB)
> >>[ 0.000000] fixmap : 0x(ptrval) - 0x(ptrval) ( 68 kB)
> >>
> >
> >It must be some bug in the algo. What point in the %pK then? According to
> >the documentation the only way to see the pointers is when (kptr_restrict == 0).
> >But if it is we don't get into the restricted_pointer() method at all:
> >http://elixir.free-electrons.com/linux/v4.15-rc9/source/lib/vsprintf.c#L1934
> >In this case the vsprintf() executes the method ptr_to_id(), which of course
> >default to _not_ leak addresses, and hash it before printing.
> >
> >Really %pK isn't supposed to be dependent from RNG at all since kptr_restrict
> >doesn't do any value randomization.
>
>
> That was true until v4.15-rc2. The behavior of %pK was changed without that
> being reflected in the documentation. A patch
> (https://patchwork.kernel.org/patch/10124413/) is in progress to update
> this.
>
> >
> >>
> >>The %px format specifier was added for cases such as this, where we really
> >>want to print the unmodified address. And as long as this function is
> >>suitably guarded to only do this when KASLR is deactivated /
> >>CONFIG_DEBUG_KERNEL is activated, etc, then we are not unwittingly leaking
> >>information - we are deliberately making it available.
> >>
> >
> >If %pK would work as it's stated by the kernel documentation:
> >https://www.kernel.org/doc/Documentation/printk-formats.txt
> >then the only change I'd suggest to have here is to close the kernel memory
> >layout printout method by the CONFIG_DEBUG_KERNEL ifdef-macro. The kptr_restrict
> >should default to 1/2 if the KASLR is activated:
> >https://lwn.net/Articles/444556/
>
> Yeah, again, the documentation is no longer correct, and %pK will always be
> hashed, and before the RNG is initialized it does not even hash it, just
> returning "(ptrval)". So I'd recommend guarding with CONFIG_DEBUG_KERNEL
> and switching the format specifier to %px.
>
Oh, it isn't the bug then) I'll do as you suggest and replace %pK with
%px closing the code by CONFIG_DEBUG_KERNEL macro.
Regards,
-Sergey
> Thanks,
> Matt
>
> >
> >Regards,
> >-Sergey
> >
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>Regards,
> >>>-Sergey
> >>>
> >>>>
> >>>>Thanks,
> >>>>Matt
> >>>>
> >>>>>
> >>>>>>--
> >>>>>>Florian
Hello Serge,
On 17/01/18 23:22, Serge Semin wrote:
> The patchset is applied on top of kernel 4.15-rc8 and can be found
> submitted at my repo:
> https://github.com/fancer/Linux-kernel-MIPS-memblock-project
I've tested the Linux from your repo on Octeon2 and it looks good to me.
I've only tested startup though. Therefore,
Tested-by: Alexander Sverdlin <[email protected]>
I've noticed one positive effect I cannot explain -- with almost the same
physical memory map I observe almost 2 megabytes more available memory
after startup:
without patches:
root@(none):~ >free
total used free shared buff/cache available
Mem: 955040 16264 839948 80068 98828 810068
Swap: 0 0 0
memory map:
memory: 0000000001090dc0 @ 0000000009000000 (usable after init)
memory: 0000000005400000 @ 0000000002b00000 (usable)
memory: 0000000000c00000 @ 0000000008200000 (usable)
memory: 0000000004800000 @ 000000000a100000 (usable)
memory: 000000001fc00000 @ 0000000020000000 (usable)
memory: 0000000010000000 @ 0000000040000000 (usable)
memory: 000000000190a9d0 @ 0000000001100000 (usable)
----------------------------------------
with patches:
root@(none):~ >free
total used free shared buff/cache available
Mem: 955028 14292 841884 80068 98852 811996
Swap: 0 0 0
memory map:
memory: 0000000001090e00 @ 0000000009000000 (usable after init)
memory: 0000000005400000 @ 0000000002b00000 (usable)
memory: 0000000000c00000 @ 0000000008200000 (usable)
memory: 0000000004800000 @ 000000000a100000 (usable)
memory: 000000001fc00000 @ 0000000020000000 (usable)
memory: 0000000010000000 @ 0000000040000000 (usable)
memory: 000000000190c9d0 @ 0000000001100000 (usable)
> Signed-off-by: Serge Semin <[email protected]>
>
> Serge Semin (14):
> MIPS: memblock: Add RESERVED_NOMAP memory flag
> MIPS: memblock: Surely map BSS kernel memory section
> MIPS: memblock: Reserve initrd memory in memblock
> MIPS: memblock: Discard bootmem initialization
> MIPS: memblock: Add reserved memory regions to memblock
> MIPS: memblock: Reserve kdump/crash regions in memblock
> MIPS: memblock: Mark present sparsemem sections
> MIPS: memblock: Simplify DMA contiguous reservation
> MIPS: memblock: Allow memblock regions resize
> MIPS: memblock: Perform early low memory test
> MIPS: memblock: Print out kernel virtual mem layout
> MIPS: memblock: Discard bootmem from Loongson3 code
> MIPS: memblock: Discard bootmem from SGI IP27 code
> MIPS: memblock: Deactivate bootmem allocator
--
Best regards,
Alexander Sverdlin.
Hello Alexander,
On Thu, Jan 25, 2018 at 06:58:07PM +0100, Alexander Sverdlin <[email protected]> wrote:
> Hello Serge,
>
> On 17/01/18 23:22, Serge Semin wrote:
> > The patchset is applied on top of kernel 4.15-rc8 and can be found
> > submitted at my repo:
> > https://github.com/fancer/Linux-kernel-MIPS-memblock-project
>
> I've tested the Linux from your repo on Octeon2 and it looks good to me.
> I've only tested startup though. Therefore,
>
> Tested-by: Alexander Sverdlin <[email protected]>
>
Great! Thank you very much for doing this. I'll include the info about all the
tested platforms to the cover letter of the next patchset.
> I've noticed one positive effect I cannot explain -- with almost the same
> physical memory map I observe almost 2 megabytes more available memory
> after startup:
>
> without patches:
>
> root@(none):~ >free
> total used free shared buff/cache available
> Mem: 955040 16264 839948 80068 98828 810068
> Swap: 0 0 0
>
> memory map:
>
> memory: 0000000001090dc0 @ 0000000009000000 (usable after init)
> memory: 0000000005400000 @ 0000000002b00000 (usable)
> memory: 0000000000c00000 @ 0000000008200000 (usable)
> memory: 0000000004800000 @ 000000000a100000 (usable)
> memory: 000000001fc00000 @ 0000000020000000 (usable)
> memory: 0000000010000000 @ 0000000040000000 (usable)
> memory: 000000000190a9d0 @ 0000000001100000 (usable)
>
> ----------------------------------------
>
> with patches:
>
> root@(none):~ >free
> total used free shared buff/cache available
> Mem: 955028 14292 841884 80068 98852 811996
> Swap: 0 0 0
>
> memory map:
>
> memory: 0000000001090e00 @ 0000000009000000 (usable after init)
> memory: 0000000005400000 @ 0000000002b00000 (usable)
> memory: 0000000000c00000 @ 0000000008200000 (usable)
> memory: 0000000004800000 @ 000000000a100000 (usable)
> memory: 000000001fc00000 @ 0000000020000000 (usable)
> memory: 0000000010000000 @ 0000000040000000 (usable)
> memory: 000000000190c9d0 @ 0000000001100000 (usable)
>
That's interesting. My suggestion is that the old code used to reserve all
the memory below kernel _end symbol. So if the kernel isn't loaded right at
the start of the lowest memory range, then there is going to be a wasted
memory between the range start and the _text kernel symbol:
[PATCH 04/14] MIPS: memblock: Discard bootmem initialization
My code reserves only the memory occupied by the kernel within [_text, _end]:
[PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
There might be some other reason of the lesser memory consumption though.
Hopefully I didn't forget to reserve some necessary memory ranges.)
Regards,
-Sergey
>
> > Signed-off-by: Serge Semin <[email protected]>
> >
> > Serge Semin (14):
> > MIPS: memblock: Add RESERVED_NOMAP memory flag
> > MIPS: memblock: Surely map BSS kernel memory section
> > MIPS: memblock: Reserve initrd memory in memblock
> > MIPS: memblock: Discard bootmem initialization
> > MIPS: memblock: Add reserved memory regions to memblock
> > MIPS: memblock: Reserve kdump/crash regions in memblock
> > MIPS: memblock: Mark present sparsemem sections
> > MIPS: memblock: Simplify DMA contiguous reservation
> > MIPS: memblock: Allow memblock regions resize
> > MIPS: memblock: Perform early low memory test
> > MIPS: memblock: Print out kernel virtual mem layout
> > MIPS: memblock: Discard bootmem from Loongson3 code
> > MIPS: memblock: Discard bootmem from SGI IP27 code
> > MIPS: memblock: Deactivate bootmem allocator
>
> --
> Best regards,
> Alexander Sverdlin.
So, since there haven't been any new comments for over a week, I'll be
collecting the patchset v2 tomorrow.
Regards,
-Sergey
On Thu, Jan 18, 2018 at 01:22:58AM +0300, Serge Semin <[email protected]> wrote:
> Even though it's common to see the architecture code using both
> bootmem and memblock early memory allocators, it's not good for
> multiple reasons. First of all, it's redundant to have two
> early memory allocator while one would be more than enough from
> functionality and stability points of view. Secondly, some new
> features introduced in the kernel utilize the methods of the most
> modern allocator ignoring the older one. It means the architecture
> code must keep the both subsystems up synchronized with information
> about memory regions and reservations, which leads to the code
> complexity increase, that obviously increases bugs probability.
> Finally it's better to keep all the architectures code unified for
> better readability and code simplification. All these reasons lead
> to one conclusion - arch code should use just one memory allocator,
> which is supposed to be memblock as the most modern and already
> utilized by the most of the kernel platforms. This patchset is
> mostly about it.
>
> One more reason why the MIPS arch code should finally move to
> memblock is a BUG somewhere in the initialization process, when
> CMA is activated:
>
> [ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
> [ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
> [ 0.263172] flags: 0x40000000()
> [ 0.266723] page dumped because: nonzero mapcount
> [ 0.272049] Modules linked in:
> [ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
> [ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
> 86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
> 80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
> 80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ...
> [ 0.323148] Call Trace:
> [ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
> [ 0.330859] [<80404814>] dump_stack+0xd4/0x110
> [ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
> [ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
> [ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
> [ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
> [ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
> [ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
> [ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
> [ 0.376828] [<808faca4>] kernel_init+0x14/0x104
> [ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
>
> The bugus pfn seems to be the one allocated for bootmem allocator
> pages and hasn't been freed before letting the CMA working with its
> areas. Anyway the bug is solved by this patchset.
>
> Another reason why this patchset is useful is that it fixes the fdt
> reserved-memory nodes functionality for MIPS. Really it's bug to have
> the fdt reserved nodes scanning before the memblock is
> fully initialized (calling early_init_fdt_scan_reserved_mem before
> bootmem_init is called). Additionally no-map flag of the
> reserved-memory node hasn't been taking into account. This patchset
> fixes all of these.
>
> As you probably remember I already did another attempt to merge a
> similar functionality into the kernel. This time the patchset got
> to be less complex (14 patches vs 21 last time) and fixes the
> platform code like SGI IP27 and Loongson3, which due to being
> NUMA introduce its own memory initialization process. Although
> I have much doubt in SGI IP27 code operability in the first place,
> since it got prom_meminit() method of early memory initialization,
> which hasn't been called at any other place in the kernel. It must
> have been left there unrenamed after arch/mips/mips-boards/generic
> code had been discarded.
>
> Here are the list of folks, who agreed to perform some tests of
> the patchset:
> Alexander Sverdlin <[email protected]> - Octeon2
> Matt Redfearn <[email protected]> - Loongson3, etc
> Joshua Kinard <[email protected]> - IP27
> Marcin Nowakowski <[email protected]>
> Thanks to you all in regards and for everybody, who will be involved
> in reviewing and testing.
>
> The patchset is applied on top of kernel 4.15-rc8 and can be found
> submitted at my repo:
> https://github.com/fancer/Linux-kernel-MIPS-memblock-project
>
> Signed-off-by: Serge Semin <[email protected]>
>
> Serge Semin (14):
> MIPS: memblock: Add RESERVED_NOMAP memory flag
> MIPS: memblock: Surely map BSS kernel memory section
> MIPS: memblock: Reserve initrd memory in memblock
> MIPS: memblock: Discard bootmem initialization
> MIPS: memblock: Add reserved memory regions to memblock
> MIPS: memblock: Reserve kdump/crash regions in memblock
> MIPS: memblock: Mark present sparsemem sections
> MIPS: memblock: Simplify DMA contiguous reservation
> MIPS: memblock: Allow memblock regions resize
> MIPS: memblock: Perform early low memory test
> MIPS: memblock: Print out kernel virtual mem layout
> MIPS: memblock: Discard bootmem from Loongson3 code
> MIPS: memblock: Discard bootmem from SGI IP27 code
> MIPS: memblock: Deactivate bootmem allocator
>
> arch/mips/Kconfig | 2 +-
> arch/mips/include/asm/bootinfo.h | 1 +
> arch/mips/kernel/prom.c | 8 +-
> arch/mips/kernel/setup.c | 218 +++++++++------------
> arch/mips/loongson64/loongson-3/numa.c | 16 +-
> arch/mips/mm/init.c | 47 +++++
> arch/mips/sgi-ip27/ip27-memory.c | 9 +-
> 7 files changed, 153 insertions(+), 148 deletions(-)
>
> --
> 2.12.0
>
Even though it's common to see the architecture code using both
bootmem and memblock early memory allocators, it's not good for
multiple reasons. First of all, it's redundant to have two
early memory allocator while one would be more than enough from
functionality and stability points of view. Secondly, some new
features introduced in the kernel utilize the methods of the most
modern allocator ignoring the older one. It means the architecture
code must keep the both subsystems up synchronized with information
about memory regions and reservations, which leads to the code
complexity increase, that obviously increases bugs probability.
Finally it's better to keep all the architectures code unified for
better readability and code simplification. All these reasons lead
to one conclusion - arch code should use just one memory allocator,
which is supposed to be memblock as the most modern and already
utilized by the most of the kernel platforms. This patchset is
mostly about it.
One more reason why the MIPS arch code should finally move to
memblock is a BUG somewhere in the initialization process, when
CMA is activated:
[ 0.248762] BUG: Bad page state in process swapper/0 pfn:01f93
[ 0.255415] page:8205b0ac count:0 mapcount:-127 mapping: (null) index:0x1
[ 0.263172] flags: 0x40000000()
[ 0.266723] page dumped because: nonzero mapcount
[ 0.272049] Modules linked in:
[ 0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
[ 0.282900] Stack : 00000000 00000000 80b6dd6a 0000003a 00000000 00000000 80930000 8092bff4
86073a14 80ac88c7 809f21ac 00000000 00000001 80b6998c 00000400 00000000
80a00000 801822e8 80b6dd68 00000000 00000002 00000000 809f8024 86077ccc
80b80000 801e9328 809fcbc0 00000000 00000400 00010000 86077ccc 86073a14
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 0.323148] Call Trace:
[ 0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
[ 0.330859] [<80404814>] dump_stack+0xd4/0x110
[ 0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
[ 0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
[ 0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
[ 0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
[ 0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
[ 0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
[ 0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
[ 0.376828] [<808faca4>] kernel_init+0x14/0x104
[ 0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
The bugus pfn seems to be the one allocated for bootmem allocator
pages and hasn't been freed before letting the CMA working with its
areas. Anyway the bug is solved by this patchset.
Another reason why this patchset is useful is that it fixes the fdt
reserved-memory nodes functionality for MIPS. Really it's bug to have
the fdt reserved nodes scanning before the memblock is
fully initialized (calling early_init_fdt_scan_reserved_mem before
bootmem_init is called). Additionally no-map flag of the
reserved-memory node hasn't been taking into account. This patchset
fixes all of these.
As you probably remember I already did another attempt to merge a
similar functionality into the kernel. This time the patchset got
to be less complex (14 patches vs 21 last time) and fixes the
platform code like SGI IP27 and Loongson3, which due to being
NUMA introduce its own memory initialization process. Although
I have much doubt in SGI IP27 code operability in the first place,
since it got prom_meminit() method of early memory initialization,
which hasn't been called at any other place in the kernel. It must
have been left there unrenamed after arch/mips/mips-boards/generic
code had been discarded.
Here are the list of folks, who agreed to perform some tests of
the patchset:
Alexander Sverdlin <[email protected]> - Octeon2
Matt Redfearn <[email protected]> - Loongson3, etc
Joshua Kinard <[email protected]> - IP27
Marcin Nowakowski <[email protected]>
Thanks to you all and to everybody, who will be involved in reviewing
and testing.
The patchset is applied on top of kernel 4.15-rc8 and can be found
submitted at my repo:
https://github.com/fancer/Linux-kernel-MIPS-memblock-project
So far the patchset has been successfully tested on the platforms:
UTM8 (Cavium Octeon III)
Creator CI20
Creator CI40
Loongson3a
MIPS Boston
MIPS Malta
MIPS SEAD3
Octeon2
Changelog v2:
- Hide mem_print_kmap_info() behind CONFIG_DEBUG_KERNEL and replace
%pK with %px there (requested by Matt Redfearn)
- Drop relocatable fixup from reservation_init (patch from Matt Redfearn)
- Move __maybe_unused change from patch 7 to patch 8 (requested by Marcin Nowakowski)
- Add tested platforms to the cover letter
Signed-off-by: Serge Semin <[email protected]>
Tested-by: Matt Redfearn <[email protected]>
Tested-by: Alexander Sverdlin <[email protected]>
Matt Redfearn (1):
MIPS: KASLR: Drop relocatable fixup from reservation_init
Serge Semin (14):
MIPS: memblock: Add RESERVED_NOMAP memory flag
MIPS: memblock: Surely map BSS kernel memory section
MIPS: memblock: Reserve initrd memory in memblock
MIPS: memblock: Discard bootmem initialization
MIPS: memblock: Add reserved memory regions to memblock
MIPS: memblock: Reserve kdump/crash regions in memblock
MIPS: memblock: Mark present sparsemem sections
MIPS: memblock: Simplify DMA contiguous reservation
MIPS: memblock: Allow memblock regions resize
MIPS: memblock: Perform early low memory test
MIPS: memblock: Print out kernel virtual mem layout
MIPS: memblock: Discard bootmem from Loongson3 code
MIPS: memblock: Discard bootmem from SGI IP27 code
MIPS: memblock: Deactivate bootmem allocator
arch/mips/Kconfig | 2 +-
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 8 +-
arch/mips/kernel/setup.c | 239 +++++++++++++--------------------
arch/mips/loongson64/loongson-3/numa.c | 16 +--
arch/mips/mm/init.c | 49 +++++++
arch/mips/sgi-ip27/ip27-memory.c | 9 +-
7 files changed, 154 insertions(+), 170 deletions(-)
--
2.12.0
Even if nomap flag is specified the reserved memory declared in dts
isn't really discarded from the buddy allocator in the current code.
We'll fix it by adding the no-map MIPS memory flag. Additionally
lets add the RESERVED_NOMAP memory regions handling to the methods,
which aren't going to be changed in the further patches.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 8 ++++++--
arch/mips/kernel/setup.c | 8 ++++++++
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index e26a093bb17a..276618b5319d 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -90,6 +90,7 @@ extern unsigned long mips_machtype;
#define BOOT_MEM_ROM_DATA 2
#define BOOT_MEM_RESERVED 3
#define BOOT_MEM_INIT_RAM 4
+#define BOOT_MEM_RESERVED_NOMAP 5
/*
* A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 0dbcd152a1a9..b123eb8279cd 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,7 @@ char *mips_get_machine_name(void)
#ifdef CONFIG_USE_OF
void __init early_init_dt_add_memory_arch(u64 base, u64 size)
{
- return add_memory_region(base, size, BOOT_MEM_RAM);
+ add_memory_region(base, size, BOOT_MEM_RAM);
}
void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
@@ -52,7 +52,11 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
{
- add_memory_region(base, size, BOOT_MEM_RESERVED);
+ if (!nomap)
+ add_memory_region(base, size, BOOT_MEM_RESERVED);
+ else
+ add_memory_region(base, size, BOOT_MEM_RESERVED_NOMAP);
+
return 0;
}
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 702c678de116..1a4d64410303 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -172,6 +172,7 @@ bool __init memory_region_available(phys_addr_t start, phys_addr_t size)
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+ case BOOT_MEM_RESERVED_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -207,6 +208,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ printk(KERN_CONT "(reserved nomap)\n");
+ break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -955,9 +959,13 @@ static void __init resource_init(void)
res->name = "System RAM";
res->flags |= IORESOURCE_SYSRAM;
break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ res->name = "reserved nomap";
+ break;
case BOOT_MEM_RESERVED:
default:
res->name = "reserved";
+ break;
}
request_resource(&iomem_resource, res);
--
2.12.0
The current MIPS code makes sure the kernel code/data/init
sections are in the maps, but BSS should also be there.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 1a4d64410303..f502cd702fa7 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
BOOT_MEM_INIT_RAM);
+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+ BOOT_MEM_RAM);
pr_info("Determined physical RAM map:\n");
print_memory_map();
--
2.12.0
Since memblock is going to be used for the early memory allocation
lets discard the bootmem node setup and all the related free-space
search code. Low/high PFN extremums should be still calculated
since they are needed on the paging_init stage. Since the current
code is already doing memblock regions initialization the only thing
left is to set the upper allocation limit to be up to the max low
memory PFN, so the memblock API can be fully used from now.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 86 +++++++-----------------------------------------
1 file changed, 11 insertions(+), 75 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index a015cee353be..b5fcacf71b3f 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -367,29 +367,15 @@ static void __init bootmem_init(void)
#else /* !CONFIG_SGI_IP27 */
-static unsigned long __init bootmap_bytes(unsigned long pages)
-{
- unsigned long bytes = DIV_ROUND_UP(pages, 8);
-
- return ALIGN(bytes, sizeof(long));
-}
-
static void __init bootmem_init(void)
{
- unsigned long reserved_end;
- unsigned long mapstart = ~0UL;
- unsigned long bootmap_size;
- bool bootmap_valid = false;
int i;
/*
- * Sanity check any INITRD first. We don't take it into account
- * for bootmem setup initially, rely on the end-of-kernel-code
- * as our memory range starting point. Once bootmem is inited we
+ * Sanity check any INITRD first. Once memblock is inited we
* will reserve the area used for the initrd.
*/
init_initrd();
- reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
/*
* max_low_pfn is not a number of pages. The number of pages
@@ -428,16 +414,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
- if (end <= reserved_end)
- continue;
-#ifdef CONFIG_BLK_DEV_INITRD
- /* Skip zones before initrd and initrd itself */
- if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
- continue;
-#endif
- if (start >= mapstart)
- continue;
- mapstart = max(reserved_end, start);
}
if (min_low_pfn >= max_low_pfn)
@@ -463,53 +439,19 @@ static void __init bootmem_init(void)
#endif
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
-
-#ifdef CONFIG_BLK_DEV_INITRD
- /*
- * mapstart should be after initrd_end
- */
- if (initrd_end)
- mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
+#ifdef CONFIG_HIGHMEM
+ pr_info("PFNs: low min %lu, low max %lu, high start %lu, high end %lu,"
+ "max %lu\n",
+ min_low_pfn, max_low_pfn, highstart_pfn, highend_pfn, max_pfn);
+#else
+ pr_info("PFNs: low min %lu, low max %lu, max %lu\n",
+ min_low_pfn, max_low_pfn, max_pfn);
#endif
/*
- * check that mapstart doesn't overlap with any of
- * memory regions that have been reserved through eg. DTB
- */
- bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
-
- bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
- bootmap_size);
- for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
- unsigned long mapstart_addr;
-
- switch (boot_mem_map.map[i].type) {
- case BOOT_MEM_RESERVED:
- mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
- boot_mem_map.map[i].size);
- if (PHYS_PFN(mapstart_addr) < mapstart)
- break;
-
- bootmap_valid = memory_region_available(mapstart_addr,
- bootmap_size);
- if (bootmap_valid)
- mapstart = PHYS_PFN(mapstart_addr);
- break;
- default:
- break;
- }
- }
-
- if (!bootmap_valid)
- panic("No memory area to place a bootmap bitmap");
-
- /*
- * Initialize the boot-time allocator with low memory only.
+ * Initialize the boot-time allocator with low/high memory, but
+ * set the allocation limit to low memory only
*/
- if (bootmap_size != init_bootmem_node(NODE_DATA(0), mapstart,
- min_low_pfn, max_low_pfn))
- panic("Unexpected memory size required for bootmap");
-
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;
@@ -535,6 +477,7 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
+ memblock_set_current_limit(PFN_PHYS(max_low_pfn));
/*
* Register fully available low RAM pages with the bootmem allocator.
@@ -570,8 +513,6 @@ static void __init bootmem_init(void)
*/
if (start >= max_low_pfn)
continue;
- if (start < reserved_end)
- start = reserved_end;
if (end > max_low_pfn)
end = max_low_pfn;
@@ -587,11 +528,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}
- /*
- * Reserve the bootmap memory.
- */
- reserve_bootmem(PFN_PHYS(mapstart), bootmap_size, BOOTMEM_DEFAULT);
-
#ifdef CONFIG_RELOCATABLE
/*
* The kernel reserves all memory below its _end symbol as bootmem,
--
2.12.0
Memblock allocator can be successfully used from now for early
memory management
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 350a990fc719..434f756e03e9 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,7 +4,6 @@ config MIPS
default y
select ARCH_BINFMT_ELF_STATE
select ARCH_CLOCKSOURCE_DATA
- select ARCH_DISCARD_MEMBLOCK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_MIGHT_HAVE_PC_PARPORT
@@ -57,6 +56,7 @@ config MIPS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KPROBES
select HAVE_KRETPROBES
+ select NO_BOOTMEM
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MOD_ARCH_SPECIFIC
--
2.12.0
SGI IP27 got its own code to set the early memory allocator up since it's
NUMA-based system. So in order to be compatible with NO_BOOTMEM config
we need to discard the bootmem allocator initialization and insert the
memblock reservation method. Although in my opinion the code isn't
working anyway since I couldn't find a place where prom_meminit() called
and kernel memory isn't reserved. It must have been untested since the
time the arch/mips/mips-boards/generic code was in the kernel.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/sgi-ip27/ip27-memory.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
index 59133d0abc83..c480ee3eca96 100644
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -389,7 +389,6 @@ static void __init node_mem_init(cnodeid_t node)
{
unsigned long slot_firstpfn = slot_getbasepfn(node, 0);
unsigned long slot_freepfn = node_getfirstfree(node);
- unsigned long bootmap_size;
unsigned long start_pfn, end_pfn;
get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
@@ -400,7 +399,6 @@ static void __init node_mem_init(cnodeid_t node)
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE);
- NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
@@ -409,12 +407,9 @@ static void __init node_mem_init(cnodeid_t node)
slot_freepfn += PFN_UP(sizeof(struct pglist_data) +
sizeof(struct hub_data));
- bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
- start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
- reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
- ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
- BOOTMEM_DEFAULT);
+ memblock_reserve(slot_firstpfn << PAGE_SHIFT,
+ ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT));
sparse_memory_present_with_active_regions(node);
}
--
2.12.0
There is no reserve_bootmem() method in the nobootmem interface,
so we need to replace it with memblock-specific one.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index f502cd702fa7..a015cee353be 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -330,7 +330,7 @@ static void __init finalize_initrd(void)
maybe_bswap_initrd();
- reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
+ memblock_reserve(__pa(initrd_start), size);
initrd_below_start_ok = 1;
pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
--
2.12.0
It is useful to have the kernel virtual memory layout printed
at boot time so to have the full information about the booted
kernel. In some cases it might be unsafe to have virtual
addresses freely visible in logs, so the %pK format is used if
one want to hide them.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/mm/init.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 84b7b592b834..eec92194d4dc 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -32,6 +32,7 @@
#include <linux/kcore.h>
#include <linux/export.h>
#include <linux/initrd.h>
+#include <linux/sizes.h>
#include <asm/asm-offsets.h>
#include <asm/bootinfo.h>
@@ -60,6 +61,53 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
EXPORT_SYMBOL(zero_page_mask);
/*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init mem_print_kmap_info(void)
+{
+#ifdef CONFIG_DEBUG_KERNEL
+ pr_notice("Kernel virtual memory layout:\n"
+ " lowmem : 0x%px - 0x%px (%4ld MB)\n"
+ " .text : 0x%px - 0x%px (%4td kB)\n"
+ " .data : 0x%px - 0x%px (%4td kB)\n"
+ " .init : 0x%px - 0x%px (%4td kB)\n"
+ " .bss : 0x%px - 0x%px (%4td kB)\n"
+ " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ " pkmap : 0x%px - 0x%px (%4ld MB)\n"
+#endif
+ " fixmap : 0x%px - 0x%px (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+ /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+ BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+ BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+ BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+ BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+ (unsigned long)high_memory);
+#endif
+ BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+ BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+#endif /* CONFIG_DEBUG_KERNEL */
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
+/*
* Not static inline because used by IP27 special magic initialization code
*/
void setup_zero_pages(void)
@@ -468,6 +516,7 @@ void __init mem_init(void)
free_all_bootmem();
setup_zero_pages(); /* Setup zeroed pages. */
mem_init_free_highmem();
+ mem_print_kmap_info();
mem_init_print_info(NULL);
#ifdef CONFIG_64BIT
--
2.12.0
From: Matt Redfearn <[email protected]>
A recent change ("MIPS: memblock: Discard bootmem initialization")
removed the reservation of all memory below the kernel's _end symbol in
bootmem. This makes the call to free_bootmem unnecessary, since the
memory region is no longer marked reserved.
Additionally, ("MIPS: memblock: Print out kernel virtual mem
layout") added a display of the kernel's virtual memory layout, so
printing the relocation information at this point is redundant.
Remove this section of code.
Signed-off-by: Matt Redfearn <[email protected]>
---
arch/mips/kernel/setup.c | 23 -----------------------
1 file changed, 23 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b5fcacf71b3f..cf3674977170 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -528,29 +528,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}
-#ifdef CONFIG_RELOCATABLE
- /*
- * The kernel reserves all memory below its _end symbol as bootmem,
- * but the kernel may now be at a much higher address. The memory
- * between the original and new locations may be returned to the system.
- */
- if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
- unsigned long offset;
- extern void show_kernel_relocation(const char *level);
-
- offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
- free_bootmem(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
-
-#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
- /*
- * This information is necessary when debugging the kernel
- * But is a security vulnerability otherwise!
- */
- show_kernel_relocation(KERN_INFO);
-#endif
- }
-#endif
-
/*
* Reserve initrd memory if needed.
*/
--
2.12.0
Loongson64/3 runs its own code to initialize memory allocator in
case of NUMA configuration is selected. So in order to move to the
pure memblock utilization we discard the bootmem allocator usage
and insert the memblock reservation method for kernel/addrspace_offset
memory regions.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/loongson64/loongson-3/numa.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/mips/loongson64/loongson-3/numa.c b/arch/mips/loongson64/loongson-3/numa.c
index f17ef520799a..2f1ebf496c17 100644
--- a/arch/mips/loongson64/loongson-3/numa.c
+++ b/arch/mips/loongson64/loongson-3/numa.c
@@ -180,7 +180,6 @@ static void __init szmem(unsigned int node)
static void __init node_mem_init(unsigned int node)
{
- unsigned long bootmap_size;
unsigned long node_addrspace_offset;
unsigned long start_pfn, end_pfn, freepfn;
@@ -197,26 +196,21 @@ static void __init node_mem_init(unsigned int node)
__node_data[node] = prealloc__node_data + node;
- NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
- bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
- start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
if (node == 0) /* used by finalize_initrd() */
max_low_pfn = end_pfn;
- /* This is reserved for the kernel and bdata->node_bootmem_map */
- reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
- ((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
- BOOTMEM_DEFAULT);
+ /* This is reserved for the kernel only */
+ if (node == 0)
+ memblock_reserve(start_pfn << PAGE_SHIFT,
+ ((freepfn - start_pfn) << PAGE_SHIFT));
if (node == 0 && node_end_pfn(0) >= (0xffffffff >> PAGE_SHIFT)) {
/* Reserve 0xfe000000~0xffffffff for RS780E integrated GPU */
- reserve_bootmem_node(NODE_DATA(node),
- (node_addrspace_offset | 0xfe000000),
- 32 << 20, BOOTMEM_DEFAULT);
+ memblock_reserve(node_addrspace_offset | 0xfe000000, 32 << 20);
}
sparse_memory_present_with_active_regions(node);
--
2.12.0
If sparsemem is activated all sections with present pages must
be accordingly marked after memblock is fully initialized.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b2a5b89ae6b2..54302319ce1c 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -837,6 +837,11 @@ static void __init arch_mem_init(char **cmdline_p)
crashk_res.end - crashk_res.start + 1);
#endif
device_tree_init();
+#ifdef CONFIG_SPARSEMEM
+ for_each_memblock(memory, reg)
+ memory_present(0, memblock_region_memory_base_pfn(reg),
+ memblock_region_memory_end_pfn(reg));
+#endif /* CONFIG_SPARSEMEM */
sparse_init();
plat_swiotlb_setup();
--
2.12.0
The memory reservation has to be performed for all the crucial
objects like kernel itself, it data and fdt blob. FDT reserved-memory
nodes should also be scanned to declare or discard reserved memory
regions, but it has to be done after the memblock is fully initialized
with low/high RAM (see the function description/code).
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 96 +++++++++++++++++++++++++++---------------------
1 file changed, 54 insertions(+), 42 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index cf3674977170..72853e94c2c7 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -362,6 +362,10 @@ static unsigned long __init init_initrd(void)
static void __init bootmem_init(void)
{
init_initrd();
+}
+
+static void __init reservation_init(void)
+{
finalize_initrd();
}
@@ -478,60 +482,70 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+}
+
+static void __init reservation_init(void)
+{
+ phys_addr_t size;
+ int i;
/*
- * Register fully available low RAM pages with the bootmem allocator.
+ * Reserve memory occupied by the kernel and it data
*/
- for (i = 0; i < boot_mem_map.nr_map; i++) {
- unsigned long start, end, size;
+ size = __pa_symbol(&_end) - __pa_symbol(&_text);
+ memblock_reserve(__pa_symbol(&_text), size);
- start = PFN_UP(boot_mem_map.map[i].addr);
- end = PFN_DOWN(boot_mem_map.map[i].addr
- + boot_mem_map.map[i].size);
+ /*
+ * Handle FDT and it reserved-memory nodes now
+ */
+ early_init_fdt_reserve_self();
+ early_init_fdt_scan_reserved_mem();
- /*
- * Reserve usable memory.
- */
- switch (boot_mem_map.map[i].type) {
- case BOOT_MEM_RAM:
- break;
- case BOOT_MEM_INIT_RAM:
- memory_present(0, start, end);
- continue;
- default:
- /* Not usable memory */
- if (start > min_low_pfn && end < max_low_pfn)
- reserve_bootmem(boot_mem_map.map[i].addr,
- boot_mem_map.map[i].size,
- BOOTMEM_DEFAULT);
- continue;
- }
+ /*
+ * Reserve requested memory ranges with the memblock allocator.
+ */
+ for (i = 0; i < boot_mem_map.nr_map; i++) {
+ phys_addr_t start, end;
- /*
- * We are rounding up the start address of usable memory
- * and at the end of the usable range downwards.
- */
- if (start >= max_low_pfn)
+ if (boot_mem_map.map[i].type == BOOT_MEM_RAM)
continue;
- if (end > max_low_pfn)
- end = max_low_pfn;
+
+ start = boot_mem_map.map[i].addr;
+ end = boot_mem_map.map[i].addr + boot_mem_map.map[i].size;
+ size = boot_mem_map.map[i].size;
/*
- * ... finally, is the area going away?
+ * Make sure the region isn't already reserved
*/
- if (end <= start)
+ if (memblock_is_region_reserved(start, size)) {
+ pr_warn("Reserved region %08zx @ %pa already in-use\n",
+ (size_t)size, &start);
continue;
- size = end - start;
+ }
- /* Register lowmem ranges */
- free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
- memory_present(0, start, end);
+ switch (boot_mem_map.map[i].type) {
+ case BOOT_MEM_ROM_DATA:
+ case BOOT_MEM_RESERVED:
+ case BOOT_MEM_INIT_RAM:
+ memblock_reserve(start, size);
+ break;
+ case BOOT_MEM_RESERVED_NOMAP:
+ default:
+ memblock_remove(start, size);
+ break;
+ }
}
/*
* Reserve initrd memory if needed.
*/
finalize_initrd();
+
+ /*
+ * Reserve for hibernation
+ */
+ size = __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin);
+ memblock_reserve(__pa_symbol(&__nosave_begin), size);
}
#endif /* CONFIG_SGI_IP27 */
@@ -546,6 +560,7 @@ static void __init bootmem_init(void)
* kernel but generic memory management system is still entirely uninitialized.
*
* o bootmem_init()
+ * o reservation_init()
* o sparse_init()
* o paging_init()
* o dma_contiguous_reserve()
@@ -803,10 +818,10 @@ static void __init arch_mem_init(char **cmdline_p)
print_memory_map();
}
- early_init_fdt_reserve_self();
- early_init_fdt_scan_reserved_mem();
-
bootmem_init();
+
+ reservation_init();
+
#ifdef CONFIG_PROC_VMCORE
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
@@ -832,9 +847,6 @@ static void __init arch_mem_init(char **cmdline_p)
for_each_memblock(reserved, reg)
if (reg->size != 0)
reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
-
- reserve_bootmem_region(__pa_symbol(&__nosave_begin),
- __pa_symbol(&__nosave_end)); /* Reserve for hibernation */
}
static void __init resource_init(void)
--
2.12.0
Low memory can be tested at this point, since all the
reservations have just been finished without much of
additional allocations.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 531a1471a2a4..a0eac8160750 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -850,6 +850,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_allow_resize();
memblock_dump_all();
+
+ early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
}
static void __init resource_init(void)
--
2.12.0
When all the main reservations are done the memblock regions
can be dynamically resized. Additionally it would be useful to have
memblock regions dumped on debug at this point.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 158a52c17e29..531a1471a2a4 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -846,6 +846,10 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
+
+ memblock_allow_resize();
+
+ memblock_dump_all();
}
static void __init resource_init(void)
--
2.12.0
Kdump/crashkernel memory regions should be reserved in the
memblock allocator so they wouldn't be occupied by any further
allocations.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 72853e94c2c7..b2a5b89ae6b2 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -826,17 +826,15 @@ static void __init arch_mem_init(char **cmdline_p)
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
setup_elfcorehdr, setup_elfcorehdr_size);
- reserve_bootmem(setup_elfcorehdr, setup_elfcorehdr_size,
- BOOTMEM_DEFAULT);
+ memblock_reserve(setup_elfcorehdr, setup_elfcorehdr_size);
}
#endif
mips_parse_crashkernel();
#ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
- reserve_bootmem(crashk_res.start,
- crashk_res.end - crashk_res.start + 1,
- BOOTMEM_DEFAULT);
+ memblock_reserve(crashk_res.start,
+ crashk_res.end - crashk_res.start + 1);
#endif
device_tree_init();
sparse_init();
--
2.12.0
CMA reserves it areas in the memblock allocator. Since we aren't
using bootmem anymore, the reservations copying should be discarded.
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 54302319ce1c..158a52c17e29 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -755,7 +755,7 @@ static void __init request_crashkernel(struct resource *res)
static void __init arch_mem_init(char **cmdline_p)
{
- struct memblock_region *reg;
+ struct memblock_region *reg __maybe_unused;
extern void plat_mem_setup(void);
/* call board setup routine */
@@ -846,10 +846,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
- /* Tell bootmem about cma reserved memblock section */
- for_each_memblock(reserved, reg)
- if (reg->size != 0)
- reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
}
static void __init resource_init(void)
--
2.12.0
On 02/02/18 03:54, Serge Semin wrote:
> The current MIPS code makes sure the kernel code/data/init
> sections are in the maps, but BSS should also be there.
>
> Signed-off-by: Serge Semin <[email protected]>
Looks good to me :-)
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 1a4d64410303..f502cd702fa7 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> BOOT_MEM_INIT_RAM);
> + arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> + PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> + BOOT_MEM_RAM);
>
> pr_info("Determined physical RAM map:\n");
> print_memory_map();
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Even if nomap flag is specified the reserved memory declared in dts
> isn't really discarded from the buddy allocator in the current code.
> We'll fix it by adding the no-map MIPS memory flag. Additionally
> lets add the RESERVED_NOMAP memory regions handling to the methods,
> which aren't going to be changed in the further patches.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/include/asm/bootinfo.h | 1 +
> arch/mips/kernel/prom.c | 8 ++++++--
> arch/mips/kernel/setup.c | 8 ++++++++
> 3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
> index e26a093bb17a..276618b5319d 100644
> --- a/arch/mips/include/asm/bootinfo.h
> +++ b/arch/mips/include/asm/bootinfo.h
> @@ -90,6 +90,7 @@ extern unsigned long mips_machtype;
> #define BOOT_MEM_ROM_DATA 2
> #define BOOT_MEM_RESERVED 3
> #define BOOT_MEM_INIT_RAM 4
> +#define BOOT_MEM_RESERVED_NOMAP 5
>
> /*
> * A memory map that's built upon what was determined
> diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
> index 0dbcd152a1a9..b123eb8279cd 100644
> --- a/arch/mips/kernel/prom.c
> +++ b/arch/mips/kernel/prom.c
> @@ -41,7 +41,7 @@ char *mips_get_machine_name(void)
> #ifdef CONFIG_USE_OF
> void __init early_init_dt_add_memory_arch(u64 base, u64 size)
> {
> - return add_memory_region(base, size, BOOT_MEM_RAM);
> + add_memory_region(base, size, BOOT_MEM_RAM);
This is an unrelated change.
> }
>
> void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
> @@ -52,7 +52,11 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
> int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
> phys_addr_t size, bool nomap)
> {
> - add_memory_region(base, size, BOOT_MEM_RESERVED);
> + if (!nomap)
> + add_memory_region(base, size, BOOT_MEM_RESERVED);
> + else
> + add_memory_region(base, size, BOOT_MEM_RESERVED_NOMAP);
> +
> return 0;
> }
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 702c678de116..1a4d64410303 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -172,6 +172,7 @@ bool __init memory_region_available(phys_addr_t start, phys_addr_t size)
> in_ram = true;
> break;
> case BOOT_MEM_RESERVED:
> + case BOOT_MEM_RESERVED_NOMAP:
> if ((start >= start_ && start < end_) ||
> (start < start_ && start + size >= start_))
> free = false;
> @@ -207,6 +208,9 @@ static void __init print_memory_map(void)
> case BOOT_MEM_RESERVED:
> printk(KERN_CONT "(reserved)\n");
> break;
> + case BOOT_MEM_RESERVED_NOMAP:
> + printk(KERN_CONT "(reserved nomap)\n");
> + break;
> default:
> printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
> break;
> @@ -955,9 +959,13 @@ static void __init resource_init(void)
> res->name = "System RAM";
> res->flags |= IORESOURCE_SYSRAM;
> break;
> + case BOOT_MEM_RESERVED_NOMAP:
> + res->name = "reserved nomap";
> + break;
> case BOOT_MEM_RESERVED:
> default:
> res->name = "reserved";
> + break;
This is an unrelated change.
With those 2 removed, I think this looks fine.
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> }
>
> request_resource(&iomem_resource, res);
>
On 02/02/18 03:54, Serge Semin wrote:
> There is no reserve_bootmem() method in the nobootmem interface,
> so we need to replace it with memblock-specific one.
>
> Signed-off-by: Serge Semin <[email protected]>
Looks good to me :-)
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index f502cd702fa7..a015cee353be 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -330,7 +330,7 @@ static void __init finalize_initrd(void)
>
> maybe_bswap_initrd();
>
> - reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
> + memblock_reserve(__pa(initrd_start), size);
> initrd_below_start_ok = 1;
>
> pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Since memblock is going to be used for the early memory allocation
> lets discard the bootmem node setup and all the related free-space
> search code. Low/high PFN extremums should be still calculated
> since they are needed on the paging_init stage. Since the current
> code is already doing memblock regions initialization the only thing
> left is to set the upper allocation limit to be up to the max low
> memory PFN, so the memblock API can be fully used from now.
Please could we move this patch to later in the series, perhaps before
the patches to remove bootmem initialisation from Loongson3 / IP27? This
would vastly improve the bisectability of the series.
Either that, or less ideally, perhaps merge this patch with "MIPS:
memblock: Add reserved memory regions to memblock" since those 2
combined should be a more atomic change so more bisectable, if slightly
harder to review.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 86 +++++++-----------------------------------------
> 1 file changed, 11 insertions(+), 75 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index a015cee353be..b5fcacf71b3f 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -367,29 +367,15 @@ static void __init bootmem_init(void)
>
> #else /* !CONFIG_SGI_IP27 */
>
> -static unsigned long __init bootmap_bytes(unsigned long pages)
> -{
> - unsigned long bytes = DIV_ROUND_UP(pages, 8);
> -
> - return ALIGN(bytes, sizeof(long));
> -}
> -
> static void __init bootmem_init(void)
> {
> - unsigned long reserved_end;
> - unsigned long mapstart = ~0UL;
> - unsigned long bootmap_size;
> - bool bootmap_valid = false;
> int i;
>
> /*
> - * Sanity check any INITRD first. We don't take it into account
> - * for bootmem setup initially, rely on the end-of-kernel-code
> - * as our memory range starting point. Once bootmem is inited we
> + * Sanity check any INITRD first. Once memblock is inited we
> * will reserve the area used for the initrd.
> */
> init_initrd();
> - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
>
> /*
> * max_low_pfn is not a number of pages. The number of pages
> @@ -428,16 +414,6 @@ static void __init bootmem_init(void)
> max_low_pfn = end;
> if (start < min_low_pfn)
> min_low_pfn = start;
> - if (end <= reserved_end)
> - continue;
> -#ifdef CONFIG_BLK_DEV_INITRD
> - /* Skip zones before initrd and initrd itself */
> - if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
> - continue;
> -#endif
> - if (start >= mapstart)
> - continue;
> - mapstart = max(reserved_end, start);
> }
>
> if (min_low_pfn >= max_low_pfn)
> @@ -463,53 +439,19 @@ static void __init bootmem_init(void)
> #endif
> max_low_pfn = PFN_DOWN(HIGHMEM_START);
> }
> -
> -#ifdef CONFIG_BLK_DEV_INITRD
> - /*
> - * mapstart should be after initrd_end
> - */
> - if (initrd_end)
> - mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
> +#ifdef CONFIG_HIGHMEM
> + pr_info("PFNs: low min %lu, low max %lu, high start %lu, high end %lu,"
> + "max %lu\n",
> + min_low_pfn, max_low_pfn, highstart_pfn, highend_pfn, max_pfn);
> +#else
> + pr_info("PFNs: low min %lu, low max %lu, max %lu\n",
> + min_low_pfn, max_low_pfn, max_pfn);
I think this debug info should be pr_debug (or removed from the final
version).
> #endif
>
> /*
> - * check that mapstart doesn't overlap with any of
> - * memory regions that have been reserved through eg. DTB
> - */
> - bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
> -
> - bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
> - bootmap_size);
> - for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
> - unsigned long mapstart_addr;
> -
> - switch (boot_mem_map.map[i].type) {
> - case BOOT_MEM_RESERVED:
> - mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
> - boot_mem_map.map[i].size);
> - if (PHYS_PFN(mapstart_addr) < mapstart)
> - break;
> -
> - bootmap_valid = memory_region_available(mapstart_addr,
> - bootmap_size);
> - if (bootmap_valid)
> - mapstart = PHYS_PFN(mapstart_addr);
> - break;
> - default:
> - break;
> - }
> - }
> -
> - if (!bootmap_valid)
> - panic("No memory area to place a bootmap bitmap");
> -
> - /*
> - * Initialize the boot-time allocator with low memory only.
> + * Initialize the boot-time allocator with low/high memory, but
> + * set the allocation limit to low memory only
> */
> - if (bootmap_size != init_bootmem_node(NODE_DATA(0), mapstart,
> - min_low_pfn, max_low_pfn))
> - panic("Unexpected memory size required for bootmap");
> -
> for (i = 0; i < boot_mem_map.nr_map; i++) {
> unsigned long start, end;
>
> @@ -535,6 +477,7 @@ static void __init bootmem_init(void)
>
> memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
> }
> + memblock_set_current_limit(PFN_PHYS(max_low_pfn));
>
> /*
> * Register fully available low RAM pages with the bootmem allocator.
> @@ -570,8 +513,6 @@ static void __init bootmem_init(void)
> */
> if (start >= max_low_pfn)
> continue;
> - if (start < reserved_end)
> - start = reserved_end;
> if (end > max_low_pfn)
> end = max_low_pfn;
>
> @@ -587,11 +528,6 @@ static void __init bootmem_init(void)
> memory_present(0, start, end);
> }
>
> - /*
> - * Reserve the bootmap memory.
> - */
> - reserve_bootmem(PFN_PHYS(mapstart), bootmap_size, BOOTMEM_DEFAULT);
> -
> #ifdef CONFIG_RELOCATABLE
> /*
> * The kernel reserves all memory below its _end symbol as bootmem,
>
Thanks,
Matt
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> From: Matt Redfearn <[email protected]>
>
> A recent change ("MIPS: memblock: Discard bootmem initialization")
> removed the reservation of all memory below the kernel's _end symbol in
> bootmem. This makes the call to free_bootmem unnecessary, since the
> memory region is no longer marked reserved.
>
> Additionally, ("MIPS: memblock: Print out kernel virtual mem
> layout") added a display of the kernel's virtual memory layout, so
> printing the relocation information at this point is redundant.
>
> Remove this section of code.
>
> Signed-off-by: Matt Redfearn <[email protected]>
Missing your SoB.
I think this change should go after you introduce the new mechanism,
i.e. after "MIPS: memblock: Print out kernel virtual mem layout", which
should probably go nearer the start of the series.
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 23 -----------------------
> 1 file changed, 23 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index b5fcacf71b3f..cf3674977170 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -528,29 +528,6 @@ static void __init bootmem_init(void)
> memory_present(0, start, end);
> }
>
> -#ifdef CONFIG_RELOCATABLE
> - /*
> - * The kernel reserves all memory below its _end symbol as bootmem,
> - * but the kernel may now be at a much higher address. The memory
> - * between the original and new locations may be returned to the system.
> - */
> - if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
> - unsigned long offset;
> - extern void show_kernel_relocation(const char *level);
> -
> - offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
> - free_bootmem(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
> -
> -#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
> - /*
> - * This information is necessary when debugging the kernel
> - * But is a security vulnerability otherwise!
> - */
> - show_kernel_relocation(KERN_INFO);
> -#endif
> - }
> -#endif
> -
> /*
> * Reserve initrd memory if needed.
> */
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> The memory reservation has to be performed for all the crucial
> objects like kernel itself, it data and fdt blob. FDT reserved-memory
> nodes should also be scanned to declare or discard reserved memory
> regions, but it has to be done after the memblock is fully initialized
> with low/high RAM (see the function description/code).
Again, if possible, introduce this change before discarding bootmem
initialisation, so that the series can be bisected.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 96 +++++++++++++++++++++++++++---------------------
> 1 file changed, 54 insertions(+), 42 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index cf3674977170..72853e94c2c7 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -362,6 +362,10 @@ static unsigned long __init init_initrd(void)
> static void __init bootmem_init(void)
> {
> init_initrd();
> +}
> +
> +static void __init reservation_init(void)
> +{
> finalize_initrd();
> }
>
> @@ -478,60 +482,70 @@ static void __init bootmem_init(void)
> memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
> }
> memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> +}
> +
> +static void __init reservation_init(void)
> +{
> + phys_addr_t size;
> + int i;
>
> /*
> - * Register fully available low RAM pages with the bootmem allocator.
> + * Reserve memory occupied by the kernel and it data
> */
> - for (i = 0; i < boot_mem_map.nr_map; i++) {
> - unsigned long start, end, size;
> + size = __pa_symbol(&_end) - __pa_symbol(&_text);
> + memblock_reserve(__pa_symbol(&_text), size);
>
> - start = PFN_UP(boot_mem_map.map[i].addr);
> - end = PFN_DOWN(boot_mem_map.map[i].addr
> - + boot_mem_map.map[i].size);
> + /*
> + * Handle FDT and it reserved-memory nodes now
> + */
> + early_init_fdt_reserve_self();
> + early_init_fdt_scan_reserved_mem();
>
> - /*
> - * Reserve usable memory.
> - */
> - switch (boot_mem_map.map[i].type) {
> - case BOOT_MEM_RAM:
> - break;
> - case BOOT_MEM_INIT_RAM:
> - memory_present(0, start, end);
> - continue;
> - default:
> - /* Not usable memory */
> - if (start > min_low_pfn && end < max_low_pfn)
> - reserve_bootmem(boot_mem_map.map[i].addr,
> - boot_mem_map.map[i].size,
> - BOOTMEM_DEFAULT);
> - continue;
> - }
I think this change maybe belongs in "MIPS: memblock: Discard bootmem
initialization"?
Thanks,
Matt
> + /*
> + * Reserve requested memory ranges with the memblock allocator.
> + */
> + for (i = 0; i < boot_mem_map.nr_map; i++) {
> + phys_addr_t start, end;
>
> - /*
> - * We are rounding up the start address of usable memory
> - * and at the end of the usable range downwards.
> - */
> - if (start >= max_low_pfn)
> + if (boot_mem_map.map[i].type == BOOT_MEM_RAM)
> continue;
> - if (end > max_low_pfn)
> - end = max_low_pfn;
> +
> + start = boot_mem_map.map[i].addr;
> + end = boot_mem_map.map[i].addr + boot_mem_map.map[i].size;
> + size = boot_mem_map.map[i].size;
>
> /*
> - * ... finally, is the area going away?
> + * Make sure the region isn't already reserved
> */
> - if (end <= start)
> + if (memblock_is_region_reserved(start, size)) {
> + pr_warn("Reserved region %08zx @ %pa already in-use\n",
> + (size_t)size, &start); > continue;
> - size = end - start;
> + }
>
> - /* Register lowmem ranges */
> - free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
> - memory_present(0, start, end);
> + switch (boot_mem_map.map[i].type) {
> + case BOOT_MEM_ROM_DATA:
> + case BOOT_MEM_RESERVED:
> + case BOOT_MEM_INIT_RAM:
> + memblock_reserve(start, size);
> + break;
> + case BOOT_MEM_RESERVED_NOMAP:
> + default:
> + memblock_remove(start, size);
> + break;
> + }
> }
>
> /*
> * Reserve initrd memory if needed.
> */
> finalize_initrd();
> +
> + /*
> + * Reserve for hibernation
> + */
> + size = __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin);
> + memblock_reserve(__pa_symbol(&__nosave_begin), size);
> }
>
> #endif /* CONFIG_SGI_IP27 */
> @@ -546,6 +560,7 @@ static void __init bootmem_init(void)
> * kernel but generic memory management system is still entirely uninitialized.
> *
> * o bootmem_init()
> + * o reservation_init()
> * o sparse_init()
> * o paging_init()
> * o dma_contiguous_reserve()
> @@ -803,10 +818,10 @@ static void __init arch_mem_init(char **cmdline_p)
> print_memory_map();
> }
>
> - early_init_fdt_reserve_self();
> - early_init_fdt_scan_reserved_mem();
> -
> bootmem_init();
> +
> + reservation_init();
> +
> #ifdef CONFIG_PROC_VMCORE
> if (setup_elfcorehdr && setup_elfcorehdr_size) {
> printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
> @@ -832,9 +847,6 @@ static void __init arch_mem_init(char **cmdline_p)
> for_each_memblock(reserved, reg)
> if (reg->size != 0)
> reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
> -
> - reserve_bootmem_region(__pa_symbol(&__nosave_begin),
> - __pa_symbol(&__nosave_end)); /* Reserve for hibernation */
> }
>
> static void __init resource_init(void)
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Kdump/crashkernel memory regions should be reserved in the
> memblock allocator so they wouldn't be occupied by any further
> allocations.
>
> Signed-off-by: Serge Semin <[email protected]>
This looks good to me
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 72853e94c2c7..b2a5b89ae6b2 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -826,17 +826,15 @@ static void __init arch_mem_init(char **cmdline_p)
> if (setup_elfcorehdr && setup_elfcorehdr_size) {
> printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
> setup_elfcorehdr, setup_elfcorehdr_size);
> - reserve_bootmem(setup_elfcorehdr, setup_elfcorehdr_size,
> - BOOTMEM_DEFAULT);
> + memblock_reserve(setup_elfcorehdr, setup_elfcorehdr_size);
> }
> #endif
>
> mips_parse_crashkernel();
> #ifdef CONFIG_KEXEC
> if (crashk_res.start != crashk_res.end)
> - reserve_bootmem(crashk_res.start,
> - crashk_res.end - crashk_res.start + 1,
> - BOOTMEM_DEFAULT);
> + memblock_reserve(crashk_res.start,
> + crashk_res.end - crashk_res.start + 1);
> #endif
> device_tree_init();
> sparse_init();
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> If sparsemem is activated all sections with present pages must
> be accordingly marked after memblock is fully initialized.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index b2a5b89ae6b2..54302319ce1c 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -837,6 +837,11 @@ static void __init arch_mem_init(char **cmdline_p)
> crashk_res.end - crashk_res.start + 1);
> #endif
> device_tree_init();
> +#ifdef CONFIG_SPARSEMEM
> + for_each_memblock(memory, reg)
> + memory_present(0, memblock_region_memory_base_pfn(reg),
> + memblock_region_memory_end_pfn(reg));
> +#endif /* CONFIG_SPARSEMEM */
Existing code calls memory_present without CONFIG_SPARSEMEM, is it
really conditional on SPARSEMEM?
Thanks,
Matt
> sparse_init();
> plat_swiotlb_setup();
>
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> When all the main reservations are done the memblock regions
> can be dynamically resized. Additionally it would be useful to have
> memblock regions dumped on debug at this point.
>
> Signed-off-by: Serge Semin <[email protected]>
Looks good to me.
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 158a52c17e29..531a1471a2a4 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -846,6 +846,10 @@ static void __init arch_mem_init(char **cmdline_p)
> plat_swiotlb_setup();
>
> dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
> +
> + memblock_allow_resize();
> +
> + memblock_dump_all();
> }
>
> static void __init resource_init(void)
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Low memory can be tested at this point, since all the
> reservations have just been finished without much of
> additional allocations.
Perhaps something along the lines of:
"Allow an early memtest to be performed by calling early_memtest.
Testing at this point in the boot sequence should be safe since all
critical areas are now reserved and a minimum of allocations have been
done."
Otherwise, looks good to me.
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 531a1471a2a4..a0eac8160750 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -850,6 +850,8 @@ static void __init arch_mem_init(char **cmdline_p)
> memblock_allow_resize();
>
> memblock_dump_all();
> +
> + early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
> }
>
> static void __init resource_init(void)
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> CMA reserves it areas in the memblock allocator. Since we aren't
> using bootmem anymore, the reservations copying should be discarded.
>
> Signed-off-by: Serge Semin <[email protected]>
Looks good to me
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> ---
> arch/mips/kernel/setup.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 54302319ce1c..158a52c17e29 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -755,7 +755,7 @@ static void __init request_crashkernel(struct resource *res)
>
> static void __init arch_mem_init(char **cmdline_p)
> {
> - struct memblock_region *reg;
> + struct memblock_region *reg __maybe_unused;
> extern void plat_mem_setup(void);
>
> /* call board setup routine */
> @@ -846,10 +846,6 @@ static void __init arch_mem_init(char **cmdline_p)
> plat_swiotlb_setup();
>
> dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
> - /* Tell bootmem about cma reserved memblock section */
> - for_each_memblock(reserved, reg)
> - if (reg->size != 0)
> - reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
> }
>
> static void __init resource_init(void)
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> It is useful to have the kernel virtual memory layout printed
> at boot time so to have the full information about the booted
> kernel. In some cases it might be unsafe to have virtual
> addresses freely visible in logs, so the %pK format is used if
> one want to hide them.
Please could you update the commit message to reflect the guard on
CONFIG_DEBUG_KERNEL, and the %pK format is no longer used.
It would be better to have this patch either at the start of the series,
or the end, since it is not strictly involved in switching between
bootmem and memblock. It would be better to have a short sequence of
patches to make that transition, with patches to lead up to that or
clean up afterwards at the beginning and end of the series. It will just
make any future bisection easier.
Thanks,
Matt
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/mm/init.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> index 84b7b592b834..eec92194d4dc 100644
> --- a/arch/mips/mm/init.c
> +++ b/arch/mips/mm/init.c
> @@ -32,6 +32,7 @@
> #include <linux/kcore.h>
> #include <linux/export.h>
> #include <linux/initrd.h>
> +#include <linux/sizes.h>
>
> #include <asm/asm-offsets.h>
> #include <asm/bootinfo.h>
> @@ -60,6 +61,53 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
> EXPORT_SYMBOL(zero_page_mask);
>
> /*
> + * Print out the kernel virtual memory layout
> + */
> +#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
> +#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
> +#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
> +static void __init mem_print_kmap_info(void)
> +{
> +#ifdef CONFIG_DEBUG_KERNEL
> + pr_notice("Kernel virtual memory layout:\n"
> + " lowmem : 0x%px - 0x%px (%4ld MB)\n"
> + " .text : 0x%px - 0x%px (%4td kB)\n"
> + " .data : 0x%px - 0x%px (%4td kB)\n"
> + " .init : 0x%px - 0x%px (%4td kB)\n"
> + " .bss : 0x%px - 0x%px (%4td kB)\n"
> + " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
> +#ifdef CONFIG_HIGHMEM
> + " pkmap : 0x%px - 0x%px (%4ld MB)\n"
> +#endif
> + " fixmap : 0x%px - 0x%px (%4ld kB)\n",
> + MLM(PAGE_OFFSET, (unsigned long)high_memory),
> + MLK_ROUNDUP(_text, _etext),
> + MLK_ROUNDUP(_sdata, _edata),
> + MLK_ROUNDUP(__init_begin, __init_end),
> + MLK_ROUNDUP(__bss_start, __bss_stop),
> + MLM(VMALLOC_START, VMALLOC_END),
> +#ifdef CONFIG_HIGHMEM
> + MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
> +#endif
> + MLK(FIXADDR_START, FIXADDR_TOP));
> +
> + /* Check some fundamental inconsistencies. May add something else? */
> +#ifdef CONFIG_HIGHMEM
> + BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
> + BUG_ON(VMALLOC_END < (unsigned long)high_memory);
> + BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
> + BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
> + (unsigned long)high_memory);
> +#endif
> + BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
> + BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
> +#endif /* CONFIG_DEBUG_KERNEL */
> +}
> +#undef MLK
> +#undef MLM
> +#undef MLK_ROUNDUP
> +
> +/*
> * Not static inline because used by IP27 special magic initialization code
> */
> void setup_zero_pages(void)
> @@ -468,6 +516,7 @@ void __init mem_init(void)
> free_all_bootmem();
> setup_zero_pages(); /* Setup zeroed pages. */
> mem_init_free_highmem();
> + mem_print_kmap_info();
> mem_init_print_info(NULL);
>
> #ifdef CONFIG_64BIT
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Loongson64/3 runs its own code to initialize memory allocator in
> case of NUMA configuration is selected. So in order to move to the
> pure memblock utilization we discard the bootmem allocator usage
> and insert the memblock reservation method for kernel/addrspace_offset
> memory regions.
>
I don't have a NUMA Loongson to test with, but on a non-NUMA Loongson3
machine, tested as a part of the whole series, this works and looks good
to me.
Reviewed-by: Matt Redfearn <[email protected]>
Thanks,
Matt
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/loongson64/loongson-3/numa.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/arch/mips/loongson64/loongson-3/numa.c b/arch/mips/loongson64/loongson-3/numa.c
> index f17ef520799a..2f1ebf496c17 100644
> --- a/arch/mips/loongson64/loongson-3/numa.c
> +++ b/arch/mips/loongson64/loongson-3/numa.c
> @@ -180,7 +180,6 @@ static void __init szmem(unsigned int node)
>
> static void __init node_mem_init(unsigned int node)
> {
> - unsigned long bootmap_size;
> unsigned long node_addrspace_offset;
> unsigned long start_pfn, end_pfn, freepfn;
>
> @@ -197,26 +196,21 @@ static void __init node_mem_init(unsigned int node)
>
> __node_data[node] = prealloc__node_data + node;
>
> - NODE_DATA(node)->bdata = &bootmem_node_data[node];
> NODE_DATA(node)->node_start_pfn = start_pfn;
> NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
>
> - bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
> - start_pfn, end_pfn);
> free_bootmem_with_active_regions(node, end_pfn);
> if (node == 0) /* used by finalize_initrd() */
> max_low_pfn = end_pfn;
>
> - /* This is reserved for the kernel and bdata->node_bootmem_map */
> - reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
> - ((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
> - BOOTMEM_DEFAULT);
> + /* This is reserved for the kernel only */
> + if (node == 0)
> + memblock_reserve(start_pfn << PAGE_SHIFT,
> + ((freepfn - start_pfn) << PAGE_SHIFT));
>
> if (node == 0 && node_end_pfn(0) >= (0xffffffff >> PAGE_SHIFT)) {
> /* Reserve 0xfe000000~0xffffffff for RS780E integrated GPU */
> - reserve_bootmem_node(NODE_DATA(node),
> - (node_addrspace_offset | 0xfe000000),
> - 32 << 20, BOOTMEM_DEFAULT);
> + memblock_reserve(node_addrspace_offset | 0xfe000000, 32 << 20);
> }
>
> sparse_memory_present_with_active_regions(node);
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> SGI IP27 got its own code to set the early memory allocator up since it's
> NUMA-based system. So in order to be compatible with NO_BOOTMEM config
> we need to discard the bootmem allocator initialization and insert the
> memblock reservation method. Although in my opinion the code isn't
> working anyway since I couldn't find a place where prom_meminit() called
> and kernel memory isn't reserved. It must have been untested since the
> time the arch/mips/mips-boards/generic code was in the kernel.
I don't have access to an IP27, but the change looks sensible to me.
Reviewed-by: Matt Redfearn <[email protected]>
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/sgi-ip27/ip27-memory.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
> index 59133d0abc83..c480ee3eca96 100644
> --- a/arch/mips/sgi-ip27/ip27-memory.c
> +++ b/arch/mips/sgi-ip27/ip27-memory.c
> @@ -389,7 +389,6 @@ static void __init node_mem_init(cnodeid_t node)
> {
> unsigned long slot_firstpfn = slot_getbasepfn(node, 0);
> unsigned long slot_freepfn = node_getfirstfree(node);
> - unsigned long bootmap_size;
> unsigned long start_pfn, end_pfn;
>
> get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
> @@ -400,7 +399,6 @@ static void __init node_mem_init(cnodeid_t node)
> __node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
> memset(__node_data[node], 0, PAGE_SIZE);
>
> - NODE_DATA(node)->bdata = &bootmem_node_data[node];
> NODE_DATA(node)->node_start_pfn = start_pfn;
> NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
>
> @@ -409,12 +407,9 @@ static void __init node_mem_init(cnodeid_t node)
> slot_freepfn += PFN_UP(sizeof(struct pglist_data) +
> sizeof(struct hub_data));
>
> - bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
> - start_pfn, end_pfn);
> free_bootmem_with_active_regions(node, end_pfn);
> - reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
> - ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
> - BOOTMEM_DEFAULT);
> + memblock_reserve(slot_firstpfn << PAGE_SHIFT,
> + ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT));
How about PFN_PHYS()? In fact, that could be used throughout the series
to tidy up some of the shifting by PAGE_SIZE.
Thanks,
Matt
> sparse_memory_present_with_active_regions(node);
> }
>
>
Hi Serge,
On 02/02/18 03:54, Serge Semin wrote:
> Memblock allocator can be successfully used from now for early
> memory management
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 350a990fc719..434f756e03e9 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -4,7 +4,6 @@ config MIPS
> default y
> select ARCH_BINFMT_ELF_STATE
> select ARCH_CLOCKSOURCE_DATA
> - select ARCH_DISCARD_MEMBLOCK
> select ARCH_HAS_ELF_RANDOMIZE
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_MIGHT_HAVE_PC_PARPORT
> @@ -57,6 +56,7 @@ config MIPS
> select HAVE_IRQ_TIME_ACCOUNTING
> select HAVE_KPROBES
> select HAVE_KRETPROBES
> + select NO_BOOTMEM
Please maintain the alphabetical order in config MIPS. It makes
conflicts easier to manage.
Thanks,
Matt
> select HAVE_MEMBLOCK
> select HAVE_MEMBLOCK_NODE_MAP
> select HAVE_MOD_ARCH_SPECIFIC
>