2019-04-23 22:51:22

by Serge Semin

Subject: [PATCH 00/12] mips: Post-bootmem-memblock transition fixes

My first attempt at making the MIPS subsystem use the memblock early memory
allocator was done a few years ago, as a patchset of 21 patches [1]. It turned
out to be too complicated, so I reworked it into a series with a smaller
number of changes [2]. After a short review round a v2 of that series was
posted as well, but then my spare time ran out and I couldn't keep maintaining
and resubmitting the patchset.

A year later Mike Rapoport took over this task and posted a small patch
which essentially removed the bootmem allocator from the MIPS subsystem [3].
That single small patch did most of what my huge patchsets had been intended
for in the first place (though it lacked a few fixes). Mike then went further
and removed the bootmem allocator from the kernel entirely, so that all
subsystems now use a single early memory allocator. This significantly
simplified the platform code and removed a deprecated subsystem with
duplicated functionality. Million credits to Mike for this.

Getting back to the MIPS subsystem and its memblock usage. Even though
the patch suggested by Mike [3] fixed most of the problems caused by
switching to the memblock allocator, a few were left unaddressed. First,
the PFN calculation algorithm wasn't fully refactored: the reserved-memory
declaration loop was left untouched even though it is clearly
over-complicated for the new initialization procedure. Second, the MIPS
platform code reserves the whole space below the kernel start address,
which doesn't seem right since the kernel can be located much higher than
where the memory space really starts. Third, when CMA is enabled it already
reserves its memory regions by means of memblock, so the bootmem-init code
doesn't need to do it again. Fourth, at the early platform initialization
stage none of the bootmem-leftover methods can be called, since no memory
page mapping exists at that point, so the __nosave* region must be reserved
by means of the memblock allocator. Finally, aside from the memblock
allocator introduction, my earlier patchsets included a series of useful
additions such as "no-map" property support for "reserved-memory" dts
nodes, a memblock_dump_all() call after early memory allocator
initialization for debugging, an early low-memory test procedure, a kernel
memory layout printout at boot time, and so on. All of these fixes and
additions are introduced in this new patchset. Please review. I hope this
time I'll be more responsive and see this series through until it is
merged.

[1] https://lkml.org/lkml/2016/12/18/195
[2] https://lkml.org/lkml/2018/1/17/1201
[3] https://lkml.org/lkml/2018/9/10/302

NOTE: I added a few "Reviewed-by: Matt Redfearn <[email protected]>" tags
since some patches of this series have been picked up from my earlier
patchsets, which Matt had already reviewed. I didn't add the tag to patches
which are either new or only partially ported.

Serge Semin (12):
mips: Make sure kernel .bss exists in boot mem pool
mips: Discard rudiments from bootmem_init
mips: Combine memblock init and memory reservation loops
mips: Reserve memory for the kernel image resources
mips: Discard post-CMA-init foreach loop
mips: Use memblock to reserve the __nosave memory range
mips: Add reserve-nomap memory type support
mips: Dump memblock regions for debugging
mips: Perform early low memory test
mips: Print the kernel virtual mem layout on debugging
mips: Make sure dt memory regions are valid
mips: Enable OF_RESERVED_MEM config

arch/mips/Kconfig | 1 +
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 18 ++++-
arch/mips/kernel/setup.c | 129 +++++++++----------------------
arch/mips/mm/init.c | 49 ++++++++++++
5 files changed, 102 insertions(+), 96 deletions(-)

--
2.21.0


2019-04-23 22:50:42

by Serge Semin

Subject: [PATCH 01/12] mips: Make sure kernel .bss exists in boot mem pool

The current MIPS platform code makes sure the kernel .text, .data and .init
sections are added to the boot memory map right after the arch-specific
memory setup method has been executed. But for some reason it skips the
kernel .bss section, which definitely should be in the boot mem pool as
well. Let's fix this by simply adding the space between __bss_start and
__bss_stop.

Reviewed-by: Matt Redfearn <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 8d1dc6c71173..0ee033c44116 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -809,6 +809,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
BOOT_MEM_INIT_RAM);
+ arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+ BOOT_MEM_RAM);

pr_info("Determined physical RAM map:\n");
print_memory_map();
--
2.21.0

2019-04-23 22:50:51

by Serge Semin

Subject: [PATCH 06/12] mips: Use memblock to reserve the __nosave memory range

Originally, before legacy bootmem was removed, the memory for this range was
correctly reserved by reserve_bootmem_region(). But now that memblock has
been selected for early memory allocation, that function may only be used
after paging is fully initialized (as is done by the memblock_free_all()
function). So calling it from the arch_mem_init() method is prone to errors;
at this stage the memory needs to be reserved via the memblock allocator
instead.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2ae6b02b948f..3a5140943f54 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -814,8 +814,9 @@ static void __init arch_mem_init(char **cmdline_p)

dma_contiguous_reserve(PFN_PHYS(max_low_pfn));

- reserve_bootmem_region(__pa_symbol(&__nosave_begin),
- __pa_symbol(&__nosave_end)); /* Reserve for hibernation */
+ /* Reserve for hibernation. */
+ memblock_reserve(__pa_symbol(&__nosave_begin),
+ __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
}

static void __init resource_init(void)
--
2.21.0

2019-04-23 22:50:51

by Serge Semin

Subject: [PATCH 07/12] mips: Add reserve-nomap memory type support

It might be necessary to prevent the creation of a virtual mapping for a
requested memory region; for instance, the "no-map" device-tree property
indicates exactly this. In that case we need to not merely reserve the
specified region, but make it look as if it doesn't exist in the memory
space at all, by completely removing the range from memblock. This mirrors
what the default early_init_dt_reserve_memory_arch() method does.
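
For reference, the generic weak fallback in drivers/of/fdt.c handles the
nomap flag roughly like this (a sketch from memory rather than the exact
upstream code):

int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base,
					phys_addr_t size, bool nomap)
{
	/* "no-map" regions are dropped from memblock entirely... */
	if (nomap)
		return memblock_remove(base, size);

	/* ...while ordinary reservations stay visible but reserved. */
	return memblock_reserve(base, size);
}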

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/include/asm/bootinfo.h | 1 +
arch/mips/kernel/prom.c | 4 +++-
arch/mips/kernel/setup.c | 8 ++++++++
3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index a301a8f4bc66..235bc2f52113 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -92,6 +92,7 @@ extern unsigned long mips_machtype;
#define BOOT_MEM_ROM_DATA 2
#define BOOT_MEM_RESERVED 3
#define BOOT_MEM_INIT_RAM 4
+#define BOOT_MEM_NOMAP 5

/*
* A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 93b8e0b4332f..437a174e3ef9 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -47,7 +47,9 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
{
- add_memory_region(base, size, BOOT_MEM_RESERVED);
+ add_memory_region(base, size,
+ nomap ? BOOT_MEM_NOMAP : BOOT_MEM_RESERVED);
+
return 0;
}

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 3a5140943f54..2a1b2e7a1bc9 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -178,6 +178,7 @@ static bool __init __maybe_unused memory_region_available(phys_addr_t start,
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+ case BOOT_MEM_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -213,6 +214,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+ case BOOT_MEM_NOMAP:
+ printk(KERN_CONT "(nomap)\n");
+ break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -487,6 +491,9 @@ static void __init bootmem_init(void)
switch (boot_mem_map.map[i].type) {
case BOOT_MEM_RAM:
break;
+ case BOOT_MEM_NOMAP: /* Discard the range from the system. */
+ memblock_remove(PFN_PHYS(start), PFN_PHYS(end - start));
+ continue;
default: /* Reserve the rest of the memory types at boot time */
memblock_reserve(PFN_PHYS(start), PFN_PHYS(end - start));
break;
@@ -861,6 +868,7 @@ static void __init resource_init(void)
res->flags |= IORESOURCE_SYSRAM;
break;
case BOOT_MEM_RESERVED:
+ case BOOT_MEM_NOMAP:
default:
res->name = "reserved";
}
--
2.21.0

2019-04-23 22:50:58

by Serge Semin

Subject: [PATCH 10/12] mips: Print the kernel virtual mem layout on debugging

It is useful, at least for debugging, to have the kernel virtual memory
layout printed at boot time, so that the full information about the booted
kernel is available. Make the printing optional and available only when the
DEBUG_KERNEL config is enabled, so as not to leak sensitive kernel
information.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/mm/init.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index bbb196ad5f26..c338bbd03b2a 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -31,6 +31,7 @@
#include <linux/gfp.h>
#include <linux/kcore.h>
#include <linux/initrd.h>
+#include <linux/sizes.h>

#include <asm/bootinfo.h>
#include <asm/cachectl.h>
@@ -56,6 +57,53 @@ unsigned long empty_zero_page, zero_page_mask;
EXPORT_SYMBOL_GPL(empty_zero_page);
EXPORT_SYMBOL(zero_page_mask);

+/*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init mem_print_kmap_info(void)
+{
+#ifdef CONFIG_DEBUG_KERNEL
+ pr_notice("Kernel virtual memory layout:\n"
+ " lowmem : 0x%px - 0x%px (%4ld MB)\n"
+ " .text : 0x%px - 0x%px (%4td kB)\n"
+ " .data : 0x%px - 0x%px (%4td kB)\n"
+ " .init : 0x%px - 0x%px (%4td kB)\n"
+ " .bss : 0x%px - 0x%px (%4td kB)\n"
+ " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ " pkmap : 0x%px - 0x%px (%4ld MB)\n"
+#endif
+ " fixmap : 0x%px - 0x%px (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+ /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+ BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+ BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+ BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+ BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+ (unsigned long)high_memory);
+#endif
+ BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+ BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+#endif /* CONFIG_DEBUG_KERNEL */
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
/*
* Not static inline because used by IP27 special magic initialization code
*/
@@ -479,6 +527,7 @@ void __init mem_init(void)
setup_zero_pages(); /* Setup zeroed pages. */
mem_init_free_highmem();
mem_init_print_info(NULL);
+ mem_print_kmap_info();

#ifdef CONFIG_64BIT
if ((unsigned long) &_text > (unsigned long) CKSEG0)
--
2.21.0

2019-04-23 22:51:10

by Serge Semin

Subject: [PATCH 09/12] mips: Perform early low memory test

The memblock subsystem provides a method to optionally test a passed memory
region, if this was requested via a special kernel boot argument. Let's add
the call at the bottom of the arch_mem_init() method. Testing at this point
in the boot sequence should be safe, since all critical areas are now
reserved and only a minimum of allocations have been done.

Reviewed-by: Matt Redfearn <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index ca493fdf69b0..fbd216b4e929 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -826,6 +826,8 @@ static void __init arch_mem_init(char **cmdline_p)
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));

memblock_dump_all();
+
+ early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
}

static void __init resource_init(void)
--
2.21.0

2019-04-23 22:51:12

by Serge Semin

Subject: [PATCH 12/12] mips: Enable OF_RESERVED_MEM config

Since the memblock patchset was introduced, reserved-memory nodes declared
in dt files have been supported: these nodes are parsed during the arch
setup procedure when the early_init_fdt_scan_reserved_mem() method is
called. But some features, such as private reserved memory pools, aren't
available at the moment, since OF_RESERVED_MEM isn't enabled for the MIPS
platform. Let's fix it by enabling the config.

However, due to the arch-specific boot mem_map container being used, we need
to manually call the fdt_init_reserved_mem() method after all the available
and reserved memory has been moved to memblock. Calling the function before
bootmem_init() fails due to the lack of any memblock memory regions to
allocate from at that stage.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/Kconfig | 1 +
arch/mips/kernel/setup.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 4a5f5b0ee9a9..0bf9e89e4023 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2988,6 +2988,7 @@ config USE_OF
bool
select OF
select OF_EARLY_FLATTREE
+ select OF_RESERVED_MEM
select IRQ_DOMAIN

config UHI_BOOT
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index fbd216b4e929..ab349d2381c3 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -27,6 +27,7 @@
#include <linux/dma-contiguous.h>
#include <linux/decompress/generic.h>
#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>

#include <asm/addrspace.h>
#include <asm/bootinfo.h>
@@ -825,6 +826,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_reserve(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));

+ fdt_init_reserved_mem();
+
memblock_dump_all();

early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
--
2.21.0

2019-04-23 22:51:21

by Serge Semin

Subject: [PATCH 11/12] mips: Make sure dt memory regions are valid

There are situations when memory regions coming from a dts may be too big
for the platform physical address space. This especially concerns
XPA-capable systems: the bootloader may detect more than 4GB of available
memory and pass it to the kernel via the dts memory node, while the kernel
is built without XPA support. In this case the region may either simply be
truncated by the add_memory_region() method or by the u64 to phys_addr_t
type cast, but in the worst case the method may even drop the memory region
entirely if it exceeds PHYS_ADDR_MAX. So let's make sure the memory regions
retrieved from the dts are valid and, if one of them isn't, manually
truncate it with a warning printed out.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/prom.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 437a174e3ef9..28bf01961bb2 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,19 @@ char *mips_get_machine_name(void)
#ifdef CONFIG_USE_OF
void __init early_init_dt_add_memory_arch(u64 base, u64 size)
{
- return add_memory_region(base, size, BOOT_MEM_RAM);
+ if (base >= PHYS_ADDR_MAX) {
+ pr_warn("Trying to add an invalid memory region, skipped\n");
+ return;
+ }
+
+ /* Truncate the passed memory region instead of type casting */
+ if (base + size - 1 >= PHYS_ADDR_MAX || base + size < base) {
+ pr_warn("Truncate memory region %llx @ %llx to size %llx\n",
+ size, base, PHYS_ADDR_MAX - base);
+ size = PHYS_ADDR_MAX - base;
+ }
+
+ add_memory_region(base, size, BOOT_MEM_RAM);
}

int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
--
2.21.0

2019-04-23 22:51:59

by Serge Semin

Subject: [PATCH 04/12] mips: Reserve memory for the kernel image resources

The reserved_end variable had been used by the bootmem_init() code to find
the lowest limit of memory available for the memmap blob. The original code
just tried to find free memory space above where the kernel was placed. That
limitation seems justified for the memmap region search, but I can't see any
obvious reason to reserve the unused space below the kernel, seeing that
some platforms place it much higher than the standard 1MB. Moreover, the
RELOCATABLE config allows the kernel to be loaded at any memory address. So
let's reserve only the memory occupied by the kernel itself, leaving the
region below free for allocations. Having done this, we can discard the code
freeing the space between the kernel _text and VMLINUX_LOAD_ADDRESS symbols,
since it is going to be free anyway (unless marked as reserved by a
platform).

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 30 +++---------------------------
1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 185e0e42e009..f71a7d32a687 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -371,7 +371,6 @@ static void __init bootmem_init(void)

static void __init bootmem_init(void)
{
- unsigned long reserved_end;
phys_addr_t ramstart = PHYS_ADDR_MAX;
int i;

@@ -382,10 +381,10 @@ static void __init bootmem_init(void)
* will reserve the area used for the initrd.
*/
init_initrd();
- reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));

- memblock_reserve(PHYS_OFFSET,
- (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
+ /* Reserve memory occupied by kernel. */
+ memblock_reserve(__pa_symbol(&_text),
+ __pa_symbol(&_end) - __pa_symbol(&_text));

/*
* max_low_pfn is not a number of pages. The number of pages
@@ -501,29 +500,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}

-#ifdef CONFIG_RELOCATABLE
- /*
- * The kernel reserves all memory below its _end symbol as bootmem,
- * but the kernel may now be at a much higher address. The memory
- * between the original and new locations may be returned to the system.
- */
- if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
- unsigned long offset;
- extern void show_kernel_relocation(const char *level);
-
- offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
- memblock_free(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
-
-#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
- /*
- * This information is necessary when debugging the kernel
- * But is a security vulnerability otherwise!
- */
- show_kernel_relocation(KERN_INFO);
-#endif
- }
-#endif
-
/*
* Reserve initrd memory if needed.
*/
--
2.21.0

2019-04-23 22:52:00

by Serge Semin

Subject: [PATCH 02/12] mips: Discard rudiments from bootmem_init

There is pointless code left in the bootmem_init() method since the bootmem
allocator removal. The first part resides in the PFN ranges calculation
loop: the conditional expressions and the continue statement are useless
there, since nothing is done after them. The second part is in the RAM
ranges installation loop: we can simplify the cascade of conditions a bit
without redefining much of the logic, so as to reduce the code length. In
particular, the end boundary value can be checked against max_low_pfn after
its possible reduction.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 25 +++++--------------------
1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0ee033c44116..53d93a727d1a 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -394,10 +394,7 @@ static void __init bootmem_init(void)
min_low_pfn = ~0UL;
max_low_pfn = 0;

- /*
- * Find the highest page frame number we have available
- * and the lowest used RAM address
- */
+ /* Find the highest and lowest page frame numbers we have available. */
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;

@@ -427,13 +424,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
- if (end <= reserved_end)
- continue;
-#ifdef CONFIG_BLK_DEV_INITRD
- /* Skip zones before initrd and initrd itself */
- if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
- continue;
-#endif
}

if (min_low_pfn >= max_low_pfn)
@@ -474,6 +464,7 @@ static void __init bootmem_init(void)
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}

+ /* Install all valid RAM ranges to the memblock memory region */
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;

@@ -481,21 +472,15 @@ static void __init bootmem_init(void)
end = PFN_DOWN(boot_mem_map.map[i].addr
+ boot_mem_map.map[i].size);

- if (start <= min_low_pfn)
+ if (start < min_low_pfn)
start = min_low_pfn;
- if (start >= end)
- continue;
-
#ifndef CONFIG_HIGHMEM
+ /* Ignore highmem regions if highmem is unsupported */
if (end > max_low_pfn)
end = max_low_pfn;
-
- /*
- * ... finally, is the area going away?
- */
+#endif
if (end <= start)
continue;
-#endif

memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
--
2.21.0

2019-04-23 22:52:16

by Serge Semin

Subject: [PATCH 08/12] mips: Dump memblock regions for debugging

It is useful to have the whole memblock memory space printed to the console
when the basic memblock initialization is done. This can be performed by the
ready-to-use memblock_dump_all() method, which prints the available and
reserved memory spaces if the MEMBLOCK_DEBUG config is enabled. Let's call
it at the very end of the arch_mem_init() function, when all memblock memory
and reserved regions are defined, but before any serious allocation is
performed.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2a1b2e7a1bc9..ca493fdf69b0 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -824,6 +824,8 @@ static void __init arch_mem_init(char **cmdline_p)
/* Reserve for hibernation. */
memblock_reserve(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
+
+ memblock_dump_all();
}

static void __init resource_init(void)
--
2.21.0

2019-04-23 22:52:33

by Serge Semin

Subject: [PATCH 05/12] mips: Discard post-CMA-init foreach loop

The loop is really pointless, since it walks over the memblock-reserved
memory regions and marks them as reserved in memblock again. Before bootmem
was removed from the kernel, this loop was used to propagate the memory
reserved by CMA into the legacy bootmem allocator. But now the early memory
allocator is memblock, which is what CMA uses for its reservations in the
first place, so we don't need any such propagation anymore.

Reviewed-by: Matt Redfearn <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index f71a7d32a687..2ae6b02b948f 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -708,7 +708,6 @@ static void __init request_crashkernel(struct resource *res)
*/
static void __init arch_mem_init(char **cmdline_p)
{
- struct memblock_region *reg;
extern void plat_mem_setup(void);

/*
@@ -814,10 +813,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();

dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
- /* Tell bootmem about cma reserved memblock section */
- for_each_memblock(reserved, reg)
- if (reg->size != 0)
- memblock_reserve(reg->base, reg->size);

reserve_bootmem_region(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end)); /* Reserve for hibernation */
--
2.21.0

2019-04-23 22:53:24

by Serge Semin

Subject: [PATCH 03/12] mips: Combine memblock init and memory reservation loops

Before bootmem was completely removed from the kernel, the last loop in
bootmem_init() was used to reserve the correspondingly marked regions,
initialize sparsemem sections and free the low memory pages, which would
then be used for early memory allocations. After the bootmem removal
patchset was merged, the loop was left doing only the first two things, but
it didn't do them quite well.

First of all, it leaves the BOOT_MEM_INIT_RAM memory types unreserved, which
is definitely a bug (although not a noticeable one, since that type is used
only for the kernel region, which is fully marked as reserved). Secondly,
the reservation is supposed to be done for any memory, including highmem
(I couldn't figure out why highmem was ignored in the first place, since
platforms and dts files may declare any memory region for reservation).
Thirdly, the reserved_end variable was used here to avoid accidentally
freeing memory occupied by the kernel. Since we already reserve the
corresponding region earlier in this method, there is no need to use the
variable here anymore. Fourthly, sparsemem should be aware of all the memory
types in the system, including ROM_DATA, even if such a region is going to
stay reserved for the whole system uptime. Finally, with all these points
fixed, the memory reservation loop can be freely merged into the memory
installation loop, as is done in this patch.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 48 ++++++----------------------------------
1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 53d93a727d1a..185e0e42e009 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -483,55 +483,21 @@ static void __init bootmem_init(void)
continue;

memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
- }
-
- /*
- * Register fully available low RAM pages with the bootmem allocator.
- */
- for (i = 0; i < boot_mem_map.nr_map; i++) {
- unsigned long start, end, size;
-
- start = PFN_UP(boot_mem_map.map[i].addr);
- end = PFN_DOWN(boot_mem_map.map[i].addr
- + boot_mem_map.map[i].size);

- /*
- * Reserve usable memory.
- */
+ /* Reserve any memory except the ordinary RAM ranges. */
switch (boot_mem_map.map[i].type) {
case BOOT_MEM_RAM:
break;
- case BOOT_MEM_INIT_RAM:
- memory_present(0, start, end);
- continue;
- default:
- /* Not usable memory */
- if (start > min_low_pfn && end < max_low_pfn)
- memblock_reserve(boot_mem_map.map[i].addr,
- boot_mem_map.map[i].size);
-
- continue;
+ default: /* Reserve the rest of the memory types at boot time */
+ memblock_reserve(PFN_PHYS(start), PFN_PHYS(end - start));
+ break;
}

/*
- * We are rounding up the start address of usable memory
- * and at the end of the usable range downwards.
- */
- if (start >= max_low_pfn)
- continue;
- if (start < reserved_end)
- start = reserved_end;
- if (end > max_low_pfn)
- end = max_low_pfn;
-
- /*
- * ... finally, is the area going away?
+ * In any case the added to the memblock memory regions
+ * (highmem/lowmem, available/reserved, etc) are considered
+ * as present, so inform sparsemem about them.
*/
- if (end <= start)
- continue;
- size = end - start;
-
- /* Register lowmem ranges */
memory_present(0, start, end);
}

--
2.21.0

2019-04-24 06:21:32

by Christoph Hellwig

Subject: Re: [PATCH 12/12] mips: Enable OF_RESERVED_MEM config

On Wed, Apr 24, 2019 at 01:47:48AM +0300, Serge Semin wrote:
> Since memblock-patchset was introduced the reserved-memory nodes are
> supported being declared in dt-files. So these nodes are actually parsed
> during the arch setup procedure when the early_init_fdt_scan_reserved_mem()
> method is called. But some of the features like private reserved memory
> pools aren't available at the moment, since OF_RESERVED_MEM isn't enabled
> for the MIPS platform. Lets fix it by enabling the config.
>
> But due to the arch-specific boot mem_map container utilization we need
> to manually call the fdt_init_reserved_mem() method after all the available
> and reserved memory has been moved to memblock. The function call performed
> before bootmem_init() fails due to the lack of any memblock memory regions
> to allocate from at that stage.

Architectures should not select this symbol directly; it will be
automatically enabled if either DMA_DECLARE_COHERENT or DMA_CMA is enabled,
which are required for the actual underlying memory allocators.

2019-04-24 08:35:57

by Serge Semin

Subject: Re: [PATCH 12/12] mips: Enable OF_RESERVED_MEM config

On Tue, Apr 23, 2019 at 11:17:50PM -0700, Christoph Hellwig wrote:

Hello Christoph

> On Wed, Apr 24, 2019 at 01:47:48AM +0300, Serge Semin wrote:
> > Since memblock-patchset was introduced the reserved-memory nodes are
> > supported being declared in dt-files. So these nodes are actually parsed
> > during the arch setup procedure when the early_init_fdt_scan_reserved_mem()
> > method is called. But some of the features like private reserved memory
> > pools aren't available at the moment, since OF_RESERVED_MEM isn't enabled
> > for the MIPS platform. Lets fix it by enabling the config.
> >
> > But due to the arch-specific boot mem_map container utilization we need
> > to manually call the fdt_init_reserved_mem() method after all the available
> > and reserved memory has been moved to memblock. The function call performed
> > before bootmem_init() fails due to the lack of any memblock memory regions
> > to allocate from at that stage.
>
> Architectures should not select this symbol directly, it will be
> automatically enabled if either DMA_DECLARE_COHERENT or DMA_CMA
> are enabled, which are required for the actual underlying memory
> allocators.

Thanks for the comment. I should have checked this before porting the patch
from kernel 4.9, where this symbol was selected by the platforms. I'll
remove the forced selection of the config in the next patchset revision.

-Sergey

2019-04-24 14:03:03

by Mike Rapoport

Subject: Re: [PATCH 08/12] mips: Dump memblock regions for debugging

On Wed, Apr 24, 2019 at 01:47:44AM +0300, Serge Semin wrote:
> It is useful to have the whole memblock memory space printed to console
> when basic memlock initializations are done. It can be performed by
> ready-to-use method memblock_dump_all(), which prints the available
> and reserved memory spaces if MEMBLOCK_DEBUG config is enabled.

Nit: there's no MEMBLOCK_DEBUG config option, but rather the memblock=debug
command line parameter ;-)
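
For example (just an illustrative command line, the console argument is
made up), the dump is only produced when booting with something like:

    console=ttyS0,115200 memblock=debug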

> Lets call it at the very end of arch_mem_init() function, when
> all memblock memory and reserved regions are defined, but before
> any serious allocation is performed.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/kernel/setup.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 2a1b2e7a1bc9..ca493fdf69b0 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -824,6 +824,8 @@ static void __init arch_mem_init(char **cmdline_p)
> /* Reserve for hibernation. */
> memblock_reserve(__pa_symbol(&__nosave_begin),
> __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
> +
> + memblock_dump_all();
> }
>
> static void __init resource_init(void)
> --
> 2.21.0
>

--
Sincerely yours,
Mike.

2019-04-24 14:04:14

by Mike Rapoport

Subject: Re: [PATCH 10/12] mips: Print the kernel virtual mem layout on debugging

On Wed, Apr 24, 2019 at 01:47:46AM +0300, Serge Semin wrote:
> It is useful at least for debugging to have the kernel virtual
> memory layout printed at boot time so to have the full information
> about the booted kernel. Make the printing optional and available
> only when DEBUG_KERNEL config is enabled so not to leak a sensitive
> kernel information.
>
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/mm/init.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> index bbb196ad5f26..c338bbd03b2a 100644
> --- a/arch/mips/mm/init.c
> +++ b/arch/mips/mm/init.c
> @@ -31,6 +31,7 @@
> #include <linux/gfp.h>
> #include <linux/kcore.h>
> #include <linux/initrd.h>
> +#include <linux/sizes.h>
>
> #include <asm/bootinfo.h>
> #include <asm/cachectl.h>
> @@ -56,6 +57,53 @@ unsigned long empty_zero_page, zero_page_mask;
> EXPORT_SYMBOL_GPL(empty_zero_page);
> EXPORT_SYMBOL(zero_page_mask);
>
> +/*
> + * Print out the kernel virtual memory layout
> + */
> +#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
> +#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
> +#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
> +static void __init mem_print_kmap_info(void)
> +{
> +#ifdef CONFIG_DEBUG_KERNEL

Maybe CONFIG_DEBUG_VM?

> + pr_notice("Kernel virtual memory layout:\n"
> + " lowmem : 0x%px - 0x%px (%4ld MB)\n"
> + " .text : 0x%px - 0x%px (%4td kB)\n"
> + " .data : 0x%px - 0x%px (%4td kB)\n"
> + " .init : 0x%px - 0x%px (%4td kB)\n"
> + " .bss : 0x%px - 0x%px (%4td kB)\n"
> + " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
> +#ifdef CONFIG_HIGHMEM
> + " pkmap : 0x%px - 0x%px (%4ld MB)\n"
> +#endif
> + " fixmap : 0x%px - 0x%px (%4ld kB)\n",
> + MLM(PAGE_OFFSET, (unsigned long)high_memory),
> + MLK_ROUNDUP(_text, _etext),
> + MLK_ROUNDUP(_sdata, _edata),
> + MLK_ROUNDUP(__init_begin, __init_end),
> + MLK_ROUNDUP(__bss_start, __bss_stop),
> + MLM(VMALLOC_START, VMALLOC_END),
> +#ifdef CONFIG_HIGHMEM
> + MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
> +#endif
> + MLK(FIXADDR_START, FIXADDR_TOP));
> +
> + /* Check some fundamental inconsistencies. May add something else? */
> +#ifdef CONFIG_HIGHMEM
> + BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
> + BUG_ON(VMALLOC_END < (unsigned long)high_memory);
> + BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
> + BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
> + (unsigned long)high_memory);
> +#endif
> + BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
> + BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
> +#endif /* CONFIG_DEBUG_KERNEL */
> +}
> +#undef MLK
> +#undef MLM
> +#undef MLK_ROUNDUP
> +
> /*
> * Not static inline because used by IP27 special magic initialization code
> */
> @@ -479,6 +527,7 @@ void __init mem_init(void)
> setup_zero_pages(); /* Setup zeroed pages. */
> mem_init_free_highmem();
> mem_init_print_info(NULL);
> + mem_print_kmap_info();
>
> #ifdef CONFIG_64BIT
> if ((unsigned long) &_text > (unsigned long) CKSEG0)
> --
> 2.21.0
>

--
Sincerely yours,
Mike.

2019-04-24 14:37:12

by Serge Semin

Subject: Re: [PATCH 10/12] mips: Print the kernel virtual mem layout on debugging

On Wed, Apr 24, 2019 at 04:47:11PM +0300, Mike Rapoport wrote:
> On Wed, Apr 24, 2019 at 01:47:46AM +0300, Serge Semin wrote:
> > It is useful at least for debugging to have the kernel virtual
> > memory layout printed at boot time so to have the full information
> > about the booted kernel. Make the printing optional and available
> > only when DEBUG_KERNEL config is enabled so not to leak a sensitive
> > kernel information.
> >
> > Signed-off-by: Serge Semin <[email protected]>
> > ---
> > arch/mips/mm/init.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 49 insertions(+)
> >
> > diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> > index bbb196ad5f26..c338bbd03b2a 100644
> > --- a/arch/mips/mm/init.c
> > +++ b/arch/mips/mm/init.c
> > @@ -31,6 +31,7 @@
> > #include <linux/gfp.h>
> > #include <linux/kcore.h>
> > #include <linux/initrd.h>
> > +#include <linux/sizes.h>
> >
> > #include <asm/bootinfo.h>
> > #include <asm/cachectl.h>
> > @@ -56,6 +57,53 @@ unsigned long empty_zero_page, zero_page_mask;
> > EXPORT_SYMBOL_GPL(empty_zero_page);
> > EXPORT_SYMBOL(zero_page_mask);
> >
> > +/*
> > + * Print out the kernel virtual memory layout
> > + */
> > +#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
> > +#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
> > +#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
> > +static void __init mem_print_kmap_info(void)
> > +{
> > +#ifdef CONFIG_DEBUG_KERNEL
>
> Maybe CONFIG_DEBUG_VM?
>

Last time I posted this patch, Matt suggested using CONFIG_DEBUG_KERNEL [1].
On the other hand, the arm platform prints this table unconditionally, but
uses the %lx format for low-memory ranges and %p for kernel segments. I'm
actually more inclined towards the arm solution. But if choosing between
DEBUG_KERNEL and DEBUG_VM, I'd stick with DEBUG_KERNEL, since the DEBUG_VM
help text states it is intended for extended virtual-memory checks ("Enable
this to turn on extended checks in the virtual-memory system that may impact
performance") rather than for memory layout printing.

-Sergey

[1] https://lkml.org/lkml/2018/2/13/494

> > + pr_notice("Kernel virtual memory layout:\n"
> > + " lowmem : 0x%px - 0x%px (%4ld MB)\n"
> > + " .text : 0x%px - 0x%px (%4td kB)\n"
> > + " .data : 0x%px - 0x%px (%4td kB)\n"
> > + " .init : 0x%px - 0x%px (%4td kB)\n"
> > + " .bss : 0x%px - 0x%px (%4td kB)\n"
> > + " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
> > +#ifdef CONFIG_HIGHMEM
> > + " pkmap : 0x%px - 0x%px (%4ld MB)\n"
> > +#endif
> > + " fixmap : 0x%px - 0x%px (%4ld kB)\n",
> > + MLM(PAGE_OFFSET, (unsigned long)high_memory),
> > + MLK_ROUNDUP(_text, _etext),
> > + MLK_ROUNDUP(_sdata, _edata),
> > + MLK_ROUNDUP(__init_begin, __init_end),
> > + MLK_ROUNDUP(__bss_start, __bss_stop),
> > + MLM(VMALLOC_START, VMALLOC_END),
> > +#ifdef CONFIG_HIGHMEM
> > + MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
> > +#endif
> > + MLK(FIXADDR_START, FIXADDR_TOP));
> > +
> > + /* Check some fundamental inconsistencies. May add something else? */
> > +#ifdef CONFIG_HIGHMEM
> > + BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
> > + BUG_ON(VMALLOC_END < (unsigned long)high_memory);
> > + BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
> > + BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
> > + (unsigned long)high_memory);
> > +#endif
> > + BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
> > + BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
> > +#endif /* CONFIG_DEBUG_KERNEL */
> > +}
> > +#undef MLK
> > +#undef MLM
> > +#undef MLK_ROUNDUP
> > +
> > /*
> > * Not static inline because used by IP27 special magic initialization code
> > */
> > @@ -479,6 +527,7 @@ void __init mem_init(void)
> > setup_zero_pages(); /* Setup zeroed pages. */
> > mem_init_free_highmem();
> > mem_init_print_info(NULL);
> > + mem_print_kmap_info();
> >
> > #ifdef CONFIG_64BIT
> > if ((unsigned long) &_text > (unsigned long) CKSEG0)
> > --
> > 2.21.0
> >
>
> --
> Sincerely yours,
> Mike.
>

2019-04-24 19:44:50

by Serge Semin

Subject: Re: [PATCH 08/12] mips: Dump memblock regions for debugging

On Wed, Apr 24, 2019 at 04:45:47PM +0300, Mike Rapoport wrote:
> On Wed, Apr 24, 2019 at 01:47:44AM +0300, Serge Semin wrote:
> > It is useful to have the whole memblock memory space printed to console
> > when basic memlock initializations are done. It can be performed by
> > ready-to-use method memblock_dump_all(), which prints the available
> > and reserved memory spaces if MEMBLOCK_DEBUG config is enabled.
>
> Nit: there's no MEMBLOCK_DEBUG config option but rather memblock=debug
> command line parameter ;-)
>

Right. Thanks. I'll reword the message in the next patchset revision.

-Sergey

> > Lets call it at the very end of arch_mem_init() function, when
> > all memblock memory and reserved regions are defined, but before
> > any serious allocation is performed.
> >
> > Signed-off-by: Serge Semin <[email protected]>
> > ---
> > arch/mips/kernel/setup.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> > index 2a1b2e7a1bc9..ca493fdf69b0 100644
> > --- a/arch/mips/kernel/setup.c
> > +++ b/arch/mips/kernel/setup.c
> > @@ -824,6 +824,8 @@ static void __init arch_mem_init(char **cmdline_p)
> > /* Reserve for hibernation. */
> > memblock_reserve(__pa_symbol(&__nosave_begin),
> > __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
> > +
> > + memblock_dump_all();
> > }
> >
> > static void __init resource_init(void)
> > --
> > 2.21.0
> >
>
> --
> Sincerely yours,
> Mike.
>

2019-04-25 07:27:16

by Paul Burton

Subject: Re: [PATCH 02/12] mips: Discard rudiments from bootmem_init

Hello,

Serge Semin wrote:
> There is a pointless code left in the bootmem_init() method since
> the bootmem allocator removal. First part resides the PFN ranges
> calculation loop. The conditional expressions and continue operator
> are useless there, since nothing is done after them. Second part is
> in RAM ranges installation loop. We can simplify the conditions cascade
> a bit without much of the logic redefinition, so to reduce the code
> length. In particular the end boundary value can be verified after
> the possible reduction to be below max_low_pfn.
>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]

2019-04-25 07:31:39

by Paul Burton

Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Wed, Apr 24, 2019 at 01:47:40AM +0300, Serge Semin wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB.

There are 2 reasons I'm aware of:

1) Older systems generally had something like an ISA bus which used
addresses below the kernel, and bootloaders like YAMON left behind
functions that could be called right at the start of RAM. This sort
of thing should be accounted for by /memreserve/ in DT or similar
platform-specific reservations though rather than generically, and
at least Malta & SEAD-3 DTs already have /memreserve/ entries for
it. So this part I think is OK. Some other older platforms might
need updating, but that's fine.

2) trap_init() only allocates memory for the exception vector if using
a vectored interrupt mode. In other cases it just uses CAC_BASE
which currently gets reserved as part of this region between
PHYS_OFFSET & _text.

I think this behavior is bogus, and we should instead:

- Allocate the exception vector memory using memblock_alloc() for
CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
EBase register). If we're not using vectored interrupts then
allocating one page will do, and we already have the size
calculation for if we are.

- Otherwise use CAC_BASE but call memblock_reserve() on the first
page.

I think we should make that change before this one goes in. I can
try to get to it tomorrow, but feel free to beat me to it.
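
To make the idea a bit more concrete, here is a rough, hypothetical sketch
of what that could look like in trap_init() (names and details approximate,
not an actual patch):

	if (cpu_has_mips_r2_r6) {
		/* EBase is programmable: take the vector space from
		 * memblock. One page is enough when vectored interrupts
		 * aren't in use. */
		unsigned long size = (cpu_has_veic || cpu_has_vint) ?
				0x200 + VECTORSPACING * 64 : PAGE_SIZE;

		ebase = (unsigned long)memblock_alloc(size, 1 << fls(size));
	} else {
		/* Legacy CPUs have to use the architectural base, so just
		 * make sure nothing else allocates its first page. */
		ebase = CAC_BASE;
		memblock_reserve(__pa(ebase), PAGE_SIZE);
	}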

Thanks,
Paul

2019-04-25 10:07:33

by Paul Burton

Subject: Re: [PATCH 01/12] mips: Make sure kernel .bss exists in boot mem pool

Hello,

Serge Semin wrote:
> Current MIPS platform code makes sure the kernel text, data and init
> sections are added to the boot memory map pool right after the
> arch-specific memory setup method has been executed. But for some reason
> the MIPS platform code skipped the kernel .bss section, which definitely
> should be in the boot mem pool as well in any case. Lets fix this just be
> adding the space between __bss_start and __bss_stop.
>
> Reviewed-by: Matt Redfearn <[email protected]>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]

2019-04-25 10:07:43

by Paul Burton

Subject: Re: [PATCH 03/12] mips: Combine memblock init and memory reservation loops

Hello,

Serge Semin wrote:
> Before bootmem was completely removed from the kernel, the last loop
> in the bootmem_init() had been used to reserve the correspondingly
> marked regions, initialize sparsemem sections and to free the low memory
> pages, which then would be used for early memory allocations. After the
> bootmem removing patchset had been merged the loop was left to do the first
> two things only. But it didn't do them quite well.
>
> First of all it leaves the BOOT_MEM_INIT_RAM memory types unreserved,
> which is definitely bug (although it isn't noticeable due to being used
> by the kernel region only, which is fully marked as reserved). Secondly
> the reservation is supposed to be done for any memory including the
> high one. (I couldn't figure out why the highmem was ignored in the first
> place, since platforms and dts' may declare any memory region for
> reservation) Thirdly the reserved_end variable had been used here to not
> accidentally free memory occupied by kernel. Since we already reserved the
> corresponding region higher in this method there is no need in using the
> variable here anymore. Fourthly the sparsemem should be aware of all the
> memory types in the system including the ROM_DATA even if it is going to
> be reserved for the whole system uptime. Finally after all these notes are
> fixed the loop of memory reservation can be freely merged into the memory
> installation loop as it's done in this patch.
>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]

2019-04-26 01:37:50

by Serge Semin

Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

On Wed, Apr 24, 2019 at 10:43:48PM +0000, Paul Burton wrote:
> Hi Serge,
>
> On Wed, Apr 24, 2019 at 01:47:40AM +0300, Serge Semin wrote:
> > The reserved_end variable had been used by the bootmem_init() code
> > to find a lowest limit of memory available for memmap blob. The original
> > code just tried to find a free memory space higher than kernel was placed.
> > This limitation seems justified for the memmap ragion search process, but
> > I can't see any obvious reason to reserve the unused space below kernel
> > seeing some platforms place it much higher than standard 1MB.
>
> There are 2 reasons I'm aware of:
>
> 1) Older systems generally had something like an ISA bus which used
> addresses below the kernel, and bootloaders like YAMON left behind
> functions that could be called right at the start of RAM. This sort
> of thing should be accounted for by /memreserve/ in DT or similar
> platform-specific reservations though rather than generically, and
> at least Malta & SEAD-3 DTs already have /memreserve/ entries for
> it. So this part I think is OK. Some other older platforms might
> need updating, but that's fine.
>

Regarding ISA: as far as I remember, devices on that bus can DMA only to the
lowest 16MB. So if the kernel is too big or is placed rather high, with the
current implementation they may be left without any reachable memory at all.

> 2) trap_init() only allocates memory for the exception vector if using
> a vectored interrupt mode. In other cases it just uses CAC_BASE
> which currently gets reserved as part of this region between
> PHYS_OFFSET & _text.
>
> I think this behavior is bogus, and we should instead:
>
> - Allocate the exception vector memory using memblock_alloc() for
> CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
> EBase register). If we're not using vectored interrupts then
> allocating one page will do, and we already have the size
> calculation for if we are.
>
> - Otherwise use CAC_BASE but call memblock_reserve() on the first
> page.
>
> I think we should make that change before this one goes in. I can
> try to get to it tomorrow, but feel free to beat me to it.
>

As far as I understand you and the code, this should be enough to fix
the problem:
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 98ca55d62201..f680253e2617 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2326,6 +2326,8 @@ void __init trap_init(void)
ebase += (read_c0_ebase() & 0x3ffff000);
}
}
+
+ memblock_reserve(ebase, PAGE_SIZE);
}

if (cpu_has_mmips) {
---

Allocation has already been implemented in the if-branch under the
(cpu_has_veic || cpu_has_vint) condition, so we don't need to change
anything there.
If vectored interrupts aren't supported, the else-clause is taken and we
need to reserve whatever is set in the exception base address variable.

I'll add this patch between the 3rd and 4th ones if you are ok with it.

-Sergey

> Thanks,
> Paul

2019-04-30 22:59:47

by Paul Burton

Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Fri, Apr 26, 2019 at 03:00:36AM +0300, Serge Semin wrote:
> > 1) Older systems generally had something like an ISA bus which used
> > addresses below the kernel, and bootloaders like YAMON left behind
> > functions that could be called right at the start of RAM. This sort
> > of thing should be accounted for by /memreserve/ in DT or similar
> > platform-specific reservations though rather than generically, and
> > at least Malta & SEAD-3 DTs already have /memreserve/ entries for
> > it. So this part I think is OK. Some other older platforms might
> > need updating, but that's fine.
> >
>
> Regarding ISA. As far as I remember devices on that bus can DMA only to the
> lowest 16MB. So in case if kernel is too big or placed pretty much high,
> they may be left even without reachable memory at all in current
> implementation.

Sure - I'm not too worried about these old buses, platforms can continue
to reserve the memory through DT or otherwise if they need to.

> > 2) trap_init() only allocates memory for the exception vector if using
> > a vectored interrupt mode. In other cases it just uses CAC_BASE
> > which currently gets reserved as part of this region between
> > PHYS_OFFSET & _text.
> >
> > I think this behavior is bogus, and we should instead:
> >
> > - Allocate the exception vector memory using memblock_alloc() for
> > CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
> > EBase register). If we're not using vectored interrupts then
> > allocating one page will do, and we already have the size
> > calculation for if we are.
> >
> > - Otherwise use CAC_BASE but call memblock_reserve() on the first
> > page.
> >
> > I think we should make that change before this one goes in. I can
> > try to get to it tomorrow, but feel free to beat me to it.
> >
>
> As far as I understood you and the code this should be enough to fix
> the problem:
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index 98ca55d62201..f680253e2617 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2326,6 +2326,8 @@ void __init trap_init(void)
> ebase += (read_c0_ebase() & 0x3ffff000);
> }
> }
> +
> + memblock_reserve(ebase, PAGE_SIZE);
> }
>
> if (cpu_has_mmips) {
> ---
>
> Allocation has already been implemented in the if-branch under the
> (cpu_has_veic || cpu_has_vint) condition. So we don't need to change
> there anything.
> In case if vectored interrupts aren't supported the else-clause is
> taken and we need to reserve whatever is set in the exception base
> address variable.
>
> I'll add this patch between 3d and 4th ones if you are ok with it.

I think that would work, but I have other motivations to allocate the
memory in non-vectored cases anyway. I just sent a series that does that
& cleans up a little [1]. If you could take a look that would be great.
With that change made I think this patch will be good to apply.

Thanks,
Paul

[1] https://lore.kernel.org/linux-mips/[email protected]/T/#t

2019-05-02 14:27:37

by Serge Semin

Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

On Tue, Apr 30, 2019 at 10:58:33PM +0000, Paul Burton wrote:

Hello Paul

> Hi Serge,
>
> On Fri, Apr 26, 2019 at 03:00:36AM +0300, Serge Semin wrote:
> > > 1) Older systems generally had something like an ISA bus which used
> > > addresses below the kernel, and bootloaders like YAMON left behind
> > > functions that could be called right at the start of RAM. This sort
> > > of thing should be accounted for by /memreserve/ in DT or similar
> > > platform-specific reservations though rather than generically, and
> > > at least Malta & SEAD-3 DTs already have /memreserve/ entries for
> > > it. So this part I think is OK. Some other older platforms might
> > > need updating, but that's fine.
> > >
> >
> > Regarding ISA. As far as I remember devices on that bus can DMA only to the
> > lowest 16MB. So in case if kernel is too big or placed pretty much high,
> > they may be left even without reachable memory at all in current
> > implementation.
>
> Sure - I'm not too worried about these old buses, platforms can continue
> to reserve the memory through DT or otherwise if they need to.
>
> > > 2) trap_init() only allocates memory for the exception vector if using
> > > a vectored interrupt mode. In other cases it just uses CAC_BASE
> > > which currently gets reserved as part of this region between
> > > PHYS_OFFSET & _text.
> > >
> > > I think this behavior is bogus, and we should instead:
> > >
> > > - Allocate the exception vector memory using memblock_alloc() for
> > > CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
> > > EBase register). If we're not using vectored interrupts then
> > > allocating one page will do, and we already have the size
> > > calculation for if we are.
> > >
> > > - Otherwise use CAC_BASE but call memblock_reserve() on the first
> > > page.
> > >
> > > I think we should make that change before this one goes in. I can
> > > try to get to it tomorrow, but feel free to beat me to it.
> > >
> >
> > As far as I understood you and the code this should be enough to fix
> > the problem:
> > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > index 98ca55d62201..f680253e2617 100644
> > --- a/arch/mips/kernel/traps.c
> > +++ b/arch/mips/kernel/traps.c
> > @@ -2326,6 +2326,8 @@ void __init trap_init(void)
> > ebase += (read_c0_ebase() & 0x3ffff000);
> > }
> > }
> > +
> > + memblock_reserve(ebase, PAGE_SIZE);
> > }
> >
> > if (cpu_has_mmips) {
> > ---
> >
> > Allocation has already been implemented in the if-branch under the
> > (cpu_has_veic || cpu_has_vint) condition. So we don't need to change
> > there anything.
> > In case if vectored interrupts aren't supported the else-clause is
> > taken and we need to reserve whatever is set in the exception base
> > address variable.
> >
> > I'll add this patch between 3d and 4th ones if you are ok with it.
>
> I think that would work, but I have other motivations to allocate the
> memory in non-vectored cases anyway. I just sent a series that does that
> & cleans up a little [1]. If you could take a look that would be great.
> With that change made I think this patch will be good to apply.
>

Just reviewed and tested your series on my machine. I tagged the whole series
in a response to the cover-letter of [1].

Could you please proceed with this patchset review procedure? There are
also eight more patches left without your tag or comment. This patch
is also left with no explicit tag.

BTW I see you have already applied patches 1-3 to mips-next, so what shall I
do when sending a v2 patchset with the fixes requested for patch 12, and
possibly for others in the future? Shall I just resend the series without the
already-applied patches, or send them along with your Acked-by tags?

-Sergey

> Thanks,
> Paul
>
> [1] https://lore.kernel.org/linux-mips/[email protected]/T/#t
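
For reference, a minimal sketch of the direction Paul describes above; the
exact condition and the vec_size/fls() sizing are assumptions here and are not
taken verbatim from his series:

	/* Sketch only: allocate the exception vector for CPUs with a
	 * programmable EBase, otherwise keep the legacy CAC_BASE vector
	 * but make sure its first page is reserved in memblock. */
	if (cpu_has_mips_r2_r6) {
		ebase = (unsigned long)memblock_alloc(vec_size, 1 << fls(vec_size));
		if (!ebase)
			panic("%s: Failed to allocate exception vector\n", __func__);
	} else {
		ebase = CAC_BASE;
		memblock_reserve(__pa(ebase), PAGE_SIZE);
	}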

2019-05-02 18:37:00

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hello,

Serge Semin wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB. Moreover
> the RELOCATION config enables it to be loaded at any memory address.
> So lets reserve the memory occupied by the kernel only, leaving the region
> below being free for allocations. After doing this we can now discard the
> code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> since it's going to be free anyway (unless marked as reserved by
> platforms).
>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]

2019-05-02 18:37:03

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 05/12] mips: Discard post-CMA-init foreach loop

Hello,

Serge Semin wrote:
> Really the loop is pointless, since it walks over memblock-reserved
> memory regions and mark them as reserved in memblock. Before
> bootmem was removed from the kernel, this loop had been
> used to map the memory reserved by CMA into the legacy bootmem
> allocator. But now the early memory allocator is memblock,
> which is used by CMA for reservation, so we don't need any mapping
> anymore.
>
> Reviewed-by: Matt Redfearn <[email protected]>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]
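
For context, a sketch of the kind of loop being dropped here; with memblock as
the only early allocator it just re-reserves regions memblock already tracks
(the iterator name follows the memblock API of that time, so treat it as an
assumption):

	struct memblock_region *reg;

	for_each_memblock(reserved, reg)
		memblock_reserve(reg->base, reg->size);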

2019-05-02 18:37:13

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 06/12] mips: Use memblock to reserve the __nosave memory range

Hello,

Serge Semin wrote:
> Originally before legacy bootmem was removed, the memory for the range was
> correctly reserved by reserve_bootmem_region(). But since memblock has been
> selected for early memory allocation the function can be utilized only
> after paging is fully initialized (as it is done by memblock_free_all()
> function). So calling it from arch_mem_init() method is prone to errors,
> and at this stage we need to reserve the memory in the memblock allocator.
>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]
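
For reference, the reservation this amounts to is a single memblock call in
arch_mem_init(); a sketch, assuming the usual __nosave_begin/__nosave_end
linker symbols (the same call later shows up as diff context in the v2 2/5
patch below):

	memblock_reserve(__pa_symbol(&__nosave_begin),
			 __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));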

2019-05-02 18:37:46

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 07/12] mips: Add reserve-nomap memory type support

Hello,

Serge Semin wrote:
> It might be necessary to prevent the virtual mapping creation for a
> requested memory region. For instance there is a "no-map" property
> indicating exactly this feature. In this case we need to not only
> reserve the specified region by pretending it doesn't exist in the
> memory space, but completely remove the range from system just by
> removing it from memblock. The same way it's done in default
> early_init_dt_reserve_memory_arch() method.
>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]
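
As a hedged illustration of the distinction the commit message draws for a
region [base, base + size) (a sketch of the idea, not the applied MIPS patch;
the nomap flag name is an assumption): a plain reservation keeps the range in
memblock but unavailable for allocation, while a "no-map" region is removed
from memblock entirely so no linear mapping is ever set up for it, mirroring
what the default early_init_dt_reserve_memory_arch() does:

	if (nomap)
		memblock_remove(base, size);	/* drop the range completely */
	else
		memblock_reserve(base, size);	/* keep it, but never allocate from it */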

2019-05-02 18:48:14

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Thu, May 02, 2019 at 05:24:37PM +0300, Serge Semin wrote:
> Just reviewed and tested your series on my machine. I tagged the whole series
> in a response to the cover-letter of [1].

Thanks for the review & testing; that series is now in mips-next.

> Could you please proceed with the review of this patchset? There are
> also eight more patches left without your tag or comment. This patch
> is also left without an explicit tag.
>
> BTW I see you have already applied patches 1-3 to mips-next, so what shall I
> do when sending a v2 patchset with the fixes requested for patch 12, and
> possibly for others in the future? Shall I just resend the series without the
> already-applied patches, or send them along with your Acked-by tags?

I've so far applied patches 1-7 of your series to mips-next, and stopped
at patch 8 which has a comment to address.

My preference would be if you could send a v2 which just contains the
remaining patches (ie. patches 8-12 become patches 1-5), ideally atop
the mips-next branch.

The series looks good to me once the review comments are addressed, but
no need to add an Acked-by - it'll be implicit when I apply them to
mips-next.

Thanks,
Paul

2019-05-03 17:55:26

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hello Paul

On Thu, May 02, 2019 at 06:45:39PM +0000, Paul Burton wrote:
> Hi Serge,
>
> On Thu, May 02, 2019 at 05:24:37PM +0300, Serge Semin wrote:
> > Just reviewed and tested your series on my machine. I tagged the whole series
> > in a response to the cover-letter of [1].
>
> Thanks for the review & testing; that series is now in mips-next.
>
> > Could you please proceed with the review of this patchset? There are
> > also eight more patches left without your tag or comment. This patch
> > is also left without an explicit tag.
> >
> > BTW I see you have already applied patches 1-3 to mips-next, so what shall I
> > do when sending a v2 patchset with the fixes requested for patch 12, and
> > possibly for others in the future? Shall I just resend the series without the
> > already-applied patches, or send them along with your Acked-by tags?
>
> I've so far applied patches 1-7 of your series to mips-next, and stopped
> at patch 8 which has a comment to address.
>
> My preference would be if you could send a v2 which just contains the
> remaining patches (ie. patches 8-12 become patches 1-5), ideally atop
> the mips-next branch.
>
> The series looks good to me once the review comments are addressed, but
> no need to add an Acked-by - it'll be implicit when I apply them to
> mips-next.
>

Agreed. I'll do it shortly.

-Sergey

> Thanks,
> Paul

2019-05-03 18:15:55

by Serge Semin

[permalink] [raw]
Subject: [PATCH v2 0/5] mips: Post-bootmem-memblock transition fixes

First attempt of making the MIPS subsystem utilizing the memblock early memory
allocator was done by me a few years ago. I created a patchset with
21 patches [1]. It turned out to be too complicated and I decided to resend a
reworked patchset with smaller number of changes [2]. I did this and after a
small review process a v2 patchset was also posted. Then my spare
time was over and I couldn't proceed with the patchset support and
resubmission.

In a year Mike Rapoport took charge in this task and posted a small
patch which essentially did the bootmem allocator removal from MIPS
subsystem [3]. A single small patch did in general the whole thing my huge
patchsetes were intended for in the first place (though it lacked a few fixes).
Mike even went further and completely removed the bootmem allocator from
kernel code, so all the subsystems would need to use the only one early
memory allocator. This significantly simplified the platforms code as well
as removed a deprecated subsystem with duplicated functionality. Million
credits to Mike for this.

Getting back to the MIPS subsystem and it memblock allocator usage. Even
though the patch suggested by Mike [3] fixed most of the problems caused
by enabling the memblock allocator usage, some of them have been left
uncovered by it. First of all the PFNs calculation algorithm hasn't been
fully refactored. A reserved memory declaration loop has been left
untouched though it was clearly over-complicated for the new initialization
procedure. Secondly the MIPS platform code reserved the whole space below
kernel start address, which doesn't seem right since kernel can be
located much higher than memory space really starts. Thirdly CMA if it
is enabled reserves memory regions by means of memblock in the first place.
So the bootmem-init code doesn't need to do it again. Fifthly at early
platform initialization stage non of bootmem-left methods can be called
since there is no memory pages mapping at that moment, so __nosave* region
must be reserved by means of memblock allocator. Finally aside from memblock
allocator introduction my early patchsets included a series of useful
alterations like "nomap" property implementation for "reserved-memory"
dts-nodes, memblock_dump_all() method call after early memory allocator
initialization for debugging, low-memory test procedure, kernel memory
mapping printout at boot-time, and so on. So all of these fixes and
alterations are introduced in this new patchset. Please review. Hope
this time I'll be more responsive and finish this series up until it
is merged.

[1] https://lkml.org/lkml/2016/12/18/195
[2] https://lkml.org/lkml/2018/1/17/1201
[3] https://lkml.org/lkml/2018/9/10/302

NOTE I added a few "Reviewed-by: Matt Redfearn <[email protected]>"
since some patches of this series have been picked up from my earlier
patchsets, which Matt's already reviewed. I didn't add the tag for patches,
which were either new or partially ported.

Changelog v2
- Discard forcible selection of OF_RESERVED_MEM config
- Reword 'mips: Dump memblock regions for debugging' commit message since
  the memblock debug printout is activated by the memblock=debug kernel parameter
- Rebase onto mips-next
- Keep only patches 8 to 12, as the rest have already been merged


Serge Semin (5):
mips: Dump memblock regions for debugging
mips: Perform early low memory test
mips: Print the kernel virtual mem layout on debugging
mips: Make sure dt memory regions are valid
mips: Manually call fdt_init_reserved_mem() method

arch/mips/kernel/prom.c | 14 +++++++++++-
arch/mips/kernel/setup.c | 7 ++++++
arch/mips/mm/init.c | 49 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 1 deletion(-)

--
2.21.0

2019-05-03 20:21:40

by Serge Semin

[permalink] [raw]
Subject: [PATCH v2 2/5] mips: Perform early low memory test

memblock subsystem provides a method to optionally test the passed
memory region in case if it was requested via special kernel boot
argument. Lets add the function at the bottom of the arch_mem_init()
method. Testing at this point in the boot sequence should be safe since all
critical areas are now reserved and a minimum of allocations have been
done.

Reviewed-by: Matt Redfearn <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index ca493fdf69b0..fbd216b4e929 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -826,6 +826,8 @@ static void __init arch_mem_init(char **cmdline_p)
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));

memblock_dump_all();
+
+ early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
}

static void __init resource_init(void)
--
2.21.0
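
As a usage note (assuming the kernel is built with CONFIG_MEMTEST=y), the
early_memtest() call added above is a no-op unless testing is requested on the
kernel command line, for example:

	memtest=4

which asks the kernel to run up to four test patterns over the usable low
memory range before the page allocator takes over.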

2019-05-06 19:13:12

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] mips: Perform early low memory test

Hello,

Serge Semin wrote:
> memblock subsystem provides a method to optionally test the passed
> memory region in case if it was requested via special kernel boot
> argument. Lets add the function at the bottom of the arch_mem_init()
> method. Testing at this point in the boot sequence should be safe since all
> critical areas are now reserved and a minimum of allocations have been
> done.
>
> Reviewed-by: Matt Redfearn <[email protected]>
> Signed-off-by: Serge Semin <[email protected]>

Applied to mips-next.

Thanks,
Paul

[ This message was auto-generated; if you believe anything is incorrect
then please email [email protected] to report it. ]

2019-05-21 14:59:33

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB. Moreover
> the RELOCATION config enables it to be loaded at any memory address.
> So lets reserve the memory occupied by the kernel only, leaving the region
> below being free for allocations. After doing this we can now discard the
> code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> since it's going to be free anyway (unless marked as reserved by
> platforms).
>
> Signed-off-by: Serge Semin <[email protected]>

This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:

VFS: Mounted root (nfs filesystem) on device 0:13.
devtmpfs: mounted
BUG: Bad page state in process swapper pfn:00001
page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
flags: 0x0()
raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted
5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
00000001 804a3490
00000001 81000000 0030f231 80148558 00000003 10008400
87c1dd80 7599ee13
00000000 00000000 804b0000 00000000 00000007 00000000
00000085 00000000
62722d31 00000084 804b0000 39347874 00000000 804b7820
8040cef8 81000010
00000001 00000007 00000001 81000000 00000008 8021de24
00000000 804a0000
...
Call Trace:
[<8010adec>] show_stack+0x74/0x104
[<801a5e44>] bad_page+0x130/0x138
[<801a654c>] free_pcppages_bulk+0x17c/0x3b0
[<801a789c>] free_unref_page+0x40/0x68
[<801120f4>] free_init_pages+0xec/0x104
[<803bdde8>] free_initmem+0x10/0x58
[<803bdb8c>] kernel_init+0x20/0x100
[<801057c8>] ret_from_kernel_thread+0x14/0x1c
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper pfn:00002
[...]

CONFIG_RELOCATABLE is not set, so the only relevant part is the
change quoted below.

> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
>
> static void __init bootmem_init(void)
> {
> - unsigned long reserved_end;
> phys_addr_t ramstart = PHYS_ADDR_MAX;
> int i;
>
> @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> * will reserve the area used for the initrd.
> */
> init_initrd();
> - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
>
> - memblock_reserve(PHYS_OFFSET,
> - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> + /* Reserve memory occupied by kernel. */
> + memblock_reserve(__pa_symbol(&_text),
> + __pa_symbol(&_end) - __pa_symbol(&_text));
>
> /*
> * max_low_pfn is not a number of pages. The number of pages

With some debug code added:

Determined physical RAM map:
memory: 08000000 @ 00000000 (usable)
bootmem_init:390: PHYS_OFFSET = 0x0
bootmem_init:391: __pa_symbol(&_text) = 0x100000
bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8

Hence the old code reserved 1 MiB extra at the beginning.

Note that the new code also dropped the rounding up of the memory block
size to a multiple of PAGE_SIZE. I'm not sure the latter actually
matters or not.
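
A worked comparison using the values printed above (PHYS_OFFSET = 0x0,
_text = 0x100000, _end = 0x4b77c8, PAGE_SIZE = 0x1000) shows the difference:

	/* old: */ memblock_reserve(0x0, PFN_UP(0x4b77c8) << PAGE_SHIFT);	/* [0x000000, 0x4b8000) */
	/* new: */ memblock_reserve(0x100000, 0x4b77c8 - 0x100000);		/* [0x100000, 0x4b77c8) */

so the first MiB (and the partial tail page up to 0x4b8000) is no longer
reserved by bootmem_init().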

Do you have a clue? Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-05-21 15:54:56

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Geert,

On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> Hi Serge,
>
> On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > The reserved_end variable had been used by the bootmem_init() code
> > to find a lowest limit of memory available for memmap blob. The original
> > code just tried to find a free memory space higher than kernel was placed.
> > This limitation seems justified for the memmap ragion search process, but
> > I can't see any obvious reason to reserve the unused space below kernel
> > seeing some platforms place it much higher than standard 1MB. Moreover
> > the RELOCATION config enables it to be loaded at any memory address.
> > So lets reserve the memory occupied by the kernel only, leaving the region
> > below being free for allocations. After doing this we can now discard the
> > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > since it's going to be free anyway (unless marked as reserved by
> > platforms).
> >
> > Signed-off-by: Serge Semin <[email protected]>
>
> This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
>
> VFS: Mounted root (nfs filesystem) on device 0:13.
> devtmpfs: mounted
> BUG: Bad page state in process swapper pfn:00001
> page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> flags: 0x0()
> raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> page dumped because: nonzero mapcount
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper Not tainted
> 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> 00000001 804a3490
> 00000001 81000000 0030f231 80148558 00000003 10008400
> 87c1dd80 7599ee13
> 00000000 00000000 804b0000 00000000 00000007 00000000
> 00000085 00000000
> 62722d31 00000084 804b0000 39347874 00000000 804b7820
> 8040cef8 81000010
> 00000001 00000007 00000001 81000000 00000008 8021de24
> 00000000 804a0000
> ...
> Call Trace:
> [<8010adec>] show_stack+0x74/0x104
> [<801a5e44>] bad_page+0x130/0x138
> [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> [<801a789c>] free_unref_page+0x40/0x68
> [<801120f4>] free_init_pages+0xec/0x104
> [<803bdde8>] free_initmem+0x10/0x58
> [<803bdb8c>] kernel_init+0x20/0x100
> [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> Disabling lock debugging due to kernel taint
> BUG: Bad page state in process swapper pfn:00002
> [...]
>
> CONFIG_RELOCATABLE is not set, so the only relevant part is the
> change quoted below.
>
> > --- a/arch/mips/kernel/setup.c
> > +++ b/arch/mips/kernel/setup.c
> > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> >
> > static void __init bootmem_init(void)
> > {
> > - unsigned long reserved_end;
> > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > int i;
> >
> > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > * will reserve the area used for the initrd.
> > */
> > init_initrd();
> > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> >
> > - memblock_reserve(PHYS_OFFSET,
> > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > + /* Reserve memory occupied by kernel. */
> > + memblock_reserve(__pa_symbol(&_text),
> > + __pa_symbol(&_end) - __pa_symbol(&_text));
> >
> > /*
> > * max_low_pfn is not a number of pages. The number of pages
>
> With some debug code added:
>
> Determined physical RAM map:
> memory: 08000000 @ 00000000 (usable)
> bootmem_init:390: PHYS_OFFSET = 0x0
> bootmem_init:391: __pa_symbol(&_text) = 0x100000
> bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
> bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8

Have you tried adding memblock=debug to the command line?
Not sure it'll help, but still :)

> Hence the old code reserved 1 MiB extra at the beginning.
>
> Note that the new code also dropped the rounding up of the memory block
> size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> matters or not.

I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.

> Do you have a clue? Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>

--
Sincerely yours,
Mike.


2019-05-21 16:41:26

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hello Geert, Mike

On Tue, May 21, 2019 at 06:53:10PM +0300, Mike Rapoport wrote:
> Hi Geert,
>
> On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > Hi Serge,
> >
> > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > The reserved_end variable had been used by the bootmem_init() code
> > > to find a lowest limit of memory available for memmap blob. The original
> > > code just tried to find a free memory space higher than kernel was placed.
> > > This limitation seems justified for the memmap ragion search process, but
> > > I can't see any obvious reason to reserve the unused space below kernel
> > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > the RELOCATION config enables it to be loaded at any memory address.
> > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > below being free for allocations. After doing this we can now discard the
> > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > since it's going to be free anyway (unless marked as reserved by
> > > platforms).
> > >
> > > Signed-off-by: Serge Semin <[email protected]>
> >
> > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> >
> > VFS: Mounted root (nfs filesystem) on device 0:13.
> > devtmpfs: mounted
> > BUG: Bad page state in process swapper pfn:00001
> > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > flags: 0x0()
> > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > page dumped because: nonzero mapcount
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper Not tainted
> > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > 00000001 804a3490
> > 00000001 81000000 0030f231 80148558 00000003 10008400
> > 87c1dd80 7599ee13
> > 00000000 00000000 804b0000 00000000 00000007 00000000
> > 00000085 00000000
> > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > 8040cef8 81000010
> > 00000001 00000007 00000001 81000000 00000008 8021de24
> > 00000000 804a0000
> > ...
> > Call Trace:
> > [<8010adec>] show_stack+0x74/0x104
> > [<801a5e44>] bad_page+0x130/0x138
> > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > [<801a789c>] free_unref_page+0x40/0x68
> > [<801120f4>] free_init_pages+0xec/0x104
> > [<803bdde8>] free_initmem+0x10/0x58
> > [<803bdb8c>] kernel_init+0x20/0x100
> > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > Disabling lock debugging due to kernel taint
> > BUG: Bad page state in process swapper pfn:00002
> > [...]
> >

The root cause of the problem most likely is in prom_free_prom_memory() method of
arch/mips/txx9/generic/setup.c:
void __init prom_free_prom_memory(void)
{
	unsigned long saddr = PAGE_SIZE;
	unsigned long eaddr = __pa_symbol(&_text);

	if (saddr < eaddr)
		free_init_pages("prom memory", saddr, eaddr);
}

As you can see, the txx9 platform tries to free memory which is no longer
reserved and is free from the very beginning due to the patch above. So as soon
as you remove the free_init_pages("prom memory", ...) call, the problem should
be fixed. Could you try it and let us know whether it helped?
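
A minimal sketch of such a change (assuming nothing else relies on that early
free, and keeping the function itself in place) would simply drop the call:

	void __init prom_free_prom_memory(void)
	{
		/* The low memory below the kernel is no longer handed to the
		 * kernel as a reserved region, so there is nothing to free. */
	}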

Regards,
-Sergey

> > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > change quoted below.
> >
> > > --- a/arch/mips/kernel/setup.c
> > > +++ b/arch/mips/kernel/setup.c
> > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > >
> > > static void __init bootmem_init(void)
> > > {
> > > - unsigned long reserved_end;
> > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > int i;
> > >
> > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > * will reserve the area used for the initrd.
> > > */
> > > init_initrd();
> > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > >
> > > - memblock_reserve(PHYS_OFFSET,
> > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > + /* Reserve memory occupied by kernel. */
> > > + memblock_reserve(__pa_symbol(&_text),
> > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > >
> > > /*
> > > * max_low_pfn is not a number of pages. The number of pages
> >
> > With some debug code added:
> >
> > Determined physical RAM map:
> > memory: 08000000 @ 00000000 (usable)
> > bootmem_init:390: PHYS_OFFSET = 0x0
> > bootmem_init:391: __pa_symbol(&_text) = 0x100000
> > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
> > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8
>
> Have you tried adding memblock=debug to the command line?
> Not sure it'll help, but still :)
>
> > Hence the old code reserved 1 MiB extra at the beginning.
> >
> > Note that the new code also dropped the rounding up of the memory block
> > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > matters or not.
>
> I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.
>
> > Do you have a clue? Thanks!
> >
> > Gr{oetje,eeting}s,
> >
> > Geert
> >
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
> >
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> > -- Linus Torvalds
> >
>
> --
> Sincerely yours,
> Mike.
>

2019-05-22 07:48:42

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Mike,

On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <[email protected]> wrote:
> On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > The reserved_end variable had been used by the bootmem_init() code
> > > to find a lowest limit of memory available for memmap blob. The original
> > > code just tried to find a free memory space higher than kernel was placed.
> > > This limitation seems justified for the memmap ragion search process, but
> > > I can't see any obvious reason to reserve the unused space below kernel
> > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > the RELOCATION config enables it to be loaded at any memory address.
> > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > below being free for allocations. After doing this we can now discard the
> > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > since it's going to be free anyway (unless marked as reserved by
> > > platforms).
> > >
> > > Signed-off-by: Serge Semin <[email protected]>
> >
> > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> >
> > VFS: Mounted root (nfs filesystem) on device 0:13.
> > devtmpfs: mounted
> > BUG: Bad page state in process swapper pfn:00001
> > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > flags: 0x0()
> > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > page dumped because: nonzero mapcount
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper Not tainted
> > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > 00000001 804a3490
> > 00000001 81000000 0030f231 80148558 00000003 10008400
> > 87c1dd80 7599ee13
> > 00000000 00000000 804b0000 00000000 00000007 00000000
> > 00000085 00000000
> > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > 8040cef8 81000010
> > 00000001 00000007 00000001 81000000 00000008 8021de24
> > 00000000 804a0000
> > ...
> > Call Trace:
> > [<8010adec>] show_stack+0x74/0x104
> > [<801a5e44>] bad_page+0x130/0x138
> > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > [<801a789c>] free_unref_page+0x40/0x68
> > [<801120f4>] free_init_pages+0xec/0x104
> > [<803bdde8>] free_initmem+0x10/0x58
> > [<803bdb8c>] kernel_init+0x20/0x100
> > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > Disabling lock debugging due to kernel taint
> > BUG: Bad page state in process swapper pfn:00002
> > [...]
> >
> > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > change quoted below.
> >
> > > --- a/arch/mips/kernel/setup.c
> > > +++ b/arch/mips/kernel/setup.c
> > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > >
> > > static void __init bootmem_init(void)
> > > {
> > > - unsigned long reserved_end;
> > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > int i;
> > >
> > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > * will reserve the area used for the initrd.
> > > */
> > > init_initrd();
> > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > >
> > > - memblock_reserve(PHYS_OFFSET,
> > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > + /* Reserve memory occupied by kernel. */
> > > + memblock_reserve(__pa_symbol(&_text),
> > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > >
> > > /*
> > > * max_low_pfn is not a number of pages. The number of pages
> >
> > With some debug code added:
> >
> > Determined physical RAM map:
> > memory: 08000000 @ 00000000 (usable)
> > bootmem_init:390: PHYS_OFFSET = 0x0
> > bootmem_init:391: __pa_symbol(&_text) = 0x100000
> > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
> > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8
>
> Have you tried adding memblock=debug to the command line?
> Not sure it'll help, but still :)

Thanks! Output below...

Determined physical RAM map:
memory: 08000000 @ 00000000 (usable)
+memblock_reserve: [0x00100000-0x004b77c7] setup_arch+0x258/0x8e4
Initrd not found or empty - disabling initrd
+memblock_reserve: [0x00448000-0x00447fff] setup_arch+0x5ac/0x8e4
+MEMBLOCK configuration:
+ memory size = 0x08000000 reserved size = 0x003b77c8
+ memory.cnt = 0x1
+ memory[0x0] [0x00000000-0x07ffffff], 0x08000000 bytes on node 0 flags: 0x0
+ reserved.cnt = 0x1
+ reserved[0x0] [0x00100000-0x004b77c7], 0x003b77c8 bytes flags: 0x0
+memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 setup_arch+0x7ec/0x8e4
+memblock_reserve: [0x004b77e0-0x004b77ff] memblock_alloc_range_nid+0x130/0x178
Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
Zone ranges:
@@ -16,10 +26,48 @@ Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000000000-0x0000000007ffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff]
+memblock_alloc_try_nid: 1048576 bytes align=0x20 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.31+0x6c/0xc8
+memblock_reserve: [0x004b7800-0x005b77ff] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 16 bytes align=0x20 nid=0 from=0x00000000
max_addr=0x00000000 setup_usemap.isra.13+0x68/0xa0
+memblock_reserve: [0x005b7800-0x005b780f] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 start_kernel+0xb0/0x508
+memblock_reserve: [0x005b7820-0x005b7893] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 start_kernel+0xf0/0x508
+memblock_reserve: [0x005b78a0-0x005b7913] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 start_kernel+0x114/0x508
+memblock_reserve: [0x005b7920-0x005b7993] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_alloc_info+0x60/0xb8
+memblock_reserve: [0x005b8000-0x005b8fff] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 32768 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 setup_per_cpu_areas+0x38/0xa8
+memblock_reserve: [0x01000000-0x01007fff] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_setup_first_chunk+0x200/0x588
+memblock_reserve: [0x005b79a0-0x005b79a3] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_setup_first_chunk+0x22c/0x588
+memblock_reserve: [0x005b79c0-0x005b79c3] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_setup_first_chunk+0x250/0x588
+memblock_reserve: [0x005b79e0-0x005b79e3] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_setup_first_chunk+0x288/0x588
+memblock_reserve: [0x005b7a00-0x005b7a03] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 120 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_setup_first_chunk+0x47c/0x588
+memblock_reserve: [0x005b7a20-0x005b7a97] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 89 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_alloc_first_chunk+0x88/0x2e0
+memblock_reserve: [0x005b7aa0-0x005b7af8] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 1024 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_alloc_first_chunk+0xd8/0x2e0
+memblock_reserve: [0x005b7b00-0x005b7eff] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 1028 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_alloc_first_chunk+0x118/0x2e0
+memblock_reserve: [0x005b9000-0x005b9403] memblock_alloc_range_nid+0x130/0x178
+memblock_alloc_try_nid: 256 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 pcpu_alloc_first_chunk+0x154/0x2e0
+memblock_reserve: [0x005b7f00-0x005b7fff] memblock_alloc_range_nid+0x130/0x178
+ memblock_free: [0x005b8000-0x005b8fff] start_kernel+0x164/0x508
Built 1 zonelists, mobility grouping on. Total pages: 32512
-Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw
nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3
+Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw
nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3
memblock=debug
+memblock_alloc_try_nid: 65536 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 alloc_large_system_hash+0x270/0x478
+memblock_reserve: [0x005b9420-0x005c941f] memblock_alloc_range_nid+0x130/0x178
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
+memblock_alloc_try_nid: 32768 bytes align=0x20 nid=-1 from=0x00000000
max_addr=0x00000000 alloc_large_system_hash+0x270/0x478
+memblock_reserve: [0x005c9420-0x005d141f] memblock_alloc_range_nid+0x130/0x178
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
+memblock_reserve: [0x00000000-0x000003ff] trap_init+0x58/0x474
Memory: 126100K/131072K available (2830K kernel code, 147K rwdata,
508K rodata, 220K init, 93K bss, 4972K reserved, 0K cma-reserved)

> > Hence the old code reserved 1 MiB extra at the beginning.
> >
> > Note that the new code also dropped the rounding up of the memory block
> > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > matters or not.
>
> I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.

Yes, by prom_free_prom_memory(), as pointed out by Serge.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-05-22 07:52:00

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Tue, May 21, 2019 at 6:39 PM Serge Semin <[email protected]> wrote:
> On Tue, May 21, 2019 at 06:53:10PM +0300, Mike Rapoport wrote:
> > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > > The reserved_end variable had been used by the bootmem_init() code
> > > > to find a lowest limit of memory available for memmap blob. The original
> > > > code just tried to find a free memory space higher than kernel was placed.
> > > > This limitation seems justified for the memmap ragion search process, but
> > > > I can't see any obvious reason to reserve the unused space below kernel
> > > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > > the RELOCATION config enables it to be loaded at any memory address.
> > > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > > below being free for allocations. After doing this we can now discard the
> > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > > since it's going to be free anyway (unless marked as reserved by
> > > > platforms).
> > > >
> > > > Signed-off-by: Serge Semin <[email protected]>
> > >
> > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> > >
> > > VFS: Mounted root (nfs filesystem) on device 0:13.
> > > devtmpfs: mounted
> > > BUG: Bad page state in process swapper pfn:00001
> > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > > flags: 0x0()
> > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > > page dumped because: nonzero mapcount
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > > 00000001 804a3490
> > > 00000001 81000000 0030f231 80148558 00000003 10008400
> > > 87c1dd80 7599ee13
> > > 00000000 00000000 804b0000 00000000 00000007 00000000
> > > 00000085 00000000
> > > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > > 8040cef8 81000010
> > > 00000001 00000007 00000001 81000000 00000008 8021de24
> > > 00000000 804a0000
> > > ...
> > > Call Trace:
> > > [<8010adec>] show_stack+0x74/0x104
> > > [<801a5e44>] bad_page+0x130/0x138
> > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > > [<801a789c>] free_unref_page+0x40/0x68
> > > [<801120f4>] free_init_pages+0xec/0x104
> > > [<803bdde8>] free_initmem+0x10/0x58
> > > [<803bdb8c>] kernel_init+0x20/0x100
> > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > > Disabling lock debugging due to kernel taint
> > > BUG: Bad page state in process swapper pfn:00002
> > > [...]
> > >
>
> The root cause of the problem most likely is in prom_free_prom_memory() method of
> arch/mips/txx9/generic/setup.c:
> void __init prom_free_prom_memory(void)
> {
> 	unsigned long saddr = PAGE_SIZE;
> 	unsigned long eaddr = __pa_symbol(&_text);
>
> 	if (saddr < eaddr)
> 		free_init_pages("prom memory", saddr, eaddr);
> }
>
> As you can see, the txx9 platform tries to free memory which is no longer
> reserved and is free from the very beginning due to the patch above. So as soon
> as you remove the free_init_pages("prom memory", ...) call, the problem should
> be fixed. Could you try it and let us know whether it helped?

Thanks, that does the trick!
Will send a patch shortly...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-05-22 08:10:17

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote:
> Hi Mike,
>
> On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <[email protected]> wrote:
> > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > > The reserved_end variable had been used by the bootmem_init() code
> > > > to find a lowest limit of memory available for memmap blob. The original
> > > > code just tried to find a free memory space higher than kernel was placed.
> > > > This limitation seems justified for the memmap ragion search process, but
> > > > I can't see any obvious reason to reserve the unused space below kernel
> > > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > > the RELOCATION config enables it to be loaded at any memory address.
> > > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > > below being free for allocations. After doing this we can now discard the
> > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > > since it's going to be free anyway (unless marked as reserved by
> > > > platforms).
> > > >
> > > > Signed-off-by: Serge Semin <[email protected]>
> > >
> > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> > >
> > > VFS: Mounted root (nfs filesystem) on device 0:13.
> > > devtmpfs: mounted
> > > BUG: Bad page state in process swapper pfn:00001
> > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > > flags: 0x0()
> > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > > page dumped because: nonzero mapcount
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > > 00000001 804a3490
> > > 00000001 81000000 0030f231 80148558 00000003 10008400
> > > 87c1dd80 7599ee13
> > > 00000000 00000000 804b0000 00000000 00000007 00000000
> > > 00000085 00000000
> > > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > > 8040cef8 81000010
> > > 00000001 00000007 00000001 81000000 00000008 8021de24
> > > 00000000 804a0000
> > > ...
> > > Call Trace:
> > > [<8010adec>] show_stack+0x74/0x104
> > > [<801a5e44>] bad_page+0x130/0x138
> > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > > [<801a789c>] free_unref_page+0x40/0x68
> > > [<801120f4>] free_init_pages+0xec/0x104
> > > [<803bdde8>] free_initmem+0x10/0x58
> > > [<803bdb8c>] kernel_init+0x20/0x100
> > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > > Disabling lock debugging due to kernel taint
> > > BUG: Bad page state in process swapper pfn:00002
> > > [...]
> > >
> > > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > > change quoted below.
> > >
> > > > --- a/arch/mips/kernel/setup.c
> > > > +++ b/arch/mips/kernel/setup.c
> > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > > >
> > > > static void __init bootmem_init(void)
> > > > {
> > > > - unsigned long reserved_end;
> > > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > > int i;
> > > >
> > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > > * will reserve the area used for the initrd.
> > > > */
> > > > init_initrd();
> > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > > >
> > > > - memblock_reserve(PHYS_OFFSET,
> > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > > + /* Reserve memory occupied by kernel. */
> > > > + memblock_reserve(__pa_symbol(&_text),
> > > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > > >
> > > > /*
> > > > * max_low_pfn is not a number of pages. The number of pages
> > >
> > > With some debug code added:
> > >
> > > Determined physical RAM map:
> > > memory: 08000000 @ 00000000 (usable)
> > > bootmem_init:390: PHYS_OFFSET = 0x0
> > > bootmem_init:391: __pa_symbol(&_text) = 0x100000
> > > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
> > > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8
> >
> > Have you tried adding memblock=debug to the command line?
> > Not sure it'll help, but still :)
>
> Thanks! Output below...
>
> Determined physical RAM map:
> memory: 08000000 @ 00000000 (usable)
> +memblock_reserve: [0x00100000-0x004b77c7] setup_arch+0x258/0x8e4
> Initrd not found or empty - disabling initrd
> +memblock_reserve: [0x00448000-0x00447fff] setup_arch+0x5ac/0x8e4
> +MEMBLOCK configuration:
> + memory size = 0x08000000 reserved size = 0x003b77c8
> + memory.cnt = 0x1
> + memory[0x0] [0x00000000-0x07ffffff], 0x08000000 bytes on node 0 flags: 0x0
> + reserved.cnt = 0x1
> + reserved[0x0] [0x00100000-0x004b77c7], 0x003b77c8 bytes flags: 0x0
> +memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 setup_arch+0x7ec/0x8e4
> +memblock_reserve: [0x004b77e0-0x004b77ff] memblock_alloc_range_nid+0x130/0x178
> Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
> Zone ranges:
> @@ -16,10 +26,48 @@ Movable zone start for each node
> Early memory node ranges
> node 0: [mem 0x0000000000000000-0x0000000007ffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff]
> +memblock_alloc_try_nid: 1048576 bytes align=0x20 nid=0
> from=0x00000000 max_addr=0x00000000
> alloc_node_mem_map.constprop.31+0x6c/0xc8
> +memblock_reserve: [0x004b7800-0x005b77ff] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 16 bytes align=0x20 nid=0 from=0x00000000
> max_addr=0x00000000 setup_usemap.isra.13+0x68/0xa0
> +memblock_reserve: [0x005b7800-0x005b780f] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 start_kernel+0xb0/0x508
> +memblock_reserve: [0x005b7820-0x005b7893] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 start_kernel+0xf0/0x508
> +memblock_reserve: [0x005b78a0-0x005b7913] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 start_kernel+0x114/0x508
> +memblock_reserve: [0x005b7920-0x005b7993] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_alloc_info+0x60/0xb8
> +memblock_reserve: [0x005b8000-0x005b8fff] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 32768 bytes align=0x1000 nid=-1
> from=0x01000000 max_addr=0x00000000 setup_per_cpu_areas+0x38/0xa8
> +memblock_reserve: [0x01000000-0x01007fff] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_setup_first_chunk+0x200/0x588
> +memblock_reserve: [0x005b79a0-0x005b79a3] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_setup_first_chunk+0x22c/0x588
> +memblock_reserve: [0x005b79c0-0x005b79c3] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_setup_first_chunk+0x250/0x588
> +memblock_reserve: [0x005b79e0-0x005b79e3] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_setup_first_chunk+0x288/0x588
> +memblock_reserve: [0x005b7a00-0x005b7a03] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 120 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_setup_first_chunk+0x47c/0x588
> +memblock_reserve: [0x005b7a20-0x005b7a97] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 89 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_alloc_first_chunk+0x88/0x2e0
> +memblock_reserve: [0x005b7aa0-0x005b7af8] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 1024 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_alloc_first_chunk+0xd8/0x2e0
> +memblock_reserve: [0x005b7b00-0x005b7eff] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 1028 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_alloc_first_chunk+0x118/0x2e0
> +memblock_reserve: [0x005b9000-0x005b9403] memblock_alloc_range_nid+0x130/0x178
> +memblock_alloc_try_nid: 256 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 pcpu_alloc_first_chunk+0x154/0x2e0
> +memblock_reserve: [0x005b7f00-0x005b7fff] memblock_alloc_range_nid+0x130/0x178
> + memblock_free: [0x005b8000-0x005b8fff] start_kernel+0x164/0x508
> Built 1 zonelists, mobility grouping on. Total pages: 32512
> -Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw
> nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3
> +Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw
> nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3
> memblock=debug
> +memblock_alloc_try_nid: 65536 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 alloc_large_system_hash+0x270/0x478
> +memblock_reserve: [0x005b9420-0x005c941f] memblock_alloc_range_nid+0x130/0x178
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> +memblock_alloc_try_nid: 32768 bytes align=0x20 nid=-1 from=0x00000000
> max_addr=0x00000000 alloc_large_system_hash+0x270/0x478
> +memblock_reserve: [0x005c9420-0x005d141f] memblock_alloc_range_nid+0x130/0x178
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> +memblock_reserve: [0x00000000-0x000003ff] trap_init+0x58/0x474
> Memory: 126100K/131072K available (2830K kernel code, 147K rwdata,
> 508K rodata, 220K init, 93K bss, 4972K reserved, 0K cma-reserved)

Presuming your system is !cpu_has_mips_r2_r6 and CAC_BASE is 0, the log
looks completely sane.

> > > Hence the old code reserved 1 MiB extra at the beginning.
> > >
> > > Note that the new code also dropped the rounding up of the memory block
> > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > > matters or not.
> >
> > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.
>
> Yes, by prom_free_prom_memory(), as pointed out by Serge.

I wonder how other MIPS variants would react to the fact that the memory
below the kernel is not reserved ;-)

> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>

--
Sincerely yours,
Mike.

2019-05-22 08:16:23

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Mike,

On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <[email protected]> wrote:
> On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote:
> > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <[email protected]> wrote:
> > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > > > The reserved_end variable had been used by the bootmem_init() code
> > > > > to find a lowest limit of memory available for memmap blob. The original
> > > > > code just tried to find a free memory space higher than kernel was placed.
> > > > > This limitation seems justified for the memmap ragion search process, but
> > > > > I can't see any obvious reason to reserve the unused space below kernel
> > > > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > > > the RELOCATION config enables it to be loaded at any memory address.
> > > > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > > > below being free for allocations. After doing this we can now discard the
> > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > > > since it's going to be free anyway (unless marked as reserved by
> > > > > platforms).
> > > > >
> > > > > Signed-off-by: Serge Semin <[email protected]>
> > > >
> > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> > > >
> > > > VFS: Mounted root (nfs filesystem) on device 0:13.
> > > > devtmpfs: mounted
> > > > BUG: Bad page state in process swapper pfn:00001
> > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > > > flags: 0x0()
> > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > > > page dumped because: nonzero mapcount
> > > > Modules linked in:
> > > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > > > 00000001 804a3490
> > > > 00000001 81000000 0030f231 80148558 00000003 10008400
> > > > 87c1dd80 7599ee13
> > > > 00000000 00000000 804b0000 00000000 00000007 00000000
> > > > 00000085 00000000
> > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > > > 8040cef8 81000010
> > > > 00000001 00000007 00000001 81000000 00000008 8021de24
> > > > 00000000 804a0000
> > > > ...
> > > > Call Trace:
> > > > [<8010adec>] show_stack+0x74/0x104
> > > > [<801a5e44>] bad_page+0x130/0x138
> > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > > > [<801a789c>] free_unref_page+0x40/0x68
> > > > [<801120f4>] free_init_pages+0xec/0x104
> > > > [<803bdde8>] free_initmem+0x10/0x58
> > > > [<803bdb8c>] kernel_init+0x20/0x100
> > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > > > Disabling lock debugging due to kernel taint
> > > > BUG: Bad page state in process swapper pfn:00002
> > > > [...]
> > > >
> > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > > > change quoted below.
> > > >
> > > > > --- a/arch/mips/kernel/setup.c
> > > > > +++ b/arch/mips/kernel/setup.c
> > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > > > >
> > > > > static void __init bootmem_init(void)
> > > > > {
> > > > > - unsigned long reserved_end;
> > > > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > > > int i;
> > > > >
> > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > > > * will reserve the area used for the initrd.
> > > > > */
> > > > > init_initrd();
> > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > > > >
> > > > > - memblock_reserve(PHYS_OFFSET,
> > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > > > + /* Reserve memory occupied by kernel. */
> > > > > + memblock_reserve(__pa_symbol(&_text),
> > > > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > > > >
> > > > > /*
> > > > > * max_low_pfn is not a number of pages. The number of pages

> > > > Hence the old code reserved 1 MiB extra at the beginning.
> > > >
> > > > Note that the new code also dropped the rounding up of the memory block
> > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > > > matters or not.
> > >
> > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.
> >
> > Yes, by prom_free_prom_memory(), as pointed out by Serge.
>
> I wonder how other MIPS variants would react to the fact that the memory
> below the kernel is not reserved ;-)

Exactly...

Looks like at least arch/mips/dec/prom/memory.c needs a similar but
more complicated fix, due to declance handling...


Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-05-22 13:35:44

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hello folks,

On Wed, May 22, 2019 at 10:14:47AM +0200, Geert Uytterhoeven wrote:
> Hi Mike,
>
> On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <[email protected]> wrote:
> > On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <[email protected]> wrote:
> > > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > > > > The reserved_end variable had been used by the bootmem_init() code
> > > > > > to find a lowest limit of memory available for memmap blob. The original
> > > > > > code just tried to find a free memory space higher than kernel was placed.
> > > > > > This limitation seems justified for the memmap region search process, but
> > > > > > I can't see any obvious reason to reserve the unused space below kernel
> > > > > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > > > > the RELOCATION config enables it to be loaded at any memory address.
> > > > > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > > > > below being free for allocations. After doing this we can now discard the
> > > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > > > > since it's going to be free anyway (unless marked as reserved by
> > > > > > platforms).
> > > > > >
> > > > > > Signed-off-by: Serge Semin <[email protected]>
> > > > >
> > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> > > > >
> > > > > VFS: Mounted root (nfs filesystem) on device 0:13.
> > > > > devtmpfs: mounted
> > > > > BUG: Bad page state in process swapper pfn:00001
> > > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > > > > flags: 0x0()
> > > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > > > > page dumped because: nonzero mapcount
> > > > > Modules linked in:
> > > > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > > > > 00000001 804a3490
> > > > > 00000001 81000000 0030f231 80148558 00000003 10008400
> > > > > 87c1dd80 7599ee13
> > > > > 00000000 00000000 804b0000 00000000 00000007 00000000
> > > > > 00000085 00000000
> > > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > > > > 8040cef8 81000010
> > > > > 00000001 00000007 00000001 81000000 00000008 8021de24
> > > > > 00000000 804a0000
> > > > > ...
> > > > > Call Trace:
> > > > > [<8010adec>] show_stack+0x74/0x104
> > > > > [<801a5e44>] bad_page+0x130/0x138
> > > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > > > > [<801a789c>] free_unref_page+0x40/0x68
> > > > > [<801120f4>] free_init_pages+0xec/0x104
> > > > > [<803bdde8>] free_initmem+0x10/0x58
> > > > > [<803bdb8c>] kernel_init+0x20/0x100
> > > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > > > > Disabling lock debugging due to kernel taint
> > > > > BUG: Bad page state in process swapper pfn:00002
> > > > > [...]
> > > > >
> > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > > > > change quoted below.
> > > > >
> > > > > > --- a/arch/mips/kernel/setup.c
> > > > > > +++ b/arch/mips/kernel/setup.c
> > > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > > > > >
> > > > > > static void __init bootmem_init(void)
> > > > > > {
> > > > > > - unsigned long reserved_end;
> > > > > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > > > > int i;
> > > > > >
> > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > > > > * will reserve the area used for the initrd.
> > > > > > */
> > > > > > init_initrd();
> > > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > > > > >
> > > > > > - memblock_reserve(PHYS_OFFSET,
> > > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > > > > + /* Reserve memory occupied by kernel. */
> > > > > > + memblock_reserve(__pa_symbol(&_text),
> > > > > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > > > > >
> > > > > > /*
> > > > > > * max_low_pfn is not a number of pages. The number of pages
>
> > > > > Hence the old code reserved 1 MiB extra at the beginning.
> > > > >
> > > > > Note that the new code also dropped the rounding up of the memory block
> > > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > > > > matters or not.
> > > >
> > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.
> > >
> > > Yes, by prom_free_prom_memory(), as pointed out by Serge.
> >
> > I wonder how other MIPS variants would react to the fact that the memory
> > below the kernel is not reserved ;-)
>
> Exactly...
>
> > Looks like at least arch/mips/dec/prom/memory.c needs a similar but
> more complicated fix, due to declance handling...
>

The problem might be fixed there by the next patch:
---
diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c
index 5073d2ed78bb..5a0c734b5d04 100644
--- a/arch/mips/dec/prom/memory.c
+++ b/arch/mips/dec/prom/memory.c
@@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic)
pmax_setup_memory_region();
else
rex_setup_memory_region();
-}
-
-void __init prom_free_prom_memory(void)
-{
- unsigned long end;
-
- /*
- * Free everything below the kernel itself but leave
- * the first page reserved for the exception handlers.
- */

#if IS_ENABLED(CONFIG_DECLANCE)
/*
- * Leave 128 KB reserved for Lance memory for
- * IOASIC DECstations.
+ * Reserve 128 KB for Lance memory for IOASIC DECstations.
*
* XXX: save this address for use in dec_lance.c?
*/
if (IOASIC)
- end = __pa(&_text) - 0x00020000;
- else
+ memblock_reserve(__pa_symbol(&_text), 0x00020000);
#endif
- end = __pa(&_text);
-
- free_init_pages("unused PROM memory", PAGE_SIZE, end);
}
---

Didn't wanna use prom_FREE_prom_memory to actually reserve a memory
chunk, so I moved the reservation into the prom_meminit() method.

Regarding the first page for the exception handlers. We don't need
to reserve it here, since it is already done in arch/mips/kernel/traps.c .
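
For illustration only, a minimal sketch of the kind of reservation meant
here; the actual code in arch/mips/kernel/traps.c may be structured quite
differently:
---
/*
 * Illustrative only: keep the low exception-vector page out of the
 * free memory pool by reserving it through memblock early in boot.
 */
memblock_reserve(PHYS_OFFSET, PAGE_SIZE);
---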

Regards,
-Sergey


>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

2019-05-22 13:46:22

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

Hi Serge,

On Wed, May 22, 2019 at 3:34 PM Serge Semin <[email protected]> wrote:
> On Wed, May 22, 2019 at 10:14:47AM +0200, Geert Uytterhoeven wrote:
> > On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <[email protected]> wrote:
> > > On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote:
> > > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <[email protected]> wrote:
> > > > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > > > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <[email protected]> wrote:
> > > > > > > The reserved_end variable had been used by the bootmem_init() code
> > > > > > > to find a lowest limit of memory available for memmap blob. The original
> > > > > > > code just tried to find a free memory space higher than kernel was placed.
> > > > > > > This limitation seems justified for the memmap region search process, but
> > > > > > > I can't see any obvious reason to reserve the unused space below kernel
> > > > > > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > > > > > the RELOCATION config enables it to be loaded at any memory address.
> > > > > > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > > > > > below being free for allocations. After doing this we can now discard the
> > > > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > > > > > since it's going to be free anyway (unless marked as reserved by
> > > > > > > platforms).
> > > > > > >
> > > > > > > Signed-off-by: Serge Semin <[email protected]>
> > > > > >
> > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > > > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> > > > > >
> > > > > > VFS: Mounted root (nfs filesystem) on device 0:13.
> > > > > > devtmpfs: mounted
> > > > > > BUG: Bad page state in process swapper pfn:00001
> > > > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > > > > > flags: 0x0()
> > > > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > > > > > page dumped because: nonzero mapcount
> > > > > > Modules linked in:
> > > > > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > > > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c
> > > > > > 00000001 804a3490
> > > > > > 00000001 81000000 0030f231 80148558 00000003 10008400
> > > > > > 87c1dd80 7599ee13
> > > > > > 00000000 00000000 804b0000 00000000 00000007 00000000
> > > > > > 00000085 00000000
> > > > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820
> > > > > > 8040cef8 81000010
> > > > > > 00000001 00000007 00000001 81000000 00000008 8021de24
> > > > > > 00000000 804a0000
> > > > > > ...
> > > > > > Call Trace:
> > > > > > [<8010adec>] show_stack+0x74/0x104
> > > > > > [<801a5e44>] bad_page+0x130/0x138
> > > > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > > > > > [<801a789c>] free_unref_page+0x40/0x68
> > > > > > [<801120f4>] free_init_pages+0xec/0x104
> > > > > > [<803bdde8>] free_initmem+0x10/0x58
> > > > > > [<803bdb8c>] kernel_init+0x20/0x100
> > > > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > > > > > Disabling lock debugging due to kernel taint
> > > > > > BUG: Bad page state in process swapper pfn:00002
> > > > > > [...]
> > > > > >
> > > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > > > > > change quoted below.
> > > > > >
> > > > > > > --- a/arch/mips/kernel/setup.c
> > > > > > > +++ b/arch/mips/kernel/setup.c
> > > > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
> > > > > > >
> > > > > > > static void __init bootmem_init(void)
> > > > > > > {
> > > > > > > - unsigned long reserved_end;
> > > > > > > phys_addr_t ramstart = PHYS_ADDR_MAX;
> > > > > > > int i;
> > > > > > >
> > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
> > > > > > > * will reserve the area used for the initrd.
> > > > > > > */
> > > > > > > init_initrd();
> > > > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
> > > > > > >
> > > > > > > - memblock_reserve(PHYS_OFFSET,
> > > > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> > > > > > > + /* Reserve memory occupied by kernel. */
> > > > > > > + memblock_reserve(__pa_symbol(&_text),
> > > > > > > + __pa_symbol(&_end) - __pa_symbol(&_text));
> > > > > > >
> > > > > > > /*
> > > > > > > * max_low_pfn is not a number of pages. The number of pages
> >
> > > > > > Hence the old code reserved 1 MiB extra at the beginning.
> > > > > >
> > > > > > Note that the new code also dropped the rounding up of the memory block
> > > > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually
> > > > > > matters or not.
> > > > >
> > > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M.
> > > >
> > > > Yes, by prom_free_prom_memory(), as pointed out by Serge.
> > >
> > > I wonder how other MIPS variants would react to the fact that the memory
> > > below the kernel is not reserved ;-)
> >
> > Exactly...
> >
> > > Looks like at least arch/mips/dec/prom/memory.c needs a similar but
> > more complicated fix, due to declance handling...
> >
>
> The problem might be fixed there by the next patch:
> ---
> diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c
> index 5073d2ed78bb..5a0c734b5d04 100644
> --- a/arch/mips/dec/prom/memory.c
> +++ b/arch/mips/dec/prom/memory.c
> @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic)
> pmax_setup_memory_region();
> else
> rex_setup_memory_region();
> -}
> -
> -void __init prom_free_prom_memory(void)
> -{
> - unsigned long end;
> -
> - /*
> - * Free everything below the kernel itself but leave
> - * the first page reserved for the exception handlers.
> - */
>
> #if IS_ENABLED(CONFIG_DECLANCE)
> /*
> - * Leave 128 KB reserved for Lance memory for
> - * IOASIC DECstations.
> + * Reserve 128 KB for Lance memory for IOASIC DECstations.
> *
> * XXX: save this address for use in dec_lance.c?
> */
> if (IOASIC)
> - end = __pa(&_text) - 0x00020000;
> - else
> + memblock_reserve(__pa_symbol(&_text), 0x00020000);

Shouldn't that be

memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000);

?
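
(For reference, memblock_reserve() takes a base and a size, so reserving the
128 KB Lance area below the kernel needs the base shifted down; a sketch of
what is presumably meant, together with the prototype from
include/linux/memblock.h:)
---
int memblock_reserve(phys_addr_t base, phys_addr_t size);

/*
 * The Lance buffer occupies the 128 KB right below the kernel image,
 * so the reservation has to start 0x20000 below _text, not at _text.
 */
memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000);
---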

> #endif
> - end = __pa(&_text);
> -
> - free_init_pages("unused PROM memory", PAGE_SIZE, end);
> }
> ---
>
> Didn't wanna use prom_FREE_prom_memory to actually reserve a memory
> chunk, so I moved the reservation into the prom_meminit() method.

I guess Maciej will test this on real hardware, eventually...

> Regarding the first page for the exception handlers. We don't need
> to reserve it here, since it is already done in arch/mips/kernel/traps.c .

Thanks for checking! That was actually something I was still wondering
about...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-05-22 13:56:58

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

On Wed, May 22, 2019 at 03:44:47PM +0200, Geert Uytterhoeven wrote:
> Hi Serge,
>
> ...
>
> >
> > The problem might be fixed there by the next patch:
> > ---
> > diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c
> > index 5073d2ed78bb..5a0c734b5d04 100644
> > --- a/arch/mips/dec/prom/memory.c
> > +++ b/arch/mips/dec/prom/memory.c
> > @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic)
> > pmax_setup_memory_region();
> > else
> > rex_setup_memory_region();
> > -}
> > -
> > -void __init prom_free_prom_memory(void)
> > -{
> > - unsigned long end;
> > -
> > - /*
> > - * Free everything below the kernel itself but leave
> > - * the first page reserved for the exception handlers.
> > - */
> >
> > #if IS_ENABLED(CONFIG_DECLANCE)
> > /*
> > - * Leave 128 KB reserved for Lance memory for
> > - * IOASIC DECstations.
> > + * Reserve 128 KB for Lance memory for IOASIC DECstations.
> > *
> > * XXX: save this address for use in dec_lance.c?
> > */
> > if (IOASIC)
> > - end = __pa(&_text) - 0x00020000;
> > - else
> > + memblock_reserve(__pa_symbol(&_text), 0x00020000);
>
> Shouldn't that be
>
> memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000);
>
> ?
>

Right. Thanks. The updated version of the patch is attached to this email.
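
(The attachment itself is not reproduced here; assuming it simply applies the
correction above to the hunk posted earlier, the reservation presumably ends
up as:)
---
if (IOASIC)
	memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000);
---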

-Sergey

> > #endif
> > - end = __pa(&_text);
> > -
> > - free_init_pages("unused PROM memory", PAGE_SIZE, end);
> > }
> > ---
> >
> > Didn't wanna use prom_FREE_prom_memory to actually reserve a memory
> > chunk, so I moved the reservation into the prom_meminit() method.
>
> I guess Maciej will test this on real hardware, eventually...
>
> > Regarding the first page for the exception handlers. We don't need
> > to reserve it here, since it is already done in arch/mips/kernel/traps.c .
>
> Thanks for checking! That was actually something I was still wondering
> about...
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds


Attachments:
(No filename) (2.44 kB)
fix_dec.patch (988.00 B)
Download all attachments

2020-10-14 09:51:40

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

On Wed, 22 May 2019, Serge Semin wrote:

> > > The problem might be fixed there by the next patch:
> > > ---
> > > diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c
> > > index 5073d2ed78bb..5a0c734b5d04 100644
> > > --- a/arch/mips/dec/prom/memory.c
> > > +++ b/arch/mips/dec/prom/memory.c
> > > @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic)
> > > pmax_setup_memory_region();
> > > else
> > > rex_setup_memory_region();
> > > -}
> > > -
> > > -void __init prom_free_prom_memory(void)
> > > -{
> > > - unsigned long end;
> > > -
> > > - /*
> > > - * Free everything below the kernel itself but leave
> > > - * the first page reserved for the exception handlers.
> > > - */
> > >
> > > #if IS_ENABLED(CONFIG_DECLANCE)
> > > /*
> > > - * Leave 128 KB reserved for Lance memory for
> > > - * IOASIC DECstations.
> > > + * Reserve 128 KB for Lance memory for IOASIC DECstations.
> > > *
> > > * XXX: save this address for use in dec_lance.c?
> > > */
> > > if (IOASIC)
> > > - end = __pa(&_text) - 0x00020000;
> > > - else
> > > + memblock_reserve(__pa_symbol(&_text), 0x00020000);
> >
> > Shouldn't that be
> >
> > memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000);
> >
> > ?
> >
>
> Right. Thanks. The updated version of the patch is attached to this email.
>
> -Sergey
>
> > > #endif
> > > - end = __pa(&_text);
> > > -
> > > - free_init_pages("unused PROM memory", PAGE_SIZE, end);
> > > }
> > > ---
> > >
> > > Didn't wanna use prom_FREE_prom_memory to actually reserve a memory
> > > chunk, so I moved the reservation into the prom_meminit() method.
> >
> > I guess Maciej will test this on real hardware, eventually...

I finally got to it as I was hit by it the hard way (and I did keep the
thread in my inbox!); however, this is the wrong fix.

With DEC hardware the whole 128 KiB region is reserved as a firmware working
memory area, and we call into the firmware throughout bootstrap on several
occasions. Therefore we have to stay away from it until we know we won't
need any firmware services any longer, which is exactly the point where this
code frees it. So this piece has to stay, and the removed reservation has to
be reinstated in platform code. I'll be posting a fix separately.
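
Roughly, what needs to come back is something along these lines, assuming the
reservation simply keeps everything below _text out of the early allocator
until prom_free_prom_memory() releases it again (the actual fix will of
course be more careful about the details):
---
/*
 * Illustrative sketch only: in the DEC platform setup code, keep the
 * firmware working area below the kernel reserved throughout bootstrap.
 */
memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET);
---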

NB I suspect CFE platforms may need a similar fix, but I don't have
access to my SWARM now, so I'll verify it on another occasion.

Other platforms may need it too; at least up to a point the assumption was
that the kernel load address sits right at the end of any firmware working
area, which is typically allocated just beyond the exception vector area,
hence the reservation. I realise the assumption may have changed at some
point, and the old-timers who knew about it may have been away or not paying
enough attention, while the newcomers did not realise it.

Maciej