2021-12-10 06:56:37

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and fall back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following 10 patches:

0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0007 reimplements arm64 crashkernel=X.
0008-0009 adds memory for devices by DT property linux,usable-memory-range.
0010 updates the doc.

Changes since [v16]
- Because no functional changes in this version, so add
"Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
- Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As Catalin Marinas's suggestion, merge the implementation of
ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
contents of X86 and ARM64 do not overlap, and reduce unnecessary
temporary differences.

Changes since [v15]
- Aggregate the processing of "linux,usable-memory-range" into one function.
Only patch 9-10 have been updated.

Changes since [v14]
- Recovering the requirement that the CrashKernel memory regions on X86
only requires 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
crash_exclude_mem_range() in patch 7.
- Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
has removed the architecture-specific code, extend the property "linux,usable-memory-range"
in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
arch/Kconfig and select this by X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435


Chen Zhou (9):
x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
x86: kdump: make the lower bound of crash kernel reservation
consistent
x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
reserve_crashkernel()
x86: kdump: move xen_pv_domain() check and insert_resource() to
setup_arch()
x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
arm64: kdump: introduce some macros for crash kernel reservation
arm64: kdump: reimplement crashkernel=X
of: fdt: Add memory for devices by DT property
"linux,usable-memory-range"
kdump: update Documentation about crashkernel

Zhen Lei (1):
of: fdt: Aggregate the processing of "linux,usable-memory-range"

Documentation/admin-guide/kdump/kdump.rst | 11 +-
.../admin-guide/kernel-parameters.txt | 11 +-
arch/Kconfig | 3 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kexec.h | 10 ++
arch/arm64/kernel/machine_kexec_file.c | 12 +-
arch/arm64/kernel/setup.c | 13 +-
arch/arm64/mm/init.c | 59 ++-----
arch/x86/Kconfig | 2 +
arch/x86/include/asm/elf.h | 3 +
arch/x86/include/asm/kexec.h | 31 +++-
arch/x86/kernel/setup.c | 163 ++----------------
drivers/of/fdt.c | 42 +++--
include/linux/crash_core.h | 3 +
include/linux/kexec.h | 2 -
kernel/crash_core.c | 156 +++++++++++++++++
kernel/kexec_core.c | 17 --
17 files changed, 301 insertions(+), 238 deletions(-)

--
2.25.1



2021-12-10 06:56:47

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

From: Chen Zhou <[email protected]>

Move CRASH_ALIGN to header asm/kexec.h for later use.

Suggested-by: Dave Young <[email protected]>
Suggested-by: Baoquan He <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/x86/include/asm/kexec.h | 3 +++
arch/x86/kernel/setup.c | 3 ---
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..3a22e65262aa70b 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@

# define KEXEC_CONTROL_CODE_MAX_SIZE 2048

+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_16M
+
#ifndef __ASSEMBLY__

#include <linux/string.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..5cc60996eac56d6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)

#ifdef CONFIG_KEXEC_CORE

-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN SZ_16M
-
/*
* Keep the crash kernel below this limit.
*
--
2.25.1


2021-12-10 06:56:50

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

From: Chen Zhou <[email protected]>

The lower bounds of crash kernel reservation and crash kernel low
reservation are different, use the consistent value CRASH_ALIGN.

Suggested-by: Dave Young <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/x86/kernel/setup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5cc60996eac56d6..6424ee4f23da2cf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
return 0;
}

- low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
(unsigned long)(low_size >> 20));
--
2.25.1


2021-12-10 06:56:53

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 07/10] arm64: kdump: reimplement crashkernel=X

From: Chen Zhou <[email protected]>

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and fall back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

Another minor change, there may be two regions reserved for crash
dump kernel, in order to distinct from the high region and make no
effect to the use of existing kexec-tools, rename the low region as
"Crash kernel (low)".

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kexec.h | 4 ++
arch/arm64/kernel/machine_kexec_file.c | 12 +++++-
arch/arm64/kernel/setup.c | 13 +++++-
arch/arm64/mm/init.c | 59 +++++---------------------
5 files changed, 38 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17ffb..4b99efa36da3793 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select ARCH_WANTS_NO_INSTR
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARM_AMBA
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 1b9edc69f0244ca..3bde0079925d771 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
static inline void crash_post_resume(void) {}
#endif

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#if defined(CONFIG_KEXEC_CORE)
void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
unsigned long arg0, unsigned long arg1,
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 63634b4d72c158f..6f3fa059ca4e816 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)

/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+ if (ret)
+ goto out;
+
+ if (crashk_low_res.end) {
+ ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+ if (ret)
+ goto out;
+ }

- if (!ret)
- ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+ ret = crash_prepare_elf64_headers(cmem, true, addr, sz);

+out:
kfree(cmem);
return ret;
}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24de69..4bb2e55366be64d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
#ifdef CONFIG_KEXEC_CORE
- /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ /*
+ * Userspace will find "Crash kernel" or "Crash kernel (low)"
+ * region in /proc/iomem.
+ * In order to distinct from the high region and make no effect
+ * to the use of existing kexec-tools, rename the low region as
+ * "Crash kernel (low)".
+ */
+ if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+ crashk_low_res.end <= res->end) {
+ crashk_low_res.name = "Crash kernel (low)";
+ request_resource(res, &crashk_low_res);
+ }
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index be4595dc7459115..85c83e4eff2b6c4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
#include <asm/fixmap.h>
#include <asm/kasan.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kexec.h>
#include <asm/kvm_host.h>
#include <asm/memory.h>
#include <asm/numa.h>
@@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr);
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;

-#ifdef CONFIG_KEXEC_CORE
-/*
- * reserve_crashkernel() - reserves memory for crash kernel
- *
- * This function reserves memory area given in "crashkernel=" kernel command
- * line parameter. The memory reserved is used by dump capture kernel when
- * primary kernel is crashing.
- */
+#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
- unsigned long long crash_base, crash_size;
- unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
- int ret;
-
- ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
- &crash_size, &crash_base);
- /* no crashkernel= or invalid value specified */
- if (ret || !crash_size)
- return;
-
- crash_size = PAGE_ALIGN(crash_size);
-
- /* User specifies base address explicitly. */
- if (crash_base)
- crash_max = crash_base + crash_size;
-
- /* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
- crash_base, crash_max);
- if (!crash_base) {
- pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
- crash_size);
- return;
- }
-
- pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
- crash_base, crash_base + crash_size, crash_size >> 20);
-
- /*
- * The crashkernel memory will be removed from the kernel linear
- * map. Inform kmemleak so that it won't try to access it.
- */
- kmemleak_ignore_phys(crash_base);
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
}
-#else
-static void __init reserve_crashkernel(void)
-{
-}
-#endif /* CONFIG_KEXEC_CORE */
+#endif

/*
* Return the maximum physical address for a zone accessible by the given bits
@@ -362,6 +317,14 @@ void __init bootmem_init(void)
* reserved, so do it here.
*/
reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+ /*
+ * The low region is intended to be used for crash dump kernel devices,
+ * just mark the low region as "nomap" simply.
+ */
+ if (crashk_low_res.end)
+ memblock_mark_nomap(crashk_low_res.start, resource_size(&crashk_low_res));
+#endif

memblock_dump_all();
}
--
2.25.1


2021-12-10 06:56:56

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

From: Chen Zhou <[email protected]>

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and call memblock_add()
to add the low memory region after memblock_cap_memory_range() has been
called.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..f7b72fa773250ad 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)

static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;

+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES 2
+
/**
* early_init_dt_check_for_usable_mem_range - Decode usable memory range
* location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
*/
static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
{
- const __be32 *prop;
- int len;
- phys_addr_t cap_mem_addr;
- phys_addr_t cap_mem_size;
+ struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+ const __be32 *prop, *endp;
+ int len, i;

if ((long)node < 0)
return;
@@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
pr_debug("Looking for usable-memory-range property... ");

prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
- if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
return;

- cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
- cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+ endp = prop + (len / sizeof(__be32));
+ for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+ rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+ rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);

- pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
- &cap_mem_size);
+ pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+ i, &rgn[i].base, &rgn[i].size);
+ }

- memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+ memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+ for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+ memblock_add(rgn[i].base, rgn[i].size);
}

#ifdef CONFIG_SERIAL_EARLYCON
--
2.25.1


2021-12-10 06:57:01

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 10/10] kdump: update Documentation about crashkernel

From: Chen Zhou <[email protected]>

For arm64, the behavior of crashkernel=X has been changed, which
tries low allocation in DMA zone and fall back to high allocation
if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

So update the Documentation.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 11 +++++++++--
Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

- On arm64, use "crashkernel=Y[@X]". Note that the start address of
- the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+ On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+ fall back to high allocation if it fails.
+ We can also use "crashkernel=X,high" to select a high region above
+ DMA zone, which also tries to allocate at least 256M low memory in
+ DMA zone automatically.
+ "crashkernel=Y,low" can be used to allocate specified size low memory.
+ Use "crashkernel=Y@X" if you really have to reserve memory from
+ specified start address X. Note that the start address of the kernel,
+ X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d46db..91f3a8dc537d404 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+ [KNL, ARM64] Try low allocation in DMA zone and fall back
+ to high allocation if it fails when '@offset' hasn't been
+ specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.

crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -799,6 +802,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+ [KNL, ARM64] range in high memory.
+ Allow kernel to allocate physical memory region from top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -807,13 +812,15 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
- at least 256M below 4G automatically.
+ least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+ [KNL, ARM64] range in low memory.
+ This one let user to specify a low range in DMA zone for
+ crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests

--
2.25.1


2021-12-10 06:57:02

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 06/10] arm64: kdump: introduce some macros for crash kernel reservation

From: Chen Zhou <[email protected]>

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound
of crash kernel reservation.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/arm64/include/asm/kexec.h | 6 ++++++
arch/arm64/mm/init.c | 4 ++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@

#define KEXEC_ARCH KEXEC_ARCH_AARCH64

+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
+
#ifndef __ASSEMBLY__

/**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
static void __init reserve_crashkernel(void)
{
unsigned long long crash_base, crash_size;
- unsigned long long crash_max = arm64_dma_phys_limit;
+ unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;

ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
crash_max = crash_base + crash_size;

/* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+ crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
--
2.25.1


2021-12-10 06:57:04

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"

Currently, we parse the "linux,usable-memory-range" property in
early_init_dt_scan_chosen(), to obtain the specified memory range of the
crash kernel. We then reserve the required memory after
early_init_dt_scan_memory() has identified all available physical memory.
Because the two pieces of code are separated far, the readability and
maintainability are reduced. So bring them together.

Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
drivers/of/fdt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284cebd56..37b477a51175359 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
elfcorehdr_addr, elfcorehdr_size);
}

-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;

/**
* early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
{
const __be32 *prop;
int len;
+ phys_addr_t cap_mem_addr;
+ phys_addr_t cap_mem_size;
+
+ if ((long)node < 0)
+ return;

pr_debug("Looking for usable-memory-range property... ");

@@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)

pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
&cap_mem_size);
+
+ memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
}

#ifdef CONFIG_SERIAL_EARLYCON
@@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
(strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
return 0;

+ chosen_node_offset = node;
+
early_init_dt_check_for_initrd(node);
early_init_dt_check_for_elfcorehdr(node);
- early_init_dt_check_for_usable_mem_range(node);

/* Retrieve command line */
p = of_get_flat_dt_prop(node, "bootargs", &l);
@@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);

/* Handle linux,usable-memory-range property */
- memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+ early_init_dt_check_for_usable_mem_range(chosen_node_offset);
}

bool __init early_init_dt_scan(void *params)
--
2.25.1


2021-12-10 06:57:09

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

From: Chen Zhou <[email protected]>

To make the functions reserve_crashkernel() as generic,
replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/setup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6424ee4f23da2cf..bb2a0973b98059e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
if (!crash_base) {
/*
* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over 4G, also allocates
- * 256M extra low memory for DMA buffers and swiotlb.
+ * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+ * also allocates 256M extra low memory for DMA buffers
+ * and swiotlb.
* But the extra memory is not required for all machines.
* So try low memory first and fall back to high memory
* unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
}
}

- if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+ if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
memblock_phys_free(crash_base, crash_size);
return;
}
--
2.25.1


2021-12-10 06:57:14

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

From: Chen Zhou <[email protected]>

Make the functions reserve_crashkernel[_low]() as generic. Since
reserve_crashkernel[_low]() implementations are quite similar on other
architectures as well, we can have more users of this later.

So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
select this by X86.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/Kconfig | 3 +
arch/x86/Kconfig | 2 +
arch/x86/include/asm/elf.h | 3 +
arch/x86/include/asm/kexec.h | 28 ++++++-
arch/x86/kernel/setup.c | 143 +-------------------------------
include/linux/crash_core.h | 3 +
include/linux/kexec.h | 2 -
kernel/crash_core.c | 156 +++++++++++++++++++++++++++++++++++
kernel/kexec_core.c | 17 ----
9 files changed, 194 insertions(+), 163 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c275..7bdb32c41985dc5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
config HAVE_IMA_KEXEC
bool

+config ARCH_WANT_RESERVE_CRASH_KERNEL
+ bool
+
config SET_FS
bool

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2efb86..bd78ed8193079b9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
depends on !64BIT
# Options that are inherently 32-bit kernel only:
select ARCH_WANT_IPC_PARSE_VERSION
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select CLKSRC_I8253
select CLONE_BACKWARDS
select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 29fea180a6658e8..7a6c36cff8331f5 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;

#define elf_check_arch(x) elf_check_arch_ia32(x)

+/* We can also handle crash dumps from 64 bit kernel. */
+# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
+
/* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
contains a pointer to a function which might be registered using `atexit'.
This provides a mean for the dynamic linker to call DT_FINI functions for
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 3a22e65262aa70b..3ff38a1353a2b86 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,27 @@
/* 16M alignment for crash kernel regions */
#define CRASH_ALIGN SZ_16M

+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX SZ_512M
+# define CRASH_ADDR_HIGH_MAX SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX SZ_4G
+# define CRASH_ADDR_HIGH_MAX SZ_64T
+#endif
+
#ifndef __ASSEMBLY__

#include <linux/string.h>
@@ -51,9 +72,6 @@ struct kimage;

/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_386
-
-/* We can also handle crash dumps from 64 bit kernel. */
-# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
#else
/* Maximum physical address we can use pages from */
# define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1)
@@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 7ae00716a208f82..5519baa7f4b964e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -39,6 +39,7 @@
#include <asm/io_apic.h>
#include <asm/kasan.h>
#include <asm/kaslr.h>
+#include <asm/kexec.h>
#include <asm/mce.h>
#include <asm/mtrr.h>
#include <asm/realmode.h>
@@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
}
}

-/*
- * --------- Crashkernel reservation ------------------------------
- */
-
-#ifdef CONFIG_KEXEC_CORE
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX SZ_512M
-# define CRASH_ADDR_HIGH_MAX SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX SZ_4G
-# define CRASH_ADDR_HIGH_MAX SZ_64T
-#endif
-
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
- unsigned long long base, low_base = 0, low_size = 0;
- unsigned long low_mem_limit;
- int ret;
-
- low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
- /* crashkernel=Y,low */
- ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
- if (ret) {
- /*
- * two parts from kernel/dma/swiotlb.c:
- * -swiotlb size: user-specified with swiotlb= or default.
- *
- * -swiotlb overflow buffer: now hardcoded to 32k. We round it
- * to 8M for other buffers that may need to stay low too. Also
- * make sure we allocate enough extra low memory so that we
- * don't run out of DMA buffers for 32-bit devices.
- */
- low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
- } else {
- /* passed with crashkernel=0,low ? */
- if (!low_size)
- return 0;
- }
-
- low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_LOW_MAX);
- if (!low_base) {
- pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
- (unsigned long)(low_size >> 20));
- return -ENOMEM;
- }
-
- pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
- (unsigned long)(low_size >> 20),
- (unsigned long)(low_base >> 20),
- (unsigned long)(low_mem_limit >> 20));
-
- crashk_low_res.start = low_base;
- crashk_low_res.end = low_base + low_size - 1;
-#endif
- return 0;
-}
-
-static void __init reserve_crashkernel(void)
-{
- unsigned long long crash_size, crash_base, total_mem;
- bool high = false;
- int ret;
-
- total_mem = memblock_phys_mem_size();
-
- /* crashkernel=XM */
- ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0) {
- /* crashkernel=X,high */
- ret = parse_crashkernel_high(boot_command_line, total_mem,
- &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
- return;
- high = true;
- }
-
- /* 0 means: find the address automatically */
- if (!crash_base) {
- /*
- * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
- * also allocates 256M extra low memory for DMA buffers
- * and swiotlb.
- * But the extra memory is not required for all machines.
- * So try low memory first and fall back to high memory
- * unless "crashkernel=size[KMG],high" is specified.
- */
- if (!high)
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_LOW_MAX);
- if (!crash_base)
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_HIGH_MAX);
- if (!crash_base) {
- pr_info("crashkernel reservation failed - No suitable area found.\n");
- return;
- }
- } else {
- unsigned long long start;
-
- start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
- crash_base + crash_size);
- if (start != crash_base) {
- pr_info("crashkernel reservation failed - memory is in use.\n");
- return;
- }
- }
-
- if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
- memblock_phys_free(crash_base, crash_size);
- return;
- }
-
- pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
- (unsigned long)(crash_size >> 20),
- (unsigned long)(crash_base >> 20),
- (unsigned long)(total_mem >> 20));
-
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
-}
-#else
+#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
}
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..f6b99da4ed08ecf 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
extern size_t vmcoreinfo_size;
extern u32 *vmcoreinfo_note;

+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
+
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
void *data, size_t data_len);
void final_note(Elf_Word *buf);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729e1e..cd744d962f6f417 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -352,8 +352,6 @@ extern int kexec_load_disabled;

/* Location of a reserved region to hold the crash kernel.
*/
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
extern note_buf_t __percpu *crash_notes;

/* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..b23cfc0ca8905fd 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -8,6 +8,12 @@
#include <linux/crash_core.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>
+
+#ifdef CONFIG_KEXEC_CORE
+#include <asm/kexec.h>
+#endif

#include <asm/page.h>
#include <asm/sections.h>
@@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
/* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
static unsigned char *vmcoreinfo_data_safecopy;

+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+ .name = "Crash kernel",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+ .desc = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+ .name = "Crash kernel",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+ .desc = IORES_DESC_CRASH_KERNEL
+};
+
/*
* parsing the "crashkernel" commandline
*
@@ -295,6 +317,140 @@ int __init parse_crashkernel_low(char *cmdline,
"crashkernel=", suffix_tbl[SUFFIX_LOW]);
}

+/*
+ * --------- Crashkernel reservation ------------------------------
+ */
+
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
+static int __init reserve_crashkernel_low(void)
+{
+#ifdef CONFIG_64BIT
+ unsigned long long base, low_base = 0, low_size = 0;
+ unsigned long low_mem_limit;
+ int ret;
+
+ low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
+
+ /* crashkernel=Y,low */
+ ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
+ if (ret) {
+ /*
+ * two parts from kernel/dma/swiotlb.c:
+ * -swiotlb size: user-specified with swiotlb= or default.
+ *
+ * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+ * to 8M for other buffers that may need to stay low too. Also
+ * make sure we allocate enough extra low memory so that we
+ * don't run out of DMA buffers for 32-bit devices.
+ */
+ low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
+ } else {
+ /* passed with crashkernel=0,low ? */
+ if (!low_size)
+ return 0;
+ }
+
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
+ if (!low_base) {
+ pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+ (unsigned long)(low_size >> 20));
+ return -ENOMEM;
+ }
+
+ pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
+ (unsigned long)(low_size >> 20),
+ (unsigned long)(low_base >> 20),
+ (unsigned long)(low_mem_limit >> 20));
+
+ crashk_low_res.start = low_base;
+ crashk_low_res.end = low_base + low_size - 1;
+#endif
+ return 0;
+}
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_size, crash_base, total_mem;
+ bool high = false;
+ int ret;
+
+ total_mem = memblock_phys_mem_size();
+
+ /* crashkernel=XM */
+ ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0) {
+ /* crashkernel=X,high */
+ ret = parse_crashkernel_high(boot_command_line, total_mem,
+ &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0)
+ return;
+ high = true;
+ }
+
+ /* 0 means: find the address automatically */
+ if (!crash_base) {
+ /*
+ * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
+ * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+ * also allocates 256M extra low memory for DMA buffers
+ * and swiotlb.
+ * But the extra memory is not required for all machines.
+ * So try low memory first and fall back to high memory
+ * unless "crashkernel=size[KMG],high" is specified.
+ */
+ if (!high)
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
+ if (!crash_base)
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX);
+ if (!crash_base) {
+ pr_info("crashkernel reservation failed - No suitable area found.\n");
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ unsigned long long start;
+
+ if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
+ pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
+ (unsigned long)CRASH_ALIGN >> 20);
+ return;
+ }
+
+ start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+ crash_base + crash_size);
+ if (start != crash_base) {
+ pr_info("crashkernel reservation failed - memory is in use.\n");
+ return;
+ }
+ }
+
+ if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
+ memblock_phys_free(crash_base, crash_size);
+ return;
+ }
+
+ pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
+ (unsigned long)(crash_size >> 20),
+ (unsigned long)(crash_base >> 20),
+ (unsigned long)(total_mem >> 20));
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
+
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
void *data, size_t data_len)
{
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5a5d192a89ac307..1e0d4909bbb6b77 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
/* Flag to indicate we are going to kexec a new kernel */
bool kexec_in_progress = false;

-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
- .name = "Crash kernel",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
- .desc = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
- .name = "Crash kernel",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
- .desc = IORES_DESC_CRASH_KERNEL
-};
-
int kexec_should_crash(struct task_struct *p)
{
/*
--
2.25.1


2021-12-10 06:57:19

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

From: Chen Zhou <[email protected]>

We will make the functions reserve_crashkernel() as generic, the
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, the same as insert_resource() in reserve_crashkernel[_low]().
So move xen_pv_domain() check and insert_resource() to setup_arch()
to keep them in x86.

Suggested-by: Mike Rapoport <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/setup.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bb2a0973b98059e..7ae00716a208f82 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)

crashk_low_res.start = low_base;
crashk_low_res.end = low_base + low_size - 1;
- insert_resource(&iomem_resource, &crashk_low_res);
#endif
return 0;
}
@@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
high = true;
}

- if (xen_pv_domain()) {
- pr_info("Ignoring crashkernel for a Xen PV domain\n");
- return;
- }
-
/* 0 means: find the address automatically */
if (!crash_base) {
/*
@@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)

crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
- insert_resource(&iomem_resource, &crashk_res);
}
#else
static void __init reserve_crashkernel(void)
@@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
* Reserve memory for crash kernel after SRAT is parsed so that it
* won't consume hotpluggable memory.
*/
- reserve_crashkernel();
+ if (xen_pv_domain())
+ pr_info("Ignoring crashkernel for a Xen PV domain\n");
+ else {
+ reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+ if (crashk_res.end > crashk_res.start)
+ insert_resource(&iomem_resource, &crashk_res);
+ if (crashk_low_res.end > crashk_low_res.start)
+ insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+ }

memblock_find_dma_reserve();

--
2.25.1


2021-12-10 07:15:06

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump


On 2021/12/10 14:55, Zhen Lei wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following 10 patches:
>
> 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> 0005 makes functions reserve_crashkernel[_low]() generic.
> 0006-0007 reimplements arm64 crashkernel=X.
> 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> 0010 updates the doc.
>
> Changes since [v16]
> - Because no functional changes in this version, so add
> "Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
> ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> contents of X86 and ARM64 do not overlap, and reduce unnecessary
> temporary differences.

An Internal review has been done, so for this series,

Reviewed-by: Kefeng Wang <[email protected]>


2021-12-10 16:40:01

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"

On Fri, 10 Dec 2021 14:55:31 +0800, Zhen Lei wrote:
> Currently, we parse the "linux,usable-memory-range" property in
> early_init_dt_scan_chosen(), to obtain the specified memory range of the
> crash kernel. We then reserve the required memory after
> early_init_dt_scan_memory() has identified all available physical memory.
> Because the two pieces of code are separated far, the readability and
> maintainability are reduced. So bring them together.
>
> Suggested-by: Rob Herring <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> drivers/of/fdt.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>

Reviewed-by: Rob Herring <[email protected]>

2021-12-10 16:43:27

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

On Fri, 10 Dec 2021 14:55:32 +0800, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices and never mapped by the first kernel.
> This memory range is advertised to crash dump kernel via DT property
> under /chosen,
> linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
>
> We reused the DT property linux,usable-memory-range and made the low
> memory region as the second range "BASE2 SIZE2", which keeps compatibility
> with existing user-space and older kdump kernels.
>
> Crash dump kernel reads this property at boot time and call memblock_add()
> to add the low memory region after memblock_cap_memory_range() has been
> called.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
> 1 file changed, 23 insertions(+), 10 deletions(-)
>

Reviewed-by: Rob Herring <[email protected]>

2021-12-13 13:17:43

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> Move CRASH_ALIGN to header asm/kexec.h for later use.
>
> Suggested-by: Dave Young <[email protected]>
> Suggested-by: Baoquan He <[email protected]>

I remember Dave and I discussed and suggested this when reviewing.
You can remove my Suggested-by.

For this one, I would like to add ack:

Acked-by: Baoquan He <[email protected]>

> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> arch/x86/include/asm/kexec.h | 3 +++
> arch/x86/kernel/setup.c | 3 ---
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 11b7c06e2828c30..3a22e65262aa70b 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>
> # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>
> +/* 16M alignment for crash kernel regions */
> +#define CRASH_ALIGN SZ_16M
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6a190c7f4d71b05..5cc60996eac56d6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>
> #ifdef CONFIG_KEXEC_CORE
>
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN SZ_16M
> -
> /*
> * Keep the crash kernel below this limit.
> *
> --
> 2.25.1
>


2021-12-13 13:37:51

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.
>
> Suggested-by: Dave Young <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>

You may need add Co-developed-by to clarify who is author, and who is
co-author. Please check section "When to use Acked-by:, Cc:, and Co-developed-by:"
of Documentation/process/submitting-patches.rst. Otherwise,

Acked-by: Baoquan He <[email protected]>

> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> arch/x86/kernel/setup.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
> return 0;
> }
>
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> if (!low_base) {
> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> (unsigned long)(low_size >> 20));
> --
> 2.25.1
>


2021-12-13 14:27:35

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> Move CRASH_ALIGN to header asm/kexec.h for later use.
>
> Suggested-by: Dave Young <[email protected]>
> Suggested-by: Baoquan He <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
>
Acked-by: John Donnelly <[email protected]>


> ---
> arch/x86/include/asm/kexec.h | 3 +++
> arch/x86/kernel/setup.c | 3 ---
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 11b7c06e2828c30..3a22e65262aa70b 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>
> # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>
> +/* 16M alignment for crash kernel regions */
> +#define CRASH_ALIGN SZ_16M
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6a190c7f4d71b05..5cc60996eac56d6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>
> #ifdef CONFIG_KEXEC_CORE
>
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN SZ_16M
> -
> /*
> * Keep the crash kernel below this limit.
> *


2021-12-13 14:28:10

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.
>
> Suggested-by: Dave Young <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> arch/x86/kernel/setup.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
> return 0;
> }
>
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> if (!low_base) {
> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> (unsigned long)(low_size >> 20));


2021-12-13 14:29:30

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> To make the functions reserve_crashkernel() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> Acked-by: Baoquan He <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> arch/x86/kernel/setup.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6424ee4f23da2cf..bb2a0973b98059e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
> if (!crash_base) {
> /*
> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over 4G, also allocates
> - * 256M extra low memory for DMA buffers and swiotlb.
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> * But the extra memory is not required for all machines.
> * So try low memory first and fall back to high memory
> * unless "crashkernel=size[KMG],high" is specified.
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> }
> }
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> memblock_phys_free(crash_base, crash_size);
> return;
> }


2021-12-13 14:29:50

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> So move xen_pv_domain() check and insert_resource() to setup_arch()
> to keep them in x86.
>
> Suggested-by: Mike Rapoport <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> Acked-by: Baoquan He <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> arch/x86/kernel/setup.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bb2a0973b98059e..7ae00716a208f82 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>
> crashk_low_res.start = low_base;
> crashk_low_res.end = low_base + low_size - 1;
> - insert_resource(&iomem_resource, &crashk_low_res);
> #endif
> return 0;
> }
> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
> high = true;
> }
>
> - if (xen_pv_domain()) {
> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
> - return;
> - }
> -
> /* 0 means: find the address automatically */
> if (!crash_base) {
> /*
> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>
> crashk_res.start = crash_base;
> crashk_res.end = crash_base + crash_size - 1;
> - insert_resource(&iomem_resource, &crashk_res);
> }
> #else
> static void __init reserve_crashkernel(void)
> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
> * Reserve memory for crash kernel after SRAT is parsed so that it
> * won't consume hotpluggable memory.
> */
> - reserve_crashkernel();
> + if (xen_pv_domain())
> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
> + else {
> + reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> + if (crashk_res.end > crashk_res.start)
> + insert_resource(&iomem_resource, &crashk_res);
> + if (crashk_low_res.end > crashk_low_res.start)
> + insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> + }
>
> memblock_find_dma_reserve();
>


2021-12-13 14:30:51

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> Make the functions reserve_crashkernel[_low]() as generic. Since
> reserve_crashkernel[_low]() implementations are quite similar on other
> architectures as well, we can have more users of this later.
>
> So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
> select this by X86.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>
> ---
> arch/Kconfig | 3 +
> arch/x86/Kconfig | 2 +
> arch/x86/include/asm/elf.h | 3 +
> arch/x86/include/asm/kexec.h | 28 ++++++-
> arch/x86/kernel/setup.c | 143 +-------------------------------
> include/linux/crash_core.h | 3 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 156 +++++++++++++++++++++++++++++++++++
> kernel/kexec_core.c | 17 ----
> 9 files changed, 194 insertions(+), 163 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c275..7bdb32c41985dc5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,6 +24,9 @@ config KEXEC_ELF
> config HAVE_IMA_KEXEC
> bool
>
> +config ARCH_WANT_RESERVE_CRASH_KERNEL
> + bool
> +
> config SET_FS
> bool
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2efb86..bd78ed8193079b9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -12,6 +12,7 @@ config X86_32
> depends on !64BIT
> # Options that are inherently 32-bit kernel only:
> select ARCH_WANT_IPC_PARSE_VERSION
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
> select CLKSRC_I8253
> select CLONE_BACKWARDS
> select GENERIC_VDSO_32
> @@ -28,6 +29,7 @@ config X86_64
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
> select ARCH_USE_CMPXCHG_LOCKREF
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
> select HAVE_ARCH_SOFT_DIRTY
> select MODULES_USE_ELF_RELA
> select NEED_DMA_MAP_STATE
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 29fea180a6658e8..7a6c36cff8331f5 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
>
> #define elf_check_arch(x) elf_check_arch_ia32(x)
>
> +/* We can also handle crash dumps from 64 bit kernel. */
> +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> +
> /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
> contains a pointer to a function which might be registered using `atexit'.
> This provides a mean for the dynamic linker to call DT_FINI functions for
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 3a22e65262aa70b..3ff38a1353a2b86 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,27 @@
> /* 16M alignment for crash kernel regions */
> #define CRASH_ALIGN SZ_16M
>
> +/*
> + * Keep the crash kernel below this limit.
> + *
> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> + * due to mapping restrictions.
> + *
> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> + * no good way to detect the paging mode of the target kernel which will be
> + * loaded for dumping.
> + */
> +#ifdef CONFIG_X86_32
> +# define CRASH_ADDR_LOW_MAX SZ_512M
> +# define CRASH_ADDR_HIGH_MAX SZ_512M
> +#else
> +# define CRASH_ADDR_LOW_MAX SZ_4G
> +# define CRASH_ADDR_HIGH_MAX SZ_64T
> +#endif
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> @@ -51,9 +72,6 @@ struct kimage;
>
> /* The native architecture */
> # define KEXEC_ARCH KEXEC_ARCH_386
> -
> -/* We can also handle crash dumps from 64 bit kernel. */
> -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> #else
> /* Maximum physical address we can use pages from */
> # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1)
> @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> extern void kdump_nmi_shootdown_cpus(void);
>
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 7ae00716a208f82..5519baa7f4b964e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -39,6 +39,7 @@
> #include <asm/io_apic.h>
> #include <asm/kasan.h>
> #include <asm/kaslr.h>
> +#include <asm/kexec.h>
> #include <asm/mce.h>
> #include <asm/mtrr.h>
> #include <asm/realmode.h>
> @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
> }
> }
>
> -/*
> - * --------- Crashkernel reservation ------------------------------
> - */
> -
> -#ifdef CONFIG_KEXEC_CORE
> -
> -/*
> - * Keep the crash kernel below this limit.
> - *
> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> - * due to mapping restrictions.
> - *
> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> - * no good way to detect the paging mode of the target kernel which will be
> - * loaded for dumping.
> - */
> -#ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX SZ_512M
> -# define CRASH_ADDR_HIGH_MAX SZ_512M
> -#else
> -# define CRASH_ADDR_LOW_MAX SZ_4G
> -# define CRASH_ADDR_HIGH_MAX SZ_64T
> -#endif
> -
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> - unsigned long long base, low_base = 0, low_size = 0;
> - unsigned long low_mem_limit;
> - int ret;
> -
> - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> -
> - /* crashkernel=Y,low */
> - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> - if (ret) {
> - /*
> - * two parts from kernel/dma/swiotlb.c:
> - * -swiotlb size: user-specified with swiotlb= or default.
> - *
> - * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> - * to 8M for other buffers that may need to stay low too. Also
> - * make sure we allocate enough extra low memory so that we
> - * don't run out of DMA buffers for 32-bit devices.
> - */
> - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> - } else {
> - /* passed with crashkernel=0,low ? */
> - if (!low_size)
> - return 0;
> - }
> -
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_LOW_MAX);
> - if (!low_base) {
> - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> - (unsigned long)(low_size >> 20));
> - return -ENOMEM;
> - }
> -
> - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> - (unsigned long)(low_size >> 20),
> - (unsigned long)(low_base >> 20),
> - (unsigned long)(low_mem_limit >> 20));
> -
> - crashk_low_res.start = low_base;
> - crashk_low_res.end = low_base + low_size - 1;
> -#endif
> - return 0;
> -}
> -
> -static void __init reserve_crashkernel(void)
> -{
> - unsigned long long crash_size, crash_base, total_mem;
> - bool high = false;
> - int ret;
> -
> - total_mem = memblock_phys_mem_size();
> -
> - /* crashkernel=XM */
> - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0) {
> - /* crashkernel=X,high */
> - ret = parse_crashkernel_high(boot_command_line, total_mem,
> - &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0)
> - return;
> - high = true;
> - }
> -
> - /* 0 means: find the address automatically */
> - if (!crash_base) {
> - /*
> - * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> - * also allocates 256M extra low memory for DMA buffers
> - * and swiotlb.
> - * But the extra memory is not required for all machines.
> - * So try low memory first and fall back to high memory
> - * unless "crashkernel=size[KMG],high" is specified.
> - */
> - if (!high)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_LOW_MAX);
> - if (!crash_base)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_HIGH_MAX);
> - if (!crash_base) {
> - pr_info("crashkernel reservation failed - No suitable area found.\n");
> - return;
> - }
> - } else {
> - unsigned long long start;
> -
> - start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> - crash_base + crash_size);
> - if (start != crash_base) {
> - pr_info("crashkernel reservation failed - memory is in use.\n");
> - return;
> - }
> - }
> -
> - if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> - memblock_phys_free(crash_base, crash_size);
> - return;
> - }
> -
> - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> - (unsigned long)(crash_size >> 20),
> - (unsigned long)(crash_base >> 20),
> - (unsigned long)(total_mem >> 20));
> -
> - crashk_res.start = crash_base;
> - crashk_res.end = crash_base + crash_size - 1;
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
> static void __init reserve_crashkernel(void)
> {
> }
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e7db..f6b99da4ed08ecf 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
> extern size_t vmcoreinfo_size;
> extern u32 *vmcoreinfo_note;
>
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
> +
> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len);
> void final_note(Elf_Word *buf);
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729e1e..cd744d962f6f417 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -352,8 +352,6 @@ extern int kexec_load_disabled;
>
> /* Location of a reserved region to hold the crash kernel.
> */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
> extern note_buf_t __percpu *crash_notes;
>
> /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index eb53f5ec62c900f..b23cfc0ca8905fd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -8,6 +8,12 @@
> #include <linux/crash_core.h>
> #include <linux/utsname.h>
> #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
> +
> +#ifdef CONFIG_KEXEC_CORE
> +#include <asm/kexec.h>
> +#endif
>
> #include <asm/page.h>
> #include <asm/sections.h>
> @@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
> /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
> static unsigned char *vmcoreinfo_data_safecopy;
>
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +
> /*
> * parsing the "crashkernel" commandline
> *
> @@ -295,6 +317,140 @@ int __init parse_crashkernel_low(char *cmdline,
> "crashkernel=", suffix_tbl[SUFFIX_LOW]);
> }
>
> +/*
> + * --------- Crashkernel reservation ------------------------------
> + */
> +
> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
> +static int __init reserve_crashkernel_low(void)
> +{
> +#ifdef CONFIG_64BIT
> + unsigned long long base, low_base = 0, low_size = 0;
> + unsigned long low_mem_limit;
> + int ret;
> +
> + low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> +
> + /* crashkernel=Y,low */
> + ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> + if (ret) {
> + /*
> + * two parts from kernel/dma/swiotlb.c:
> + * -swiotlb size: user-specified with swiotlb= or default.
> + *
> + * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> + * to 8M for other buffers that may need to stay low too. Also
> + * make sure we allocate enough extra low memory so that we
> + * don't run out of DMA buffers for 32-bit devices.
> + */
> + low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> + } else {
> + /* passed with crashkernel=0,low ? */
> + if (!low_size)
> + return 0;
> + }
> +
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!low_base) {
> + pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> + (unsigned long)(low_size >> 20));
> + return -ENOMEM;
> + }
> +
> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> + (unsigned long)(low_size >> 20),
> + (unsigned long)(low_base >> 20),
> + (unsigned long)(low_mem_limit >> 20));
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end = low_base + low_size - 1;
> +#endif
> + return 0;
> +}
> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)
> +{
> + unsigned long long crash_size, crash_base, total_mem;
> + bool high = false;
> + int ret;
> +
> + total_mem = memblock_phys_mem_size();
> +
> + /* crashkernel=XM */
> + ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0) {
> + /* crashkernel=X,high */
> + ret = parse_crashkernel_high(boot_command_line, total_mem,
> + &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0)
> + return;
> + high = true;
> + }
> +
> + /* 0 means: find the address automatically */
> + if (!crash_base) {
> + /*
> + * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> + * But the extra memory is not required for all machines.
> + * So try low memory first and fall back to high memory
> + * unless "crashkernel=size[KMG],high" is specified.
> + */
> + if (!high)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!crash_base)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX);
> + if (!crash_base) {
> + pr_info("crashkernel reservation failed - No suitable area found.\n");
> + return;
> + }
> + } else {
> + /* User specifies base address explicitly. */
> + unsigned long long start;
> +
> + if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> + pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
> + (unsigned long)CRASH_ALIGN >> 20);
> + return;
> + }
> +
> + start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> + crash_base + crash_size);
> + if (start != crash_base) {
> + pr_info("crashkernel reservation failed - memory is in use.\n");
> + return;
> + }
> + }
> +
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> + memblock_phys_free(crash_base, crash_size);
> + return;
> + }
> +
> + pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> + (unsigned long)(crash_size >> 20),
> + (unsigned long)(crash_base >> 20),
> + (unsigned long)(total_mem >> 20));
> +
> + crashk_res.start = crash_base;
> + crashk_res.end = crash_base + crash_size - 1;
> +}
> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
> +
> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len)
> {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5a5d192a89ac307..1e0d4909bbb6b77 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
> /* Flag to indicate we are going to kexec a new kernel */
> bool kexec_in_progress = false;
>
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -
> int kexec_should_crash(struct task_struct *p)
> {
> /*


2021-12-13 14:31:45

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 06/10] arm64: kdump: introduce some macros for crash kernel reservation

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
> for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
> upper bound of high crash memory, use macros instead.
>
> Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound
> of crash kernel reservation.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> arch/arm64/include/asm/kexec.h | 6 ++++++
> arch/arm64/mm/init.c | 4 ++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 9839bfc163d7147..1b9edc69f0244ca 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -25,6 +25,12 @@
>
> #define KEXEC_ARCH KEXEC_ARCH_AARCH64
>
> +/* 2M alignment for crash kernel regions */
> +#define CRASH_ALIGN SZ_2M
> +
> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
> +#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
> +
> #ifndef __ASSEMBLY__
>
> /**
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a8834434af99ae0..be4595dc7459115 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
> static void __init reserve_crashkernel(void)
> {
> unsigned long long crash_base, crash_size;
> - unsigned long long crash_max = arm64_dma_phys_limit;
> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> int ret;
>
> ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> @@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
> crash_max = crash_base + crash_size;
>
> /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> crash_base, crash_max);
> if (!crash_base) {
> pr_warn("cannot allocate crashkernel (size:0x%llx)\n",


2021-12-13 14:32:51

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 07/10] arm64: kdump: reimplement crashkernel=X

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
>
> To solve these issues, change the behavior of crashkernel=X and
> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
> in DMA zone, and fall back to high allocation if it fails.
> We can also use "crashkernel=X,high" to select a region above DMA zone,
> which also tries to allocate at least 256M in DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
>
> Another minor change, there may be two regions reserved for crash
> dump kernel, in order to distinct from the high region and make no
> effect to the use of existing kexec-tools, rename the low region as
> "Crash kernel (low)".
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/kexec.h | 4 ++
> arch/arm64/kernel/machine_kexec_file.c | 12 +++++-
> arch/arm64/kernel/setup.c | 13 +++++-
> arch/arm64/mm/init.c | 59 +++++---------------------
> 5 files changed, 38 insertions(+), 51 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c4207cf9bb17ffb..4b99efa36da3793 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -95,6 +95,7 @@ config ARM64
> select ARCH_WANT_FRAME_POINTERS
> select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> select ARCH_WANT_LD_ORPHAN_WARN
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
> select ARCH_WANTS_NO_INSTR
> select ARCH_HAS_UBSAN_SANITIZE_ALL
> select ARM_AMBA
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 1b9edc69f0244ca..3bde0079925d771 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
> static inline void crash_post_resume(void) {}
> #endif
>
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
> #if defined(CONFIG_KEXEC_CORE)
> void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
> unsigned long arg0, unsigned long arg1,
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 63634b4d72c158f..6f3fa059ca4e816 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>
> /* Exclude crashkernel region */
> ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
> + if (ret)
> + goto out;
> +
> + if (crashk_low_res.end) {
> + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
> + if (ret)
> + goto out;
> + }
>
> - if (!ret)
> - ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
> + ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
>
> +out:
> kfree(cmem);
> return ret;
> }
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index be5f85b0a24de69..4bb2e55366be64d 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
> kernel_data.end <= res->end)
> request_resource(res, &kernel_data);
> #ifdef CONFIG_KEXEC_CORE
> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
> + /*
> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
> + * region in /proc/iomem.
> + * In order to distinct from the high region and make no effect
> + * to the use of existing kexec-tools, rename the low region as
> + * "Crash kernel (low)".
> + */
> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> + crashk_low_res.end <= res->end) {
> + crashk_low_res.name = "Crash kernel (low)";
> + request_resource(res, &crashk_low_res);
> + }
> if (crashk_res.end && crashk_res.start >= res->start &&
> crashk_res.end <= res->end)
> request_resource(res, &crashk_res);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index be4595dc7459115..85c83e4eff2b6c4 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -36,6 +36,7 @@
> #include <asm/fixmap.h>
> #include <asm/kasan.h>
> #include <asm/kernel-pgtable.h>
> +#include <asm/kexec.h>
> #include <asm/kvm_host.h>
> #include <asm/memory.h>
> #include <asm/numa.h>
> @@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr);
> */
> phys_addr_t arm64_dma_phys_limit __ro_after_init;
>
> -#ifdef CONFIG_KEXEC_CORE
> -/*
> - * reserve_crashkernel() - reserves memory for crash kernel
> - *
> - * This function reserves memory area given in "crashkernel=" kernel command
> - * line parameter. The memory reserved is used by dump capture kernel when
> - * primary kernel is crashing.
> - */
> +#ifndef CONFIG_KEXEC_CORE
> static void __init reserve_crashkernel(void)
> {
> - unsigned long long crash_base, crash_size;
> - unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> - int ret;
> -
> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> - &crash_size, &crash_base);
> - /* no crashkernel= or invalid value specified */
> - if (ret || !crash_size)
> - return;
> -
> - crash_size = PAGE_ALIGN(crash_size);
> -
> - /* User specifies base address explicitly. */
> - if (crash_base)
> - crash_max = crash_base + crash_size;
> -
> - /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> - crash_base, crash_max);
> - if (!crash_base) {
> - pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> - crash_size);
> - return;
> - }
> -
> - pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> - crash_base, crash_base + crash_size, crash_size >> 20);
> -
> - /*
> - * The crashkernel memory will be removed from the kernel linear
> - * map. Inform kmemleak so that it won't try to access it.
> - */
> - kmemleak_ignore_phys(crash_base);
> - crashk_res.start = crash_base;
> - crashk_res.end = crash_base + crash_size - 1;
> }
> -#else
> -static void __init reserve_crashkernel(void)
> -{
> -}
> -#endif /* CONFIG_KEXEC_CORE */
> +#endif
>
> /*
> * Return the maximum physical address for a zone accessible by the given bits
> @@ -362,6 +317,14 @@ void __init bootmem_init(void)
> * reserved, so do it here.
> */
> reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> + /*
> + * The low region is intended to be used for crash dump kernel devices,
> + * just mark the low region as "nomap" simply.
> + */
> + if (crashk_low_res.end)
> + memblock_mark_nomap(crashk_low_res.start, resource_size(&crashk_low_res));
> +#endif
>
> memblock_dump_all();
> }


2021-12-13 14:33:49

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices and never mapped by the first kernel.
> This memory range is advertised to crash dump kernel via DT property
> under /chosen,
> linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
>
> We reused the DT property linux,usable-memory-range and made the low
> memory region as the second range "BASE2 SIZE2", which keeps compatibility
> with existing user-space and older kdump kernels.
>
> Crash dump kernel reads this property at boot time and call memblock_add()
> to add the low memory region after memblock_cap_memory_range() has been
> called.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
> 1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 37b477a51175359..f7b72fa773250ad 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
>
> static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
>
> +/*
> + * The main usage of linux,usable-memory-range is for crash dump kernel.
> + * Originally, the number of usable-memory regions is one. Now there may
> + * be two regions, low region and high region.
> + * To make compatibility with existing user-space and older kdump, the low
> + * region is always the last range of linux,usable-memory-range if exist.
> + */
> +#define MAX_USABLE_RANGES 2
> +
> /**
> * early_init_dt_check_for_usable_mem_range - Decode usable memory range
> * location from flat tree
> @@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
> */
> static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
> {
> - const __be32 *prop;
> - int len;
> - phys_addr_t cap_mem_addr;
> - phys_addr_t cap_mem_size;
> + struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
> + const __be32 *prop, *endp;
> + int len, i;
>
> if ((long)node < 0)
> return;
> @@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
> pr_debug("Looking for usable-memory-range property... ");
>
> prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> - if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
> + if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
> return;
>
> - cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
> - cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
> + endp = prop + (len / sizeof(__be32));
> + for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
> + rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
> + rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);
>
> - pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
> - &cap_mem_size);
> + pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
> + i, &rgn[i].base, &rgn[i].size);
> + }
>
> - memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> + memblock_cap_memory_range(rgn[0].base, rgn[0].size);
> + for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
> + memblock_add(rgn[i].base, rgn[i].size);
> }
>
> #ifdef CONFIG_SERIAL_EARLYCON


2021-12-13 14:34:52

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 08/10] of: fdt: Aggregate the processing of "linux,usable-memory-range"

On 12/10/21 12:55 AM, Zhen Lei wrote:
> Currently, we parse the "linux,usable-memory-range" property in
> early_init_dt_scan_chosen(), to obtain the specified memory range of the
> crash kernel. We then reserve the required memory after
> early_init_dt_scan_memory() has identified all available physical memory.
> Because the two pieces of code are separated far, the readability and
> maintainability are reduced. So bring them together.
>
> Suggested-by: Rob Herring <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> drivers/of/fdt.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284cebd56..37b477a51175359 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
> elfcorehdr_addr, elfcorehdr_size);
> }
>
> -static phys_addr_t cap_mem_addr;
> -static phys_addr_t cap_mem_size;
> +static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
>
> /**
> * early_init_dt_check_for_usable_mem_range - Decode usable memory range
> @@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
> {
> const __be32 *prop;
> int len;
> + phys_addr_t cap_mem_addr;
> + phys_addr_t cap_mem_size;
> +
> + if ((long)node < 0)
> + return;
>
> pr_debug("Looking for usable-memory-range property... ");
>
> @@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>
> pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
> &cap_mem_size);
> +
> + memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> }
>
> #ifdef CONFIG_SERIAL_EARLYCON
> @@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
> (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
> return 0;
>
> + chosen_node_offset = node;
> +
> early_init_dt_check_for_initrd(node);
> early_init_dt_check_for_elfcorehdr(node);
> - early_init_dt_check_for_usable_mem_range(node);
>
> /* Retrieve command line */
> p = of_get_flat_dt_prop(node, "bootargs", &l);
> @@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
> of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>
> /* Handle linux,usable-memory-range property */
> - memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> + early_init_dt_check_for_usable_mem_range(chosen_node_offset);
> }
>
> bool __init early_init_dt_scan(void *params)


2021-12-13 14:35:33

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 10/10] kdump: update Documentation about crashkernel

On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> For arm64, the behavior of crashkernel=X has been changed, which
> tries low allocation in DMA zone and fall back to high allocation
> if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
>
> So update the Documentation.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>

Acked-by: John Donnelly <[email protected]>

> ---
> Documentation/admin-guide/kdump/kdump.rst | 11 +++++++++--
> Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
> 2 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> index cb30ca3df27c9b2..d4c287044be0c70 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -361,8 +361,15 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>
> - On arm64, use "crashkernel=Y[@X]". Note that the start address of
> - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
> + On arm64, use "crashkernel=X" to try low allocation in DMA zone and
> + fall back to high allocation if it fails.
> + We can also use "crashkernel=X,high" to select a high region above
> + DMA zone, which also tries to allocate at least 256M low memory in
> + DMA zone automatically.
> + "crashkernel=Y,low" can be used to allocate specified size low memory.
> + Use "crashkernel=Y@X" if you really have to reserve memory from
> + specified start address X. Note that the start address of the kernel,
> + X if explicitly specified, must be aligned to 2MiB (0x200000).
>
> Load the Dump-capture Kernel
> ============================
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9725c546a0d46db..91f3a8dc537d404 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -783,6 +783,9 @@
> [KNL, X86-64] Select a region under 4G first, and
> fall back to reserve region above 4G when '@offset'
> hasn't been specified.
> + [KNL, ARM64] Try low allocation in DMA zone and fall back
> + to high allocation if it fails when '@offset' hasn't been
> + specified.
> See Documentation/admin-guide/kdump/kdump.rst for further details.
>
> crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -799,6 +802,8 @@
> Otherwise memory region will be allocated below 4G, if
> available.
> It will be ignored if crashkernel=X is specified.
> + [KNL, ARM64] range in high memory.
> + Allow kernel to allocate physical memory region from top.
> crashkernel=size[KMG],low
> [KNL, X86-64] range under 4G. When crashkernel=X,high
> is passed, kernel could allocate physical memory region
> @@ -807,13 +812,15 @@
> requires at least 64M+32K low memory, also enough extra
> low memory is needed to make sure DMA buffers for 32-bit
> devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
> This one let user to specify own low range under 4G
> for second kernel instead.
> 0: to disable low allocation.
> It will be ignored when crashkernel=X,high is not used
> or memory reserved is below 4G.
> -
> + [KNL, ARM64] range in low memory.
> + This one let user to specify a low range in DMA zone for
> + crash dump kernel.
> cryptomgr.notests
> [KNL] Disable crypto self-tests
>


2021-12-13 14:38:38

by John Donnelly

[permalink] [raw]
Subject: Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

On 12/10/21 12:55 AM, Zhen Lei wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following 10 patches:
>
> 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> 0005 makes functions reserve_crashkernel[_low]() generic.
> 0006-0007 reimplements arm64 crashkernel=X.
> 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> 0010 updates the doc.
>
> Changes since [v16]
> - Because no functional changes in this version, so add
> "Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
> ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> contents of X86 and ARM64 do not overlap, and reduce unnecessary
> temporary differences.
>
> Changes since [v15]
> - Aggregate the processing of "linux,usable-memory-range" into one function.
> Only patch 9-10 have been updated.
>
> Changes since [v14]
> - Recovering the requirement that the CrashKernel memory regions on X86
> only requires 1 MiB alignment.
> - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
> by patch 6 was introduced by patch 5 in v14.
> - As with crashk_res, crashk_low_res is also processed by
> crash_exclude_mem_range() in patch 7.
> - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
> has removed the architecture-specific code, extend the property "linux,usable-memory-range"
> in the platform-agnostic FDT core code. See patch 9.
> - Discard the x86 description update in the document, because the description
> has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> - Change "arm64" to "ARM64" in Doc.
>
>
> Changes since [v13]
> - Rebased on top of 5.11-rc5.
> - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> Since reserve_crashkernel[_low]() implementations are quite similar on
> other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> arch/Kconfig and select this by X86 and ARM64.
> - Some minor cleanup.
>
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
> [v13]: https://lkml.org/lkml/2020/10/31/34
> [v14]: https://lkml.org/lkml/2021/1/30/53
> [v15]: https://lkml.org/lkml/2021/10/19/1405
> [v16]: https://lkml.org/lkml/2021/11/23/435
>
>
> Chen Zhou (9):
> x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> x86: kdump: make the lower bound of crash kernel reservation
> consistent
> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel()
> x86: kdump: move xen_pv_domain() check and insert_resource() to
> setup_arch()
> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> arm64: kdump: introduce some macros for crash kernel reservation
> arm64: kdump: reimplement crashkernel=X
> of: fdt: Add memory for devices by DT property
> "linux,usable-memory-range"
> kdump: update Documentation about crashkernel
>
> Zhen Lei (1):
> of: fdt: Aggregate the processing of "linux,usable-memory-range"
>
> Documentation/admin-guide/kdump/kdump.rst | 11 +-
> .../admin-guide/kernel-parameters.txt | 11 +-
> arch/Kconfig | 3 +
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/kexec.h | 10 ++
> arch/arm64/kernel/machine_kexec_file.c | 12 +-
> arch/arm64/kernel/setup.c | 13 +-
> arch/arm64/mm/init.c | 59 ++-----
> arch/x86/Kconfig | 2 +
> arch/x86/include/asm/elf.h | 3 +
> arch/x86/include/asm/kexec.h | 31 +++-
> arch/x86/kernel/setup.c | 163 ++----------------
> drivers/of/fdt.c | 42 +++--
> include/linux/crash_core.h | 3 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 156 +++++++++++++++++
> kernel/kexec_core.c | 17 --
> 17 files changed, 301 insertions(+), 238 deletions(-)
>


Hello ,

After 2 years, and 17 versions, can we now get this series promoted into
a build ?


Thank you,

JD

2021-12-13 18:51:07

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

On Fri, Dec 10, 2021 at 03:15:00PM +0800, Kefeng Wang wrote:
>
> On 2021/12/10 14:55, Zhen Lei wrote:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> >
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> >
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> >
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> >
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> >
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> >
> > This patchset contains the following 10 patches:
> >
> > 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> > 0005 makes functions reserve_crashkernel[_low]() generic.
> > 0006-0007 reimplements arm64 crashkernel=X.
> > 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> > 0010 updates the doc.
> >
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> > "Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
> > - Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> > ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> > contents of X86 and ARM64 do not overlap, and reduce unnecessary
> > temporary differences.
>
> An Internal review has been done, so for this series,
>
> Reviewed-by: Kefeng Wang <[email protected]>

That's good, but it would be _much_ better if you could do these reviews
on the public mailing list in future. Is that possible? Otherwise, it's
hard for maintainers to know what the reviews actually covered.

Thanks,

Will

2021-12-13 18:57:09

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

On Mon, Dec 13, 2021 at 08:37:48AM -0600, [email protected] wrote:
> On 12/10/21 12:55 AM, Zhen Lei wrote:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> >
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> >
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> >
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> >
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> >
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> >
> > This patchset contains the following 10 patches:
> >
> > 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> > 0005 makes functions reserve_crashkernel[_low]() generic.
> > 0006-0007 reimplements arm64 crashkernel=X.
> > 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> > 0010 updates the doc.
> >
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> > "Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
> > - Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> > ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> > contents of X86 and ARM64 do not overlap, and reduce unnecessary
> > temporary differences.
> >
> > Changes since [v15]
> > - Aggregate the processing of "linux,usable-memory-range" into one function.
> > Only patch 9-10 have been updated.
> >
> > Changes since [v14]
> > - Recovering the requirement that the CrashKernel memory regions on X86
> > only requires 1 MiB alignment.
> > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
> > by patch 6 was introduced by patch 5 in v14.
> > - As with crashk_res, crashk_low_res is also processed by
> > crash_exclude_mem_range() in patch 7.
> > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
> > has removed the architecture-specific code, extend the property "linux,usable-memory-range"
> > in the platform-agnostic FDT core code. See patch 9.
> > - Discard the x86 description update in the document, because the description
> > has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> > - Change "arm64" to "ARM64" in Doc.
> >
> >
> > Changes since [v13]
> > - Rebased on top of 5.11-rc5.
> > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> > Since reserve_crashkernel[_low]() implementations are quite similar on
> > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> > arch/Kconfig and select this by X86 and ARM64.
> > - Some minor cleanup.
> >
> > Changes since [v12]
> > - Rebased on top of 5.10-rc1.
> > - Keep CRASH_ALIGN as 16M suggested by Dave.
> > - Drop patch "kdump: add threshold for the required memory".
> > - Add Tested-by from John.
> >
> > Changes since [v11]
> > - Rebased on top of 5.9-rc4.
> > - Make the function reserve_crashkernel() of x86 generic.
> > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> > and arm64 use the generic version to reimplement crashkernel=X.
> >
> > Changes since [v10]
> > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> >
> > Changes since [v9]
> > - Patch 1 add Acked-by from Dave.
> > - Update patch 5 according to Dave's comments.
> > - Update chosen schema.
> >
> > Changes since [v8]
> > - Reuse DT property "linux,usable-memory-range".
> > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> > memory region.
> > - Fix kdump broken with ZONE_DMA reintroduced.
> > - Update chosen schema.
> >
> > Changes since [v7]
> > - Move x86 CRASH_ALIGN to 2M
> > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> > - Update Documentation/devicetree/bindings/chosen.txt.
> > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> > suggested by Arnd.
> > - Add Tested-by from Jhon and pk.
> >
> > Changes since [v6]
> > - Fix build errors reported by kbuild test robot.
> >
> > Changes since [v5]
> > - Move reserve_crashkernel_low() into kernel/crash_core.c.
> > - Delete crashkernel=X,high.
> > - Modify crashkernel=X,low.
> > If crashkernel=X,low is specified simultaneously, reserve spcified size low
> > memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> > pass to crash dump kernel by DT property "linux,low-memory-range".
> > - Update Documentation/admin-guide/kdump/kdump.rst.
> >
> > Changes since [v4]
> > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> >
> > Changes since [v3]
> > - Add memblock_cap_memory_ranges back for multiple ranges.
> > - Fix some compiling warnings.
> >
> > Changes since [v2]
> > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> > patch.
> >
> > Changes since [v1]:
> > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> > - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> > in fdt_enforce_memory_region().
> > There are at most two crash kernel regions, for two crash kernel regions
> > case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> > and then remove the memory range in the middle.
> >
> > [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> > [2]: https://github.com/robherring/dt-schema/pull/19
> > [v1]: https://lkml.org/lkml/2019/4/2/1174
> > [v2]: https://lkml.org/lkml/2019/4/9/86
> > [v3]: https://lkml.org/lkml/2019/4/9/306
> > [v4]: https://lkml.org/lkml/2019/4/15/273
> > [v5]: https://lkml.org/lkml/2019/5/6/1360
> > [v6]: https://lkml.org/lkml/2019/8/30/142
> > [v7]: https://lkml.org/lkml/2019/12/23/411
> > [v8]: https://lkml.org/lkml/2020/5/21/213
> > [v9]: https://lkml.org/lkml/2020/6/28/73
> > [v10]: https://lkml.org/lkml/2020/7/2/1443
> > [v11]: https://lkml.org/lkml/2020/8/1/150
> > [v12]: https://lkml.org/lkml/2020/9/7/1037
> > [v13]: https://lkml.org/lkml/2020/10/31/34
> > [v14]: https://lkml.org/lkml/2021/1/30/53
> > [v15]: https://lkml.org/lkml/2021/10/19/1405
> > [v16]: https://lkml.org/lkml/2021/11/23/435
> >
> >
> > Chen Zhou (9):
> > x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> > x86: kdump: make the lower bound of crash kernel reservation
> > consistent
> > x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > reserve_crashkernel()
> > x86: kdump: move xen_pv_domain() check and insert_resource() to
> > setup_arch()
> > x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > arm64: kdump: introduce some macros for crash kernel reservation
> > arm64: kdump: reimplement crashkernel=X
> > of: fdt: Add memory for devices by DT property
> > "linux,usable-memory-range"
> > kdump: update Documentation about crashkernel
> >
> > Zhen Lei (1):
> > of: fdt: Aggregate the processing of "linux,usable-memory-range"
> >
> > Documentation/admin-guide/kdump/kdump.rst | 11 +-
> > .../admin-guide/kernel-parameters.txt | 11 +-
> > arch/Kconfig | 3 +
> > arch/arm64/Kconfig | 1 +
> > arch/arm64/include/asm/kexec.h | 10 ++
> > arch/arm64/kernel/machine_kexec_file.c | 12 +-
> > arch/arm64/kernel/setup.c | 13 +-
> > arch/arm64/mm/init.c | 59 ++-----
> > arch/x86/Kconfig | 2 +
> > arch/x86/include/asm/elf.h | 3 +
> > arch/x86/include/asm/kexec.h | 31 +++-
> > arch/x86/kernel/setup.c | 163 ++----------------
> > drivers/of/fdt.c | 42 +++--
> > include/linux/crash_core.h | 3 +
> > include/linux/kexec.h | 2 -
> > kernel/crash_core.c | 156 +++++++++++++++++
> > kernel/kexec_core.c | 17 --
> > 17 files changed, 301 insertions(+), 238 deletions(-)
>
> After 2 years, and 17 versions, can we now get this series promoted into a
> build ?

We don't push patches to linux-next while there are still pending
issues. The v17 looks alright though, I'll push it out for a bit more
coverage and, if there are no objections, it should land in 5.17.

--
Catalin

2021-12-13 18:57:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 00/10] support reserving crashkernel above 4G on arm64 kdump

On Mon, Dec 13, 2021 at 08:37:48AM -0600, [email protected] wrote:
> After 2 years, and 17 versions, can we now get this series promoted into a
> build ?

For example:

$ ./scripts/get_maintainer.pl -f Documentation/admin-guide/kdump/kdump.rst
Baoquan He <[email protected]> (maintainer:KDUMP)
Vivek Goyal <[email protected]> (reviewer:KDUMP)
Dave Young <[email protected]> (reviewer:KDUMP)
Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
[email protected] (open list:KDUMP)

I see only two acks from Baoquan.

So yes, this needs to go through the normal review process first.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-13 19:54:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

> Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

From Documentation/process/maintainer-tip.rst:

"Patch subject
^^^^^^^^^^^^^

The tip tree preferred format for patch subject prefixes is
'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
'genirq/core:'. Please do not use file names or complete file paths as
prefix. 'git log path/to/file' should give you a reasonable hint in most
cases.

The condensed patch description in the subject line should start with a
uppercase letter and should be written in imperative tone."

Please fix 1-5 for your next submission.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-14 08:42:04

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN



On 2021/12/13 21:17, Baoquan He wrote:
> On 12/10/21 at 02:55pm, Zhen Lei wrote:
>> From: Chen Zhou <[email protected]>
>>
>> Move CRASH_ALIGN to header asm/kexec.h for later use.
>>
>> Suggested-by: Dave Young <[email protected]>
>> Suggested-by: Baoquan He <[email protected]>
>
> I remember Dave and I discussed and suggested this when reviewing.
> You can remove my Suggested-by.

OK, I will do it.

>
> For this one, I would like to add ack:
>
> Acked-by: Baoquan He <[email protected]>
>
>> Signed-off-by: Chen Zhou <[email protected]>
>> Signed-off-by: Zhen Lei <[email protected]>
>> Tested-by: John Donnelly <[email protected]>
>> Tested-by: Dave Kleikamp <[email protected]>
>> ---
>> arch/x86/include/asm/kexec.h | 3 +++
>> arch/x86/kernel/setup.c | 3 ---
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 11b7c06e2828c30..3a22e65262aa70b 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>
>> # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>>
>> +/* 16M alignment for crash kernel regions */
>> +#define CRASH_ALIGN SZ_16M
>> +
>> #ifndef __ASSEMBLY__
>>
>> #include <linux/string.h>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 6a190c7f4d71b05..5cc60996eac56d6 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>
>> #ifdef CONFIG_KEXEC_CORE
>>
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN SZ_16M
>> -
>> /*
>> * Keep the crash kernel below this limit.
>> *
>> --
>> 2.25.1
>>
>
> .
>

2021-12-14 08:48:21

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent



On 2021/12/13 21:37, Baoquan He wrote:
> On 12/10/21 at 02:55pm, Zhen Lei wrote:
>> From: Chen Zhou <[email protected]>
>>
>> The lower bounds of crash kernel reservation and crash kernel low
>> reservation are different, use the consistent value CRASH_ALIGN.
>>
>> Suggested-by: Dave Young <[email protected]>
>> Signed-off-by: Chen Zhou <[email protected]>
>> Signed-off-by: Zhen Lei <[email protected]>
>
> You may need add Co-developed-by to clarify who is author, and who is
> co-author. Please check section "When to use Acked-by:, Cc:, and Co-developed-by:"
> of Documentation/process/submitting-patches.rst. Otherwise,

Okay, thanks for the heads-up. I will modify it.

>
> Acked-by: Baoquan He <[email protected]>
>
>> Tested-by: John Donnelly <[email protected]>
>> Tested-by: Dave Kleikamp <[email protected]>
>> ---
>> arch/x86/kernel/setup.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 5cc60996eac56d6..6424ee4f23da2cf 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
>> return 0;
>> }
>>
>> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
>> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
>> + CRASH_ADDR_LOW_MAX);
>> if (!low_base) {
>> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> (unsigned long)(low_size >> 20));
>> --
>> 2.25.1
>>
>
> .
>

2021-12-14 08:55:00

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> To make the functions reserve_crashkernel() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>

If you made change to this patch, please remove the old Acked-by. If you
didn't contribute change, Signed-off-by should be taken off.

Compared this with the version I acked, only see
memblock_free() -> memblock_phys_free() update which should be done
from the rebase.

So ack this one again, and please also consider adding Co-developed-by.

> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> Acked-by: Baoquan He <[email protected]>
> ---
> arch/x86/kernel/setup.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6424ee4f23da2cf..bb2a0973b98059e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
> if (!crash_base) {
> /*
> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over 4G, also allocates
> - * 256M extra low memory for DMA buffers and swiotlb.
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> * But the extra memory is not required for all machines.
> * So try low memory first and fall back to high memory
> * unless "crashkernel=size[KMG],high" is specified.
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> }
> }
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> memblock_phys_free(crash_base, crash_size);
> return;
> }
> --
> 2.25.1
>


2021-12-14 09:27:34

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN



On 2021/12/14 3:54, Borislav Petkov wrote:
>> Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
>
>>From Documentation/process/maintainer-tip.rst:
>
> "Patch subject
> ^^^^^^^^^^^^^
>
> The tip tree preferred format for patch subject prefixes is
> 'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
> 'genirq/core:'. Please do not use file names or complete file paths as
> prefix. 'git log path/to/file' should give you a reasonable hint in most
> cases.
>
> The condensed patch description in the subject line should start with a
> uppercase letter and should be written in imperative tone."
>
> Please fix 1-5 for your next submission.

OK. I will update them.

>
> Thx.
>

2021-12-14 09:38:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote:
> If you didn't contribute change, Signed-off-by should be taken off.

The second SOB is correct here. I'll let you figure it out what it
means.

Hint: Documentation/process/submitting-patches.rst

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-14 09:57:30

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/14/21 at 10:38am, Borislav Petkov wrote:
> On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote:
> > If you didn't contribute change, Signed-off-by should be taken off.
>
> The second SOB is correct here. I'll let you figure it out what it
> means.
>
> Hint: Documentation/process/submitting-patches.rst

Ah, OK, I see the new paragraph from you added in below commit.

commit 9bf19b78a203b6ed20ed7b5d7222f5ef7a49aed4
Author: Borislav Petkov <[email protected]>
Date: Thu Dec 17 19:37:56 2020 +0100

Documentation/submitting-patches: Document the SoB chain

Hi Lei,

I take back the wrong comment on Signed-off-by tag since you post the
patch, please ignore them.


2021-12-14 10:46:15

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> Make the functions reserve_crashkernel[_low]() as generic. Since
> reserve_crashkernel[_low]() implementations are quite similar on other
> architectures as well, we can have more users of this later.
>
> So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
> select this by X86.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> arch/Kconfig | 3 +
> arch/x86/Kconfig | 2 +
> arch/x86/include/asm/elf.h | 3 +
> arch/x86/include/asm/kexec.h | 28 ++++++-
> arch/x86/kernel/setup.c | 143 +-------------------------------
> include/linux/crash_core.h | 3 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 156 +++++++++++++++++++++++++++++++++++
> kernel/kexec_core.c | 17 ----
> 9 files changed, 194 insertions(+), 163 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c275..7bdb32c41985dc5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,6 +24,9 @@ config KEXEC_ELF
> config HAVE_IMA_KEXEC
> bool
>
> +config ARCH_WANT_RESERVE_CRASH_KERNEL
> + bool
> +
> config SET_FS
> bool
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2efb86..bd78ed8193079b9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -12,6 +12,7 @@ config X86_32
> depends on !64BIT
> # Options that are inherently 32-bit kernel only:
> select ARCH_WANT_IPC_PARSE_VERSION
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
> select CLKSRC_I8253
> select CLONE_BACKWARDS
> select GENERIC_VDSO_32
> @@ -28,6 +29,7 @@ config X86_64
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
> select ARCH_USE_CMPXCHG_LOCKREF
> + select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
> select HAVE_ARCH_SOFT_DIRTY
> select MODULES_USE_ELF_RELA
> select NEED_DMA_MAP_STATE
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 29fea180a6658e8..7a6c36cff8331f5 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
>
> #define elf_check_arch(x) elf_check_arch_ia32(x)
>
> +/* We can also handle crash dumps from 64 bit kernel. */
> +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> +
> /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
> contains a pointer to a function which might be registered using `atexit'.
> This provides a mean for the dynamic linker to call DT_FINI functions for
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 3a22e65262aa70b..3ff38a1353a2b86 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,27 @@
> /* 16M alignment for crash kernel regions */
> #define CRASH_ALIGN SZ_16M
>
> +/*
> + * Keep the crash kernel below this limit.
> + *
> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> + * due to mapping restrictions.
> + *
> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> + * no good way to detect the paging mode of the target kernel which will be
> + * loaded for dumping.
> + */
> +#ifdef CONFIG_X86_32
> +# define CRASH_ADDR_LOW_MAX SZ_512M
> +# define CRASH_ADDR_HIGH_MAX SZ_512M
> +#else
> +# define CRASH_ADDR_LOW_MAX SZ_4G
> +# define CRASH_ADDR_HIGH_MAX SZ_64T
> +#endif
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> @@ -51,9 +72,6 @@ struct kimage;
>
> /* The native architecture */
> # define KEXEC_ARCH KEXEC_ARCH_386
> -
> -/* We can also handle crash dumps from 64 bit kernel. */
> -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> #else
> /* Maximum physical address we can use pages from */
> # define KEXEC_SOURCE_MEMORY_LIMIT (MAXMEM-1)
> @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> extern void kdump_nmi_shootdown_cpus(void);
>
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 7ae00716a208f82..5519baa7f4b964e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -39,6 +39,7 @@
> #include <asm/io_apic.h>
> #include <asm/kasan.h>
> #include <asm/kaslr.h>
> +#include <asm/kexec.h>
> #include <asm/mce.h>
> #include <asm/mtrr.h>
> #include <asm/realmode.h>
> @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
> }
> }
>
> -/*
> - * --------- Crashkernel reservation ------------------------------
> - */
> -
> -#ifdef CONFIG_KEXEC_CORE
> -
> -/*
> - * Keep the crash kernel below this limit.
> - *
> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> - * due to mapping restrictions.
> - *
> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> - * no good way to detect the paging mode of the target kernel which will be
> - * loaded for dumping.
> - */
> -#ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX SZ_512M
> -# define CRASH_ADDR_HIGH_MAX SZ_512M
> -#else
> -# define CRASH_ADDR_LOW_MAX SZ_4G
> -# define CRASH_ADDR_HIGH_MAX SZ_64T
> -#endif
> -
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> - unsigned long long base, low_base = 0, low_size = 0;
> - unsigned long low_mem_limit;
> - int ret;
> -
> - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> -
> - /* crashkernel=Y,low */
> - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> - if (ret) {
> - /*
> - * two parts from kernel/dma/swiotlb.c:
> - * -swiotlb size: user-specified with swiotlb= or default.
> - *
> - * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> - * to 8M for other buffers that may need to stay low too. Also
> - * make sure we allocate enough extra low memory so that we
> - * don't run out of DMA buffers for 32-bit devices.
> - */
> - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> - } else {
> - /* passed with crashkernel=0,low ? */
> - if (!low_size)
> - return 0;
> - }
> -
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_LOW_MAX);
> - if (!low_base) {
> - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> - (unsigned long)(low_size >> 20));
> - return -ENOMEM;
> - }
> -
> - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> - (unsigned long)(low_size >> 20),
> - (unsigned long)(low_base >> 20),
> - (unsigned long)(low_mem_limit >> 20));
> -
> - crashk_low_res.start = low_base;
> - crashk_low_res.end = low_base + low_size - 1;
> -#endif
> - return 0;
> -}
> -
> -static void __init reserve_crashkernel(void)
> -{
> - unsigned long long crash_size, crash_base, total_mem;
> - bool high = false;
> - int ret;
> -
> - total_mem = memblock_phys_mem_size();
> -
> - /* crashkernel=XM */
> - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0) {
> - /* crashkernel=X,high */
> - ret = parse_crashkernel_high(boot_command_line, total_mem,
> - &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0)
> - return;
> - high = true;
> - }
> -
> - /* 0 means: find the address automatically */
> - if (!crash_base) {
> - /*
> - * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> - * also allocates 256M extra low memory for DMA buffers
> - * and swiotlb.
> - * But the extra memory is not required for all machines.
> - * So try low memory first and fall back to high memory
> - * unless "crashkernel=size[KMG],high" is specified.
> - */
> - if (!high)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_LOW_MAX);
> - if (!crash_base)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_HIGH_MAX);
> - if (!crash_base) {
> - pr_info("crashkernel reservation failed - No suitable area found.\n");
> - return;
> - }
> - } else {
> - unsigned long long start;
> -
> - start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> - crash_base + crash_size);
> - if (start != crash_base) {
> - pr_info("crashkernel reservation failed - memory is in use.\n");
> - return;
> - }
> - }
> -
> - if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> - memblock_phys_free(crash_base, crash_size);
> - return;
> - }
> -
> - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> - (unsigned long)(crash_size >> 20),
> - (unsigned long)(crash_base >> 20),
> - (unsigned long)(total_mem >> 20));
> -
> - crashk_res.start = crash_base;
> - crashk_res.end = crash_base + crash_size - 1;
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
> static void __init reserve_crashkernel(void)
> {
> }
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e7db..f6b99da4ed08ecf 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
> extern size_t vmcoreinfo_size;
> extern u32 *vmcoreinfo_note;
>
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
> +
> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len);
> void final_note(Elf_Word *buf);
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729e1e..cd744d962f6f417 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -352,8 +352,6 @@ extern int kexec_load_disabled;
>
> /* Location of a reserved region to hold the crash kernel.
> */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
> extern note_buf_t __percpu *crash_notes;
>
> /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index eb53f5ec62c900f..b23cfc0ca8905fd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -8,6 +8,12 @@
> #include <linux/crash_core.h>
> #include <linux/utsname.h>
> #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
> +
> +#ifdef CONFIG_KEXEC_CORE
> +#include <asm/kexec.h>
> +#endif
>
> #include <asm/page.h>
> #include <asm/sections.h>
> @@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
> /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
> static unsigned char *vmcoreinfo_data_safecopy;
>
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +
> /*
> * parsing the "crashkernel" commandline
> *
> @@ -295,6 +317,140 @@ int __init parse_crashkernel_low(char *cmdline,
> "crashkernel=", suffix_tbl[SUFFIX_LOW]);
> }
>
> +/*
> + * --------- Crashkernel reservation ------------------------------
> + */
> +
> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
> +static int __init reserve_crashkernel_low(void)
> +{
> +#ifdef CONFIG_64BIT
> + unsigned long long base, low_base = 0, low_size = 0;
> + unsigned long low_mem_limit;
> + int ret;
> +
> + low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> +
> + /* crashkernel=Y,low */
> + ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> + if (ret) {
> + /*
> + * two parts from kernel/dma/swiotlb.c:
> + * -swiotlb size: user-specified with swiotlb= or default.
> + *
> + * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> + * to 8M for other buffers that may need to stay low too. Also
> + * make sure we allocate enough extra low memory so that we
> + * don't run out of DMA buffers for 32-bit devices.
> + */
> + low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> + } else {
> + /* passed with crashkernel=0,low ? */
> + if (!low_size)
> + return 0;
> + }
> +
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!low_base) {
> + pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> + (unsigned long)(low_size >> 20));
> + return -ENOMEM;
> + }
> +
> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> + (unsigned long)(low_size >> 20),
> + (unsigned long)(low_base >> 20),
> + (unsigned long)(low_mem_limit >> 20));
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end = low_base + low_size - 1;
> +#endif
> + return 0;
> +}
> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)
> +{
> + unsigned long long crash_size, crash_base, total_mem;
> + bool high = false;
> + int ret;
> +
> + total_mem = memblock_phys_mem_size();
> +
> + /* crashkernel=XM */
> + ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0) {
> + /* crashkernel=X,high */
> + ret = parse_crashkernel_high(boot_command_line, total_mem,
> + &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0)
> + return;
> + high = true;
> + }
> +
> + /* 0 means: find the address automatically */
> + if (!crash_base) {
> + /*
> + * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> + * But the extra memory is not required for all machines.
> + * So try low memory first and fall back to high memory
> + * unless "crashkernel=size[KMG],high" is specified.
> + */
> + if (!high)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!crash_base)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX);
> + if (!crash_base) {
> + pr_info("crashkernel reservation failed - No suitable area found.\n");
> + return;
> + }
> + } else {
> + /* User specifies base address explicitly. */
If you plan to repost, please take above sentence off either. Then we
can say this patch is only doing code moving.

> + unsigned long long start;
> +

OK, I can see that this patch is only moving code, and introducing
CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to wrap them appropriately, no
extra functionality change added or removed, except of this place.
An alignment checking is added for the user specified base address.
I love this checking. While I have to say it will be more perfect if
it's put in another small patch, that will be look much better from
patch splitting and reviewing point of view.

This whole patch looks great to me, thanks for the effort.


> + if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> + pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
> + (unsigned long)CRASH_ALIGN >> 20);
> + return;
> + }
> +
> + start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> + crash_base + crash_size);
> + if (start != crash_base) {
> + pr_info("crashkernel reservation failed - memory is in use.\n");
> + return;
> + }
> + }
> +
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> + memblock_phys_free(crash_base, crash_size);
> + return;
> + }
> +
> + pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> + (unsigned long)(crash_size >> 20),
> + (unsigned long)(crash_base >> 20),
> + (unsigned long)(total_mem >> 20));
> +
> + crashk_res.start = crash_base;
> + crashk_res.end = crash_base + crash_size - 1;
> +}
> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
> +
> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len)
> {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5a5d192a89ac307..1e0d4909bbb6b77 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
> /* Flag to indicate we are going to kexec a new kernel */
> bool kexec_in_progress = false;
>
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -
> int kexec_should_crash(struct task_struct *p)
> {
> /*
> --
> 2.25.1
>


2021-12-14 11:41:09

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/10 14:55, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> So move xen_pv_domain() check and insert_resource() to setup_arch()
> to keep them in x86.
>
> Suggested-by: Mike Rapoport <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> Acked-by: Baoquan He <[email protected]>
> ---
> arch/x86/kernel/setup.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bb2a0973b98059e..7ae00716a208f82 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>
> crashk_low_res.start = low_base;
> crashk_low_res.end = low_base + low_size - 1;
> - insert_resource(&iomem_resource, &crashk_low_res);
> #endif
> return 0;
> }
> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
> high = true;
> }
>
> - if (xen_pv_domain()) {
> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
> - return;
> - }
> -
> /* 0 means: find the address automatically */
> if (!crash_base) {
> /*
> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>
> crashk_res.start = crash_base;
> crashk_res.end = crash_base + crash_size - 1;
> - insert_resource(&iomem_resource, &crashk_res);
> }
> #else
> static void __init reserve_crashkernel(void)
> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
> * Reserve memory for crash kernel after SRAT is parsed so that it
> * won't consume hotpluggable memory.
> */
> - reserve_crashkernel();

Hi Baoquan:
How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
when CONFIG_KEXEC_CORE is enabled before.

> + if (xen_pv_domain())
> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
> + else {
> + reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> + if (crashk_res.end > crashk_res.start)
> + insert_resource(&iomem_resource, &crashk_res);
> + if (crashk_low_res.end > crashk_low_res.start)
> + insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> + }
>
> memblock_find_dma_reserve();
>
>

2021-12-14 12:38:47

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c



On 2021/12/14 18:45, Baoquan He wrote:
>> + /* User specifies base address explicitly. */
> If you plan to repost, please take above sentence off either. Then we
> can say this patch is only doing code moving.
>
>> + unsigned long long start;
>> +
> OK, I can see that this patch is only moving code, and introducing
> CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to wrap them appropriately, no
> extra functionality change added or removed, except of this place.
> An alignment checking is added for the user specified base address.
> I love this checking. While I have to say it will be more perfect if
> it's put in another small patch, that will be look much better from
> patch splitting and reviewing point of view.

Good eye. I will put it in a new patch.

>
> This whole patch looks great to me, thanks for the effort.
>
>

2021-12-14 14:24:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote:
> Ah, OK, I see the new paragraph from you added in below commit.

That is supposed to make it absolutely clear and explicit. There are
other hints as to what a subsequent SOB means in that document already.

Just the next section says:

"The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery
path."

It wouldn't hurt if people would look at that doc from time to time - we
end up referring to it on a daily basis.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-14 19:08:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.

A big WHY is missing here to explain why the lower bound of the
allocation range needs to be 16M and why was 0 wrong?

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
> return 0;
> }
>
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);

You don't have to break this line.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-14 19:24:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > From: Chen Zhou <[email protected]>
> >
> > The lower bounds of crash kernel reservation and crash kernel low
> > reservation are different, use the consistent value CRASH_ALIGN.
>
> A big WHY is missing here to explain why the lower bound of the
> allocation range needs to be 16M and why was 0 wrong?

I asked the same here:

https://lore.kernel.org/r/[email protected]

IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
lower part, so that's equivalent in practice to starting from
CRASH_ALIGN.

Anyway, I agree the commit log should describe this.

--
Catalin

2021-12-15 02:10:51

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent



On 2021/12/15 3:24, Catalin Marinas wrote:
> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
>> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
>>> From: Chen Zhou <[email protected]>
>>>
>>> The lower bounds of crash kernel reservation and crash kernel low
>>> reservation are different, use the consistent value CRASH_ALIGN.
>>
>> A big WHY is missing here to explain why the lower bound of the
>> allocation range needs to be 16M and why was 0 wrong?
>
> I asked the same here:
>
> https://lore.kernel.org/r/[email protected]
>
> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> lower part, so that's equivalent in practice to starting from
> CRASH_ALIGN.
>
> Anyway, I agree the commit log should describe this.

OK, I will add the description.

>

2021-12-15 03:42:40

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > From: Chen Zhou <[email protected]>
> > >
> > > The lower bounds of crash kernel reservation and crash kernel low
> > > reservation are different, use the consistent value CRASH_ALIGN.
> >
> > A big WHY is missing here to explain why the lower bound of the
> > allocation range needs to be 16M and why was 0 wrong?
>
> I asked the same here:
>
> https://lore.kernel.org/r/[email protected]
>
> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> lower part, so that's equivalent in practice to starting from
> CRASH_ALIGN.

Yeah, even for i386, there's area reserved by BIOS inside low 1M.
Considering the existing alignment CRASH_ALIGN which is 16M, we
definitely have no chance to get memory starting from 0. So starting
from 16M can skip the useless memblock searching, and make the
crashkernel low reservation consisten with crashkernel reservation on
allocation code.

>
> Anyway, I agree the commit log should describe this.

Yes, totally.


2021-12-15 06:01:43

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/14/21 at 03:24pm, Borislav Petkov wrote:
> On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote:
> > Ah, OK, I see the new paragraph from you added in below commit.
>
> That is supposed to make it absolutely clear and explicit. There are
> other hints as to what a subsequent SOB means in that document already.
>
> Just the next section says:
>
> "The Signed-off-by: tag indicates that the signer was involved in the
> development of the patch, or that he/she was in the patch's delivery
> path."
>
> It wouldn't hurt if people would look at that doc from time to time - we
> end up referring to it on a daily basis.


Thanks, I need read this completely, and often check if anything new
is added.


2021-12-15 08:56:29

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/10 14:55, Zhen Lei wrote:
>> From: Chen Zhou <[email protected]>
>>
>> We will make the functions reserve_crashkernel() as generic, the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86, the same as insert_resource() in reserve_crashkernel[_low]().
>> So move xen_pv_domain() check and insert_resource() to setup_arch()
>> to keep them in x86.
>>
>> Suggested-by: Mike Rapoport <[email protected]>
>> Signed-off-by: Chen Zhou <[email protected]>
>> Signed-off-by: Zhen Lei <[email protected]>
>> Tested-by: John Donnelly <[email protected]>
>> Tested-by: Dave Kleikamp <[email protected]>
>> Acked-by: Baoquan He <[email protected]>
>> ---
>> arch/x86/kernel/setup.c | 19 +++++++++++--------
>> 1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index bb2a0973b98059e..7ae00716a208f82 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>>
>> crashk_low_res.start = low_base;
>> crashk_low_res.end = low_base + low_size - 1;
>> - insert_resource(&iomem_resource, &crashk_low_res);
>> #endif
>> return 0;
>> }
>> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
>> high = true;
>> }
>>
>> - if (xen_pv_domain()) {
>> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> - return;
>> - }
>> -
>> /* 0 means: find the address automatically */
>> if (!crash_base) {
>> /*
>> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>>
>> crashk_res.start = crash_base;
>> crashk_res.end = crash_base + crash_size - 1;
>> - insert_resource(&iomem_resource, &crashk_res);
>> }
>> #else
>> static void __init reserve_crashkernel(void)
>> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
>> * Reserve memory for crash kernel after SRAT is parsed so that it
>> * won't consume hotpluggable memory.
>> */
>> - reserve_crashkernel();
>
> Hi Baoquan:
> How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
> empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
> when CONFIG_KEXEC_CORE is enabled before.

Hi Baoquan:
Did you miss this email? If no reply, I will keep it no change.

>
>> + if (xen_pv_domain())
>> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> + else {
>> + reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> + if (crashk_res.end > crashk_res.start)
>> + insert_resource(&iomem_resource, &crashk_res);
>> + if (crashk_low_res.end > crashk_low_res.start)
>> + insert_resource(&iomem_resource, &crashk_low_res);
>> +#endif
>> + }
>>
>> memblock_find_dma_reserve();
>>
>>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .
>

2021-12-15 09:22:56

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()

On 12/15/21 at 04:56pm, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
> >
> >
> > On 2021/12/10 14:55, Zhen Lei wrote:
> >> From: Chen Zhou <[email protected]>
> >>
> >> We will make the functions reserve_crashkernel() as generic, the
> >> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> >> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> >> So move xen_pv_domain() check and insert_resource() to setup_arch()
> >> to keep them in x86.
> >>
> >> Suggested-by: Mike Rapoport <[email protected]>
> >> Signed-off-by: Chen Zhou <[email protected]>
> >> Signed-off-by: Zhen Lei <[email protected]>
> >> Tested-by: John Donnelly <[email protected]>
> >> Tested-by: Dave Kleikamp <[email protected]>
> >> Acked-by: Baoquan He <[email protected]>
> >> ---
> >> arch/x86/kernel/setup.c | 19 +++++++++++--------
> >> 1 file changed, 11 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index bb2a0973b98059e..7ae00716a208f82 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
> >>
> >> crashk_low_res.start = low_base;
> >> crashk_low_res.end = low_base + low_size - 1;
> >> - insert_resource(&iomem_resource, &crashk_low_res);
> >> #endif
> >> return 0;
> >> }
> >> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
> >> high = true;
> >> }
> >>
> >> - if (xen_pv_domain()) {
> >> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
> >> - return;
> >> - }
> >> -
> >> /* 0 means: find the address automatically */
> >> if (!crash_base) {
> >> /*
> >> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
> >>
> >> crashk_res.start = crash_base;
> >> crashk_res.end = crash_base + crash_size - 1;
> >> - insert_resource(&iomem_resource, &crashk_res);
> >> }
> >> #else
> >> static void __init reserve_crashkernel(void)
> >> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
> >> * Reserve memory for crash kernel after SRAT is parsed so that it
> >> * won't consume hotpluggable memory.
> >> */
> >> - reserve_crashkernel();
> >
> > Hi Baoquan:
> > How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
> > empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
> > when CONFIG_KEXEC_CORE is enabled before.
>
> Hi Baoquan:
> Did you miss this email? If no reply, I will keep it no change.

I checked this patch, and it's no update since I acked it.

Moving reserve_crashkernel() into the CONFIG_KEXEC_CORE ifdeffery is
also fine to me.


2021-12-15 11:01:15

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > From: Chen Zhou <[email protected]>
> > > >
> > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > reservation are different, use the consistent value CRASH_ALIGN.
> > >
> > > A big WHY is missing here to explain why the lower bound of the
> > > allocation range needs to be 16M and why was 0 wrong?
> >
> > I asked the same here:
> >
> > https://lore.kernel.org/r/[email protected]
> >
> > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > lower part, so that's equivalent in practice to starting from
> > CRASH_ALIGN.
>
> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> Considering the existing alignment CRASH_ALIGN which is 16M, we
> definitely have no chance to get memory starting from 0. So starting
> from 16M can skip the useless memblock searching, and make the
> crashkernel low reservation consisten with crashkernel reservation on
> allocation code.

That's the x86 assumption. Is it valid for other architectures once the
code has been made generic in patch 6? It should be ok for arm64, RAM
tends to start from higher up but other architectures may start using
this common code.

If you want to keep the same semantics as before, just leave it as 0.
It's not that the additional lower bound makes the search slower.

--
Catalin

2021-12-15 11:17:06

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent

On 12/15/21 at 11:01am, Catalin Marinas wrote:
> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> > On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > > From: Chen Zhou <[email protected]>
> > > > >
> > > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > >
> > > > A big WHY is missing here to explain why the lower bound of the
> > > > allocation range needs to be 16M and why was 0 wrong?
> > >
> > > I asked the same here:
> > >
> > > https://lore.kernel.org/r/[email protected]
> > >
> > > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > > lower part, so that's equivalent in practice to starting from
> > > CRASH_ALIGN.
> >
> > Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> > Considering the existing alignment CRASH_ALIGN which is 16M, we
> > definitely have no chance to get memory starting from 0. So starting
> > from 16M can skip the useless memblock searching, and make the
> > crashkernel low reservation consisten with crashkernel reservation on
> > allocation code.
>
> That's the x86 assumption. Is it valid for other architectures once the
> code has been made generic in patch 6? It should be ok for arm64, RAM
> tends to start from higher up but other architectures may start using
> this common code.

Good point. I didn't think of this from generic code side, then let's
keep it as 0.

>
> If you want to keep the same semantics as before, just leave it as 0.
> It's not that the additional lower bound makes the search slower.

Agree.


2021-12-15 11:46:05

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent



On 2021/12/15 19:16, Baoquan He wrote:
> On 12/15/21 at 11:01am, Catalin Marinas wrote:
>> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
>>> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
>>>> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
>>>>> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
>>>>>> From: Chen Zhou <[email protected]>
>>>>>>
>>>>>> The lower bounds of crash kernel reservation and crash kernel low
>>>>>> reservation are different, use the consistent value CRASH_ALIGN.
>>>>>
>>>>> A big WHY is missing here to explain why the lower bound of the
>>>>> allocation range needs to be 16M and why was 0 wrong?
>>>>
>>>> I asked the same here:
>>>>
>>>> https://lore.kernel.org/r/[email protected]
>>>>
>>>> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
>>>> lower part, so that's equivalent in practice to starting from
>>>> CRASH_ALIGN.
>>>
>>> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
>>> Considering the existing alignment CRASH_ALIGN which is 16M, we
>>> definitely have no chance to get memory starting from 0. So starting
>>> from 16M can skip the useless memblock searching, and make the
>>> crashkernel low reservation consisten with crashkernel reservation on
>>> allocation code.
>>
>> That's the x86 assumption. Is it valid for other architectures once the
>> code has been made generic in patch 6? It should be ok for arm64, RAM
>> tends to start from higher up but other architectures may start using
>> this common code.
>
> Good point. I didn't think of this from generic code side, then let's
> keep it as 0.
>
>>
>> If you want to keep the same semantics as before, just leave it as 0.
>> It's not that the additional lower bound makes the search slower.
>
> Agree.

OK, I will drop this patch.

>
> .
>

2021-12-15 13:28:41

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> }
> }
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> memblock_phys_free(crash_base, crash_size);
> return;
> }

That's not a equivalent transformation on X86_32.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 01:10:54

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/15/21 at 02:28pm, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> > }
> > }
> >
> > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> > + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> > memblock_phys_free(crash_base, crash_size);
> > return;
> > }
>
> That's not a equivalent transformation on X86_32.

reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
transformation for x86_32 doesn't matter, I think.


2021-12-16 02:46:28

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()



On 2021/12/16 9:10, Baoquan He wrote:
> On 12/15/21 at 02:28pm, Borislav Petkov wrote:
>> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
>>> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>>> }
>>> }
>>>
>>> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>> memblock_phys_free(crash_base, crash_size);
>>> return;
>>> }
>>
>> That's not a equivalent transformation on X86_32.

The original value (1ULL << 32) is inaccurate, and it enlarged the CRASH_ADDR_LOW
upper limit. This is because when the memory is allocated from the low end,
the address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. If
the memory is allocated from the high end, 'crash_base' is greater than or
equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.

I think I should update the description, thanks.

if (!high)
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_LOW_MAX);
if (!crash_base)
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_HIGH_MAX);

>
> reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> transformation for x86_32 doesn't matter, I think.
>
> .
>

2021-12-16 10:56:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote:
> reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> transformation for x86_32 doesn't matter, I think.

That is, of course, very obvious... not!

Why is that function parsing crashkernel=XM, and crashkernel=X,high,
then it attempts some low memory reservation too? Why isn't
crashkernel=Y,low parsed there too?

I guess this alludes to why:

crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out.

So, before this is going anywhere, I'd like to see this function
documented properly. I see Documentation/admin-guide/kdump/kdump.rst
explains the crashkernel= options too so you can refer to it in the
comments so that when someone looks at that code, someone can follow why
it is doing what it is doing.

Then, as a future work, all that parsing of crashkernel= cmdline options
should be concentrated at the beginning and once it is clear what the
user requests, the reservations should be done.

As it is, reserve_crashkernel() is pretty unwieldy and hard to read.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 11:07:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
> The original value (1ULL << 32) is inaccurate

I keep asking *why*?

> and it enlarged the CRASH_ADDR_LOW upper limit.

$ git grep -E "CRASH_ADDR_LOW\W"
$

I have no clue what you mean here.

> This is because when the memory is allocated from the low end, the
> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.

> If
> the memory is allocated from the high end, 'crash_base' is greater than or
> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
>
> I think I should update the description, thanks.

I think you should explain why is (1ULL << 32) wrong.

It came from:

eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")

which simply frees the high memory portion when the low reservation
fails. And the test for that is, is crash base > 4G. So that makes
perfect sense to me.

So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
the _low() variant always returning 0 on 32-bit.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 11:17:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

On Fri, Dec 10, 2021 at 02:55:28PM +0800, Zhen Lei wrote:
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)

As I've already alluded to in another mail, ontop of this there should
be a patch or multiple patches which clean this up more and perhaps even
split it into separate functions doing stuff in this order:

1. Parse all crashkernel= cmdline options

2. Do all crash_base, crash_size etc checks

3. Do the memory reservations

And all that supplied with comments explaining why stuff is being done.

This set of functions is a mess and there's no better time for cleaning
it up and documenting it properly than when you move it to generic code.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 12:08:47

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()



On 2021/12/16 19:07, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
>> The original value (1ULL << 32) is inaccurate
>
> I keep asking *why*?
>
>> and it enlarged the CRASH_ADDR_LOW upper limit.
>
> $ git grep -E "CRASH_ADDR_LOW\W"
> $
>
> I have no clue what you mean here.

#ifdef CONFIG_X86_32
# define CRASH_ADDR_LOW_MAX SZ_512M
# define CRASH_ADDR_HIGH_MAX SZ_512M
#endif

if (!high)
(1) crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_LOW_MAX);
if (!crash_base)
(2) crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_HIGH_MAX);

- if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
+(3) if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low())

If the memory of 'crash_base' is successfully allocated at (1), because the last
parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
"crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
upper limit.

If the memory of 'crash_base' is successfully allocated at (2), you see,
CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact,
"crashkernel=high," may not be recommended on X86_32.

Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)?
In this case, the memory allocated at (2) maybe over 4G. But why shouldn't
CRASH_ADDR_LOW_MAX be equal to 4G at this point?


>
>> This is because when the memory is allocated from the low end, the
>> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.
>
>> If
>> the memory is allocated from the high end, 'crash_base' is greater than or
>> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
>>
>> I think I should update the description, thanks.
>
> I think you should explain why is (1ULL << 32) wrong.
>
> It came from:
>
> eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
>
> which simply frees the high memory portion when the low reservation
> fails. And the test for that is, is crash base > 4G. So that makes
> perfect sense to me.
>
> So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
> the _low() variant always returning 0 on 32-bit.
>

2021-12-16 12:23:48

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()



On 2021/12/16 20:08, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/16 19:07, Borislav Petkov wrote:
>> On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
>>> The original value (1ULL << 32) is inaccurate
>>
>> I keep asking *why*?
>>
>>> and it enlarged the CRASH_ADDR_LOW upper limit.
>>
>> $ git grep -E "CRASH_ADDR_LOW\W"
>> $
>>
>> I have no clue what you mean here.
>
> #ifdef CONFIG_X86_32
> # define CRASH_ADDR_LOW_MAX SZ_512M
> # define CRASH_ADDR_HIGH_MAX SZ_512M
> #endif
>
> if (!high)
> (1) crash_base = memblock_phys_alloc_range(crash_size,
> CRASH_ALIGN, CRASH_ALIGN,
> CRASH_ADDR_LOW_MAX);
> if (!crash_base)
> (2) crash_base = memblock_phys_alloc_range(crash_size,
> CRASH_ALIGN, CRASH_ALIGN,
> CRASH_ADDR_HIGH_MAX);
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
> +(3) if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low())
>
> If the memory of 'crash_base' is successfully allocated at (1), because the last
> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
> upper limit.
>
> If the memory of 'crash_base' is successfully allocated at (2), you see,
> CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact,
> "crashkernel=high," may not be recommended on X86_32.
>
> Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)?
> In this case, the memory allocated at (2) maybe over 4G. But why shouldn't
> CRASH_ADDR_LOW_MAX be equal to 4G at this point?

We divide two memory areas: low memory area and high memory area. The doc told us:
at least 256MB memory should be reserved at low memory area. So that if
"crash_base >= CRASH_ADDR_LOW_MAX" is true at (3), that means we have not reserved
any memory at low memory area, so we should call reserve_crashkernel_low().
The low memory area is not equivalent to <=4G, I think. So replace (1ULL << 32) with
CRASH_ADDR_LOW_MAX is logically correct.

>
>
>>
>>> This is because when the memory is allocated from the low end, the
>>> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.
>>
>>> If
>>> the memory is allocated from the high end, 'crash_base' is greater than or
>>> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
>>>
>>> I think I should update the description, thanks.
>>
>> I think you should explain why is (1ULL << 32) wrong.
>>
>> It came from:
>>
>> eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
>>
>> which simply frees the high memory portion when the low reservation
>> fails. And the test for that is, is crash base > 4G. So that makes
>> perfect sense to me.
>>
>> So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
>> the _low() variant always returning 0 on 32-bit.
>>

2021-12-16 13:15:35

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c



On 2021/12/16 19:17, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:28PM +0800, Zhen Lei wrote:
>> + * reserve_crashkernel() - reserves memory for crash kernel
>> + *
>> + * This function reserves memory area given in "crashkernel=" kernel command
>> + * line parameter. The memory reserved is used by dump capture kernel when
>> + * primary kernel is crashing.
>> + */
>> +void __init reserve_crashkernel(void)
>
> As I've already alluded to in another mail, ontop of this there should
> be a patch or multiple patches which clean this up more and perhaps even
> split it into separate functions doing stuff in this order:
>
> 1. Parse all crashkernel= cmdline options
>
> 2. Do all crash_base, crash_size etc checks
>
> 3. Do the memory reservations
>
> And all that supplied with comments explaining why stuff is being done.

I agree with you. This makes the code look clear. I will do it, try to
post v18 next Monday.

>
> This set of functions is a mess and there's no better time for cleaning
> it up and documenting it properly than when you move it to generic code.
>
> Thx.
>

2021-12-16 14:12:52

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On 12/16/21 at 11:55am, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote:
> > reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> > transformation for x86_32 doesn't matter, I think.
>
> That is, of course, very obvious... not!
>
> Why is that function parsing crashkernel=XM, and crashkernel=X,high,
> then it attempts some low memory reservation too? Why isn't
> crashkernel=Y,low parsed there too?
>
> I guess this alludes to why:
>
> crashkernel=size[KMG],low
> [KNL, X86-64] range under 4G. When crashkernel=X,high
> is passed, kernel could allocate physical memory region
> above 4G, that cause second kernel crash on system
> that require some amount of low memory, e.g. swiotlb
> requires at least 64M+32K low memory, also enough extra
> low memory is needed to make sure DMA buffers for 32-bit
> devices won't run out.
>
> So, before this is going anywhere, I'd like to see this function
> documented properly. I see Documentation/admin-guide/kdump/kdump.rst
> explains the crashkernel= options too so you can refer to it in the
> comments so that when someone looks at that code, someone can follow why
> it is doing what it is doing.
>
> Then, as a future work, all that parsing of crashkernel= cmdline options
> should be concentrated at the beginning and once it is clear what the
> user requests, the reservations should be done.

Totally agree we should refactor code to make reserve_crashkernel()
clearer on logic and readibility. In this patchset, we can rewrite the
kernel-doc of reserve_crashkernel() to add more words to explain. As for
the code refactoring, it can be done in another patchset.

>
> As it is, reserve_crashkernel() is pretty unwieldy and hard to read.


2021-12-16 14:48:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote:
> If the memory of 'crash_base' is successfully allocated at (1), because the last
> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
> upper limit.

No, this is actually wrong - that check *must* be 4G. See:

eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")

It is even documented:

crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out.

so you need to do a low allocation for DMA *when* the reserved memory is
above 4G. *NOT* above 512M. But that works due to the obscure situation,
as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit.

So all this is telling us is that that function needs serious cleanup.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 14:51:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

On Thu, Dec 16, 2021 at 09:15:18PM +0800, Leizhen (ThunderTown) wrote:
> I agree with you. This makes the code look clear. I will do it, try to
> post v18 next Monday.

Don't rush it: take your time, do it nice and clean, make sure each
patch does one logical thing only so that there are no bugs introduced
and the pile can is reviewable.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 14:58:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Thu, Dec 16, 2021 at 10:11:15PM +0800, Baoquan He wrote:
> As for the code refactoring, it can be done in another patchset.

Well, I said "future work" before having seen where this patchset is
going. So no, in that case, the usual kernel development process is:
cleanups/fixes first - new features later.

You can see for yourself that piling more crap on what is an already
unreadable mess is only going to make it worse. So, in order to avoid
the maintenance nightmare, we clean up, streamline, document and then
add the new feature and all is shiny.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-17 02:51:11

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()



On 2021/12/16 22:48, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote:
>> If the memory of 'crash_base' is successfully allocated at (1), because the last
>> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
>> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
>> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
>> upper limit.
>
> No, this is actually wrong - that check *must* be 4G. See:
>
> eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
>
> It is even documented:
>
> crashkernel=size[KMG],low
> [KNL, X86-64] range under 4G. When crashkernel=X,high

[KNL, X86-64], This doc is for X86-64, not for X86-32

> is passed, kernel could allocate physical memory region
> above 4G, that cause second kernel crash on system
> that require some amount of low memory, e.g. swiotlb
> requires at least 64M+32K low memory, also enough extra
> low memory is needed to make sure DMA buffers for 32-bit
> devices won't run out.

vi arch/x86/kernel/setup.c +398

/*
* Keep the crash kernel below this limit.
*
* Earlier 32-bits kernels would limit the kernel to the low 512 MB range
* due to mapping restrictions.

If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal to (1ULL << 32) minus 1 on X86_32.

>
> so you need to do a low allocation for DMA *when* the reserved memory is
> above 4G. *NOT* above 512M. But that works due to the obscure situation,
> as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit.
>
> So all this is telling us is that that function needs serious cleanup.
>

2021-12-21 22:23:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v17 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

On Fri, Dec 17, 2021 at 10:51:04AM +0800, Leizhen (ThunderTown) wrote:
> [KNL, X86-64], This doc is for X86-64, not for X86-32

reserve_crashkernel() runs on both.

> If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal
> to (1ULL << 32) minus 1 on X86_32.

Again, the 4G limit check is relevant only for 64-bit kernels - not
32-bit ones.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette