2020-10-31 07:40:59

by chenzhou

Subject: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.
3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
devices in the crash dump kernel that need ZONE_DMA will fail to
allocate memory.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in the DMA zone (or the DMA32 zone if
CONFIG_ZONE_DMA is disabled), and falls back to high allocation if that fails.
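
As a rough illustration of this policy, here is a user-space sketch. It is
not the code in this series: toy_map, find_free() and plan_crashkernel() are
hypothetical names, find_free() only stands in for a top-down memblock
allocation, and low_max/high_max play the role of CRASH_ADDR_LOW_MAX and
CRASH_ADDR_HIGH_MAX. The "high" flag models a request that skips the low
attempt, and "crashkernel=0,low" is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M        (1ULL << 20)
#define CRASH_ALIGN  (2 * SZ_1M)    /* arm64 alignment used in this series */
#define DEFAULT_LOW  (256 * SZ_1M)  /* default low size when none is given */
#define NR_REGIONS   2

/* Hypothetical toy memory map: free regions [start, end), sorted ascending. */
struct toy_map {
	uint64_t start[NR_REGIONS];
	uint64_t end[NR_REGIONS];
};

struct crash_plan {
	uint64_t base, size;          /* main region; size 0 means failure */
	uint64_t low_base, low_size;  /* extra low region, if one is needed */
};

/* Hypothetical top-down search for an aligned block inside [min, max). */
uint64_t find_free(const struct toy_map *m, uint64_t size, uint64_t align,
		   uint64_t min, uint64_t max)
{
	assert(align && (align & (align - 1)) == 0);
	for (int i = NR_REGIONS - 1; i >= 0; i--) {
		uint64_t lo = m->start[i] > min ? m->start[i] : min;
		uint64_t hi = m->end[i] < max ? m->end[i] : max;
		uint64_t base;

		if (hi <= lo || hi - lo < size)
			continue;
		base = (hi - size) & ~(align - 1);
		if (base >= lo)
			return base;
	}
	return 0;
}

/* low_size == 0 means "no explicit low size was requested". */
struct crash_plan plan_crashkernel(const struct toy_map *m, uint64_t size,
				   bool high, uint64_t low_size,
				   uint64_t low_max, uint64_t high_max)
{
	struct crash_plan p = {0};
	uint64_t base = 0;

	if (!high)	/* plain crashkernel=X: try low memory first */
		base = find_free(m, size, CRASH_ALIGN, CRASH_ALIGN, low_max);
	if (!base)	/* fall back to (or go straight to) high memory */
		base = find_free(m, size, CRASH_ALIGN, CRASH_ALIGN, high_max);
	if (!base)
		return p;

	if (base >= low_max) {
		/* A high region still needs some low memory for devices. */
		if (!low_size)
			low_size = DEFAULT_LOW;
		p.low_base = find_free(m, low_size, CRASH_ALIGN, CRASH_ALIGN,
				       low_max);
		if (!p.low_base)
			return p;	/* give up on the whole reservation */
		p.low_size = low_size;
	}
	p.base = base;
	p.size = size;
	return p;
}
```

With free memory at 1G-2G and 4G-8G and a 4G low limit, a 512M request in
this model lands below 2G with no second region, while a 2G request ends up
above 4G together with a 256M low region.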

We can also use "crashkernel=X,high" to select a high region above the
DMA zone, which also tries to allocate at least 256M of low memory in the
DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
"crashkernel=Y,low" can be used to allocate low memory of a specified size.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
To distinguish the low region from the high region without affecting the
use of existing kexec-tools, name the low region "Crash kernel (low)"
and pass it to the crash dump kernel by reusing the DT property
"linux,usable-memory-range". The low memory region is made the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.
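
To show why the separate name matters to user space, here is a small sketch
of how a tool might classify /proc/iomem lines. It is only an illustration:
classify_iomem_line() and the enum are hypothetical names, not kexec-tools
code, and the only thing taken from this series is the pair of resource
names.

```c
#include <stdio.h>
#include <string.h>

enum crash_kind { NOT_CRASH = 0, CRASH_MAIN, CRASH_LOW };

/*
 * Parse one /proc/iomem line of the form "  start-end : name" and report
 * whether it names a crash kernel region. Hypothetical helper.
 */
enum crash_kind classify_iomem_line(const char *line,
				    unsigned long long *start,
				    unsigned long long *end)
{
	char name[64];

	/* %63[^\n] takes the resource name up to the end of the line. */
	if (sscanf(line, " %llx-%llx : %63[^\n]", start, end, name) != 3)
		return NOT_CRASH;
	if (strcmp(name, "Crash kernel (low)") == 0)
		return CRASH_LOW;
	if (strcmp(name, "Crash kernel") == 0)
		return CRASH_MAIN;
	return NOT_CRASH;
}
```

A tool that only compares against the exact name "Crash kernel" keeps
seeing a single region, which is the compatibility property described above.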

In addition, kexec-tools needs to be modified:
arm64: support more than one crash kernel regions (see [1])

The documentation of the DT property 'linux,usable-memory-range' is also updated:
schemas: update 'linux,usable-memory-range' node schema (see [2])

This patchset contains the following eight patches:
0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
0006-arm64-kdump-reimplement-crashkernel-X.patch
0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
0008-kdump-update-Documentation-about-crashkernel.patch

0001-0003 are x86 cleanups that prepare for making the functions
reserve_crashkernel[_low]() generic.
0004 makes the functions reserve_crashkernel[_low]() generic.
0005-0006 reimplement arm64 crashkernel=X.
0007 adds memory for devices by DT property linux,usable-memory-range.
0008 updates the doc.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M, as suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
As suggested by Catalin, make the x86 function reserve_crashkernel() generic
and have arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.

Changes since [v9]
- Add Acked-by from Dave to patch 1.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices and then reserve memory above 4G.
In addition, rename crashk_low_res to "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel via the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the two-region case,
we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037

Chen Zhou (8):
x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
x86: kdump: make the lower bound of crash kernel reservation
consistent
x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
reserve_crashkernel()
x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
arm64: kdump: introduce some macroes for crash kernel reservation
arm64: kdump: reimplement crashkernel=X
arm64: kdump: add memory for devices by DT property
linux,usable-memory-range
kdump: update Documentation about crashkernel

Documentation/admin-guide/kdump/kdump.rst | 23 ++-
.../admin-guide/kernel-parameters.txt | 12 +-
arch/arm64/include/asm/kexec.h | 15 ++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/kernel/setup.c | 13 +-
arch/arm64/mm/init.c | 105 ++++-------
arch/arm64/mm/mmu.c | 4 +
arch/x86/include/asm/kexec.h | 28 +++
arch/x86/kernel/setup.c | 153 +---------------
include/linux/crash_core.h | 4 +
include/linux/kexec.h | 2 -
kernel/crash_core.c | 168 ++++++++++++++++++
kernel/kexec_core.c | 17 --
13 files changed, 301 insertions(+), 244 deletions(-)

--
2.20.1


2020-10-31 07:41:00

by chenzhou

Subject: [PATCH v13 3/8] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()

To make the function reserve_crashkernel() generic, replace the
hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/x86/kernel/setup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d1599449a001..1289f079ad5f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -491,8 +491,9 @@ static void __init reserve_crashkernel(void)
if (!crash_base) {
/*
* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over 4G, also allocates
- * 256M extra low memory for DMA buffers and swiotlb.
+ * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+ * also allocates 256M extra low memory for DMA buffers
+ * and swiotlb.
* But the extra memory is not required for all machines.
* So try low memory first and fall back to high memory
* unless "crashkernel=size[KMG],high" is specified.
@@ -520,7 +521,7 @@ static void __init reserve_crashkernel(void)
}
}

- if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+ if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
memblock_free(crash_base, crash_size);
return;
}
--
2.20.1

2020-10-31 07:41:06

by chenzhou

Subject: [PATCH v13 2/8] x86: kdump: make the lower bound of crash kernel reservation consistent

The lower bounds of crash kernel reservation and crash kernel low
reservation are different. Use the consistent value CRASH_ALIGN for both.

Suggested-by: Dave Young <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/x86/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bf373422dc8a..d1599449a001 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -444,7 +444,7 @@ static int __init reserve_crashkernel_low(void)
return 0;
}

- low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
(unsigned long)(low_size >> 20));
--
2.20.1

2020-10-31 07:41:11

by chenzhou

Subject: [PATCH v13 7/8] arm64: kdump: add memory for devices by DT property linux,usable-memory-range

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reuse the DT property linux,usable-memory-range and make the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.
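
For illustration only, the property layout can be decoded in plain user-space
C as sketched below. This is not the kernel code in the diff that follows:
next_cells() and parse_usable_ranges() are hypothetical helpers, the property
is assumed to arrive as a raw byte buffer of big-endian 32-bit cells, and the
cell counts are passed in rather than read from the root node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USABLE_RANGES 2

struct usable_range {
	uint64_t base;
	uint64_t size;
};

/* Read `cells` big-endian 32-bit cells starting at *p and advance *p. */
uint64_t next_cells(const uint8_t **p, int cells)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		v = (v << 32) |
		    ((uint64_t)(*p)[0] << 24) | ((uint64_t)(*p)[1] << 16) |
		    ((uint64_t)(*p)[2] << 8) | (uint64_t)(*p)[3];
		*p += 4;
	}
	return v;
}

/*
 * Decode up to MAX_USABLE_RANGES <base size> pairs from a property of
 * `len` bytes. Returns the number of ranges found; when there are two,
 * the second one is the low region.
 */
int parse_usable_ranges(const uint8_t *prop, size_t len,
			int addr_cells, int size_cells,
			struct usable_range out[MAX_USABLE_RANGES])
{
	const uint8_t *end = prop + len;
	size_t entry;
	int nr = 0;

	assert(addr_cells > 0 && size_cells > 0);
	entry = (size_t)(addr_cells + size_cells) * 4;

	while (nr < MAX_USABLE_RANGES && (size_t)(end - prop) >= entry) {
		out[nr].base = next_cells(&prop, addr_cells);
		out[nr].size = next_cells(&prop, size_cells);
		nr++;
	}
	return nr;
}
```

An older kdump kernel that reads only the first pair still gets a valid
single range, which is why the low region is placed last.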

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/arm64/mm/init.c | 43 +++++++++++++++++++++++++++++++++----------
1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 888c4f7eadc3..794f992cb200 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -69,6 +69,15 @@ static void __init reserve_crashkernel(void)
}
#endif

+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES 2
+
#ifdef CONFIG_CRASH_DUMP
static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
const char *uname, int depth, void *data)
@@ -184,9 +193,9 @@ early_param("mem", early_mem);
static int __init early_init_dt_scan_usablemem(unsigned long node,
const char *uname, int depth, void *data)
{
- struct memblock_region *usablemem = data;
- const __be32 *reg;
- int len;
+ struct memblock_region *usable_rgns = data;
+ const __be32 *reg, *endp;
+ int len, nr = 0;

if (depth != 1 || strcmp(uname, "chosen") != 0)
return 0;
@@ -195,22 +204,36 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
return 1;

- usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
- usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+ endp = reg + (len / sizeof(__be32));
+ while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+ usable_rgns[nr].base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+ usable_rgns[nr].size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+ if (++nr >= MAX_USABLE_RANGES)
+ break;
+ }

return 1;
}

static void __init fdt_enforce_memory_region(void)
{
- struct memblock_region reg = {
- .size = 0,
+ struct memblock_region usable_rgns[MAX_USABLE_RANGES] = {
+ { .size = 0 },
+ { .size = 0 }
};

- of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+ of_scan_flat_dt(early_init_dt_scan_usablemem, &usable_rgns);

- if (reg.size)
- memblock_cap_memory_range(reg.base, reg.size);
+ /*
+ * The first range of usable-memory regions is for crash dump
+ * kernel with only one region or for high region with two regions,
+ * the second range is dedicated for low region if exist.
+ */
+ if (usable_rgns[0].size)
+ memblock_cap_memory_range(usable_rgns[0].base, usable_rgns[0].size);
+ if (usable_rgns[1].size)
+ memblock_add(usable_rgns[1].base, usable_rgns[1].size);
}

void __init arm64_memblock_init(void)
--
2.20.1

2020-10-31 07:41:17

by chenzhou

Subject: [PATCH v13 5/8] arm64: kdump: introduce some macroes for crash kernel reservation

Introduce the macro CRASH_ALIGN for alignment, the macro CRASH_ADDR_LOW_MAX
for the upper bound of low crash memory, and the macro CRASH_ADDR_HIGH_MAX
for the upper bound of high crash memory, and use them in place of the
hard-coded values.

In addition, to stay consistent with x86, use CRASH_ALIGN as the lower bound
of crash kernel reservation.
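
As a small stand-alone illustration of what these constraints mean for an
automatically chosen low region, consider the sketch below. The helper
crash_low_candidate_ok() is hypothetical and not part of this patch, and the
low limit is passed as a parameter because CRASH_ADDR_LOW_MAX is a run-time
value on arm64.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M            0x200000ULL
#define CRASH_ALIGN      SZ_2M
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Check a candidate [base, base + size) against the three constraints:
 * CRASH_ALIGN alignment, CRASH_ALIGN as the lower bound, and low_max
 * (standing in for CRASH_ADDR_LOW_MAX) as the upper bound.
 */
bool crash_low_candidate_ok(uint64_t base, uint64_t size, uint64_t low_max)
{
	if (!IS_ALIGNED(base, CRASH_ALIGN))
		return false;
	if (base < CRASH_ALIGN)
		return false;
	/* Written this way to avoid overflow in base + size. */
	if (base >= low_max || size > low_max - base)
		return false;
	return true;
}
```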

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/arm64/include/asm/kexec.h | 6 ++++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/mm/init.c | 8 ++++----
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index d24b527e8c00..402d208265a3 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@

#define KEXEC_ARCH KEXEC_ARCH_AARCH64

+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma32_phys_limit
+#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
+
#ifndef __ASSEMBLY__

/**
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index fce8cbecd6bc..12131655cab7 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -96,6 +96,7 @@
#endif /* CONFIG_ARM64_FORCE_52BIT */

extern phys_addr_t arm64_dma_phys_limit;
+extern phys_addr_t arm64_dma32_phys_limit;
#define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)

struct debug_info {
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..a07fd8e1f926 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -60,7 +60,7 @@ EXPORT_SYMBOL(memstart_addr);
* bit addressable memory area.
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
-static phys_addr_t arm64_dma32_phys_limit __ro_after_init;
+phys_addr_t arm64_dma32_phys_limit __ro_after_init;

#ifdef CONFIG_KEXEC_CORE
/*
@@ -85,8 +85,8 @@ static void __init reserve_crashkernel(void)

if (crash_base == 0) {
/* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
- crash_size, SZ_2M);
+ crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
+ crash_size, CRASH_ALIGN);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
@@ -104,7 +104,7 @@ static void __init reserve_crashkernel(void)
return;
}

- if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
return;
}
--
2.20.1

2020-10-31 07:42:02

by chenzhou

Subject: [PATCH v13 8/8] kdump: update Documentation about crashkernel

For arm64, the behavior of crashkernel=X has been changed: it now
tries low allocation in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA
is disabled), and falls back to high allocation if that fails.

We can also use "crashkernel=X,high" to select a high region above the
DMA zone, which also tries to allocate at least 256M of low memory in the
DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).

"crashkernel=Y,low" can be used to allocate low memory of a specified size.

So update the Documentation.
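
To make the three spellings concrete, here is a user-space sketch that
classifies the value after "crashkernel=". It is not the kernel's parser:
parse_crashkernel_arg() and the enum are hypothetical, only the K/M/G
suffixes are handled, and the "@offset" and "range:size" forms are
deliberately rejected to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

enum ck_kind { CK_INVALID = 0, CK_PLAIN, CK_HIGH, CK_LOW };

/*
 * Classify "size[KMG]", "size[KMG],high" or "size[KMG],low".
 * *size is written only when the value is accepted.
 */
enum ck_kind parse_crashkernel_arg(const char *val, unsigned long long *size)
{
	char *end;
	unsigned long long n = strtoull(val, &end, 0);
	enum ck_kind kind;

	if (end == val)
		return CK_INVALID;	/* no digits at all */

	switch (*end) {
	case 'G':
	case 'g':
		n <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		n <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		n <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (*end == '\0')
		kind = CK_PLAIN;
	else if (strcmp(end, ",high") == 0)
		kind = CK_HIGH;
	else if (strcmp(end, ",low") == 0)
		kind = CK_LOW;
	else
		return CK_INVALID;

	*size = n;
	return kind;
}
```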

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 23 ++++++++++++++++---
.../admin-guide/kernel-parameters.txt | 12 ++++++++--
2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 75a9dd98e76e..bde5f994d185 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,16 @@ Boot into System Kernel
"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.

- On x86 and x86_64, use "crashkernel=64M@16M".
+ On x86 use "crashkernel=64M@16M".
+
+ On x86_64, use "crashkernel=X" to select a region under 4G first, and
+ fall back to reserve region above 4G. And go for high allocation
+ directly if the required size is too large.
+ We can also use "crashkernel=X,high" to select a region above 4G, which
+ also tries to allocate at least 256M below 4G automatically and
+ "crashkernel=Y,low" can be used to allocate specified size low memory.
+ Use "crashkernel=Y@X" if you really have to reserve memory from specified
+ start address X.

On ppc64, use "crashkernel=128M@32M".

@@ -316,8 +325,16 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

- On arm64, use "crashkernel=Y[@X]". Note that the start address of
- the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+ On arm64, use "crashkernel=X" to try low allocation in DMA zone (or
+ DMA32 zone if CONFIG_ZONE_DMA is disabled), and fall back to high
+ allocation if it fails.
+ We can also use "crashkernel=X,high" to select a high region above
+ DMA zone, which also tries to allocate at least 256M low memory in
+ DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
+ "crashkernel=Y,low" can be used to allocate specified size low memory.
+ Use "crashkernel=Y@X" if you really have to reserve memory from
+ specified start address X. Note that the start address of the kernel,
+ X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..b2955d9379e8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -738,6 +738,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+ [KNL, arm64] Try low allocation in DMA zone (or DMA32 zone
+ if CONFIG_ZONE_DMA is disabled), fall back to high allocation
+ if it fails when '@offset' hasn't been specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.

crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -754,6 +757,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+ [KNL, arm64] range in high memory.
+ Allow kernel to allocate physical memory region from top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -762,13 +767,16 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
- at least 256M below 4G automatically.
+ least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+ [KNL, arm64] range in low memory.
+ This one let user to specify a low range in DMA zone for
+ crash dump kernel (or the DMA32 zone if CONFIG_ZONE_DMA
+ is disabled).
cryptomgr.notests
[KNL] Disable crypto self-tests

--
2.20.1

2020-10-31 07:42:20

by chenzhou

Subject: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.
3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
devices in the crash dump kernel that need ZONE_DMA will fail to
allocate memory.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA is disabled), and
falls back to high allocation if that fails.
We can also use "crashkernel=X,high" to select a region above the DMA zone,
which also tries to allocate at least 256M in the DMA zone automatically
(or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
"crashkernel=Y,low" can be used to allocate low memory of a specified size.

Another minor change: there may be two regions reserved for the crash
dump kernel. To distinguish the low region from the high region without
affecting the use of existing kexec-tools, name the low region
"Crash kernel (low)".

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/arm64/include/asm/kexec.h | 9 +++++
arch/arm64/kernel/setup.c | 13 +++++++-
arch/arm64/mm/init.c | 60 ++--------------------------------
arch/arm64/mm/mmu.c | 4 +++
kernel/crash_core.c | 8 +++--
5 files changed, 34 insertions(+), 60 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 402d208265a3..79909ae5e22e 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -28,7 +28,12 @@
/* 2M alignment for crash kernel regions */
#define CRASH_ALIGN SZ_2M

+#ifdef CONFIG_ZONE_DMA
+#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
+#else
#define CRASH_ADDR_LOW_MAX arm64_dma32_phys_limit
+#endif
+
#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE

#ifndef __ASSEMBLY__
@@ -96,6 +101,10 @@ static inline void crash_prepare_suspend(void) {}
static inline void crash_post_resume(void) {}
#endif

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#ifdef CONFIG_KEXEC_FILE
#define ARCH_HAS_KIMAGE_ARCH

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 133257ffd859..6aff30de8f47 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
#ifdef CONFIG_KEXEC_CORE
- /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ /*
+ * Userspace will find "Crash kernel" or "Crash kernel (low)"
+ * region in /proc/iomem.
+ * In order to distinct from the high region and make no effect
+ * to the use of existing kexec-tools, rename the low region as
+ * "Crash kernel (low)".
+ */
+ if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+ crashk_low_res.end <= res->end) {
+ crashk_low_res.name = "Crash kernel (low)";
+ request_resource(res, &crashk_low_res);
+ }
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a07fd8e1f926..888c4f7eadc3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -34,6 +34,7 @@
#include <asm/fixmap.h>
#include <asm/kasan.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kexec.h>
#include <asm/memory.h>
#include <asm/numa.h>
#include <asm/sections.h>
@@ -62,66 +63,11 @@ EXPORT_SYMBOL(memstart_addr);
phys_addr_t arm64_dma_phys_limit __ro_after_init;
phys_addr_t arm64_dma32_phys_limit __ro_after_init;

-#ifdef CONFIG_KEXEC_CORE
-/*
- * reserve_crashkernel() - reserves memory for crash kernel
- *
- * This function reserves memory area given in "crashkernel=" kernel command
- * line parameter. The memory reserved is used by dump capture kernel when
- * primary kernel is crashing.
- */
-static void __init reserve_crashkernel(void)
-{
- unsigned long long crash_base, crash_size;
- int ret;
-
- ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
- &crash_size, &crash_base);
- /* no crashkernel= or invalid value specified */
- if (ret || !crash_size)
- return;
-
- crash_size = PAGE_ALIGN(crash_size);
-
- if (crash_base == 0) {
- /* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
- crash_size, CRASH_ALIGN);
- if (crash_base == 0) {
- pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
- crash_size);
- return;
- }
- } else {
- /* User specifies base address explicitly. */
- if (!memblock_is_region_memory(crash_base, crash_size)) {
- pr_warn("cannot reserve crashkernel: region is not memory\n");
- return;
- }
-
- if (memblock_is_region_reserved(crash_base, crash_size)) {
- pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
- return;
- }
-
- if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
- pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
- return;
- }
- }
- memblock_reserve(crash_base, crash_size);
-
- pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
- crash_base, crash_base + crash_size, crash_size >> 20);
-
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
-}
-#else
+#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
}
-#endif /* CONFIG_KEXEC_CORE */
+#endif

#ifdef CONFIG_CRASH_DUMP
static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1c0f3e02f731..c55cee290bbb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
*/
memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
#ifdef CONFIG_KEXEC_CORE
+ if (crashk_low_res.end)
+ memblock_mark_nomap(crashk_low_res.start,
+ resource_size(&crashk_low_res));
+
if (crashk_res.end)
memblock_mark_nomap(crashk_res.start,
resource_size(&crashk_res));
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index d39892bdb9ae..cdef7d8c91a6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,

int __init reserve_crashkernel_low(void)
{
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
unsigned long long base, low_base = 0, low_size = 0;
unsigned long low_mem_limit;
int ret;
@@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)

crashk_low_res.start = low_base;
crashk_low_res.end = low_base + low_size - 1;
+#ifdef CONFIG_X86_64
insert_resource(&iomem_resource, &crashk_low_res);
+#endif
#endif
return 0;
}

-#ifdef CONFIG_X86
+#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
#ifdef CONFIG_KEXEC_CORE
/*
* reserve_crashkernel() - reserves memory for crash kernel
@@ -453,7 +455,9 @@ void __init reserve_crashkernel(void)

crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
+#ifdef CONFIG_X86
insert_resource(&iomem_resource, &crashk_res);
+#endif
}
#endif /* CONFIG_KEXEC_CORE */
#endif
--
2.20.1

2020-10-31 07:43:19

by chenzhou

Subject: [PATCH v13 4/8] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

Make the functions reserve_crashkernel[_low]() generic.
arm64 will use them to reimplement crashkernel=X.

Signed-off-by: Chen Zhou <[email protected]>
Tested-by: John Donnelly <[email protected]>
---
arch/x86/include/asm/kexec.h | 25 ++++++
arch/x86/kernel/setup.c | 151 +-------------------------------
include/linux/crash_core.h | 4 +
include/linux/kexec.h | 2 -
kernel/crash_core.c | 164 +++++++++++++++++++++++++++++++++++
kernel/kexec_core.c | 17 ----
6 files changed, 195 insertions(+), 168 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 8cf9d3fd31c7..34afa7b645f9 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,27 @@
/* 2M alignment for crash kernel regions */
#define CRASH_ALIGN SZ_16M

+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX SZ_512M
+# define CRASH_ADDR_HIGH_MAX SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX SZ_4G
+# define CRASH_ADDR_HIGH_MAX SZ_64T
+#endif
+
#ifndef __ASSEMBLY__

#include <linux/string.h>
@@ -200,6 +221,10 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1289f079ad5f..00b3840d30f9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -25,8 +25,6 @@

#include <uapi/linux/mount.h>

-#include <xen/xen.h>
-
#include <asm/apic.h>
#include <asm/numa.h>
#include <asm/bios_ebda.h>
@@ -38,6 +36,7 @@
#include <asm/io_apic.h>
#include <asm/kasan.h>
#include <asm/kaslr.h>
+#include <asm/kexec.h>
#include <asm/mce.h>
#include <asm/mtrr.h>
#include <asm/realmode.h>
@@ -389,153 +388,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
}
}

-/*
- * --------- Crashkernel reservation ------------------------------
- */
-
-#ifdef CONFIG_KEXEC_CORE
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX SZ_512M
-# define CRASH_ADDR_HIGH_MAX SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX SZ_4G
-# define CRASH_ADDR_HIGH_MAX SZ_64T
-#endif
-
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
- unsigned long long base, low_base = 0, low_size = 0;
- unsigned long low_mem_limit;
- int ret;
-
- low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
- /* crashkernel=Y,low */
- ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
- if (ret) {
- /*
- * two parts from kernel/dma/swiotlb.c:
- * -swiotlb size: user-specified with swiotlb= or default.
- *
- * -swiotlb overflow buffer: now hardcoded to 32k. We round it
- * to 8M for other buffers that may need to stay low too. Also
- * make sure we allocate enough extra low memory so that we
- * don't run out of DMA buffers for 32-bit devices.
- */
- low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
- } else {
- /* passed with crashkernel=0,low ? */
- if (!low_size)
- return 0;
- }
-
- low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX);
- if (!low_base) {
- pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
- (unsigned long)(low_size >> 20));
- return -ENOMEM;
- }
-
- pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
- (unsigned long)(low_size >> 20),
- (unsigned long)(low_base >> 20),
- (unsigned long)(low_mem_limit >> 20));
-
- crashk_low_res.start = low_base;
- crashk_low_res.end = low_base + low_size - 1;
- insert_resource(&iomem_resource, &crashk_low_res);
-#endif
- return 0;
-}
-
-static void __init reserve_crashkernel(void)
-{
- unsigned long long crash_size, crash_base, total_mem;
- bool high = false;
- int ret;
-
- total_mem = memblock_phys_mem_size();
-
- /* crashkernel=XM */
- ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0) {
- /* crashkernel=X,high */
- ret = parse_crashkernel_high(boot_command_line, total_mem,
- &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
- return;
- high = true;
- }
-
- if (xen_pv_domain()) {
- pr_info("Ignoring crashkernel for a Xen PV domain\n");
- return;
- }
-
- /* 0 means: find the address automatically */
- if (!crash_base) {
- /*
- * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
- * also allocates 256M extra low memory for DMA buffers
- * and swiotlb.
- * But the extra memory is not required for all machines.
- * So try low memory first and fall back to high memory
- * unless "crashkernel=size[KMG],high" is specified.
- */
- if (!high)
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_LOW_MAX);
- if (!crash_base)
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_HIGH_MAX);
- if (!crash_base) {
- pr_info("crashkernel reservation failed - No suitable area found.\n");
- return;
- }
- } else {
- unsigned long long start;
-
- start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
- crash_base + crash_size);
- if (start != crash_base) {
- pr_info("crashkernel reservation failed - memory is in use.\n");
- return;
- }
- }
-
- if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
- memblock_free(crash_base, crash_size);
- return;
- }
-
- pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
- (unsigned long)(crash_size >> 20),
- (unsigned long)(crash_base >> 20),
- (unsigned long)(total_mem >> 20));
-
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
- insert_resource(&iomem_resource, &crashk_res);
-}
-#else
+#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
}
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 206bde8308b2..5021d7c70aee 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -69,6 +69,9 @@ extern unsigned char *vmcoreinfo_data;
extern size_t vmcoreinfo_size;
extern u32 *vmcoreinfo_note;

+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
+
/* raw contents of kernel .notes section */
extern const void __start_notes __weak;
extern const void __stop_notes __weak;
@@ -83,5 +86,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+int __init reserve_crashkernel_low(void);

#endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 9e93bef52968..f301f2f5cfc4 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -337,8 +337,6 @@ extern int kexec_load_disabled;

/* Location of a reserved region to hold the crash kernel.
*/
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
extern note_buf_t __percpu *crash_notes;

/* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..d39892bdb9ae 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,7 +7,12 @@
#include <linux/crash_core.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>

+#include <xen/xen.h>
+
+#include <asm/kexec.h>
#include <asm/page.h>
#include <asm/sections.h>

@@ -21,6 +26,22 @@ u32 *vmcoreinfo_note;
/* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
static unsigned char *vmcoreinfo_data_safecopy;

+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+ .name = "Crash kernel",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+ .desc = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+ .name = "Crash kernel",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+ .desc = IORES_DESC_CRASH_KERNEL
+};
+
/*
* parsing the "crashkernel" commandline
*
@@ -294,6 +315,149 @@ int __init parse_crashkernel_low(char *cmdline,
"crashkernel=", suffix_tbl[SUFFIX_LOW]);
}

+/*
+ * --------- Crashkernel reservation ------------------------------
+ */
+
+int __init reserve_crashkernel_low(void)
+{
+#ifdef CONFIG_X86_64
+ unsigned long long base, low_base = 0, low_size = 0;
+ unsigned long low_mem_limit;
+ int ret;
+
+ low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
+
+ /* crashkernel=Y,low */
+ ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
+ if (ret) {
+ /*
+ * two parts from kernel/dma/swiotlb.c:
+ * -swiotlb size: user-specified with swiotlb= or default.
+ *
+ * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+ * to 8M for other buffers that may need to stay low too. Also
+ * make sure we allocate enough extra low memory so that we
+ * don't run out of DMA buffers for 32-bit devices.
+ */
+ low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
+ } else {
+ /* passed with crashkernel=0,low ? */
+ if (!low_size)
+ return 0;
+ }
+
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
+ if (!low_base) {
+ pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+ (unsigned long)(low_size >> 20));
+ return -ENOMEM;
+ }
+
+ pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
+ (unsigned long)(low_size >> 20),
+ (unsigned long)(low_base >> 20),
+ (unsigned long)(low_mem_limit >> 20));
+
+ crashk_low_res.start = low_base;
+ crashk_low_res.end = low_base + low_size - 1;
+ insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+ return 0;
+}
+
+#ifdef CONFIG_X86
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_size, crash_base, total_mem;
+ bool high = false;
+ int ret;
+
+ total_mem = memblock_phys_mem_size();
+
+ /* crashkernel=XM */
+ ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0) {
+ /* crashkernel=X,high */
+ ret = parse_crashkernel_high(boot_command_line, total_mem,
+ &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0)
+ return;
+ high = true;
+ }
+
+ if (xen_pv_domain()) {
+ pr_info("Ignoring crashkernel for a Xen PV domain\n");
+ return;
+ }
+
+ /* 0 means: find the address automatically */
+ if (!crash_base) {
+ /*
+ * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
+ * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+ * also allocates 256M extra low memory for DMA buffers
+ * and swiotlb.
+ * But the extra memory is not required for all machines.
+ * So try low memory first and fall back to high memory
+ * unless "crashkernel=size[KMG],high" is specified.
+ */
+ if (!high)
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
+ if (!crash_base)
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX);
+ if (!crash_base) {
+ pr_info("crashkernel reservation failed - No suitable area found.\n");
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ unsigned long long start;
+
+ if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
+ pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
+ (unsigned long)CRASH_ALIGN >> 20);
+ return;
+ }
+
+ start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
+ crash_base + crash_size);
+ if (start != crash_base) {
+ pr_info("crashkernel reservation failed - memory is in use.\n");
+ return;
+ }
+ }
+
+ if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
+ memblock_free(crash_base, crash_size);
+ return;
+ }
+
+ pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
+ (unsigned long)(crash_size >> 20),
+ (unsigned long)(crash_base >> 20),
+ (unsigned long)(total_mem >> 20));
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+ insert_resource(&iomem_resource, &crashk_res);
+}
+#endif /* CONFIG_KEXEC_CORE */
+#endif
+
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
void *data, size_t data_len)
{
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 8798a8183974..2ca887514145 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
/* Flag to indicate we are going to kexec a new kernel */
bool kexec_in_progress = false;

-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
- .name = "Crash kernel",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
- .desc = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
- .name = "Crash kernel",
- .start = 0,
- .end = 0,
- .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
- .desc = IORES_DESC_CRASH_KERNEL
-};
-
int kexec_should_crash(struct task_struct *p)
{
/*
--
2.20.1
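
The default low-region size and the base alignment check in the patch above can be tried out in user space. The following is only a rough sketch, not kernel code: `default_low_size()`, `base_is_aligned()` and the `MiB()` macro are hypothetical names, and `swiotlb_bytes` stands in for the value returned by swiotlb_size_or_default().

```c
#include <assert.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Default size of the low region when no crashkernel=Y,low is given:
 * the swiotlb size plus 8M of slack, but never less than 256M.
 * swiotlb_bytes is a stand-in for swiotlb_size_or_default().
 */
static inline uint64_t default_low_size(uint64_t swiotlb_bytes)
{
	uint64_t want = swiotlb_bytes + MiB(8);
	uint64_t min_low = MiB(256);

	return want > min_low ? want : min_low;
}

/* A crash_base given on the command line must be CRASH_ALIGN aligned. */
static inline int base_is_aligned(uint64_t base, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two only */
	return (base & (align - 1)) == 0;
}
```

With a 64M swiotlb the 256M floor wins, while a 512M swiotlb yields 520M. A base of 32M passes a 16M alignment check and a base of 33M does not.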

2020-11-09 12:38:15

by chenzhou

Subject: Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

Hi all,

Friendly ping...


On 2020/10/31 15:44, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve the crash kernel below 4G, which
> fails when there is not enough low memory.
> 2. If the crash kernel is reserved above 4G, the crash dump
> kernel fails to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> allocations by devices in the crash dump kernel that need ZONE_DMA
> will fail.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries a low allocation in the DMA zone (or the DMA32 zone if
> CONFIG_ZONE_DMA is disabled), and falls back to a high allocation if that fails.
>
> We can also use "crashkernel=X,high" to select a high region above the
> DMA zone, which also tries to allocate at least 256M of low memory in the
> DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
> "crashkernel=Y,low" can be used to allocate a specified amount of low memory.
>
> When reserving the crash kernel in high memory, some low memory is reserved
> for crash dump kernel devices, so there may be two regions reserved for the
> crash dump kernel.
> To distinguish the low region from the high region without affecting
> existing kexec-tools, rename the low region as "Crash kernel (low)"
> and pass it to the crash dump kernel by reusing the DT property
> "linux,usable-memory-range". The low memory region is made the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions (see [1])
>
> Another update is the documentation of the DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema (see [2])
>
> This patchset contains the following eight patches:
> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0008-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are x86 cleanups which prepare for making the
> functions reserve_crashkernel[_low]() generic.
> 0004 makes the functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplement arm64 crashkernel=X.
> 0007 adds memory for devices by DT property linux,usable-memory-range.
> 0008 updates the doc.
>
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified as well, first reserve the specified amount of low
> memory for crash dump kernel devices and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
>
> Chen Zhou (8):
> x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> x86: kdump: make the lower bound of crash kernel reservation
> consistent
> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel()
> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> arm64: kdump: introduce some macroes for crash kernel reservation
> arm64: kdump: reimplement crashkernel=X
> arm64: kdump: add memory for devices by DT property
> linux,usable-memory-range
> kdump: update Documentation about crashkernel
>
> Documentation/admin-guide/kdump/kdump.rst | 23 ++-
> .../admin-guide/kernel-parameters.txt | 12 +-
> arch/arm64/include/asm/kexec.h | 15 ++
> arch/arm64/include/asm/processor.h | 1 +
> arch/arm64/kernel/setup.c | 13 +-
> arch/arm64/mm/init.c | 105 ++++-------
> arch/arm64/mm/mmu.c | 4 +
> arch/x86/include/asm/kexec.h | 28 +++
> arch/x86/kernel/setup.c | 153 +---------------
> include/linux/crash_core.h | 4 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 168 ++++++++++++++++++
> kernel/kexec_core.c | 17 --
> 13 files changed, 301 insertions(+), 244 deletions(-)
>

2020-11-11 02:04:30

by Baoquan He

Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

On 10/31/20 at 03:44pm, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve the crash kernel below 4G, which
> fails when there is not enough low memory.
> 2. If the crash kernel is reserved above 4G, the crash dump
> kernel fails to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> allocations by devices in the crash dump kernel that need ZONE_DMA
> will fail.
>
> To solve these issues, change the behavior of crashkernel=X and
> introduce crashkernel=X,[high,low]. crashkernel=X tries a low allocation
> in the DMA zone (or the DMA32 zone if CONFIG_ZONE_DMA is disabled), and
> falls back to a high allocation if that fails.
> We can also use "crashkernel=X,high" to select a region above the DMA zone,
> which also tries to allocate at least 256M in the DMA zone automatically
> (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
> "crashkernel=Y,low" can be used to allocate a specified amount of low memory.
>
> Another minor change: there may be two regions reserved for the crash
> dump kernel. To distinguish the low region from the high region without
> affecting existing kexec-tools, rename the low region as
> "Crash kernel (low)".
>
> Signed-off-by: Chen Zhou <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> ---
> arch/arm64/include/asm/kexec.h | 9 +++++
> arch/arm64/kernel/setup.c | 13 +++++++-
> arch/arm64/mm/init.c | 60 ++--------------------------------
> arch/arm64/mm/mmu.c | 4 +++
> kernel/crash_core.c | 8 +++--
> 5 files changed, 34 insertions(+), 60 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 402d208265a3..79909ae5e22e 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -28,7 +28,12 @@
> /* 2M alignment for crash kernel regions */
> #define CRASH_ALIGN SZ_2M
>
> +#ifdef CONFIG_ZONE_DMA
> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
> +#else
> #define CRASH_ADDR_LOW_MAX arm64_dma32_phys_limit
> +#endif
> +
> #define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
>
> #ifndef __ASSEMBLY__
> @@ -96,6 +101,10 @@ static inline void crash_prepare_suspend(void) {}
> static inline void crash_post_resume(void) {}
> #endif
>
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
> #ifdef CONFIG_KEXEC_FILE
> #define ARCH_HAS_KIMAGE_ARCH
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 133257ffd859..6aff30de8f47 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
> kernel_data.end <= res->end)
> request_resource(res, &kernel_data);
> #ifdef CONFIG_KEXEC_CORE
> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
> + /*
> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
> + * region in /proc/iomem.
> + * In order to distinct from the high region and make no effect
> + * to the use of existing kexec-tools, rename the low region as
> + * "Crash kernel (low)".
> + */
> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> + crashk_low_res.end <= res->end) {
> + crashk_low_res.name = "Crash kernel (low)";
> + request_resource(res, &crashk_low_res);
> + }
> if (crashk_res.end && crashk_res.start >= res->start &&
> crashk_res.end <= res->end)
> request_resource(res, &crashk_res);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a07fd8e1f926..888c4f7eadc3 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -34,6 +34,7 @@
> #include <asm/fixmap.h>
> #include <asm/kasan.h>
> #include <asm/kernel-pgtable.h>
> +#include <asm/kexec.h>
> #include <asm/memory.h>
> #include <asm/numa.h>
> #include <asm/sections.h>
> @@ -62,66 +63,11 @@ EXPORT_SYMBOL(memstart_addr);
> phys_addr_t arm64_dma_phys_limit __ro_after_init;
> phys_addr_t arm64_dma32_phys_limit __ro_after_init;
>
> -#ifdef CONFIG_KEXEC_CORE
> -/*
> - * reserve_crashkernel() - reserves memory for crash kernel
> - *
> - * This function reserves memory area given in "crashkernel=" kernel command
> - * line parameter. The memory reserved is used by dump capture kernel when
> - * primary kernel is crashing.
> - */
> -static void __init reserve_crashkernel(void)
> -{
> - unsigned long long crash_base, crash_size;
> - int ret;
> -
> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> - &crash_size, &crash_base);
> - /* no crashkernel= or invalid value specified */
> - if (ret || !crash_size)
> - return;
> -
> - crash_size = PAGE_ALIGN(crash_size);
> -
> - if (crash_base == 0) {
> - /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
> - crash_size, CRASH_ALIGN);
> - if (crash_base == 0) {
> - pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> - crash_size);
> - return;
> - }
> - } else {
> - /* User specifies base address explicitly. */
> - if (!memblock_is_region_memory(crash_base, crash_size)) {
> - pr_warn("cannot reserve crashkernel: region is not memory\n");
> - return;
> - }
> -
> - if (memblock_is_region_reserved(crash_base, crash_size)) {
> - pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
> - return;
> - }
> -
> - if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> - pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
> - return;
> - }
> - }
> - memblock_reserve(crash_base, crash_size);
> -
> - pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> - crash_base, crash_base + crash_size, crash_size >> 20);
> -
> - crashk_res.start = crash_base;
> - crashk_res.end = crash_base + crash_size - 1;
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
> static void __init reserve_crashkernel(void)
> {
> }
> -#endif /* CONFIG_KEXEC_CORE */
> +#endif
>
> #ifdef CONFIG_CRASH_DUMP
> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 1c0f3e02f731..c55cee290bbb 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
> */
> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> #ifdef CONFIG_KEXEC_CORE
> + if (crashk_low_res.end)
> + memblock_mark_nomap(crashk_low_res.start,
> + resource_size(&crashk_low_res));
> +
> if (crashk_res.end)
> memblock_mark_nomap(crashk_res.start,
> resource_size(&crashk_res));
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index d39892bdb9ae..cdef7d8c91a6 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
>
> int __init reserve_crashkernel_low(void)
> {
> -#ifdef CONFIG_X86_64
> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)

I'm not sure, but would a CONFIG_64BIT check be better here?

> unsigned long long base, low_base = 0, low_size = 0;
> unsigned long low_mem_limit;
> int ret;
> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
>
> crashk_low_res.start = low_base;
> crashk_low_res.end = low_base + low_size - 1;
> +#ifdef CONFIG_X86_64
> insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> #endif
> return 0;
> }
>
> -#ifdef CONFIG_X86
> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)

Should we make this a weak default so that we can remove the arch config check?
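
As one possible reading of the weak-default idea, here is a minimal user-space sketch. The names `toy_reserve_crashkernel()` and `toy_setup_arch()` are made up for illustration and this is not the proposed kernel change; `__attribute__((weak))` is what the kernel's `__weak` macro expands to.

```c
enum toy_path { TOY_GENERIC_PATH, TOY_ARCH_PATH };

/*
 * Weak generic default. An architecture that needs different behavior
 * would provide a normal (strong) definition with the same name in its
 * own file, and the linker would pick that one instead.
 */
__attribute__((weak)) enum toy_path toy_reserve_crashkernel(void)
{
	return TOY_GENERIC_PATH;
}

/* The caller needs no #ifdef: it links against whichever definition wins. */
enum toy_path toy_setup_arch(void)
{
	return toy_reserve_crashkernel();
}
```

With no strong override linked in, toy_setup_arch() takes the generic path, so the `#if defined(CONFIG_X86) || defined(CONFIG_ARM64)` style guard would not be needed around the default.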

> #ifdef CONFIG_KEXEC_CORE
> /*
> * reserve_crashkernel() - reserves memory for crash kernel
> @@ -453,7 +455,9 @@ void __init reserve_crashkernel(void)
>
> crashk_res.start = crash_base;
> crashk_res.end = crash_base + crash_size - 1;
> +#ifdef CONFIG_X86
> insert_resource(&iomem_resource, &crashk_res);
> +#endif
> }
> #endif /* CONFIG_KEXEC_CORE */
> #endif
> --
> 2.20.1
>

2020-11-11 03:04:15

by Baoquan He

Subject: Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

Hi Zhou, Bhupesh

On 10/31/20 at 03:44pm, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve the crash kernel below 4G, which
> fails when there is not enough low memory.
> 2. If the crash kernel is reserved above 4G, the crash dump
> kernel fails to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> allocations by devices in the crash dump kernel that need ZONE_DMA
> will fail.

I went through this patchset, mainly the x86-related and generic
changes. The changes look great and I see no risk. I know Bhupesh is
following this up and helping with review; thanks to both of you.

Have you also tested crashkernel reservation on x86_64, with both the
normal reservation and the high/low reservation, and is it working well?
I'm asking because I didn't see a description of the test results, so
I'm just noting it.
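
For reference, one way to check the result of such a test is to look for the "Crash kernel" entries in /proc/iomem. Below is a small sketch of a line parser. `struct iomem_entry`, `parse_iomem_line()` and `is_crash_region()` are hypothetical helper names, and the sample address range in the comment is invented.

```c
#include <stdio.h>
#include <string.h>

struct iomem_entry {
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/*
 * Parse one /proc/iomem line such as
 *   "  80000000-8fffffff : Crash kernel (low)"
 * Returns 1 on success, 0 if the line does not match.
 */
static inline int parse_iomem_line(const char *line, struct iomem_entry *e)
{
	return sscanf(line, " %llx-%llx : %63[^\n]",
		      &e->start, &e->end, e->name) == 3;
}

/* Matches both "Crash kernel" and "Crash kernel (low)". */
static inline int is_crash_region(const struct iomem_entry *e)
{
	return strncmp(e->name, "Crash kernel", strlen("Crash kernel")) == 0;
}
```

Feeding each line of /proc/iomem through parse_iomem_line() and filtering with is_crash_region() should list one region for a normal reservation and two for a high/low reservation.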

Thanks
Baoquan

>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone (or the DMA32 zone if
> CONFIG_ZONE_DMA is disabled), and fall back to high allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
> "crashkernel=Y,low" can be used to allocate specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following eight patches:
> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0008-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements arm64 crashkernel=X.
> 0007 adds memory for devices by DT property linux,usable-memory-range.
> 0008 updates the doc.
>
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
>
> Chen Zhou (8):
> x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> x86: kdump: make the lower bound of crash kernel reservation
> consistent
> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel()
> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> arm64: kdump: introduce some macroes for crash kernel reservation
> arm64: kdump: reimplement crashkernel=X
> arm64: kdump: add memory for devices by DT property
> linux,usable-memory-range
> kdump: update Documentation about crashkernel
>
> Documentation/admin-guide/kdump/kdump.rst | 23 ++-
> .../admin-guide/kernel-parameters.txt | 12 +-
> arch/arm64/include/asm/kexec.h | 15 ++
> arch/arm64/include/asm/processor.h | 1 +
> arch/arm64/kernel/setup.c | 13 +-
> arch/arm64/mm/init.c | 105 ++++-------
> arch/arm64/mm/mmu.c | 4 +
> arch/x86/include/asm/kexec.h | 28 +++
> arch/x86/kernel/setup.c | 153 +---------------
> include/linux/crash_core.h | 4 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 168 ++++++++++++++++++
> kernel/kexec_core.c | 17 --
> 13 files changed, 301 insertions(+), 244 deletions(-)
>
> --
> 2.20.1
>
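
For readers following the v1 changelog entry quoted above, the two-region capping it describes can be sketched in plain C. This is only an illustration under assumed names: `struct region`, `plan_cap()` and `demo_gap_size()` are hypothetical and are not taken from the patches. The idea is to keep the span from the lowest start to the highest end, then drop the gap between the two crash kernel regions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end); not a kernel type. */
struct region {
	uint64_t start;
	uint64_t end;
};

/* What to keep (cap) and what to remove again (gap) after capping. */
struct cap_plan {
	struct region cap;
	struct region gap;
	int has_gap;
};

/* Build the plan for one or two crash kernel regions. */
static struct cap_plan plan_cap(const struct region *regs, int n)
{
	struct cap_plan p = { {0, 0}, {0, 0}, 0 };
	const struct region *lo, *hi;

	assert(n == 1 || n == 2);

	if (n == 1) {
		p.cap = regs[0];
		return p;
	}

	/* Order the two regions by start address. */
	lo = regs[0].start <= regs[1].start ? &regs[0] : &regs[1];
	hi = (lo == &regs[0]) ? &regs[1] : &regs[0];

	/* Cap to [min(start), max(end)). */
	p.cap.start = lo->start;
	p.cap.end = lo->end > hi->end ? lo->end : hi->end;

	/* Memory between the two regions must be removed again. */
	if (lo->end < hi->start) {
		p.gap.start = lo->end;
		p.gap.end = hi->start;
		p.has_gap = 1;
	}
	return p;
}

/* Example: a high region at 4G (512M) and a low region at 2G (256M). */
static uint64_t demo_gap_size(void)
{
	const struct region regs[2] = {
		{ 0x100000000ULL, 0x120000000ULL },
		{ 0x080000000ULL, 0x090000000ULL },
	};
	struct cap_plan p = plan_cap(regs, 2);

	return p.has_gap ? p.gap.end - p.gap.start : 0;
}
```

In this example the cap covers 2G up to 4G+512M, and the gap to remove runs from the end of the low region to the start of the high one.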

2020-11-11 13:30:28

by chenzhou

[permalink] [raw]
Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

Hi Baoquan,


On 2020/11/11 9:59, Baoquan He wrote:
> On 10/31/20 at 03:44pm, Chen Zhou wrote:
>> There are the following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is not enough low memory.
>> 2. If crashkernel is reserved above 4G, the crash dump
>> kernel will fail to boot because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
>> devices in the crash dump kernel that need ZONE_DMA will fail to
>> allocate.
>>
>> To solve these issues, change the behavior of crashkernel=X and
>> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
>> in DMA zone or DMA32 zone if CONFIG_ZONE_DMA is disabled, and fall back
>> to high allocation if it fails.
>> We can also use "crashkernel=X,high" to select a region above DMA zone,
>> which also tries to allocate at least 256M in DMA zone automatically
>> (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>>
>> Another minor change, there may be two regions reserved for crash
>> dump kernel, in order to distinct from the high region and make no
>> effect to the use of existing kexec-tools, rename the low region as
>> "Crash kernel (low)".
>>
>> Signed-off-by: Chen Zhou <[email protected]>
>> Tested-by: John Donnelly <[email protected]>
>> ---
>> arch/arm64/include/asm/kexec.h | 9 +++++
>> arch/arm64/kernel/setup.c | 13 +++++++-
>> arch/arm64/mm/init.c | 60 ++--------------------------------
>> arch/arm64/mm/mmu.c | 4 +++
>> kernel/crash_core.c | 8 +++--
>> 5 files changed, 34 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>> index 402d208265a3..79909ae5e22e 100644
>> --- a/arch/arm64/include/asm/kexec.h
>> +++ b/arch/arm64/include/asm/kexec.h
>> @@ -28,7 +28,12 @@
>> /* 2M alignment for crash kernel regions */
>> #define CRASH_ALIGN SZ_2M
>>
>> +#ifdef CONFIG_ZONE_DMA
>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
>> +#else
>> #define CRASH_ADDR_LOW_MAX arm64_dma32_phys_limit
>> +#endif
>> +
>> #define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
>>
>> #ifndef __ASSEMBLY__
>> @@ -96,6 +101,10 @@ static inline void crash_prepare_suspend(void) {}
>> static inline void crash_post_resume(void) {}
>> #endif
>>
>> +#ifdef CONFIG_KEXEC_CORE
>> +extern void __init reserve_crashkernel(void);
>> +#endif
>> +
>> #ifdef CONFIG_KEXEC_FILE
>> #define ARCH_HAS_KIMAGE_ARCH
>>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 133257ffd859..6aff30de8f47 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>> kernel_data.end <= res->end)
>> request_resource(res, &kernel_data);
>> #ifdef CONFIG_KEXEC_CORE
>> - /* Userspace will find "Crash kernel" region in /proc/iomem. */
>> + /*
>> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
>> + * region in /proc/iomem.
>> + * In order to distinct from the high region and make no effect
>> + * to the use of existing kexec-tools, rename the low region as
>> + * "Crash kernel (low)".
>> + */
>> + if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> + crashk_low_res.end <= res->end) {
>> + crashk_low_res.name = "Crash kernel (low)";
>> + request_resource(res, &crashk_low_res);
>> + }
>> if (crashk_res.end && crashk_res.start >= res->start &&
>> crashk_res.end <= res->end)
>> request_resource(res, &crashk_res);
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index a07fd8e1f926..888c4f7eadc3 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -34,6 +34,7 @@
>> #include <asm/fixmap.h>
>> #include <asm/kasan.h>
>> #include <asm/kernel-pgtable.h>
>> +#include <asm/kexec.h>
>> #include <asm/memory.h>
>> #include <asm/numa.h>
>> #include <asm/sections.h>
>> @@ -62,66 +63,11 @@ EXPORT_SYMBOL(memstart_addr);
>> phys_addr_t arm64_dma_phys_limit __ro_after_init;
>> phys_addr_t arm64_dma32_phys_limit __ro_after_init;
>>
>> -#ifdef CONFIG_KEXEC_CORE
>> -/*
>> - * reserve_crashkernel() - reserves memory for crash kernel
>> - *
>> - * This function reserves memory area given in "crashkernel=" kernel command
>> - * line parameter. The memory reserved is used by dump capture kernel when
>> - * primary kernel is crashing.
>> - */
>> -static void __init reserve_crashkernel(void)
>> -{
>> - unsigned long long crash_base, crash_size;
>> - int ret;
>> -
>> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>> - &crash_size, &crash_base);
>> - /* no crashkernel= or invalid value specified */
>> - if (ret || !crash_size)
>> - return;
>> -
>> - crash_size = PAGE_ALIGN(crash_size);
>> -
>> - if (crash_base == 0) {
>> - /* Current arm64 boot protocol requires 2MB alignment */
>> - crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
>> - crash_size, CRASH_ALIGN);
>> - if (crash_base == 0) {
>> - pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>> - crash_size);
>> - return;
>> - }
>> - } else {
>> - /* User specifies base address explicitly. */
>> - if (!memblock_is_region_memory(crash_base, crash_size)) {
>> - pr_warn("cannot reserve crashkernel: region is not memory\n");
>> - return;
>> - }
>> -
>> - if (memblock_is_region_reserved(crash_base, crash_size)) {
>> - pr_warn("cannot reserve crashkernel: region overlaps reserved memory\n");
>> - return;
>> - }
>> -
>> - if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
>> - pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
>> - return;
>> - }
>> - }
>> - memblock_reserve(crash_base, crash_size);
>> -
>> - pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
>> - crash_base, crash_base + crash_size, crash_size >> 20);
>> -
>> - crashk_res.start = crash_base;
>> - crashk_res.end = crash_base + crash_size - 1;
>> -}
>> -#else
>> +#ifndef CONFIG_KEXEC_CORE
>> static void __init reserve_crashkernel(void)
>> {
>> }
>> -#endif /* CONFIG_KEXEC_CORE */
>> +#endif
>>
>> #ifdef CONFIG_CRASH_DUMP
>> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 1c0f3e02f731..c55cee290bbb 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
>> */
>> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
>> #ifdef CONFIG_KEXEC_CORE
>> + if (crashk_low_res.end)
>> + memblock_mark_nomap(crashk_low_res.start,
>> + resource_size(&crashk_low_res));
>> +
>> if (crashk_res.end)
>> memblock_mark_nomap(crashk_res.start,
>> resource_size(&crashk_res));
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index d39892bdb9ae..cdef7d8c91a6 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
>>
>> int __init reserve_crashkernel_low(void)
>> {
>> -#ifdef CONFIG_X86_64
>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> Not very sure if a CONFIG_64BIT checking is better.
If we do that, other 64-bit kernels, such as mips, may hit build errors.
>
>> unsigned long long base, low_base = 0, low_size = 0;
>> unsigned long low_mem_limit;
>> int ret;
>> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
>>
>> crashk_low_res.start = low_base;
>> crashk_low_res.end = low_base + low_size - 1;
>> +#ifdef CONFIG_X86_64
>> insert_resource(&iomem_resource, &crashk_low_res);
>> +#endif
>> #endif
>> return 0;
>> }
>>
>> -#ifdef CONFIG_X86
>> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> Should we make this weak default so that we can remove the ARCH config?
Same as above: some architectures may not support kdump, and in that case build errors would occur.

Thanks,
Chen Zhou
>
>> #ifdef CONFIG_KEXEC_CORE
>> /*
>> * reserve_crashkernel() - reserves memory for crash kernel
>> @@ -453,7 +455,9 @@ void __init reserve_crashkernel(void)
>>
>> crashk_res.start = crash_base;
>> crashk_res.end = crash_base + crash_size - 1;
>> +#ifdef CONFIG_X86
>> insert_resource(&iomem_resource, &crashk_res);
>> +#endif
>> }
>> #endif /* CONFIG_KEXEC_CORE */
>> #endif
>> --
>> 2.20.1
>>
>>
> .
>
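
To make the policy in the commit message above concrete, here is a minimal user-space sketch in C. It is not the kernel code. `find_free()` is a hypothetical stand-in for a memblock search, `low_max` plays the role of CRASH_ADDR_LOW_MAX, and the memory layout in `demo()` is invented. Giving up the high region when no low memory can be found is an assumption of this sketch, not something the description states.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M		(1ULL << 20)
#define CRASH_ALIGN	(2 * SZ_1M)	/* arm64 alignment used in the patch */
#define DEFAULT_LOW	(256 * SZ_1M)	/* "at least 256M" from the description */

struct range { uint64_t start, end; };	/* free memory, [start, end) */

struct reservation {
	uint64_t base;		/* main crash kernel region, 0 if none */
	uint64_t low_base;	/* extra low region for devices, 0 if none */
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Toy stand-in for a memblock search; returns 0 on failure. */
static uint64_t find_free(const struct range *mem, int n,
			  uint64_t lo, uint64_t hi, uint64_t size)
{
	for (int i = 0; i < n; i++) {
		uint64_t s = mem[i].start > lo ? mem[i].start : lo;
		uint64_t e = mem[i].end < hi ? mem[i].end : hi;

		s = align_up(s, CRASH_ALIGN);
		if (s < e && e - s >= size)
			return s;
	}
	return 0;
}

/* crashkernel=X: try below low_max first, then fall back to high memory. */
static struct reservation reserve_crashkernel_x(const struct range *mem, int n,
						uint64_t size, uint64_t low_max)
{
	struct reservation r = { 0, 0 };

	assert(size > 0);

	r.base = find_free(mem, n, CRASH_ALIGN, low_max, size);
	if (r.base)
		return r;	/* fits in low memory, nothing else needed */

	r.base = find_free(mem, n, low_max, UINT64_MAX, size);
	if (!r.base)
		return r;	/* no room anywhere */

	/* High region found: also reserve low memory for devices. */
	r.low_base = find_free(mem, n, CRASH_ALIGN, low_max, DEFAULT_LOW);
	if (!r.low_base)
		r.base = 0;	/* assumption: give up without low memory */
	return r;
}

/* Invented layout: 256M free at 2G, 4G free starting at 4G. */
static const struct range demo_mem[2] = {
	{ 0x080000000ULL, 0x090000000ULL },
	{ 0x100000000ULL, 0x200000000ULL },
};

static struct reservation demo(uint64_t size)
{
	return reserve_crashkernel_x(demo_mem, 2, size, 1ULL << 32);
}
```

With this layout, a 128M request stays in low memory, while a 512M request lands at 4G and picks up the 256M low region at 2G as well.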

2020-11-11 13:34:08

by chenzhou

[permalink] [raw]
Subject: Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

Hi Baoquan, Bhupesh,


On 2020/11/11 11:01, Baoquan He wrote:
> Hi Zhou, Bhupesh
>
> On 10/31/20 at 03:44pm, Chen Zhou wrote:
>> There are the following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is not enough low memory.
>> 2. If crashkernel is reserved above 4G, the crash dump
>> kernel will fail to boot because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
>> devices in the crash dump kernel that need ZONE_DMA will fail to
>> allocate.
> I went through this patchset, mainly the x86 related and generic
> changes, the changes look great and no risk. And I know Bhupesh is
> following up this and helping review, thanks, both.
>
> So you have also tested crashkernel reservation on x86_64, with the
> normal reservation, and high/low reservation, it is working well,
> right? Asking this because I didn't see the test result description, and
> just note it.

Yes, I also tested on x86_64 and it works well. I run these basic tests before sending
every new version.
But Bhupesh may have some review comments (Bhupesh mentioned this one month ago).

Thanks,
Chen Zhou

>
> Thanks
> Baoquan
>

2020-11-11 14:01:40

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

On 11/11/20 at 09:27pm, chenzhou wrote:
> Hi Baoquan,
...
> >> #ifdef CONFIG_CRASH_DUMP
> >> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 1c0f3e02f731..c55cee290bbb 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
> >> */
> >> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> >> #ifdef CONFIG_KEXEC_CORE
> >> + if (crashk_low_res.end)
> >> + memblock_mark_nomap(crashk_low_res.start,
> >> + resource_size(&crashk_low_res));
> >> +
> >> if (crashk_res.end)
> >> memblock_mark_nomap(crashk_res.start,
> >> resource_size(&crashk_res));
> >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> >> index d39892bdb9ae..cdef7d8c91a6 100644
> >> --- a/kernel/crash_core.c
> >> +++ b/kernel/crash_core.c
> >> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
> >>
> >> int __init reserve_crashkernel_low(void)
> >> {
> >> -#ifdef CONFIG_X86_64
> >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> > Not very sure if a CONFIG_64BIT checking is better.
> If doing like this, there may be some compiling errors for other 64-bit kernel, such as mips.
> >
> >> unsigned long long base, low_base = 0, low_size = 0;
> >> unsigned long low_mem_limit;
> >> int ret;
> >> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
> >>
> >> crashk_low_res.start = low_base;
> >> crashk_low_res.end = low_base + low_size - 1;
> >> +#ifdef CONFIG_X86_64
> >> insert_resource(&iomem_resource, &crashk_low_res);
> >> +#endif
> >> #endif
> >> return 0;
> >> }
> >>
> >> -#ifdef CONFIG_X86
> >> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> > Should we make this weak default so that we can remove the ARCH config?
> The same as above, some arch may not support kdump, in that case, compiling errors occur.

OK, not sure if other people have a better idea; otherwise, we can leave it as is.
Thanks for explaining.
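
For reference, the "weak default" mechanism raised in the review above can be sketched outside the kernel as follows. The names `arch_reserve_crashkernel()` and `setup_crash_reservation()` are hypothetical, and the GCC/Clang `__attribute__((weak))` spelling is used here in place of the kernel's `__weak` macro. The sketch only illustrates the mechanism; it does not by itself resolve the build-error concern raised in the reply.

```c
#include <assert.h>

/*
 * Hypothetical generic default. With GCC or Clang, a definition marked
 * weak is used only when no other (strong) definition of the same symbol
 * is linked in. An architecture that supports the feature would provide
 * its own arch_reserve_crashkernel() in a separate source file.
 */
__attribute__((weak)) int arch_reserve_crashkernel(void)
{
	return 0;	/* nothing reserved by default */
}

/* Generic caller: no per-architecture #ifdef is needed here. */
static int setup_crash_reservation(void)
{
	int reserved = arch_reserve_crashkernel();

	assert(reserved == 0 || reserved == 1);
	return reserved;
}
```

Built on its own, this links against the weak default, so `setup_crash_reservation()` reports that nothing was reserved.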

2020-11-11 19:44:19

by Bhupesh Sharma

[permalink] [raw]
Subject: Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

Hi Chen,

On Wed, Nov 11, 2020 at 7:05 PM chenzhou <[email protected]> wrote:
>
> Hi Baoquan, Bhupesh,
>
>
> On 2020/11/11 11:01, Baoquan He wrote:
> > Hi Zhou, Bhupesh
> >
> > On 10/31/20 at 03:44pm, Chen Zhou wrote:
> >> There are the following issues in arm64 kdump:
> >> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> >> will fail when there is not enough low memory.
> >> 2. If crashkernel is reserved above 4G, the crash dump
> >> kernel will fail to boot because there is no low memory available
> >> for allocation.
> >> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> >> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> >> devices in the crash dump kernel that need ZONE_DMA will fail to
> >> allocate.
> > I went through this patchset, mainly the x86 related and generic
> > changes, the changes look great and no risk. And I know Bhupesh is
> > following up this and helping review, thanks, both.
> >
> > So you have also tested crashkernel reservation on x86_64, with the
> > normal reservation, and high/low reservation, it is working well,
> > right? Asking this because I didn't see the test result description, and
> > just note it.
>
> Yeah, i also tested on x86_64 and work well. I did these basic tests before sending every
> new version.
> But Bhupesh may have some review comments(Bhupesh referred one month ago).

Sorry for the late response. I was caught up in some other urgent
issues. I have just started reviewing
this series and will have more updates in a day or two. I am also
testing the same on x86_64 and arm64 machines and will share the test
observations soon as well.

Thanks for your patience.
Regards,
Bhupesh

2020-11-12 08:14:23

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH v13 4/8] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

On Sat, Oct 31, 2020 at 03:44:33PM +0800, Chen Zhou wrote:
> Make the functions reserve_crashkernel[_low]() generic.
> Arm64 will use these to reimplement crashkernel=X.
>
> Signed-off-by: Chen Zhou <[email protected]>
> Tested-by: John Donnelly <[email protected]>
> ---
> arch/x86/include/asm/kexec.h | 25 ++++++
> arch/x86/kernel/setup.c | 151 +-------------------------------
> include/linux/crash_core.h | 4 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 164 +++++++++++++++++++++++++++++++++++
> kernel/kexec_core.c | 17 ----
> 6 files changed, 195 insertions(+), 168 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 8cf9d3fd31c7..34afa7b645f9 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,27 @@
> /* 2M alignment for crash kernel regions */
> #define CRASH_ALIGN SZ_16M
>
> +/*
> + * Keep the crash kernel below this limit.
> + *
> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> + * due to mapping restrictions.
> + *
> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> + * no good way to detect the paging mode of the target kernel which will be
> + * loaded for dumping.
> + */
> +#ifdef CONFIG_X86_32
> +# define CRASH_ADDR_LOW_MAX SZ_512M
> +# define CRASH_ADDR_HIGH_MAX SZ_512M
> +#else
> +# define CRASH_ADDR_LOW_MAX SZ_4G
> +# define CRASH_ADDR_HIGH_MAX SZ_64T
> +#endif
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> @@ -200,6 +221,10 @@ typedef void crash_vmclear_fn(void);
> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> extern void kdump_nmi_shootdown_cpus(void);
>
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 1289f079ad5f..00b3840d30f9 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -25,8 +25,6 @@
>
> #include <uapi/linux/mount.h>
>
> -#include <xen/xen.h>
> -
> #include <asm/apic.h>
> #include <asm/numa.h>
> #include <asm/bios_ebda.h>
> @@ -38,6 +36,7 @@
> #include <asm/io_apic.h>
> #include <asm/kasan.h>
> #include <asm/kaslr.h>
> +#include <asm/kexec.h>
> #include <asm/mce.h>
> #include <asm/mtrr.h>
> #include <asm/realmode.h>
> @@ -389,153 +388,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
> }
> }
>
> -/*
> - * --------- Crashkernel reservation ------------------------------
> - */
> -
> -#ifdef CONFIG_KEXEC_CORE
> -
> -/*
> - * Keep the crash kernel below this limit.
> - *
> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> - * due to mapping restrictions.
> - *
> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> - * no good way to detect the paging mode of the target kernel which will be
> - * loaded for dumping.
> - */
> -#ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX SZ_512M
> -# define CRASH_ADDR_HIGH_MAX SZ_512M
> -#else
> -# define CRASH_ADDR_LOW_MAX SZ_4G
> -# define CRASH_ADDR_HIGH_MAX SZ_64T
> -#endif
> -
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> - unsigned long long base, low_base = 0, low_size = 0;
> - unsigned long low_mem_limit;
> - int ret;
> -
> - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> -
> - /* crashkernel=Y,low */
> - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> - if (ret) {
> - /*
> - * two parts from kernel/dma/swiotlb.c:
> - * -swiotlb size: user-specified with swiotlb= or default.
> - *
> - * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> - * to 8M for other buffers that may need to stay low too. Also
> - * make sure we allocate enough extra low memory so that we
> - * don't run out of DMA buffers for 32-bit devices.
> - */
> - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> - } else {
> - /* passed with crashkernel=0,low ? */
> - if (!low_size)
> - return 0;
> - }
> -
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX);
> - if (!low_base) {
> - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> - (unsigned long)(low_size >> 20));
> - return -ENOMEM;
> - }
> -
> - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> - (unsigned long)(low_size >> 20),
> - (unsigned long)(low_base >> 20),
> - (unsigned long)(low_mem_limit >> 20));
> -
> - crashk_low_res.start = low_base;
> - crashk_low_res.end = low_base + low_size - 1;
> - insert_resource(&iomem_resource, &crashk_low_res);
> -#endif
> - return 0;
> -}
> -
> -static void __init reserve_crashkernel(void)
> -{
> - unsigned long long crash_size, crash_base, total_mem;
> - bool high = false;
> - int ret;
> -
> - total_mem = memblock_phys_mem_size();
> -
> - /* crashkernel=XM */
> - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0) {
> - /* crashkernel=X,high */
> - ret = parse_crashkernel_high(boot_command_line, total_mem,
> - &crash_size, &crash_base);
> - if (ret != 0 || crash_size <= 0)
> - return;
> - high = true;
> - }
> -
> - if (xen_pv_domain()) {
> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
> - return;
> - }

This is relevant only to x86. Maybe we could move this check to
setup_arch(), before the call to reserve_crashkernel(), to keep it in x86?
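
A rough, untested sketch of that idea as a userspace model. The stubs
for xen_pv_domain() and reserve_crashkernel() and the wrapper name
x86_reserve_crashkernel() are made up for illustration; in the kernel
the check would simply sit in setup_arch() ahead of the call:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins so the model builds outside the kernel. */
static bool fake_xen_pv;	/* what the stubbed xen_pv_domain() reports */
static int reserve_calls;	/* how often the generic code was entered */

static bool xen_pv_domain(void)
{
	return fake_xen_pv;
}

/* Stand-in for the generic reserve_crashkernel() in kernel/crash_core.c. */
static void reserve_crashkernel(void)
{
	reserve_calls++;
}

/*
 * Hypothetical x86-only wrapper: the Xen PV check stays in arch code,
 * so the generic function would no longer need <xen/xen.h>.
 */
static void x86_reserve_crashkernel(void)
{
	if (xen_pv_domain()) {
		printf("Ignoring crashkernel for a Xen PV domain\n");
		return;
	}

	reserve_crashkernel();
}
```

One thing to watch: in the current code the Xen check runs only after
crashkernel= has been parsed successfully, so hoisting it in front of
the call would print the "Ignoring crashkernel" message on every Xen PV
boot, even when crashkernel= is not on the command line.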

> -
> - /* 0 means: find the address automatically */
> - if (!crash_base) {
> - /*
> - * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> - * also allocates 256M extra low memory for DMA buffers
> - * and swiotlb.
> - * But the extra memory is not required for all machines.
> - * So try low memory first and fall back to high memory
> - * unless "crashkernel=size[KMG],high" is specified.
> - */
> - if (!high)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_LOW_MAX);
> - if (!crash_base)
> - crash_base = memblock_phys_alloc_range(crash_size,
> - CRASH_ALIGN, CRASH_ALIGN,
> - CRASH_ADDR_HIGH_MAX);
> - if (!crash_base) {
> - pr_info("crashkernel reservation failed - No suitable area found.\n");
> - return;
> - }
> - } else {
> - unsigned long long start;
> -
> - start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
> - crash_base + crash_size);
> - if (start != crash_base) {
> - pr_info("crashkernel reservation failed - memory is in use.\n");
> - return;
> - }
> - }
> -
> - if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> - memblock_free(crash_base, crash_size);
> - return;
> - }
> -
> - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> - (unsigned long)(crash_size >> 20),
> - (unsigned long)(crash_base >> 20),
> - (unsigned long)(total_mem >> 20));
> -
> - crashk_res.start = crash_base;
> - crashk_res.end = crash_base + crash_size - 1;
> - insert_resource(&iomem_resource, &crashk_res);
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
> static void __init reserve_crashkernel(void)
> {
> }
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 206bde8308b2..5021d7c70aee 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -69,6 +69,9 @@ extern unsigned char *vmcoreinfo_data;
> extern size_t vmcoreinfo_size;
> extern u32 *vmcoreinfo_note;
>
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
> +
> /* raw contents of kernel .notes section */
> extern const void __start_notes __weak;
> extern const void __stop_notes __weak;
> @@ -83,5 +86,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
> unsigned long long *crash_size, unsigned long long *crash_base);
> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
> unsigned long long *crash_size, unsigned long long *crash_base);
> +int __init reserve_crashkernel_low(void);
>
> #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 9e93bef52968..f301f2f5cfc4 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -337,8 +337,6 @@ extern int kexec_load_disabled;
>
> /* Location of a reserved region to hold the crash kernel.
> */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
> extern note_buf_t __percpu *crash_notes;
>
> /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..d39892bdb9ae 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,7 +7,12 @@
> #include <linux/crash_core.h>
> #include <linux/utsname.h>
> #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
>
> +#include <xen/xen.h>
> +
> +#include <asm/kexec.h>
> #include <asm/page.h>
> #include <asm/sections.h>
>
> @@ -21,6 +26,22 @@ u32 *vmcoreinfo_note;
> /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
> static unsigned char *vmcoreinfo_data_safecopy;
>
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> + .name = "Crash kernel",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> + .desc = IORES_DESC_CRASH_KERNEL
> +};
> +
> /*
> * parsing the "crashkernel" commandline
> *
> @@ -294,6 +315,149 @@ int __init parse_crashkernel_low(char *cmdline,
> "crashkernel=", suffix_tbl[SUFFIX_LOW]);
> }
>
> +/*
> + * --------- Crashkernel reservation ------------------------------
> + */
> +
> +int __init reserve_crashkernel_low(void)

static?

> +{
> +#ifdef CONFIG_X86_64
> + unsigned long long base, low_base = 0, low_size = 0;
> + unsigned long low_mem_limit;
> + int ret;
> +
> + low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> +
> + /* crashkernel=Y,low */
> + ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> + if (ret) {
> + /*
> + * two parts from kernel/dma/swiotlb.c:
> + * -swiotlb size: user-specified with swiotlb= or default.
> + *
> + * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> + * to 8M for other buffers that may need to stay low too. Also
> + * make sure we allocate enough extra low memory so that we
> + * don't run out of DMA buffers for 32-bit devices.
> + */
> + low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> + } else {
> + /* passed with crashkernel=0,low ? */
> + if (!low_size)
> + return 0;
> + }
> +
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!low_base) {
> + pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> + (unsigned long)(low_size >> 20));
> + return -ENOMEM;
> + }
> +
> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> + (unsigned long)(low_size >> 20),
> + (unsigned long)(low_base >> 20),
> + (unsigned long)(low_mem_limit >> 20));
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end = low_base + low_size - 1;
> + insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> + return 0;
> +}
> +
> +#ifdef CONFIG_X86
> +#ifdef CONFIG_KEXEC_CORE
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)
> +{
> + unsigned long long crash_size, crash_base, total_mem;
> + bool high = false;
> + int ret;
> +
> + total_mem = memblock_phys_mem_size();
> +
> + /* crashkernel=XM */
> + ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0) {
> + /* crashkernel=X,high */
> + ret = parse_crashkernel_high(boot_command_line, total_mem,
> + &crash_size, &crash_base);
> + if (ret != 0 || crash_size <= 0)
> + return;
> + high = true;
> + }
> +
> + if (xen_pv_domain()) {
> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
> + return;
> + }
> +
> + /* 0 means: find the address automatically */
> + if (!crash_base) {
> + /*
> + * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> + * But the extra memory is not required for all machines.
> + * So try low memory first and fall back to high memory
> + * unless "crashkernel=size[KMG],high" is specified.
> + */
> + if (!high)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
> + if (!crash_base)
> + crash_base = memblock_phys_alloc_range(crash_size,
> + CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX);
> + if (!crash_base) {
> + pr_info("crashkernel reservation failed - No suitable area found.\n");
> + return;
> + }
> + } else {
> + /* User specifies base address explicitly. */
> + unsigned long long start;
> +
> + if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> + pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
> + (unsigned long)CRASH_ALIGN >> 20);
> + return;
> + }
> +
> + start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
> + crash_base + crash_size);
> + if (start != crash_base) {
> + pr_info("crashkernel reservation failed - memory is in use.\n");
> + return;
> + }
> + }
> +
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> + memblock_free(crash_base, crash_size);
> + return;
> + }
> +
> + pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> + (unsigned long)(crash_size >> 20),
> + (unsigned long)(crash_base >> 20),
> + (unsigned long)(total_mem >> 20));
> +
> + crashk_res.start = crash_base;
> + crashk_res.end = crash_base + crash_size - 1;
> + insert_resource(&iomem_resource, &crashk_res);
> +}
> +#endif /* CONFIG_KEXEC_CORE */
> +#endif
> +
> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len)
> {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 8798a8183974..2ca887514145 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
> /* Flag to indicate we are going to kexec a new kernel */
> bool kexec_in_progress = false;
>
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> - .name = "Crash kernel",
> - .start = 0,
> - .end = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc = IORES_DESC_CRASH_KERNEL
> -};
> -
> int kexec_should_crash(struct task_struct *p)
> {
> /*
> --
> 2.20.1
>

--
Sincerely yours,
Mike.

2020-11-12 08:27:13

by Mike Rapoport

Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

On Wed, Nov 11, 2020 at 09:54:48PM +0800, Baoquan He wrote:
> On 11/11/20 at 09:27pm, chenzhou wrote:
> > Hi Baoquan,
> ...
> > >> #ifdef CONFIG_CRASH_DUMP
> > >> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > >> index 1c0f3e02f731..c55cee290bbb 100644
> > >> --- a/arch/arm64/mm/mmu.c
> > >> +++ b/arch/arm64/mm/mmu.c
> > >> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
> > >> */
> > >> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > >> #ifdef CONFIG_KEXEC_CORE
> > >> + if (crashk_low_res.end)
> > >> + memblock_mark_nomap(crashk_low_res.start,
> > >> + resource_size(&crashk_low_res));
> > >> +
> > >> if (crashk_res.end)
> > >> memblock_mark_nomap(crashk_res.start,
> > >> resource_size(&crashk_res));
> > >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > >> index d39892bdb9ae..cdef7d8c91a6 100644
> > >> --- a/kernel/crash_core.c
> > >> +++ b/kernel/crash_core.c
> > >> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
> > >>
> > >> int __init reserve_crashkernel_low(void)
> > >> {
> > >> -#ifdef CONFIG_X86_64
> > >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> > > > Not sure whether checking CONFIG_64BIT would be better.
> > > If we do that, other 64-bit kernels, such as mips, may fail to compile.
> > >
> > >> unsigned long long base, low_base = 0, low_size = 0;
> > >> unsigned long low_mem_limit;
> > >> int ret;
> > >> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
> > >>
> > >> crashk_low_res.start = low_base;
> > >> crashk_low_res.end = low_base + low_size - 1;
> > >> +#ifdef CONFIG_X86_64
> > >> insert_resource(&iomem_resource, &crashk_low_res);
> > >> +#endif
> > >> #endif
> > >> return 0;
> > >> }
> > >>
> > >> -#ifdef CONFIG_X86
> > >> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> > > > Should we make this a weak default so that we can remove the ARCH config?
> > > Same as above: some architectures may not support kdump, and in that case compile errors occur.
>
> OK, not sure whether other people have a better idea; otherwise, we can live with it.
> Thanks for explaining.

I think it would be better to have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
in arch/Kconfig and have X86 and ARM64 select it.

Since reserve_crashkernel() implementations are quite similar on other
architectures as well, we can have more users of this later.
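
To make the suggestion concrete, here is a small userspace model of how
the C side could look. The Kconfig symbol is the one proposed above;
defining it by hand, the function name crash_reserve_supported() and
its return values are assumptions for illustration only. In the kernel,
arch/Kconfig would declare the symbol and X86 and ARM64 would select it:

```c
#include <stdbool.h>

/*
 * Kconfig would generate this define for architectures that select the
 * symbol; it is set by hand here so the model builds standalone.
 */
#define CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL 1

#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
/* Generic implementation, built only for architectures that opt in. */
static bool crash_reserve_supported(void)
{
	return true;
}
#else
/* Everyone else gets a stub, with no per-architecture #if list. */
static inline bool crash_reserve_supported(void)
{
	return false;
}
#endif
```

A single opt-in symbol like this would stand in for the
"#if defined(CONFIG_X86) || defined(CONFIG_ARM64)" lists in
crash_core.c, and a new architecture would only need to add a select.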

--
Sincerely yours,
Mike.

2020-11-12 08:38:59

by Baoquan He

Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

On 11/12/20 at 10:25am, Mike Rapoport wrote:
> On Wed, Nov 11, 2020 at 09:54:48PM +0800, Baoquan He wrote:
> > On 11/11/20 at 09:27pm, chenzhou wrote:
> > > Hi Baoquan,
> > ...
> > > >> #ifdef CONFIG_CRASH_DUMP
> > > >> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
> > > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > >> index 1c0f3e02f731..c55cee290bbb 100644
> > > >> --- a/arch/arm64/mm/mmu.c
> > > >> +++ b/arch/arm64/mm/mmu.c
> > > >> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
> > > >> */
> > > >> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > > >> #ifdef CONFIG_KEXEC_CORE
> > > >> + if (crashk_low_res.end)
> > > >> + memblock_mark_nomap(crashk_low_res.start,
> > > >> + resource_size(&crashk_low_res));
> > > >> +
> > > >> if (crashk_res.end)
> > > >> memblock_mark_nomap(crashk_res.start,
> > > >> resource_size(&crashk_res));
> > > >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > >> index d39892bdb9ae..cdef7d8c91a6 100644
> > > >> --- a/kernel/crash_core.c
> > > >> +++ b/kernel/crash_core.c
> > > >> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
> > > >>
> > > >> int __init reserve_crashkernel_low(void)
> > > >> {
> > > >> -#ifdef CONFIG_X86_64
> > > >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
> > > > > Not sure whether checking CONFIG_64BIT would be better.
> > > > If we do that, other 64-bit kernels, such as mips, may fail to compile.
> > > >
> > > >> unsigned long long base, low_base = 0, low_size = 0;
> > > >> unsigned long low_mem_limit;
> > > >> int ret;
> > > >> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
> > > >>
> > > >> crashk_low_res.start = low_base;
> > > >> crashk_low_res.end = low_base + low_size - 1;
> > > >> +#ifdef CONFIG_X86_64
> > > >> insert_resource(&iomem_resource, &crashk_low_res);
> > > >> +#endif
> > > >> #endif
> > > >> return 0;
> > > >> }
> > > >>
> > > >> -#ifdef CONFIG_X86
> > > >> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> > > > > Should we make this a weak default so that we can remove the ARCH config?
> > > > Same as above: some architectures may not support kdump, and in that case compile errors occur.
> >
> > OK, not sure whether other people have a better idea; otherwise, we can live with it.
> > Thanks for explaining.
>
> I think it would be better to have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
> in arch/Kconfig and have X86 and ARM64 select it.
>
> Since reserve_crashkernel() implementations are quite similar on other
> architectures as well, we can have more users of this later.

Yes, this sounds like a good approach.

2020-11-12 13:04:49

by chenzhou

Subject: Re: [PATCH v13 4/8] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c



On 2020/11/12 16:11, Mike Rapoport wrote:
> On Sat, Oct 31, 2020 at 03:44:33PM +0800, Chen Zhou wrote:
>> Make the functions reserve_crashkernel[_low]() generic.
>> Arm64 will use these to reimplement crashkernel=X.
>>
>> Signed-off-by: Chen Zhou <[email protected]>
>> Tested-by: John Donnelly <[email protected]>
>> ---
>> arch/x86/include/asm/kexec.h | 25 ++++++
>> arch/x86/kernel/setup.c | 151 +-------------------------------
>> include/linux/crash_core.h | 4 +
>> include/linux/kexec.h | 2 -
>> kernel/crash_core.c | 164 +++++++++++++++++++++++++++++++++++
>> kernel/kexec_core.c | 17 ----
>> 6 files changed, 195 insertions(+), 168 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 8cf9d3fd31c7..34afa7b645f9 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -21,6 +21,27 @@
>> /* 2M alignment for crash kernel regions */
>> #define CRASH_ALIGN SZ_16M
>>
>> +/*
>> + * Keep the crash kernel below this limit.
>> + *
>> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
>> + * due to mapping restrictions.
>> + *
>> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
>> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
>> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
>> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
>> + * no good way to detect the paging mode of the target kernel which will be
>> + * loaded for dumping.
>> + */
>> +#ifdef CONFIG_X86_32
>> +# define CRASH_ADDR_LOW_MAX SZ_512M
>> +# define CRASH_ADDR_HIGH_MAX SZ_512M
>> +#else
>> +# define CRASH_ADDR_LOW_MAX SZ_4G
>> +# define CRASH_ADDR_HIGH_MAX SZ_64T
>> +#endif
>> +
>> #ifndef __ASSEMBLY__
>>
>> #include <linux/string.h>
>> @@ -200,6 +221,10 @@ typedef void crash_vmclear_fn(void);
>> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>> extern void kdump_nmi_shootdown_cpus(void);
>>
>> +#ifdef CONFIG_KEXEC_CORE
>> +extern void __init reserve_crashkernel(void);
>> +#endif
>> +
>> #endif /* __ASSEMBLY__ */
>>
>> #endif /* _ASM_X86_KEXEC_H */
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 1289f079ad5f..00b3840d30f9 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -25,8 +25,6 @@
>>
>> #include <uapi/linux/mount.h>
>>
>> -#include <xen/xen.h>
>> -
>> #include <asm/apic.h>
>> #include <asm/numa.h>
>> #include <asm/bios_ebda.h>
>> @@ -38,6 +36,7 @@
>> #include <asm/io_apic.h>
>> #include <asm/kasan.h>
>> #include <asm/kaslr.h>
>> +#include <asm/kexec.h>
>> #include <asm/mce.h>
>> #include <asm/mtrr.h>
>> #include <asm/realmode.h>
>> @@ -389,153 +388,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>> }
>> }
>>
>> -/*
>> - * --------- Crashkernel reservation ------------------------------
>> - */
>> -
>> -#ifdef CONFIG_KEXEC_CORE
>> -
>> -/*
>> - * Keep the crash kernel below this limit.
>> - *
>> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
>> - * due to mapping restrictions.
>> - *
>> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
>> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
>> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
>> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
>> - * no good way to detect the paging mode of the target kernel which will be
>> - * loaded for dumping.
>> - */
>> -#ifdef CONFIG_X86_32
>> -# define CRASH_ADDR_LOW_MAX SZ_512M
>> -# define CRASH_ADDR_HIGH_MAX SZ_512M
>> -#else
>> -# define CRASH_ADDR_LOW_MAX SZ_4G
>> -# define CRASH_ADDR_HIGH_MAX SZ_64T
>> -#endif
>> -
>> -static int __init reserve_crashkernel_low(void)
>> -{
>> -#ifdef CONFIG_X86_64
>> - unsigned long long base, low_base = 0, low_size = 0;
>> - unsigned long low_mem_limit;
>> - int ret;
>> -
>> - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
>> -
>> - /* crashkernel=Y,low */
>> - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
>> - if (ret) {
>> - /*
>> - * two parts from kernel/dma/swiotlb.c:
>> - * -swiotlb size: user-specified with swiotlb= or default.
>> - *
>> - * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>> - * to 8M for other buffers that may need to stay low too. Also
>> - * make sure we allocate enough extra low memory so that we
>> - * don't run out of DMA buffers for 32-bit devices.
>> - */
>> - low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>> - } else {
>> - /* passed with crashkernel=0,low ? */
>> - if (!low_size)
>> - return 0;
>> - }
>> -
>> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX);
>> - if (!low_base) {
>> - pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> - (unsigned long)(low_size >> 20));
>> - return -ENOMEM;
>> - }
>> -
>> - pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
>> - (unsigned long)(low_size >> 20),
>> - (unsigned long)(low_base >> 20),
>> - (unsigned long)(low_mem_limit >> 20));
>> -
>> - crashk_low_res.start = low_base;
>> - crashk_low_res.end = low_base + low_size - 1;
>> - insert_resource(&iomem_resource, &crashk_low_res);
>> -#endif
>> - return 0;
>> -}
>> -
>> -static void __init reserve_crashkernel(void)
>> -{
>> - unsigned long long crash_size, crash_base, total_mem;
>> - bool high = false;
>> - int ret;
>> -
>> - total_mem = memblock_phys_mem_size();
>> -
>> - /* crashkernel=XM */
>> - ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
>> - if (ret != 0 || crash_size <= 0) {
>> - /* crashkernel=X,high */
>> - ret = parse_crashkernel_high(boot_command_line, total_mem,
>> - &crash_size, &crash_base);
>> - if (ret != 0 || crash_size <= 0)
>> - return;
>> - high = true;
>> - }
>> -
>> - if (xen_pv_domain()) {
>> - pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> - return;
>> - }
> This is relevant only to x86. Maybe we could move this check to
> setup_arch(), before the call to reserve_crashkernel(), to keep it in x86?
Yes, we could move this check to setup_arch.
>
>> -
>> - /* 0 means: find the address automatically */
>> - if (!crash_base) {
>> - /*
>> - * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> - * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> - * also allocates 256M extra low memory for DMA buffers
>> - * and swiotlb.
>> - * But the extra memory is not required for all machines.
>> - * So try low memory first and fall back to high memory
>> - * unless "crashkernel=size[KMG],high" is specified.
>> - */
>> - if (!high)
>> - crash_base = memblock_phys_alloc_range(crash_size,
>> - CRASH_ALIGN, CRASH_ALIGN,
>> - CRASH_ADDR_LOW_MAX);
>> - if (!crash_base)
>> - crash_base = memblock_phys_alloc_range(crash_size,
>> - CRASH_ALIGN, CRASH_ALIGN,
>> - CRASH_ADDR_HIGH_MAX);
>> - if (!crash_base) {
>> - pr_info("crashkernel reservation failed - No suitable area found.\n");
>> - return;
>> - }
>> - } else {
>> - unsigned long long start;
>> -
>> - start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
>> - crash_base + crash_size);
>> - if (start != crash_base) {
>> - pr_info("crashkernel reservation failed - memory is in use.\n");
>> - return;
>> - }
>> - }
>> -
>> - if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>> - memblock_free(crash_base, crash_size);
>> - return;
>> - }
>> -
>> - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>> - (unsigned long)(crash_size >> 20),
>> - (unsigned long)(crash_base >> 20),
>> - (unsigned long)(total_mem >> 20));
>> -
>> - crashk_res.start = crash_base;
>> - crashk_res.end = crash_base + crash_size - 1;
>> - insert_resource(&iomem_resource, &crashk_res);
>> -}
>> -#else
>> +#ifndef CONFIG_KEXEC_CORE
>> static void __init reserve_crashkernel(void)
>> {
>> }
>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>> index 206bde8308b2..5021d7c70aee 100644
>> --- a/include/linux/crash_core.h
>> +++ b/include/linux/crash_core.h
>> @@ -69,6 +69,9 @@ extern unsigned char *vmcoreinfo_data;
>> extern size_t vmcoreinfo_size;
>> extern u32 *vmcoreinfo_note;
>>
>> +extern struct resource crashk_res;
>> +extern struct resource crashk_low_res;
>> +
>> /* raw contents of kernel .notes section */
>> extern const void __start_notes __weak;
>> extern const void __stop_notes __weak;
>> @@ -83,5 +86,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>> unsigned long long *crash_size, unsigned long long *crash_base);
>> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>> unsigned long long *crash_size, unsigned long long *crash_base);
>> +int __init reserve_crashkernel_low(void);
>>
>> #endif /* LINUX_CRASH_CORE_H */
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 9e93bef52968..f301f2f5cfc4 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -337,8 +337,6 @@ extern int kexec_load_disabled;
>>
>> /* Location of a reserved region to hold the crash kernel.
>> */
>> -extern struct resource crashk_res;
>> -extern struct resource crashk_low_res;
>> extern note_buf_t __percpu *crash_notes;
>>
>> /* flag to track if kexec reboot is in progress */
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 106e4500fd53..d39892bdb9ae 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -7,7 +7,12 @@
>> #include <linux/crash_core.h>
>> #include <linux/utsname.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/memblock.h>
>> +#include <linux/swiotlb.h>
>>
>> +#include <xen/xen.h>
>> +
>> +#include <asm/kexec.h>
>> #include <asm/page.h>
>> #include <asm/sections.h>
>>
>> @@ -21,6 +26,22 @@ u32 *vmcoreinfo_note;
>> /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>> static unsigned char *vmcoreinfo_data_safecopy;
>>
>> +/* Location of the reserved area for the crash kernel */
>> +struct resource crashk_res = {
>> + .name = "Crash kernel",
>> + .start = 0,
>> + .end = 0,
>> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> + .desc = IORES_DESC_CRASH_KERNEL
>> +};
>> +struct resource crashk_low_res = {
>> + .name = "Crash kernel",
>> + .start = 0,
>> + .end = 0,
>> + .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> + .desc = IORES_DESC_CRASH_KERNEL
>> +};
>> +
>> /*
>> * parsing the "crashkernel" commandline
>> *
>> @@ -294,6 +315,149 @@ int __init parse_crashkernel_low(char *cmdline,
>> "crashkernel=", suffix_tbl[SUFFIX_LOW]);
>> }
>>
>> +/*
>> + * --------- Crashkernel reservation ------------------------------
>> + */
>> +
>> +int __init reserve_crashkernel_low(void)
> static?
OK, I will update this in the next version.
>
>> +{
>> +#ifdef CONFIG_X86_64
>> + unsigned long long base, low_base = 0, low_size = 0;
>> + unsigned long low_mem_limit;
>> + int ret;
>> +
>> + low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
>> +
>> + /* crashkernel=Y,low */
>> + ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
>> + if (ret) {
>> + /*
>> + * two parts from kernel/dma/swiotlb.c:
>> + * -swiotlb size: user-specified with swiotlb= or default.
>> + *
>> + * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>> + * to 8M for other buffers that may need to stay low too. Also
>> + * make sure we allocate enough extra low memory so that we
>> + * don't run out of DMA buffers for 32-bit devices.
>> + */
>> + low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>> + } else {
>> + /* passed with crashkernel=0,low ? */
>> + if (!low_size)
>> + return 0;
>> + }
>> +
>> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
>> + CRASH_ADDR_LOW_MAX);
>> + if (!low_base) {
>> + pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> + (unsigned long)(low_size >> 20));
>> + return -ENOMEM;
>> + }
>> +
>> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
>> + (unsigned long)(low_size >> 20),
>> + (unsigned long)(low_base >> 20),
>> + (unsigned long)(low_mem_limit >> 20));
>> +
>> + crashk_low_res.start = low_base;
>> + crashk_low_res.end = low_base + low_size - 1;
>> + insert_resource(&iomem_resource, &crashk_low_res);
>> +#endif
>> + return 0;
>> +}
>> +
>> +#ifdef CONFIG_X86
>> +#ifdef CONFIG_KEXEC_CORE
>> +/*
>> + * reserve_crashkernel() - reserves memory for crash kernel
>> + *
>> + * This function reserves memory area given in "crashkernel=" kernel command
>> + * line parameter. The memory reserved is used by dump capture kernel when
>> + * primary kernel is crashing.
>> + */
>> +void __init reserve_crashkernel(void)
>> +{
>> + unsigned long long crash_size, crash_base, total_mem;
>> + bool high = false;
>> + int ret;
>> +
>> + total_mem = memblock_phys_mem_size();
>> +
>> + /* crashkernel=XM */
>> + ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
>> + if (ret != 0 || crash_size <= 0) {
>> + /* crashkernel=X,high */
>> + ret = parse_crashkernel_high(boot_command_line, total_mem,
>> + &crash_size, &crash_base);
>> + if (ret != 0 || crash_size <= 0)
>> + return;
>> + high = true;
>> + }
>> +
>> + if (xen_pv_domain()) {
>> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> + return;
>> + }
>> +
>> + /* 0 means: find the address automatically */
>> + if (!crash_base) {
>> + /*
>> + * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> + * also allocates 256M extra low memory for DMA buffers
>> + * and swiotlb.
>> + * But the extra memory is not required for all machines.
>> + * So try low memory first and fall back to high memory
>> + * unless "crashkernel=size[KMG],high" is specified.
>> + */
>> + if (!high)
>> + crash_base = memblock_phys_alloc_range(crash_size,
>> + CRASH_ALIGN, CRASH_ALIGN,
>> + CRASH_ADDR_LOW_MAX);
>> + if (!crash_base)
>> + crash_base = memblock_phys_alloc_range(crash_size,
>> + CRASH_ALIGN, CRASH_ALIGN,
>> + CRASH_ADDR_HIGH_MAX);
>> + if (!crash_base) {
>> + pr_info("crashkernel reservation failed - No suitable area found.\n");
>> + return;
>> + }
>> + } else {
>> + /* User specifies base address explicitly. */
>> + unsigned long long start;
>> +
>> + if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
>> + pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
>> + (unsigned long)CRASH_ALIGN >> 20);
>> + return;
>> + }
>> +
>> + start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base,
>> + crash_base + crash_size);
>> + if (start != crash_base) {
>> + pr_info("crashkernel reservation failed - memory is in use.\n");
>> + return;
>> + }
>> + }
>> +
>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>> + memblock_free(crash_base, crash_size);
>> + return;
>> + }
>> +
>> + pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>> + (unsigned long)(crash_size >> 20),
>> + (unsigned long)(crash_base >> 20),
>> + (unsigned long)(total_mem >> 20));
>> +
>> + crashk_res.start = crash_base;
>> + crashk_res.end = crash_base + crash_size - 1;
>> + insert_resource(&iomem_resource, &crashk_res);
>> +}
>> +#endif /* CONFIG_KEXEC_CORE */
>> +#endif
>> +
>> Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>> void *data, size_t data_len)
>> {
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 8798a8183974..2ca887514145 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>> /* Flag to indicate we are going to kexec a new kernel */
>> bool kexec_in_progress = false;
>>
>> -
>> -/* Location of the reserved area for the crash kernel */
>> -struct resource crashk_res = {
>> - .name = "Crash kernel",
>> - .start = 0,
>> - .end = 0,
>> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> - .desc = IORES_DESC_CRASH_KERNEL
>> -};
>> -struct resource crashk_low_res = {
>> - .name = "Crash kernel",
>> - .start = 0,
>> - .end = 0,
>> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>> - .desc = IORES_DESC_CRASH_KERNEL
>> -};
>> -
>> int kexec_should_crash(struct task_struct *p)
>> {
>> /*
>> --
>> 2.20.1
>>
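
To illustrate the allocation order in reserve_crashkernel() quoted above (low first, then high, unless ",high" was given), here is a rough user-space sketch. It is only a model under stated assumptions: toy_alloc_range() is a hypothetical stand-in for memblock_phys_alloc_range() that searches top-down over a sorted list of free ranges and does not mark memory as used, and the limits passed in are example values, not the real CRASH_ADDR_LOW_MAX or CRASH_ADDR_HIGH_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

struct range { uint64_t start, end; };  /* half-open: [start, end) */

/*
 * Toy stand-in for memblock_phys_alloc_range(): returns the highest
 * aligned address inside [lo, hi) where `size` bytes fit in one of the
 * free ranges (sorted by ascending address), or 0 on failure.
 */
static uint64_t toy_alloc_range(const struct range *free_mem, size_t n,
                                uint64_t size, uint64_t align,
                                uint64_t lo, uint64_t hi)
{
    assert(align && (align & (align - 1)) == 0);  /* power of two */

    for (size_t i = n; i > 0; i--) {
        uint64_t s = free_mem[i - 1].start > lo ? free_mem[i - 1].start : lo;
        uint64_t e = free_mem[i - 1].end < hi ? free_mem[i - 1].end : hi;
        uint64_t cand;

        if (e <= s || e - s < size)
            continue;
        cand = (e - size) & ~(align - 1);  /* align down */
        if (cand >= s)
            return cand;
    }
    return 0;
}

/* Same order as the quoted code: low attempt unless `high`, then high. */
static uint64_t pick_crash_base(const struct range *free_mem, size_t n,
                                uint64_t size, bool high, uint64_t align,
                                uint64_t low_max, uint64_t high_max)
{
    uint64_t base = 0;

    if (!high)
        base = toy_alloc_range(free_mem, n, size, align, align, low_max);
    if (!base)
        base = toy_alloc_range(free_mem, n, size, align, align, high_max);
    return base;
}
```

A request that fits below the low limit stays there, a request that is too large for low memory falls through to the high attempt, and ",high" skips the low attempt entirely.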

2020-11-12 13:13:26

by chenzhou

[permalink] [raw]
Subject: Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X



On 2020/11/12 16:36, Baoquan He wrote:
> On 11/12/20 at 10:25am, Mike Rapoport wrote:
>> On Wed, Nov 11, 2020 at 09:54:48PM +0800, Baoquan He wrote:
>>> On 11/11/20 at 09:27pm, chenzhou wrote:
>>>> Hi Baoquan,
>>> ...
>>>>>> #ifdef CONFIG_CRASH_DUMP
>>>>>> static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
>>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>>> index 1c0f3e02f731..c55cee290bbb 100644
>>>>>> --- a/arch/arm64/mm/mmu.c
>>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>>> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
>>>>>> */
>>>>>> memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
>>>>>> #ifdef CONFIG_KEXEC_CORE
>>>>>> + if (crashk_low_res.end)
>>>>>> + memblock_mark_nomap(crashk_low_res.start,
>>>>>> + resource_size(&crashk_low_res));
>>>>>> +
>>>>>> if (crashk_res.end)
>>>>>> memblock_mark_nomap(crashk_res.start,
>>>>>> resource_size(&crashk_res));
>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>> index d39892bdb9ae..cdef7d8c91a6 100644
>>>>>> --- a/kernel/crash_core.c
>>>>>> +++ b/kernel/crash_core.c
>>>>>> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
>>>>>>
>>>>>> int __init reserve_crashkernel_low(void)
>>>>>> {
>>>>>> -#ifdef CONFIG_X86_64
>>>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>>>>> I'm not sure whether checking CONFIG_64BIT would be better.
>>>> If we did that, other 64-bit kernels, such as mips, might fail to compile.
>>>>>> unsigned long long base, low_base = 0, low_size = 0;
>>>>>> unsigned long low_mem_limit;
>>>>>> int ret;
>>>>>> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
>>>>>>
>>>>>> crashk_low_res.start = low_base;
>>>>>> crashk_low_res.end = low_base + low_size - 1;
>>>>>> +#ifdef CONFIG_X86_64
>>>>>> insert_resource(&iomem_resource, &crashk_low_res);
>>>>>> +#endif
>>>>>> #endif
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> -#ifdef CONFIG_X86
>>>>>> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>>>>> Should we make this a weak default so that we can remove the ARCH config?
>>>> Same as above: some architectures may not support kdump, and in that case compile errors would occur.
>>> OK. Unless other people have a better idea, we can live with it.
>>> Thanks for explaining.
>> I think it would be better to have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
>> in arch/Kconfig and select this by X86 and ARM64.
>>
>> Since reserve_crashkernel() implementations are quite similar on other
>> architectures as well, we can have more users of this later.
> Yes, this sounds like a nice way.
I will consider this for the next version.

Thanks,
Chen Zhou
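
As a rough illustration of the suggestion above, the per-architecture checks could collapse into a single guard. This is only a sketch under assumptions: CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL does not exist yet and is defined by hand here so the snippet compiles in user space (in the kernel it would come from Kconfig, selected by X86 and ARM64), and crash_reservation_supported() is a hypothetical name standing in for the generic code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical symbol. Defined by hand for this sketch only; in the
 * kernel an architecture would select it in its Kconfig.
 */
#define CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL 1

#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
/* The generic reservation code would be built only under this guard. */
static bool crash_reservation_supported(void)
{
    return true;
}
#else
/* Architectures that do not select the symbol get an empty stub. */
static inline bool crash_reservation_supported(void)
{
    return false;
}
#endif
```

The point of the design is that a new architecture opts in by selecting one symbol instead of extending an #if defined(...) || defined(...) chain in kernel/crash_core.c.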