2021-12-22 13:12:25

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 00/17] support reserving crashkernel above 4G on arm64 kdump

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and fall back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following 17 patches:

0001-0003 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
0004-0006 Add helper parse_crashkernel_in_order() and refact reserve_crashkernel{_low}
0007 cleanup
0008-0009 makes functions reserve_crashkernel[_low]() generic.
0010-0012 do some cleanups for parse_crashkernel_{high|low} and __parse_crashkernel()
0013-0014 reimplements arm64 crashkernel=X.
0015-0016 adds memory for devices by DT property linux,usable-memory-range.
0017 updates the doc.

Changes since [v17]
The patches 0013-0016 have no change, 0017 see below, all other 0001-0012 patches
are rewritten or new. The main change is patches 0004-0005, which refact
reserve_crashkernel{_low}. Patch 0009 reduced some unnecessary changes compared
with 0005 in v17, two other differences are broken down into 0001 and 0008.

New Changes in 0017:
- It will be ignored if crashkernel=X is specified.
+ It will be ignored if crashkernel=X is correctly specified.

Changes since [v16]
- Because no functional changes in this version, so add
"Tested-by: Dave Kleikamp <[email protected]>" for patch 1-9
- Add "Reviewed-by: Rob Herring <[email protected]>" for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As Catalin Marinas's suggestion, merge the implementation of
ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
contents of X86 and ARM64 do not overlap, and reduce unnecessary
temporary differences.

Changes since [v15]
- Aggregate the processing of "linux,usable-memory-range" into one function.
Only patch 9-10 have been updated.

Changes since [v14]
- Recovering the requirement that the CrashKernel memory regions on X86
only requires 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
crash_exclude_mem_range() in patch 7.
- Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
has removed the architecture-specific code, extend the property "linux,usable-memory-range"
in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
arch/Kconfig and select this by X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435
[v17}: https://lkml.org/lkml/2021/12/10/38

Chen Zhou (9):
x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to
asm/kexec.h
x86/setup: Move xen_pv_domain() check and insert_resource() to
setup_arch()
x86/setup: Eliminate a magic number in reserve_crashkernel()
x86/setup: Add build option ARCH_WANT_RESERVE_CRASH_KERNEL
x86/setup: Move reserve_crashkernel[_low]() into crash_core.c
arm64: kdump: introduce some macros for crash kernel reservation
arm64: kdump: reimplement crashkernel=X
of: fdt: Add memory for devices by DT property
"linux,usable-memory-range"
kdump: update Documentation about crashkernel

Zhen Lei (8):
x86/setup: Adjust the range of codes separated by CONFIG_X86_64
x86/setup: Add helper parse_crashkernel_in_order()
x86/setup: Use parse_crashkernel_in_order() to make code logic clear
x86/setup: Update comments in reserve_crashkernel()
kdump: Simplify the parameters of __parse_crashkernel()
kdump: Make parse_crashkernel_{high|low} static
kdump: Reduce unused parameters of parse_crashkernel_{high|low}
of: fdt: Aggregate the processing of "linux,usable-memory-range"

Documentation/admin-guide/kdump/kdump.rst | 11 +-
.../admin-guide/kernel-parameters.txt | 13 +-
arch/Kconfig | 3 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kexec.h | 10 +
arch/arm64/kernel/machine_kexec_file.c | 12 +-
arch/arm64/kernel/setup.c | 13 +-
arch/arm64/mm/init.c | 59 +----
arch/x86/Kconfig | 2 +
arch/x86/include/asm/kexec.h | 28 +++
arch/x86/kernel/setup.c | 166 +-------------
drivers/of/fdt.c | 42 +++-
include/linux/crash_core.h | 4 -
kernel/crash_core.c | 207 ++++++++++++++++--
14 files changed, 328 insertions(+), 243 deletions(-)

--
2.25.1



2021-12-22 13:12:29

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 01/17] x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h

From: Chen Zhou <[email protected]>

We want to make function reserve_crashkernel[_low](), which is implemented
by X86, available to other architectures. It references macro CRASH_ALIGN
and will be moved to public crash_core.c. But the defined values of
CRASH_ALIGN may be different in different architectures. So moving the
definition of CRASH_ALIGN to asm/kexec.h is a good choice.

The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.

Signed-off-by: Chen Zhou <[email protected]>
Co-developed-by: Zhen Lei <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/include/asm/kexec.h | 24 ++++++++++++++++++++++++
arch/x86/kernel/setup.c | 24 ------------------------
2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 11b7c06e2828c30..a0223a6c0238a15 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,30 @@

# define KEXEC_CONTROL_CODE_MAX_SIZE 2048

+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_16M
+
+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX SZ_512M
+# define CRASH_ADDR_HIGH_MAX SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX SZ_4G
+# define CRASH_ADDR_HIGH_MAX SZ_64T
+#endif
+
#ifndef __ASSEMBLY__

#include <linux/string.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a190c7f4d71b05..ae8f63661363e25 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,30 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)

#ifdef CONFIG_KEXEC_CORE

-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN SZ_16M
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX SZ_512M
-# define CRASH_ADDR_HIGH_MAX SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX SZ_4G
-# define CRASH_ADDR_HIGH_MAX SZ_64T
-#endif
-
static int __init reserve_crashkernel_low(void)
{
#ifdef CONFIG_X86_64
--
2.25.1


2021-12-22 13:12:35

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()

From: Chen Zhou <[email protected]>

We will make the functions reserve_crashkernel() as generic, the
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, the same as insert_resource() in reserve_crashkernel[_low]().
So move xen_pv_domain() check and insert_resource() to setup_arch()
to keep them in x86.

Suggested-by: Mike Rapoport <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
Co-developed-by: Zhen Lei <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ae8f63661363e25..acf2f2eedfe3415 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -434,7 +434,6 @@ static int __init reserve_crashkernel_low(void)

crashk_low_res.start = low_base;
crashk_low_res.end = low_base + low_size - 1;
- insert_resource(&iomem_resource, &crashk_low_res);
#endif
return 0;
}
@@ -458,11 +457,6 @@ static void __init reserve_crashkernel(void)
high = true;
}

- if (xen_pv_domain()) {
- pr_info("Ignoring crashkernel for a Xen PV domain\n");
- return;
- }
-
/* 0 means: find the address automatically */
if (!crash_base) {
/*
@@ -508,11 +502,6 @@ static void __init reserve_crashkernel(void)

crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
- insert_resource(&iomem_resource, &crashk_res);
-}
-#else
-static void __init reserve_crashkernel(void)
-{
}
#endif

@@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
* Reserve memory for crash kernel after SRAT is parsed so that it
* won't consume hotpluggable memory.
*/
- reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+ if (xen_pv_domain())
+ pr_info("Ignoring crashkernel for a Xen PV domain\n");
+ else {
+ reserve_crashkernel();
+ if (crashk_res.end > crashk_res.start)
+ insert_resource(&iomem_resource, &crashk_res);
+ if (crashk_low_res.end > crashk_low_res.start)
+ insert_resource(&iomem_resource, &crashk_low_res);
+ }
+#endif

memblock_find_dma_reserve();

--
2.25.1


2021-12-22 13:12:37

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 03/17] x86/setup: Adjust the range of codes separated by CONFIG_X86_64

Currently, only X86_64 requires that at least 256M low memory be reserved.
X86_32 does not have this requirement. So move all the code related to
reserve_crashkernel_low() into macro CONFIG_X86_64.

Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index acf2f2eedfe3415..d9080bfa131a654 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,9 @@ static void __init memblock_x86_reserve_range_setup_data(void)

#ifdef CONFIG_KEXEC_CORE

+#ifdef CONFIG_X86_64
static int __init reserve_crashkernel_low(void)
{
-#ifdef CONFIG_X86_64
unsigned long long base, low_base = 0, low_size = 0;
unsigned long low_mem_limit;
int ret;
@@ -434,9 +434,10 @@ static int __init reserve_crashkernel_low(void)

crashk_low_res.start = low_base;
crashk_low_res.end = low_base + low_size - 1;
-#endif
+
return 0;
}
+#endif

static void __init reserve_crashkernel(void)
{
@@ -490,10 +491,12 @@ static void __init reserve_crashkernel(void)
}
}

+#ifdef CONFIG_X86_64
if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
memblock_phys_free(crash_base, crash_size);
return;
}
+#endif

pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
--
2.25.1


2021-12-22 13:12:39

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 04/17] x86/setup: Add helper parse_crashkernel_in_order()

Currently, there are two possible combinations of configurations.
(1) crashkernel=X[@offset]
(2) crashkernel=X,high, with or without crashkernel=X,low

(1) has the highest priority, if it is configured correctly, (2) will be
ignored. Similarly, in combination (2), crashkernel=X,low is valid only
when crashkernel=X,high is valid.

Putting the operations of parsing all "crashkernel=" configurations in one
function helps to sort out the strong dependency.

So add helper parse_crashkernel_in_order(). The "__maybe_unused" will be
removed in the next patch.

Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 51 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d9080bfa131a654..f997074d36f2484 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,6 +439,57 @@ static int __init reserve_crashkernel_low(void)
}
#endif

+#define CRASHKERNEL_MEM_NONE 0x0 /* crashkernel= is not exist or invalid */
+#define CRASHKERNEL_MEM_CLASSIC 0x1 /* crashkernel=X[@offset] is valid */
+#define CRASHKERNEL_MEM_HIGH 0x2 /* crashkernel=X,high is valid */
+#define CRASHKERNEL_MEM_LOW 0x4 /* crashkernel=X,low is valid */
+
+/**
+ * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
+ * priority order until a valid combination is found.
+ * @cmdline: The bootup command line.
+ * @system_ram: Total system memory size.
+ * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or
+ * "crashkernel=X,high".
+ * @crash_base: Save the base address specified by "crashkernel=X@offset"
+ * @low_size: Save the memory size specified by "crashkernel=X,low"
+ *
+ * Returns the status flag of the parsing result of "crashkernel=", such as
+ * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
+ */
+__maybe_unused
+static int __init parse_crashkernel_in_order(char *cmdline,
+ unsigned long long system_ram,
+ unsigned long long *crash_size,
+ unsigned long long *crash_base,
+ unsigned long long *low_size)
+{
+ int ret, flag = CRASHKERNEL_MEM_NONE;
+
+ BUG_ON(!crash_size || !crash_base || !low_size);
+
+ /* crashkernel=X[@offset] */
+ ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
+ if (!ret && crash_size > 0)
+ return CRASHKERNEL_MEM_CLASSIC;
+
+#ifdef CONFIG_X86_64
+ /* crashkernel=X,high */
+ ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
+ if (ret || crash_size <= 0)
+ return CRASHKERNEL_MEM_NONE;
+
+ flag = CRASHKERNEL_MEM_HIGH;
+
+ /* crashkernel=Y,low */
+ ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
+ if (!ret)
+ flag |= CRASHKERNEL_MEM_LOW;
+#endif
+
+ return flag;
+}
+
static void __init reserve_crashkernel(void)
{
unsigned long long crash_size, crash_base, total_mem;
--
2.25.1


2021-12-22 13:12:46

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 05/17] x86/setup: Use parse_crashkernel_in_order() to make code logic clear

Currently, the parsing of "crashkernel=X,high" and the parsing of
"crashkernel=X,low" are not in the same function, but they are strongly
dependent, which affects readability. Use parse_crashkernel_in_order() to
bring them together.

In addition, the operation to ensure at least 256M low memory is moved out
of reserve_craskernel_low() so that it only needs to focus on low memory
allocation.

Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 69 ++++++++++++++++++-----------------------
1 file changed, 30 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f997074d36f2484..07a58313db5c5f7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -393,32 +393,16 @@ static void __init memblock_x86_reserve_range_setup_data(void)
#ifdef CONFIG_KEXEC_CORE

#ifdef CONFIG_X86_64
-static int __init reserve_crashkernel_low(void)
+static int __init reserve_crashkernel_low(unsigned long long low_size)
{
- unsigned long long base, low_base = 0, low_size = 0;
+ unsigned long long low_base = 0;
unsigned long low_mem_limit;
- int ret;

low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);

- /* crashkernel=Y,low */
- ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
- if (ret) {
- /*
- * two parts from kernel/dma/swiotlb.c:
- * -swiotlb size: user-specified with swiotlb= or default.
- *
- * -swiotlb overflow buffer: now hardcoded to 32k. We round it
- * to 8M for other buffers that may need to stay low too. Also
- * make sure we allocate enough extra low memory so that we
- * don't run out of DMA buffers for 32-bit devices.
- */
- low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
- } else {
- /* passed with crashkernel=0,low ? */
- if (!low_size)
- return 0;
- }
+ /* passed with crashkernel=0,low ? */
+ if (!low_size)
+ return 0;

low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
if (!low_base) {
@@ -457,7 +441,6 @@ static int __init reserve_crashkernel_low(void)
* Returns the status flag of the parsing result of "crashkernel=", such as
* CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
*/
-__maybe_unused
static int __init parse_crashkernel_in_order(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
@@ -492,22 +475,15 @@ static int __init parse_crashkernel_in_order(char *cmdline,

static void __init reserve_crashkernel(void)
{
- unsigned long long crash_size, crash_base, total_mem;
- bool high = false;
- int ret;
+ unsigned long long crash_size, crash_base, total_mem, low_size;
+ int flag;

total_mem = memblock_phys_mem_size();

- /* crashkernel=XM */
- ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0) {
- /* crashkernel=X,high */
- ret = parse_crashkernel_high(boot_command_line, total_mem,
- &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
- return;
- high = true;
- }
+ flag = parse_crashkernel_in_order(boot_command_line, total_mem,
+ &crash_size, &crash_base, &low_size);
+ if (flag == CRASHKERNEL_MEM_NONE)
+ return;

/* 0 means: find the address automatically */
if (!crash_base) {
@@ -519,7 +495,7 @@ static void __init reserve_crashkernel(void)
* So try low memory first and fall back to high memory
* unless "crashkernel=size[KMG],high" is specified.
*/
- if (!high)
+ if (!(flag & CRASHKERNEL_MEM_HIGH))
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_LOW_MAX);
@@ -543,9 +519,24 @@ static void __init reserve_crashkernel(void)
}

#ifdef CONFIG_X86_64
- if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
- memblock_phys_free(crash_base, crash_size);
- return;
+ if (crash_base >= (1ULL << 32)) {
+ if (!(flag & CRASHKERNEL_MEM_LOW)) {
+ /*
+ * two parts from kernel/dma/swiotlb.c:
+ * -swiotlb size: user-specified with swiotlb= or default.
+ *
+ * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+ * to 8M for other buffers that may need to stay low too. Also
+ * make sure we allocate enough extra low memory so that we
+ * don't run out of DMA buffers for 32-bit devices.
+ */
+ low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
+ }
+
+ if (reserve_crashkernel_low(low_size)) {
+ memblock_phys_free(crash_base, crash_size);
+ return;
+ }
}
#endif

--
2.25.1


2021-12-22 13:12:46

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 06/17] x86/setup: Update comments in reserve_crashkernel()

Add comments to describe which bootup parameters are processed by the
code, and make comments close to the code being commented.

Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 07a58313db5c5f7..52aa925877ca787 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -485,20 +485,20 @@ static void __init reserve_crashkernel(void)
if (flag == CRASHKERNEL_MEM_NONE)
return;

- /* 0 means: find the address automatically */
if (!crash_base) {
/*
- * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over 4G, also allocates
- * 256M extra low memory for DMA buffers and swiotlb.
- * But the extra memory is not required for all machines.
- * So try low memory first and fall back to high memory
- * unless "crashkernel=size[KMG],high" is specified.
+ * For the case of crashkernel=X[@offset] and offset is omitted,
+ * try the low memory first.
*/
if (!(flag & CRASHKERNEL_MEM_HIGH))
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
CRASH_ADDR_LOW_MAX);
+
+ /*
+ * If low memory allocation failed above, or for the case of
+ * crashkernel=X,high, try the high memory.
+ */
if (!crash_base)
crash_base = memblock_phys_alloc_range(crash_size,
CRASH_ALIGN, CRASH_ALIGN,
@@ -510,6 +510,10 @@ static void __init reserve_crashkernel(void)
} else {
unsigned long long start;

+ /*
+ * The case of crashkernel=X@offset and offset is specified.
+ * Only user-specified space can be reserved.
+ */
start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
crash_base + crash_size);
if (start != crash_base) {
@@ -520,6 +524,10 @@ static void __init reserve_crashkernel(void)

#ifdef CONFIG_X86_64
if (crash_base >= (1ULL << 32)) {
+ /*
+ * Ensure that at least 256M extra low memory is allocated for
+ * DMA buffers and swiotlb, if low memory size is not specified.
+ */
if (!(flag & CRASHKERNEL_MEM_LOW)) {
/*
* two parts from kernel/dma/swiotlb.c:
--
2.25.1


2021-12-22 13:12:50

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 08/17] x86/setup: Add build option ARCH_WANT_RESERVE_CRASH_KERNEL

From: Chen Zhou <[email protected]>

There are multiple ARCHs that implement reserve_crashkernel(), all of them
are marked as static. Currently, we want to combine the implementations on
x86 and arm64 into one, and move it to public crash_core.c. To avoid
symbol conflicts on other platforms, add a new build option
ARCH_WANT_RESERVE_CRASH_KERNEL. And change CONFIG_X86_64 to CONFIG_64BIT,
so it can be shared with arm64, or other users in future.

Signed-off-by: Chen Zhou <[email protected]>
Co-developed-by: Zhen Lei <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
arch/Kconfig | 3 +++
arch/x86/Kconfig | 2 ++
arch/x86/kernel/setup.c | 10 +++++-----
3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c275..f53dd7852290b9a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,9 @@ source "arch/$(SRCARCH)/Kconfig"

menu "General architecture-dependent options"

+config ARCH_WANT_RESERVE_CRASH_KERNEL
+ bool
+
config CRASH_CORE
bool

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2efb86..bd78ed8193079b9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
depends on !64BIT
# Options that are inherently 32-bit kernel only:
select ARCH_WANT_IPC_PARSE_VERSION
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select CLKSRC_I8253
select CLONE_BACKWARDS
select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index abff57ffbe92884..beb73cce4b8b826 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -390,9 +390,9 @@ static void __init memblock_x86_reserve_range_setup_data(void)
* --------- Crashkernel reservation ------------------------------
*/

-#ifdef CONFIG_KEXEC_CORE
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL

-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
static int __init reserve_crashkernel_low(unsigned long long low_size)
{
unsigned long long low_base = 0;
@@ -456,7 +456,7 @@ static int __init parse_crashkernel_in_order(char *cmdline,
if (!ret && crash_size > 0)
return CRASHKERNEL_MEM_CLASSIC;

-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
/* crashkernel=X,high */
ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
if (ret || crash_size <= 0)
@@ -522,7 +522,7 @@ static void __init reserve_crashkernel(void)
}
}

-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
if (crash_base >= CRASH_ADDR_LOW_MAX) {
/*
* Ensure that at least 256M extra low memory is allocated for
@@ -556,7 +556,7 @@ static void __init reserve_crashkernel(void)
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
}
-#endif
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */

static struct resource standard_io_resources[] = {
{ .name = "dma1", .start = 0x00, .end = 0x1f,
--
2.25.1


2021-12-22 13:12:51

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 10/17] kdump: Simplify the parameters of __parse_crashkernel()

After commit adbc742bf786 ("x86, kdump: Change crashkernel_high/low= to
crashkernel=,high/low"), all kdump bootup parameters start with
"crashkernel=". Therefore, it is better for __parse_crashkernel() to use
it directly than for the caller to pass it. So the parameter 'name' can
be omitted.

Similarly, we can pass the suffix type instead of the suffix name to avoid
the global variable 'suffix_tbl' appearing in multiple places.

There is no change in functionality, but it makes the code look a little
more concise.

Signed-off-by: Zhen Lei <[email protected]>
---
kernel/crash_core.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 64ed082382f3f18..496dae2718cf026 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -233,11 +233,12 @@ static int __init __parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base,
- const char *name,
- const char *suffix)
+ int suffix_type)
{
char *first_colon, *first_space;
char *ck_cmdline;
+ const char *name = "crashkernel=";
+ const char *suffix = suffix_tbl[suffix_type];

BUG_ON(!crash_size || !crash_base);
*crash_size = 0;
@@ -275,8 +276,7 @@ int __init parse_crashkernel(char *cmdline,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
- return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=", NULL);
+ return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_NULL);
}

int __init parse_crashkernel_high(char *cmdline,
@@ -284,8 +284,7 @@ int __init parse_crashkernel_high(char *cmdline,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
- return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=", suffix_tbl[SUFFIX_HIGH]);
+ return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
}

int __init parse_crashkernel_low(char *cmdline,
@@ -293,8 +292,7 @@ int __init parse_crashkernel_low(char *cmdline,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
- return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=", suffix_tbl[SUFFIX_LOW]);
+ return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
}

/*
--
2.25.1


2021-12-22 13:12:53

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 13/17] arm64: kdump: introduce some macros for crash kernel reservation

From: Chen Zhou <[email protected]>

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound
of crash kernel reservation.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/arm64/include/asm/kexec.h | 6 ++++++
arch/arm64/mm/init.c | 4 ++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 9839bfc163d7147..1b9edc69f0244ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@

#define KEXEC_ARCH KEXEC_ARCH_AARCH64

+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
+
#ifndef __ASSEMBLY__

/**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8834434af99ae0..be4595dc7459115 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
static void __init reserve_crashkernel(void)
{
unsigned long long crash_base, crash_size;
- unsigned long long crash_max = arm64_dma_phys_limit;
+ unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;

ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
crash_max = crash_base + crash_size;

/* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+ crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
--
2.25.1


2021-12-22 13:13:01

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 12/17] kdump: Reduce unused parameters of parse_crashkernel_{high|low}

The parameters 'system_ram' and 'crash_base' is only needed by the case of
"crashkernel=X@[offset]". The argument list of parse_crashkernel_suffix()
can help prove this point.

Signed-off-by: Zhen Lei <[email protected]>
---
kernel/crash_core.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a037076b89a9bb2..67f5065e3c3cfcc 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -288,19 +288,19 @@ int __init parse_crashkernel(char *cmdline,

#ifdef CONFIG_64BIT
static int __init parse_crashkernel_high(char *cmdline,
- unsigned long long system_ram,
- unsigned long long *crash_size,
- unsigned long long *crash_base)
+ unsigned long long *crash_size)
{
- return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
+ unsigned long long base;
+
+ return __parse_crashkernel(cmdline, 0, crash_size, &base, SUFFIX_HIGH);
}

static int __init parse_crashkernel_low(char *cmdline,
- unsigned long long system_ram,
- unsigned long long *crash_size,
- unsigned long long *crash_base)
+ unsigned long long *crash_size)
{
- return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
+ unsigned long long base;
+
+ return __parse_crashkernel(cmdline, 0, crash_size, &base, SUFFIX_LOW);
}

static int __init reserve_crashkernel_low(unsigned long long low_size)
@@ -368,14 +368,14 @@ static int __init parse_crashkernel_in_order(char *cmdline,

#ifdef CONFIG_64BIT
/* crashkernel=X,high */
- ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
+ ret = parse_crashkernel_high(cmdline, crash_size);
if (ret || crash_size <= 0)
return CRASHKERNEL_MEM_NONE;

flag = CRASHKERNEL_MEM_HIGH;

/* crashkernel=Y,low */
- ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
+ ret = parse_crashkernel_low(cmdline, low_size);
if (!ret)
flag |= CRASHKERNEL_MEM_LOW;
#endif
--
2.25.1


2021-12-22 13:13:08

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 14/17] arm64: kdump: reimplement crashkernel=X

From: Chen Zhou <[email protected]>

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and fall back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

Another minor change, there may be two regions reserved for crash
dump kernel, in order to distinct from the high region and make no
effect to the use of existing kexec-tools, rename the low region as
"Crash kernel (low)".

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: John Donnelly <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kexec.h | 4 ++
arch/arm64/kernel/machine_kexec_file.c | 12 +++++-
arch/arm64/kernel/setup.c | 13 +++++-
arch/arm64/mm/init.c | 59 +++++---------------------
5 files changed, 38 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17ffb..4b99efa36da3793 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+ select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
select ARCH_WANTS_NO_INSTR
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARM_AMBA
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 1b9edc69f0244ca..3bde0079925d771 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
static inline void crash_post_resume(void) {}
#endif

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#if defined(CONFIG_KEXEC_CORE)
void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
unsigned long arg0, unsigned long arg1,
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)

/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+ if (ret)
+ goto out;
+
+ if (crashk_low_res.end) {
+ ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+ if (ret)
+ goto out;
+ }

- if (!ret)
- ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+ ret = crash_prepare_elf64_headers(cmem, true, addr, sz);

+out:
kfree(cmem);
return ret;
}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24de69..4bb2e55366be64d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
#ifdef CONFIG_KEXEC_CORE
- /* Userspace will find "Crash kernel" region in /proc/iomem. */
+ /*
+ * Userspace will find "Crash kernel" or "Crash kernel (low)"
+ * region in /proc/iomem.
+ * In order to distinct from the high region and make no effect
+ * to the use of existing kexec-tools, rename the low region as
+ * "Crash kernel (low)".
+ */
+ if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+ crashk_low_res.end <= res->end) {
+ crashk_low_res.name = "Crash kernel (low)";
+ request_resource(res, &crashk_low_res);
+ }
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index be4595dc7459115..85c83e4eff2b6c4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
#include <asm/fixmap.h>
#include <asm/kasan.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kexec.h>
#include <asm/kvm_host.h>
#include <asm/memory.h>
#include <asm/numa.h>
@@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr);
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;

-#ifdef CONFIG_KEXEC_CORE
-/*
- * reserve_crashkernel() - reserves memory for crash kernel
- *
- * This function reserves memory area given in "crashkernel=" kernel command
- * line parameter. The memory reserved is used by dump capture kernel when
- * primary kernel is crashing.
- */
+#ifndef CONFIG_KEXEC_CORE
static void __init reserve_crashkernel(void)
{
- unsigned long long crash_base, crash_size;
- unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
- int ret;
-
- ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
- &crash_size, &crash_base);
- /* no crashkernel= or invalid value specified */
- if (ret || !crash_size)
- return;
-
- crash_size = PAGE_ALIGN(crash_size);
-
- /* User specifies base address explicitly. */
- if (crash_base)
- crash_max = crash_base + crash_size;
-
- /* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
- crash_base, crash_max);
- if (!crash_base) {
- pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
- crash_size);
- return;
- }
-
- pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
- crash_base, crash_base + crash_size, crash_size >> 20);
-
- /*
- * The crashkernel memory will be removed from the kernel linear
- * map. Inform kmemleak so that it won't try to access it.
- */
- kmemleak_ignore_phys(crash_base);
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
}
-#else
-static void __init reserve_crashkernel(void)
-{
-}
-#endif /* CONFIG_KEXEC_CORE */
+#endif

/*
* Return the maximum physical address for a zone accessible by the given bits
@@ -362,6 +317,14 @@ void __init bootmem_init(void)
* reserved, so do it here.
*/
reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+ /*
+ * The low region is intended to be used for crash dump kernel devices,
+ * just mark the low region as "nomap" simply.
+ */
+ if (crashk_low_res.end)
+ memblock_mark_nomap(crashk_low_res.start, resource_size(&crashk_low_res));
+#endif

memblock_dump_all();
}
--
2.25.1


2021-12-22 13:13:16

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 15/17] of: fdt: Aggregate the processing of "linux,usable-memory-range"

Currently, we parse the "linux,usable-memory-range" property in
early_init_dt_scan_chosen(), to obtain the specified memory range of the
crash kernel. We then reserve the required memory after
early_init_dt_scan_memory() has identified all available physical memory.
Because the two pieces of code are separated far, the readability and
maintainability are reduced. So bring them together.

Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
drivers/of/fdt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284cebd56..37b477a51175359 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
elfcorehdr_addr, elfcorehdr_size);
}

-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;

/**
* early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
{
const __be32 *prop;
int len;
+ phys_addr_t cap_mem_addr;
+ phys_addr_t cap_mem_size;
+
+ if ((long)node < 0)
+ return;

pr_debug("Looking for usable-memory-range property... ");

@@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)

pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
&cap_mem_size);
+
+ memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
}

#ifdef CONFIG_SERIAL_EARLYCON
@@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
(strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
return 0;

+ chosen_node_offset = node;
+
early_init_dt_check_for_initrd(node);
early_init_dt_check_for_elfcorehdr(node);
- early_init_dt_check_for_usable_mem_range(node);

/* Retrieve command line */
p = of_get_flat_dt_prop(node, "bootargs", &l);
@@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);

/* Handle linux,usable-memory-range property */
- memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+ early_init_dt_check_for_usable_mem_range(chosen_node_offset);
}

bool __init early_init_dt_scan(void *params)
--
2.25.1


2021-12-22 13:13:18

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 17/17] kdump: update Documentation about crashkernel

From: Chen Zhou <[email protected]>

For arm64, the behavior of crashkernel=X has been changed, which
tries low allocation in DMA zone and fall back to high allocation
if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

So update the Documentation.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 11 +++++++++--
Documentation/admin-guide/kernel-parameters.txt | 13 ++++++++++---
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.

- On arm64, use "crashkernel=Y[@X]". Note that the start address of
- the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+ On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+ fall back to high allocation if it fails.
+ We can also use "crashkernel=X,high" to select a high region above
+ DMA zone, which also tries to allocate at least 256M low memory in
+ DMA zone automatically.
+ "crashkernel=Y,low" can be used to allocate specified size low memory.
+ Use "crashkernel=Y@X" if you really have to reserve memory from
+ specified start address X. Note that the start address of the kernel,
+ X if explicitly specified, must be aligned to 2MiB (0x200000).

Load the Dump-capture Kernel
============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fc34332c8d9a6df..ff5f15008707cab 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+ [KNL, ARM64] Try low allocation in DMA zone and fall back
+ to high allocation if it fails when '@offset' hasn't been
+ specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.

crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -798,7 +801,9 @@
be above 4G if system have more than 4G ram installed.
Otherwise memory region will be allocated below 4G, if
available.
- It will be ignored if crashkernel=X is specified.
+ It will be ignored if crashkernel=X is correctly specified.
+ [KNL, ARM64] range in high memory.
+ Allow kernel to allocate physical memory region from top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -807,13 +812,15 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
- at least 256M below 4G automatically.
+ least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+ [KNL, ARM64] range in low memory.
+ This one let user to specify a low range in DMA zone for
+ crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests

--
2.25.1


2021-12-22 13:13:22

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 16/17] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

From: Chen Zhou <[email protected]>

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>

We reused the DT property linux,usable-memory-range and made the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and call memblock_add()
to add the low memory region after memblock_cap_memory_range() has been
called.

Signed-off-by: Chen Zhou <[email protected]>
Co-developed-by: Zhen Lei <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Tested-by: Dave Kleikamp <[email protected]>
---
drivers/of/fdt.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 37b477a51175359..f7b72fa773250ad 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -967,6 +967,15 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)

static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;

+/*
+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES 2
+
/**
* early_init_dt_check_for_usable_mem_range - Decode usable memory range
* location from flat tree
@@ -974,10 +983,9 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
*/
static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
{
- const __be32 *prop;
- int len;
- phys_addr_t cap_mem_addr;
- phys_addr_t cap_mem_size;
+ struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+ const __be32 *prop, *endp;
+ int len, i;

if ((long)node < 0)
return;
@@ -985,16 +993,21 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
pr_debug("Looking for usable-memory-range property... ");

prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
- if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+ if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
return;

- cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
- cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+ endp = prop + (len / sizeof(__be32));
+ for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+ rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+ rgn[i].size = dt_mem_next_cell(dt_root_size_cells, &prop);

- pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
- &cap_mem_size);
+ pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+ i, &rgn[i].base, &rgn[i].size);
+ }

- memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+ memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+ for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+ memblock_add(rgn[i].base, rgn[i].size);
}

#ifdef CONFIG_SERIAL_EARLYCON
--
2.25.1


2021-12-22 13:13:25

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 11/17] kdump: Make parse_crashkernel_{high|low} static

Currently, parse_crashkernel_{high|low} is only referenced by
parse_crashkernel_in_order(), and they are in the same file. So make it
static, and move it into "#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL".

Signed-off-by: Zhen Lei <[email protected]>
---
include/linux/crash_core.h | 4 ----
kernel/crash_core.c | 19 ++++++++++---------
2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..529a0a783190bfe 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -79,9 +79,5 @@ void final_note(Elf_Word *buf);

int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
- unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
- unsigned long long *crash_size, unsigned long long *crash_base);

#endif /* LINUX_CRASH_CORE_H */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 496dae2718cf026..a037076b89a9bb2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -279,7 +279,15 @@ int __init parse_crashkernel(char *cmdline,
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_NULL);
}

-int __init parse_crashkernel_high(char *cmdline,
+
+/*
+ * --------- Crashkernel reservation ------------------------------
+ */
+
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
+
+#ifdef CONFIG_64BIT
+static int __init parse_crashkernel_high(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
@@ -287,7 +295,7 @@ int __init parse_crashkernel_high(char *cmdline,
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_HIGH);
}

-int __init parse_crashkernel_low(char *cmdline,
+static int __init parse_crashkernel_low(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
@@ -295,13 +303,6 @@ int __init parse_crashkernel_low(char *cmdline,
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, SUFFIX_LOW);
}

-/*
- * --------- Crashkernel reservation ------------------------------
- */
-
-#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
-
-#ifdef CONFIG_64BIT
static int __init reserve_crashkernel_low(unsigned long long low_size)
{
unsigned long long low_base = 0;
--
2.25.1


2021-12-22 13:13:27

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 07/17] x86/setup: Eliminate a magic number in reserve_crashkernel()

From: Chen Zhou <[email protected]>

Replace '(1ULL << 32)' with CRASH_ADDR_LOW_MAX to improve readability,
they are equal.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 52aa925877ca787..abff57ffbe92884 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -523,7 +523,7 @@ static void __init reserve_crashkernel(void)
}

#ifdef CONFIG_X86_64
- if (crash_base >= (1ULL << 32)) {
+ if (crash_base >= CRASH_ADDR_LOW_MAX) {
/*
* Ensure that at least 256M extra low memory is allocated for
* DMA buffers and swiotlb, if low memory size is not specified.
--
2.25.1


2021-12-22 13:13:29

by Zhen Lei

[permalink] [raw]
Subject: [PATCH v18 09/17] x86/setup: Move reserve_crashkernel[_low]() into crash_core.c

From: Chen Zhou <[email protected]>

Make the functions reserve_crashkernel[_low]() as generic. Since
reserve_crashkernel[_low]() implementations are quite similar on other
architectures as well, we can have more users of this later.

Signed-off-by: Chen Zhou <[email protected]>
Co-developed-by: Zhen Lei <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
---
arch/x86/include/asm/kexec.h | 4 +
arch/x86/kernel/setup.c | 172 ----------------------------------
kernel/crash_core.c | 176 ++++++++++++++++++++++++++++++++++-
3 files changed, 179 insertions(+), 173 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index a0223a6c0238a15..21fd1f2796e1057 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -216,6 +216,10 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index beb73cce4b8b826..ba56e410f01811d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -386,178 +386,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
}
}

-/*
- * --------- Crashkernel reservation ------------------------------
- */
-
-#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
-
-#ifdef CONFIG_64BIT
-static int __init reserve_crashkernel_low(unsigned long long low_size)
-{
- unsigned long long low_base = 0;
- unsigned long low_mem_limit;
-
- low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
- /* passed with crashkernel=0,low ? */
- if (!low_size)
- return 0;
-
- low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
- if (!low_base) {
- pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
- (unsigned long)(low_size >> 20));
- return -ENOMEM;
- }
-
- pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
- (unsigned long)(low_size >> 20),
- (unsigned long)(low_base >> 20),
- (unsigned long)(low_mem_limit >> 20));
-
- crashk_low_res.start = low_base;
- crashk_low_res.end = low_base + low_size - 1;
-
- return 0;
-}
-#endif
-
-#define CRASHKERNEL_MEM_NONE 0x0 /* crashkernel= is not exist or invalid */
-#define CRASHKERNEL_MEM_CLASSIC 0x1 /* crashkernel=X[@offset] is valid */
-#define CRASHKERNEL_MEM_HIGH 0x2 /* crashkernel=X,high is valid */
-#define CRASHKERNEL_MEM_LOW 0x4 /* crashkernel=X,low is valid */
-
-/**
- * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
- * priority order until a valid combination is found.
- * @cmdline: The bootup command line.
- * @system_ram: Total system memory size.
- * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or
- * "crashkernel=X,high".
- * @crash_base: Save the base address specified by "crashkernel=X@offset"
- * @low_size: Save the memory size specified by "crashkernel=X,low"
- *
- * Returns the status flag of the parsing result of "crashkernel=", such as
- * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
- */
-static int __init parse_crashkernel_in_order(char *cmdline,
- unsigned long long system_ram,
- unsigned long long *crash_size,
- unsigned long long *crash_base,
- unsigned long long *low_size)
-{
- int ret, flag = CRASHKERNEL_MEM_NONE;
-
- BUG_ON(!crash_size || !crash_base || !low_size);
-
- /* crashkernel=X[@offset] */
- ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
- if (!ret && crash_size > 0)
- return CRASHKERNEL_MEM_CLASSIC;
-
-#ifdef CONFIG_64BIT
- /* crashkernel=X,high */
- ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
- if (ret || crash_size <= 0)
- return CRASHKERNEL_MEM_NONE;
-
- flag = CRASHKERNEL_MEM_HIGH;
-
- /* crashkernel=Y,low */
- ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
- if (!ret)
- flag |= CRASHKERNEL_MEM_LOW;
-#endif
-
- return flag;
-}
-
-static void __init reserve_crashkernel(void)
-{
- unsigned long long crash_size, crash_base, total_mem, low_size;
- int flag;
-
- total_mem = memblock_phys_mem_size();
-
- flag = parse_crashkernel_in_order(boot_command_line, total_mem,
- &crash_size, &crash_base, &low_size);
- if (flag == CRASHKERNEL_MEM_NONE)
- return;
-
- if (!crash_base) {
- /*
- * For the case of crashkernel=X[@offset] and offset is omitted,
- * try the low memory first.
- */
- if (!(flag & CRASHKERNEL_MEM_HIGH))
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_LOW_MAX);
-
- /*
- * If low memory allocation failed above, or for the case of
- * crashkernel=X,high, try the high memory.
- */
- if (!crash_base)
- crash_base = memblock_phys_alloc_range(crash_size,
- CRASH_ALIGN, CRASH_ALIGN,
- CRASH_ADDR_HIGH_MAX);
- if (!crash_base) {
- pr_info("crashkernel reservation failed - No suitable area found.\n");
- return;
- }
- } else {
- unsigned long long start;
-
- /*
- * The case of crashkernel=X@offset and offset is specified.
- * Only user-specified space can be reserved.
- */
- start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
- crash_base + crash_size);
- if (start != crash_base) {
- pr_info("crashkernel reservation failed - memory is in use.\n");
- return;
- }
- }
-
-#ifdef CONFIG_64BIT
- if (crash_base >= CRASH_ADDR_LOW_MAX) {
- /*
- * Ensure that at least 256M extra low memory is allocated for
- * DMA buffers and swiotlb, if low memory size is not specified.
- */
- if (!(flag & CRASHKERNEL_MEM_LOW)) {
- /*
- * two parts from kernel/dma/swiotlb.c:
- * -swiotlb size: user-specified with swiotlb= or default.
- *
- * -swiotlb overflow buffer: now hardcoded to 32k. We round it
- * to 8M for other buffers that may need to stay low too. Also
- * make sure we allocate enough extra low memory so that we
- * don't run out of DMA buffers for 32-bit devices.
- */
- low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
- }
-
- if (reserve_crashkernel_low(low_size)) {
- memblock_phys_free(crash_base, crash_size);
- return;
- }
- }
-#endif
-
- pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
- (unsigned long)(crash_size >> 20),
- (unsigned long)(crash_base >> 20),
- (unsigned long)(total_mem >> 20));
-
- crashk_res.start = crash_base;
- crashk_res.end = crash_base + crash_size - 1;
-}
-#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
-
static struct resource standard_io_resources[] = {
{ .name = "dma1", .start = 0x00, .end = 0x1f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..64ed082382f3f18 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -5,9 +5,11 @@
*/

#include <linux/buildid.h>
-#include <linux/crash_core.h>
+#include <linux/kexec.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>

#include <asm/page.h>
#include <asm/sections.h>
@@ -295,6 +297,178 @@ int __init parse_crashkernel_low(char *cmdline,
"crashkernel=", suffix_tbl[SUFFIX_LOW]);
}

+/*
+ * --------- Crashkernel reservation ------------------------------
+ */
+
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
+
+#ifdef CONFIG_64BIT
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+ unsigned long long low_base = 0;
+ unsigned long low_mem_limit;
+
+ low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
+
+ /* passed with crashkernel=0,low ? */
+ if (!low_size)
+ return 0;
+
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+ if (!low_base) {
+ pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+ (unsigned long)(low_size >> 20));
+ return -ENOMEM;
+ }
+
+ pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
+ (unsigned long)(low_size >> 20),
+ (unsigned long)(low_base >> 20),
+ (unsigned long)(low_mem_limit >> 20));
+
+ crashk_low_res.start = low_base;
+ crashk_low_res.end = low_base + low_size - 1;
+
+ return 0;
+}
+#endif
+
+#define CRASHKERNEL_MEM_NONE 0x0 /* crashkernel= is not exist or invalid */
+#define CRASHKERNEL_MEM_CLASSIC 0x1 /* crashkernel=X[@offset] is valid */
+#define CRASHKERNEL_MEM_HIGH 0x2 /* crashkernel=X,high is valid */
+#define CRASHKERNEL_MEM_LOW 0x4 /* crashkernel=X,low is valid */
+
+/**
+ * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
+ * priority order until a valid combination is found.
+ * @cmdline: The bootup command line.
+ * @system_ram: Total system memory size.
+ * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or
+ * "crashkernel=X,high".
+ * @crash_base: Save the base address specified by "crashkernel=X@offset"
+ * @low_size: Save the memory size specified by "crashkernel=X,low"
+ *
+ * Returns the status flag of the parsing result of "crashkernel=", such as
+ * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
+ */
+static int __init parse_crashkernel_in_order(char *cmdline,
+ unsigned long long system_ram,
+ unsigned long long *crash_size,
+ unsigned long long *crash_base,
+ unsigned long long *low_size)
+{
+ int ret, flag = CRASHKERNEL_MEM_NONE;
+
+ BUG_ON(!crash_size || !crash_base || !low_size);
+
+ /* crashkernel=X[@offset] */
+ ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
+ if (!ret && crash_size > 0)
+ return CRASHKERNEL_MEM_CLASSIC;
+
+#ifdef CONFIG_64BIT
+ /* crashkernel=X,high */
+ ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
+ if (ret || crash_size <= 0)
+ return CRASHKERNEL_MEM_NONE;
+
+ flag = CRASHKERNEL_MEM_HIGH;
+
+ /* crashkernel=Y,low */
+ ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
+ if (!ret)
+ flag |= CRASHKERNEL_MEM_LOW;
+#endif
+
+ return flag;
+}
+
+void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_size, crash_base, total_mem, low_size;
+ int flag;
+
+ total_mem = memblock_phys_mem_size();
+
+ flag = parse_crashkernel_in_order(boot_command_line, total_mem,
+ &crash_size, &crash_base, &low_size);
+ if (flag == CRASHKERNEL_MEM_NONE)
+ return;
+
+ if (!crash_base) {
+ /*
+ * For the case of crashkernel=X[@offset] and offset is omitted,
+ * try the low memory first.
+ */
+ if (!(flag & CRASHKERNEL_MEM_HIGH))
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_LOW_MAX);
+
+ /*
+ * If low memory allocation failed above, or for the case of
+ * crashkernel=X,high, try the high memory.
+ */
+ if (!crash_base)
+ crash_base = memblock_phys_alloc_range(crash_size,
+ CRASH_ALIGN, CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX);
+ if (!crash_base) {
+ pr_info("crashkernel reservation failed - No suitable area found.\n");
+ return;
+ }
+ } else {
+ unsigned long long start;
+
+ /*
+ * The case of crashkernel=X@offset and offset is specified.
+ * Only user-specified space can be reserved.
+ */
+ start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+ crash_base + crash_size);
+ if (start != crash_base) {
+ pr_info("crashkernel reservation failed - memory is in use.\n");
+ return;
+ }
+ }
+
+#ifdef CONFIG_64BIT
+ if (crash_base >= CRASH_ADDR_LOW_MAX) {
+ /*
+ * Ensure that at least 256M extra low memory is allocated for
+ * DMA buffers and swiotlb, if low memory size is not specified.
+ */
+ if (!(flag & CRASHKERNEL_MEM_LOW)) {
+ /*
+ * two parts from kernel/dma/swiotlb.c:
+ * -swiotlb size: user-specified with swiotlb= or default.
+ *
+ * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+ * to 8M for other buffers that may need to stay low too. Also
+ * make sure we allocate enough extra low memory so that we
+ * don't run out of DMA buffers for 32-bit devices.
+ */
+ low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
+ }
+
+ if (reserve_crashkernel_low(low_size)) {
+ memblock_phys_free(crash_base, crash_size);
+ return;
+ }
+ }
+#endif
+
+ pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
+ (unsigned long)(crash_size >> 20),
+ (unsigned long)(crash_base >> 20),
+ (unsigned long)(total_mem >> 20));
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
+
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
void *data, size_t data_len)
{
--
2.25.1


2021-12-22 20:43:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v18 01/17] x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h

On Wed, Dec 22, 2021 at 09:08:04PM +0800, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> We want to make function reserve_crashkernel[_low](), which is implemented
^^

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

Also, pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst for more details.

Bottom line is: personal pronouns are ambiguous in text, especially with
so many parties/companies/etc developing the kernel so let's avoid them
please.

> by X86, available to other architectures. It references macro CRASH_ALIGN

"x86"

> and will be moved to public crash_core.c. But the defined values of
> CRASH_ALIGN may be different in different architectures. So moving the
> definition of CRASH_ALIGN to asm/kexec.h is a good choice.
>
> The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.

This commit message needs to say something along the lines of:

"Move CRASH_ALIGN and ... to the arch-specific header in preparation
of making reserve_crashkernel[_low]() generic, used by other
architectures."

or so.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-23 02:09:41

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 01/17] x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to asm/kexec.h



On 2021/12/23 4:43, Borislav Petkov wrote:
> On Wed, Dec 22, 2021 at 09:08:04PM +0800, Zhen Lei wrote:
>> From: Chen Zhou <[email protected]>
>>
>> We want to make function reserve_crashkernel[_low](), which is implemented
> ^^
>
> Please use passive voice in your commit message: no "we" or "I", etc,
> and describe your changes in imperative mood.

My bad language habits. I've made this mistake several times.

>
> Also, pls read section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst for more details.
>
> Bottom line is: personal pronouns are ambiguous in text, especially with
> so many parties/companies/etc developing the kernel so let's avoid them
> please.

OK, I'll check the description of the other patches.

>
>> by X86, available to other architectures. It references macro CRASH_ALIGN
>
> "x86"

OK

>
>> and will be moved to public crash_core.c. But the defined values of
>> CRASH_ALIGN may be different in different architectures. So moving the
>> definition of CRASH_ALIGN to asm/kexec.h is a good choice.
>>
>> The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.
>
> This commit message needs to say something along the lines of:
>
> "Move CRASH_ALIGN and ... to the arch-specific header in preparation
> of making reserve_crashkernel[_low]() generic, used by other
> architectures."

OK, I will use this one, thanks.

By the way, patch 0004-0006 were written based on your suggestion. Can you
take a moment to review it? I think I forgot to add "Suggested-by".

>
> or so.
>

2021-12-23 15:49:12

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH v18 15/17] of: fdt: Aggregate the processing of "linux,usable-memory-range"

On 12/22/21 7:08AM, Zhen Lei wrote:
> Currently, we parse the "linux,usable-memory-range" property in
> early_init_dt_scan_chosen(), to obtain the specified memory range of the
> crash kernel. We then reserve the required memory after
> early_init_dt_scan_memory() has identified all available physical memory.
> Because the two pieces of code are separated far, the readability and
> maintainability are reduced. So bring them together.

Rob Herring is pushing a modified version of this patch (changes made by
Pingfan Liu) to fix a regression. This will cause a conflict when it hits
mainline.

https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/log/?h=dt/linus

https://lore.kernel.org/linux-arm-kernel/[email protected]/

>
> Suggested-by: Rob Herring <[email protected]>
> Signed-off-by: Zhen Lei <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
> Tested-by: Dave Kleikamp <[email protected]>
> ---
> drivers/of/fdt.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284cebd56..37b477a51175359 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
> elfcorehdr_addr, elfcorehdr_size);
> }
>
> -static phys_addr_t cap_mem_addr;
> -static phys_addr_t cap_mem_size;
> +static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
>
> /**
> * early_init_dt_check_for_usable_mem_range - Decode usable memory range
> @@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
> {
> const __be32 *prop;
> int len;
> + phys_addr_t cap_mem_addr;
> + phys_addr_t cap_mem_size;
> +
> + if ((long)node < 0)
> + return;
>
> pr_debug("Looking for usable-memory-range property... ");
>
> @@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>
> pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
> &cap_mem_size);
> +
> + memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> }
>
> #ifdef CONFIG_SERIAL_EARLYCON
> @@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
> (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
> return 0;
>
> + chosen_node_offset = node;
> +
> early_init_dt_check_for_initrd(node);
> early_init_dt_check_for_elfcorehdr(node);
> - early_init_dt_check_for_usable_mem_range(node);
>
> /* Retrieve command line */
> p = of_get_flat_dt_prop(node, "bootargs", &l);
> @@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
> of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>
> /* Handle linux,usable-memory-range property */
> - memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> + early_init_dt_check_for_usable_mem_range(chosen_node_offset);
> }
>
> bool __init early_init_dt_scan(void *params)


2021-12-23 17:26:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()

On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
> From: Chen Zhou <[email protected]>
>
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86,

Why is that so? Is Xen-PV x86-only?

> the same as insert_resource() in reserve_crashkernel[_low]().

Why?

Looking at

0212f9159694 ("x86: Add Crash kernel low reservation")

it *surprisingly* explains why that resources thing is being added:

We need to add another range in /proc/iomem like "Crash kernel low",
so kexec-tools could find that info and append to kdump kernel
command line.

Then,

157752d84f5d ("kexec: use Crash kernel for Crash kernel low")

renamed it because, as it states, kexec-tools was taught to handle
multiple resources of the same name.

So why does kexec-tools on arm *not* need those iomem resources? How
does it parse the ranges there? Questions over questions...

So last time I told you to sit down and take your time with this cleanup.
From reading this here, it doesn't look like it. Rather, it looks like
hastily done in a hurry and hurrying stuff doesn't help you one bit - it
actually makes it worse.

Your commit messages need to explain *why* a change is being done and
why is that ok. This one doesn't.

> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
> * Reserve memory for crash kernel after SRAT is parsed so that it
> * won't consume hotpluggable memory.
> */
> - reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> + if (xen_pv_domain())
> + pr_info("Ignoring crashkernel for a Xen PV domain\n");

This is wrong - the check is currently being done inside
reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
correctly - and not before.

Your change would print on Xen PV, regardless of whether it has received
crashkernel= on the cmdline or not.

This is exactly why I say that making those functions generic and shared
might not be such a good idea, after all, because then you'd have to
sprinkle around arch-specific stuff.

One of the ways how to address this particular case here would be:

1. Add a x86-specific wrapper around parse_crashkernel() which does
all the parsing. When that wrapper finishes, you should have parsed
everything that has crashkernel= on the cmdline.

2. At the end of that wrapper, you do arch-specific checks and setup
like the xen_pv_domain() one.

3. Now, you do reserve_crashkernel(), if those checks pass.

The question is, whether the flow on arm64 can do the same. Probably but
it needs careful auditing.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-24 01:04:05

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 15/17] of: fdt: Aggregate the processing of "linux,usable-memory-range"



On 2021/12/23 23:48, Dave Kleikamp wrote:
> On 12/22/21 7:08AM, Zhen Lei wrote:
>> Currently, we parse the "linux,usable-memory-range" property in
>> early_init_dt_scan_chosen(), to obtain the specified memory range of the
>> crash kernel. We then reserve the required memory after
>> early_init_dt_scan_memory() has identified all available physical memory.
>> Because the two pieces of code are separated far, the readability and
>> maintainability are reduced. So bring them together.
>
> Rob Herring is pushing a modified version of this patch (changes made by
> Pingfan Liu) to fix a regression. This will cause a conflict when it hits
> mainline.

Yes, I saw it yesterday, and I'll delete it in the next version. Thanks.

>
> https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/log/?h=dt/linus
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
>>
>> Suggested-by: Rob Herring <[email protected]>
>> Signed-off-by: Zhen Lei <[email protected]>
>> Reviewed-by: Rob Herring <[email protected]>
>> Tested-by: Dave Kleikamp <[email protected]>
>> ---
>>   drivers/of/fdt.c | 15 +++++++++++----
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index bdca35284cebd56..37b477a51175359 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -965,8 +965,7 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
>>            elfcorehdr_addr, elfcorehdr_size);
>>   }
>>   -static phys_addr_t cap_mem_addr;
>> -static phys_addr_t cap_mem_size;
>> +static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
>>     /**
>>    * early_init_dt_check_for_usable_mem_range - Decode usable memory range
>> @@ -977,6 +976,11 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>>   {
>>       const __be32 *prop;
>>       int len;
>> +    phys_addr_t cap_mem_addr;
>> +    phys_addr_t cap_mem_size;
>> +
>> +    if ((long)node < 0)
>> +        return;
>>         pr_debug("Looking for usable-memory-range property... ");
>>   @@ -989,6 +993,8 @@ static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>>         pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
>>            &cap_mem_size);
>> +
>> +    memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
>>   }
>>     #ifdef CONFIG_SERIAL_EARLYCON
>> @@ -1137,9 +1143,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
>>           (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
>>           return 0;
>>   +    chosen_node_offset = node;
>> +
>>       early_init_dt_check_for_initrd(node);
>>       early_init_dt_check_for_elfcorehdr(node);
>> -    early_init_dt_check_for_usable_mem_range(node);
>>         /* Retrieve command line */
>>       p = of_get_flat_dt_prop(node, "bootargs", &l);
>> @@ -1275,7 +1282,7 @@ void __init early_init_dt_scan_nodes(void)
>>       of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>>         /* Handle linux,usable-memory-range property */
>> -    memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
>> +    early_init_dt_check_for_usable_mem_range(chosen_node_offset);
>>   }
>>     bool __init early_init_dt_scan(void *params)
>
> .
>

2021-12-24 06:37:07

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/24 1:26, Borislav Petkov wrote:
> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>> From: Chen Zhou <[email protected]>
>>
>> We will make the functions reserve_crashkernel() as generic, the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86,
>
> Why is that so? Is Xen-PV x86-only?
>
>> the same as insert_resource() in reserve_crashkernel[_low]().
>
> Why?
>
> Looking at
>
> 0212f9159694 ("x86: Add Crash kernel low reservation")
>
> it *surprisingly* explains why that resources thing is being added:
>
> We need to add another range in /proc/iomem like "Crash kernel low",
> so kexec-tools could find that info and append to kdump kernel
> command line.
>
> Then,
>
> 157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>
> renamed it because, as it states, kexec-tools was taught to handle
> multiple resources of the same name.
>
> So why does kexec-tools on arm *not* need those iomem resources? How
> does it parse the ranges there? Questions over questions...

https://lkml.org/lkml/2019/4/4/1758

Chen Zhou has explained before, see below. I'll analyze why x86 and arm64 need
to process iomem resources at different times.

< This very reminds what x86 does. Any chance some of the code can be reused
< rather than duplicated?
As i said in the comment, i transport reserve_crashkernel_low() from x86_64. There are minor
differences. In arm64, we don't need to do insert_resource(), we do request_resource()
in request_standard_resources() later.

>
> So last time I told you to sit down and take your time with this cleanup.
>>From reading this here, it doesn't look like it. Rather, it looks like
> hastily done in a hurry and hurrying stuff doesn't help you one bit - it
> actually makes it worse.
>
> Your commit messages need to explain *why* a change is being done and
> why is that ok. This one doesn't.

OK, I'll do this in follow-up patches.

>
>> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>> * Reserve memory for crash kernel after SRAT is parsed so that it
>> * won't consume hotpluggable memory.
>> */
>> - reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> + if (xen_pv_domain())
>> + pr_info("Ignoring crashkernel for a Xen PV domain\n");
>
> This is wrong - the check is currently being done inside
> reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
> correctly - and not before.
>
> Your change would print on Xen PV, regardless of whether it has received
> crashkernel= on the cmdline or not.

Yes, you're right. There are changes in code logic, but the print doesn't
seem to cause any misunderstanding.

>
> This is exactly why I say that making those functions generic and shared
> might not be such a good idea, after all, because then you'd have to
> sprinkle around arch-specific stuff.

Yes, I'm thinking about that too. Perhaps they are not suitable for full
code sharing, but it looks like there's some code that can be shared.
For example, the function parse_crashkernel_in_order() that I extracted
based on your suggestion, it could also be parse_crashkernel_high_low().
Or the function reserve_crashkernel_low().

There are two ways to reserve memory above 4G:
1. Use crashkernel=X,high, with or without crashkernel=X,low
2. Use crashkernel=X,[offset], but try low memory first. If failed, then
try high memory, and retry at least 256M low memory.

I plan to only implement 2 in the next version so that there can be fewer
changes. Then implement 1 after 2 is applied.

>
> One of the ways how to address this particular case here would be:
>
> 1. Add a x86-specific wrapper around parse_crashkernel() which does
> all the parsing. When that wrapper finishes, you should have parsed
> everything that has crashkernel= on the cmdline.
>
> 2. At the end of that wrapper, you do arch-specific checks and setup
> like the xen_pv_domain() one.
>
> 3. Now, you do reserve_crashkernel(), if those checks pass.
>
> The question is, whether the flow on arm64 can do the same. Probably but
> it needs careful auditing.
>

2021-12-25 01:53:41

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/24 1:26, Borislav Petkov wrote:
>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>> From: Chen Zhou <[email protected]>
>>>
>>> We will make the functions reserve_crashkernel() as generic, the
>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>> x86,
>>
>> Why is that so? Is Xen-PV x86-only?
>>
>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>
>> Why?
>>
>> Looking at
>>
>> 0212f9159694 ("x86: Add Crash kernel low reservation")
>>
>> it *surprisingly* explains why that resources thing is being added:
>>
>> We need to add another range in /proc/iomem like "Crash kernel low",
>> so kexec-tools could find that info and append to kdump kernel
>> command line.
>>
>> Then,
>>
>> 157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>
>> renamed it because, as it states, kexec-tools was taught to handle
>> multiple resources of the same name.
>>
>> So why does kexec-tools on arm *not* need those iomem resources? How
>> does it parse the ranges there? Questions over questions...

It's a good question worth figuring out. I'm going to dig into this.
I admire your rigorous style and sharp vision.

>
> https://lkml.org/lkml/2019/4/4/1758
>
> Chen Zhou has explained before, see below. I'll analyze why x86 and arm64 need
> to process iomem resources at different times.
>
> < This very reminds what x86 does. Any chance some of the code can be reused
> < rather than duplicated?
> As i said in the comment, i transport reserve_crashkernel_low() from x86_64. There are minor
> differences. In arm64, we don't need to do insert_resource(), we do request_resource()
> in request_standard_resources() later.
>
>>
>> So last time I told you to sit down and take your time with this cleanup.
>> >From reading this here, it doesn't look like it. Rather, it looks like
>> hastily done in a hurry and hurrying stuff doesn't help you one bit - it
>> actually makes it worse.
>>
>> Your commit messages need to explain *why* a change is being done and
>> why is that ok. This one doesn't.
>
> OK, I'll do this in follow-up patches.
>
>>
>>> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>>> * Reserve memory for crash kernel after SRAT is parsed so that it
>>> * won't consume hotpluggable memory.
>>> */
>>> - reserve_crashkernel();
>>> +#ifdef CONFIG_KEXEC_CORE
>>> + if (xen_pv_domain())
>>> + pr_info("Ignoring crashkernel for a Xen PV domain\n");

Right, these two lines of code do not need to be moved. xen_pv_domain() is
a friendly macro function.

>>
>> This is wrong - the check is currently being done inside
>> reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
>> correctly - and not before.
>>
>> Your change would print on Xen PV, regardless of whether it has received
>> crashkernel= on the cmdline or not.
>
> Yes, you're right. There are changes in code logic, but the print doesn't
> seem to cause any misunderstanding.
>
>>
>> This is exactly why I say that making those functions generic and shared
>> might not be such a good idea, after all, because then you'd have to
>> sprinkle around arch-specific stuff.
>
> Yes, I'm thinking about that too. Perhaps they are not suitable for full
> code sharing, but it looks like there's some code that can be shared.
> For example, the function parse_crashkernel_in_order() that I extracted
> based on your suggestion, it could also be parse_crashkernel_high_low().
> Or the function reserve_crashkernel_low().
>
> There are two ways to reserve memory above 4G:
> 1. Use crashkernel=X,high, with or without crashkernel=X,low
> 2. Use crashkernel=X,[offset], but try low memory first. If failed, then
> try high memory, and retry at least 256M low memory.
>
> I plan to only implement 2 in the next version so that there can be fewer
> changes. Then implement 1 after 2 is applied.

I tried it yesterday and it didn't work. I still have to deal with the
problem of adjusting insert_resource().

How about I isolate some cleanup patches first? Strive for them to be
merged into v5.17. This way, we can focus on the core changes in the
next version. And I can also save some repetitive rebase workload.

>
>>
>> One of the ways how to address this particular case here would be:
>>
>> 1. Add a x86-specific wrapper around parse_crashkernel() which does
>> all the parsing. When that wrapper finishes, you should have parsed
>> everything that has crashkernel= on the cmdline.
>>
>> 2. At the end of that wrapper, you do arch-specific checks and setup
>> like the xen_pv_domain() one.
>>
>> 3. Now, you do reserve_crashkernel(), if those checks pass.
>>
>> The question is, whether the flow on arm64 can do the same. Probably but
>> it needs careful auditing.
>>

2021-12-25 01:59:03

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 04/17] x86/setup: Add helper parse_crashkernel_in_order()



On 2021/12/22 21:08, Zhen Lei wrote:
> Currently, there are two possible combinations of configurations.
> (1) crashkernel=X[@offset]
> (2) crashkernel=X,high, with or without crashkernel=X,low
>
> (1) has the highest priority, if it is configured correctly, (2) will be
> ignored. Similarly, in combination (2), crashkernel=X,low is valid only
> when crashkernel=X,high is valid.
>
> Putting the operations of parsing all "crashkernel=" configurations in one
> function helps to sort out the strong dependency.
>
> So add helper parse_crashkernel_in_order(). The "__maybe_unused" will be
> removed in the next patch.
>
> Signed-off-by: Zhen Lei <[email protected]>
> ---
> arch/x86/kernel/setup.c | 51 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d9080bfa131a654..f997074d36f2484 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -439,6 +439,57 @@ static int __init reserve_crashkernel_low(void)
> }
> #endif
>
> +#define CRASHKERNEL_MEM_NONE 0x0 /* crashkernel= is not exist or invalid */
> +#define CRASHKERNEL_MEM_CLASSIC 0x1 /* crashkernel=X[@offset] is valid */
> +#define CRASHKERNEL_MEM_HIGH 0x2 /* crashkernel=X,high is valid */
> +#define CRASHKERNEL_MEM_LOW 0x4 /* crashkernel=X,low is valid */
> +
> +/**
> + * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
> + * priority order until a valid combination is found.
> + * @cmdline: The bootup command line.
> + * @system_ram: Total system memory size.
> + * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or
> + * "crashkernel=X,high".
> + * @crash_base: Save the base address specified by "crashkernel=X@offset"
> + * @low_size: Save the memory size specified by "crashkernel=X,low"
> + *
> + * Returns the status flag of the parsing result of "crashkernel=", such as
> + * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
> + */
> +__maybe_unused
> +static int __init parse_crashkernel_in_order(char *cmdline,
> + unsigned long long system_ram,
> + unsigned long long *crash_size,
> + unsigned long long *crash_base,
> + unsigned long long *low_size)

I rethought yesterday that this function name is not self-annotated. In addition,
the meaning of the return value is not mainstream. It would be better to change it
to parse_crashkernel_high_low().

> +{
> + int ret, flag = CRASHKERNEL_MEM_NONE;
> +
> + BUG_ON(!crash_size || !crash_base || !low_size);
> +
> + /* crashkernel=X[@offset] */
> + ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
> + if (!ret && crash_size > 0)
> + return CRASHKERNEL_MEM_CLASSIC;
> +
> +#ifdef CONFIG_X86_64
> + /* crashkernel=X,high */
> + ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
> + if (ret || crash_size <= 0)
> + return CRASHKERNEL_MEM_NONE;
> +
> + flag = CRASHKERNEL_MEM_HIGH;
> +
> + /* crashkernel=Y,low */
> + ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
> + if (!ret)
> + flag |= CRASHKERNEL_MEM_LOW;
> +#endif
> +
> + return flag;
> +}
> +
> static void __init reserve_crashkernel(void)
> {
> unsigned long long crash_size, crash_base, total_mem;
>

2021-12-25 10:17:02

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>>> This is exactly why I say that making those functions generic and shared
>>> might not be such a good idea, after all, because then you'd have to
>>> sprinkle around arch-specific stuff.

Hi Borislav and all:
Merry Christmas!

I have a new idea now. It helps us get around all the arguments and
minimizes changes to the x86 (also to arm64).
Previously, Chen Zhou and I tried to share the entire function
reserve_crashkernel(), which led to the following series of problems:
1. reserve_crashkernel() is also defined on other architectures, so we should
add build option ARCH_WANT_RESERVE_CRASH_KERNEL to avoid conflicts.
2. Move xen_pv_domain() check out of reserve_crashkernel().
3. Move insert_resource() out of reserve_crashkernel()

Others:
4. start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
crash_base + crash_size);
Change SZ_1M to CRASH_ALIGN, or keep it no change.
The current conclusion is no change. But I think adding a new macro
CRASH_FIXED_ALIGN is also a way. 2M alignment allows page tables to
use block mappings for most architectures.
5. if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
Change (1ULL << 32) to CRASH_ADDR_LOW_MAX, or keep it no change.
I reanalyzed it, and this doesn't need to be changed.

So for 1-3,why not add a new function reserve_crashkernel_mem() and rename
reserve_crashkernel_low() to reserve_crashkernel_mem_low().
On x86:
static void __init reserve_crashkernel(void)
{
//Parse all "crashkernel=" configurations in priority order until
//a valid combination is found. Or return upon failure.

if (xen_pv_domain()) {
pr_info("Ignoring crashkernel for a Xen PV domain\n");
return;
}

//Call reserve_crashkernel_mem() to reserve crashkernel memory, it will
//call reserve_crashkernel_mem_low() if needed.

if (crashk_low_res.end)
insert_resource(&iomem_resource, &crashk_low_res);
insert_resource(&iomem_resource, &crashk_res);
}

On arm64:
static void __init reserve_crashkernel(void)
{
//Parse all "crashkernel=" configurations in priority order until
//a valid combination is found. Or return upon failure.

//Call reserve_crashkernel_mem() to reserve crashkernel memory, it will
//call reserve_crashkernel_mem_low() if needed.
}


1. reserve_crashkernel() is still static, so that there is no
need to add ARCH_WANT_RESERVE_CRASH_KERNEL.
2. The xen_pv_domain() check have not been affected in any way.
Hi Borislav:
As you mentioned, this check may also be needed on arm64. But it may be
better not to add it until the problem is actually triggered on arm64.
3. insert_resource() is not moved outside reserve_crashkernel() on x86.
Hi Borislav:
Currently, I haven't figured out why request_resource() can't be replaced
with insert_resource() on arm64. But I have a hunch that the kexec tool may
be involved. The cost of modification on arm64 is definitely higher than that
on x86. Other architectures that want to use reserve_crashkernel_mem() may
also face the same problem. So it's probably better that function
reserve_crashkernel_mem() doesn't invoke insert_resource().

I guess you have a long Christmas holiday. So I'm going to send the next version
without waiting for your response.


>> Yes, I'm thinking about that too. Perhaps they are not suitable for full
>> code sharing, but it looks like there's some code that can be shared.
>> For example, the function parse_crashkernel_in_order() that I extracted
>> based on your suggestion, it could also be parse_crashkernel_high_low().
>> Or the function reserve_crashkernel_low().
>>
>> There are two ways to reserve memory above 4G:
>> 1. Use crashkernel=X,high, with or without crashkernel=X,low
>> 2. Use crashkernel=X,[offset], but try low memory first. If failed, then
>> try high memory, and retry at least 256M low memory.
>>
>> I plan to only implement 2 in the next version so that there can be fewer
>> changes. Then implement 1 after 2 is applied.
> I tried it yesterday and it didn't work. I still have to deal with the
> problem of adjusting insert_resource().
>
> How about I isolate some cleanup patches first? Strive for them to be
> merged into v5.17. This way, we can focus on the core changes in the
> next version. And I can also save some repetitive rebase workload.
>

--
Regards,
Zhen Lei

2022-01-07 08:13:06

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()



On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/12/24 1:26, Borislav Petkov wrote:
>>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>>> From: Chen Zhou <[email protected]>
>>>>
>>>> We will make the functions reserve_crashkernel() as generic, the
>>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>>> x86,
>>>
>>> Why is that so? Is Xen-PV x86-only?
>>>
>>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>>
>>> Why?
>>>
>>> Looking at
>>>
>>> 0212f9159694 ("x86: Add Crash kernel low reservation")
>>>
>>> it *surprisingly* explains why that resources thing is being added:
>>>
>>> We need to add another range in /proc/iomem like "Crash kernel low",
>>> so kexec-tools could find that info and append to kdump kernel
>>> command line.
>>>
>>> Then,
>>>
>>> 157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>>
>>> renamed it because, as it states, kexec-tools was taught to handle
>>> multiple resources of the same name.
>>>
>>> So why does kexec-tools on arm *not* need those iomem resources? How
>>> does it parse the ranges there? Questions over questions...

Hi Borislav:
The reason why insert_resource() cannot be used in reserve_crashkernel[_low]()
on arm64 is clear. The parent resource node of crashk[_low]_res is added by
request_resource() in request_standard_resources(), so that it will be conflicted.
All request_resource() in request_standard_resources() should be changed to
insert_resource(), to make insert_resource() can be used in reserve_crashkernel[_low]().

I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in /proc/iomem")
in kexec-tools. I'm trying to port it to arm64, or make it generic.

Thanks.

>
> It's a good question worth figuring out. I'm going to dig into this.
> I admire your rigorous style and sharp vision.
>


--
Regards,
Zhen Lei

2022-01-07 13:09:35

by Zhen Lei

[permalink] [raw]
Subject: Re: [PATCH v18 02/17] x86/setup: Move xen_pv_domain() check and insert_resource() to setup_arch()



On 2022/1/7 16:13, Leizhen (ThunderTown) wrote:
>
>
> On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/12/24 1:26, Borislav Petkov wrote:
>>>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>>>> From: Chen Zhou <[email protected]>
>>>>>
>>>>> We will make the functions reserve_crashkernel() as generic, the
>>>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>>>> x86,
>>>>
>>>> Why is that so? Is Xen-PV x86-only?
>>>>
>>>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>>>
>>>> Why?
>>>>
>>>> Looking at
>>>>
>>>> 0212f9159694 ("x86: Add Crash kernel low reservation")
>>>>
>>>> it *surprisingly* explains why that resources thing is being added:
>>>>
>>>> We need to add another range in /proc/iomem like "Crash kernel low",
>>>> so kexec-tools could find that info and append to kdump kernel
>>>> command line.
>>>>
>>>> Then,
>>>>
>>>> 157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>>>
>>>> renamed it because, as it states, kexec-tools was taught to handle
>>>> multiple resources of the same name.
>>>>
>>>> So why does kexec-tools on arm *not* need those iomem resources? How
>>>> does it parse the ranges there? Questions over questions...
>
> Hi Borislav:
> The reason why insert_resource() cannot be used in reserve_crashkernel[_low]()
> on arm64 is clear. The parent resource node of crashk[_low]_res is added by
> request_resource() in request_standard_resources(), so that it will be conflicted.
> All request_resource() in request_standard_resources() should be changed to
> insert_resource(), to make insert_resource() can be used in reserve_crashkernel[_low]().
>
> I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in /proc/iomem")
> in kexec-tools. I'm trying to port it to arm64, or make it generic.

Chen Zhou's done it before. But the "Crash kernel (low)" can really be eliminated. Chen
Zhou just used it to distinguish whether the crashkernel memory range is crashkernel load
range or not. We can use get_crash_kernel_load_range() to get and check the load range.


>
> Thanks.
>
>>
>> It's a good question worth figuring out. I'm going to dig into this.
>> I admire your rigorous style and sharp vision.
>>
>
>

--
Regards,
Zhen Lei