There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve the crash kernel below 4G, which
will fail when there is not enough low memory.
2. If the crash kernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.
3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
devices in the crash dump kernel that need ZONE_DMA will fail to
allocate memory.
To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in the DMA zone and falls back to
high allocation if that fails.
If the required size X is too large and leaves very little low memory
in the DMA zone after low allocation, the system may not work normally.
So add a threshold and go for high allocation directly if the required
size is too large. The threshold is set to half of
the low memory.
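The decision described above can be sketched as follows. This is only a minimal illustration, assuming the total low memory is already known; the names crash_target and crash_first_target() are hypothetical and do not appear in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's own types. */
enum crash_target { CRASH_TRY_LOW, CRASH_TRY_HIGH };

/*
 * Pick where a crashkernel=X reservation is attempted first.
 * low_mem_total is the amount of memory below the low limit
 * (DMA zone, or DMA32 zone on non-RPi4 platforms).
 */
enum crash_target crash_first_target(uint64_t crash_size,
				     uint64_t low_mem_total,
				     bool high_requested)
{
	/* The threshold is half of the low memory. */
	uint64_t threshold = low_mem_total >> 1;

	assert(crash_size > 0);

	/* "crashkernel=X,high" or an oversized request skips the low attempt. */
	if (high_requested || crash_size > threshold)
		return CRASH_TRY_HIGH;

	/* Otherwise try low memory first; the caller falls back to high. */
	return CRASH_TRY_LOW;
}
```

With 4G of low memory, a 512M request would be tried low first, while a 3G request would go straight to high allocation.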
We can also use "crashkernel=X,high" to select a high region above
the DMA zone, which also tries to allocate at least 256M of low memory in
the DMA zone automatically.
"crashkernel=Y,low" can be used to allocate low memory of a specified size.
On non-RPi4 platforms, read "DMA zone" above as "DMA32 zone".
When the crash kernel is reserved in high memory, some low memory is also
reserved for crash dump kernel devices, so there may be two regions
reserved for the crash dump kernel.
To distinguish the low region from the high region without affecting
existing kexec-tools, name the low region "Crash kernel (low)"
and pass it to the crash dump kernel by reusing the DT property
"linux,usable-memory-range". The low memory region is the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.
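The ordering of the ranges can be sketched as below. This is a hedged illustration only: struct usable_range and build_usable_ranges() are hypothetical names, and the idea that a consumer reading only the first range still gets the main region is an assumption drawn from the compatibility remark above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one <base, size> entry of the property. */
struct usable_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Fill out[] in property order: the main crash kernel region first and,
 * if present, the "Crash kernel (low)" region last.
 * Returns the number of entries written (1 or 2).
 */
size_t build_usable_ranges(const struct usable_range *main_region,
			   const struct usable_range *low_region,
			   struct usable_range out[2])
{
	size_t n = 0;

	assert(main_region != NULL);
	out[n++] = *main_region;
	if (low_region != NULL && low_region->size != 0)
		out[n++] = *low_region;
	return n;
}
```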
Besides, kexec-tools needs to be modified:
arm64: support more than one crash kernel regions (see [1])
The documentation of DT property 'linux,usable-memory-range' is also updated:
schemas: update 'linux,usable-memory-range' node schema (see [2])
This patchset contains the following nine patches:
0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
0006-arm64-kdump-reimplement-crashkernel-X.patch
0007-kdump-add-threshold-for-the-required-memory.patch
0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
0009-kdump-update-Documentation-about-crashkernel.patch
0001-0003 are x86 cleanups which prepare for making the
functions reserve_crashkernel[_low]() generic.
0004 makes the functions reserve_crashkernel[_low]() generic.
0005-0006 reimplement crashkernel=X.
0007 adds a threshold for the required memory.
0008 adds memory for devices by DT property linux,usable-memory-range.
0009 updates the documentation.
Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
As suggested by Catalin, make the x86 function reserve_crashkernel() generic
and have arm64 use the generic version to reimplement crashkernel=X.
Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.
Changes since [v9]
- Add Acked-by from Dave to patch 1.
- Update patch 5 according to Dave's comments.
- Update chosen schema.
Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.
Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
as suggested by Arnd.
- Add Tested-by from John and pk.
Changes since [v6]
- Fix build errors reported by kbuild test robot.
Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is also specified, first reserve low memory of the
specified size for crash dump kernel devices and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.
Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.
Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.
Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() that I added in v1 and implement it
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the two-region case,
we cap the memory range to [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.
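The two-region capping can be sketched as below. This is a minimal illustration, not the code in fdt_enforce_memory_region(): struct crash_range and cap_two_regions() are hypothetical names, ranges are taken as end-exclusive, and the two regions are assumed not to overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type, end exclusive. */
struct crash_range {
	uint64_t start;
	uint64_t end;
};

/*
 * For two non-overlapping regions, compute the cap
 * [min(start), max(end)) and the hole between them that must be removed.
 * Returns true if the hole is non-empty.
 */
bool cap_two_regions(struct crash_range a, struct crash_range b,
		     struct crash_range *cap, struct crash_range *hole)
{
	/* Order the regions so that lo starts first. */
	struct crash_range lo = a.start <= b.start ? a : b;
	struct crash_range hi = a.start <= b.start ? b : a;

	assert(lo.start < lo.end && hi.start < hi.end);
	assert(lo.end <= hi.start);	/* regions must not overlap */

	cap->start = lo.start;
	cap->end = hi.end;
	hole->start = lo.end;
	hole->end = hi.start;
	return hole->start < hole->end;
}
```

After capping to the outer range and removing the hole, only the two crash kernel regions remain usable.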
[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
Chen Zhou (9):
x86: kdump: move CRASH_ALIGN to 2M
x86: kdump: make the lower bound of crash kernel reservation
consistent
x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
reserve_crashkernel[_low]()
x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
arm64: kdump: introduce some macroes for crash kernel reservation
arm64: kdump: reimplement crashkernel=X
kdump: add threshold for the required memory
arm64: kdump: add memory for devices by DT property
linux,usable-memory-range
kdump: update Documentation about crashkernel
Documentation/admin-guide/kdump/kdump.rst | 25 ++-
.../admin-guide/kernel-parameters.txt | 13 +-
arch/arm64/include/asm/kexec.h | 15 ++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/kernel/setup.c | 13 +-
arch/arm64/mm/init.c | 105 ++++------
arch/arm64/mm/mmu.c | 4 +
arch/x86/include/asm/kexec.h | 28 +++
arch/x86/kernel/setup.c | 165 +--------------
include/linux/crash_core.h | 4 +
include/linux/kexec.h | 2 -
kernel/crash_core.c | 192 ++++++++++++++++++
kernel/kexec_core.c | 17 --
13 files changed, 328 insertions(+), 256 deletions(-)
--
2.20.1
CONFIG_PHYSICAL_ALIGN can be set from 2M to 16M and defaults
to 2M, so move CRASH_ALIGN to 2M. With a smaller alignment the
reservation has a better chance of succeeding.
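The effect of the alignment can be illustrated with a simplified model. This sketch assumes a single free range and searches upward from its start; it is not memblock_find_in_range(), and find_aligned_base() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model: find the lowest align-aligned base in the
 * single free range [start, end) that fits size bytes.
 * Returns 0 when nothing fits; align must be a power of two.
 */
uint64_t find_aligned_base(uint64_t start, uint64_t end,
			   uint64_t size, uint64_t align)
{
	uint64_t base;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round start up to the next multiple of align. */
	base = (start + align - 1) & ~(align - 1);
	if (base < start || base >= end || end - base < size)
		return 0;
	return base;
}
```

For a free range from 18M to 32M and a 12M request, a 2M alignment fits at 18M, while a 16M alignment rounds the base up to 32M and nothing fits.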
Also replace the hard-coded alignment with macro CRASH_ALIGN in function
reserve_crashkernel().
Suggested-by: Dave Young <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
---
arch/x86/include/asm/kexec.h | 3 +++
arch/x86/kernel/setup.c | 5 +----
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 6802c59e8252..83f200dd54a1 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
# define KEXEC_CONTROL_CODE_MAX_SIZE 2048
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
#ifndef __ASSEMBLY__
#include <linux/string.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3511736fbc74..296294ad0dd8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -402,9 +402,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
#ifdef CONFIG_KEXEC_CORE
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN SZ_16M
-
/*
* Keep the crash kernel below this limit.
*
@@ -530,7 +527,7 @@ static void __init reserve_crashkernel(void)
start = memblock_find_in_range(crash_base,
crash_base + crash_size,
- crash_size, 1 << 20);
+ crash_size, CRASH_ALIGN);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
--
2.20.1
The lower bounds of crash kernel reservation and crash kernel low
reservation are different. Use the consistent value CRASH_ALIGN for both.
Suggested-by: Dave Young <[email protected]>
Signed-off-by: Chen Zhou <[email protected]>
---
arch/x86/kernel/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 296294ad0dd8..d7fd90c52dae 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
return 0;
}
- low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+ low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
(unsigned long)(low_size >> 20));
--
2.20.1
Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for the upper bound of low crash memory, and macro CRASH_ADDR_HIGH_MAX for
the upper bound of high crash memory, and use them instead of hard-coded values.
Besides, to stay consistent with x86, use CRASH_ALIGN as the lower bound
of crash kernel reservation.
Signed-off-by: Chen Zhou <[email protected]>
---
arch/arm64/include/asm/kexec.h | 6 ++++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/mm/init.c | 8 ++++----
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index d24b527e8c00..402d208265a3 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
#define KEXEC_ARCH KEXEC_ARCH_AARCH64
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma32_phys_limit
+#define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
+
#ifndef __ASSEMBLY__
/**
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 240fe5e5b720..af71063f352c 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -95,6 +95,7 @@
#endif /* CONFIG_ARM64_FORCE_52BIT */
extern phys_addr_t arm64_dma_phys_limit;
+extern phys_addr_t arm64_dma32_phys_limit;
#define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)
struct debug_info {
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..ad27dc4cc55e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -67,7 +67,7 @@ EXPORT_SYMBOL(vmemmap);
* bit addressable memory area.
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
-static phys_addr_t arm64_dma32_phys_limit __ro_after_init;
+phys_addr_t arm64_dma32_phys_limit __ro_after_init;
#ifdef CONFIG_KEXEC_CORE
/*
@@ -92,8 +92,8 @@ static void __init reserve_crashkernel(void)
if (crash_base == 0) {
/* Current arm64 boot protocol requires 2MB alignment */
- crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
- crash_size, SZ_2M);
+ crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
+ crash_size, CRASH_ALIGN);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
@@ -111,7 +111,7 @@ static void __init reserve_crashkernel(void)
return;
}
- if (!IS_ALIGNED(crash_base, SZ_2M)) {
+ if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
return;
}
--
2.20.1
For crashkernel=X, if the required size X is too large and leaves very
little free low memory after low allocation, the system may not work
normally.
So add a threshold and go for high allocation directly if the required
size is too large. The threshold is set to half of the
low memory.
Signed-off-by: Chen Zhou <[email protected]>
---
kernel/crash_core.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 3f735cb37ace..d11d597a470d 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -378,6 +378,15 @@ int __init reserve_crashkernel_low(void)
}
#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
+
+/*
+ * Add a threshold for required memory size of crashkernel. If required memory
+ * size is greater than threshold, just go for high allocation directly. The
+ * value of threshold is set as half of the total low memory.
+ */
+#define REQUIRED_MEMORY_THRESHOLD (memblock_mem_size(CRASH_ADDR_LOW_MAX >> \
+ PAGE_SHIFT) >> 1)
+
#ifdef CONFIG_KEXEC_CORE
/*
* reserve_crashkernel() - reserves memory for crash kernel
@@ -422,7 +431,7 @@ void __init reserve_crashkernel(void)
* So try low memory first and fall back to high memory
* unless "crashkernel=size[KMG],high" is specified.
*/
- if (!high)
+ if (!high && crash_size <= REQUIRED_MEMORY_THRESHOLD)
crash_base = memblock_find_in_range(CRASH_ALIGN,
CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
--
2.20.1
To make the functions reserve_crashkernel[_low]() generic,
replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
Signed-off-by: Chen Zhou <[email protected]>
---
arch/x86/kernel/setup.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7fd90c52dae..71a6a6e7ca5b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
unsigned long total_low_mem;
int ret;
- total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
+ total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
/* crashkernel=Y,low */
ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
@@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
return 0;
}
- low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
+ low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
(unsigned long)(low_size >> 20));
@@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
if (!crash_base) {
/*
* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
- * crashkernel=x,high reserves memory over 4G, also allocates
- * 256M extra low memory for DMA buffers and swiotlb.
+ * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+ * also allocates 256M extra low memory for DMA buffers
+ * and swiotlb.
* But the extra memory is not required for all machines.
* So try low memory first and fall back to high memory
* unless "crashkernel=size[KMG],high" is specified.
@@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
return;
}
- if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+ if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
memblock_free(crash_base, crash_size);
return;
}
--
2.20.1
For arm64, the behavior of crashkernel=X has been changed: it
tries low allocation in the DMA zone and falls back to high allocation
if that fails.
We can also use "crashkernel=X,high" to select a high region above
the DMA zone, which also tries to allocate at least 256M of low memory in
the DMA zone automatically.
"crashkernel=Y,low" can be used to allocate low memory of a specified size
in the DMA zone.
On non-RPi4 platforms, read "DMA zone" above as "DMA32 zone".
For x86 and arm64, we introduce a threshold for the required memory.
If the required size X is too large and leaves very little free low
memory after low allocation, the system may not work well.
So add a threshold and go for high allocation directly if the required
size is too large. The threshold is set to half of the low memory.
Update the documentation accordingly.
Signed-off-by: Chen Zhou <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 25 ++++++++++++++++---
.../admin-guide/kernel-parameters.txt | 13 ++++++++--
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..549611abc581 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,16 @@ Boot into System Kernel
"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
- On x86 and x86_64, use "crashkernel=64M@16M".
+ On x86 use "crashkernel=64M@16M".
+
+ On x86_64, use "crashkernel=X" to select a region under 4G first, and
+ fall back to reserve region above 4G. And go for high allocation
+ directly if the required size is too large.
+ We can also use "crashkernel=X,high" to select a region above 4G, which
+ also tries to allocate at least 256M below 4G automatically and
+ "crashkernel=Y,low" can be used to allocate specified size low memory.
+ Use "crashkernel=Y@X" if you really have to reserve memory from specified
+ start address X.
On ppc64, use "crashkernel=128M@32M".
@@ -316,8 +325,18 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.
- On arm64, use "crashkernel=Y[@X]". Note that the start address of
- the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+ On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
+ fall back to high allocation if it fails. And go for high allocation
+ directly if the required size is too large.
+ We can also use "crashkernel=X,high" to select a high region above
+ DMA zone, which also tries to allocate at least 256M low memory in
+ DMA zone automatically.
+ "crashkernel=Y,low" can be used to allocate specified size low memory
+ in DMA zone.
+ For non-RPi4 platforms, change DMA zone mentioned above to DMA32 zone.
+ Use "crashkernel=Y@X" if you really have to reserve memory from
+ specified start address X. Note that the start address of the kernel,
+ X if explicitly specified, must be aligned to 2MiB (0x200000).
Load the Dump-capture Kernel
============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..f7df572d8f64 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -727,6 +727,10 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+ [KNL, arm64] Try low allocation in DMA zone, fall back
+ to high allocation if it fails when '@offset' hasn't been
+ specified. For non-RPi4 platforms, change DMA zone to
+ DMA32 zone.
See Documentation/admin-guide/kdump/kdump.rst for further details.
crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -743,6 +747,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+ [KNL, arm64] range in high memory.
+ Allow kernel to allocate physical memory region from top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -751,13 +757,16 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
- at least 256M below 4G automatically.
+ least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+ [KNL, arm64] range in low memory.
+ This one let user to specify a low range in DMA zone for
+ crash dump kernel. For non-RPi4 platforms, change DMA zone
+ to DMA32 zone.
cryptomgr.notests
[KNL] Disable crypto self-tests
--
2.20.1
Hi,
On 09/07/20 at 09:47pm, Chen Zhou wrote:
> CONFIG_PHYSICAL_ALIGN can be set from 2M to 16M and defaults
> to 2M, so move CRASH_ALIGN to 2M. With a smaller alignment the
> reservation has a better chance of succeeding.
There still seems to be some misunderstanding about the change :( I'm sorry
if I did not explain it clearly.
Previously I missed that PHYSICAL_ALIGN can change according to .config.
What I meant is that we should change the value to CONFIG_PHYSICAL_ALIGN for x86.
I suggest moving back to 16M for now, and not changing it in
this series.
> And replace the hard-coded alignment with macro CRASH_ALIGN in function
> reserve_crashkernel().
>
> Suggested-by: Dave Young <[email protected]>
> Signed-off-by: Chen Zhou <[email protected]>
> ---
> arch/x86/include/asm/kexec.h | 3 +++
> arch/x86/kernel/setup.c | 5 +----
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 6802c59e8252..83f200dd54a1 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>
> # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>
> +/* 2M alignment for crash kernel regions */
> +#define CRASH_ALIGN SZ_2M
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/string.h>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3511736fbc74..296294ad0dd8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -402,9 +402,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>
> #ifdef CONFIG_KEXEC_CORE
>
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN SZ_16M
> -
> /*
> * Keep the crash kernel below this limit.
> *
> @@ -530,7 +527,7 @@ static void __init reserve_crashkernel(void)
>
> start = memblock_find_in_range(crash_base,
> crash_base + crash_size,
> - crash_size, 1 << 20);
> + crash_size, CRASH_ALIGN);
> if (start != crash_base) {
> pr_info("crashkernel reservation failed - memory is in use.\n");
> return;
> --
> 2.20.1
>
Thanks
Dave
On 2020/9/8 9:21, Dave Young wrote:
> Hi,
>
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> CONFIG_PHYSICAL_ALIGN can be set from 2M to 16M and defaults
>> to 2M, so move CRASH_ALIGN to 2M. With a smaller alignment the
>> reservation has a better chance of succeeding.
> There still seems to be some misunderstanding about the change :( I'm sorry
> if I did not explain it clearly.
>
> Previously I missed that PHYSICAL_ALIGN can change according to .config.
> What I meant is that we should change the value to CONFIG_PHYSICAL_ALIGN for x86.
> I suggest moving back to 16M for now, and not changing it in
> this series.
Hi Dave,
Sorry, I misunderstood this.
OK, this patch will keep the value of CRASH_ALIGN as it is,
and just move CRASH_ALIGN to the header asm/kexec.h and replace the hard-coded
alignment with macro CRASH_ALIGN in function reserve_crashkernel().
Thanks,
Chen Zhou
>
>> And replace the hard-coded alignment with macro CRASH_ALIGN in function
>> reserve_crashkernel().
>>
>> Suggested-by: Dave Young <[email protected]>
>> Signed-off-by: Chen Zhou <[email protected]>
>> ---
>> arch/x86/include/asm/kexec.h | 3 +++
>> arch/x86/kernel/setup.c | 5 +----
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 6802c59e8252..83f200dd54a1 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>
>> # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>>
>> +/* 2M alignment for crash kernel regions */
>> +#define CRASH_ALIGN SZ_2M
>> +
>> #ifndef __ASSEMBLY__
>>
>> #include <linux/string.h>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 3511736fbc74..296294ad0dd8 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -402,9 +402,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>
>> #ifdef CONFIG_KEXEC_CORE
>>
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN SZ_16M
>> -
>> /*
>> * Keep the crash kernel below this limit.
>> *
>> @@ -530,7 +527,7 @@ static void __init reserve_crashkernel(void)
>>
>> start = memblock_find_in_range(crash_base,
>> crash_base + crash_size,
>> - crash_size, 1 << 20);
>> + crash_size, CRASH_ALIGN);
>> if (start != crash_base) {
>> pr_info("crashkernel reservation failed - memory is in use.\n");
>> return;
>> --
>> 2.20.1
>>
> Thanks
> Dave
On 9/7/20 8:47 AM, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve the crash kernel below 4G, which
> will fail when there is not enough low memory.
> 2. If the crash kernel is reserved above 4G, the crash dump
> kernel will fail to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> devices in the crash dump kernel that need ZONE_DMA will fail to
> allocate memory.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in the DMA zone and falls back to
> high allocation if that fails.
> If the required size X is too large and leaves very little low memory
> in the DMA zone after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The threshold is set to half of
> the low memory.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate low memory of a specified size.
> On non-RPi4 platforms, read "DMA zone" above as "DMA32 zone".
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> To distinguish the low region from the high region without affecting
> existing kexec-tools, name the low region "Crash kernel (low)"
> and pass it to the crash dump kernel by reusing the DT property
> "linux,usable-memory-range". The low memory region is the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following nine patches:
> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-kdump-add-threshold-for-the-required-memory.patch
> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0009-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are x86 cleanups which prepare for making the
> functions reserve_crashkernel[_low]() generic.
>
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements crashkernel=X.
> 0007 adds threshold for the required memory.
> 0008 adds memory for devices by DT property linux,usable-memory-range.
> 0009 updates the doc.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from John and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is also specified, first reserve low memory of the
> specified size for crash dump kernel devices and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
>
> Chen Zhou (9):
> x86: kdump: move CRASH_ALIGN to 2M
> x86: kdump: make the lower bound of crash kernel reservation
> consistent
> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel[_low]()
> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> arm64: kdump: introduce some macroes for crash kernel reservation
> arm64: kdump: reimplement crashkernel=X
> kdump: add threshold for the required memory
> arm64: kdump: add memory for devices by DT property
> linux,usable-memory-range
> kdump: update Documentation about crashkernel
>
> Documentation/admin-guide/kdump/kdump.rst | 25 ++-
> .../admin-guide/kernel-parameters.txt | 13 +-
> arch/arm64/include/asm/kexec.h | 15 ++
> arch/arm64/include/asm/processor.h | 1 +
> arch/arm64/kernel/setup.c | 13 +-
> arch/arm64/mm/init.c | 105 ++++------
> arch/arm64/mm/mmu.c | 4 +
> arch/x86/include/asm/kexec.h | 28 +++
> arch/x86/kernel/setup.c | 165 +--------------
> include/linux/crash_core.h | 4 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 192 ++++++++++++++++++
> kernel/kexec_core.c | 17 --
> 13 files changed, 328 insertions(+), 256 deletions(-)
>
I did a brief unit-test on 5.9-rc4.
Please add:
Tested-by: John Donnelly <[email protected]>
This activity is over a year old. It needs to be accepted.
On 2020/9/7 21:47, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is not enough low memory.
> 2. If reserving crashkernel above 4G, the crash dump
> kernel will fail to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> devices in the crash dump kernel that need ZONE_DMA will fail to
> allocate.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in the DMA zone, and falls back to
> high allocation if it fails.
> If the required size X is too large and leaves very little low memory
> in the DMA zone after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The threshold is set to half of
> the low memory.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> For non-RPi4 platforms, change DMA zone mentioned above to DMA32 zone.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinguish it from the high region and not affect the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following nine patches:
> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-kdump-add-threshold-for-the-required-memory.patch
> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0009-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are some x86 cleanups which prepare for making
> functions reserve_crashkernel[_low]() generic.
>
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements crashkernel=X.
> 0007 adds threshold for the required memory.
> 0008 adds memory for devices by DT property linux,usable-memory-range.
> 0009 updates the doc.
Hi Catalin and Dave,
Any other suggestions about this patchset? Let me know if you have any questions.
Thanks,
Chen Zhou
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve the specified size of low
> memory for crash dump kernel devices first and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
>
> Chen Zhou (9):
> x86: kdump: move CRASH_ALIGN to 2M
> x86: kdump: make the lower bound of crash kernel reservation
> consistent
> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> reserve_crashkernel[_low]()
> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> arm64: kdump: introduce some macroes for crash kernel reservation
> arm64: kdump: reimplement crashkernel=X
> kdump: add threshold for the required memory
> arm64: kdump: add memory for devices by DT property
> linux,usable-memory-range
> kdump: update Documentation about crashkernel
>
> Documentation/admin-guide/kdump/kdump.rst | 25 ++-
> .../admin-guide/kernel-parameters.txt | 13 +-
> arch/arm64/include/asm/kexec.h | 15 ++
> arch/arm64/include/asm/processor.h | 1 +
> arch/arm64/kernel/setup.c | 13 +-
> arch/arm64/mm/init.c | 105 ++++------
> arch/arm64/mm/mmu.c | 4 +
> arch/x86/include/asm/kexec.h | 28 +++
> arch/x86/kernel/setup.c | 165 +--------------
> include/linux/crash_core.h | 4 +
> include/linux/kexec.h | 2 -
> kernel/crash_core.c | 192 ++++++++++++++++++
> kernel/kexec_core.c | 17 --
> 13 files changed, 328 insertions(+), 256 deletions(-)
>
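The reservation policy described in the cover letter quoted above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the patchset: choose_placement(), extra_low_size(), the enum and the 256M default constant are hypothetical names, the "more than half of low memory" comparison is one reading of the threshold wording, and low_alloc_ok stands in for whether a real low allocation would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/* Assumed default for the extra low region ("at least 256M" in the cover letter). */
#define DEFAULT_CRASH_LOW_SIZE MiB(256)

/* Where the main crash kernel region ends up in this sketch. */
enum crash_placement { CRASH_PLACE_LOW, CRASH_PLACE_HIGH };

/*
 * Hypothetical helper modelling crashkernel=X:
 *  - crashkernel=X,high goes straight to high memory;
 *  - a request above half of the low (DMA/DMA32) memory goes high directly;
 *  - otherwise try low first and fall back to high if that fails.
 */
static enum crash_placement choose_placement(uint64_t size,
					     uint64_t low_mem_total,
					     bool high_requested,
					     bool low_alloc_ok)
{
	assert(size > 0);

	if (high_requested)
		return CRASH_PLACE_HIGH;
	if (size > low_mem_total / 2)
		return CRASH_PLACE_HIGH;
	return low_alloc_ok ? CRASH_PLACE_LOW : CRASH_PLACE_HIGH;
}

/*
 * Size of the additional "Crash kernel (low)" region. It is only needed
 * when the main region is high; crashkernel=Y,low (low_param != 0)
 * overrides the assumed default.
 */
static uint64_t extra_low_size(enum crash_placement where, uint64_t low_param)
{
	if (where == CRASH_PLACE_LOW)
		return 0;
	return low_param ? low_param : DEFAULT_CRASH_LOW_SIZE;
}
```

With 2G of low memory, for example, a 512M request would stay low in this model, while a 1536M request exceeds the half-of-low-memory threshold and goes high, which in turn triggers the extra low reservation for devices.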
On 09/07/20 at 09:47pm, Chen Zhou wrote:
> To make the functions reserve_crashkernel[_low]() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>
> Signed-off-by: Chen Zhou <[email protected]>
> ---
> arch/x86/kernel/setup.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d7fd90c52dae..71a6a6e7ca5b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> unsigned long total_low_mem;
> int ret;
>
> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
total_low_mem != CRASH_ADDR_LOW_MAX
>
> /* crashkernel=Y,low */
> ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
The param total_low_mem is for dynamically changing crash_size according
to the system RAM size.
Is the above change a must for your arm64 patches?
> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
> return 0;
> }
>
> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
> if (!low_base) {
> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> (unsigned long)(low_size >> 20));
> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
> if (!crash_base) {
> /*
> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over 4G, also allocates
> - * 256M extra low memory for DMA buffers and swiotlb.
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> * But the extra memory is not required for all machines.
> * So try low memory first and fall back to high memory
> * unless "crashkernel=size[KMG],high" is specified.
> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
> return;
> }
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> memblock_free(crash_base, crash_size);
> return;
> }
> --
> 2.20.1
>
Hi Dave,
On 2020/9/18 11:01, Dave Young wrote:
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> To make the functions reserve_crashkernel[_low]() as generic,
>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>
>> Signed-off-by: Chen Zhou <[email protected]>
>> ---
>> arch/x86/kernel/setup.c | 11 ++++++-----
>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index d7fd90c52dae..71a6a6e7ca5b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>> unsigned long total_low_mem;
>> int ret;
>>
>> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> total_low_mem != CRASH_ADDR_LOW_MAX
I just replaced the magic number with the macro; there is no other change.
Besides, the function memblock_mem_size(limit_pfn) computes the memory size
according to the actual system RAM.
Thanks,
Chen Zhou
>
>>
>> /* crashkernel=Y,low */
>> ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
> The param total_low_mem is for dynamically change crash_size according
> to system ram size.
>
> Is above change a must for your arm64 patches?
See above.
>
>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>> return 0;
>> }
>>
>> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
>> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
>> if (!low_base) {
>> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> (unsigned long)(low_size >> 20));
>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>> if (!crash_base) {
>> /*
>> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> - * crashkernel=x,high reserves memory over 4G, also allocates
>> - * 256M extra low memory for DMA buffers and swiotlb.
>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> + * also allocates 256M extra low memory for DMA buffers
>> + * and swiotlb.
>> * But the extra memory is not required for all machines.
>> * So try low memory first and fall back to high memory
>> * unless "crashkernel=size[KMG],high" is specified.
>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>> return;
>> }
>>
>> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>> memblock_free(crash_base, crash_size);
>> return;
>> }
>> --
>> 2.20.1
>>
> .
>
On 09/18/20 at 11:57am, chenzhou wrote:
> Hi Dave,
>
>
> On 2020/9/18 11:01, Dave Young wrote:
> > On 09/07/20 at 09:47pm, Chen Zhou wrote:
> >> To make the functions reserve_crashkernel[_low]() as generic,
> >> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> >>
> >> Signed-off-by: Chen Zhou <[email protected]>
> >> ---
> >> arch/x86/kernel/setup.c | 11 ++++++-----
> >> 1 file changed, 6 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index d7fd90c52dae..71a6a6e7ca5b 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> >> unsigned long total_low_mem;
> >> int ret;
> >>
> >> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> > total_low_mem != CRASH_ADDR_LOW_MAX
> I just replace the magic number with macro, no other change.
> Besides, function memblock_mem_size(limit_pfn) will compute the memory size
> according to the actual system ram.
>
OK, it is not obvious from the patch that this is 64-bit only; I'm fine with this
then.
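The point settled in the exchange above is that the macro substitution leaves the page-frame limit passed to memblock_mem_size() unchanged on 64-bit. A small standalone check of that arithmetic follows. The constants are assumptions for illustration: 4 KiB pages, CRASH_ADDR_LOW_MAX equal to 4G on x86_64 and 512M on x86_32, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumption: 4 KiB pages */
#define CRASH_ADDR_LOW_MAX_64 (UINT64_C(1) << 32)	/* assumed x86_64 value, 4G */
#define CRASH_ADDR_LOW_MAX_32 (UINT64_C(512) << 20)	/* assumed x86_32 value, 512M */

/* Limit as written before the patch: 1UL << (32 - PAGE_SHIFT). */
static uint64_t old_limit_pfn(void)
{
	return UINT64_C(1) << (32 - PAGE_SHIFT);
}

/* Limit as written after the patch: CRASH_ADDR_LOW_MAX >> PAGE_SHIFT. */
static uint64_t new_limit_pfn(uint64_t crash_addr_low_max)
{
	assert(crash_addr_low_max != 0);
	return crash_addr_low_max >> PAGE_SHIFT;
}
```

Under these assumptions old_limit_pfn() and new_limit_pfn(CRASH_ADDR_LOW_MAX_64) are the same page-frame number, whereas the 32-bit value would give a smaller limit. That difference is why it matters that the function is only built for 64-bit.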
Hi,
On 09/07/20 at 09:47pm, Chen Zhou wrote:
> To make the functions reserve_crashkernel[_low]() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>
> Signed-off-by: Chen Zhou <[email protected]>
> ---
> arch/x86/kernel/setup.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d7fd90c52dae..71a6a6e7ca5b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> unsigned long total_low_mem;
> int ret;
>
> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
Just note that the replacement has already been done, partially, in another patch
from Mike Rapoport. He seems to have done the reserve_crashkernel_low()
part; there's one left in reserve_crashkernel(), so you might want to check
that.
Mike's patch, which is part of a patchset, has been merged into Andrew's next
tree.
commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
Author: Mike Rapoport <[email protected]>
Date: Wed Aug 26 15:22:32 2020 +1000
x86/setup: simplify reserve_crashkernel()
>
> /* crashkernel=Y,low */
> ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
> return 0;
> }
>
> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
> if (!low_base) {
> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> (unsigned long)(low_size >> 20));
> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
> if (!crash_base) {
> /*
> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> - * crashkernel=x,high reserves memory over 4G, also allocates
> - * 256M extra low memory for DMA buffers and swiotlb.
> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> + * also allocates 256M extra low memory for DMA buffers
> + * and swiotlb.
> * But the extra memory is not required for all machines.
> * So try low memory first and fall back to high memory
> * unless "crashkernel=size[KMG],high" is specified.
> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
> return;
> }
>
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> memblock_free(crash_base, crash_size);
> return;
> }
> --
> 2.20.1
>
Hi Baoquan,
On 2020/9/18 15:25, Baoquan He wrote:
> Hi,
>
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> To make the functions reserve_crashkernel[_low]() as generic,
>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>
>> Signed-off-by: Chen Zhou <[email protected]>
>> ---
>> arch/x86/kernel/setup.c | 11 ++++++-----
>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index d7fd90c52dae..71a6a6e7ca5b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>> unsigned long total_low_mem;
>> int ret;
>>
>> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> Just note that the replacement has been done in another patch from Mike
> Rapoport, partially. He seems to have done reserve_crashkernel_low()
> part, there's one left in reserve_crashkernel(), you might want to check
> that.
>
> Mike's patch which is from a patchset has been merged into Andrew's next
> tree.
>
> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
> Author: Mike Rapoport <[email protected]>
> Date: Wed Aug 26 15:22:32 2020 +1000
>
> x86/setup: simplify reserve_crashkernel()
Yeah, the function reserve_crashkernel() has been changed in the next tree.
Thanks for your review and reminder.
Thanks,
Chen Zhou
>
>>
>> /* crashkernel=Y,low */
>> ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>> return 0;
>> }
>>
>> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
>> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
>> if (!low_base) {
>> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>> (unsigned long)(low_size >> 20));
>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>> if (!crash_base) {
>> /*
>> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> - * crashkernel=x,high reserves memory over 4G, also allocates
>> - * 256M extra low memory for DMA buffers and swiotlb.
>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> + * also allocates 256M extra low memory for DMA buffers
>> + * and swiotlb.
>> * But the extra memory is not required for all machines.
>> * So try low memory first and fall back to high memory
>> * unless "crashkernel=size[KMG],high" is specified.
>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>> return;
>> }
>>
>> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>> memblock_free(crash_base, crash_size);
>> return;
>> }
>> --
>> 2.20.1
>>
> .
>
Hi Catalin,
On 2020/9/18 16:59, chenzhou wrote:
> Hi Baoquan,
>
> On 2020/9/18 15:25, Baoquan He wrote:
>> Hi,
>>
>> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>>> To make the functions reserve_crashkernel[_low]() as generic,
>>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>>
>>> Signed-off-by: Chen Zhou <[email protected]>
>>> ---
>>> arch/x86/kernel/setup.c | 11 ++++++-----
>>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index d7fd90c52dae..71a6a6e7ca5b 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>>> unsigned long total_low_mem;
>>> int ret;
>>>
>>> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
>> Just note that the replacement has been done in another patch from Mike
>> Rapoport, partially. He seems to have done reserve_crashkernel_low()
>> part, there's one left in reserve_crashkernel(), you might want to check
>> that.
>>
>> Mike's patch which is from a patchset has been merged into Andrew's next
>> tree.
>>
>> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
>> Author: Mike Rapoport <[email protected]>
>> Date: Wed Aug 26 15:22:32 2020 +1000
>>
>> x86/setup: simplify reserve_crashkernel()
As Baoquan said, some functions have been changed in the next tree.
Do I need to rebase on top of the next tree?
Thanks,
Chen Zhou
> Yeah, the function reserve_crashkernel() has been changed in the next tree.
> Thanks for your review and reminder.
>
> Thanks,
> Chen Zhou
>>>
>>> /* crashkernel=Y,low */
>>> ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>>> return 0;
>>> }
>>>
>>> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, CRASH_ALIGN);
>>> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, low_size, CRASH_ALIGN);
>>> if (!low_base) {
>>> pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>> (unsigned long)(low_size >> 20));
>>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>>> if (!crash_base) {
>>> /*
>>> * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>>> - * crashkernel=x,high reserves memory over 4G, also allocates
>>> - * 256M extra low memory for DMA buffers and swiotlb.
>>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>>> + * also allocates 256M extra low memory for DMA buffers
>>> + * and swiotlb.
>>> * But the extra memory is not required for all machines.
>>> * So try low memory first and fall back to high memory
>>> * unless "crashkernel=size[KMG],high" is specified.
>>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>>> return;
>>> }
>>>
>>> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>> memblock_free(crash_base, crash_size);
>>> return;
>>> }
>>> --
>>> 2.20.1
>>>
>> .
>>
> On Sep 15, 2020, at 2:16 AM, chenzhou <[email protected]> wrote:
>
>
>
> On 2020/9/7 21:47, Chen Zhou wrote:
>> There are the following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is not enough low memory.
>> 2. If reserving crashkernel above 4G, the crash dump
>> kernel will fail to boot because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
>> devices in the crash dump kernel that need ZONE_DMA will fail to
>> allocate.
>>
>> To solve these issues, change the behavior of crashkernel=X.
>> crashkernel=X tries low allocation in the DMA zone, and falls back to
>> high allocation if it fails.
>> If the required size X is too large and leaves very little low memory
>> in the DMA zone after low allocation, the system may not work normally.
>> So add a threshold and go for high allocation directly if the required
>> size is too large. The threshold is set to half of
>> the low memory.
>>
>> We can also use "crashkernel=X,high" to select a high region above
>> DMA zone, which also tries to allocate at least 256M low memory in
>> DMA zone automatically.
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>> For non-RPi4 platforms, change DMA zone mentioned above to DMA32 zone.
>>
>> When reserving crashkernel in high memory, some low memory is reserved
>> for crash dump kernel devices. So there may be two regions reserved for
>> crash dump kernel.
>> In order to distinguish it from the high region and not affect the use
>> of existing kexec-tools, rename the low region as "Crash kernel (low)",
>> and pass the low region by reusing DT property
>> "linux,usable-memory-range". We made the low memory region as the last
>> range of "linux,usable-memory-range" to keep compatibility with existing
>> user-space and older kdump kernels.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: support more than one crash kernel regions(see [1])
>>
>> Another update is document about DT property 'linux,usable-memory-range':
>> schemas: update 'linux,usable-memory-range' node schema(see [2])
>>
>> This patchset contains the following nine patches:
>> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
>> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
>> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
>> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
>> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
>> 0006-arm64-kdump-reimplement-crashkernel-X.patch
>> 0007-kdump-add-threshold-for-the-required-memory.patch
>> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
>> 0009-kdump-update-Documentation-about-crashkernel.patch
>>
>> 0001-0003 are some x86 cleanups which prepare for making
>> functions reserve_crashkernel[_low]() generic.
>>
>> 0004 makes functions reserve_crashkernel[_low]() generic.
>> 0005-0006 reimplements crashkernel=X.
>> 0007 adds threshold for the required memory.
>> 0008 adds memory for devices by DT property linux,usable-memory-range.
>> 0009 updates the doc.
> Hi Catalin and Dave,
Hi,
This patch set has been in progress since May 2019. When will it be accepted and integrated into an rc build?
>
> Any other suggestions about this patchset? Let me know if you have any questions.
>
> Thanks,
> Chen Zhou
>>
>> Changes since [v11]
>> - Rebased on top of 5.9-rc4.
>> - Make the function reserve_crashkernel() of x86 generic.
>> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
>> and arm64 use the generic version to reimplement crashkernel=X.
>>
>> Changes since [v10]
>> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>>
>> Changes since [v9]
>> - Patch 1 add Acked-by from Dave.
>> - Update patch 5 according to Dave's comments.
>> - Update chosen schema.
>>
>> Changes since [v8]
>> - Reuse DT property "linux,usable-memory-range".
>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
>> memory region.
>> - Fix kdump broken with ZONE_DMA reintroduced.
>> - Update chosen schema.
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt.
>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
>> suggested by Arnd.
>> - Add Tested-by from Jhon and pk.
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve the specified size of low
>> memory for crash dump kernel devices first and then reserve memory above 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
>> [2]: https://github.com/robherring/dt-schema/pull/19
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>> [v8]: https://lkml.org/lkml/2020/5/21/213
>> [v9]: https://lkml.org/lkml/2020/6/28/73
>> [v10]: https://lkml.org/lkml/2020/7/2/1443
>> [v11]: https://lkml.org/lkml/2020/8/1/150
>>
>> Chen Zhou (9):
>> x86: kdump: move CRASH_ALIGN to 2M
>> x86: kdump: make the lower bound of crash kernel reservation
>> consistent
>> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>> reserve_crashkernel[_low]()
>> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>> arm64: kdump: introduce some macroes for crash kernel reservation
>> arm64: kdump: reimplement crashkernel=X
>> kdump: add threshold for the required memory
>> arm64: kdump: add memory for devices by DT property
>> linux,usable-memory-range
>> kdump: update Documentation about crashkernel
>>
>> Documentation/admin-guide/kdump/kdump.rst | 25 ++-
>> .../admin-guide/kernel-parameters.txt | 13 +-
>> arch/arm64/include/asm/kexec.h | 15 ++
>> arch/arm64/include/asm/processor.h | 1 +
>> arch/arm64/kernel/setup.c | 13 +-
>> arch/arm64/mm/init.c | 105 ++++------
>> arch/arm64/mm/mmu.c | 4 +
>> arch/x86/include/asm/kexec.h | 28 +++
>> arch/x86/kernel/setup.c | 165 +--------------
>> include/linux/crash_core.h | 4 +
>> include/linux/kexec.h | 2 -
>> kernel/crash_core.c | 192 ++++++++++++++++++
>> kernel/kexec_core.c | 17 --
>> 13 files changed, 328 insertions(+), 256 deletions(-)
On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> On 9/7/20 8:47 AM, Chen Zhou wrote:
> > Chen Zhou (9):
> > x86: kdump: move CRASH_ALIGN to 2M
> > x86: kdump: make the lower bound of crash kernel reservation
> > consistent
> > x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > reserve_crashkernel[_low]()
> > x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > arm64: kdump: introduce some macroes for crash kernel reservation
> > arm64: kdump: reimplement crashkernel=X
> > kdump: add threshold for the required memory
> > arm64: kdump: add memory for devices by DT property
> > linux,usable-memory-range
> > kdump: update Documentation about crashkernel
[...]
> I did a brief unit-test on 5.9-rc4.
>
> Please add:
>
> Tested-by: John Donnelly <[email protected]>
Thanks for testing.
> This activity is over a year old. It needs accepted.
It's getting there, hopefully in 5.11. There are some minor tweaks to
address.
--
Catalin
On Mon, Sep 07, 2020 at 09:47:43PM +0800, Chen Zhou wrote:
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 3f735cb37ace..d11d597a470d 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -378,6 +378,15 @@ int __init reserve_crashkernel_low(void)
> }
>
> #if defined(CONFIG_X86) || defined(CONFIG_ARM64)
> +
> +/*
> + * Add a threshold for required memory size of crashkernel. If required memory
> + * size is greater than threshold, just go for high allocation directly. The
> + * value of threshold is set as half of the total low memory.
> + */
> +#define REQUIRED_MEMORY_THRESHOLD (memblock_mem_size(CRASH_ADDR_LOW_MAX >> \
> + PAGE_SHIFT) >> 1)
> +
> #ifdef CONFIG_KEXEC_CORE
> /*
> * reserve_crashkernel() - reserves memory for crash kernel
> @@ -422,7 +431,7 @@ void __init reserve_crashkernel(void)
> * So try low memory first and fall back to high memory
> * unless "crashkernel=size[KMG],high" is specified.
> */
> - if (!high)
> + if (!high && crash_size <= REQUIRED_MEMORY_THRESHOLD)
> crash_base = memblock_find_in_range(CRASH_ALIGN,
> CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
Since any change now is affecting the x86 semantics slightly, I'd
suggest you drop this patch. We can add it later if needed, once the
core changes are in.
Thinking about this, if one requires a crashkernel reservation that
allocates all of the ZONE_DMA, it would probably be noticed and explicit
,high/,low options can be used.
Note that we are also trying to make ZONE_DMA full 32-bit on non-RPi4
hardware.
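For illustration only, the decision made by the quoted hunk can be
sketched in userspace C as below. try_low_first() and MiB() are
hypothetical names, and total_low_mem stands in for the value the patch
derives via memblock_mem_size(); this is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * low region should be tried first.
 */
static bool try_low_first(uint64_t crash_size, uint64_t total_low_mem,
			  bool high)
{
	/* Threshold from the quoted patch: half of the total low memory. */
	uint64_t threshold = total_low_mem >> 1;

	return !high && crash_size <= threshold;
}
```

With 768M of low memory the threshold is 384M, so a 256M request would
try low memory first while a 786M request would go straight to high
memory.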
--
Catalin
On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..549611abc581 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
[...]
> @@ -316,8 +325,18 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>
> - On arm64, use "crashkernel=Y[@X]". Note that the start address of
> - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
> + On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
> + fall back to high allocation if it fails. And go for high allocation
> + directly if the required size is too large.
> + We can also use "crashkernel=X,high" to select a high region above
> + DMA zone, which also tries to allocate at least 256M low memory in
> + DMA zone automatically.
> + "crashkernel=Y,low" can be used to allocate specified size low memory
> + in DMA zone.
> + For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
I don't think we should mention non-RPi4 explicitly here. I don't even
understand what the suggestion is since the only way is to disable
ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
here.
> + Use "crashkernel=Y@X" if you really have to reserve memory from
> + specified start address X. Note that the start address of the kernel,
> + X if explicitly specified, must be aligned to 2MiB (0x200000).
>
> Load the Dump-capture Kernel
> ============================
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a1068742a6df..f7df572d8f64 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -727,6 +727,10 @@
> [KNL, X86-64] Select a region under 4G first, and
> fall back to reserve region above 4G when '@offset'
> hasn't been specified.
> + [KNL, arm64] Try low allocation in DMA zone, fall back
> + to high allocation if it fails when '@offset' hasn't been
> + specified. For non-RPi4 platforms, change DMA zone to
> + DMA32 zone.
Same here, unclear what "change DMA zone to DMA32 zone" means.
> See Documentation/admin-guide/kdump/kdump.rst for further details.
>
> crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -743,6 +747,8 @@
> Otherwise memory region will be allocated below 4G, if
> available.
> It will be ignored if crashkernel=X is specified.
> + [KNL, arm64] range in high memory.
> + Allow kernel to allocate physical memory region from top.
> crashkernel=size[KMG],low
> [KNL, X86-64] range under 4G. When crashkernel=X,high
> is passed, kernel could allocate physical memory region
> @@ -751,13 +757,16 @@
> requires at least 64M+32K low memory, also enough extra
> low memory is needed to make sure DMA buffers for 32-bit
> devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
> This one let user to specify own low range under 4G
> for second kernel instead.
> 0: to disable low allocation.
> It will be ignored when crashkernel=X,high is not used
> or memory reserved is below 4G.
> -
> + [KNL, arm64] range in low memory.
> + This one let user to specify a low range in DMA zone for
> + crash dump kernel. For non-RPi4 platforms, change DMA zone
> + to DMA32 zone.
And again here.
--
Catalin
On Fri, Sep 18, 2020 at 05:06:37PM +0800, chenzhou wrote:
> On 2020/9/18 16:59, chenzhou wrote:
> > On 2020/9/18 15:25, Baoquan He wrote:
> >> On 09/07/20 at 09:47pm, Chen Zhou wrote:
> >>> To make the functions reserve_crashkernel[_low]() as generic,
> >>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> >>>
> >>> Signed-off-by: Chen Zhou <[email protected]>
> >>> ---
> >>> arch/x86/kernel/setup.c | 11 ++++++-----
> >>> 1 file changed, 6 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >>> index d7fd90c52dae..71a6a6e7ca5b 100644
> >>> --- a/arch/x86/kernel/setup.c
> >>> +++ b/arch/x86/kernel/setup.c
> >>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> >>> unsigned long total_low_mem;
> >>> int ret;
> >>>
> >>> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >>> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> >> Just note that the replacement has been done in another patch from Mike
> >> Rapoport, partially. He seems to have done reserve_crashkernel_low()
> >> part, there's one left in reserve_crashkernel(), you might want to check
> >> that.
> >>
> >> Mike's patch which is from a patchset has been merged into Andrew's next
> >> tree.
> >>
> >> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
> >> Author: Mike Rapoport <[email protected]>
> >> Date: Wed Aug 26 15:22:32 2020 +1000
> >>
> >> x86/setup: simplify reserve_crashkernel()
> As Baoquan said, some functions have been changed in the next tree,
> if i need to rebase on top of the next tree.
Please rebase at 5.10-rc1 when the x86 change will probably be in and
aim to queue this series for 5.11.
Thanks.
--
Catalin
Hi Catalin, Chen,
On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas <[email protected]> wrote:
>
> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> > On 9/7/20 8:47 AM, Chen Zhou wrote:
> > > Chen Zhou (9):
> > > x86: kdump: move CRASH_ALIGN to 2M
> > > x86: kdump: make the lower bound of crash kernel reservation
> > > consistent
> > > x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > > reserve_crashkernel[_low]()
> > > x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > > arm64: kdump: introduce some macroes for crash kernel reservation
> > > arm64: kdump: reimplement crashkernel=X
> > > kdump: add threshold for the required memory
> > > arm64: kdump: add memory for devices by DT property
> > > linux,usable-memory-range
> > > kdump: update Documentation about crashkernel
> [...]
> > I did a brief unit-test on 5.9-rc4.
> >
> > Please add:
> >
> > Tested-by: John Donnelly <[email protected]>
>
> Thanks for testing.
>
> > This activity is over a year old. It needs accepted.
>
> It's getting there, hopefully in 5.11. There are some minor tweaks to
> address.
I think my earlier email with the test results on this series bounced
off the mailing list server (for some weird reason), but I still see
several issues with this patchset. I will add specific issues in the
review comments for each patch again, but overall, with a crashkernel
size of say 786M, I see the following issue:
# cat /proc/cmdline
BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
I see two regions of size 786M and 256M reserved in high and low
regions respectively, so we reserve a total of 1042M of memory, which
is incorrect behaviour:
# dmesg | grep -i crash
[ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
[ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
# cat /proc/iomem | grep -i crash
b0000000-bfffffff : Crash kernel (low)
bfcbe00000-bffcffffff : Crash kernel
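As a cross-check of the two /proc/iomem ranges above, a minimal
userspace sketch that turns one "start-end : name" line into a size in
MiB. iomem_region_mib() is a hypothetical helper written for this
example; the only assumption taken from the output is that the end
address is inclusive.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one "start-end : name" line in the
 * /proc/iomem format and return the region size in MiB, or 0 if the
 * line cannot be parsed. The end address is inclusive.
 */
static uint64_t iomem_region_mib(const char *line)
{
	uint64_t start, end;

	if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) != 2 ||
	    end < start)
		return 0;
	return (end - start + 1) >> 20;
}
```

Fed the two lines above, this gives 256 for "Crash kernel (low)" and 786
for "Crash kernel", i.e. 1042M in total.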
IMO, we should test this feature more before including this in 5.11
Thanks,
Bhupesh
On 2020/10/6 1:12, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 09:47:43PM +0800, Chen Zhou wrote:
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 3f735cb37ace..d11d597a470d 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -378,6 +378,15 @@ int __init reserve_crashkernel_low(void)
>> }
>>
>> #if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>> +
>> +/*
>> + * Add a threshold for required memory size of crashkernel. If required memory
>> + * size is greater than threshold, just go for high allocation directly. The
>> + * value of threshold is set as half of the total low memory.
>> + */
>> +#define REQUIRED_MEMORY_THRESHOLD (memblock_mem_size(CRASH_ADDR_LOW_MAX >> \
>> + PAGE_SHIFT) >> 1)
>> +
>> #ifdef CONFIG_KEXEC_CORE
>> /*
>> * reserve_crashkernel() - reserves memory for crash kernel
>> @@ -422,7 +431,7 @@ void __init reserve_crashkernel(void)
>> * So try low memory first and fall back to high memory
>> * unless "crashkernel=size[KMG],high" is specified.
>> */
>> - if (!high)
>> + if (!high && crash_size <= REQUIRED_MEMORY_THRESHOLD)
>> crash_base = memblock_find_in_range(CRASH_ALIGN,
>> CRASH_ADDR_LOW_MAX,
>> crash_size, CRASH_ALIGN);
> Since any change now is affecting the x86 semantics slightly, I'd
> suggest you drop this patch. We can add it later if needed, once the
> core changes are in.
OK, I will drop this patch in the next version.
Thanks,
Chen Zhou
>
> Thinking about this, if one requires a crashkernel reservation that
> allocates all of the ZONE_DMA, it would probably be noticed and explicit
> ,high/,low options can be used.
>
> Note that we are also trying to make ZONE_DMA full 32-bit on non-RPi4
> hardware.
>
Hi Bhupesh,
On 2020/10/6 1:42, Bhupesh Sharma wrote:
> Hi Catalin, Chen,
>
> On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas <[email protected]> wrote:
>> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
>>> On 9/7/20 8:47 AM, Chen Zhou wrote:
>>>> Chen Zhou (9):
>>>> x86: kdump: move CRASH_ALIGN to 2M
>>>> x86: kdump: make the lower bound of crash kernel reservation
>>>> consistent
>>>> x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>>>> reserve_crashkernel[_low]()
>>>> x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>>>> arm64: kdump: introduce some macroes for crash kernel reservation
>>>> arm64: kdump: reimplement crashkernel=X
>>>> kdump: add threshold for the required memory
>>>> arm64: kdump: add memory for devices by DT property
>>>> linux,usable-memory-range
>>>> kdump: update Documentation about crashkernel
>> [...]
>>> I did a brief unit-test on 5.9-rc4.
>>>
>>> Please add:
>>>
>>> Tested-by: John Donnelly <[email protected]>
>> Thanks for testing.
>>
>>> This activity is over a year old. It needs accepted.
>> It's getting there, hopefully in 5.11. There are some minor tweaks to
>> address.
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
>
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..>
> rd.lvm.lv=<..snip..> crashkernel=786M
>
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
>
> # dmesg | grep -i crash
> [ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel
> (System low RAM: 768MB)
> [ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel
> (System RAM: 130816MB)
> [ 0.000000] Kernel command line:
> BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+
> root=/dev/mapper/rhel_ampere--hr330a--03-root ro
> rd.lvm.lv=rhel_ampere-hr330a-03/root
> rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>
> # cat /proc/iomem | grep -i crash
> b0000000-bfffffff : Crash kernel (low)
> bfcbe00000-bffcffffff : Crash kernel
>
> IMO, we should test this feature more before including this in 5.11
Thanks for your test. This behavior is what we want. What do you think the correct behavior is?
Besides, this feature has been tested by John and PK, and I have tested it with various parameters.
We may have missed something; any comments are welcome.
Thanks,
Chen Zhou
>
> Thanks,
> Bhupesh
>
> .
>
Hi Catalin,
On 2020/10/6 1:19, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
>> index 2da65fef2a1c..549611abc581 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
> [...]
>> @@ -316,8 +325,18 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>
>> - On arm64, use "crashkernel=Y[@X]". Note that the start address of
>> - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
>> + On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
>> + fall back to high allocation if it fails. And go for high allocation
>> + directly if the required size is too large.
>> + We can also use "crashkernel=X,high" to select a high region above
>> + DMA zone, which also tries to allocate at least 256M low memory in
>> + DMA zone automatically.
>> + "crashkernel=Y,low" can be used to allocate specified size low memory
>> + in DMA zone.
>> + For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
> I don't think we should mention non-RPi4 explicitly here. I don't even
> understand what the suggestion is since the only way is to disable
> ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
> here.
How about something like this:
If the kernel config option ZONE_DMA is disabled, just try low allocation in the DMA32 zone
and high allocation above the DMA32 zone.
Thanks,
Chen Zhou
>
>> + Use "crashkernel=Y@X" if you really have to reserve memory from
>> + specified start address X. Note that the start address of the kernel,
>> + X if explicitly specified, must be aligned to 2MiB (0x200000).
>>
>> Load the Dump-capture Kernel
>> ============================
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index a1068742a6df..f7df572d8f64 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -727,6 +727,10 @@
>> [KNL, X86-64] Select a region under 4G first, and
>> fall back to reserve region above 4G when '@offset'
>> hasn't been specified.
>> + [KNL, arm64] Try low allocation in DMA zone, fall back
>> + to high allocation if it fails when '@offset' hasn't been
>> + specified. For non-RPi4 platforms, change DMA zone to
>> + DMA32 zone.
> Same here, unclear what "change DMA zone to DMA32 zone" means.
>
>> See Documentation/admin-guide/kdump/kdump.rst for further details.
>>
>> crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -743,6 +747,8 @@
>> Otherwise memory region will be allocated below 4G, if
>> available.
>> It will be ignored if crashkernel=X is specified.
>> + [KNL, arm64] range in high memory.
>> + Allow kernel to allocate physical memory region from top.
>> crashkernel=size[KMG],low
>> [KNL, X86-64] range under 4G. When crashkernel=X,high
>> is passed, kernel could allocate physical memory region
>> @@ -751,13 +757,16 @@
>> requires at least 64M+32K low memory, also enough extra
>> low memory is needed to make sure DMA buffers for 32-bit
>> devices won't run out. Kernel would try to allocate at
>> - at least 256M below 4G automatically.
>> + least 256M below 4G automatically.
>> This one let user to specify own low range under 4G
>> for second kernel instead.
>> 0: to disable low allocation.
>> It will be ignored when crashkernel=X,high is not used
>> or memory reserved is below 4G.
>> -
>> + [KNL, arm64] range in low memory.
>> + This one let user to specify a low range in DMA zone for
>> + crash dump kernel. For non-RPi4 platforms, change DMA zone
>> + to DMA32 zone.
> And again here.
>
On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
>
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
>
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
>
> # dmesg | grep -i crash
> [ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> [ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> [ 0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>
> # cat /proc/iomem | grep -i crash
> b0000000-bfffffff : Crash kernel (low)
> bfcbe00000-bffcffffff : Crash kernel
As Chen said, that's the intended behaviour and how x86 works. The
requested 768M goes in the high range if there's not enough low memory
and an additional buffer for swiotlb is allocated, hence the low 256M.
We could (as an additional patch) subtract the 256M from the high
allocation so that you'd get a low 256M and a high 512M, but I'm not sure
it's worth it. Note that with a "crashkernel=768M,high" option, you still get
the additional low 256M, otherwise the crashkernel won't be able to
boot as there's no memory in ZONE_DMA. In the explicit ",high" request
case, I'm not sure subtracting the 256M is more intuitive.
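To make the two options concrete, here is a small userspace sketch,
taking the 768M and 256M figures used above. struct crash_split and
split_request() are hypothetical names for this example only, and the
rule of subtracting only when the request exceeds the low part is an
assumption, not something the series implements.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

struct crash_split {
	uint64_t high;	/* bytes reserved above the DMA zone */
	uint64_t low;	/* bytes reserved in the DMA zone */
};

/*
 * Hypothetical helper: with subtract_low == false the low part is
 * added on top of the request; with subtract_low == true it is
 * carved out of the request instead.
 */
static struct crash_split split_request(uint64_t requested,
					 uint64_t low_size,
					 bool subtract_low)
{
	struct crash_split s = { .high = requested, .low = low_size };

	/* Assumption: only subtract when the request exceeds the low part. */
	if (subtract_low && requested > low_size)
		s.high = requested - low_size;
	return s;
}
```

Without subtraction a 768M request ends up as 768M high plus 256M low
(1024M in total); with subtraction it would be 512M high plus 256M low.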
In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
to cover the entire 32-bit address space (i.e. identical to the current
ZONE_DMA32).
> IMO, we should test this feature more before including this in 5.11
Definitely. That's one of the reasons we haven't queued it yet. So any
help with testing here is appreciated.
Thanks.
--
Catalin
Hi Catalin,
On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <[email protected]> wrote:
>
> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > I think my earlier email with the test results on this series bounced
> > off the mailing list server (for some weird reason), but I still see
> > several issues with this patchset. I will add specific issues in the
> > review comments for each patch again, but overall, with a crashkernel
> > size of say 786M, I see the following issue:
> >
> > # cat /proc/cmdline
> > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
> >
> > I see two regions of size 786M and 256M reserved in low and high
> > regions respectively, So we reserve a total of 1042M of memory, which
> > is an incorrect behaviour:
> >
> > # dmesg | grep -i crash
> > [ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> > [ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> > [ 0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
> >
> > # cat /proc/iomem | grep -i crash
> > b0000000-bfffffff : Crash kernel (low)
> > bfcbe00000-bffcffffff : Crash kernel
>
> As Chen said, that's the intended behaviour and how x86 works. The
> requested 768M goes in the high range if there's not enough low memory
> and an additional buffer for swiotlb is allocated, hence the low 256M.
I understand, but why 256M (as low) for arm64? x86_64 setups usually
have more system memory available as compared to several commercially
available arm64 setups. So is the intent, just to keep the behavior
similar between arm64 and x86_64?
Should we have a CONFIG option / bootarg to help one select the max
'low_size'? Currently the 'low_size' value is calculated as:
    /*
     * two parts from kernel/dma/swiotlb.c:
     * -swiotlb size: user-specified with swiotlb= or default.
     *
     * -swiotlb overflow buffer: now hardcoded to 32k. We round it
     * to 8M for other buffers that may need to stay low too. Also
     * make sure we allocate enough extra low memory so that we
     * don't run out of DMA buffers for 32-bit devices.
     */
    low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
bootargs, the low_size still ends up being 256M in such cases,
whereas this 256M could be used for other purposes. So should we
be limiting this to 64M and failing the crash kernel allocation
request (gracefully) otherwise?
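The effect of the max() above can be seen in a small userspace sketch.
default_low_size() is a hypothetical name, and swiotlb_bytes stands in
for whatever swiotlb_size_or_default() returns; the real return value
for a given bootarg is not modelled here.

```c
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Hypothetical userspace model of the quoted low_size expression.
 * swiotlb_bytes is an assumption standing in for the kernel's
 * swiotlb_size_or_default().
 */
static uint64_t default_low_size(uint64_t swiotlb_bytes)
{
	uint64_t want = swiotlb_bytes + MiB(8);
	uint64_t minimum = MiB(256);

	return want > minimum ? want : minimum;
}
```

Any swiotlb size up to 248M, including 0, yields the 256M floor; only a
larger swiotlb raises low_size above it.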
> We could (as an additional patch), subtract the 256M from the high
> allocation so that you'd get a low 256M and a high 512M, not sure it's
> worth it. Note that with a "crashkernel=768M,high" option, you still get
> the additional low 256M, otherwise the crashkernel won't be able to
> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
> case, I'm not sure subtracted the 256M is more intuitive.
> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
> to cover the entire 32-bit address space (i.e. identical to the current
> ZONE_DMA32).
>
> > IMO, we should test this feature more before including this in 5.11
>
> Definitely. That's one of the reasons we haven't queued it yet. So any
> help with testing here is appreciated.
Sure, I am running more checks on this series. I will be soon back
with more updates.
Regards,
Bhupesh
On Tue, Oct 06, 2020 at 10:10:54AM +0800, chenzhou wrote:
> On 2020/10/6 1:19, Catalin Marinas wrote:
> > On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
> >> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> >> index 2da65fef2a1c..549611abc581 100644
> >> --- a/Documentation/admin-guide/kdump/kdump.rst
> >> +++ b/Documentation/admin-guide/kdump/kdump.rst
> > [...]
> >> @@ -316,8 +325,18 @@ Boot into System Kernel
> >> kernel will automatically locate the crash kernel image within the
> >> first 512MB of RAM if X is not given.
> >>
> >> - On arm64, use "crashkernel=Y[@X]". Note that the start address of
> >> - the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
> >> + On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
> >> + fall back to high allocation if it fails. And go for high allocation
> >> + directly if the required size is too large.
> >> + We can also use "crashkernel=X,high" to select a high region above
> >> + DMA zone, which also tries to allocate at least 256M low memory in
> >> + DMA zone automatically.
> >> + "crashkernel=Y,low" can be used to allocate specified size low memory
> >> + in DMA zone.
> >> + For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
> > I don't think we should mention non-RPi4 explicitly here. I don't even
> > understand what the suggestion is since the only way is to disable
> > ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
> > here.
> How about like this:
> If the kernel config ZONE_DMA is disabled, just try low allocation in DMA32 zone
> and high allocation above DMA32 zone.
Something like: "allocate 256M low memory in the DMA zone automatically
(or the DMA32 zone if CONFIG_ZONE_DMA is disabled)".
I'd keep it short.
--
Catalin
On Wed, Oct 07, 2020 at 12:37:49PM +0530, Bhupesh Sharma wrote:
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <[email protected]> wrote:
> > On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > > I think my earlier email with the test results on this series bounced
> > > off the mailing list server (for some weird reason), but I still see
> > > several issues with this patchset. I will add specific issues in the
> > > review comments for each patch again, but overall, with a crashkernel
> > > size of say 786M, I see the following issue:
> > >
> > > # cat /proc/cmdline
> > > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
> > >
> > > I see two regions of size 786M and 256M reserved in low and high
> > > regions respectively, So we reserve a total of 1042M of memory, which
> > > is an incorrect behaviour:
> > >
> > > # dmesg | grep -i crash
> > > [ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
> > > [ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
> > > [ 0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
> > >
> > > # cat /proc/iomem | grep -i crash
> > > b0000000-bfffffff : Crash kernel (low)
> > > bfcbe00000-bffcffffff : Crash kernel
> >
> > As Chen said, that's the intended behaviour and how x86 works. The
> > requested 768M goes in the high range if there's not enough low memory
> > and an additional buffer for swiotlb is allocated, hence the low 256M.
>
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?
Similar in the sense of the fallback to high memory and some low memory
allocation but the amounts can vary per architecture.
> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
>
> /*
> * two parts from kernel/dma/swiotlb.c:
> * -swiotlb size: user-specified with swiotlb= or default.
> *
> * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> * to 8M for other buffers that may need to stay low too. Also
> * make sure we allocate enough extra low memory so that we
> * don't run out of DMA buffers for 32-bit devices.
> */
> low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?
I think it makes sense to set low_size = 0 if
swiotlb_size_or_default() is 0. The assumption would be that if the main
kernel doesn't need an swiotlb, the crashdump one wouldn't need it
either. But this probably needs the ZONE_DMA for non-RPi4 platforms
addressed as well (expanded to the whole ZONE_DMA32).
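A rough userspace sketch of that idea, for discussion only.
proposed_low_size() is a hypothetical name, swiotlb_bytes stands in for
swiotlb_size_or_default(), and nothing like this is part of the posted
series.

```c
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/*
 * Hypothetical variant of the low_size calculation: skip the low
 * reservation entirely when the main kernel runs without an swiotlb.
 */
static uint64_t proposed_low_size(uint64_t swiotlb_bytes)
{
	uint64_t want;

	/* Assumption: no swiotlb in the main kernel, none needed for kdump. */
	if (swiotlb_bytes == 0)
		return 0;

	want = swiotlb_bytes + MiB(8);
	return want > MiB(256) ? want : MiB(256);
}
```

The only difference from the existing expression is the early return
for a zero swiotlb size; all other inputs keep the 256M floor.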
--
Catalin
Hi Bhupesh,
On 2020/10/7 15:07, Bhupesh Sharma wrote:
> Hi Catalin,
>
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas <[email protected]> wrote:
>> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
>>> I think my earlier email with the test results on this series bounced
>>> off the mailing list server (for some weird reason), but I still see
>>> several issues with this patchset. I will add specific issues in the
>>> review comments for each patch again, but overall, with a crashkernel
>>> size of say 786M, I see the following issue:
>>>
>>> # cat /proc/cmdline
>>> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> rd.lvm.lv=<..snip..> crashkernel=786M
>>>
>>> I see two regions of size 786M and 256M reserved in low and high
>>> regions respectively, So we reserve a total of 1042M of memory, which
>>> is an incorrect behaviour:
>>>
>>> # dmesg | grep -i crash
>>> [ 0.000000] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
>>> [ 0.000000] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
>>> [ 0.000000] Kernel command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ root=/dev/mapper/rhel_ampere--hr330a--03-root ro rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>>>
>>> # cat /proc/iomem | grep -i crash
>>> b0000000-bfffffff : Crash kernel (low)
>>> bfcbe00000-bffcffffff : Crash kernel
>> As Chen said, that's the intended behaviour and how x86 works. The
>> requested 768M goes in the high range if there's not enough low memory
>> and an additional buffer for swiotlb is allocated, hence the low 256M.
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?
>
> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
>
> /*
> * two parts from kernel/dma/swiotlb.c:
> * -swiotlb size: user-specified with swiotlb= or default.
> *
> * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> * to 8M for other buffers that may need to stay low too. Also
> * make sure we allocate enough extra low memory so that we
> * don't run out of DMA buffers for 32-bit devices.
> */
> low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?
>
>> We could (as an additional patch), subtract the 256M from the high
>> allocation so that you'd get a low 256M and a high 512M, not sure it's
>> worth it. Note that with a "crashkernel=768M,high" option, you still get
>> the additional low 256M, otherwise the crashkernel won't be able to
>> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
>> case, I'm not sure subtracted the 256M is more intuitive.
>> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
>> to cover the entire 32-bit address space (i.e. identical to the current
>> ZONE_DMA32).
>>
>>> IMO, we should test this feature more before including this in 5.11
>> Definitely. That's one of the reasons we haven't queued it yet. So any
>> help with testing here is appreciated.
> Sure, I am running more checks on this series. I will be soon back
> with more updates.
Sorry to bother you. I am looking forward to your review comments.
Thanks,
Chen Zhou
>
> Regards,
> Bhupesh
>
> .
>