2021-10-08 16:17:37

by Sourabh Jain

[permalink] [raw]
Subject: [PATCH 0/3] Update crashkernel offset to allow kernel to boot on large config LPARs

As the crashkernel reserve memory at 128MB offset in the first memory
block, it leaves less than 128MB memory to accommodate other essential
system resources that need memory reservation in the same block. This
creates kernel boot failure on large config LPARs having core count
greater than 192.

Setting the crashkernel to mid of RMA size which can be 512MB or more
instead of capping it to 128MB by default leaves enough space to allocate
memory to another system resource in the first memory block.

Now keeping the crashkernel at mid of RMA size works fine for the primary
kernel but creates boot failure for the kdump kernel when the crashekernel
reservation start offset crosses 256MB. The reason is, in the early boot
MMU feature of 1T segments support is not detected which restricts the paca
allocation for boot CPU below 256MB. When the crashkernel itself is
starting at 256MB offset, attempt to allocate paca below 256MB leads to the
kdump kernel boot failure.

Moving the detection of segment sizes before identifying the boot CPU
removes the restriction of 256MB limit for boot CPU paca allocation
which allows the kdump kernel to successfully boot and capture vmcore.

While allocating paca for boot CPU we found that there is a small window
during kernel boot where early_radix_enabled returns True even though
the radix is disabled using command-line. This leads to an invalid bolated
size calculation on which paca limit of boot CPU is dependent. Patch 0001
closes that window that by fixing the radix bit in mmu_feature.

changes in v2:
0001: Rename the function name to update_cpu_features() as per
the review comment.
0002: Fixed compilation error reported by kernel test robot.
0003: Added comment and updated commit message

Mahesh Salgaonkar (2):
fixup mmu_features immediately after getting cpu pa features.
Remove 256MB limit restriction for boot cpu paca allocation

Sourabh Jain (1):
powerpc: Set crashkernel offset to mid of RMA region

arch/powerpc/include/asm/book3s/64/mmu.h | 2 ++
arch/powerpc/include/asm/mmu.h | 1 +
arch/powerpc/kernel/prom.c | 8 ++++++++
arch/powerpc/kernel/rtas.c | 4 ++++
arch/powerpc/kexec/core.c | 15 +++++++++++----
arch/powerpc/mm/book3s64/hash_utils.c | 5 ++++-
arch/powerpc/mm/init_64.c | 5 ++++-
7 files changed, 34 insertions(+), 6 deletions(-)

--
2.31.1


2021-10-08 16:18:56

by Sourabh Jain

[permalink] [raw]
Subject: [PATCH 3/3] powerpc: Set crashkernel offset to mid of RMA region

On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

Signed-off-by: Sourabh Jain <[email protected]>
Reported-and-tested-by: Abdul haleem <[email protected]>
---
arch/powerpc/kernel/rtas.c | 4 ++++
arch/powerpc/kexec/core.c | 15 +++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index ff80bbad22a5..a49137727370 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1235,6 +1235,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
sizep = of_get_flat_dt_prop(node, "rtas-size", NULL);

+ /* need this feature to decide the crashkernel offset */
+ if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
+ powerpc_firmware_features |= FW_FEATURE_LPAR;
+
if (basep && entryp && sizep) {
rtas.base = *basep;
rtas.entry = *entryp;
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index 48525e8b5730..71b1bfdadd76 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -147,11 +147,18 @@ void __init reserve_crashkernel(void)
if (!crashk_res.start) {
#ifdef CONFIG_PPC64
/*
- * On 64bit we split the RMO in half but cap it at half of
- * a small SLB (128MB) since the crash kernel needs to place
- * itself and some stacks to be in the first segment.
+ * On the LPAR platform place the crash kernel to mid of
+ * RMA size (512MB or more) to ensure the crash kernel
+ * gets enough space to place itself and some stack to be
+ * in the first segment. At the same time normal kernel
+ * also get enough space to allocate memory for essential
+ * system resource in the first segment. Keep the crash
+ * kernel starts at 128MB offset on other platforms.
*/
- crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ crashk_res.start = ppc64_rma_size / 2;
+ else
+ crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
#else
crashk_res.start = KDUMP_KERNELBASE;
#endif
--
2.31.1

2021-10-08 16:19:57

by Sourabh Jain

[permalink] [raw]
Subject: [PATCH 2/3] Remove 256MB limit restriction for boot cpu paca allocation

From: Mahesh Salgaonkar <[email protected]>

At the time when we detect and allocate paca for boot cpu, we havn't yet
detected mmu feature of 1T segments support (not until
mmu_early_init_devtree() call). This causes ppc64_bolted_size() to return
256MB as limit forcing boot cpu paca allocation below 256MB always.

This works fine for kdump kernel boot as long as crashkernel reservation is
at offset below 256MB. But when we move kdump offset to 256MB or above,
kdump kernel fails to allocate paca for boot cpu below 256MB and crashes in
allocate_paca().

Moving the detection of segment sizes just before paca allocation for boot
cpu removes this restriction of 256MB limit. This allows kdump kernel to
successfully boot and capture vmcore.

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Sourabh Jain <[email protected]>
Reported-and-tested-by: Abdul haleem <[email protected]>
---
arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
arch/powerpc/kernel/prom.c | 6 ++++++
arch/powerpc/mm/book3s64/hash_utils.c | 5 ++++-
3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index d60be5051d60..9b05a84313bb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -199,6 +199,7 @@ extern int mmu_io_psize;
/* MMU initialization */
void update_cpu_features(void);
void mmu_early_init_devtree(void);
+void hash__early_detect_seg_size(void);
void hash__early_init_devtree(void);
void radix__early_init_devtree(void);
#ifdef CONFIG_PPC_PKEY
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 889c909e4ed4..5da2bfff4dea 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -385,6 +385,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
identical_pvr_fixup(node);
init_mmu_slb_size(node);

+#ifdef CONFIG_PPC_BOOK3S_64
+ /* Initialize segment sizes */
+ if (!early_radix_enabled())
+ hash__early_detect_seg_size();
+#endif
+
#ifdef CONFIG_PPC64
if (nthreads == 1)
cur_cpu_spec->cpu_features &= ~CPU_FTR_SMT;
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index c145776d3ae5..ef4fc6bb1b30 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1020,11 +1020,14 @@ static void __init htab_initialize(void)
#undef KB
#undef MB

-void __init hash__early_init_devtree(void)
+void __init hash__early_detect_seg_size(void)
{
/* Initialize segment sizes */
of_scan_flat_dt(htab_dt_scan_seg_sizes, NULL);
+}

+void __init hash__early_init_devtree(void)
+{
/* Initialize page sizes */
htab_scan_page_sizes();
}
--
2.31.1