# apologies for a bit late updates
This patch series is a set of bug fixes to address kexec/kdump
failures which are sometimes observed on ACPI-only system and reported
in LAK-ML before.
In short, the phenomena are:
1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
by a new kernel (or other data) being loaded into System RAM. Currently
kexec may possibly allocate space ignoring such "reserved" regions.
We will see no messages after "Bye!"
2. crash dump (kdump) kernel can fail to boot and get into panic due to
an alignment fault when accessing ACPI tables. This can happen because
those tables are not always properly aligned while they are mapped
non-cacheable (ioremap'ed) as they are not recognized as part of System
RAM under the current implementation.
After discussing several possibilities to address those issues,
the agreed approach, in my understanding, is
* to add resource entries for every "reserved", i.e. memblock_reserve(),
regions to /proc/iomem.
(NOMAP regions, also marked as "reserved," remains at top-level for
backward compatibility.)
* For case (1), user space (kexec-tools) should rule out such regions
in searching for free space for loaded data.
* For case (2), the kernel should access ACPI tables by mapping
them with appropriate memory attributes described in UEFI memory map.
(This means that it doesn't require any changes in /proc/iomem, and
hence user space.)
Please find past discussions about /proc/iomem in [1].
Patch#1 addresses kexec case, for which you are also required to update
user space. See necessary patches in [2]. If you want to review Patch#1,
please also take a look at and review [2].
Patch#2 and #3 addresses kdump case. This is a revised version after
Ard's comments.[3]
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565980.html
[2] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/resv_mem
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/573655.html
AKASHI Takahiro (2):
arm64: acpi,efi: fix alignment fault in accessing ACPI tables at kdump
init: map UEFI memory map early if on arm or arm64
James Morse (1):
arm64: export memblock_reserve()d regions via /proc/iomem
arch/arm64/include/asm/acpi.h | 23 ++++++++++++------
arch/arm64/kernel/acpi.c | 11 +++------
arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++
drivers/firmware/efi/arm-runtime.c | 27 ++++++++++-----------
init/main.c | 3 +++
5 files changed, 72 insertions(+), 30 deletions(-)
--
2.17.0
From: James Morse <[email protected]>
There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.
Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.
Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.
Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)
We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
| 80080000-80ffffff : Kernel code
| 81000000-8158ffff : reserved
| 81590000-8237efff : Kernel data
| a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM
reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.
Reported-by: Bhupesh Sharma <[email protected]>
Reported-by: Tyler Baicar <[email protected]>
Suggested-by: Akashi Takahiro <[email protected]>
Signed-off-by: James Morse <[email protected]>
Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
CC: Ard Biesheuvel <[email protected]>
CC: Mark Rutland <[email protected]>
---
arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..5b4fac434c84 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
}
}
+static int __init reserve_memblock_reserved_regions(void)
+{
+ phys_addr_t start, end, roundup_end = 0;
+ struct resource *mem, *res;
+ u64 i;
+
+ for_each_reserved_mem_region(i, &start, &end) {
+ if (end <= roundup_end)
+ continue; /* done already */
+
+ start = __pfn_to_phys(PFN_DOWN(start));
+ end = __pfn_to_phys(PFN_UP(end)) - 1;
+ roundup_end = end;
+
+ res = kzalloc(sizeof(*res), GFP_ATOMIC);
+ if (WARN_ON(!res))
+ return -ENOMEM;
+ res->start = start;
+ res->end = end;
+ res->name = "reserved";
+ res->flags = IORESOURCE_MEM;
+
+ mem = request_resource_conflict(&iomem_resource, res);
+ /*
+ * We expected memblock_reserve() regions to conflict with
+ * memory created by request_standard_resources().
+ */
+ if (WARN_ON_ONCE(!mem))
+ continue;
+ kfree(res);
+
+ reserve_region_with_split(mem, start, end, "reserved");
+ }
+
+ return 0;
+}
+arch_initcall(reserve_memblock_reserved_regions);
+
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
void __init setup_arch(char **cmdline_p)
--
2.17.0
As ACPI tables may not always be properly aligned, those regions should
be mapped cacheable in order to allow the kernel safe access to them.
UEFI memory map contains necessary information for mappings, and we want
to make sure that it should get accessible before any acpi_os_ioremap()'s.
So, in this patch, efi_enter_virtual_mode(), which was previously named
efi_enable_runtime_services() and invoked via early_initcall on arm/arm64,
is now moved early enough as the first access will occur in
acpi_load_tables() of acpi_early_init().
See a relevant commit:
arm64: acpi,efi: fix alignment fault in accessing ACPI tables at kdump
Signed-off-by: AKASHI Takahiro <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Cc: Andrew Morton <[email protected]>
---
init/main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..532fc0d02353 100644
--- a/init/main.c
+++ b/init/main.c
@@ -694,6 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
debug_objects_mem_init();
setup_per_cpu_pageset();
numa_policy_init();
+ if (IS_ENABLED(CONFIG_EFI) &&
+ (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)))
+ efi_enter_virtual_mode();
acpi_early_init();
if (late_time_init)
late_time_init();
--
2.17.0
This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."
(kernel messages after panic kicked off kdump)
(snip...)
Bye!
(snip...)
ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
Internal error: Oops: 96000021 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
task: ffff000008d05180 task.stack: ffff000008cc0000
PC is at acpi_ns_lookup+0x25c/0x3c0
LR is at acpi_ds_load1_begin_op+0xa4/0x294
(snip...)
Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
Call trace:
(snip...)
[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
---[ end trace c46ed37f9651c58e ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
"ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
memory as MEMBLOCK_NOMAP"), those regions are differently handled
as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
any unaligned access to ACPI tables can cause a fatal panic.
With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.
Please note that arm_enable_runtime_services() is now renamed to
efi_enter_virtual_mode() due to the similarity to x86's.
Signed-off-by: AKASHI Takahiro <[email protected]>
Suggested-by: James Morse <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Reported-by and Tested-by: Bhupesh Sharma <[email protected]>
(for older version)
---
arch/arm64/include/asm/acpi.h | 23 ++++++++++++++++-------
arch/arm64/kernel/acpi.c | 11 +++--------
drivers/firmware/efi/arm-runtime.c | 27 ++++++++++++---------------
3 files changed, 31 insertions(+), 30 deletions(-)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..d53c95f4e1a9 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -12,10 +12,12 @@
#ifndef _ASM_ACPI_H
#define _ASM_ACPI_H
+#include <linux/efi.h>
#include <linux/memblock.h>
#include <linux/psci.h>
#include <asm/cputype.h>
+#include <asm/io.h>
#include <asm/smp_plat.h>
#include <asm/tlbflush.h>
@@ -29,18 +31,22 @@
/* Basic configuration for ACPI */
#ifdef CONFIG_ACPI
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
+
/* ACPI table mapping after acpi_permanent_mmap is set */
static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
acpi_size size)
{
+ /* For normal memory we already have a cacheable mapping. */
+ if (memblock_is_map_memory(phys))
+ return (void __iomem *)__phys_to_virt(phys);
+
/*
- * EFI's reserve_regions() call adds memory with the WB attribute
- * to memblock via early_init_dt_add_memory_arch().
+ * We should still honor the memory's attribute here because
+ * crash dump kernel possibly excludes some ACPI (reclaim)
+ * regions from memblock list.
*/
- if (!memblock_is_memory(phys))
- return ioremap(phys, size);
-
- return ioremap_cache(phys, size);
+ return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
}
#define acpi_os_ioremap acpi_os_ioremap
@@ -125,7 +131,10 @@ static inline const char *acpi_get_enable_method(int cpu)
* for compatibility.
*/
#define acpi_disable_cmcff 1
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+ return __acpi_get_mem_attribute(addr);
+}
#endif /* CONFIG_ACPI_APEI */
#ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..ed46dc188b22 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -18,6 +18,7 @@
#include <linux/acpi.h>
#include <linux/bootmem.h>
#include <linux/cpumask.h>
+#include <linux/efi.h>
#include <linux/efi-bgrt.h>
#include <linux/init.h>
#include <linux/irq.h>
@@ -29,13 +30,9 @@
#include <asm/cputype.h>
#include <asm/cpu_ops.h>
+#include <asm/pgtable.h>
#include <asm/smp_plat.h>
-#ifdef CONFIG_ACPI_APEI
-# include <linux/efi.h>
-# include <asm/pgtable.h>
-#endif
-
int acpi_noirq = 1; /* skip ACPI IRQ initialization */
int acpi_disabled = 1;
EXPORT_SYMBOL(acpi_disabled);
@@ -239,8 +236,7 @@ void __init acpi_boot_table_init(void)
}
}
-#ifdef CONFIG_ACPI_APEI
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
{
/*
* According to "Table 8 Map: EFI memory types to AArch64 memory
@@ -261,4 +257,3 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
return __pgprot(PROT_NORMAL_NC);
return __pgprot(PROT_DEVICE_nGnRnE);
}
-#endif
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..566ef0a9edb5 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
* non-early mapping of the UEFI system table and virtual mappings for all
* EFI_MEMORY_RUNTIME regions.
*/
-static int __init arm_enable_runtime_services(void)
+void __init efi_enter_virtual_mode(void)
{
u64 mapsize;
if (!efi_enabled(EFI_BOOT)) {
pr_info("EFI services will not be available.\n");
- return 0;
+ return;
+ }
+
+ mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
+
+ if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
+ pr_err("Failed to remap EFI memory map\n");
+ return;
}
if (efi_runtime_disabled()) {
pr_info("EFI runtime services will be disabled.\n");
- return 0;
+ return;
}
if (efi_enabled(EFI_RUNTIME_SERVICES)) {
pr_info("EFI runtime services access via paravirt.\n");
- return 0;
+ return;
}
pr_info("Remapping and enabling EFI services.\n");
- mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
-
- if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
- pr_err("Failed to remap EFI memory map\n");
- return -ENOMEM;
- }
-
if (!efi_virtmap_init()) {
pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
- return -ENOMEM;
+ return;
}
/* Set up runtime services function pointers */
efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
-
- return 0;
}
-early_initcall(arm_enable_runtime_services);
void efi_virtmap_load(void)
{
--
2.17.0
Hi Akashi,
Thanks for the patchset - we have been waiting for quite some time for
this fix so that crashkernel can boot on arm64 machines which support
boot'ing via ACPI tables.
I have tested this on my huawei-taishan arm64 board, so:
Tested-by: Bhupesh Sharma <[email protected]>
BTW, if possible I would suggest to use:
Reported-by: Bhupesh Sharma <[email protected]>
rather than:
Reported-by: Bhupesh Sharma <[email protected]>
Thanks,
Bhupesh
On Fri, Jun 15, 2018 at 1:26 PM, AKASHI Takahiro
<[email protected]> wrote:
> # apologies for a bit late updates
>
> This patch series is a set of bug fixes to address kexec/kdump
> failures which are sometimes observed on ACPI-only system and reported
> in LAK-ML before.
>
> In short, the phenomena are:
> 1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
> by a new kernel (or other data) being loaded into System RAM. Currently
> kexec may possibly allocate space ignoring such "reserved" regions.
> We will see no messages after "Bye!"
>
> 2. crash dump (kdump) kernel can fail to boot and get into panic due to
> an alignment fault when accessing ACPI tables. This can happen because
> those tables are not always properly aligned while they are mapped
> non-cacheable (ioremap'ed) as they are not recognized as part of System
> RAM under the current implementation.
>
> After discussing several possibilities to address those issues,
> the agreed approach, in my understanding, is
> * to add resource entries for every "reserved", i.e. memblock_reserve(),
> regions to /proc/iomem.
> (NOMAP regions, also marked as "reserved," remains at top-level for
> backward compatibility.)
> * For case (1), user space (kexec-tools) should rule out such regions
> in searching for free space for loaded data.
> * For case (2), the kernel should access ACPI tables by mapping
> them with appropriate memory attributes described in UEFI memory map.
> (This means that it doesn't require any changes in /proc/iomem, and
> hence user space.)
>
> Please find past discussions about /proc/iomem in [1].
>
> Patch#1 addresses kexec case, for which you are also required to update
> user space. See necessary patches in [2]. If you want to review Patch#1,
> please also take a look at and review [2].
>
> Patch#2 and #3 addresses kdump case. This is a revised version after
> Ard's comments.[3]
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565980.html
> [2] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/resv_mem
> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/573655.html
>
> AKASHI Takahiro (2):
> arm64: acpi,efi: fix alignment fault in accessing ACPI tables at kdump
> init: map UEFI memory map early if on arm or arm64
>
> James Morse (1):
> arm64: export memblock_reserve()d regions via /proc/iomem
>
> arch/arm64/include/asm/acpi.h | 23 ++++++++++++------
> arch/arm64/kernel/acpi.c | 11 +++------
> arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++
> drivers/firmware/efi/arm-runtime.c | 27 ++++++++++-----------
> init/main.c | 3 +++
> 5 files changed, 72 insertions(+), 30 deletions(-)
>
> --
> 2.17.0
>
Hi Akashi,
Thanks for putting this together,
On 15/06/18 08:56, AKASHI Takahiro wrote:
> This patch series is a set of bug fixes to address kexec/kdump
> failures which are sometimes observed on ACPI-only system and reported
> in LAK-ML before.
>
> In short, the phenomena are:
> 1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
> by a new kernel (or other data) being loaded into System RAM. Currently
> kexec may possibly allocate space ignoring such "reserved" regions.
> We will see no messages after "Bye!"
>
> 2. crash dump (kdump) kernel can fail to boot and get into panic due to
> an alignment fault when accessing ACPI tables. This can happen because
> those tables are not always properly aligned while they are mapped
> non-cacheable (ioremap'ed) as they are not recognized as part of System
> RAM under the current implementation.
>
> After discussing several possibilities to address those issues,
> the agreed approach, in my understanding, is
> * to add resource entries for every "reserved", i.e. memblock_reserve(),
> regions to /proc/iomem.
> (NOMAP regions, also marked as "reserved," remains at top-level for
> backward compatibility.)
This means user-space can tell the difference between reserved-system-ram and
reserved-address-space.
> * For case (1), user space (kexec-tools) should rule out such regions
> in searching for free space for loaded data.
... but doesn't today, because it fails to account for second-level entries.
We've always had second-level entries, so this is a user-space bug. We need both
fixed to fix the issue.
Our attempts to fix this just in the kernel reached a dead end, because Kdump
needs to include reserved-system-ram, whereas kexec has to avoid it. User-space
needs to be able to tell reserved-system-ram and reserved-address-space apart.
Hence we need to expose that information, and pick it up in user-space.
Patched-kernel and unpatch-user-space will work the same way it does today, as
the additional reserved regions are ignored by user-space.
Unpatched-kernel and patched-user-space will also work the same way it does
today as the additional reserved regions are missing.
I think this is the only way forwards on this issue...
> * For case (2), the kernel should access ACPI tables by mapping
> them with appropriate memory attributes described in UEFI memory map.
> (This means that it doesn't require any changes in /proc/iomem, and
> hence user space.)
(this one is handled entirely in the kernel)
Thanks,
James
Hi Akashi,
On 15/06/18 08:56, AKASHI Takahiro wrote:
> This is a fix against the issue that crash dump kernel may hang up
> during booting, which can happen on any ACPI-based system with "ACPI
> Reclaim Memory."
>
> (kernel messages after panic kicked off kdump)
> (snip...)
> Bye!
> (snip...)
> ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> Internal error: Oops: 96000021 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> task: ffff000008d05180 task.stack: ffff000008cc0000
> PC is at acpi_ns_lookup+0x25c/0x3c0
> LR is at acpi_ds_load1_begin_op+0xa4/0x294
> (snip...)
> Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> Call trace:
> (snip...)
> [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> ---[ end trace c46ed37f9651c58e ]---
> Kernel panic - not syncing: Fatal exception
> Rebooting in 10 seconds..
>
> (diagnosis)
> * This fault is a data abort, alignment fault (ESR=0x96000021)
> during reading out ACPI table.
> * Initial ACPI tables are normally stored in system ram and marked as
> "ACPI Reclaim memory" by the firmware.
> * After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
> memory as MEMBLOCK_NOMAP"), those regions are differently handled
> as they are "memblock-reserved", without NOMAP bit.
> * So they are now excluded from device tree's "usable-memory-range"
> which kexec-tools determines based on a current view of /proc/iomem.
> * When crash dump kernel boots up, it tries to accesses ACPI tables by
> mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
> since they are no longer part of mapped system ram.
> * Given that ACPI accessor/helper functions are compiled in without
> unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
> any unaligned access to ACPI tables can cause a fatal panic.
>
> With this patch, acpi_os_ioremap() always honors memory attribute
> information provided by the firmware (EFI) and retaining cacheability
> allows the kernel safe access to ACPI tables.
> Please note that arm_enable_runtime_services() is now renamed to
> efi_enter_virtual_mode() due to the similarity to x86's.
Just a rename?:
> drivers/firmware/efi/arm-runtime.c | 27 ++++++++++++---------------
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 32f465a80e4e..d53c95f4e1a9 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -29,18 +31,22 @@
>
> /* Basic configuration for ACPI */
> #ifdef CONFIG_ACPI
> +pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
> +
> /* ACPI table mapping after acpi_permanent_mmap is set */
> static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
> acpi_size size)
> {
> + /* For normal memory we already have a cacheable mapping. */
> + if (memblock_is_map_memory(phys))
> + return (void __iomem *)__phys_to_virt(phys);
> /*
> - * EFI's reserve_regions() call adds memory with the WB attribute
> - * to memblock via early_init_dt_add_memory_arch().
> + * We should still honor the memory's attribute here because
> + * crash dump kernel possibly excludes some ACPI (reclaim)
> + * regions from memblock list.
> */
(Even without kdump we would still need this. Regions ACPI wants mapped may not
be covered by the linear map. In this case we need to use the attributes
firmware described in the UEFI memory map. Kdump exacerbates this by
artificially reducing the range of the linear map.)
> - if (!memblock_is_memory(phys))
> - return ioremap(phys, size);
> -
> - return ioremap_cache(phys, size);
> + return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
> }
> diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
> index 5889cbea60b8..566ef0a9edb5 100644
> --- a/drivers/firmware/efi/arm-runtime.c
> +++ b/drivers/firmware/efi/arm-runtime.c
> @@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
> * non-early mapping of the UEFI system table and virtual mappings for all
> * EFI_MEMORY_RUNTIME regions.
> */
> -static int __init arm_enable_runtime_services(void)
> +void __init efi_enter_virtual_mode(void)
> {
> u64 mapsize;
>
> if (!efi_enabled(EFI_BOOT)) {
> pr_info("EFI services will not be available.\n");
> - return 0;
> + return;
> + }
> +
> + mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> +
> + if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> + pr_err("Failed to remap EFI memory map\n");
> + return;
> }
>
> if (efi_runtime_disabled()) {
> pr_info("EFI runtime services will be disabled.\n");
> - return 0;
> + return;
> }
>
> if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> pr_info("EFI runtime services access via paravirt.\n");
> - return 0;
> + return;
> }
>
> pr_info("Remapping and enabling EFI services.\n");
>
> - mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> -
> - if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> - pr_err("Failed to remap EFI memory map\n");
> - return -ENOMEM;
> - }
> -
> if (!efi_virtmap_init()) {
> pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
> - return -ENOMEM;
> + return;
> }
>
> /* Set up runtime services function pointers */
> efi_native_runtime_setup();
> set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> -
> - return 0;
> }
Please have the drivers/firmware/efi/arm-runtime.c changes in a separate patch
(maybe combine it with patch 3). The 'efi/arm: ' prefix is more likely to catch
the maintainers attention.
I think this is what Ard meant by:
| Could you please move the changes to this file and init/main.c into a
| separate patch?
https://patchwork.kernel.org/patch/10361761/
> -early_initcall(arm_enable_runtime_services);
With just this patch, surely nothing ever calls arm_enable_runtime_services(),
and now acpi_os_ioremap() will return device memory for anything that isn't part
of the linear region. (This breaks RAS).
This will make it difficult to bisect through for any RAS or
efi-runtime-services issue. Its easily fixed: please put the efi+init changes in
a patch before the acpi_os_ioremap() changes.
Otherwise, looks good to me!
Thanks,
James
James,
Thank you for follow-up explanation.
I have nothing to add :)
-Takahiro AKASHI
On Fri, Jun 15, 2018 at 05:29:32PM +0100, James Morse wrote:
> Hi Akashi,
>
> Thanks for putting this together,
>
> On 15/06/18 08:56, AKASHI Takahiro wrote:
> > This patch series is a set of bug fixes to address kexec/kdump
> > failures which are sometimes observed on ACPI-only system and reported
> > in LAK-ML before.
> >
> > In short, the phenomena are:
> > 1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
> > by a new kernel (or other data) being loaded into System RAM. Currently
> > kexec may possibly allocate space ignoring such "reserved" regions.
> > We will see no messages after "Bye!"
> >
> > 2. crash dump (kdump) kernel can fail to boot and get into panic due to
> > an alignment fault when accessing ACPI tables. This can happen because
> > those tables are not always properly aligned while they are mapped
> > non-cacheable (ioremap'ed) as they are not recognized as part of System
> > RAM under the current implementation.
> >
> > After discussing several possibilities to address those issues,
> > the agreed approach, in my understanding, is
> > * to add resource entries for every "reserved", i.e. memblock_reserve(),
> > regions to /proc/iomem.
> > (NOMAP regions, also marked as "reserved," remains at top-level for
> > backward compatibility.)
>
> This means user-space can tell the difference between reserved-system-ram and
> reserved-address-space.
>
>
> > * For case (1), user space (kexec-tools) should rule out such regions
> > in searching for free space for loaded data.
>
> ... but doesn't today, because it fails to account for second-level entries.
> We've always had second-level entries, so this is a user-space bug. We need both
> fixed to fix the issue.
>
> Our attempts to fix this just in the kernel reached a dead end, because Kdump
> needs to include reserved-system-ram, whereas kexec has to avoid it. User-space
> needs to be able to tell reserved-system-ram and reserved-address-space apart.
> Hence we need to expose that information, and pick it up in user-space.
>
> Patched-kernel and unpatch-user-space will work the same way it does today, as
> the additional reserved regions are ignored by user-space.
>
> Unpatched-kernel and patched-user-space will also work the same way it does
> today as the additional reserved regions are missing.
>
> I think this is the only way forwards on this issue...
>
>
> > * For case (2), the kernel should access ACPI tables by mapping
> > them with appropriate memory attributes described in UEFI memory map.
> > (This means that it doesn't require any changes in /proc/iomem, and
> > hence user space.)
>
> (this one is handled entirely in the kernel)
>
>
> Thanks,
>
> James
James,
On Fri, Jun 15, 2018 at 05:30:08PM +0100, James Morse wrote:
> Hi Akashi,
>
> On 15/06/18 08:56, AKASHI Takahiro wrote:
> > This is a fix against the issue that crash dump kernel may hang up
> > during booting, which can happen on any ACPI-based system with "ACPI
> > Reclaim Memory."
> >
> > (kernel messages after panic kicked off kdump)
> > (snip...)
> > Bye!
> > (snip...)
> > ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > Internal error: Oops: 96000021 [#1] SMP
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > task: ffff000008d05180 task.stack: ffff000008cc0000
> > PC is at acpi_ns_lookup+0x25c/0x3c0
> > LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > (snip...)
> > Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> > Call trace:
> > (snip...)
> > [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> > [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> > [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> > [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> > [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> > [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> > [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> > [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> > [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> > [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> > [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> > [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> > [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > ---[ end trace c46ed37f9651c58e ]---
> > Kernel panic - not syncing: Fatal exception
> > Rebooting in 10 seconds..
> >
> > (diagnosis)
> > * This fault is a data abort, alignment fault (ESR=0x96000021)
> > during reading out ACPI table.
> > * Initial ACPI tables are normally stored in system ram and marked as
> > "ACPI Reclaim memory" by the firmware.
> > * After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
> > memory as MEMBLOCK_NOMAP"), those regions are differently handled
> > as they are "memblock-reserved", without NOMAP bit.
> > * So they are now excluded from device tree's "usable-memory-range"
> > which kexec-tools determines based on a current view of /proc/iomem.
> > * When crash dump kernel boots up, it tries to accesses ACPI tables by
> > mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
> > since they are no longer part of mapped system ram.
> > * Given that ACPI accessor/helper functions are compiled in without
> > unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
> > any unaligned access to ACPI tables can cause a fatal panic.
> >
> > With this patch, acpi_os_ioremap() always honors memory attribute
> > information provided by the firmware (EFI) and retaining cacheability
> > allows the kernel safe access to ACPI tables.
>
>
> > Please note that arm_enable_runtime_services() is now renamed to
> > efi_enter_virtual_mode() due to the similarity to x86's.
>
> Just a rename?:
and maps EFI memory map whether or not runtime service is enabled.
> > drivers/firmware/efi/arm-runtime.c | 27 ++++++++++++---------------
>
>
>
> > diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> > index 32f465a80e4e..d53c95f4e1a9 100644
> > --- a/arch/arm64/include/asm/acpi.h
> > +++ b/arch/arm64/include/asm/acpi.h
> > @@ -29,18 +31,22 @@
> >
> > /* Basic configuration for ACPI */
> > #ifdef CONFIG_ACPI
> > +pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
> > +
> > /* ACPI table mapping after acpi_permanent_mmap is set */
> > static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
> > acpi_size size)
> > {
> > + /* For normal memory we already have a cacheable mapping. */
> > + if (memblock_is_map_memory(phys))
> > + return (void __iomem *)__phys_to_virt(phys);
>
> > /*
> > - * EFI's reserve_regions() call adds memory with the WB attribute
> > - * to memblock via early_init_dt_add_memory_arch().
> > + * We should still honor the memory's attribute here because
> > + * crash dump kernel possibly excludes some ACPI (reclaim)
> > + * regions from memblock list.
> > */
>
> (Even without kdump we would still need this. Regions ACPI wants mapped may not
> be covered by the linear map. In this case we need to use the attributes
> firmware described in the UEFI memory map. Kdump exacerbates this by
> artificially reducing the range of the linear map.)
>
>
> > - if (!memblock_is_memory(phys))
> > - return ioremap(phys, size);
> > -
> > - return ioremap_cache(phys, size);
> > + return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
> > }
>
>
>
> > diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
> > index 5889cbea60b8..566ef0a9edb5 100644
> > --- a/drivers/firmware/efi/arm-runtime.c
> > +++ b/drivers/firmware/efi/arm-runtime.c
> > @@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
> > * non-early mapping of the UEFI system table and virtual mappings for all
> > * EFI_MEMORY_RUNTIME regions.
> > */
> > -static int __init arm_enable_runtime_services(void)
> > +void __init efi_enter_virtual_mode(void)
> > {
> > u64 mapsize;
> >
> > if (!efi_enabled(EFI_BOOT)) {
> > pr_info("EFI services will not be available.\n");
> > - return 0;
> > + return;
> > + }
> > +
> > + mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> > +
> > + if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> > + pr_err("Failed to remap EFI memory map\n");
> > + return;
> > }
> >
> > if (efi_runtime_disabled()) {
> > pr_info("EFI runtime services will be disabled.\n");
> > - return 0;
> > + return;
> > }
> >
> > if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> > pr_info("EFI runtime services access via paravirt.\n");
> > - return 0;
> > + return;
> > }
> >
> > pr_info("Remapping and enabling EFI services.\n");
> >
> > - mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> > -
> > - if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> > - pr_err("Failed to remap EFI memory map\n");
> > - return -ENOMEM;
> > - }
> > -
> > if (!efi_virtmap_init()) {
> > pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
> > - return -ENOMEM;
> > + return;
> > }
> >
> > /* Set up runtime services function pointers */
> > efi_native_runtime_setup();
> > set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> > -
> > - return 0;
> > }
>
> Please have the drivers/firmware/efi/arm-runtime.c changes in a separate patch
> (maybe combine it with patch 3). The 'efi/arm: ' prefix is more likely to catch
> the maintainers attention.
>
> I think this is what Ard meant by:
> | Could you please move the changes to this file and init/main.c into a
> | separate patch?
>
> https://patchwork.kernel.org/patch/10361761/
>
>
> > -early_initcall(arm_enable_runtime_services);
>
> With just this patch, surely nothing ever calls arm_enable_runtime_services(),
> and now acpi_os_ioremap() will return device memory for anything that isn't part
> of the linear region. (This breaks RAS).
Actually I noticed the issue.
> This will make it difficult to bisect through for any RAS or
> efi-runtime-services issue. Its easily fixed: please put the efi+init changes in
> a patch before the acpi_os_ioremap() changes.
I was reluctant to put different part of code changes into one.
But if nobody cares, I will do so in three patches.
* change arm_enable_runtime_services() with renaming
* move this function earlier in start_kernel()
* modify acpi_os_ioremap()
Thanks,
-Takahiro AKASHI
> Otherwise, looks good to me!
>
>
> Thanks,
>
> James