This patch series is a set of bug fixes to address kexec/kdump
failures which are sometimes observed on ACPI-only system and reported
in LAK-ML before.
In short, the phenomena are:
1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
by a new kernel (or other data) being loaded into System RAM. Currently
kexec may possibly allocate space ignoring such "reserved" regions.
We will see no messages after "Bye!"
2. crash dump (kdump) kernel can fail to boot and get into panic due to
an alignment fault when accessing ACPI tables. This can happen because
those tables are not always properly aligned while they are mapped
non-cacheable (ioremap'ed) as they are not recognized as part of System
RAM under the current implementation.
After discussing several possibilities to address those issues,
the agreed approach, in my understanding, is
* to add resource entries for every "reserved", i.e. memblock_reserve(),
regions to /proc/iomem.
(NOMAP regions, also marked as "reserved," remains at top-level for
backward compatibility. User-space can tell the difference between
reserved-system-ram and reserved-address-space.)
* For case (1), user space (kexec-tools) should rule out such regions
in searching for free space for loaded data.
* For case (2), the kernel should access ACPI tables by mapping
them with appropriate memory attributes described in UEFI memory map.
(This means that it doesn't require any changes in /proc/iomem, and
hence user space.)
Please find past discussions about /proc/iomem in [1].
--- more words from James ---
Our attempts to fix this just in the kernel reached a dead end, because Kdump
needs to include reserved-system-ram, whereas kexec has to avoid it. User-space
needs to be able to tell reserved-system-ram and reserved-address-space apart.
Hence we need to expose that information, and pick it up in user-space.
Patched-kernel and unpatch-user-space will work the same way it does today, as
the additional reserved regions are ignored by user-space.
Unpatched-kernel and patched-user-space will also work the same way it does
today as the additional reserved regions are missing.
--->8---
Patch#1 addresses kexec case, for which you are also required to update
user space. See necessary patches in [2]. If you want to review Patch#1,
please also take a look at and review [2].
Patch#2 to #4 addresses kdump case. This is a revised version after
Ard's comments.[3]
Changes in v2 (2018, June 19, 2018)
* re-organise v1's patch#2 and #3 into v2's #2, #3 and #4
not to break bisect
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565980.html
[2] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/resv_mem
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/573655.html
AKASHI Takahiro (3):
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: map UEFI memory map earlier on boot
arm64: acpi: fix alignment fault in accessing ACPI
James Morse (1):
arm64: export memblock_reserve()d regions via /proc/iomem
arch/arm64/include/asm/acpi.h | 23 ++++++++++++------
arch/arm64/kernel/acpi.c | 11 +++------
arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++
drivers/firmware/efi/arm-runtime.c | 27 ++++++++++-----------
init/main.c | 3 +++
5 files changed, 72 insertions(+), 30 deletions(-)
--
2.17.0
From: James Morse <[email protected]>
There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.
Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.
Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.
Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)
We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
| 80080000-80ffffff : Kernel code
| 81000000-8158ffff : reserved
| 81590000-8237efff : Kernel data
| a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM
reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.
Reported-by: Bhupesh Sharma <[email protected]>
Reported-by: Tyler Baicar <[email protected]>
Suggested-by: Akashi Takahiro <[email protected]>
Signed-off-by: James Morse <[email protected]>
Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
CC: Ard Biesheuvel <[email protected]>
CC: Mark Rutland <[email protected]>
---
arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..5b4fac434c84 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
}
}
+static int __init reserve_memblock_reserved_regions(void)
+{
+ phys_addr_t start, end, roundup_end = 0;
+ struct resource *mem, *res;
+ u64 i;
+
+ for_each_reserved_mem_region(i, &start, &end) {
+ if (end <= roundup_end)
+ continue; /* done already */
+
+ start = __pfn_to_phys(PFN_DOWN(start));
+ end = __pfn_to_phys(PFN_UP(end)) - 1;
+ roundup_end = end;
+
+ res = kzalloc(sizeof(*res), GFP_ATOMIC);
+ if (WARN_ON(!res))
+ return -ENOMEM;
+ res->start = start;
+ res->end = end;
+ res->name = "reserved";
+ res->flags = IORESOURCE_MEM;
+
+ mem = request_resource_conflict(&iomem_resource, res);
+ /*
+ * We expected memblock_reserve() regions to conflict with
+ * memory created by request_standard_resources().
+ */
+ if (WARN_ON_ONCE(!mem))
+ continue;
+ kfree(res);
+
+ reserve_region_with_split(mem, start, end, "reserved");
+ }
+
+ return 0;
+}
+arch_initcall(reserve_memblock_reserved_regions);
+
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
void __init setup_arch(char **cmdline_p)
--
2.17.0
Under the current implementation, UEFI memory map will be mapped and made
available in virtual mappings only if runtime services are enabled.
But in a later patch, we want to use UEFI memory map in acpi_os_ioremap()
to create mappings of ACPI tables using memory attributes described in
UEFI memory map.
So, as a first step, arm_enter_runtime_services() will be modified
so that UEFI memory map will be always accessible.
See a relevant commit:
arm64: acpi: fix alignment fault in accessing ACPI tables
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/arm-runtime.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..30ac5c82051e 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -115,6 +115,13 @@ static int __init arm_enable_runtime_services(void)
return 0;
}
+ mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
+
+ if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
+ pr_err("Failed to remap EFI memory map\n");
+ return 0;
+ }
+
if (efi_runtime_disabled()) {
pr_info("EFI runtime services will be disabled.\n");
return 0;
@@ -127,13 +134,6 @@ static int __init arm_enable_runtime_services(void)
pr_info("Remapping and enabling EFI services.\n");
- mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
-
- if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
- pr_err("Failed to remap EFI memory map\n");
- return -ENOMEM;
- }
-
if (!efi_virtmap_init()) {
pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
return -ENOMEM;
--
2.17.0
This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."
(kernel messages after panic kicked off kdump)
(snip...)
Bye!
(snip...)
ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
Internal error: Oops: 96000021 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
task: ffff000008d05180 task.stack: ffff000008cc0000
PC is at acpi_ns_lookup+0x25c/0x3c0
LR is at acpi_ds_load1_begin_op+0xa4/0x294
(snip...)
Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
Call trace:
(snip...)
[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
---[ end trace c46ed37f9651c58e ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
"ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
memory as MEMBLOCK_NOMAP"), those regions are differently handled
as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
any unaligned access to ACPI tables can cause a fatal panic.
With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.
Signed-off-by: AKASHI Takahiro <[email protected]>
Suggested-by: James Morse <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Reported-by and Tested-by: Bhupesh Sharma <[email protected]>
---
arch/arm64/include/asm/acpi.h | 23 ++++++++++++++++-------
arch/arm64/kernel/acpi.c | 11 +++--------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 0db62a4cbce2..68bc18cb2b85 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -12,10 +12,12 @@
#ifndef _ASM_ACPI_H
#define _ASM_ACPI_H
+#include <linux/efi.h>
#include <linux/memblock.h>
#include <linux/psci.h>
#include <asm/cputype.h>
+#include <asm/io.h>
#include <asm/smp_plat.h>
#include <asm/tlbflush.h>
@@ -29,18 +31,22 @@
/* Basic configuration for ACPI */
#ifdef CONFIG_ACPI
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
+
/* ACPI table mapping after acpi_permanent_mmap is set */
static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
acpi_size size)
{
+ /* For normal memory we already have a cacheable mapping. */
+ if (memblock_is_map_memory(phys))
+ return (void __iomem *)__phys_to_virt(phys);
+
/*
- * EFI's reserve_regions() call adds memory with the WB attribute
- * to memblock via early_init_dt_add_memory_arch().
+ * We should still honor the memory's attribute here because
+ * crash dump kernel possibly excludes some ACPI (reclaim)
+ * regions from memblock list.
*/
- if (!memblock_is_memory(phys))
- return ioremap(phys, size);
-
- return ioremap_cache(phys, size);
+ return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
}
#define acpi_os_ioremap acpi_os_ioremap
@@ -129,7 +135,10 @@ static inline const char *acpi_get_enable_method(int cpu)
* for compatibility.
*/
#define acpi_disable_cmcff 1
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+ return __acpi_get_mem_attribute(addr);
+}
#endif /* CONFIG_ACPI_APEI */
#ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..ed46dc188b22 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -18,6 +18,7 @@
#include <linux/acpi.h>
#include <linux/bootmem.h>
#include <linux/cpumask.h>
+#include <linux/efi.h>
#include <linux/efi-bgrt.h>
#include <linux/init.h>
#include <linux/irq.h>
@@ -29,13 +30,9 @@
#include <asm/cputype.h>
#include <asm/cpu_ops.h>
+#include <asm/pgtable.h>
#include <asm/smp_plat.h>
-#ifdef CONFIG_ACPI_APEI
-# include <linux/efi.h>
-# include <asm/pgtable.h>
-#endif
-
int acpi_noirq = 1; /* skip ACPI IRQ initialization */
int acpi_disabled = 1;
EXPORT_SYMBOL(acpi_disabled);
@@ -239,8 +236,7 @@ void __init acpi_boot_table_init(void)
}
}
-#ifdef CONFIG_ACPI_APEI
-pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
{
/*
* According to "Table 8 Map: EFI memory types to AArch64 memory
@@ -261,4 +257,3 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
return __pgprot(PROT_NORMAL_NC);
return __pgprot(PROT_DEVICE_nGnRnE);
}
-#endif
--
2.17.0
Since arm_enter_runtime_services() was modified to always create a virtual
mapping of UEFI memory map in the previous patch, it is now renamed to
efi_enter_virtual_mode() and called earlier before acpi_load_tables()
in acpi_early_init().
This will allow us to use UEFI memory map in acpi_os_ioremap() to create
mappings of ACPI tables using memory attributes described in UEFI memory
map.
See a relevant commit:
arm64: acpi: fix alignment fault in accessing ACPI tables
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Andrew Morton <[email protected]>
---
drivers/firmware/efi/arm-runtime.c | 15 ++++++---------
init/main.c | 3 +++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 30ac5c82051e..566ef0a9edb5 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
* non-early mapping of the UEFI system table and virtual mappings for all
* EFI_MEMORY_RUNTIME regions.
*/
-static int __init arm_enable_runtime_services(void)
+void __init efi_enter_virtual_mode(void)
{
u64 mapsize;
if (!efi_enabled(EFI_BOOT)) {
pr_info("EFI services will not be available.\n");
- return 0;
+ return;
}
mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
pr_err("Failed to remap EFI memory map\n");
- return 0;
+ return;
}
if (efi_runtime_disabled()) {
pr_info("EFI runtime services will be disabled.\n");
- return 0;
+ return;
}
if (efi_enabled(EFI_RUNTIME_SERVICES)) {
pr_info("EFI runtime services access via paravirt.\n");
- return 0;
+ return;
}
pr_info("Remapping and enabling EFI services.\n");
if (!efi_virtmap_init()) {
pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
- return -ENOMEM;
+ return;
}
/* Set up runtime services function pointers */
efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
-
- return 0;
}
-early_initcall(arm_enable_runtime_services);
void efi_virtmap_load(void)
{
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..532fc0d02353 100644
--- a/init/main.c
+++ b/init/main.c
@@ -694,6 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
debug_objects_mem_init();
setup_per_cpu_pageset();
numa_policy_init();
+ if (IS_ENABLED(CONFIG_EFI) &&
+ (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)))
+ efi_enter_virtual_mode();
acpi_early_init();
if (late_time_init)
late_time_init();
--
2.17.0
On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
> +static int __init reserve_memblock_reserved_regions(void)
> +{
> + phys_addr_t start, end, roundup_end = 0;
> + struct resource *mem, *res;
> + u64 i;
> +
> + for_each_reserved_mem_region(i, &start, &end) {
> + if (end <= roundup_end)
> + continue; /* done already */
> +
> + start = __pfn_to_phys(PFN_DOWN(start));
> + end = __pfn_to_phys(PFN_UP(end)) - 1;
> + roundup_end = end;
> +
> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
> + if (WARN_ON(!res))
> + return -ENOMEM;
> + res->start = start;
> + res->end = end;
> + res->name = "reserved";
> + res->flags = IORESOURCE_MEM;
> +
> + mem = request_resource_conflict(&iomem_resource, res);
> + /*
> + * We expected memblock_reserve() regions to conflict with
> + * memory created by request_standard_resources().
> + */
> + if (WARN_ON_ONCE(!mem))
> + continue;
> + kfree(res);
Why is kfree() after the conditional continue? This is a memory leak.
> +
> + reserve_region_with_split(mem, start, end, "reserved");
> + }
> +
> + return 0;
> +}
> +arch_initcall(reserve_memblock_reserved_regions);
> +
> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>
> void __init setup_arch(char **cmdline_p)
>
Hi Dave,
On 19/06/18 14:37, Dave Kleikamp wrote:
> On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
>> +static int __init reserve_memblock_reserved_regions(void)
>> +{
>> + phys_addr_t start, end, roundup_end = 0;
>> + struct resource *mem, *res;
>> + u64 i;
>> +
>> + for_each_reserved_mem_region(i, &start, &end) {
>> + if (end <= roundup_end)
>> + continue; /* done already */
>> +
>> + start = __pfn_to_phys(PFN_DOWN(start));
>> + end = __pfn_to_phys(PFN_UP(end)) - 1;
>> + roundup_end = end;
>> +
>> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
>> + if (WARN_ON(!res))
>> + return -ENOMEM;
>> + res->start = start;
>> + res->end = end;
>> + res->name = "reserved";
>> + res->flags = IORESOURCE_MEM;
>> +
>> + mem = request_resource_conflict(&iomem_resource, res);
>> + /*
>> + * We expected memblock_reserve() regions to conflict with
>> + * memory created by request_standard_resources().
>> + */
>> + if (WARN_ON_ONCE(!mem))
>> + continue;
>> + kfree(res);
>
> Why is kfree() after the conditional continue? This is a memory leak.
request_resource_conflict() inserts res into the iomem_resource tree, or returns
the conflict if there is already something at this address.
We expect something at this address: a 'System RAM' section added by
request_resource(). This code wants the conflicting 'System RAM' entry so that
reserve_region_with_split() can fill in the gaps below it with 'reserved'. See
the commit-message for an example.
If there was no conflict, it means the memory map doesn't look like we expect,
we can't pass NULL to reserve_region_with_split(), and we just inserted the
'reserved' at the top level. Freeing res at this point would be a use-after-free
hanging from /proc/iomem.
This code generates a WARN_ON_ONCE() and leaves the 'reserved' description where
it is.
Trying to cleanup here is pointless, we can remove the resource entry and free
it ... but we still want to report it as reserved, which is what just happened,
just not quite how we we wanted it.
If you ever see this warning, it means some RAM stopped being nomap between
request_resources() and reserve_memblock_reserved_regions(). I can't find any
case where that ever happens.
If all that makes sense: how can I improve the comment above the WARN_ON_ONCE()
to make it clearer?
Thanks,
James
>> +
>> + reserve_region_with_split(mem, start, end, "reserved");
>> + }
>> +
>> + return 0;
>> +}
>> +arch_initcall(reserve_memblock_reserved_regions);
>> +
>> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>
>> void __init setup_arch(char **cmdline_p)
>>
On 06/19/2018 10:00 AM, James Morse wrote:
> Hi Dave,
>
> On 19/06/18 14:37, Dave Kleikamp wrote:
>> On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
>>> +static int __init reserve_memblock_reserved_regions(void)
>>> +{
>>> + phys_addr_t start, end, roundup_end = 0;
>>> + struct resource *mem, *res;
>>> + u64 i;
>>> +
>>> + for_each_reserved_mem_region(i, &start, &end) {
>>> + if (end <= roundup_end)
>>> + continue; /* done already */
>>> +
>>> + start = __pfn_to_phys(PFN_DOWN(start));
>>> + end = __pfn_to_phys(PFN_UP(end)) - 1;
>>> + roundup_end = end;
>>> +
>>> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
>>> + if (WARN_ON(!res))
>>> + return -ENOMEM;
>>> + res->start = start;
>>> + res->end = end;
>>> + res->name = "reserved";
>>> + res->flags = IORESOURCE_MEM;
>>> +
>>> + mem = request_resource_conflict(&iomem_resource, res);
>>> + /*
>>> + * We expected memblock_reserve() regions to conflict with
>>> + * memory created by request_standard_resources().
>>> + */
>>> + if (WARN_ON_ONCE(!mem))
>>> + continue;
>>> + kfree(res);
>>
>> Why is kfree() after the conditional continue? This is a memory leak.
>
> request_resource_conflict() inserts res into the iomem_resource tree, or returns
> the conflict if there is already something at this address.
>
> We expect something at this address: a 'System RAM' section added by
> request_resource(). This code wants the conflicting 'System RAM' entry so that
> reserve_region_with_split() can fill in the gaps below it with 'reserved'. See
> the commit-message for an example.
>
> If there was no conflict, it means the memory map doesn't look like we expect,
> we can't pass NULL to reserve_region_with_split(), and we just inserted the
> 'reserved' at the top level. Freeing res at this point would be a use-after-free
> hanging from /proc/iomem.
> This code generates a WARN_ON_ONCE() and leaves the 'reserved' description where
> it is.
Okay. I get it.
> Trying to cleanup here is pointless, we can remove the resource entry and free
> it ... but we still want to report it as reserved, which is what just happened,
> just not quite how we we wanted it.
>
> If you ever see this warning, it means some RAM stopped being nomap between
> request_resources() and reserve_memblock_reserved_regions(). I can't find any
> case where that ever happens.
>
>
> If all that makes sense: how can I improve the comment above the WARN_ON_ONCE()
> to make it clearer?
I guess something like you described above.
/*
* We expect a 'System RAM' section at this address. In the unexpected
* case where one is not found, request_resource_conflict() will insert
* res into the iomem_resource tree.
*/
Do you think this is clearer?
>
>
> Thanks,
>
> James
>
>
>>> +
>>> + reserve_region_with_split(mem, start, end, "reserved");
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +arch_initcall(reserve_memblock_reserved_regions);
>>> +
>>> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>>
>>> void __init setup_arch(char **cmdline_p)
>>>
>
Hi Akashi,
On 19/06/18 07:44, AKASHI Takahiro wrote:
> This is a fix against the issue that crash dump kernel may hang up
> during booting, which can happen on any ACPI-based system with "ACPI
> Reclaim Memory."
>
> (kernel messages after panic kicked off kdump)
> (snip...)
> Bye!
> (snip...)
> ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> Internal error: Oops: 96000021 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> task: ffff000008d05180 task.stack: ffff000008cc0000
> PC is at acpi_ns_lookup+0x25c/0x3c0
> LR is at acpi_ds_load1_begin_op+0xa4/0x294
> (snip...)
> Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> Call trace:
> (snip...)
> [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> ---[ end trace c46ed37f9651c58e ]---
> Kernel panic - not syncing: Fatal exception
> Rebooting in 10 seconds..
>
> (diagnosis)
> * This fault is a data abort, alignment fault (ESR=0x96000021)
> during reading out ACPI table.
> * Initial ACPI tables are normally stored in system ram and marked as
> "ACPI Reclaim memory" by the firmware.
> * After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
> memory as MEMBLOCK_NOMAP"), those regions are differently handled
> as they are "memblock-reserved", without NOMAP bit.
> * So they are now excluded from device tree's "usable-memory-range"
> which kexec-tools determines based on a current view of /proc/iomem.
> * When crash dump kernel boots up, it tries to accesses ACPI tables by
> mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
> since they are no longer part of mapped system ram.
> * Given that ACPI accessor/helper functions are compiled in without
> unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
> any unaligned access to ACPI tables can cause a fatal panic.
>
> With this patch, acpi_os_ioremap() always honors memory attribute
> information provided by the firmware (EFI) and retaining cacheability
> allows the kernel safe access to ACPI tables.
Reviewed-by: James Morse <[email protected]>
Thanks,
James
Hi Akashi,
On 19/06/18 07:44, AKASHI Takahiro wrote:
> Under the current implementation, UEFI memory map will be mapped and made
> available in virtual mappings only if runtime services are enabled.
> But in a later patch, we want to use UEFI memory map in acpi_os_ioremap()
> to create mappings of ACPI tables using memory attributes described in
> UEFI memory map.
>
> So, as a first step, arm_enter_runtime_services() will be modified
> so that UEFI memory map will be always accessible.
>
> See a relevant commit:
> arm64: acpi: fix alignment fault in accessing ACPI tables
For what its worth:
Acked-by: James Morse <[email protected]>
Thanks,
James
On Tue, Jun 19, 2018 at 10:22:46AM -0500, Dave Kleikamp wrote:
> On 06/19/2018 10:00 AM, James Morse wrote:
> > Hi Dave,
> >
> > On 19/06/18 14:37, Dave Kleikamp wrote:
> >> On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
> >>> +static int __init reserve_memblock_reserved_regions(void)
> >>> +{
> >>> + phys_addr_t start, end, roundup_end = 0;
> >>> + struct resource *mem, *res;
> >>> + u64 i;
> >>> +
> >>> + for_each_reserved_mem_region(i, &start, &end) {
> >>> + if (end <= roundup_end)
> >>> + continue; /* done already */
> >>> +
> >>> + start = __pfn_to_phys(PFN_DOWN(start));
> >>> + end = __pfn_to_phys(PFN_UP(end)) - 1;
> >>> + roundup_end = end;
> >>> +
> >>> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
> >>> + if (WARN_ON(!res))
> >>> + return -ENOMEM;
> >>> + res->start = start;
> >>> + res->end = end;
> >>> + res->name = "reserved";
> >>> + res->flags = IORESOURCE_MEM;
> >>> +
> >>> + mem = request_resource_conflict(&iomem_resource, res);
> >>> + /*
> >>> + * We expected memblock_reserve() regions to conflict with
> >>> + * memory created by request_standard_resources().
> >>> + */
> >>> + if (WARN_ON_ONCE(!mem))
> >>> + continue;
> >>> + kfree(res);
> >>
> >> Why is kfree() after the conditional continue? This is a memory leak.
> >
> > request_resource_conflict() inserts res into the iomem_resource tree, or returns
> > the conflict if there is already something at this address.
> >
> > We expect something at this address: a 'System RAM' section added by
> > request_resource(). This code wants the conflicting 'System RAM' entry so that
> > reserve_region_with_split() can fill in the gaps below it with 'reserved'. See
> > the commit-message for an example.
> >
> > If there was no conflict, it means the memory map doesn't look like we expect,
> > we can't pass NULL to reserve_region_with_split(), and we just inserted the
> > 'reserved' at the top level. Freeing res at this point would be a use-after-free
> > hanging from /proc/iomem.
> > This code generates a WARN_ON_ONCE() and leaves the 'reserved' description where
> > it is.
>
> Okay. I get it.
>
> > Trying to cleanup here is pointless, we can remove the resource entry and free
> > it ... but we still want to report it as reserved, which is what just happened,
> > just not quite how we we wanted it.
> >
> > If you ever see this warning, it means some RAM stopped being nomap between
> > request_resources() and reserve_memblock_reserved_regions(). I can't find any
> > case where that ever happens.
> >
> >
> > If all that makes sense: how can I improve the comment above the WARN_ON_ONCE()
> > to make it clearer?
>
> I guess something like you described above.
>
> /*
> * We expect a 'System RAM' section at this address. In the unexpected
> * case where one is not found, request_resource_conflict() will insert
> * res into the iomem_resource tree.
> */
>
> Do you think this is clearer?
If this is the only change needed in my patchset, I'd like the maintainers
to take care of it instead of my posting another version.
Thanks,
-Takahiro AKASHI
> >
> >
> > Thanks,
> >
> > James
> >
> >
> >>> +
> >>> + reserve_region_with_split(mem, start, end, "reserved");
> >>> + }
> >>> +
> >>> + return 0;
> >>> +}
> >>> +arch_initcall(reserve_memblock_reserved_regions);
> >>> +
> >>> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
> >>>
> >>> void __init setup_arch(char **cmdline_p)
> >>>
> >
On Tue, Jul 3, 2018 at 12:17 PM, AKASHI Takahiro
<[email protected]> wrote:
> On Tue, Jun 19, 2018 at 10:22:46AM -0500, Dave Kleikamp wrote:
>> On 06/19/2018 10:00 AM, James Morse wrote:
>> > Hi Dave,
>> >
>> > On 19/06/18 14:37, Dave Kleikamp wrote:
>> >> On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
>> >>> +static int __init reserve_memblock_reserved_regions(void)
>> >>> +{
>> >>> + phys_addr_t start, end, roundup_end = 0;
>> >>> + struct resource *mem, *res;
>> >>> + u64 i;
>> >>> +
>> >>> + for_each_reserved_mem_region(i, &start, &end) {
>> >>> + if (end <= roundup_end)
>> >>> + continue; /* done already */
>> >>> +
>> >>> + start = __pfn_to_phys(PFN_DOWN(start));
>> >>> + end = __pfn_to_phys(PFN_UP(end)) - 1;
>> >>> + roundup_end = end;
>> >>> +
>> >>> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
>> >>> + if (WARN_ON(!res))
>> >>> + return -ENOMEM;
>> >>> + res->start = start;
>> >>> + res->end = end;
>> >>> + res->name = "reserved";
>> >>> + res->flags = IORESOURCE_MEM;
>> >>> +
>> >>> + mem = request_resource_conflict(&iomem_resource, res);
>> >>> + /*
>> >>> + * We expected memblock_reserve() regions to conflict with
>> >>> + * memory created by request_standard_resources().
>> >>> + */
>> >>> + if (WARN_ON_ONCE(!mem))
>> >>> + continue;
>> >>> + kfree(res);
>> >>
>> >> Why is kfree() after the conditional continue? This is a memory leak.
>> >
>> > request_resource_conflict() inserts res into the iomem_resource tree, or returns
>> > the conflict if there is already something at this address.
>> >
>> > We expect something at this address: a 'System RAM' section added by
>> > request_resource(). This code wants the conflicting 'System RAM' entry so that
>> > reserve_region_with_split() can fill in the gaps below it with 'reserved'. See
>> > the commit-message for an example.
>> >
>> > If there was no conflict, it means the memory map doesn't look like we expect,
>> > we can't pass NULL to reserve_region_with_split(), and we just inserted the
>> > 'reserved' at the top level. Freeing res at this point would be a use-after-free
>> > hanging from /proc/iomem.
>> > This code generates a WARN_ON_ONCE() and leaves the 'reserved' description where
>> > it is.
>>
>> Okay. I get it.
>>
>> > Trying to cleanup here is pointless, we can remove the resource entry and free
>> > it ... but we still want to report it as reserved, which is what just happened,
>> > just not quite how we we wanted it.
>> >
>> > If you ever see this warning, it means some RAM stopped being nomap between
>> > request_resources() and reserve_memblock_reserved_regions(). I can't find any
>> > case where that ever happens.
>> >
>> >
>> > If all that makes sense: how can I improve the comment above the WARN_ON_ONCE()
>> > to make it clearer?
>>
>> I guess something like you described above.
>>
>> /*
>> * We expect a 'System RAM' section at this address. In the unexpected
>> * case where one is not found, request_resource_conflict() will insert
>> * res into the iomem_resource tree.
>> */
>>
>> Do you think this is clearer?
>
> If this is the only change needed in my patchset, I'd like the maintainers
> to take care of it instead of my posting another version.
>
+1.
crashkernel booting on arm64 machines which support boot via acpi
tables is broken since a long time, so it would be great to get these
into upstream asap.
I think the comment can be addressed while picking up the patchset in
the maintainer's tree.
I am not sure whether it will go through the ARM tree (Will) or the
EFI tree (Ard), but since this has a Tested-by (from myself) and
Reviewed-by (from James), probably this can be pulled in to allow
further tests via the maintainer's tree.
Thanks,
Bhupesh
>> >
>> >
>> > Thanks,
>> >
>> > James
>> >
>> >
>> >>> +
>> >>> + reserve_region_with_split(mem, start, end, "reserved");
>> >>> + }
>> >>> +
>> >>> + return 0;
>> >>> +}
>> >>> +arch_initcall(reserve_memblock_reserved_regions);
>> >>> +
>> >>> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>> >>>
>> >>> void __init setup_arch(char **cmdline_p)
>> >>>
>> >
On 07/03/2018 01:47 AM, AKASHI Takahiro wrote:
> On Tue, Jun 19, 2018 at 10:22:46AM -0500, Dave Kleikamp wrote:
>> On 06/19/2018 10:00 AM, James Morse wrote:
>>> Hi Dave,
>>>
>>> On 19/06/18 14:37, Dave Kleikamp wrote:
>>>> On 06/19/2018 01:44 AM, AKASHI Takahiro wrote:
>>>>> +static int __init reserve_memblock_reserved_regions(void)
>>>>> +{
>>>>> + phys_addr_t start, end, roundup_end = 0;
>>>>> + struct resource *mem, *res;
>>>>> + u64 i;
>>>>> +
>>>>> + for_each_reserved_mem_region(i, &start, &end) {
>>>>> + if (end <= roundup_end)
>>>>> + continue; /* done already */
>>>>> +
>>>>> + start = __pfn_to_phys(PFN_DOWN(start));
>>>>> + end = __pfn_to_phys(PFN_UP(end)) - 1;
>>>>> + roundup_end = end;
>>>>> +
>>>>> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
>>>>> + if (WARN_ON(!res))
>>>>> + return -ENOMEM;
>>>>> + res->start = start;
>>>>> + res->end = end;
>>>>> + res->name = "reserved";
>>>>> + res->flags = IORESOURCE_MEM;
>>>>> +
>>>>> + mem = request_resource_conflict(&iomem_resource, res);
>>>>> + /*
>>>>> + * We expected memblock_reserve() regions to conflict with
>>>>> + * memory created by request_standard_resources().
>>>>> + */
>>>>> + if (WARN_ON_ONCE(!mem))
>>>>> + continue;
>>>>> + kfree(res);
>>>>
>>>> Why is kfree() after the conditional continue? This is a memory leak.
>>>
>>> request_resource_conflict() inserts res into the iomem_resource tree, or returns
>>> the conflict if there is already something at this address.
>>>
>>> We expect something at this address: a 'System RAM' section added by
>>> request_resource(). This code wants the conflicting 'System RAM' entry so that
>>> reserve_region_with_split() can fill in the gaps below it with 'reserved'. See
>>> the commit-message for an example.
>>>
>>> If there was no conflict, it means the memory map doesn't look like we expect,
>>> we can't pass NULL to reserve_region_with_split(), and we just inserted the
>>> 'reserved' at the top level. Freeing res at this point would be a use-after-free
>>> hanging from /proc/iomem.
>>> This code generates a WARN_ON_ONCE() and leaves the 'reserved' description where
>>> it is.
>>
>> Okay. I get it.
>>
>>> Trying to cleanup here is pointless, we can remove the resource entry and free
>>> it ... but we still want to report it as reserved, which is what just happened,
>>> just not quite how we we wanted it.
>>>
>>> If you ever see this warning, it means some RAM stopped being nomap between
>>> request_resources() and reserve_memblock_reserved_regions(). I can't find any
>>> case where that ever happens.
>>>
>>>
>>> If all that makes sense: how can I improve the comment above the WARN_ON_ONCE()
>>> to make it clearer?
>>
>> I guess something like you described above.
>>
>> /*
>> * We expect a 'System RAM' section at this address. In the unexpected
>> * case where one is not found, request_resource_conflict() will insert
>> * res into the iomem_resource tree.
>> */
>>
>> Do you think this is clearer?
>
> If this is the only change needed in my patchset, I'd like the maintainers
> to take care of it instead of my posting another version.
Agreed. The sooner this gets fixed, the better.
Thanks,
Dave
>
> Thanks,
> -Takahiro AKASHI
>
>>>
>>>
>>> Thanks,
>>>
>>> James
>>>
>>>
>>>>> +
>>>>> + reserve_region_with_split(mem, start, end, "reserved");
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +arch_initcall(reserve_memblock_reserved_regions);
>>>>> +
>>>>> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>>>>
>>>>> void __init setup_arch(char **cmdline_p)
>>>>>
>>>
Hi all,
[Ard -- please can you look at the EFI parts of this patch]
On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
> Since arm_enter_runtime_services() was modified to always create a virtual
> mapping of UEFI memory map in the previous patch, it is now renamed to
> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
> in acpi_early_init().
>
> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
> mappings of ACPI tables using memory attributes described in UEFI memory
> map.
>
> See a relevant commit:
> arm64: acpi: fix alignment fault in accessing ACPI tables
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> drivers/firmware/efi/arm-runtime.c | 15 ++++++---------
> init/main.c | 3 +++
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
> index 30ac5c82051e..566ef0a9edb5 100644
> --- a/drivers/firmware/efi/arm-runtime.c
> +++ b/drivers/firmware/efi/arm-runtime.c
> @@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
> * non-early mapping of the UEFI system table and virtual mappings for all
> * EFI_MEMORY_RUNTIME regions.
> */
> -static int __init arm_enable_runtime_services(void)
> +void __init efi_enter_virtual_mode(void)
> {
> u64 mapsize;
>
> if (!efi_enabled(EFI_BOOT)) {
> pr_info("EFI services will not be available.\n");
> - return 0;
> + return;
> }
>
> mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
>
> if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> pr_err("Failed to remap EFI memory map\n");
> - return 0;
> + return;
> }
>
> if (efi_runtime_disabled()) {
> pr_info("EFI runtime services will be disabled.\n");
> - return 0;
> + return;
> }
>
> if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> pr_info("EFI runtime services access via paravirt.\n");
> - return 0;
> + return;
> }
>
> pr_info("Remapping and enabling EFI services.\n");
>
> if (!efi_virtmap_init()) {
> pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
> - return -ENOMEM;
> + return;
> }
>
> /* Set up runtime services function pointers */
> efi_native_runtime_setup();
> set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> -
> - return 0;
> }
> -early_initcall(arm_enable_runtime_services);
>
> void efi_virtmap_load(void)
> {
> diff --git a/init/main.c b/init/main.c
> index 3b4ada11ed52..532fc0d02353 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -694,6 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
> debug_objects_mem_init();
> setup_per_cpu_pageset();
> numa_policy_init();
> + if (IS_ENABLED(CONFIG_EFI) &&
> + (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)))
> + efi_enter_virtual_mode();
Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
a few lines later, *after* acpi_early_init() has been called.
The rest of the series looks fine to me, but I'm not comfortable taking
changes like this via the arm64 tree.
Will
On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
> Hi all,
>
> [Ard -- please can you look at the EFI parts of this patch]
>
> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
>> Since arm_enter_runtime_services() was modified to always create a virtual
>> mapping of UEFI memory map in the previous patch, it is now renamed to
>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
>> in acpi_early_init().
>>
>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
>> mappings of ACPI tables using memory attributes described in UEFI memory
>> map.
>>
>> See a relevant commit:
>> arm64: acpi: fix alignment fault in accessing ACPI tables
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> ---
>> drivers/firmware/efi/arm-runtime.c | 15 ++++++---------
>> init/main.c | 3 +++
>> 2 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
>> index 30ac5c82051e..566ef0a9edb5 100644
>> --- a/drivers/firmware/efi/arm-runtime.c
>> +++ b/drivers/firmware/efi/arm-runtime.c
>> @@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
>> * non-early mapping of the UEFI system table and virtual mappings for all
>> * EFI_MEMORY_RUNTIME regions.
>> */
>> -static int __init arm_enable_runtime_services(void)
>> +void __init efi_enter_virtual_mode(void)
>> {
>> u64 mapsize;
>>
>> if (!efi_enabled(EFI_BOOT)) {
>> pr_info("EFI services will not be available.\n");
>> - return 0;
>> + return;
>> }
>>
>> mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
>>
>> if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
>> pr_err("Failed to remap EFI memory map\n");
>> - return 0;
>> + return;
>> }
>>
>> if (efi_runtime_disabled()) {
>> pr_info("EFI runtime services will be disabled.\n");
>> - return 0;
>> + return;
>> }
>>
>> if (efi_enabled(EFI_RUNTIME_SERVICES)) {
>> pr_info("EFI runtime services access via paravirt.\n");
>> - return 0;
>> + return;
>> }
>>
>> pr_info("Remapping and enabling EFI services.\n");
>>
>> if (!efi_virtmap_init()) {
>> pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
>> - return -ENOMEM;
>> + return;
>> }
>>
>> /* Set up runtime services function pointers */
>> efi_native_runtime_setup();
>> set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
>> -
>> - return 0;
>> }
>> -early_initcall(arm_enable_runtime_services);
>>
>> void efi_virtmap_load(void)
>> {
>> diff --git a/init/main.c b/init/main.c
>> index 3b4ada11ed52..532fc0d02353 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -694,6 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
>> debug_objects_mem_init();
>> setup_per_cpu_pageset();
>> numa_policy_init();
>> + if (IS_ENABLED(CONFIG_EFI) &&
>> + (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)))
>> + efi_enter_virtual_mode();
>
> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
> a few lines later, *after* acpi_early_init() has been called.
>
Currently, there is a gap where we have already torn down the early
mapping and haven't created the definitive mapping of the UEFI memory
map. There are other reasons why this is an issue, and I recently
proposed [0] myself to address one of them (and I didn't remember this
particular series, or the fact that I actually suggested this approach
IIRC)
Akashi-san, could you please confirm whether the patch below would be
sufficient for you? Apologies for going back and forth on this, but I
agree with Will that we should try to avoid warts like the one above
in generic code.
[0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
> The rest of the series looks fine to me, but I'm not comfortable taking
> changes like this via the arm64 tree.
>
> Will
On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
> > Hi all,
> >
> > [Ard -- please can you look at the EFI parts of this patch]
> >
> > On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
> >> Since arm_enter_runtime_services() was modified to always create a virtual
> >> mapping of UEFI memory map in the previous patch, it is now renamed to
> >> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
> >> in acpi_early_init().
> >>
> >> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
> >> mappings of ACPI tables using memory attributes described in UEFI memory
> >> map.
> >>
> >> See a relevant commit:
> >> arm64: acpi: fix alignment fault in accessing ACPI tables
> >>
> >> Signed-off-by: AKASHI Takahiro <[email protected]>
> >> Cc: Ard Biesheuvel <[email protected]>
> >> Cc: Andrew Morton <[email protected]>
> >> ---
> >> drivers/firmware/efi/arm-runtime.c | 15 ++++++---------
> >> init/main.c | 3 +++
> >> 2 files changed, 9 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
> >> index 30ac5c82051e..566ef0a9edb5 100644
> >> --- a/drivers/firmware/efi/arm-runtime.c
> >> +++ b/drivers/firmware/efi/arm-runtime.c
> >> @@ -106,46 +106,43 @@ static bool __init efi_virtmap_init(void)
> >> * non-early mapping of the UEFI system table and virtual mappings for all
> >> * EFI_MEMORY_RUNTIME regions.
> >> */
> >> -static int __init arm_enable_runtime_services(void)
> >> +void __init efi_enter_virtual_mode(void)
> >> {
> >> u64 mapsize;
> >>
> >> if (!efi_enabled(EFI_BOOT)) {
> >> pr_info("EFI services will not be available.\n");
> >> - return 0;
> >> + return;
> >> }
> >>
> >> mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> >>
> >> if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> >> pr_err("Failed to remap EFI memory map\n");
> >> - return 0;
> >> + return;
> >> }
> >>
> >> if (efi_runtime_disabled()) {
> >> pr_info("EFI runtime services will be disabled.\n");
> >> - return 0;
> >> + return;
> >> }
> >>
> >> if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> >> pr_info("EFI runtime services access via paravirt.\n");
> >> - return 0;
> >> + return;
> >> }
> >>
> >> pr_info("Remapping and enabling EFI services.\n");
> >>
> >> if (!efi_virtmap_init()) {
> >> pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
> >> - return -ENOMEM;
> >> + return;
> >> }
> >>
> >> /* Set up runtime services function pointers */
> >> efi_native_runtime_setup();
> >> set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> >> -
> >> - return 0;
> >> }
> >> -early_initcall(arm_enable_runtime_services);
> >>
> >> void efi_virtmap_load(void)
> >> {
> >> diff --git a/init/main.c b/init/main.c
> >> index 3b4ada11ed52..532fc0d02353 100644
> >> --- a/init/main.c
> >> +++ b/init/main.c
> >> @@ -694,6 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
> >> debug_objects_mem_init();
> >> setup_per_cpu_pageset();
> >> numa_policy_init();
> >> + if (IS_ENABLED(CONFIG_EFI) &&
> >> + (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)))
> >> + efi_enter_virtual_mode();
> >
> > Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
> > It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
> > a few lines later, *after* acpi_early_init() has been called.
> >
>
> Currently, there is a gap where we have already torn down the early
> mapping and haven't created the definitive mapping of the UEFI memory
> map. There are other reasons why this is an issue, and I recently
> proposed [0] myself to address one of them (and I didn't remember this
> particular series, or the fact that I actually suggested this approach
> IIRC)
>
> Akashi-san, could you please confirm whether the patch below would be
> sufficient for you? Apologies for going back and forth on this, but I
> agree with Will that we should try to avoid warts like the one above
> in generic code.
>
> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
I think that this patch will also work.
Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
Thanks,
-Takahiro AKASHI
> > The rest of the series looks fine to me, but I'm not comfortable taking
> > changes like this via the arm64 tree.
> >
> > Will
Hi Akashi,
On 05/07/18 10:43, AKASHI Takahiro wrote:
> On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
>> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
>>> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
>>>> Since arm_enter_runtime_services() was modified to always create a virtual
>>>> mapping of UEFI memory map in the previous patch, it is now renamed to
>>>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
>>>> in acpi_early_init().
>>>>
>>>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
>>>> mappings of ACPI tables using memory attributes described in UEFI memory
>>>> map.
>>> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
>>> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
>>> a few lines later, *after* acpi_early_init() has been called.
>> Currently, there is a gap where we have already torn down the early
>> mapping and haven't created the definitive mapping of the UEFI memory
>> map. There are other reasons why this is an issue, and I recently
>> proposed [0] myself to address one of them
>> Akashi-san, could you please confirm whether the patch below would be
>> sufficient for you? Apologies for going back and forth on this, but I
>> agree with Will that we should try to avoid warts like the one above
>> in generic code.
>>
>> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
>
> I think that this patch will also work.
> Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
Patch 2 is what changes arm_enable_runtime_services() to map the efi memory map
before bailing out due to efi=noruntime.
Without it, 'efi=noruntime' means no-acpi-tables.
Thanks,
James
On Thu, Jul 05, 2018 at 12:02:15PM +0100, James Morse wrote:
> On 05/07/18 10:43, AKASHI Takahiro wrote:
> > On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
> >> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
> >>> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
> >>>> Since arm_enter_runtime_services() was modified to always create a virtual
> >>>> mapping of UEFI memory map in the previous patch, it is now renamed to
> >>>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
> >>>> in acpi_early_init().
> >>>>
> >>>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
> >>>> mappings of ACPI tables using memory attributes described in UEFI memory
> >>>> map.
>
> >>> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
> >>> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
> >>> a few lines later, *after* acpi_early_init() has been called.
>
> >> Currently, there is a gap where we have already torn down the early
> >> mapping and haven't created the definitive mapping of the UEFI memory
> >> map. There are other reasons why this is an issue, and I recently
> >> proposed [0] myself to address one of them
>
> >> Akashi-san, could you please confirm whether the patch below would be
> >> sufficient for you? Apologies for going back and forth on this, but I
> >> agree with Will that we should try to avoid warts like the one above
> >> in generic code.
> >>
> >> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
> >
> > I think that this patch will also work.
> > Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
>
> Patch 2 is what changes arm_enable_runtime_services() to map the efi memory map
> before bailing out due to efi=noruntime.
>
> Without it, 'efi=noruntime' means no-acpi-tables.
So it sounds like we want patch 2. Akashi, given that this series is only
four patches, please can you send out a v3 with the stuff that should be
reviewed and merged? Otherwise, there's a real risk we end up with breakage
that goes unnoticed initially.
Thanks,
Will
On 19 June 2018 at 08:44, AKASHI Takahiro <[email protected]> wrote:
> Under the current implementation, UEFI memory map will be mapped and made
> available in virtual mappings only if runtime services are enabled.
> But in a later patch, we want to use UEFI memory map in acpi_os_ioremap()
> to create mappings of ACPI tables using memory attributes described in
> UEFI memory map.
>
> So, as a first step, arm_enter_runtime_services() will be modified
> so that UEFI memory map will be always accessible.
>
> See a relevant commit:
> arm64: acpi: fix alignment fault in accessing ACPI tables
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
This may be taken via the arm64 tree.
> ---
> drivers/firmware/efi/arm-runtime.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
> index 5889cbea60b8..30ac5c82051e 100644
> --- a/drivers/firmware/efi/arm-runtime.c
> +++ b/drivers/firmware/efi/arm-runtime.c
> @@ -115,6 +115,13 @@ static int __init arm_enable_runtime_services(void)
> return 0;
> }
>
> + mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> +
> + if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> + pr_err("Failed to remap EFI memory map\n");
> + return 0;
> + }
> +
> if (efi_runtime_disabled()) {
> pr_info("EFI runtime services will be disabled.\n");
> return 0;
> @@ -127,13 +134,6 @@ static int __init arm_enable_runtime_services(void)
>
> pr_info("Remapping and enabling EFI services.\n");
>
> - mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> -
> - if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> - pr_err("Failed to remap EFI memory map\n");
> - return -ENOMEM;
> - }
> -
> if (!efi_virtmap_init()) {
> pr_err("UEFI virtual mapping missing or invalid -- runtime services will not be available\n");
> return -ENOMEM;
> --
> 2.17.0
>
On 19 June 2018 at 08:44, AKASHI Takahiro <[email protected]> wrote:
> This is a fix against the issue that crash dump kernel may hang up
> during booting, which can happen on any ACPI-based system with "ACPI
> Reclaim Memory."
>
> (kernel messages after panic kicked off kdump)
> (snip...)
> Bye!
> (snip...)
> ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> Internal error: Oops: 96000021 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> task: ffff000008d05180 task.stack: ffff000008cc0000
> PC is at acpi_ns_lookup+0x25c/0x3c0
> LR is at acpi_ds_load1_begin_op+0xa4/0x294
> (snip...)
> Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> Call trace:
> (snip...)
> [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> ---[ end trace c46ed37f9651c58e ]---
> Kernel panic - not syncing: Fatal exception
> Rebooting in 10 seconds..
>
> (diagnosis)
> * This fault is a data abort, alignment fault (ESR=0x96000021)
> during reading out ACPI table.
> * Initial ACPI tables are normally stored in system ram and marked as
> "ACPI Reclaim memory" by the firmware.
> * After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
> memory as MEMBLOCK_NOMAP"), those regions are differently handled
> as they are "memblock-reserved", without NOMAP bit.
> * So they are now excluded from device tree's "usable-memory-range"
> which kexec-tools determines based on a current view of /proc/iomem.
> * When crash dump kernel boots up, it tries to accesses ACPI tables by
> mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
> since they are no longer part of mapped system ram.
> * Given that ACPI accessor/helper functions are compiled in without
> unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
> any unaligned access to ACPI tables can cause a fatal panic.
>
> With this patch, acpi_os_ioremap() always honors memory attribute
> information provided by the firmware (EFI) and retaining cacheability
> allows the kernel safe access to ACPI tables.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Suggested-by: James Morse <[email protected]>
> Suggested-by: Ard Biesheuvel <[email protected]>
> Reported-by and Tested-by: Bhupesh Sharma <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/include/asm/acpi.h | 23 ++++++++++++++++-------
> arch/arm64/kernel/acpi.c | 11 +++--------
> 2 files changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 0db62a4cbce2..68bc18cb2b85 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -12,10 +12,12 @@
> #ifndef _ASM_ACPI_H
> #define _ASM_ACPI_H
>
> +#include <linux/efi.h>
> #include <linux/memblock.h>
> #include <linux/psci.h>
>
> #include <asm/cputype.h>
> +#include <asm/io.h>
> #include <asm/smp_plat.h>
> #include <asm/tlbflush.h>
>
> @@ -29,18 +31,22 @@
>
> /* Basic configuration for ACPI */
> #ifdef CONFIG_ACPI
> +pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
> +
> /* ACPI table mapping after acpi_permanent_mmap is set */
> static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
> acpi_size size)
> {
> + /* For normal memory we already have a cacheable mapping. */
> + if (memblock_is_map_memory(phys))
> + return (void __iomem *)__phys_to_virt(phys);
> +
> /*
> - * EFI's reserve_regions() call adds memory with the WB attribute
> - * to memblock via early_init_dt_add_memory_arch().
> + * We should still honor the memory's attribute here because
> + * crash dump kernel possibly excludes some ACPI (reclaim)
> + * regions from memblock list.
> */
> - if (!memblock_is_memory(phys))
> - return ioremap(phys, size);
> -
> - return ioremap_cache(phys, size);
> + return __ioremap(phys, size, __acpi_get_mem_attribute(phys));
> }
> #define acpi_os_ioremap acpi_os_ioremap
>
> @@ -129,7 +135,10 @@ static inline const char *acpi_get_enable_method(int cpu)
> * for compatibility.
> */
> #define acpi_disable_cmcff 1
> -pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
> +static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
> +{
> + return __acpi_get_mem_attribute(addr);
> +}
> #endif /* CONFIG_ACPI_APEI */
>
> #ifdef CONFIG_ACPI_NUMA
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index 7b09487ff8fb..ed46dc188b22 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -18,6 +18,7 @@
> #include <linux/acpi.h>
> #include <linux/bootmem.h>
> #include <linux/cpumask.h>
> +#include <linux/efi.h>
> #include <linux/efi-bgrt.h>
> #include <linux/init.h>
> #include <linux/irq.h>
> @@ -29,13 +30,9 @@
>
> #include <asm/cputype.h>
> #include <asm/cpu_ops.h>
> +#include <asm/pgtable.h>
> #include <asm/smp_plat.h>
>
> -#ifdef CONFIG_ACPI_APEI
> -# include <linux/efi.h>
> -# include <asm/pgtable.h>
> -#endif
> -
> int acpi_noirq = 1; /* skip ACPI IRQ initialization */
> int acpi_disabled = 1;
> EXPORT_SYMBOL(acpi_disabled);
> @@ -239,8 +236,7 @@ void __init acpi_boot_table_init(void)
> }
> }
>
> -#ifdef CONFIG_ACPI_APEI
> -pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
> +pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
> {
> /*
> * According to "Table 8 Map: EFI memory types to AArch64 memory
> @@ -261,4 +257,3 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
> return __pgprot(PROT_NORMAL_NC);
> return __pgprot(PROT_DEVICE_nGnRnE);
> }
> -#endif
> --
> 2.17.0
>
On 19 June 2018 at 08:44, AKASHI Takahiro <[email protected]> wrote:
> From: James Morse <[email protected]>
>
> There has been some confusion around what is necessary to prevent kexec
> overwriting important memory regions. memblock: reserve, or nomap?
> Only memblock nomap regions are reported via /proc/iomem, kexec's
> user-space doesn't know about memblock_reserve()d regions.
>
> Until commit f56ab9a5b73ca ("efi/arm: Don't mark ACPI reclaim memory
> as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
> and thus possible for kexec to overwrite with the new kernel or initrd.
> But this was always broken, as the UEFI memory map is also reserved
> and not marked as nomap.
>
> Exporting both nomap and reserved memblock types is a nuisance as
> they live in different memblock structures which we can't walk at
> the same time.
>
> Take a second walk over memblock.reserved and add new 'reserved'
> subnodes for the memblock_reserved() regions that aren't already
> described by the existing code. (e.g. Kernel Code)
>
> We use reserve_region_with_split() to find the gaps in existing named
> regions. This handles the gap between 'kernel code' and 'kernel data'
> which is memblock_reserve()d, but already partially described by
> request_standard_resources(). e.g.:
> | 80000000-dfffffff : System RAM
> | 80080000-80ffffff : Kernel code
> | 81000000-8158ffff : reserved
> | 81590000-8237efff : Kernel data
> | a0000000-dfffffff : Crash kernel
> | e00f0000-f949ffff : System RAM
>
> reserve_region_with_split needs kzalloc() which isn't available when
> request_standard_resources() is called, use an initcall.
>
> Reported-by: Bhupesh Sharma <[email protected]>
> Reported-by: Tyler Baicar <[email protected]>
> Suggested-by: Akashi Takahiro <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support")
> CC: Ard Biesheuvel <[email protected]>
> CC: Mark Rutland <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/kernel/setup.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 30ad2f085d1f..5b4fac434c84 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -241,6 +241,44 @@ static void __init request_standard_resources(void)
> }
> }
>
> +static int __init reserve_memblock_reserved_regions(void)
> +{
> + phys_addr_t start, end, roundup_end = 0;
> + struct resource *mem, *res;
> + u64 i;
> +
> + for_each_reserved_mem_region(i, &start, &end) {
> + if (end <= roundup_end)
> + continue; /* done already */
> +
> + start = __pfn_to_phys(PFN_DOWN(start));
> + end = __pfn_to_phys(PFN_UP(end)) - 1;
> + roundup_end = end;
> +
> + res = kzalloc(sizeof(*res), GFP_ATOMIC);
> + if (WARN_ON(!res))
> + return -ENOMEM;
> + res->start = start;
> + res->end = end;
> + res->name = "reserved";
> + res->flags = IORESOURCE_MEM;
> +
> + mem = request_resource_conflict(&iomem_resource, res);
> + /*
> + * We expected memblock_reserve() regions to conflict with
> + * memory created by request_standard_resources().
> + */
> + if (WARN_ON_ONCE(!mem))
> + continue;
> + kfree(res);
> +
> + reserve_region_with_split(mem, start, end, "reserved");
> + }
> +
> + return 0;
> +}
> +arch_initcall(reserve_memblock_reserved_regions);
> +
> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>
> void __init setup_arch(char **cmdline_p)
> --
> 2.17.0
>
On 5 July 2018 at 18:48, Will Deacon <[email protected]> wrote:
> On Thu, Jul 05, 2018 at 12:02:15PM +0100, James Morse wrote:
>> On 05/07/18 10:43, AKASHI Takahiro wrote:
>> > On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
>> >> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
>> >>> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
>> >>>> Since arm_enter_runtime_services() was modified to always create a virtual
>> >>>> mapping of UEFI memory map in the previous patch, it is now renamed to
>> >>>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
>> >>>> in acpi_early_init().
>> >>>>
>> >>>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
>> >>>> mappings of ACPI tables using memory attributes described in UEFI memory
>> >>>> map.
>>
>> >>> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
>> >>> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
>> >>> a few lines later, *after* acpi_early_init() has been called.
>>
>> >> Currently, there is a gap where we have already torn down the early
>> >> mapping and haven't created the definitive mapping of the UEFI memory
>> >> map. There are other reasons why this is an issue, and I recently
>> >> proposed [0] myself to address one of them
>>
>> >> Akashi-san, could you please confirm whether the patch below would be
>> >> sufficient for you? Apologies for going back and forth on this, but I
>> >> agree with Will that we should try to avoid warts like the one above
>> >> in generic code.
>> >>
>> >> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
>> >
>> > I think that this patch will also work.
>> > Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
>>
>> Patch 2 is what changes arm_enable_runtime_services() to map the efi memory map
>> before bailing out due to efi=noruntime.
>>
>> Without it, 'efi=noruntime' means no-acpi-tables.
>
> So it sounds like we want patch 2. Akashi, given that this series is only
> four patches, please can you send out a v3 with the stuff that should be
> reviewed and merged? Otherwise, there's a real risk we end up with breakage
> that goes unnoticed initially.
>
Yes, we want patches #1, #2 and #4, and this one can be replaced with
my patch above. Everything can be taken via the arm64 tree as far as I
am concerned.
On Fri, Jul 06, 2018 at 12:31:49AM +0200, Ard Biesheuvel wrote:
> On 5 July 2018 at 18:48, Will Deacon <[email protected]> wrote:
> > On Thu, Jul 05, 2018 at 12:02:15PM +0100, James Morse wrote:
> >> On 05/07/18 10:43, AKASHI Takahiro wrote:
> >> > On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
> >> >> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
> >> >>> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
> >> >>>> Since arm_enter_runtime_services() was modified to always create a virtual
> >> >>>> mapping of UEFI memory map in the previous patch, it is now renamed to
> >> >>>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
> >> >>>> in acpi_early_init().
> >> >>>>
> >> >>>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
> >> >>>> mappings of ACPI tables using memory attributes described in UEFI memory
> >> >>>> map.
> >>
> >> >>> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
> >> >>> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
> >> >>> a few lines later, *after* acpi_early_init() has been called.
> >>
> >> >> Currently, there is a gap where we have already torn down the early
> >> >> mapping and haven't created the definitive mapping of the UEFI memory
> >> >> map. There are other reasons why this is an issue, and I recently
> >> >> proposed [0] myself to address one of them
> >>
> >> >> Akashi-san, could you please confirm whether the patch below would be
> >> >> sufficient for you? Apologies for going back and forth on this, but I
> >> >> agree with Will that we should try to avoid warts like the one above
> >> >> in generic code.
> >> >>
> >> >> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
> >> >
> >> > I think that this patch will also work.
> >> > Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
> >>
> >> Patch 2 is what changes arm_enable_runtime_services() to map the efi memory map
> >> before bailing out due to efi=noruntime.
> >>
> >> Without it, 'efi=noruntime' means no-acpi-tables.
> >
> > So it sounds like we want patch 2. Akashi, given that this series is only
> > four patches, please can you send out a v3 with the stuff that should be
> > reviewed and merged? Otherwise, there's a real risk we end up with breakage
> > that goes unnoticed initially.
> >
>
> Yes, we want patches #1, #2 and #4, and this one can be replaced with
> my patch above. Everything can be taken via the arm64 tree as far as I
> am concerned.
I almost believed that my patch#2 was just a preparatory one for patch#3
where arm_enable_runtime_services() is moved aggressively forward.
But acpi_os_ioremap() is not a __init function and I can now agree to
keeping patch#2.
Meanwhile, the consequent code with Ard's patch would look like:
---8<---
static int __init arm_enable_runtime_services(void)
{
...
efi_memmap_unmap();
mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
pr_err("Failed to remap EFI memory map\n");
return 0;
}
...
}
--->8---
It seems to me that it makes no sense.
Is it okay to take them out?
-Takahiro AKASHI
On Fri, Jul 06, 2018 at 09:42:28AM +0900, AKASHI Takahiro wrote:
> On Fri, Jul 06, 2018 at 12:31:49AM +0200, Ard Biesheuvel wrote:
> > On 5 July 2018 at 18:48, Will Deacon <[email protected]> wrote:
> > > On Thu, Jul 05, 2018 at 12:02:15PM +0100, James Morse wrote:
> > >> On 05/07/18 10:43, AKASHI Takahiro wrote:
> > >> > On Wed, Jul 04, 2018 at 08:49:32PM +0200, Ard Biesheuvel wrote:
> > >> >> On 4 July 2018 at 19:06, Will Deacon <[email protected]> wrote:
> > >> >>> On Tue, Jun 19, 2018 at 03:44:23PM +0900, AKASHI Takahiro wrote:
> > >> >>>> Since arm_enter_runtime_services() was modified to always create a virtual
> > >> >>>> mapping of UEFI memory map in the previous patch, it is now renamed to
> > >> >>>> efi_enter_virtual_mode() and called earlier before acpi_load_tables()
> > >> >>>> in acpi_early_init().
> > >> >>>>
> > >> >>>> This will allow us to use UEFI memory map in acpi_os_ioremap() to create
> > >> >>>> mappings of ACPI tables using memory attributes described in UEFI memory
> > >> >>>> map.
> > >>
> > >> >>> Hmm, this is ugly as hell. Is there nothing else we can piggy-back off?
> > >> >>> It's also fairly jarring that, on x86, efi_enter_virtual_mode() is called
> > >> >>> a few lines later, *after* acpi_early_init() has been called.
> > >>
> > >> >> Currently, there is a gap where we have already torn down the early
> > >> >> mapping and haven't created the definitive mapping of the UEFI memory
> > >> >> map. There are other reasons why this is an issue, and I recently
> > >> >> proposed [0] myself to address one of them
> > >>
> > >> >> Akashi-san, could you please confirm whether the patch below would be
> > >> >> sufficient for you? Apologies for going back and forth on this, but I
> > >> >> agree with Will that we should try to avoid warts like the one above
> > >> >> in generic code.
> > >> >>
> > >> >> [0] https://marc.info/?l=linux-efi&m=152930773507524&w=2
> > >> >
> > >> > I think that this patch will also work.
> > >> > Please drop my patch#2 and #3 if you want to pick up my patchset, Will.
> > >>
> > >> Patch 2 is what changes arm_enable_runtime_services() to map the efi memory map
> > >> before bailing out due to efi=noruntime.
> > >>
> > >> Without it, 'efi=noruntime' means no-acpi-tables.
> > >
> > > So it sounds like we want patch 2. Akashi, given that this series is only
> > > four patches, please can you send out a v3 with the stuff that should be
> > > reviewed and merged? Otherwise, there's a real risk we end up with breakage
> > > that goes unnoticed initially.
> > >
> >
> > Yes, we want patches #1, #2 and #4, and this one can be replaced with
> > my patch above. Everything can be taken via the arm64 tree as far as I
> > am concerned.
>
> I almost believed that my patch#2 was just a preparatory one for patch#3
> where arm_enable_runtime_services() is moved aggressively forward.
> But acpi_os_ioremap() is not a __init function and I can now agree to
> keeping patch#2.
>
> Meanwhile, the consequent code with Ard's patch would look like:
> ---8<---
> static int __init arm_enable_runtime_services(void)
> {
> ...
> efi_memmap_unmap();
>
> mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
>
> if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> pr_err("Failed to remap EFI memory map\n");
> return 0;
> }
> ...
> }
> --->8---
> It seems to me that it makes no sense.
Oops, it does. Comments at efi_memmap_init_late() say:
---8<---
* The reason there are two EFI memmap initialisation
* (efi_memmap_init_early() and this late version) is because the
* early EFI memmap should be explicitly unmapped once EFI
* initialisation is complete as the fixmap space used to map the EFI
* memmap (via early_memremap()) is a scarce resource.
--->8---
> Is it okay to take them out?
Never mind.
>
> -Takahiro AKASHI
Hi Akashi,
On Fri, Jul 06, 2018 at 10:33:13AM +0900, AKASHI Takahiro wrote:
> On Fri, Jul 06, 2018 at 09:42:28AM +0900, AKASHI Takahiro wrote:
> > I almost believed that my patch#2 was just a preparatory one for patch#3
> > where arm_enable_runtime_services() is moved aggressively forward.
> > But acpi_os_ioremap() is not a __init function and I can now agree to
> > keeping patch#2.
> >
> > Meanwhile, the consequent code with Ard's patch would look like:
> > ---8<---
> > static int __init arm_enable_runtime_services(void)
> > {
> > ...
> > efi_memmap_unmap();
> >
> > mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
> >
> > if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
> > pr_err("Failed to remap EFI memory map\n");
> > return 0;
> > }
> > ...
> > }
> > --->8---
> > It seems to me that it makes no sense.
>
> Oops, it does. Comments at efi_memmap_init_late() say:
> ---8<---
> * The reason there are two EFI memmap initialisation
> * (efi_memmap_init_early() and this late version) is because the
> * early EFI memmap should be explicitly unmapped once EFI
> * initialisation is complete as the fixmap space used to map the EFI
> * memmap (via early_memremap()) is a scarce resource.
> --->8---
>
> > Is it okay to take them out?
>
> Never mind.
I'm struggling with your monologue...
Please can you send a v3 of the series, containing the patches that you
think are necessary, along with the Acks you've collected?
Thanks,
Will