2020-01-13 17:24:17

by Ard Biesheuvel

[permalink] [raw]
Subject: [GIT PULL 00/13] More EFI updates for v5.6

The following changes since commit 4444f8541dad16fefd9b8807ad1451e806ef1d94:

efi: Allow disabling PCI busmastering on bridges during boot (2020-01-10 18:55:04 +0100)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next

for you to fetch changes up to 743c5dd59f7e4b9e7a28be6a8f0e8d03271a98ab:

efi: Fix handling of multiple efi_fake_mem= entries (2020-01-13 10:41:53 +0100)

----------------------------------------------------------------
Third and final batch of EFI updates for v5.6:
- A few touchups for the x86 libstub changes that have already been queued
up.
- Fix memory leaks and crash bugs in the EFI fake_mem driver, which may be
used more heavily in the future for device-dax soft reservation (Dan)
- Avoid RWX mappings in the EFI page tables when possible.
- Avoid PCIe probing failures if the EFI framebuffer is probed first when
the new of_devlink feature is active.
- Move the support code for the old EFI memory mapping style into its only
user, the SGI UV1+ support code.

----------------------------------------------------------------
Anshuman Khandual (1):
efi: Fix comment for efi_mem_type() wrt absent physical addresses

Ard Biesheuvel (7):
efi/libstub/x86: use const attribute for efi_is_64bit()
efi/libstub/x86: use mandatory 16-byte stack alignment in mixed mode
x86/mm: fix NX bit clearing issue in kernel_map_pages_in_pgd
efi/x86: don't map the entire kernel text RW for mixed mode
efi/x86: avoid RWX mappings for all of DRAM
efi/x86: limit EFI old memory map to SGI UV machines
efi/arm: defer probe of PCIe backed efifb on DT systems

Arnd Bergmann (1):
efi/libstub/x86: fix unused-variable warning

Dan Williams (4):
efi: Add a flags parameter to efi_memory_map
efi: Add tracking for dynamically allocated memmaps
efi: Fix efi_memmap_alloc() leaks
efi: Fix handling of multiple efi_fake_mem= entries

Documentation/admin-guide/kernel-parameters.txt | 3 +-
arch/x86/boot/compressed/eboot.c | 16 +-
arch/x86/boot/compressed/efi_thunk_64.S | 46 ++----
arch/x86/boot/compressed/head_64.S | 7 +-
arch/x86/include/asm/efi.h | 28 ++--
arch/x86/kernel/kexec-bzimage64.c | 2 +-
arch/x86/mm/pat/set_memory.c | 8 +-
arch/x86/platform/efi/efi.c | 40 +++--
arch/x86/platform/efi/efi_64.c | 190 ++++--------------------
arch/x86/platform/efi/quirks.c | 44 +++---
arch/x86/platform/uv/bios_uv.c | 164 +++++++++++++++++++-
drivers/firmware/efi/arm-init.c | 107 ++++++++++++-
drivers/firmware/efi/efi.c | 2 +-
drivers/firmware/efi/fake_mem.c | 43 +++---
drivers/firmware/efi/memmap.c | 95 ++++++++----
include/linux/efi.h | 17 ++-
16 files changed, 471 insertions(+), 341 deletions(-)


2020-01-13 17:24:27

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 02/13] efi/libstub/x86: use mandatory 16-byte stack alignment in mixed mode

Reduce the stack frame of the EFI stub's mixed mode thunk routine by
8 bytes, by moving the GDT and return addresses to EBP and EBX, which
we need to preserve anyway, since their top halves will be cleared by
the call into 32-bit firmware code. Doing so results in the UEFI code
being entered with a 16 byte aligned stack, as mandated by the UEFI
spec, fixing the last occurrence in the 64-bit kernel where we violate
this requirement.

Also, move the saved GDT from a global variable to an unused part of the
stack frame, and touch up some other parts of the code.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/efi_thunk_64.S | 46 +++++++------------------
1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/arch/x86/boot/compressed/efi_thunk_64.S b/arch/x86/boot/compressed/efi_thunk_64.S
index d040ff5458e5..8fb7f6799c52 100644
--- a/arch/x86/boot/compressed/efi_thunk_64.S
+++ b/arch/x86/boot/compressed/efi_thunk_64.S
@@ -27,12 +27,9 @@ SYM_FUNC_START(__efi64_thunk)
push %rbp
push %rbx

- subq $8, %rsp
- leaq 1f(%rip), %rax
- movl %eax, 4(%rsp)
- leaq efi_gdt64(%rip), %rax
- movl %eax, (%rsp)
- movl %eax, 2(%rax) /* Fixup the gdt base address */
+ leaq 1f(%rip), %rbp
+ leaq efi_gdt64(%rip), %rbx
+ movl %ebx, 2(%rbx) /* Fixup the gdt base address */

movl %ds, %eax
push %rax
@@ -48,12 +45,10 @@ SYM_FUNC_START(__efi64_thunk)
movl %esi, 0x0(%rsp)
movl %edx, 0x4(%rsp)
movl %ecx, 0x8(%rsp)
- movq %r8, %rsi
- movl %esi, 0xc(%rsp)
- movq %r9, %rsi
- movl %esi, 0x10(%rsp)
+ movl %r8d, 0xc(%rsp)
+ movl %r9d, 0x10(%rsp)

- sgdt save_gdt(%rip)
+ sgdt 0x14(%rsp)

/*
* Switch to gdt with 32-bit segments. This is the firmware GDT
@@ -68,11 +63,10 @@ SYM_FUNC_START(__efi64_thunk)
pushq %rax
lretq

-1: addq $32, %rsp
+1: lgdt 0x14(%rsp)
+ addq $32, %rsp
movq %rdi, %rax

- lgdt save_gdt(%rip)
-
pop %rbx
movl %ebx, %ss
pop %rbx
@@ -83,15 +77,9 @@ SYM_FUNC_START(__efi64_thunk)
/*
* Convert 32-bit status code into 64-bit.
*/
- test %rax, %rax
- jz 1f
- movl %eax, %ecx
- andl $0x0fffffff, %ecx
- andl $0xf0000000, %eax
- shl $32, %rax
- or %rcx, %rax
-1:
- addq $8, %rsp
+ roll $1, %eax
+ rorq $1, %rax
+
pop %rbx
pop %rbp
ret
@@ -135,9 +123,7 @@ SYM_FUNC_START_LOCAL(efi_enter32)
*/
cli

- movl 56(%esp), %eax
- movl %eax, 2(%eax)
- lgdtl (%eax)
+ lgdtl (%ebx)

movl %cr4, %eax
btsl $(X86_CR4_PAE_BIT), %eax
@@ -154,9 +140,8 @@ SYM_FUNC_START_LOCAL(efi_enter32)
xorl %eax, %eax
lldt %ax

- movl 60(%esp), %eax
pushl $__KERNEL_CS
- pushl %eax
+ pushl %ebp

/* Enable paging */
movl %cr0, %eax
@@ -172,11 +157,6 @@ SYM_DATA_START(efi32_boot_gdt)
.quad 0
SYM_DATA_END(efi32_boot_gdt)

-SYM_DATA_START_LOCAL(save_gdt)
- .word 0
- .quad 0
-SYM_DATA_END(save_gdt)
-
SYM_DATA_START(efi_gdt64)
.word efi_gdt64_end - efi_gdt64
.long 0 /* Filled out by user */
--
2.20.1

2020-01-13 17:24:40

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 04/13] x86/mm: fix NX bit clearing issue in kernel_map_pages_in_pgd

Commit 15f003d20782 ("x86/mm/pat: Don't implicitly allow _PAGE_RW in
kernel_map_pages_in_pgd()") modified kernel_map_pages_in_pgd() to
manage writable permissions of memory mappings in the EFI page
table in a different way, but in the process, it removed the
ability to clear NX attributes from read-only mappings, by
clobbering the clear mask if _PAGE_RW is not being requested.

Failure to remove the NX attribute from read-only mappings is
unlikely to be a security issue, but it does prevent us from
tightening the permissions in the EFI page tables going forward,
so let's fix it now.

Fixes: 15f003d20782 ("x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/mm/pat/set_memory.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 20823392f4f2..62a8ebe72a52 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2215,7 +2215,7 @@ int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
.pgd = pgd,
.numpages = numpages,
.mask_set = __pgprot(0),
- .mask_clr = __pgprot(0),
+ .mask_clr = __pgprot(~page_flags & (_PAGE_NX|_PAGE_RW)),
.flags = 0,
};

@@ -2224,12 +2224,6 @@ int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
if (!(__supported_pte_mask & _PAGE_NX))
goto out;

- if (!(page_flags & _PAGE_NX))
- cpa.mask_clr = __pgprot(_PAGE_NX);
-
- if (!(page_flags & _PAGE_RW))
- cpa.mask_clr = __pgprot(_PAGE_RW);
-
if (!(page_flags & _PAGE_ENC))
cpa.mask_clr = pgprot_encrypted(cpa.mask_clr);

--
2.20.1

2020-01-13 17:24:52

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 09/13] efi: Fix comment for efi_mem_type() wrt absent physical addresses

From: Anshuman Khandual <[email protected]>

A previous commit f99afd08a45f ("efi: Update efi_mem_type() to return an
error rather than 0") changed the return value from EFI_RESERVED_TYPE to
-EINVAL when the searched physical address is not present in any memory
descriptor. But the comment preceding the function never changed. Let's
change the comment now to reflect the new return value -EINVAL.

Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 2b02cb165f16..621220ab3d0e 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -908,7 +908,7 @@ u64 efi_mem_attributes(unsigned long phys_addr)
*
* Search in the EFI memory map for the region covering @phys_addr.
* Returns the EFI memory type if the region was found in the memory
- * map, EFI_RESERVED_TYPE (zero) otherwise.
+ * map, -EINVAL otherwise.
*/
int efi_mem_type(unsigned long phys_addr)
{
--
2.20.1

2020-01-13 17:24:58

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 12/13] efi: Fix efi_memmap_alloc() leaks

From: Dan Williams <[email protected]>

With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
updated and replaced multiple times. When that happens a previous
dynamically allocated efi memory map can be garbage collected. Use the
new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
allocated memory map is being replaced.

Debug statements in efi_memmap_free() reveal:

efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2

...a savings of 7968 bytes on a qemu boot with 2 entries specified to
efi_fake_mem=.

[ ardb: added a comment to clarify that efi_memmap_free() does nothing when
called from efi_clean_memmap(), i.e., with data->flags == 0x0 ]

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/memmap.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 04dfa56b994b..501672166502 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
return PFN_PHYS(page_to_pfn(p));
}

+static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+{
+ if (flags & EFI_MEMMAP_MEMBLOCK) {
+ if (slab_is_available())
+ memblock_free_late(phys, size);
+ else
+ memblock_free(phys, size);
+ } else if (flags & EFI_MEMMAP_SLAB) {
+ struct page *p = pfn_to_page(PHYS_PFN(phys));
+ unsigned int order = get_order(size);
+
+ free_pages((unsigned long) page_address(p), order);
+ }
+}
+
+static void __init efi_memmap_free(void)
+{
+ __efi_memmap_free(efi.memmap.phys_map,
+ efi.memmap.desc_size * efi.memmap.nr_map,
+ efi.memmap.flags);
+}
+
/**
* efi_memmap_alloc - Allocate memory for the EFI memory map
* @num_entries: Number of entries in the allocated map.
@@ -100,6 +122,9 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
return -ENOMEM;
}

+ /* NOP if data->flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB) == 0 */
+ efi_memmap_free();
+
map.phys_map = data->phys_map;
map.nr_map = data->size / data->desc_size;
map.map_end = map.map + data->size;
--
2.20.1

2020-01-13 17:25:01

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 13/13] efi: Fix handling of multiple efi_fake_mem= entries

From: Dan Williams <[email protected]>

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. An empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map. When
that happens efi_memmap_insert() may overrun its allocation, and if you
are lucky will spill over to an unmapped page leading to crash
signature like the following rather than silent corruption:

BUG: unable to handle page fault for address: ffffffffff281000
[..]
RIP: 0010:efi_memmap_insert+0x11d/0x191
[..]
Call Trace:
? bgrt_init+0xbe/0xbe
? efi_arch_mem_reserve+0x1cb/0x228
? acpi_parse_bgrt+0xa/0xd
? acpi_table_parse+0x86/0xb8
? acpi_boot_init+0x494/0x4e3
? acpi_parse_x2apic+0x87/0x87
? setup_acpi_sci+0xa2/0xa2
? setup_arch+0x8db/0x9e1
? start_kernel+0x6a/0x547
? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" introduced more occurrences where
efi_memmap_insert() is invoked after an efi_fake_mem= configuration has
been parsed. Previously the side effects of vestigial empty entries were
benign, but with commit af1648984828 that follow-on efi_memmap_insert()
invocation triggers efi_memmap_insert() overruns.

Link: https://lore.kernel.org/r/[email protected]
Reported-by: Dave Young <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/fake_mem.c | 31 ++++++++++++++++---------------
drivers/firmware/efi/memmap.c | 2 +-
include/linux/efi.h | 2 ++
3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index a8d20568d532..6e0f34a38171 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -34,25 +34,16 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
return 0;
}

-void __init efi_fake_memmap(void)
+static void __init efi_fake_range(struct efi_mem_range *efi_range)
{
struct efi_memory_map_data data = { 0 };
int new_nr_map = efi.memmap.nr_map;
efi_memory_desc_t *md;
void *new_memmap;
- int i;
-
- if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
- return;

/* count up the number of EFI memory descriptor */
- for (i = 0; i < nr_fake_mem; i++) {
- for_each_efi_memory_desc(md) {
- struct range *r = &efi_fake_mems[i].range;
-
- new_nr_map += efi_memmap_split_count(md, r);
- }
- }
+ for_each_efi_memory_desc(md)
+ new_nr_map += efi_memmap_split_count(md, &efi_range->range);

/* allocate memory for new EFI memmap */
if (efi_memmap_alloc(new_nr_map, &data) != 0)
@@ -61,17 +52,27 @@ void __init efi_fake_memmap(void)
/* create new EFI memmap */
new_memmap = early_memremap(data.phys_map, data.size);
if (!new_memmap) {
- memblock_free(data.phys_map, data.size);
+ __efi_memmap_free(data.phys_map, data.size, data.flags);
return;
}

- for (i = 0; i < nr_fake_mem; i++)
- efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
+ efi_memmap_insert(&efi.memmap, new_memmap, efi_range);

/* swap into new EFI memmap */
early_memunmap(new_memmap, data.size);

efi_memmap_install(&data);
+}
+
+void __init efi_fake_memmap(void)
+{
+ int i;
+
+ if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+ return;
+
+ for (i = 0; i < nr_fake_mem; i++)
+ efi_fake_range(&efi_fake_mems[i]);

/* print new EFI memmap */
efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 501672166502..2ff1883dc788 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,7 +29,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
return PFN_PHYS(page_to_pfn(p));
}

-static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
{
if (flags & EFI_MEMMAP_MEMBLOCK) {
if (slab_is_available())
diff --git a/include/linux/efi.h b/include/linux/efi.h
index adbe421835c1..7efd7072cca5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -976,6 +976,8 @@ extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);

extern int __init efi_memmap_alloc(unsigned int num_entries,
struct efi_memory_map_data *data);
+extern void __efi_memmap_free(u64 phys, unsigned long size,
+ unsigned long flags);
extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
extern void __init efi_memmap_unmap(void);
--
2.20.1

2020-01-13 17:25:07

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 03/13] efi/libstub/x86: fix unused-variable warning

From: Arnd Bergmann <[email protected]>

The only users of these got removed, so they also need to be
removed to avoid warnings:

arch/x86/boot/compressed/eboot.c: In function 'setup_efi_pci':
arch/x86/boot/compressed/eboot.c:117:16: error: unused variable 'nr_pci' [-Werror=unused-variable]
unsigned long nr_pci;
^~~~~~
arch/x86/boot/compressed/eboot.c: In function 'setup_uga':
arch/x86/boot/compressed/eboot.c:244:16: error: unused variable 'nr_ugas' [-Werror=unused-variable]
unsigned long nr_ugas;
^~~~~~~

Fixes: 2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/eboot.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index ab3a40283db7..82e26d0ff075 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -118,7 +118,6 @@ static void setup_efi_pci(struct boot_params *params)
void **pci_handle = NULL;
efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
unsigned long size = 0;
- unsigned long nr_pci;
struct setup_data *data;
efi_handle_t h;
int i;
@@ -245,7 +244,6 @@ setup_uga(struct screen_info *si, efi_guid_t *uga_proto, unsigned long size)
u32 width, height;
void **uga_handle = NULL;
efi_uga_draw_protocol_t *uga = NULL, *first_uga;
- unsigned long nr_ugas;
efi_handle_t handle;
int i;

--
2.20.1

2020-01-13 17:25:38

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 01/13] efi/libstub/x86: use const attribute for efi_is_64bit()

Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/eboot.c | 14 ++++++--------
arch/x86/boot/compressed/head_64.S | 7 +++----
arch/x86/include/asm/efi.h | 2 +-
3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 4afd29eb5b34..ab3a40283db7 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -21,16 +21,18 @@
#include "eboot.h"

static efi_system_table_t *sys_table;
-static bool efi_is64 = IS_ENABLED(CONFIG_X86_64);
+extern const bool efi_is64;

__pure efi_system_table_t *efi_system_table(void)
{
return sys_table;
}

-__pure bool efi_is_64bit(void)
+__attribute_const__ bool efi_is_64bit(void)
{
- return efi_is64;
+ if (IS_ENABLED(CONFIG_EFI_MIXED))
+ return efi_is64;
+ return IS_ENABLED(CONFIG_X64_64);
}

static efi_status_t
@@ -710,8 +712,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
*/
struct boot_params *efi_main(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
- struct boot_params *boot_params,
- bool is64)
+ struct boot_params *boot_params)
{
struct desc_ptr *gdt = NULL;
struct setup_header *hdr = &boot_params->hdr;
@@ -721,9 +722,6 @@ struct boot_params *efi_main(efi_handle_t handle,

sys_table = sys_table_arg;

- if (IS_ENABLED(CONFIG_EFI_MIXED))
- efi_is64 = is64;
-
/* Check if we were booted by the EFI firmware */
if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
goto fail;
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 44a6bb6964b5..1f1f6c8139b3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -211,10 +211,9 @@ SYM_FUNC_START(startup_32)
movl efi32_boot_args(%ebp), %edi
cmp $0, %edi
jz 1f
- leal handover_entry(%ebp), %eax
+ leal efi64_stub_entry(%ebp), %eax
movl %esi, %edx
movl efi32_boot_args+4(%ebp), %esi
- movl $0x0, %ecx
1:
#endif
pushl %eax
@@ -242,6 +241,7 @@ SYM_FUNC_START(efi32_stub_entry)
movl %ecx, efi32_boot_args(%ebp)
movl %edx, efi32_boot_args+4(%ebp)
sgdtl efi32_boot_gdt(%ebp)
+ movb $0, efi_is64(%ebp)

/* Disable paging */
movl %cr0, %eax
@@ -452,8 +452,6 @@ SYM_CODE_END(startup_64)
.org 0x390
SYM_FUNC_START(efi64_stub_entry)
SYM_FUNC_START_ALIAS(efi_stub_entry)
- movq $1, %rcx
-SYM_INNER_LABEL(handover_entry, SYM_L_LOCAL)
and $~0xf, %rsp /* realign the stack */
call efi_main
movq %rax,%rsi
@@ -632,6 +630,7 @@ SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)

#ifdef CONFIG_EFI_MIXED
SYM_DATA_LOCAL(efi32_boot_args, .long 0, 0)
+SYM_DATA(efi_is64, .byte 1)
#endif

/*
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 383f7a0fc170..0a58468a7203 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -221,7 +221,7 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,

/* arch specific definitions used by the stub code */

-__pure bool efi_is_64bit(void);
+__attribute_const__ bool efi_is_64bit(void);

static inline bool efi_is_native(void)
{
--
2.20.1

2020-01-13 17:25:42

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 06/13] efi/x86: avoid RWX mappings for all of DRAM

The EFI code creates RWX mappings for all memory regions that are
occupied after the stub completes, and in the mixed mode case, it
even creates RWX mappings for all of the remaining DRAM as well.

Let's try to avoid this, by setting the NX bit for all memory
regions except the ones that are marked as EFI runtime services
code [which means text+rodata+data in practice, so we cannot mark
them read-only right away]. For cases of buggy firmware where boot
services code is called during SetVirtualAddressMap(), map those
regions with exec permissions as well - they will be unmapped in
efi_free_boot_services().

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6ec58ff60b56..3eb23966e30a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -365,10 +365,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* as trim_bios_range() will reserve the first page and isolate it away
* from memory allocators anyway.
*/
- pf = _PAGE_RW;
- if (sev_active())
- pf |= _PAGE_ENC;
-
if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
@@ -410,6 +406,22 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
unsigned long pfn;
pgd_t *pgd = efi_mm.pgd;

+ /*
+ * EFI_RUNTIME_SERVICES_CODE regions typically cover PE/COFF
+ * executable images in memory that consist of both R-X and
+ * RW- sections, so we cannot apply read-only or non-exec
+ * permissions just yet. However, modern EFI systems provide
+ * a memory attributes table that describes those sections
+ * with the appropriate restricted permissions, which are
+ * applied in efi_runtime_update_mappings() below. All other
+ * regions can be mapped non-executable at this point, with
+ * the exception of boot services code regions, but those will
+ * be unmapped again entirely in efi_free_boot_services().
+ */
+ if (md->type != EFI_BOOT_SERVICES_CODE &&
+ md->type != EFI_RUNTIME_SERVICES_CODE)
+ flags |= _PAGE_NX;
+
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;

--
2.20.1

2020-01-13 17:25:43

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 08/13] efi/arm: defer probe of PCIe backed efifb on DT systems

The new of_devlink support breaks PCIe probing on ARM platforms booting
via UEFI if the firmware exposes a EFI framebuffer that is backed by a
PCI device. The reason is that the probing order gets reversed,
resulting in a resource conflict on the framebuffer memory window when
the PCIe probes last, causing it to give up entirely.

Given that we rely on PCI quirks to deal with EFI framebuffers that get
moved around in memory, we cannot simply drop the memory reservation, so
instead, let's use the device link infrastructure to register this
dependency, and force the probing to occur in the expected order.

Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/arm-init.c | 107 ++++++++++++++++++++++++++++++--
1 file changed, 103 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 904fa09e6a6b..d99f5b0c8a09 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -10,10 +10,12 @@
#define pr_fmt(fmt) "efi: " fmt

#include <linux/efi.h>
+#include <linux/fwnode.h>
#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/mm_types.h>
#include <linux/of.h>
+#include <linux/of_address.h>
#include <linux/of_fdt.h>
#include <linux/platform_device.h>
#include <linux/screen_info.h>
@@ -276,15 +278,112 @@ void __init efi_init(void)
efi_memmap_unmap();
}

+static bool efifb_overlaps_pci_range(const struct of_pci_range *range)
+{
+ u64 fb_base = screen_info.lfb_base;
+
+ if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+ fb_base |= (u64)(unsigned long)screen_info.ext_lfb_base << 32;
+
+ return fb_base >= range->cpu_addr &&
+ fb_base < (range->cpu_addr + range->size);
+}
+
+static struct device_node *find_pci_overlap_node(void)
+{
+ struct device_node *np;
+
+ for_each_node_by_type(np, "pci") {
+ struct of_pci_range_parser parser;
+ struct of_pci_range range;
+ int err;
+
+ err = of_pci_range_parser_init(&parser, np);
+ if (err) {
+ pr_warn("of_pci_range_parser_init() failed: %d\n", err);
+ continue;
+ }
+
+ for_each_of_pci_range(&parser, &range)
+ if (efifb_overlaps_pci_range(&range))
+ return np;
+ }
+ return NULL;
+}
+
+/*
+ * If the efifb framebuffer is backed by a PCI graphics controller, we have
+ * to ensure that this relation is expressed using a device link when
+ * running in DT mode, or the probe order may be reversed, resulting in a
+ * resource reservation conflict on the memory window that the efifb
+ * framebuffer steals from the PCIe host bridge.
+ */
+static int efifb_add_links(const struct fwnode_handle *fwnode,
+ struct device *dev)
+{
+ struct device_node *sup_np;
+ struct device *sup_dev;
+
+ sup_np = find_pci_overlap_node();
+
+ /*
+ * If there's no PCI graphics controller backing the efifb, we are
+ * done here.
+ */
+ if (!sup_np)
+ return 0;
+
+ sup_dev = get_dev_from_fwnode(&sup_np->fwnode);
+ of_node_put(sup_np);
+
+ /*
+ * Return -ENODEV if the PCI graphics controller device hasn't been
+ * registered yet. This ensures that efifb isn't allowed to probe
+ * and this function is retried again when new devices are
+ * registered.
+ */
+ if (!sup_dev)
+ return -ENODEV;
+
+ /*
+ * If this fails, retrying this function at a later point won't
+ * change anything. So, don't return an error after this.
+ */
+ if (!device_link_add(dev, sup_dev, 0))
+ dev_warn(dev, "device_link_add() failed\n");
+
+ put_device(sup_dev);
+
+ return 0;
+}
+
+static const struct fwnode_operations efifb_fwnode_ops = {
+ .add_links = efifb_add_links,
+};
+
+static struct fwnode_handle efifb_fwnode = {
+ .ops = &efifb_fwnode_ops,
+};
+
static int __init register_gop_device(void)
{
- void *pd;
+ struct platform_device *pd;
+ int err;

if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
return 0;

- pd = platform_device_register_data(NULL, "efi-framebuffer", 0,
- &screen_info, sizeof(screen_info));
- return PTR_ERR_OR_ZERO(pd);
+ pd = platform_device_alloc("efi-framebuffer", 0);
+ if (!pd)
+ return -ENOMEM;
+
+ if (IS_ENABLED(CONFIG_PCI))
+ pd->dev.fwnode = &efifb_fwnode;
+
+ err = platform_device_add_data(pd, &screen_info, sizeof(screen_info));
+ if (err)
+ return err;
+
+ return platform_device_add(pd);
}
subsys_initcall(register_gop_device);
--
2.20.1

2020-01-13 17:25:45

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

The mixed mode thunking routine requires a part of it to be
mapped 1:1, and for this reason, we currently map the entire
kernel .text read/write in the EFI page tables, which is bad.

In fact, the kernel_map_pages_in_pgd() invocation that installs
this mapping is entirely redundant, since all of DRAM is already
1:1 mapped read/write in the EFI page tables when we reach this
point, which means that .rodata is mapped read-write as well.

So let's remap both .text and .rodata read-only in the EFI
page tables.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c13fa2150976..6ec58ff60b56 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -391,11 +391,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)

efi_scratch.phys_stack = page_to_phys(page + 1); /* stack grows down */

- npages = (_etext - _text) >> PAGE_SHIFT;
+ npages = (__end_rodata_aligned - _text) >> PAGE_SHIFT;
text = __pa(_text);
pfn = text >> PAGE_SHIFT;

- pf = _PAGE_RW | _PAGE_ENC;
+ pf = _PAGE_ENC;
if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
--
2.20.1

2020-01-13 17:25:52

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 10/13] efi: Add a flags parameter to efi_memory_map

From: Dan Williams <[email protected]>

In preparation for garbage collecting dynamically allocated efi memory
maps, where the allocation method of memblock vs slab needs to be
recalled, convert the existing 'late' flag into a 'flags' bitmask.

Arrange for the flag to be passed via 'struct efi_memory_map_data'. This
structure grows additional flags in follow-on changes.

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/memmap.c | 31 +++++++++++++++++--------------
include/linux/efi.h | 4 +++-
2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..4f98eb12c172 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -52,21 +52,20 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
/**
* __efi_memmap_init - Common code for mapping the EFI memory map
* @data: EFI memory map data
- * @late: Use early or late mapping function?
*
* This function takes care of figuring out which function to use to
* map the EFI memory map in efi.memmap based on how far into the boot
* we are.
*
- * During bootup @late should be %false since we only have access to
- * the early_memremap*() functions as the vmalloc space isn't setup.
- * Once the kernel is fully booted we can fallback to the more robust
- * memremap*() API.
+ * During bootup EFI_MEMMAP_LATE in data->flags should be clear since we
+ * only have access to the early_memremap*() functions as the vmalloc
+ * space isn't setup. Once the kernel is fully booted we can fallback
+ * to the more robust memremap*() API.
*
* Returns zero on success, a negative error code on failure.
*/
static int __init
-__efi_memmap_init(struct efi_memory_map_data *data, bool late)
+__efi_memmap_init(struct efi_memory_map_data *data)
{
struct efi_memory_map map;
phys_addr_t phys_map;
@@ -76,7 +75,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)

phys_map = data->phys_map;

- if (late)
+ if (data->flags & EFI_MEMMAP_LATE)
map.map = memremap(phys_map, data->size, MEMREMAP_WB);
else
map.map = early_memremap(phys_map, data->size);
@@ -92,7 +91,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)

map.desc_version = data->desc_version;
map.desc_size = data->desc_size;
- map.late = late;
+ map.flags = data->flags;

set_bit(EFI_MEMMAP, &efi.flags);

@@ -111,9 +110,10 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
int __init efi_memmap_init_early(struct efi_memory_map_data *data)
{
/* Cannot go backwards */
- WARN_ON(efi.memmap.late);
+ WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);

- return __efi_memmap_init(data, false);
+ data->flags = 0;
+ return __efi_memmap_init(data);
}

void __init efi_memmap_unmap(void)
@@ -121,7 +121,7 @@ void __init efi_memmap_unmap(void)
if (!efi_enabled(EFI_MEMMAP))
return;

- if (!efi.memmap.late) {
+ if (!(efi.memmap.flags & EFI_MEMMAP_LATE)) {
unsigned long size;

size = efi.memmap.desc_size * efi.memmap.nr_map;
@@ -162,13 +162,14 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
struct efi_memory_map_data data = {
.phys_map = addr,
.size = size,
+ .flags = EFI_MEMMAP_LATE,
};

/* Did we forget to unmap the early EFI memmap? */
WARN_ON(efi.memmap.map);

/* Were we already called? */
- WARN_ON(efi.memmap.late);
+ WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);

/*
* It makes no sense to allow callers to register different
@@ -178,7 +179,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
data.desc_version = efi.memmap.desc_version;
data.desc_size = efi.memmap.desc_size;

- return __efi_memmap_init(&data, true);
+ return __efi_memmap_init(&data);
}

/**
@@ -195,6 +196,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
{
struct efi_memory_map_data data;
+ unsigned long flags;

efi_memmap_unmap();

@@ -202,8 +204,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
data.size = efi.memmap.desc_size * nr_map;
data.desc_version = efi.memmap.desc_version;
data.desc_size = efi.memmap.desc_size;
+ data.flags = efi.memmap.flags & EFI_MEMMAP_LATE;

- return __efi_memmap_init(&data, efi.memmap.late);
+ return __efi_memmap_init(&data);
}

/**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 7e8e25b1d11c..f117d68c314e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -767,6 +767,7 @@ struct efi_memory_map_data {
unsigned long size;
unsigned long desc_version;
unsigned long desc_size;
+ unsigned long flags;
};

struct efi_memory_map {
@@ -776,7 +777,8 @@ struct efi_memory_map {
int nr_map;
unsigned long desc_version;
unsigned long desc_size;
- bool late;
+#define EFI_MEMMAP_LATE (1UL << 0)
+ unsigned long flags;
};

struct efi_mem_range {
--
2.20.1

2020-01-13 17:26:05

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 11/13] efi: Add tracking for dynamically allocated memmaps

From: Dan Williams <[email protected]>

In preparation for fixing efi_memmap_alloc() leaks, add support for
recording whether the memmap was dynamically allocated from slab,
memblock, or is the original physical memmap provided by the platform.

Given this tracking is established in efi_memmap_alloc() and needs to be
carried to efi_memmap_install(), use 'struct efi_memory_map_data' to
convey the flags.

Some small cleanups result from this reorganization, specifically the
removal of local variables for 'phys' and 'size' that are already
tracked in @data.

Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/platform/efi/efi.c | 10 +++++--
arch/x86/platform/efi/quirks.c | 23 +++++++---------
drivers/firmware/efi/fake_mem.c | 14 +++++-----
drivers/firmware/efi/memmap.c | 47 ++++++++++++++++++---------------
include/linux/efi.h | 11 +++++---
5 files changed, 56 insertions(+), 49 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 4e46d2d24352..59f7f6d60cf6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -304,10 +304,16 @@ static void __init efi_clean_memmap(void)
}

if (n_removal > 0) {
- u64 size = efi.memmap.nr_map - n_removal;
+ struct efi_memory_map_data data = {
+ .phys_map = efi.memmap.phys_map,
+ .desc_version = efi.memmap.desc_version,
+ .desc_size = efi.memmap.desc_size,
+ .size = data.desc_size * (efi.memmap.nr_map - n_removal),
+ .flags = 0,
+ };

pr_warn("Removing %d invalid memory map entries.\n", n_removal);
- efi_memmap_install(efi.memmap.phys_map, size);
+ efi_memmap_install(&data);
}
}

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index fe46ddf6c761..46807b7606da 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -243,7 +243,7 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
*/
void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
{
- phys_addr_t new_phys, new_size;
+ struct efi_memory_map_data data = { 0 };
struct efi_mem_range mr;
efi_memory_desc_t md;
int num_entries;
@@ -271,24 +271,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
num_entries = efi_memmap_split_count(&md, &mr.range);
num_entries += efi.memmap.nr_map;

- new_size = efi.memmap.desc_size * num_entries;
-
- new_phys = efi_memmap_alloc(num_entries);
- if (!new_phys) {
+ if (efi_memmap_alloc(num_entries, &data) != 0) {
pr_err("Could not allocate boot services memmap\n");
return;
}

- new = early_memremap(new_phys, new_size);
+ new = early_memremap(data.phys_map, data.size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
}

efi_memmap_insert(&efi.memmap, new, &mr);
- early_memunmap(new, new_size);
+ early_memunmap(new, data.size);

- efi_memmap_install(new_phys, num_entries);
+ efi_memmap_install(&data);
e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
e820__update_table(e820_table);
}
@@ -407,7 +404,7 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)

void __init efi_free_boot_services(void)
{
- phys_addr_t new_phys, new_size;
+ struct efi_memory_map_data data = { 0 };
efi_memory_desc_t *md;
int num_entries = 0;
void *new, *new_md;
@@ -462,14 +459,12 @@ void __init efi_free_boot_services(void)
if (!num_entries)
return;

- new_size = efi.memmap.desc_size * num_entries;
- new_phys = efi_memmap_alloc(num_entries);
- if (!new_phys) {
+ if (efi_memmap_alloc(num_entries, &data) != 0) {
pr_err("Failed to allocate new EFI memmap\n");
return;
}

- new = memremap(new_phys, new_size, MEMREMAP_WB);
+ new = memremap(data.phys_map, data.size, MEMREMAP_WB);
if (!new) {
pr_err("Failed to map new EFI memmap\n");
return;
@@ -493,7 +488,7 @@ void __init efi_free_boot_services(void)

memunmap(new);

- if (efi_memmap_install(new_phys, num_entries)) {
+ if (efi_memmap_install(&data) != 0) {
pr_err("Could not install new EFI memmap\n");
return;
}
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index bb9fc70d0cfa..a8d20568d532 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -36,9 +36,9 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)

void __init efi_fake_memmap(void)
{
+ struct efi_memory_map_data data = { 0 };
int new_nr_map = efi.memmap.nr_map;
efi_memory_desc_t *md;
- phys_addr_t new_memmap_phy;
void *new_memmap;
int i;

@@ -55,15 +55,13 @@ void __init efi_fake_memmap(void)
}

/* allocate memory for new EFI memmap */
- new_memmap_phy = efi_memmap_alloc(new_nr_map);
- if (!new_memmap_phy)
+ if (efi_memmap_alloc(new_nr_map, &data) != 0)
return;

/* create new EFI memmap */
- new_memmap = early_memremap(new_memmap_phy,
- efi.memmap.desc_size * new_nr_map);
+ new_memmap = early_memremap(data.phys_map, data.size);
if (!new_memmap) {
- memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+ memblock_free(data.phys_map, data.size);
return;
}

@@ -71,9 +69,9 @@ void __init efi_fake_memmap(void)
efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);

/* swap into new EFI memmap */
- early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
+ early_memunmap(new_memmap, data.size);

- efi_memmap_install(new_memmap_phy, new_nr_map);
+ efi_memmap_install(&data);

/* print new EFI memmap */
efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 4f98eb12c172..04dfa56b994b 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -32,6 +32,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
/**
* efi_memmap_alloc - Allocate memory for the EFI memory map
* @num_entries: Number of entries in the allocated map.
+ * @data: efi memmap installation parameters
*
* Depending on whether mm_init() has already been invoked or not,
* either memblock or "normal" page allocation is used.
@@ -39,14 +40,29 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
* Returns the physical address of the allocated memory map on
* success, zero on failure.
*/
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+int __init efi_memmap_alloc(unsigned int num_entries,
+ struct efi_memory_map_data *data)
{
- unsigned long size = num_entries * efi.memmap.desc_size;
-
- if (slab_is_available())
- return __efi_memmap_alloc_late(size);
+ /* Expect allocation parameters are zero initialized */
+ WARN_ON(data->phys_map || data->size);
+
+ data->size = num_entries * efi.memmap.desc_size;
+ data->desc_version = efi.memmap.desc_version;
+ data->desc_size = efi.memmap.desc_size;
+ data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK);
+ data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
+
+ if (slab_is_available()) {
+ data->flags |= EFI_MEMMAP_SLAB;
+ data->phys_map = __efi_memmap_alloc_late(data->size);
+ } else {
+ data->flags |= EFI_MEMMAP_MEMBLOCK;
+ data->phys_map = __efi_memmap_alloc_early(data->size);
+ }

- return __efi_memmap_alloc_early(size);
+ if (!data->phys_map)
+ return -ENOMEM;
+ return 0;
}

/**
@@ -64,8 +80,7 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
*
* Returns zero on success, a negative error code on failure.
*/
-static int __init
-__efi_memmap_init(struct efi_memory_map_data *data)
+static int __init __efi_memmap_init(struct efi_memory_map_data *data)
{
struct efi_memory_map map;
phys_addr_t phys_map;
@@ -184,8 +199,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)

/**
* efi_memmap_install - Install a new EFI memory map in efi.memmap
- * @addr: Physical address of the memory map
- * @nr_map: Number of entries in the memory map
+ * @ctx: map allocation parameters (address, size, flags)
*
* Unlike efi_memmap_init_*(), this function does not allow the caller
* to switch from early to late mappings. It simply uses the existing
@@ -193,20 +207,11 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
*
* Returns zero on success, a negative error code on failure.
*/
-int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
+int __init efi_memmap_install(struct efi_memory_map_data *data)
{
- struct efi_memory_map_data data;
- unsigned long flags;
-
efi_memmap_unmap();

- data.phys_map = addr;
- data.size = efi.memmap.desc_size * nr_map;
- data.desc_version = efi.memmap.desc_version;
- data.desc_size = efi.memmap.desc_size;
- data.flags = efi.memmap.flags & EFI_MEMMAP_LATE;
-
- return __efi_memmap_init(&data);
+ return __efi_memmap_init(data);
}

/**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f117d68c314e..adbe421835c1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -759,8 +759,8 @@ typedef union {

/*
* Architecture independent structure for describing a memory map for the
- * benefit of efi_memmap_init_early(), saving us the need to pass four
- * parameters.
+ * benefit of efi_memmap_init_early(), and for passing context between
+ * efi_memmap_alloc() and efi_memmap_install().
*/
struct efi_memory_map_data {
phys_addr_t phys_map;
@@ -778,6 +778,8 @@ struct efi_memory_map {
unsigned long desc_version;
unsigned long desc_size;
#define EFI_MEMMAP_LATE (1UL << 0)
+#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
+#define EFI_MEMMAP_SLAB (1UL << 2)
unsigned long flags;
};

@@ -972,11 +974,12 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
#endif
extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);

-extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
+extern int __init efi_memmap_alloc(unsigned int num_entries,
+ struct efi_memory_map_data *data);
extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
extern void __init efi_memmap_unmap(void);
-extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
+extern int __init efi_memmap_install(struct efi_memory_map_data *data);
extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
struct range *range);
extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
--
2.20.1

2020-01-13 17:26:24

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH 07/13] efi/x86: limit EFI old memory map to SGI UV machines

We carry a quirk in the x86 EFI code to switch back to an older
method of mapping the EFI runtime services memory regions, because
it was deemed risky at the time to implement a new method without
providing a fallback to the old method in case problems arose.

Such problems did arise, but they appear to be limited to SGI UV1
machines, and so these are the only ones for which the fallback gets
enabled automatically (via a DMI quirk). The fallback can be enabled
manually as well, by passing efi=old_map, but there is very little
evidence that suggests that this is something that is being relied
upon in the field.

Given that UV1 support is not enabled by default by the distros
(Ubuntu, Fedora), there is no point in carrying this fallback code
all the time if there are no other users. So let's move it into the
UV support code, and document that efi=old_map now requires this
support code to be enabled.

Note that efi=old_map has been used in the past on other SGI UV
machines to work around kernel regressions in production, so we
keep the option to enable it by hand, but only if the kernel was
built with UV support.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 +-
arch/x86/include/asm/efi.h | 26 +--
arch/x86/kernel/kexec-bzimage64.c | 2 +-
arch/x86/platform/efi/efi.c | 30 ++--
arch/x86/platform/efi/efi_64.c | 166 +-----------------
arch/x86/platform/efi/quirks.c | 21 ++-
arch/x86/platform/uv/bios_uv.c | 164 ++++++++++++++++-
7 files changed, 211 insertions(+), 201 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 994632ae48de..e5f043f0342a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1168,8 +1168,7 @@
"nosoftreserve", "disable_early_pci_dma",
"no_disable_early_pci_dma" }
old_map [X86-64]: switch to the old ioremap-based EFI
- runtime services mapping. 32-bit still uses this one by
- default.
+ runtime services mapping. [Needs CONFIG_X86_UV=y]
nochunk: disable reading files in "chunks" in the EFI
boot stub, as chunking can cause problems with some
firmware implementations.
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 0a58468a7203..86169a24b0d8 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -20,13 +20,16 @@
* This is the main reason why we're doing stable VA mappings for RT
* services.
*
- * This flag is used in conjunction with a chicken bit called
- * "efi=old_map" which can be used as a fallback to the old runtime
- * services mapping method in case there's some b0rkage with a
- * particular EFI implementation (haha, it is hard to hold up the
- * sarcasm here...).
+ * SGI UV1 machines are known to be incompatible with this scheme, so we
+ * provide an opt-out for these machines via a DMI quirk that sets the
+ * attribute below.
*/
-#define EFI_OLD_MEMMAP EFI_ARCH_1
+#define EFI_UV1_MEMMAP EFI_ARCH_1
+
+static inline bool efi_have_uv1_memmap(void)
+{
+ return IS_ENABLED(CONFIG_X86_UV) && efi_enabled(EFI_UV1_MEMMAP);
+}

#define EFI32_LOADER_SIGNATURE "EL32"
#define EFI64_LOADER_SIGNATURE "EL64"
@@ -119,7 +122,7 @@ struct efi_scratch {
kernel_fpu_begin(); \
firmware_restrict_branch_speculation_start(); \
\
- if (!efi_enabled(EFI_OLD_MEMMAP)) \
+ if (!efi_have_uv1_memmap()) \
efi_switch_mm(&efi_mm); \
})

@@ -128,7 +131,7 @@ struct efi_scratch {

#define arch_efi_call_virt_teardown() \
({ \
- if (!efi_enabled(EFI_OLD_MEMMAP)) \
+ if (!efi_have_uv1_memmap()) \
efi_switch_mm(efi_scratch.prev_mm); \
\
firmware_restrict_branch_speculation_end(); \
@@ -172,6 +175,8 @@ extern void efi_delete_dummy_variable(void);
extern void efi_switch_mm(struct mm_struct *mm);
extern void efi_recover_from_page_fault(unsigned long phys_addr);
extern void efi_free_boot_services(void);
+extern pgd_t * __init efi_uv1_memmap_phys_prolog(void);
+extern void __init efi_uv1_memmap_phys_epilog(pgd_t *save_pgd);

struct efi_setup_data {
u64 fw_vendor;
@@ -203,10 +208,7 @@ static inline bool efi_runtime_supported(void)
if (IS_ENABLED(CONFIG_X86_64) == efi_enabled(EFI_64BIT))
return true;

- if (IS_ENABLED(CONFIG_EFI_MIXED) && !efi_enabled(EFI_OLD_MEMMAP))
- return true;
-
- return false;
+ return IS_ENABLED(CONFIG_EFI_MIXED);
}

extern void parse_efi_setup(u64 phys_addr, u32 data_len);
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index d2f4e706a428..f293d872602a 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -177,7 +177,7 @@ setup_efi_state(struct boot_params *params, unsigned long params_load_addr,
* acpi_rsdp=<addr> on kernel command line to make second kernel boot
* without efi.
*/
- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return 0;

params->secure_boot = boot_params.secure_boot;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index b931c4bea284..4e46d2d24352 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -497,6 +497,8 @@ void __init efi_init(void)
efi_print_memmap();
}

+#if defined(CONFIG_X86_32) || defined(CONFIG_X86_UV)
+
void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
{
u64 addr, npages;
@@ -561,6 +563,8 @@ void __init old_map_region(efi_memory_desc_t *md)
(unsigned long long)md->phys_addr);
}

+#endif
+
/* Merge contiguous regions of the same type and attribute */
static void __init efi_merge_regions(void)
{
@@ -659,7 +663,7 @@ static inline void *efi_map_next_entry_reverse(void *entry)
*/
static void *efi_map_next_entry(void *entry)
{
- if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
+ if (!efi_have_uv1_memmap() && efi_enabled(EFI_64BIT)) {
/*
* Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
* config table feature requires us to map all entries
@@ -791,11 +795,11 @@ static void __init kexec_enter_virtual_mode(void)

/*
* We don't do virtual mode, since we don't do runtime services, on
- * non-native EFI. With efi=old_map, we don't do runtime services in
+ * non-native EFI. With the UV1 memmap, we don't do runtime services in
* kexec kernel because in the initial boot something else might
* have been mapped at these virtual addresses.
*/
- if (efi_is_mixed() || efi_enabled(EFI_OLD_MEMMAP)) {
+ if (efi_is_mixed() || efi_have_uv1_memmap()) {
efi_memmap_unmap();
clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
return;
@@ -861,9 +865,9 @@ static void __init kexec_enter_virtual_mode(void)
*
* The old method which used to update that memory descriptor with the
* virtual address obtained from ioremap() is still supported when the
- * kernel is booted with efi=old_map on its command line. Same old
- * method enabled the runtime services to be called without having to
- * thunk back into physical mode for every invocation.
+ * kernel is booted on SG1 UV1 hardware. Same old method enabled the
+ * runtime services to be called without having to thunk back into
+ * physical mode for every invocation.
*
* The new method does a pagetable switch in a preemption-safe manner
* so that we're in a different address space when calling a runtime
@@ -976,20 +980,6 @@ void __init efi_enter_virtual_mode(void)
efi_dump_pagetable();
}

-static int __init arch_parse_efi_cmdline(char *str)
-{
- if (!str) {
- pr_warn("need at least one option\n");
- return -EINVAL;
- }
-
- if (parse_option_str(str, "old_map"))
- set_bit(EFI_OLD_MEMMAP, &efi.flags);
-
- return 0;
-}
-early_param("efi", arch_parse_efi_cmdline);
-
bool efi_is_table_address(unsigned long phys_addr)
{
unsigned int i;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 3eb23966e30a..8d1869ff1033 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -57,134 +57,6 @@ static u64 efi_va = EFI_VA_START;

struct efi_scratch efi_scratch;

-static void __init early_code_mapping_set_exec(int executable)
-{
- efi_memory_desc_t *md;
-
- if (!(__supported_pte_mask & _PAGE_NX))
- return;
-
- /* Make EFI service code area executable */
- for_each_efi_memory_desc(md) {
- if (md->type == EFI_RUNTIME_SERVICES_CODE ||
- md->type == EFI_BOOT_SERVICES_CODE)
- efi_set_executable(md, executable);
- }
-}
-
-static void __init efi_old_memmap_phys_epilog(pgd_t *save_pgd);
-
-static pgd_t * __init efi_old_memmap_phys_prolog(void)
-{
- unsigned long vaddr, addr_pgd, addr_p4d, addr_pud;
- pgd_t *save_pgd, *pgd_k, *pgd_efi;
- p4d_t *p4d, *p4d_k, *p4d_efi;
- pud_t *pud;
-
- int pgd;
- int n_pgds, i, j;
-
- early_code_mapping_set_exec(1);
-
- n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
- save_pgd = kmalloc_array(n_pgds, sizeof(*save_pgd), GFP_KERNEL);
- if (!save_pgd)
- return NULL;
-
- /*
- * Build 1:1 identity mapping for efi=old_map usage. Note that
- * PAGE_OFFSET is PGDIR_SIZE aligned when KASLR is disabled, while
- * it is PUD_SIZE ALIGNED with KASLR enabled. So for a given physical
- * address X, the pud_index(X) != pud_index(__va(X)), we can only copy
- * PUD entry of __va(X) to fill in pud entry of X to build 1:1 mapping.
- * This means here we can only reuse the PMD tables of the direct mapping.
- */
- for (pgd = 0; pgd < n_pgds; pgd++) {
- addr_pgd = (unsigned long)(pgd * PGDIR_SIZE);
- vaddr = (unsigned long)__va(pgd * PGDIR_SIZE);
- pgd_efi = pgd_offset_k(addr_pgd);
- save_pgd[pgd] = *pgd_efi;
-
- p4d = p4d_alloc(&init_mm, pgd_efi, addr_pgd);
- if (!p4d) {
- pr_err("Failed to allocate p4d table!\n");
- goto out;
- }
-
- for (i = 0; i < PTRS_PER_P4D; i++) {
- addr_p4d = addr_pgd + i * P4D_SIZE;
- p4d_efi = p4d + p4d_index(addr_p4d);
-
- pud = pud_alloc(&init_mm, p4d_efi, addr_p4d);
- if (!pud) {
- pr_err("Failed to allocate pud table!\n");
- goto out;
- }
-
- for (j = 0; j < PTRS_PER_PUD; j++) {
- addr_pud = addr_p4d + j * PUD_SIZE;
-
- if (addr_pud > (max_pfn << PAGE_SHIFT))
- break;
-
- vaddr = (unsigned long)__va(addr_pud);
-
- pgd_k = pgd_offset_k(vaddr);
- p4d_k = p4d_offset(pgd_k, vaddr);
- pud[j] = *pud_offset(p4d_k, vaddr);
- }
- }
- pgd_offset_k(pgd * PGDIR_SIZE)->pgd &= ~_PAGE_NX;
- }
-
- __flush_tlb_all();
- return save_pgd;
-out:
- efi_old_memmap_phys_epilog(save_pgd);
- return NULL;
-}
-
-static void __init efi_old_memmap_phys_epilog(pgd_t *save_pgd)
-{
- /*
- * After the lock is released, the original page table is restored.
- */
- int pgd_idx, i;
- int nr_pgds;
- pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
-
- nr_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
-
- for (pgd_idx = 0; pgd_idx < nr_pgds; pgd_idx++) {
- pgd = pgd_offset_k(pgd_idx * PGDIR_SIZE);
- set_pgd(pgd_offset_k(pgd_idx * PGDIR_SIZE), save_pgd[pgd_idx]);
-
- if (!pgd_present(*pgd))
- continue;
-
- for (i = 0; i < PTRS_PER_P4D; i++) {
- p4d = p4d_offset(pgd,
- pgd_idx * PGDIR_SIZE + i * P4D_SIZE);
-
- if (!p4d_present(*p4d))
- continue;
-
- pud = (pud_t *)p4d_page_vaddr(*p4d);
- pud_free(&init_mm, pud);
- }
-
- p4d = (p4d_t *)pgd_page_vaddr(*pgd);
- p4d_free(&init_mm, p4d);
- }
-
- kfree(save_pgd);
-
- __flush_tlb_all();
- early_code_mapping_set_exec(0);
-}
-
EXPORT_SYMBOL_GPL(efi_mm);

/*
@@ -203,7 +75,7 @@ int __init efi_alloc_page_tables(void)
pud_t *pud;
gfp_t gfp_mask;

- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return 0;

gfp_mask = GFP_KERNEL | __GFP_ZERO;
@@ -244,7 +116,7 @@ void efi_sync_low_kernel_mappings(void)
pud_t *pud_k, *pud_efi;
pgd_t *efi_pgd = efi_mm.pgd;

- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return;

/*
@@ -338,7 +210,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
unsigned npages;
pgd_t *pgd = efi_mm.pgd;

- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return 0;

/*
@@ -439,7 +311,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
unsigned long size = md->num_pages << PAGE_SHIFT;
u64 pa = md->phys_addr;

- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return old_map_region(md);

/*
@@ -496,26 +368,6 @@ void __init efi_map_region_fixed(efi_memory_desc_t *md)
__map_region(md, md->virt_addr);
}

-void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
- u32 type, u64 attribute)
-{
- unsigned long last_map_pfn;
-
- if (type == EFI_MEMORY_MAPPED_IO)
- return ioremap(phys_addr, size);
-
- last_map_pfn = init_memory_mapping(phys_addr, phys_addr + size);
- if ((last_map_pfn << PAGE_SHIFT) < phys_addr + size) {
- unsigned long top = last_map_pfn << PAGE_SHIFT;
- efi_ioremap(top, size - (top - phys_addr), type, attribute);
- }
-
- if (!(attribute & EFI_MEMORY_WB))
- efi_memory_uc((u64)(unsigned long)__va(phys_addr), size);
-
- return (void __iomem *)__va(phys_addr);
-}
-
void __init parse_efi_setup(u64 phys_addr, u32 data_len)
{
efi_setup = phys_addr + sizeof(struct setup_data);
@@ -564,7 +416,7 @@ void __init efi_runtime_update_mappings(void)
{
efi_memory_desc_t *md;

- if (efi_enabled(EFI_OLD_MEMMAP)) {
+ if (efi_have_uv1_memmap()) {
if (__supported_pte_mask & _PAGE_NX)
runtime_code_page_mkexec();
return;
@@ -618,7 +470,7 @@ void __init efi_runtime_update_mappings(void)
void __init efi_dump_pagetable(void)
{
#ifdef CONFIG_EFI_PGT_DUMP
- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
ptdump_walk_pgd_level(NULL, efi_mm.pgd);
@@ -1045,8 +897,8 @@ efi_status_t __init efi_set_virtual_address_map(unsigned long memory_map_size,
descriptor_version,
virtual_map);

- if (efi_enabled(EFI_OLD_MEMMAP)) {
- save_pgd = efi_old_memmap_phys_prolog();
+ if (efi_have_uv1_memmap()) {
+ save_pgd = efi_uv1_memmap_phys_prolog();
if (!save_pgd)
return EFI_ABORTED;
} else {
@@ -1065,7 +917,7 @@ efi_status_t __init efi_set_virtual_address_map(unsigned long memory_map_size,
kernel_fpu_end();

if (save_pgd)
- efi_old_memmap_phys_epilog(save_pgd);
+ efi_uv1_memmap_phys_epilog(save_pgd);
else
efi_switch_mm(efi_scratch.prev_mm);

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index eb421cb35108..fe46ddf6c761 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -384,10 +384,10 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)

/*
* To Do: Remove this check after adding functionality to unmap EFI boot
- * services code/data regions from direct mapping area because
- * "efi=old_map" maps EFI regions in swapper_pg_dir.
+ * services code/data regions from direct mapping area because the UV1
+ * memory map maps EFI regions in swapper_pg_dir.
*/
- if (efi_enabled(EFI_OLD_MEMMAP))
+ if (efi_have_uv1_memmap())
return;

/*
@@ -558,7 +558,7 @@ int __init efi_reuse_config(u64 tables, int nr_tables)
return ret;
}

-static const struct dmi_system_id sgi_uv1_dmi[] = {
+static const struct dmi_system_id sgi_uv1_dmi[] __initconst = {
{ NULL, "SGI UV1",
{ DMI_MATCH(DMI_PRODUCT_NAME, "Stoutland Platform"),
DMI_MATCH(DMI_PRODUCT_VERSION, "1.0"),
@@ -581,8 +581,15 @@ void __init efi_apply_memmap_quirks(void)
}

/* UV2+ BIOS has a fix for this issue. UV1 still needs the quirk. */
- if (dmi_check_system(sgi_uv1_dmi))
- set_bit(EFI_OLD_MEMMAP, &efi.flags);
+ if (dmi_check_system(sgi_uv1_dmi)) {
+ if (IS_ENABLED(CONFIG_X86_UV)) {
+ set_bit(EFI_UV1_MEMMAP, &efi.flags);
+ } else {
+ pr_warn("EFI runtime disabled, needs CONFIG_X86_UV=y on UV1\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ efi_memmap_unmap();
+ }
+ }
}

/*
@@ -720,7 +727,7 @@ void efi_recover_from_page_fault(unsigned long phys_addr)
/*
* Make sure that an efi runtime service caused the page fault.
* "efi_mm" cannot be used to check if the page fault had occurred
- * in the firmware context because efi=old_map doesn't use efi_pgd.
+ * in the firmware context because the UV1 memmap doesn't use efi_pgd.
*/
if (efi_rts_work.efi_rts_id == EFI_NONE)
return;
diff --git a/arch/x86/platform/uv/bios_uv.c b/arch/x86/platform/uv/bios_uv.c
index 5c0e2eb5d87c..7c2b8c5d0b49 100644
--- a/arch/x86/platform/uv/bios_uv.c
+++ b/arch/x86/platform/uv/bios_uv.c
@@ -31,10 +31,10 @@ static s64 __uv_bios_call(enum uv_bios_cmd which, u64 a1, u64 a2, u64 a3,
return BIOS_STATUS_UNIMPLEMENTED;

/*
- * If EFI_OLD_MEMMAP is set, we need to fall back to using our old EFI
+ * If EFI_UV1_MEMMAP is set, we need to fall back to using our old EFI
* callback method, which uses efi_call() directly, with the kernel page tables:
*/
- if (unlikely(efi_enabled(EFI_OLD_MEMMAP))) {
+ if (unlikely(efi_enabled(EFI_UV1_MEMMAP))) {
kernel_fpu_begin();
ret = efi_call((void *)__va(tab->function), (u64)which, a1, a2, a3, a4, a5);
kernel_fpu_end();
@@ -217,3 +217,163 @@ int uv_bios_init(void)
pr_info("UV: UVsystab: Revision:%x\n", uv_systab->revision);
return 0;
}
+
+static void __init early_code_mapping_set_exec(int executable)
+{
+ efi_memory_desc_t *md;
+
+ if (!(__supported_pte_mask & _PAGE_NX))
+ return;
+
+ /* Make EFI service code area executable */
+ for_each_efi_memory_desc(md) {
+ if (md->type == EFI_RUNTIME_SERVICES_CODE ||
+ md->type == EFI_BOOT_SERVICES_CODE)
+ efi_set_executable(md, executable);
+ }
+}
+
+void __init efi_uv1_memmap_phys_epilog(pgd_t *save_pgd)
+{
+ /*
+ * After the lock is released, the original page table is restored.
+ */
+ int pgd_idx, i;
+ int nr_pgds;
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+
+ nr_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
+
+ for (pgd_idx = 0; pgd_idx < nr_pgds; pgd_idx++) {
+ pgd = pgd_offset_k(pgd_idx * PGDIR_SIZE);
+ set_pgd(pgd_offset_k(pgd_idx * PGDIR_SIZE), save_pgd[pgd_idx]);
+
+ if (!pgd_present(*pgd))
+ continue;
+
+ for (i = 0; i < PTRS_PER_P4D; i++) {
+ p4d = p4d_offset(pgd,
+ pgd_idx * PGDIR_SIZE + i * P4D_SIZE);
+
+ if (!p4d_present(*p4d))
+ continue;
+
+ pud = (pud_t *)p4d_page_vaddr(*p4d);
+ pud_free(&init_mm, pud);
+ }
+
+ p4d = (p4d_t *)pgd_page_vaddr(*pgd);
+ p4d_free(&init_mm, p4d);
+ }
+
+ kfree(save_pgd);
+
+ __flush_tlb_all();
+ early_code_mapping_set_exec(0);
+}
+
+pgd_t * __init efi_uv1_memmap_phys_prolog(void)
+{
+ unsigned long vaddr, addr_pgd, addr_p4d, addr_pud;
+ pgd_t *save_pgd, *pgd_k, *pgd_efi;
+ p4d_t *p4d, *p4d_k, *p4d_efi;
+ pud_t *pud;
+
+ int pgd;
+ int n_pgds, i, j;
+
+ early_code_mapping_set_exec(1);
+
+ n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
+ save_pgd = kmalloc_array(n_pgds, sizeof(*save_pgd), GFP_KERNEL);
+ if (!save_pgd)
+ return NULL;
+
+ /*
+ * Build 1:1 identity mapping for UV1 memmap usage. Note that
+ * PAGE_OFFSET is PGDIR_SIZE aligned when KASLR is disabled, while
+ * it is PUD_SIZE ALIGNED with KASLR enabled. So for a given physical
+ * address X, the pud_index(X) != pud_index(__va(X)), we can only copy
+ * PUD entry of __va(X) to fill in pud entry of X to build 1:1 mapping.
+ * This means here we can only reuse the PMD tables of the direct mapping.
+ */
+ for (pgd = 0; pgd < n_pgds; pgd++) {
+ addr_pgd = (unsigned long)(pgd * PGDIR_SIZE);
+ vaddr = (unsigned long)__va(pgd * PGDIR_SIZE);
+ pgd_efi = pgd_offset_k(addr_pgd);
+ save_pgd[pgd] = *pgd_efi;
+
+ p4d = p4d_alloc(&init_mm, pgd_efi, addr_pgd);
+ if (!p4d) {
+ pr_err("Failed to allocate p4d table!\n");
+ goto out;
+ }
+
+ for (i = 0; i < PTRS_PER_P4D; i++) {
+ addr_p4d = addr_pgd + i * P4D_SIZE;
+ p4d_efi = p4d + p4d_index(addr_p4d);
+
+ pud = pud_alloc(&init_mm, p4d_efi, addr_p4d);
+ if (!pud) {
+ pr_err("Failed to allocate pud table!\n");
+ goto out;
+ }
+
+ for (j = 0; j < PTRS_PER_PUD; j++) {
+ addr_pud = addr_p4d + j * PUD_SIZE;
+
+ if (addr_pud > (max_pfn << PAGE_SHIFT))
+ break;
+
+ vaddr = (unsigned long)__va(addr_pud);
+
+ pgd_k = pgd_offset_k(vaddr);
+ p4d_k = p4d_offset(pgd_k, vaddr);
+ pud[j] = *pud_offset(p4d_k, vaddr);
+ }
+ }
+ pgd_offset_k(pgd * PGDIR_SIZE)->pgd &= ~_PAGE_NX;
+ }
+
+ __flush_tlb_all();
+ return save_pgd;
+out:
+ efi_uv1_memmap_phys_epilog(save_pgd);
+ return NULL;
+}
+
+void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
+ u32 type, u64 attribute)
+{
+ unsigned long last_map_pfn;
+
+ if (type == EFI_MEMORY_MAPPED_IO)
+ return ioremap(phys_addr, size);
+
+ last_map_pfn = init_memory_mapping(phys_addr, phys_addr + size);
+ if ((last_map_pfn << PAGE_SHIFT) < phys_addr + size) {
+ unsigned long top = last_map_pfn << PAGE_SHIFT;
+ efi_ioremap(top, size - (top - phys_addr), type, attribute);
+ }
+
+ if (!(attribute & EFI_MEMORY_WB))
+ efi_memory_uc((u64)(unsigned long)__va(phys_addr), size);
+
+ return (void __iomem *)__va(phys_addr);
+}
+
+static int __init arch_parse_efi_cmdline(char *str)
+{
+ if (!str) {
+ pr_warn("need at least one option\n");
+ return -EINVAL;
+ }
+
+ if (parse_option_str(str, "old_map"))
+ set_bit(EFI_UV1_MEMMAP, &efi.flags);
+
+ return 0;
+}
+early_param("efi", arch_parse_efi_cmdline);
--
2.20.1

2020-01-20 08:26:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL 00/13] More EFI updates for v5.6


* Ard Biesheuvel <[email protected]> wrote:

> The following changes since commit 4444f8541dad16fefd9b8807ad1451e806ef1d94:
>
> efi: Allow disabling PCI busmastering on bridges during boot (2020-01-10 18:55:04 +0100)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next
>
> for you to fetch changes up to 743c5dd59f7e4b9e7a28be6a8f0e8d03271a98ab:
>
> efi: Fix handling of multiple efi_fake_mem= entries (2020-01-13 10:41:53 +0100)
>
> ----------------------------------------------------------------
> Third and final batch of EFI updates for v5.6:
> - A few touchups for the x86 libstub changes that have already been queued
> up.
> - Fix memory leaks and crash bugs in the EFI fake_mem driver, which may be
> used more heavily in the future for device-dax soft reservation (Dan)
> - Avoid RWX mappings in the EFI page tables when possible.
> - Avoid PCIe probing failures if the EFI framebuffer is probed first when
> the new of_devlink feature is active.
> - Move the support code for the old EFI memory mapping style into its only
> user, the SGI UV1+ support code.
>
> ----------------------------------------------------------------
> Anshuman Khandual (1):
> efi: Fix comment for efi_mem_type() wrt absent physical addresses
>
> Ard Biesheuvel (7):
> efi/libstub/x86: use const attribute for efi_is_64bit()
> efi/libstub/x86: use mandatory 16-byte stack alignment in mixed mode
> x86/mm: fix NX bit clearing issue in kernel_map_pages_in_pgd
> efi/x86: don't map the entire kernel text RW for mixed mode
> efi/x86: avoid RWX mappings for all of DRAM
> efi/x86: limit EFI old memory map to SGI UV machines
> efi/arm: defer probe of PCIe backed efifb on DT systems
>
> Arnd Bergmann (1):
> efi/libstub/x86: fix unused-variable warning
>
> Dan Williams (4):
> efi: Add a flags parameter to efi_memory_map
> efi: Add tracking for dynamically allocated memmaps
> efi: Fix efi_memmap_alloc() leaks
> efi: Fix handling of multiple efi_fake_mem= entries
>
> Documentation/admin-guide/kernel-parameters.txt | 3 +-
> arch/x86/boot/compressed/eboot.c | 16 +-
> arch/x86/boot/compressed/efi_thunk_64.S | 46 ++----
> arch/x86/boot/compressed/head_64.S | 7 +-
> arch/x86/include/asm/efi.h | 28 ++--
> arch/x86/kernel/kexec-bzimage64.c | 2 +-
> arch/x86/mm/pat/set_memory.c | 8 +-
> arch/x86/platform/efi/efi.c | 40 +++--
> arch/x86/platform/efi/efi_64.c | 190 ++++--------------------
> arch/x86/platform/efi/quirks.c | 44 +++---
> arch/x86/platform/uv/bios_uv.c | 164 +++++++++++++++++++-
> drivers/firmware/efi/arm-init.c | 107 ++++++++++++-
> drivers/firmware/efi/efi.c | 2 +-
> drivers/firmware/efi/fake_mem.c | 43 +++---
> drivers/firmware/efi/memmap.c | 95 ++++++++----
> include/linux/efi.h | 17 ++-
> 16 files changed, 471 insertions(+), 341 deletions(-)

Thanks Ard, I've applied these and the 3 fixes in the second series to
tip:efi/core. It all merged nicely and looks good here. Let me know if
there's anything amiss.

Thanks,

Ingo

2020-01-20 08:47:10

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [GIT PULL 00/13] More EFI updates for v5.6

On Mon, 20 Jan 2020 at 09:25, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > The following changes since commit 4444f8541dad16fefd9b8807ad1451e806ef1d94:
> >
> > efi: Allow disabling PCI busmastering on bridges during boot (2020-01-10 18:55:04 +0100)
> >
> > are available in the Git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next
> >
> > for you to fetch changes up to 743c5dd59f7e4b9e7a28be6a8f0e8d03271a98ab:
> >
> > efi: Fix handling of multiple efi_fake_mem= entries (2020-01-13 10:41:53 +0100)
> >
> > ----------------------------------------------------------------
> > Third and final batch of EFI updates for v5.6:
> > - A few touchups for the x86 libstub changes that have already been queued
> > up.
> > - Fix memory leaks and crash bugs in the EFI fake_mem driver, which may be
> > used more heavily in the future for device-dax soft reservation (Dan)
> > - Avoid RWX mappings in the EFI page tables when possible.
> > - Avoid PCIe probing failures if the EFI framebuffer is probed first when
> > the new of_devlink feature is active.
> > - Move the support code for the old EFI memory mapping style into its only
> > user, the SGI UV1+ support code.
> >
> > ----------------------------------------------------------------
> > Anshuman Khandual (1):
> > efi: Fix comment for efi_mem_type() wrt absent physical addresses
> >
> > Ard Biesheuvel (7):
> > efi/libstub/x86: use const attribute for efi_is_64bit()
> > efi/libstub/x86: use mandatory 16-byte stack alignment in mixed mode
> > x86/mm: fix NX bit clearing issue in kernel_map_pages_in_pgd
> > efi/x86: don't map the entire kernel text RW for mixed mode
> > efi/x86: avoid RWX mappings for all of DRAM
> > efi/x86: limit EFI old memory map to SGI UV machines
> > efi/arm: defer probe of PCIe backed efifb on DT systems
> >
> > Arnd Bergmann (1):
> > efi/libstub/x86: fix unused-variable warning
> >
> > Dan Williams (4):
> > efi: Add a flags parameter to efi_memory_map
> > efi: Add tracking for dynamically allocated memmaps
> > efi: Fix efi_memmap_alloc() leaks
> > efi: Fix handling of multiple efi_fake_mem= entries
> >
> > Documentation/admin-guide/kernel-parameters.txt | 3 +-
> > arch/x86/boot/compressed/eboot.c | 16 +-
> > arch/x86/boot/compressed/efi_thunk_64.S | 46 ++----
> > arch/x86/boot/compressed/head_64.S | 7 +-
> > arch/x86/include/asm/efi.h | 28 ++--
> > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > arch/x86/mm/pat/set_memory.c | 8 +-
> > arch/x86/platform/efi/efi.c | 40 +++--
> > arch/x86/platform/efi/efi_64.c | 190 ++++--------------------
> > arch/x86/platform/efi/quirks.c | 44 +++---
> > arch/x86/platform/uv/bios_uv.c | 164 +++++++++++++++++++-
> > drivers/firmware/efi/arm-init.c | 107 ++++++++++++-
> > drivers/firmware/efi/efi.c | 2 +-
> > drivers/firmware/efi/fake_mem.c | 43 +++---
> > drivers/firmware/efi/memmap.c | 95 ++++++++----
> > include/linux/efi.h | 17 ++-
> > 16 files changed, 471 insertions(+), 341 deletions(-)
>
> Thanks Ard, I've applied these and the 3 fixes in the second series to
> tip:efi/core. It all merged nicely and looks good here. Let me know if
> there's anything amiss.
>

Thanks Ingo,

Apart from the mismatched parens in the commit log of the top commit,
everything looks fine.

It does appear that the KASAN fix is not 100% sufficient for mixed
mode, so I'll need to apply another fix on top there, but we may have
other things to fix during the cycle so I'll leave that for later.

2020-01-22 07:04:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL 00/13] More EFI updates for v5.6


* Ard Biesheuvel <[email protected]> wrote:

> > Thanks Ard, I've applied these and the 3 fixes in the second series to
> > tip:efi/core. It all merged nicely and looks good here. Let me know if
> > there's anything amiss.
>
> Thanks Ingo,
>
> Apart from the mismatched parens in the commit log of the top commit,
> everything looks fine.

Indeed - and since it was the top commit I amended it, and while at it I
fixed not just the parenthesis error I introduced, but added the right
SHA1 as well:

efi/x86: Disallow efi=old_map in mixed mode

Before:

1f299fad1e31: ("efi/x86: Limit EFI old memory map to SGI UV machines")

enabling the old EFI memory map on mixed mode systems
disabled EFI runtime services altogether.

> It does appear that the KASAN fix is not 100% sufficient for mixed
> mode, so I'll need to apply another fix on top there, but we may have
> other things to fix during the cycle so I'll leave that for later.

Sounds good!

Thanks,

Ingo

2020-04-08 12:00:41

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
>
> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
> > The mixed mode thunking routine requires a part of it to be
> > mapped 1:1, and for this reason, we currently map the entire
> > kernel .text read/write in the EFI page tables, which is bad.
> >
> > In fact, the kernel_map_pages_in_pgd() invocation that installs
> > this mapping is entirely redundant, since all of DRAM is already
> > 1:1 mapped read/write in the EFI page tables when we reach this
> > point, which means that .rodata is mapped read-write as well.
> >
> > So let's remap both .text and .rodata read-only in the EFI
> > page tables.
>
> This patch causes unhandled page faults in mixed mode:
>
> > BUG: unable to handle page fault for address: 000000001557ee88
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0003) - permissions violation
> > PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
> > Oops: 0003 [#1] SMP PTI
> > CPU: 1 PID: 191 Comm: systemd-escape Not tainted
> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
> 02/06/2015
> > RIP: 0008:0x3d2eed95
> > Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> > RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> > RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> > RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> > R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
> knlGS:0000000000000000
> > CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> > CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > Modules linked in: efivarfs
> > CR2: 000000001557ee88
>
> EFI apparently tries to write to now read-only memory.
>
> See:
> https://bugzilla.suse.com/show_bug.cgi?id=1168645
>
> Reverting it on the top of 5.6 fixes the issue.
>
> I am using
> /usr/share/qemu/ovmf-ia32-code.bin
> /usr/share/qemu/ovmf-ia32-vars.bin
> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
>

Do you have a git tree for Suse's OVMF fork? I did a lot of testing
with upstream OVMF, and never ran into this issue.


> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/platform/efi/efi_64.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> > index c13fa2150976..6ec58ff60b56 100644
> > --- a/arch/x86/platform/efi/efi_64.c
> > +++ b/arch/x86/platform/efi/efi_64.c
> > @@ -391,11 +391,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
> >
> > efi_scratch.phys_stack = page_to_phys(page + 1); /* stack grows down */
> >
> > - npages = (_etext - _text) >> PAGE_SHIFT;
> > + npages = (__end_rodata_aligned - _text) >> PAGE_SHIFT;
> > text = __pa(_text);
> > pfn = text >> PAGE_SHIFT;
> >
> > - pf = _PAGE_RW | _PAGE_ENC;
> > + pf = _PAGE_ENC;
> > if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
> > pr_err("Failed to map kernel text 1:1\n");
> > return 1;
> >
>
> thanks,
> --
> js
> suse labs

2020-04-08 12:01:38

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
> The mixed mode thunking routine requires a part of it to be
> mapped 1:1, and for this reason, we currently map the entire
> kernel .text read/write in the EFI page tables, which is bad.
>
> In fact, the kernel_map_pages_in_pgd() invocation that installs
> this mapping is entirely redundant, since all of DRAM is already
> 1:1 mapped read/write in the EFI page tables when we reach this
> point, which means that .rodata is mapped read-write as well.
>
> So let's remap both .text and .rodata read-only in the EFI
> page tables.

This patch causes unhandled page faults in mixed mode:

> BUG: unable to handle page fault for address: 000000001557ee88
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0003) - permissions violation
> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
> Oops: 0003 [#1] SMP PTI
> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
02/06/2015
> RIP: 0008:0x3d2eed95
> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
knlGS:0000000000000000
> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> Modules linked in: efivarfs
> CR2: 000000001557ee88

EFI apparently tries to write to now read-only memory.

See:
https://bugzilla.suse.com/show_bug.cgi?id=1168645

Reverting it on the top of 5.6 fixes the issue.

I am using
/usr/share/qemu/ovmf-ia32-code.bin
/usr/share/qemu/ovmf-ia32-vars.bin
from qemu-ovmf-ia32-202002-1.1.noarch rpm.

> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/platform/efi/efi_64.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index c13fa2150976..6ec58ff60b56 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -391,11 +391,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>
> efi_scratch.phys_stack = page_to_phys(page + 1); /* stack grows down */
>
> - npages = (_etext - _text) >> PAGE_SHIFT;
> + npages = (__end_rodata_aligned - _text) >> PAGE_SHIFT;
> text = __pa(_text);
> pfn = text >> PAGE_SHIFT;
>
> - pf = _PAGE_RW | _PAGE_ENC;
> + pf = _PAGE_ENC;
> if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
> pr_err("Failed to map kernel text 1:1\n");
> return 1;
>

thanks,
--
js
suse labs

2020-04-08 12:21:30

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

Ccing Gary.

On 08. 04. 20, 12:47, Ard Biesheuvel wrote:
> On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
>>
>> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
>>> The mixed mode thunking routine requires a part of it to be
>>> mapped 1:1, and for this reason, we currently map the entire
>>> kernel .text read/write in the EFI page tables, which is bad.
>>>
>>> In fact, the kernel_map_pages_in_pgd() invocation that installs
>>> this mapping is entirely redundant, since all of DRAM is already
>>> 1:1 mapped read/write in the EFI page tables when we reach this
>>> point, which means that .rodata is mapped read-write as well.
>>>
>>> So let's remap both .text and .rodata read-only in the EFI
>>> page tables.
>>
>> This patch causes unhandled page faults in mixed mode:
>>
>>> BUG: unable to handle page fault for address: 000000001557ee88
>>> #PF: supervisor write access in kernel mode
>>> #PF: error_code(0x0003) - permissions violation
>>> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
>>> Oops: 0003 [#1] SMP PTI
>>> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
>> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
>> 02/06/2015
>>> RIP: 0008:0x3d2eed95
>>> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
>> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
>> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
>>> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
>>> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
>>> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
>>> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
>>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
>> knlGS:0000000000000000
>>> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
>>> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>> Modules linked in: efivarfs
>>> CR2: 000000001557ee88
>>
>> EFI apparently tries to write to now read-only memory.
>>
>> See:
>> https://bugzilla.suse.com/show_bug.cgi?id=1168645
>>
>> Reverting it on the top of 5.6 fixes the issue.
>>
>> I am using
>> /usr/share/qemu/ovmf-ia32-code.bin
>> /usr/share/qemu/ovmf-ia32-vars.bin
>> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
>>
>
> Do you have a git tree for Suse's OVMF fork? I did a lot of testing
> with upstream OVMF, and never ran into this issue.

Not really a git tree, but the sources are here:
https://build.opensuse.org/package/show/openSUSE:Factory/ovmf

--
js
suse labs

2020-04-09 07:52:24

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Wed, 8 Apr 2020 at 12:51, Jiri Slaby <[email protected]> wrote:
>
> Ccing Gary.
>
> On 08. 04. 20, 12:47, Ard Biesheuvel wrote:
> > On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
> >>
> >> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
> >>> The mixed mode thunking routine requires a part of it to be
> >>> mapped 1:1, and for this reason, we currently map the entire
> >>> kernel .text read/write in the EFI page tables, which is bad.
> >>>
> >>> In fact, the kernel_map_pages_in_pgd() invocation that installs
> >>> this mapping is entirely redundant, since all of DRAM is already
> >>> 1:1 mapped read/write in the EFI page tables when we reach this
> >>> point, which means that .rodata is mapped read-write as well.
> >>>
> >>> So let's remap both .text and .rodata read-only in the EFI
> >>> page tables.
> >>
> >> This patch causes unhandled page faults in mixed mode:
> >>
> >>> BUG: unable to handle page fault for address: 000000001557ee88
> >>> #PF: supervisor write access in kernel mode
> >>> #PF: error_code(0x0003) - permissions violation
> >>> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
> >>> Oops: 0003 [#1] SMP PTI
> >>> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
> >> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
> >> 02/06/2015
> >>> RIP: 0008:0x3d2eed95
> >>> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
> >> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
> >> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> >>> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> >>> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> >>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> >>> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> >>> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> >>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >>> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
> >> knlGS:0000000000000000
> >>> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> >>> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Call Trace:
> >>> Modules linked in: efivarfs
> >>> CR2: 000000001557ee88
> >>
> >> EFI apparently tries to write to now read-only memory.
> >>
> >> See:
> >> https://bugzilla.suse.com/show_bug.cgi?id=1168645
> >>
> >> Reverting it on the top of 5.6 fixes the issue.
> >>
> >> I am using
> >> /usr/share/qemu/ovmf-ia32-code.bin
> >> /usr/share/qemu/ovmf-ia32-vars.bin
> >> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
> >>
> >
> > Do you have a git tree for Suse's OVMF fork? I did a lot of testing
> > with upstream OVMF, and never ran into this issue.
>
> Not really a git tree, but the sources are here:
> https://build.opensuse.org/package/show/openSUSE:Factory/ovmf
>


Anywhere I can get an actual build? The src rpm only has the sources,
and the i586 rpm has nothing except

$ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
/usr/share/doc/packages/ovmf
/usr/share/doc/packages/ovmf/README

2020-04-09 08:08:30

by Gary Lin

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, Apr 09, 2020 at 09:51:20AM +0200, Ard Biesheuvel wrote:
> On Wed, 8 Apr 2020 at 12:51, Jiri Slaby <[email protected]> wrote:
> >
> > Ccing Gary.
> >
> > On 08. 04. 20, 12:47, Ard Biesheuvel wrote:
> > > On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
> > >>
> > >> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
> > >>> The mixed mode thunking routine requires a part of it to be
> > >>> mapped 1:1, and for this reason, we currently map the entire
> > >>> kernel .text read/write in the EFI page tables, which is bad.
> > >>>
> > >>> In fact, the kernel_map_pages_in_pgd() invocation that installs
> > >>> this mapping is entirely redundant, since all of DRAM is already
> > >>> 1:1 mapped read/write in the EFI page tables when we reach this
> > >>> point, which means that .rodata is mapped read-write as well.
> > >>>
> > >>> So let's remap both .text and .rodata read-only in the EFI
> > >>> page tables.
> > >>
> > >> This patch causes unhandled page faults in mixed mode:
> > >>
> > >>> BUG: unable to handle page fault for address: 000000001557ee88
> > >>> #PF: supervisor write access in kernel mode
> > >>> #PF: error_code(0x0003) - permissions violation
> > >>> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
> > >>> Oops: 0003 [#1] SMP PTI
> > >>> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
> > >> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> > >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
> > >> 02/06/2015
> > >>> RIP: 0008:0x3d2eed95
> > >>> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
> > >> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
> > >> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> > >>> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> > >>> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> > >>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> > >>> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> > >>> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> > >>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > >>> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
> > >> knlGS:0000000000000000
> > >>> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> > >>> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> > >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >>> Call Trace:
> > >>> Modules linked in: efivarfs
> > >>> CR2: 000000001557ee88
> > >>
> > >> EFI apparently tries to write to now read-only memory.
> > >>
> > >> See:
> > >> https://bugzilla.suse.com/show_bug.cgi?id=1168645
> > >>
> > >> Reverting it on the top of 5.6 fixes the issue.
> > >>
> > >> I am using
> > >> /usr/share/qemu/ovmf-ia32-code.bin
> > >> /usr/share/qemu/ovmf-ia32-vars.bin
> > >> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
> > >>
> > >
> > > Do you have a git tree for Suse's OVMF fork? I did a lot of testing
> > > with upstream OVMF, and never ran into this issue.
> >
> > Not really a git tree, but the sources are here:
> > https://build.opensuse.org/package/show/openSUSE:Factory/ovmf
> >
>
>
> Anywhere I can get an actual build? The src rpm only has the sources,
> and the i586 rpm has nothing except
>
> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> /usr/share/doc/packages/ovmf
> /usr/share/doc/packages/ovmf/README

Hmmm, it's weird that OBS doesn't list all derived files.
Anyway, the ia32 ovmf is available in
http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm

Gary Lin

2020-04-09 08:12:05

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On 09. 04. 20, 10:06, Gary Lin wrote:
> On Thu, Apr 09, 2020 at 09:51:20AM +0200, Ard Biesheuvel wrote:
>> On Wed, 8 Apr 2020 at 12:51, Jiri Slaby <[email protected]> wrote:
>>>
>>> Ccing Gary.
>>>
>>> On 08. 04. 20, 12:47, Ard Biesheuvel wrote:
>>>> On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
>>>>>
>>>>> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
>>>>>> The mixed mode thunking routine requires a part of it to be
>>>>>> mapped 1:1, and for this reason, we currently map the entire
>>>>>> kernel .text read/write in the EFI page tables, which is bad.
>>>>>>
>>>>>> In fact, the kernel_map_pages_in_pgd() invocation that installs
>>>>>> this mapping is entirely redundant, since all of DRAM is already
>>>>>> 1:1 mapped read/write in the EFI page tables when we reach this
>>>>>> point, which means that .rodata is mapped read-write as well.
>>>>>>
>>>>>> So let's remap both .text and .rodata read-only in the EFI
>>>>>> page tables.
>>>>>
>>>>> This patch causes unhandled page faults in mixed mode:
>>>>>
>>>>>> BUG: unable to handle page fault for address: 000000001557ee88
>>>>>> #PF: supervisor write access in kernel mode
>>>>>> #PF: error_code(0x0003) - permissions violation
>>>>>> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
>>>>>> Oops: 0003 [#1] SMP PTI
>>>>>> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
>>>>> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
>>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
>>>>> 02/06/2015
>>>>>> RIP: 0008:0x3d2eed95
>>>>>> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
>>>>> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
>>>>> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
>>>>>> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
>>>>>> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
>>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
>>>>>> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
>>>>>> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
>>>>>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>>>> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
>>>>> knlGS:0000000000000000
>>>>>> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
>>>>>> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> Call Trace:
>>>>>> Modules linked in: efivarfs
>>>>>> CR2: 000000001557ee88
>>>>>
>>>>> EFI apparently tries to write to now read-only memory.
>>>>>
>>>>> See:
>>>>> https://bugzilla.suse.com/show_bug.cgi?id=1168645
>>>>>
>>>>> Reverting it on the top of 5.6 fixes the issue.
>>>>>
>>>>> I am using
>>>>> /usr/share/qemu/ovmf-ia32-code.bin
>>>>> /usr/share/qemu/ovmf-ia32-vars.bin
>>>>> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
>>>>>
>>>>
>>>> Do you have a git tree for Suse's OVMF fork? I did a lot of testing
>>>> with upstream OVMF, and never ran into this issue.
>>>
>>> Not really a git tree, but the sources are here:
>>> https://build.opensuse.org/package/show/openSUSE:Factory/ovmf
>>>
>>
>>
>> Anywhere I can get an actual build? The src rpm only has the sources,
>> and the i586 rpm has nothing except
>>
>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
>> /usr/share/doc/packages/ovmf
>> /usr/share/doc/packages/ovmf/README
>
> Hmmm, it's weird that OBS doesn't list all derived files.
> Anyway, the ia32 ovmf is available in
> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm

It indeed does:
https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard

Note that the ia32 version is noarch, built on i586.

thanks,
--
js
suse labs

2020-04-09 08:20:31

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 10:10, Jiri Slaby <[email protected]> wrote:
>
> On 09. 04. 20, 10:06, Gary Lin wrote:
> > On Thu, Apr 09, 2020 at 09:51:20AM +0200, Ard Biesheuvel wrote:
> >> On Wed, 8 Apr 2020 at 12:51, Jiri Slaby <[email protected]> wrote:
> >>>
> >>> Ccing Gary.
> >>>
> >>> On 08. 04. 20, 12:47, Ard Biesheuvel wrote:
> >>>> On Wed, 8 Apr 2020 at 12:42, Jiri Slaby <[email protected]> wrote:
> >>>>>
> >>>>> On 13. 01. 20, 18:22, Ard Biesheuvel wrote:
> >>>>>> The mixed mode thunking routine requires a part of it to be
> >>>>>> mapped 1:1, and for this reason, we currently map the entire
> >>>>>> kernel .text read/write in the EFI page tables, which is bad.
> >>>>>>
> >>>>>> In fact, the kernel_map_pages_in_pgd() invocation that installs
> >>>>>> this mapping is entirely redundant, since all of DRAM is already
> >>>>>> 1:1 mapped read/write in the EFI page tables when we reach this
> >>>>>> point, which means that .rodata is mapped read-write as well.
> >>>>>>
> >>>>>> So let's remap both .text and .rodata read-only in the EFI
> >>>>>> page tables.
> >>>>>
> >>>>> This patch causes unhandled page faults in mixed mode:
> >>>>>
> >>>>>> BUG: unable to handle page fault for address: 000000001557ee88
> >>>>>> #PF: supervisor write access in kernel mode
> >>>>>> #PF: error_code(0x0003) - permissions violation
> >>>>>> PGD fd52063 P4D fd52063 PUD fd53063 PMD 154000e1
> >>>>>> Oops: 0003 [#1] SMP PTI
> >>>>>> CPU: 1 PID: 191 Comm: systemd-escape Not tainted
> >>>>> 5.6.2-20.gb22bc26-default #1 openSUSE Tumbleweed (unreleased)
> >>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0
> >>>>> 02/06/2015
> >>>>>> RIP: 0008:0x3d2eed95
> >>>>>> Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00 8b 55 0c 6a 00
> >>>>> 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4 10 eb 07 <89> 03
> >>>>> be 05 00 00 80 a1 74 63 31 3d 83 c0 48 e8 44 d2 ff ff eb 05
> >>>>>> RSP: 0018:000000000fd66fa0 EFLAGS: 00010002
> >>>>>> RAX: 0000000000000001 RBX: 000000001557ee88 RCX: 000000003d1f1120
> >>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
> >>>>>> RBP: 000000000fd66fd8 R08: 000000001557ee88 R09: 0000000000000000
> >>>>>> R10: 0000000000000055 R11: 0000000000000000 R12: 0000000015bcf000
> >>>>>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >>>>>> FS: 00007f36ee9dc940(0000) GS:ffff9b903d700000(0000)
> >>>>> knlGS:0000000000000000
> >>>>>> CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
> >>>>>> CR2: 000000001557ee88 CR3: 000000000fd5e000 CR4: 00000000000006e0
> >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>> Call Trace:
> >>>>>> Modules linked in: efivarfs
> >>>>>> CR2: 000000001557ee88
> >>>>>
> >>>>> EFI apparently tries to write to now read-only memory.
> >>>>>
> >>>>> See:
> >>>>> https://bugzilla.suse.com/show_bug.cgi?id=1168645
> >>>>>
> >>>>> Reverting it on the top of 5.6 fixes the issue.
> >>>>>
> >>>>> I am using
> >>>>> /usr/share/qemu/ovmf-ia32-code.bin
> >>>>> /usr/share/qemu/ovmf-ia32-vars.bin
> >>>>> from qemu-ovmf-ia32-202002-1.1.noarch rpm.
> >>>>>
> >>>>
> >>>> Do you have a git tree for Suse's OVMF fork? I did a lot of testing
> >>>> with upstream OVMF, and never ran into this issue.
> >>>
> >>> Not really a git tree, but the sources are here:
> >>> https://build.opensuse.org/package/show/openSUSE:Factory/ovmf
> >>>
> >>
> >>
> >> Anywhere I can get an actual build? The src rpm only has the sources,
> >> and the i586 rpm has nothing except
> >>
> >> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> >> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> >> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> >> /usr/share/doc/packages/ovmf
> >> /usr/share/doc/packages/ovmf/README
> >
> > Hmmm, it's weird that OBS doesn't list all derived files.
> > Anyway, the ia32 ovmf is available in
> > http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
>
> It indeed does:
> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
>
> Note that the ia32 version is noarch, built on i586.
>

I am not able to reproduce this issue using the linked firmware image
and a 5.6 x86_64_defconfig with efivarfs built in.

Could anyone share the full log, please, along with the kernel config
that was used? Also, it would be good to know if it is reproducible
using a kernel built from upstream.

2020-04-09 08:36:05

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
>>>> /usr/share/doc/packages/ovmf
>>>> /usr/share/doc/packages/ovmf/README
>>>
>>> Hmmm, it's weird that OBS doesn't list all derived files.
>>> Anyway, the ia32 ovmf is available in
>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
>>
>> It indeed does:
>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
>>
>> Note that the ia32 version is noarch, built on i586.
>>
>
> I am not able to reproduce this issue using the linked firmware image
> and a 5.6 x86_64_defconfig with efivarfs built in.

Yeah, I had to use the distro config too. Not sure what the trigger is.
Maybe some NUMA configs or something.

> Could anyone share the full log, please, along with the kernel config
> that was used?

Both uploaded:
http://decibel.fi.muni.cz/~xslaby/err/

Note that I switched the for-me-necessary =m configs to =y. So that it
is enough to build bzImage, w/o modules...

> Also, it would be good to know if it is reproducible
> using a kernel built from upstream.

Sure, I was bisecting the upstream kernel:
git bisect start
# bad: [7111951b8d4973bda27ff663f2cf18b663d15b48] Linux 5.6
git bisect bad 7111951b8d4973bda27ff663f2cf18b663d15b48
# good: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
git bisect good d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
# skip: [9f68e3655aae6d49d6ba05dd263f99f33c2567af] Merge tag
'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm
git bisect skip 9f68e3655aae6d49d6ba05dd263f99f33c2567af
# good: [b4a4bd0f2629ec2ece7690de1b4721529da29871] irqchip/gic-v4.1: Add
VPE INVALL callback
git bisect good b4a4bd0f2629ec2ece7690de1b4721529da29871
# good: [c130d2dc93cd03323494d82dbe7b5fb0d101ab62] rcu: Rename some
instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
git bisect good c130d2dc93cd03323494d82dbe7b5fb0d101ab62
# good: [0aee99a1ea53de1aedcf96a4d52d6161ffba011a] iio: gyro: adis16136:
rework locks using ADIS library's state lock
git bisect good 0aee99a1ea53de1aedcf96a4d52d6161ffba011a
# bad: [83576e32a71717d1912b7dcb247a0f15613272da] Merge branch
'macb-TSO-bug-fixes'
git bisect bad 83576e32a71717d1912b7dcb247a0f15613272da
# bad: [7ba31c3f2f1ee095d8126f4d3757fc3b2bc3c838] Merge tag
'staging-5.6-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 7ba31c3f2f1ee095d8126f4d3757fc3b2bc3c838
# good: [f76e4c167ea2212e23c15ee7e601a865e822c291] net: phy: add default
ARCH_BCM_IPROC for MDIO_BCM_IPROC
git bisect good f76e4c167ea2212e23c15ee7e601a865e822c291
# bad: [bd2463ac7d7ec51d432f23bf0e893fb371a908cd] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad bd2463ac7d7ec51d432f23bf0e893fb371a908cd
# good: [e279160f491392f1345f6eb4b0eeec5a6a2ecdd7] Merge tag
'timers-core-2020-01-27' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e279160f491392f1345f6eb4b0eeec5a6a2ecdd7
# bad: [511fdb78442229ac11057b4a55c3f03c253c062f] Merge branch
'x86-mtrr-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 511fdb78442229ac11057b4a55c3f03c253c062f
# bad: [2180f214f4a5d8e2d8b7138d9a59246ee05753b9] Merge branch
'locking-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 2180f214f4a5d8e2d8b7138d9a59246ee05753b9
# good: [d99391ec2b42d827d92003dcdcb96fadac9d862b] Merge branch
'core-rcu-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d99391ec2b42d827d92003dcdcb96fadac9d862b
# good: [57ad87ddce79b6d54f8e442d0ecf4b5bbe8c5a9e] Merge branch 'x86/mm'
into efi/core, to pick up dependencies
git bisect good 57ad87ddce79b6d54f8e442d0ecf4b5bbe8c5a9e
# good: [ac3c76cc6d6deef573dd8c14232f20c6aa744f83] efi/libstub/x86: Use
mandatory 16-byte stack alignment in mixed mode
git bisect good ac3c76cc6d6deef573dd8c14232f20c6aa744f83
# bad: [484a418d075488c6999528247cc711d12c373447] efi: Fix handling of
multiple efi_fake_mem= entries
git bisect bad 484a418d075488c6999528247cc711d12c373447
# bad: [1f299fad1e312947c974c6a1d8a3a484f27a6111] efi/x86: Limit EFI old
memory map to SGI UV machines
git bisect bad 1f299fad1e312947c974c6a1d8a3a484f27a6111
# good: [75fbef0a8b6b4bb19b9a91b5214f846c2dc5139e] x86/mm: Fix NX bit
clearing issue in kernel_map_pages_in_pgd
git bisect good 75fbef0a8b6b4bb19b9a91b5214f846c2dc5139e
# bad: [97bb9cdc32108036170d9d0d208257168f80d9e9] efi/x86: Avoid RWX
mappings for all of DRAM
git bisect bad 97bb9cdc32108036170d9d0d208257168f80d9e9
# bad: [d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c] efi/x86: Don't map the
entire kernel text RW for mixed mode
git bisect bad d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c
# first bad commit: [d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c] efi/x86:
Don't map the entire kernel text RW for mixed mode

thanks,
--
js
suse labs

2020-04-09 09:12:30

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
>
> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> >>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> >>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> >>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> >>>> /usr/share/doc/packages/ovmf
> >>>> /usr/share/doc/packages/ovmf/README
> >>>
> >>> Hmmm, it's weird that OBS doesn't list all derived files.
> >>> Anyway, the ia32 ovmf is available in
> >>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> >>
> >> It indeed does:
> >> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> >>
> >> Note that the ia32 version is noarch, built on i586.
> >>
> >
> > I am not able to reproduce this issue using the linked firmware image
> > and a 5.6 x86_64_defconfig with efivarfs built in.
>
> Yeah, I had to use the distro config too. Not sure what the trigger is.
> Maybe some NUMA configs or something.
>
> > Could anyone share the full log, please, along with the kernel config
> > that was used?
>
> Both uploaded:
> http://decibel.fi.muni.cz/~xslaby/err/
>

With the same config, I am still not seeing the issue.


> Note that I switched the for-me-necessary =m configs to =y. So that it
> is enough to build bzImage, w/o modules...
>

Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?

In the mean time, I will try to install Tumbleweed from scratch. Do
you have any steps I could follow to reproduce your setup?


> > Also, it would be good to know if it is reproducible
> > using a kernel built from upstream.
>
> Sure, I was bisecting the upstream kernel:
> git bisect start
> # bad: [7111951b8d4973bda27ff663f2cf18b663d15b48] Linux 5.6
> git bisect bad 7111951b8d4973bda27ff663f2cf18b663d15b48
> # good: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
> git bisect good d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
> # skip: [9f68e3655aae6d49d6ba05dd263f99f33c2567af] Merge tag
> 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm
> git bisect skip 9f68e3655aae6d49d6ba05dd263f99f33c2567af
> # good: [b4a4bd0f2629ec2ece7690de1b4721529da29871] irqchip/gic-v4.1: Add
> VPE INVALL callback
> git bisect good b4a4bd0f2629ec2ece7690de1b4721529da29871
> # good: [c130d2dc93cd03323494d82dbe7b5fb0d101ab62] rcu: Rename some
> instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
> git bisect good c130d2dc93cd03323494d82dbe7b5fb0d101ab62
> # good: [0aee99a1ea53de1aedcf96a4d52d6161ffba011a] iio: gyro: adis16136:
> rework locks using ADIS library's state lock
> git bisect good 0aee99a1ea53de1aedcf96a4d52d6161ffba011a
> # bad: [83576e32a71717d1912b7dcb247a0f15613272da] Merge branch
> 'macb-TSO-bug-fixes'
> git bisect bad 83576e32a71717d1912b7dcb247a0f15613272da
> # bad: [7ba31c3f2f1ee095d8126f4d3757fc3b2bc3c838] Merge tag
> 'staging-5.6-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect bad 7ba31c3f2f1ee095d8126f4d3757fc3b2bc3c838
> # good: [f76e4c167ea2212e23c15ee7e601a865e822c291] net: phy: add default
> ARCH_BCM_IPROC for MDIO_BCM_IPROC
> git bisect good f76e4c167ea2212e23c15ee7e601a865e822c291
> # bad: [bd2463ac7d7ec51d432f23bf0e893fb371a908cd] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect bad bd2463ac7d7ec51d432f23bf0e893fb371a908cd
> # good: [e279160f491392f1345f6eb4b0eeec5a6a2ecdd7] Merge tag
> 'timers-core-2020-01-27' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good e279160f491392f1345f6eb4b0eeec5a6a2ecdd7
> # bad: [511fdb78442229ac11057b4a55c3f03c253c062f] Merge branch
> 'x86-mtrr-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 511fdb78442229ac11057b4a55c3f03c253c062f
> # bad: [2180f214f4a5d8e2d8b7138d9a59246ee05753b9] Merge branch
> 'locking-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 2180f214f4a5d8e2d8b7138d9a59246ee05753b9
> # good: [d99391ec2b42d827d92003dcdcb96fadac9d862b] Merge branch
> 'core-rcu-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good d99391ec2b42d827d92003dcdcb96fadac9d862b
> # good: [57ad87ddce79b6d54f8e442d0ecf4b5bbe8c5a9e] Merge branch 'x86/mm'
> into efi/core, to pick up dependencies
> git bisect good 57ad87ddce79b6d54f8e442d0ecf4b5bbe8c5a9e
> # good: [ac3c76cc6d6deef573dd8c14232f20c6aa744f83] efi/libstub/x86: Use
> mandatory 16-byte stack alignment in mixed mode
> git bisect good ac3c76cc6d6deef573dd8c14232f20c6aa744f83
> # bad: [484a418d075488c6999528247cc711d12c373447] efi: Fix handling of
> multiple efi_fake_mem= entries
> git bisect bad 484a418d075488c6999528247cc711d12c373447
> # bad: [1f299fad1e312947c974c6a1d8a3a484f27a6111] efi/x86: Limit EFI old
> memory map to SGI UV machines
> git bisect bad 1f299fad1e312947c974c6a1d8a3a484f27a6111
> # good: [75fbef0a8b6b4bb19b9a91b5214f846c2dc5139e] x86/mm: Fix NX bit
> clearing issue in kernel_map_pages_in_pgd
> git bisect good 75fbef0a8b6b4bb19b9a91b5214f846c2dc5139e
> # bad: [97bb9cdc32108036170d9d0d208257168f80d9e9] efi/x86: Avoid RWX
> mappings for all of DRAM
> git bisect bad 97bb9cdc32108036170d9d0d208257168f80d9e9
> # bad: [d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c] efi/x86: Don't map the
> entire kernel text RW for mixed mode
> git bisect bad d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c
> # first bad commit: [d9e3d2c4f103200d87f2c243a84c1fd3b3bfea8c] efi/x86:
> Don't map the entire kernel text RW for mixed mode
>
> thanks,
> --
> js
> suse labs

2020-04-09 09:46:07

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 11:09, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
> >
> > On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> > >>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> > >>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> > >>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> > >>>> /usr/share/doc/packages/ovmf
> > >>>> /usr/share/doc/packages/ovmf/README
> > >>>
> > >>> Hmmm, it's weird that OBS doesn't list all derived files.
> > >>> Anyway, the ia32 ovmf is available in
> > >>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> > >>
> > >> It indeed does:
> > >> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> > >>
> > >> Note that the ia32 version is noarch, built on i586.
> > >>
> > >
> > > I am not able to reproduce this issue using the linked firmware image
> > > and a 5.6 x86_64_defconfig with efivarfs built in.
> >
> > Yeah, I had to use the distro config too. Not sure what the trigger is.
> > Maybe some NUMA configs or something.
> >
> > > Could anyone share the full log, please, along with the kernel config
> > > that was used?
> >
> > Both uploaded:
> > http://decibel.fi.muni.cz/~xslaby/err/
> >
>
> With the same config, I am still not seeing the issue.
>
>
> > Note that I switched the for-me-necessary =m configs to =y. So that it
> > is enough to build bzImage, w/o modules...
> >
>
> Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?
>
> In the mean time, I will try to install Tumbleweed from scratch. Do
> you have any steps I could follow to reproduce your setup?
>

The faulting code decodes to

2a:* 89 03 mov %eax,(%ebx) <-- trapping instruction
2c: be 05 00 00 80 mov $0x80000005,%esi
31: a1 74 63 31 3d mov 0x3d316374,%eax
36: 83 c0 48 add $0x48,%eax
39: e8 44 d2 ff ff call 0xffffd282
3e: eb 05 jmp 0x45

which looks suspiciously like

MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c-2390- } else {
MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c-2391-
*DataSize = VarDataSize;
MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c:2392: Status
= EFI_BUFFER_TOO_SMALL;
MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c-2393- goto Done;
MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c-2394- }

in EDK2, so it seems like the GetVariable() is being called with a
datasize pointer that resides in r/o memory

It would be very helpful if we could get the memory dump that
CONFIG_EFI_PGT_DUMP=y provides, as well as some idea of the call stack
that performs get GetVariable() call at this point. The mixed mode
code just passes the address it gets, but it does translate it to a
physical address, which means it accesses the memory via a different
mapping.

2020-04-09 10:10:40

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On 09. 04. 20, 11:09, Ard Biesheuvel wrote:
> On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
>>
>> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
>>>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
>>>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
>>>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
>>>>>> /usr/share/doc/packages/ovmf
>>>>>> /usr/share/doc/packages/ovmf/README
>>>>>
>>>>> Hmmm, it's weird that OBS doesn't list all derived files.
>>>>> Anyway, the ia32 ovmf is available in
>>>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
>>>>
>>>> It indeed does:
>>>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
>>>>
>>>> Note that the ia32 version is noarch, built on i586.
>>>>
>>>
>>> I am not able to reproduce this issue using the linked firmware image
>>> and a 5.6 x86_64_defconfig with efivarfs built in.
>>
>> Yeah, I had to use the distro config too. Not sure what the trigger is.
>> Maybe some NUMA configs or something.
>>
>>> Could anyone share the full log, please, along with the kernel config
>>> that was used?
>>
>> Both uploaded:
>> http://decibel.fi.muni.cz/~xslaby/err/
>>
>
> With the same config, I am still not seeing the issue.
>
>
>> Note that I switched the for-me-necessary =m configs to =y. So that it
>> is enough to build bzImage, w/o modules...
>>
>
> Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?

No problem:
http://decibel.fi.muni.cz/~xslaby/err/dmesg2.txt

> In the mean time, I will try to install Tumbleweed from scratch. Do
> you have any steps I could follow to reproduce your setup?

Not really, just installed TW 64-bit and used efi and grub 32 bit.

--
js
suse labs

2020-04-09 10:48:14

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 12:09, Jiri Slaby <[email protected]> wrote:
>
> On 09. 04. 20, 11:09, Ard Biesheuvel wrote:
> > On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
> >>
> >> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> >>>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> >>>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> >>>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> >>>>>> /usr/share/doc/packages/ovmf
> >>>>>> /usr/share/doc/packages/ovmf/README
> >>>>>
> >>>>> Hmmm, it's weird that OBS doesn't list all derived files.
> >>>>> Anyway, the ia32 ovmf is available in
> >>>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> >>>>
> >>>> It indeed does:
> >>>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> >>>>
> >>>> Note that the ia32 version is noarch, built on i586.
> >>>>
> >>>
> >>> I am not able to reproduce this issue using the linked firmware image
> >>> and a 5.6 x86_64_defconfig with efivarfs built in.
> >>
> >> Yeah, I had to use the distro config too. Not sure what the trigger is.
> >> Maybe some NUMA configs or something.
> >>
> >>> Could anyone share the full log, please, along with the kernel config
> >>> that was used?
> >>
> >> Both uploaded:
> >> http://decibel.fi.muni.cz/~xslaby/err/
> >>
> >
> > With the same config, I am still not seeing the issue.
> >
> >
> >> Note that I switched the for-me-necessary =m configs to =y. So that it
> >> is enough to build bzImage, w/o modules...
> >>
> >
> > Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?
>
> No problem:
> http://decibel.fi.muni.cz/~xslaby/err/dmesg2.txt
>
> > In the mean time, I will try to install Tumbleweed from scratch. Do
> > you have any steps I could follow to reproduce your setup?
>
> Not really, just installed TW 64-bit and used efi and grub 32 bit.
>

OK, so you installed using a 64-bit EFI, and then switched to a 32-bit
one? Or is there a special mixed-mode capable installer?

(The ones I found are x86_64 only)

2020-04-09 11:10:13

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 12:45, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 9 Apr 2020 at 12:09, Jiri Slaby <[email protected]> wrote:
> >
> > On 09. 04. 20, 11:09, Ard Biesheuvel wrote:
> > > On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
> > >>
> > >> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> > >>>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> > >>>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> > >>>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> > >>>>>> /usr/share/doc/packages/ovmf
> > >>>>>> /usr/share/doc/packages/ovmf/README
> > >>>>>
> > >>>>> Hmmm, it's weird that OBS doesn't list all derived files.
> > >>>>> Anyway, the ia32 ovmf is available in
> > >>>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> > >>>>
> > >>>> It indeed does:
> > >>>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> > >>>>
> > >>>> Note that the ia32 version is noarch, built on i586.
> > >>>>
> > >>>
> > >>> I am not able to reproduce this issue using the linked firmware image
> > >>> and a 5.6 x86_64_defconfig with efivarfs built in.
> > >>
> > >> Yeah, I had to use the distro config too. Not sure what the trigger is.
> > >> Maybe some NUMA configs or something.
> > >>
> > >>> Could anyone share the full log, please, along with the kernel config
> > >>> that was used?
> > >>
> > >> Both uploaded:
> > >> http://decibel.fi.muni.cz/~xslaby/err/
> > >>
> > >
> > > With the same config, I am still not seeing the issue.
> > >
> > >
> > >> Note that I switched the for-me-necessary =m configs to =y. So that it
> > >> is enough to build bzImage, w/o modules...
> > >>
> > >
> > > Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?
> >
> > No problem:
> > http://decibel.fi.muni.cz/~xslaby/err/dmesg2.txt
> >
> > > In the mean time, I will try to install Tumbleweed from scratch. Do
> > > you have any steps I could follow to reproduce your setup?
> >
> > Not really, just installed TW 64-bit and used efi and grub 32 bit.
> >
>
> OK, so you installed using a 64-bit EFI, and then switched to a 32-bit
> one? Or is there a special mixed-mode capable installer?
>
> (The ones I found are x86_64 only)

OK, I have managed to install tumbleweed into a 64-bit VM with 64-bit GRUB.

Could you give any instructions how to convert to 32-bit GRUB please?

2020-04-09 11:26:27

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 13:08, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 9 Apr 2020 at 12:45, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Thu, 9 Apr 2020 at 12:09, Jiri Slaby <[email protected]> wrote:
> > >
> > > On 09. 04. 20, 11:09, Ard Biesheuvel wrote:
> > > > On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
> > > >>
> > > >> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> > > >>>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> > > >>>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> > > >>>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> > > >>>>>> /usr/share/doc/packages/ovmf
> > > >>>>>> /usr/share/doc/packages/ovmf/README
> > > >>>>>
> > > >>>>> Hmmm, it's weird that OBS doesn't list all derived files.
> > > >>>>> Anyway, the ia32 ovmf is available in
> > > >>>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> > > >>>>
> > > >>>> It indeed does:
> > > >>>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> > > >>>>
> > > >>>> Note that the ia32 version is noarch, built on i586.
> > > >>>>
> > > >>>
> > > >>> I am not able to reproduce this issue using the linked firmware image
> > > >>> and a 5.6 x86_64_defconfig with efivarfs built in.
> > > >>
> > > >> Yeah, I had to use the distro config too. Not sure what the trigger is.
> > > >> Maybe some NUMA configs or something.
> > > >>
> > > >>> Could anyone share the full log, please, along with the kernel config
> > > >>> that was used?
> > > >>
> > > >> Both uploaded:
> > > >> http://decibel.fi.muni.cz/~xslaby/err/
> > > >>
> > > >
> > > > With the same config, I am still not seeing the issue.
> > > >
> > > >
> > > >> Note that I switched the for-me-necessary =m configs to =y. So that it
> > > >> is enough to build bzImage, w/o modules...
> > > >>
> > > >
> > > > Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?
> > >
> > > No problem:
> > > http://decibel.fi.muni.cz/~xslaby/err/dmesg2.txt
> > >
> > > > In the mean time, I will try to install Tumbleweed from scratch. Do
> > > > you have any steps I could follow to reproduce your setup?
> > >
> > > Not really, just installed TW 64-bit and used efi and grub 32 bit.
> > >
> >
> > OK, so you installed using a 64-bit EFI, and then switched to a 32-bit
> > one? Or is there a special mixed-mode capable installer?
> >
> > (The ones I found are x86_64 only)
>
> OK, I have managed to install tumbleweed into a 64-bit VM with 64-bit GRUB.
>
> Could you give any instructions how to convert to 32-bit GRUB please?

Never mind - I managed to switch. I still don't see the issue though :-(

2020-04-09 11:33:18

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 05/13] efi/x86: don't map the entire kernel text RW for mixed mode

On Thu, 9 Apr 2020 at 13:25, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 9 Apr 2020 at 13:08, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Thu, 9 Apr 2020 at 12:45, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > On Thu, 9 Apr 2020 at 12:09, Jiri Slaby <[email protected]> wrote:
> > > >
> > > > On 09. 04. 20, 11:09, Ard Biesheuvel wrote:
> > > > > On Thu, 9 Apr 2020 at 10:36, Jiri Slaby <[email protected]> wrote:
> > > > >>
> > > > >> On 09. 04. 20, 10:19, Ard Biesheuvel wrote:
> > > > >>>>>> $ rpm -qlp ~/Downloads/ovmf-202002-1.1.i586.rpm
> > > > >>>>>> warning: /home/ardbie01/Downloads/ovmf-202002-1.1.i586.rpm: Header V3
> > > > >>>>>> RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
> > > > >>>>>> /usr/share/doc/packages/ovmf
> > > > >>>>>> /usr/share/doc/packages/ovmf/README
> > > > >>>>>
> > > > >>>>> Hmmm, it's weird that OBS doesn't list all derived files.
> > > > >>>>> Anyway, the ia32 ovmf is available in
> > > > >>>>> http://download.opensuse.org/tumbleweed/repo/oss/noarch/qemu-ovmf-ia32-202002-1.1.noarch.rpm
> > > > >>>>
> > > > >>>> It indeed does:
> > > > >>>> https://build.opensuse.org/package/binaries/openSUSE:Factory/ovmf/standard
> > > > >>>>
> > > > >>>> Note that the ia32 version is noarch, built on i586.
> > > > >>>>
> > > > >>>
> > > > >>> I am not able to reproduce this issue using the linked firmware image
> > > > >>> and a 5.6 x86_64_defconfig with efivarfs built in.
> > > > >>
> > > > >> Yeah, I had to use the distro config too. Not sure what the trigger is.
> > > > >> Maybe some NUMA configs or something.
> > > > >>
> > > > >>> Could anyone share the full log, please, along with the kernel config
> > > > >>> that was used?
> > > > >>
> > > > >> Both uploaded:
> > > > >> http://decibel.fi.muni.cz/~xslaby/err/
> > > > >>
> > > > >
> > > > > With the same config, I am still not seeing the issue.
> > > > >
> > > > >
> > > > >> Note that I switched the for-me-necessary =m configs to =y. So that it
> > > > >> is enough to build bzImage, w/o modules...
> > > > >>
> > > > >
> > > > > Could you please try running it again with CONFIG_EFI_PGT_DUMP=y enabled?
> > > >
> > > > No problem:
> > > > http://decibel.fi.muni.cz/~xslaby/err/dmesg2.txt
> > > >
> > > > > In the mean time, I will try to install Tumbleweed from scratch. Do
> > > > > you have any steps I could follow to reproduce your setup?
> > > >
> > > > Not really, just installed TW 64-bit and used efi and grub 32 bit.
> > > >
> > >
> > > OK, so you installed using a 64-bit EFI, and then switched to a 32-bit
> > > one? Or is there a special mixed-mode capable installer?
> > >
> > > (The ones I found are x86_64 only)
> >
> > OK, I have managed to install tumbleweed into a 64-bit VM with 64-bit GRUB.
> >
> > Could you give any instructions how to convert to 32-bit GRUB please?
>
> Never mind - I managed to switch. I still don't see the issue though :-(

Success!!

[ 2.416076] BUG: unable to handle page fault for address: 0000000028348e88
[ 2.417698] #PF: supervisor write access in kernel mode
[ 2.418697] #PF: error_code(0x0003) - permissions violation
[ 2.419927] PGD fd61063 P4D fd61063 PUD fd62063 PMD 282000e1
[ 2.421131] Oops: 0003 [#1] SMP PTI
[ 2.421857] CPU: 3 PID: 254 Comm: systemd-system- Not tainted
5.6.0-rc4-default+ #19
[ 2.423364] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 0.0.0 02/06/2015
[ 2.424814] RIP: 0008:0x3eaeed95
[ 2.425376] Code: 8b 45 d4 8b 4d 10 8b 40 04 89 01 89 3b 50 6a 00
8b 55 0c 6a 00 8b 45 08 0f b6 4d e4 6a 01 31 f6 e8 ee c5 fc ff 83 c4
10 eb 07 <89> 03 be 05 00 00 80 a1 74 63 b1 3e 83 c0 48 e8 44 d2 ff ff
eb 05
[ 2.428159] RSP: 0018:000000000fd73fa0 EFLAGS: 00010002
[ 2.429012] RAX: 0000000000000001 RBX: 0000000028348e88 RCX: 000000003e9f1120
[ 2.430063] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
[ 2.431459] RBP: 000000000fd73fd8 R08: 0000000028348e88 R09: 0000000000000000
[ 2.433037] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
[ 2.433386] usb 1-2: New USB device found, idVendor=0627,
idProduct=0001, bcdDevice= 0.00
[ 2.434484] R13: ffffa02f80220000 R14: 0000000000000000 R15: 0000000000000000
[ 2.436404] usb 1-2: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[ 2.437438] FS: 00007f121439c940(0000) GS:ffff8accbd580000(0000)
knlGS:0000000000000000
[ 2.438872] usb 1-2: Product: QEMU USB Tablet
[ 2.439986] CS: 0008 DS: 0018 ES: 0018 CR0: 0000000080050033
[ 2.439987] CR2: 0000000028348e88 CR3: 000000000fd6c005 CR4: 00000000003606e0
[ 2.440878] usb 1-2: Manufacturer: QEMU
[ 2.441686] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.442979] usb 1-2: SerialNumber: 42
[ 2.443559] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.443561] Call Trace:
[ 2.447302] Modules linked in:
[ 2.447830] CR2: 0000000028348e88
[ 2.448459] ---[ end trace 604ea22c03fc4e28 ]---