From: Sai Praneeth <[email protected]>
Presently, in x86, to invoke any efi function like
efi_set_virtual_address_map() or any efi_runtime_service() the code path
typically involves read_cr3() (save previous pgd), write_cr3()
(write efi_pgd) and calling efi function. Likewise after returning from
efi function the code path typically involves read_cr3() (save efi_pgd),
write_cr3() (write previous pgd). We do this couple of times in efi
subsystem of Linux kernel, instead we can use helper function
efi_switch_mm() to do this. This improves readability and maintainability.
Also, instead of maintaining a separate struct "efi_scratch" to store/restore
efi_pgd, we can use mm_struct to do this.
I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.
Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.
Sai Praneeth (3):
efi: Use efi_mm in x86 as well as ARM
x86/efi: Replace efi_pgd with efi_mm.pgd
x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
arch/x86/include/asm/efi.h | 29 ++++++++++----------
arch/x86/platform/efi/efi_64.c | 52 ++++++++++++++++++++----------------
arch/x86/platform/efi/efi_thunk_64.S | 2 +-
drivers/firmware/efi/arm-runtime.c | 9 -------
drivers/firmware/efi/efi.c | 9 +++++++
include/linux/efi.h | 2 ++
6 files changed, 55 insertions(+), 48 deletions(-)
--
2.1.4
From: Sai Praneeth <[email protected]>
Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove global variable efi_pgd.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ravi Shankar <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 8ff1f95627f9..0bb98c35e178 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -187,8 +187,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
}
-static pgd_t *efi_pgd;
-
/*
* We need our own copy of the higher levels of the page tables
* because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -197,7 +195,7 @@ static pgd_t *efi_pgd;
*/
int __init efi_alloc_page_tables(void)
{
- pgd_t *pgd;
+ pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -225,6 +223,8 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
+ efi_mm.pgd = efi_pgd;
+
return 0;
}
@@ -237,6 +237,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+ pgd_t *efi_pgd = efi_mm.pgd;
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -330,13 +331,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
unsigned long pfn, text;
struct page *page;
unsigned npages;
- pgd_t *pgd;
+ pgd_t *pgd = efi_mm.pgd;
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
- pgd = efi_pgd;
+ efi_scratch.efi_pgt = (pgd_t *)__pa(pgd);
/*
* It can happen that the physical address of new_memmap lands in memory
@@ -400,7 +400,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
{
unsigned long flags = _PAGE_RW;
unsigned long pfn;
- pgd_t *pgd = efi_pgd;
+ pgd_t *pgd = efi_mm.pgd;
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -501,7 +501,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
{
unsigned long pfn;
- pgd_t *pgd = efi_pgd;
+ pgd_t *pgd = efi_mm.pgd;
int err1, err2;
/* Update the 1:1 mapping */
@@ -592,7 +592,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
- ptdump_walk_pgd_level(NULL, efi_pgd);
+ ptdump_walk_pgd_level(NULL, efi_mm.pgd);
#endif
}
--
2.1.4
From: Sai Praneeth <[email protected]>
Presently, only ARM uses mm_struct to manage efi page tables and efi
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Specially, for
x86, using this data structure improves code maintainability and
readability.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ravi Shankar <[email protected]>
---
drivers/firmware/efi/arm-runtime.c | 9 ---------
drivers/firmware/efi/efi.c | 9 +++++++++
include/linux/efi.h | 2 ++
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
extern u64 efi_system_table;
-static struct mm_struct efi_mm = {
- .mm_rb = RB_ROOT,
- .mm_users = ATOMIC_INIT(2),
- .mm_count = ATOMIC_INIT(1),
- .mmap_sem = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
- .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
- .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
#ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
#include <asm/ptdump.h>
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b372aad3b449..3abbb25602bc 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,15 @@ struct efi __read_mostly efi = {
};
EXPORT_SYMBOL(efi);
+struct mm_struct efi_mm = {
+ .mm_rb = RB_ROOT,
+ .mm_users = ATOMIC_INIT(2),
+ .mm_count = ATOMIC_INIT(1),
+ .mmap_sem = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+ .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+ .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8269bcb8ccf7..d1f261d2ce69 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -927,6 +927,8 @@ extern struct efi {
unsigned long flags;
} efi;
+extern struct mm_struct efi_mm;
+
static inline int
efi_guidcmp (efi_guid_t left, efi_guid_t right)
{
--
2.1.4
From: Sai Praneeth <[email protected]>
Use helper function (efi_switch_mm()) to switch to/from efi_mm. We
switch to efi_mm before calling
1. efi_set_virtual_address_map() and
2. Invoking any efi_runtime_service()
Likewise, we need to switch back to previous mm (mm context stolen by
efi_mm) after the above calls return successfully. We can use
efi_switch_mm() helper function only with x86_64 kernel and
"efi=old_map" disabled because, x86_32 and efi=old_map doesn't use
efi_pgd, rather they use swapper_pg_dir.
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Cc: Lee, Chun-Yi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ravi Shankar <[email protected]>
---
arch/x86/include/asm/efi.h | 29 ++++++++++++++---------------
arch/x86/platform/efi/efi_64.c | 36 +++++++++++++++++++++---------------
arch/x86/platform/efi/efi_thunk_64.S | 2 +-
3 files changed, 36 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2f77bcefe6b4..23b2137a95e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -1,10 +1,14 @@
#ifndef _ASM_X86_EFI_H
#define _ASM_X86_EFI_H
+#include <linux/sched/mm.h>
+#include <linux/sched/task.h>
+
#include <asm/fpu/api.h>
#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/tlb.h>
+#include <asm/mmu_context.h>
/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -57,14 +61,13 @@ extern u64 asmlinkage efi_call(void *fp, ...);
#define efi_call_phys(f, args...) efi_call((f), args)
/*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm: store/restore stolen mm_struct while switching to/from efi_mm
*/
struct efi_scratch {
- u64 r15;
- u64 prev_cr3;
- pgd_t *efi_pgt;
- bool use_pgd;
- u64 phys_stack;
+ u64 phys_stack;
+ struct mm_struct *prev_mm;
} __packed;
#define arch_efi_call_virt_setup() \
@@ -73,11 +76,8 @@ struct efi_scratch {
preempt_disable(); \
__kernel_fpu_begin(); \
\
- if (efi_scratch.use_pgd) { \
- efi_scratch.prev_cr3 = read_cr3(); \
- write_cr3((unsigned long)efi_scratch.efi_pgt); \
- __flush_tlb_all(); \
- } \
+ if (!efi_enabled(EFI_OLD_MEMMAP)) \
+ efi_switch_mm(&efi_mm); \
})
#define arch_efi_call_virt(p, f, args...) \
@@ -85,10 +85,8 @@ struct efi_scratch {
#define arch_efi_call_virt_teardown() \
({ \
- if (efi_scratch.use_pgd) { \
- write_cr3(efi_scratch.prev_cr3); \
- __flush_tlb_all(); \
- } \
+ if (!efi_enabled(EFI_OLD_MEMMAP)) \
+ efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable(); \
@@ -130,6 +128,7 @@ extern void __init efi_dump_pagetable(void);
extern void __init efi_apply_memmap_quirks(void);
extern int __init efi_reuse_config(u64 tables, int nr_tables);
extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 0bb98c35e178..e0545f56d703 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -80,9 +80,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
if (!efi_enabled(EFI_OLD_MEMMAP)) {
- save_pgd = (pgd_t *)read_cr3();
- write_cr3((unsigned long)efi_scratch.efi_pgt);
- goto out;
+ efi_switch_mm(&efi_mm);
+ return NULL;
}
early_code_mapping_set_exec(1);
@@ -152,8 +151,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
pud_t *pud;
if (!efi_enabled(EFI_OLD_MEMMAP)) {
- write_cr3((unsigned long)save_pgd);
- __flush_tlb_all();
+ efi_switch_mm(efi_scratch.prev_mm);
return;
}
@@ -336,8 +334,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(pgd);
-
/*
* It can happen that the physical address of new_memmap lands in memory
* which is not mapped in the EFI page table. Therefore we need to go
@@ -350,8 +346,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
return 1;
}
- efi_scratch.use_pgd = true;
-
/*
* Certain firmware versions are way too sentimential and still believe
* they are exclusive and unquestionable owners of the first physical page,
@@ -596,6 +590,22 @@ void __init efi_dump_pagetable(void)
#endif
}
+/*
+ * Makes the calling thread switch to/from efi_mm context. Can be used
+ * for SetVirtualAddressMap() i.e. current->active_mm == init_mm as well
+ * as during efi runtime calls i.e current->active_mm == current_mm.
+ * We are not mm_dropping()/mm_grabbing() any mm, because we are not
+ * losing/creating any references.
+ */
+void efi_switch_mm(struct mm_struct *mm)
+{
+ task_lock(current);
+ efi_scratch.prev_mm = current->active_mm;
+ current->active_mm = mm;
+ switch_mm(efi_scratch.prev_mm, mm, NULL);
+ task_unlock(current);
+}
+
#ifdef CONFIG_EFI_MIXED
extern efi_status_t efi64_thunk(u32, ...);
@@ -649,17 +659,13 @@ efi_status_t efi_thunk_set_virtual_address_map(
efi_sync_low_kernel_mappings();
local_irq_save(flags);
- efi_scratch.prev_cr3 = read_cr3();
- write_cr3((unsigned long)efi_scratch.efi_pgt);
- __flush_tlb_all();
+ efi_switch_mm(&efi_mm);
func = (u32)(unsigned long)phys_set_virtual_address_map;
status = efi64_thunk(func, memory_map_size, descriptor_size,
descriptor_version, virtual_map);
- write_cr3(efi_scratch.prev_cr3);
- __flush_tlb_all();
- local_irq_restore(flags);
+ efi_switch_mm(efi_scratch.prev_mm);
return status;
}
diff --git a/arch/x86/platform/efi/efi_thunk_64.S b/arch/x86/platform/efi/efi_thunk_64.S
index ff85d28c50f2..5cdc72ebbc82 100644
--- a/arch/x86/platform/efi/efi_thunk_64.S
+++ b/arch/x86/platform/efi/efi_thunk_64.S
@@ -32,7 +32,7 @@ ENTRY(efi64_thunk)
* Switch to 1:1 mapped 32-bit stack pointer.
*/
movq %rsp, efi_saved_sp(%rip)
- movq efi_scratch+25(%rip), %rsp
+ movq efi_scratch(%rip), %rsp
/*
* Calculate the physical address of the kernel text.
--
2.1.4
Hi Sai,
On Tue, Aug 29, 2017 at 5:07 AM, Sai Praneeth Prakhya
<[email protected]> wrote:
> From: Sai Praneeth <[email protected]>
>
> Presently, in x86, to invoke any efi function like
> efi_set_virtual_address_map() or any efi_runtime_service() the code path
> typically involves read_cr3() (save previous pgd), write_cr3()
> (write efi_pgd) and calling efi function. Likewise after returning from
> efi function the code path typically involves read_cr3() (save efi_pgd),
> write_cr3() (write previous pgd). We do this couple of times in efi
> subsystem of Linux kernel, instead we can use helper function
> efi_switch_mm() to do this. This improves readability and maintainability.
> Also, instead of maintaining a separate struct "efi_scratch" to store/restore
> efi_pgd, we can use mm_struct to do this.
>
> I have tested this patch set against LUV (Linux UEFI Validation), so I
> think I didn't break any existing configurations. I have tested this
> patch set for
> 1. x86_64,
> 2. x86_32,
> 3. Mixed mode
> with efi=old_map and for kexec kernel. Please let me know if I have
> missed any other configurations.
>
> Changes in V2:
> 1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
> as we are not losing/creating any references.
>
> Sai Praneeth (3):
> efi: Use efi_mm in x86 as well as ARM
> x86/efi: Replace efi_pgd with efi_mm.pgd
> x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
>
> arch/x86/include/asm/efi.h | 29 ++++++++++----------
> arch/x86/platform/efi/efi_64.c | 52 ++++++++++++++++++++----------------
> arch/x86/platform/efi/efi_thunk_64.S | 2 +-
> drivers/firmware/efi/arm-runtime.c | 9 -------
> drivers/firmware/efi/efi.c | 9 +++++++
> include/linux/efi.h | 2 ++
> 6 files changed, 55 insertions(+), 48 deletions(-)
>
> --
> 2.1.4
>
Thanks for this v2.
Introducing the 'efi_switch_mm() ' helper instead of manually
twiddling with %cr3 seems more cleaner.
I have tested this patchset on a x86_64 machine with the following
configurations:
1. Primary kernel boot with efi=old_map
2. Primary kernel boot with new efi map
While it seems that efi=old_map passes, the new efi map boot fails for
me on both the two x86 machine (Dell 3050MT and a SGI - UV300 machine.
It seems we are hitting a NULL pointer deference in
'efi_call_phys_prolog' while accessing '&efi_mm'.
Please see the log below for details:
[ 0.020006] BUG: unable to handle kernel NULL pointer dereference
at (null)
[ 0.021000] IP: switch_mm_irqs_off+0x44/0x270
[ 0.021000] PGD 0
[ 0.021000] P4D 0
[ 0.021000]
[ 0.021000] Oops: 0002 [#1] SMP
[ 0.021000] Modules linked in:
[ 0.021000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc7+ #8
[ 0.021000] Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series
BIOS 05/25/2016
[ 0.021000] task: ffffffffb0e104c0 task.stack: ffffffffb0e00000
[ 0.021000] RIP: 0010:switch_mm_irqs_off+0x44/0x270
[ 0.021000] RSP: 0000:ffffffffb0e035d0 EFLAGS: 00010083
[ 0.021000] RAX: 0000000000000000 RBX: ffffffffb0fbe620 RCX: ffffffffb0e61348
[ 0.021000] RDX: ffffffffb0e104c0 RSI: ffffffffb0fbe620 RDI: ffffffffb0f14660
[ 0.021000] RBP: ffffffffb0e035f0 R08: 65202c3120676f6c R09: 0000000000000189
[ 0.021000] R10: 30313d6d6d5f6966 R11: 3432303331363936 R12: ffffffffb0f14660
[ 0.021000] R13: 0000000000000000 R14: 0000000000000bb0 R15: 0000000000000450
[ 0.021000] FS: 0000000000000000(0000) GS:ffff8bf2c3600000(0000)
knlGS:0000000000000000
[ 0.021000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.021000] CR2: 0000000000000000 CR3: 0000005b43e09000 CR4: 00000000000406b0
[ 0.021000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.021000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.021000] Call Trace:
[ 0.021000] switch_mm+0x20/0x30
[ 0.021000] efi_switch_mm+0x49/0x60
[ 0.021000] efi_call_phys_prolog+0x56/0x1b3
[ 0.021000] efi_enter_virtual_mode+0x3a9/0x520
[ 0.021000] start_kernel+0x424/0x4c8
[ 0.021000] ? set_init_arg+0x5a/0x5a
[ 0.021000] ? early_idt_handler_array+0x120/0x120
[ 0.021000] x86_64_start_reservations+0x29/0x2b
[ 0.021000] x86_64_start_kernel+0x151/0x174
[ 0.021000] secondary_startup_64+0x9f/0x9f
[ 0.021000] Code: 2d 82 51 d9 4f 65 c7 05 0f 65 da 4f 01 00 00 00
48 39 f7 0f 84 14 01 00 00 65 48 89 35 f6 64 da 4f 48 8b 86 e8 02 00
00 45 89 ed <f0> 4c 0f ab 28 bf 00 00 00 80 48 03 7e 50 48 8b 05 27 b0
b9 00
[ 0.021000] RIP: switch_mm_irqs_off+0x44/0x270 RSP: ffffffffb0e035d0
[ 0.021000] CR2: 0000000000000000
[ 0.021000] ---[ end trace fb94349305e1cb8b ]---
[ 0.021000] Kernel panic - not syncing: Fatal exception
[ 0.021000] ---[ end Kernel panic - not syncing: Fatal exception
Regards,
Bhupesh
On Sat, Sep 2, 2017 at 7:38 PM, Bhupesh Sharma <[email protected]> wrote:
> Hi Sai,
>
> On Tue, Aug 29, 2017 at 5:07 AM, Sai Praneeth Prakhya
> <[email protected]> wrote:
>> From: Sai Praneeth <[email protected]>
>>
>> Presently, in x86, to invoke any efi function like
>> efi_set_virtual_address_map() or any efi_runtime_service() the code path
>> typically involves read_cr3() (save previous pgd), write_cr3()
>> (write efi_pgd) and calling efi function. Likewise after returning from
>> efi function the code path typically involves read_cr3() (save efi_pgd),
>> write_cr3() (write previous pgd). We do this couple of times in efi
>> subsystem of Linux kernel, instead we can use helper function
>> efi_switch_mm() to do this. This improves readability and maintainability.
>> Also, instead of maintaining a separate struct "efi_scratch" to store/restore
>> efi_pgd, we can use mm_struct to do this.
>>
>> I have tested this patch set against LUV (Linux UEFI Validation), so I
>> think I didn't break any existing configurations. I have tested this
>> patch set for
>> 1. x86_64,
>> 2. x86_32,
>> 3. Mixed mode
>> with efi=old_map and for kexec kernel. Please let me know if I have
>> missed any other configurations.
>>
>> Changes in V2:
>> 1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
>> as we are not losing/creating any references.
>>
>> Sai Praneeth (3):
>> efi: Use efi_mm in x86 as well as ARM
>> x86/efi: Replace efi_pgd with efi_mm.pgd
>> x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
>>
>> arch/x86/include/asm/efi.h | 29 ++++++++++----------
>> arch/x86/platform/efi/efi_64.c | 52 ++++++++++++++++++++----------------
>> arch/x86/platform/efi/efi_thunk_64.S | 2 +-
>> drivers/firmware/efi/arm-runtime.c | 9 -------
>> drivers/firmware/efi/efi.c | 9 +++++++
>> include/linux/efi.h | 2 ++
>> 6 files changed, 55 insertions(+), 48 deletions(-)
>>
>> --
>> 2.1.4
>>
>
> Thanks for this v2.
> Introducing the 'efi_switch_mm() ' helper instead of manually
> twiddling with %cr3 seems more cleaner.
>
> I have tested this patchset on a x86_64 machine with the following
> configurations:
>
> 1. Primary kernel boot with efi=old_map
> 2. Primary kernel boot with new efi map
>
> While it seems that efi=old_map passes, the new efi map boot fails for
> me on both the two x86 machine (Dell 3050MT and a SGI - UV300 machine.
>
> It seems we are hitting a NULL pointer deference in
> 'efi_call_phys_prolog' while accessing '&efi_mm'.
>
> Please see the log below for details:
>
> [ 0.020006] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [ 0.021000] IP: switch_mm_irqs_off+0x44/0x270
> [ 0.021000] PGD 0
> [ 0.021000] P4D 0
> [ 0.021000]
> [ 0.021000] Oops: 0002 [#1] SMP
> [ 0.021000] Modules linked in:
> [ 0.021000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc7+ #8
> [ 0.021000] Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series
> BIOS 05/25/2016
> [ 0.021000] task: ffffffffb0e104c0 task.stack: ffffffffb0e00000
> [ 0.021000] RIP: 0010:switch_mm_irqs_off+0x44/0x270
> [ 0.021000] RSP: 0000:ffffffffb0e035d0 EFLAGS: 00010083
> [ 0.021000] RAX: 0000000000000000 RBX: ffffffffb0fbe620 RCX: ffffffffb0e61348
> [ 0.021000] RDX: ffffffffb0e104c0 RSI: ffffffffb0fbe620 RDI: ffffffffb0f14660
> [ 0.021000] RBP: ffffffffb0e035f0 R08: 65202c3120676f6c R09: 0000000000000189
> [ 0.021000] R10: 30313d6d6d5f6966 R11: 3432303331363936 R12: ffffffffb0f14660
> [ 0.021000] R13: 0000000000000000 R14: 0000000000000bb0 R15: 0000000000000450
> [ 0.021000] FS: 0000000000000000(0000) GS:ffff8bf2c3600000(0000)
> knlGS:0000000000000000
> [ 0.021000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.021000] CR2: 0000000000000000 CR3: 0000005b43e09000 CR4: 00000000000406b0
> [ 0.021000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.021000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.021000] Call Trace:
> [ 0.021000] switch_mm+0x20/0x30
> [ 0.021000] efi_switch_mm+0x49/0x60
> [ 0.021000] efi_call_phys_prolog+0x56/0x1b3
> [ 0.021000] efi_enter_virtual_mode+0x3a9/0x520
> [ 0.021000] start_kernel+0x424/0x4c8
> [ 0.021000] ? set_init_arg+0x5a/0x5a
> [ 0.021000] ? early_idt_handler_array+0x120/0x120
> [ 0.021000] x86_64_start_reservations+0x29/0x2b
> [ 0.021000] x86_64_start_kernel+0x151/0x174
> [ 0.021000] secondary_startup_64+0x9f/0x9f
> [ 0.021000] Code: 2d 82 51 d9 4f 65 c7 05 0f 65 da 4f 01 00 00 00
> 48 39 f7 0f 84 14 01 00 00 65 48 89 35 f6 64 da 4f 48 8b 86 e8 02 00
> 00 45 89 ed <f0> 4c 0f ab 28 bf 00 00 00 80 48 03 7e 50 48 8b 05 27 b0
> b9 00
> [ 0.021000] RIP: switch_mm_irqs_off+0x44/0x270 RSP: ffffffffb0e035d0
> [ 0.021000] CR2: 0000000000000000
> [ 0.021000] ---[ end trace fb94349305e1cb8b ]---
> [ 0.021000] Kernel panic - not syncing: Fatal exception
> [ 0.021000] ---[ end Kernel panic - not syncing: Fatal exception
>
And I forgot to mention that I tried the patchset both with the
efi/next and linus's trees and saw the same result.
I would be happy to help in case you need further details of the test
environment or need help in testing this crash further.
Regards,
Bhupesh
> >
> > Thanks for this v2.
> > Introducing the 'efi_switch_mm() ' helper instead of manually
> > twiddling with %cr3 seems more cleaner.
> >
> > I have tested this patchset on a x86_64 machine with the following
> > configurations:
> >
> > 1. Primary kernel boot with efi=old_map 2. Primary kernel boot with
> > new efi map
> >
> > While it seems that efi=old_map passes, the new efi map boot fails for
> > me on both the two x86 machine (Dell 3050MT and a SGI - UV300 machine.
> >
> > It seems we are hitting a NULL pointer deference in
> > 'efi_call_phys_prolog' while accessing '&efi_mm'.
> >
> > Please see the log below for details:
> >
> > [ 0.020006] BUG: unable to handle kernel NULL pointer dereference
> > at (null)
> > [ 0.021000] IP: switch_mm_irqs_off+0x44/0x270
> > [ 0.021000] Call Trace:
> > [ 0.021000] switch_mm+0x20/0x30
> > [ 0.021000] efi_switch_mm+0x49/0x60
> > [ 0.021000] efi_call_phys_prolog+0x56/0x1b3
> > [ 0.021000] efi_enter_virtual_mode+0x3a9/0x520
> > [ 0.021000] start_kernel+0x424/0x4c8
> > [ 0.021000] ? set_init_arg+0x5a/0x5a
> > [ 0.021000] ? early_idt_handler_array+0x120/0x120
> > [ 0.021000] x86_64_start_reservations+0x29/0x2b
> > [ 0.021000] x86_64_start_kernel+0x151/0x174
> > [ 0.021000] secondary_startup_64+0x9f/0x9f
> > [ 0.021000] Code: 2d 82 51 d9 4f 65 c7 05 0f 65 da 4f 01 00 00 00
> > 48 39 f7 0f 84 14 01 00 00 65 48 89 35 f6 64 da 4f 48 8b 86 e8 02 00
> > 00 45 89 ed <f0> 4c 0f ab 28 bf 00 00 00 80 48 03 7e 50 48 8b 05 27 b0
> > b9 00
> > [ 0.021000] RIP: switch_mm_irqs_off+0x44/0x270 RSP: ffffffffb0e035d0
> > [ 0.021000] CR2: 0000000000000000
> > [ 0.021000] ---[ end trace fb94349305e1cb8b ]---
> > [ 0.021000] Kernel panic - not syncing: Fatal exception
> > [ 0.021000] ---[ end Kernel panic - not syncing: Fatal exception
> >
>
> And I forgot to mention that I tried the patchset both with the efi/next and
> linus's trees and saw the same result.
>
> I would be happy to help in case you need further details of the test environment
> or need help in testing this crash further.
>
> Regards,
> Bhupesh
Hi Bhupesh,
Thanks for trying the patches and raising the concern.
Could you also please also give a try on qemu? (if reproducible, we will be having a common platform to start debugging)
I have tested this patch set on qemu and real machines (different from one's you tried) in our lab and didn’t notice this issue.
Regards,
Sai
Hi Sai,
On Sun, Sep 3, 2017 at 1:16 PM, Prakhya, Sai Praneeth
<[email protected]> wrote:
>> >
>> > Thanks for this v2.
>> > Introducing the 'efi_switch_mm() ' helper instead of manually
>> > twiddling with %cr3 seems more cleaner.
>> >
>> > I have tested this patchset on a x86_64 machine with the following
>> > configurations:
>> >
>> > 1. Primary kernel boot with efi=old_map 2. Primary kernel boot with
>> > new efi map
>> >
>> > While it seems that efi=old_map passes, the new efi map boot fails for
>> > me on both the two x86 machine (Dell 3050MT and a SGI - UV300 machine.
>> >
>> > It seems we are hitting a NULL pointer deference in
>> > 'efi_call_phys_prolog' while accessing '&efi_mm'.
>> >
>> > Please see the log below for details:
>> >
>> > [ 0.020006] BUG: unable to handle kernel NULL pointer dereference
>> > at (null)
>> > [ 0.021000] IP: switch_mm_irqs_off+0x44/0x270
>> > [ 0.021000] Call Trace:
>> > [ 0.021000] switch_mm+0x20/0x30
>> > [ 0.021000] efi_switch_mm+0x49/0x60
>> > [ 0.021000] efi_call_phys_prolog+0x56/0x1b3
>> > [ 0.021000] efi_enter_virtual_mode+0x3a9/0x520
>> > [ 0.021000] start_kernel+0x424/0x4c8
>> > [ 0.021000] ? set_init_arg+0x5a/0x5a
>> > [ 0.021000] ? early_idt_handler_array+0x120/0x120
>> > [ 0.021000] x86_64_start_reservations+0x29/0x2b
>> > [ 0.021000] x86_64_start_kernel+0x151/0x174
>> > [ 0.021000] secondary_startup_64+0x9f/0x9f
>> > [ 0.021000] Code: 2d 82 51 d9 4f 65 c7 05 0f 65 da 4f 01 00 00 00
>> > 48 39 f7 0f 84 14 01 00 00 65 48 89 35 f6 64 da 4f 48 8b 86 e8 02 00
>> > 00 45 89 ed <f0> 4c 0f ab 28 bf 00 00 00 80 48 03 7e 50 48 8b 05 27 b0
>> > b9 00
>> > [ 0.021000] RIP: switch_mm_irqs_off+0x44/0x270 RSP: ffffffffb0e035d0
>> > [ 0.021000] CR2: 0000000000000000
>> > [ 0.021000] ---[ end trace fb94349305e1cb8b ]---
>> > [ 0.021000] Kernel panic - not syncing: Fatal exception
>> > [ 0.021000] ---[ end Kernel panic - not syncing: Fatal exception
>> >
>>
>> And I forgot to mention that I tried the patchset both with the efi/next and
>> linus's trees and saw the same result.
>>
>> I would be happy to help in case you need further details of the test environment
>> or need help in testing this crash further.
>>
>> Regards,
>> Bhupesh
>
> Hi Bhupesh,
>
> Thanks for trying the patches and raising the concern.
> Could you also please also give a try on qemu? (if reproducible, we will be having a common platform to start debugging)
> I have tested this patch set on qemu and real machines (different from one's you tried) in our lab and didn’t notice this issue.
>
I get a similar crash on Qemu with linus's master branch and the V2
applied on top of it. Here are the details of my test environment:
1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
edk2.git/ovmf-x64
2. I used linus's master branch (HEAD - commit:
b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
of the same.
3. I use the following qemu command line to launch the test:
# /usr/local/bin/qemu-system-x86_64 --version
QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
# /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap -m
$MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
-vga std -boot c -cpu host -kernel $KERNEL -append
"crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
$INITRAMFS -bios $OVMF_FW_PATH
And here is the crash log:
[ 0.006054] general protection fault: 0000 [#1] SMP
[ 0.006459] Modules linked in:
[ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
[ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 0.0.0 02/06/2015
[ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
[ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
[ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
[ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX: 000077ff80000000
[ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI: 0000000000019a00
[ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09: ffff88007d085000
[ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12: ffffffff81e5c6a0
[ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15: 0000000000000001
[ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
knlGS:0000000000000000
[ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4: 00000000000406b0
[ 0.007000] Call Trace:
[ 0.007000] switch_mm+0xd/0x20
[ 0.007000] ? switch_mm+0xd/0x20
[ 0.007000] efi_switch_mm+0x3e/0x4a
[ 0.007000] efi_call_phys_prolog+0x28/0x1ac
[ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
[ 0.007000] start_kernel+0x332/0x3b8
[ 0.007000] x86_64_start_reservations+0x2a/0x2c
[ 0.007000] x86_64_start_kernel+0x178/0x18b
[ 0.007000] secondary_startup_64+0xa5/0xa5
[ 0.007000] ? secondary_startup_64+0xa5/0xa5
[ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
7e 89
[ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
[ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
[ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
the idle task!
4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
firmware and 64-bit x86 kernel) with your patches, the primary kernel
boots fine on Qemu:
ovmf firmware used in this case - edk2.git/ovmf-ia32
5. Also, if I append 'efi=old_map' to the bootargs (for the failing
case in point 3 above), I see the primary kernel boots fine on Qemu as
well.
Regards,
Bhupesh
> I get a similar crash on Qemu with linus's master branch and the V2
> applied on top of it. Here are the details of my test environment:
>
> 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> edk2.git/ovmf-x64
>
> 2. I used linus's master branch (HEAD - commit:
> b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> of the same.
>
> 3. I use the following qemu command line to launch the test:
>
> # /usr/local/bin/qemu-system-x86_64 --version
> QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
>
> # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap -m
> $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> -vga std -boot c -cpu host -kernel $KERNEL -append
> "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
> $INITRAMFS -bios $OVMF_FW_PATH
>
> And here is the crash log:
>
> [ 0.006054] general protection fault: 0000 [#1] SMP
> [ 0.006459] Modules linked in:
> [ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> [ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
> [ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> [ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
> [ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX: 000077ff80000000
> [ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI: 0000000000019a00
> [ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09: ffff88007d085000
> [ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12: ffffffff81e5c6a0
> [ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15: 0000000000000001
> [ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
> knlGS:0000000000000000
> [ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4: 00000000000406b0
> [ 0.007000] Call Trace:
> [ 0.007000] switch_mm+0xd/0x20
> [ 0.007000] ? switch_mm+0xd/0x20
> [ 0.007000] efi_switch_mm+0x3e/0x4a
> [ 0.007000] efi_call_phys_prolog+0x28/0x1ac
> [ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
> [ 0.007000] start_kernel+0x332/0x3b8
> [ 0.007000] x86_64_start_reservations+0x2a/0x2c
> [ 0.007000] x86_64_start_kernel+0x178/0x18b
> [ 0.007000] secondary_startup_64+0xa5/0xa5
> [ 0.007000] ? secondary_startup_64+0xa5/0xa5
> [ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> 7e 89
> [ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
> [ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
> [ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> the idle task!
>
> 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> firmware and 64-bit x86 kernel) with your patches, the primary kernel
> boots fine on Qemu:
>
> ovmf firmware used in this case - edk2.git/ovmf-ia32
>
> 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> case in point 3 above), I see the primary kernel boots fine on Qemu as
> well.
>
> Regards,
> Bhupesh
Hi Bhupesh,
Thanks a lot for the detailed explanation. They are helpful to reproduce
the issue quickly. From my initial debug, I think that AMD SME +
efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
the issue on qemu.
I have tried the following combinations (all tests are on qemu):
On Linus's tree:
1. With SME and efi_mm and -cpu host -> panics
2. With SME and efi_mm and !-cpu host -> boots
3. With SME and !efi_mm and -cpu host -> boots
4. With SME and !efi_mm and !-cpu host -> boots
5. With !SME and efi_mm and -cpu host -> boots
6. With !SME and efi_mm and !-cpu host -> boots
7. With !SME and !efi_mm and -cpu host -> boots
8. With !SME and !efi_mm and !-cpu host -> boots
On Matt's tree (no SME):
1. With efi_mm and -cpu host -> boots
2. With efi_mm and !-cpu host -> boots
3. With !efi_mm and -cpu host -> boots
4. With !efi_mm and !-cpu host -> boots
Summary:
On Matt's tree (next branch), I am unable to reproduce the issue because
they don't have SME patches.
On Linus's tree, with SME patches
(b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu host
switch enabled in qemu, I was able to reproduce the issue.
Could you please confirm if you are seeing the same behavior?
Specially on real machines (I think, this is equivalent to -cpu host on
qemu) because in earlier mails you have mentioned that you were able to
reproduce this on Matt's tree, but according to my theory it shouldn't
be the case because Matt's three doesn't have SME patches.
Did you back port (b1b6f83ac938d176742c85757960dec2cf10e468) this commit
to Matt's tree and then applied my patches?
Your confirmation will help us in debugging the right issue.
Regards,
Sai
On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > I get a similar crash on Qemu with linus's master branch and the V2
> > applied on top of it. Here are the details of my test environment:
> >
> > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > edk2.git/ovmf-x64
> >
> > 2. I used linus's master branch (HEAD - commit:
> > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > of the same.
> >
> > 3. I use the following qemu command line to launch the test:
> >
> > # /usr/local/bin/qemu-system-x86_64 --version
> > QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> > Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> >
> > # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap -m
> > $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> > -vga std -boot c -cpu host -kernel $KERNEL -append
> > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
> > $INITRAMFS -bios $OVMF_FW_PATH
> >
> > And here is the crash log:
> >
> > [ 0.006054] general protection fault: 0000 [#1] SMP
> > [ 0.006459] Modules linked in:
> > [ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > [ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 0.0.0 02/06/2015
> > [ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
> > [ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > [ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
> > [ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX: 000077ff80000000
> > [ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI: 0000000000019a00
> > [ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09: ffff88007d085000
> > [ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12: ffffffff81e5c6a0
> > [ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15: 0000000000000001
> > [ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
> > knlGS:0000000000000000
> > [ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4: 00000000000406b0
> > [ 0.007000] Call Trace:
> > [ 0.007000] switch_mm+0xd/0x20
> > [ 0.007000] ? switch_mm+0xd/0x20
> > [ 0.007000] efi_switch_mm+0x3e/0x4a
> > [ 0.007000] efi_call_phys_prolog+0x28/0x1ac
> > [ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
> > [ 0.007000] start_kernel+0x332/0x3b8
> > [ 0.007000] x86_64_start_reservations+0x2a/0x2c
> > [ 0.007000] x86_64_start_kernel+0x178/0x18b
> > [ 0.007000] secondary_startup_64+0xa5/0xa5
> > [ 0.007000] ? secondary_startup_64+0xa5/0xa5
> > [ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> > 7e 89
> > [ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
> > [ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > [ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > the idle task!
> >
> > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > firmware and 64-bit x86 kernel) with your patches, the primary kernel
> > boots fine on Qemu:
> >
> > ovmf firmware used in this case - edk2.git/ovmf-ia32
> >
> > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > case in point 3 above), I see the primary kernel boots fine on Qemu as
> > well.
> >
> > Regards,
> > Bhupesh
>
> Hi Bhupesh,
>
> Thanks a lot for the detailed explanation. They are helpful to reproduce
> the issue quickly. From my initial debug, I think that AMD SME +
> efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
> the issue on qemu.
>
> I have tried the following combinations (all tests are on qemu):
> On Linus's tree:
> 1. With SME and efi_mm and -cpu host -> panics
> 2. With SME and efi_mm and !-cpu host -> boots
> 3. With SME and !efi_mm and -cpu host -> boots
> 4. With SME and !efi_mm and !-cpu host -> boots
> 5. With !SME and efi_mm and -cpu host -> boots
> 6. With !SME and efi_mm and !-cpu host -> boots
> 7. With !SME and !efi_mm and -cpu host -> boots
> 8. With !SME and !efi_mm and !-cpu host -> boots
>
> On Matt's tree (no SME):
> 1. With efi_mm and -cpu host -> boots
> 2. With efi_mm and !-cpu host -> boots
> 3. With !efi_mm and -cpu host -> boots
> 4. With !efi_mm and !-cpu host -> boots
>
> Summary:
> On Matt's tree (next branch), I am unable to reproduce the issue because
> they don't have SME patches.
>
> On Linus's tree, with SME patches
> (b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu host
> switch enabled in qemu, I was able to reproduce the issue.
>
> Could you please confirm if you are seeing the same behavior?
> Specially on real machines (I think, this is equivalent to -cpu host on
> qemu) because in earlier mails you have mentioned that you were able to
> reproduce this on Matt's tree, but according to my theory it shouldn't
> be the case because Matt's three doesn't have SME patches.
> Did you back port (b1b6f83ac938d176742c85757960dec2cf10e468) this commit
> to Matt's tree and then applied my patches?
>
> Your confirmation will help us in debugging the right issue.
>
> Regards,
> Sai
Sorry! I am not sure if it's the SME patches or the PCID based TLB flush
patches (most likely the later because they change switch_mm() code).
Both the patches along with 5-level paging were in the same pull request
sent from Ingo to Linus. So, SME patches above really means this commit
id (b1b6f83ac938d176742c85757960dec2cf10e468) in Linus's tree. I will
debug this issue further and will send a V3 but to be sure that I am
debugging the right issue, Bhupesh, Could you please update me as
requested in earlier mail?
Regards,
Sai
> -----Original Message-----
> From: Sai Praneeth Prakhya [mailto:[email protected]]
> Sent: Tuesday, September 5, 2017 7:43 PM
> To: Bhupesh Sharma <[email protected]>
> Cc: [email protected]; [email protected]; Matt Fleming
> <[email protected]>; Ard Biesheuvel <[email protected]>;
> [email protected]; Borislav Petkov <[email protected]>; Luck, Tony
> <[email protected]>; [email protected]; [email protected]; Neri, Ricardo
> <[email protected]>; Shankar, Ravi V <[email protected]>
> Subject: Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of
> manually
>
> On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > > I get a similar crash on Qemu with linus's master branch and the V2
> > > applied on top of it. Here are the details of my test environment:
> > >
> > > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > > edk2.git/ovmf-x64
> > >
> > > 2. I used linus's master branch (HEAD - commit:
> > > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > > of the same.
> > >
> > > 3. I use the following qemu command line to launch the test:
> > >
> > > # /usr/local/bin/qemu-system-x86_64 --version QEMU emulator version
> > > 2.9.50 (v2.9.0-526-g76d20ea) Copyright (c) 2003-2017 Fabrice Bellard
> > > and the QEMU Project developers
> > >
> > > # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap
> > > -m $MEMSIZE -nographic -drive
> > > file=$DISK_IMAGE,if=virtio,format=qcow2
> > > -vga std -boot c -cpu host -kernel $KERNEL -append
> > > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
> > > $INITRAMFS -bios $OVMF_FW_PATH
> > >
> > > And here is the crash log:
> > >
> > > [ 0.006054] general protection fault: 0000 [#1] SMP
> > > [ 0.006459] Modules linked in:
> > > [ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > > [ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS 0.0.0 02/06/2015
> > > [ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
> > > [ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > > [ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
> > > [ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX:
> 000077ff80000000
> > > [ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI:
> 0000000000019a00
> > > [ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09:
> ffff88007d085000
> > > [ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12:
> ffffffff81e5c6a0
> > > [ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15:
> 0000000000000001
> > > [ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
> > > knlGS:0000000000000000
> > > [ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4:
> 00000000000406b0
> > > [ 0.007000] Call Trace:
> > > [ 0.007000] switch_mm+0xd/0x20
> > > [ 0.007000] ? switch_mm+0xd/0x20
> > > [ 0.007000] efi_switch_mm+0x3e/0x4a
> > > [ 0.007000] efi_call_phys_prolog+0x28/0x1ac
> > > [ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
> > > [ 0.007000] start_kernel+0x332/0x3b8
> > > [ 0.007000] x86_64_start_reservations+0x2a/0x2c
> > > [ 0.007000] x86_64_start_kernel+0x178/0x18b
> > > [ 0.007000] secondary_startup_64+0xa5/0xa5
> > > [ 0.007000] ? secondary_startup_64+0xa5/0xa5
> > > [ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87
> > > fb 7e 89
> > > [ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
> > > [ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > > [ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > > [ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > > the idle task!
> > >
> > > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > > firmware and 64-bit x86 kernel) with your patches, the primary
> > > kernel boots fine on Qemu:
> > >
> > > ovmf firmware used in this case - edk2.git/ovmf-ia32
> > >
> > > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > > case in point 3 above), I see the primary kernel boots fine on Qemu
> > > as well.
> > >
> > > Regards,
> > > Bhupesh
> >
> > Hi Bhupesh,
> >
> > Thanks a lot for the detailed explanation. They are helpful to
> > reproduce the issue quickly. From my initial debug, I think that AMD
> > SME + efi_mm_struct patches + -cpu host (in qemu) are required to
> > reproduce the issue on qemu.
> >
> > I have tried the following combinations (all tests are on qemu):
> > On Linus's tree:
> > 1. With SME and efi_mm and -cpu host -> panics 2. With SME and
> > efi_mm and !-cpu host -> boots 3. With SME and !efi_mm and -cpu host
> > -> boots 4. With SME and !efi_mm and !-cpu host -> boots 5. With !SME
> > and efi_mm and -cpu host -> boots 6. With !SME and efi_mm and !-cpu
> > host -> boots 7. With !SME and !efi_mm and -cpu host -> boots 8. With
> > !SME and !efi_mm and !-cpu host -> boots
> >
> > On Matt's tree (no SME):
> > 1. With efi_mm and -cpu host -> boots 2. With efi_mm and !-cpu host
> > -> boots 3. With !efi_mm and -cpu host -> boots 4. With !efi_mm and
> > !-cpu host -> boots
> >
> > Summary:
> > On Matt's tree (next branch), I am unable to reproduce the issue
> > because they don't have SME patches.
> >
> > On Linus's tree, with SME patches
> > (b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu
> > host switch enabled in qemu, I was able to reproduce the issue.
> >
> > Could you please confirm if you are seeing the same behavior?
> > Specially on real machines (I think, this is equivalent to -cpu host
> > on
> > qemu) because in earlier mails you have mentioned that you were able
> > to reproduce this on Matt's tree, but according to my theory it
> > shouldn't be the case because Matt's three doesn't have SME patches.
> > Did you back port (b1b6f83ac938d176742c85757960dec2cf10e468) this
> > commit to Matt's tree and then applied my patches?
> >
> > Your confirmation will help us in debugging the right issue.
> >
> > Regards,
> > Sai
>
> Sorry! I am not sure if it's the SME patches or the PCID based TLB flush patches
> (most likely the later because they change switch_mm() code).
> Both the patches along with 5-level paging were in the same pull request sent
> from Ingo to Linus. So, SME patches above really means this commit id
> (b1b6f83ac938d176742c85757960dec2cf10e468) in Linus's tree. I will debug this
> issue further and will send a V3 but to be sure that I am debugging the right
> issue, Bhupesh, Could you please update me as requested in earlier mail?
>
> Regards,
> Sai
Hi Bhupesh,
Could you please append "nopcid" to kernel command line parameters and see if the issue goes away (on both qemu and real machines)?
Regards,
Sai
Hi Sai,
There were several combinations suggested in your threads, so it took
me some time to try them out and document the different behaviours.
Please see them inline:
On Wed, Sep 6, 2017 at 2:30 PM, Prakhya, Sai Praneeth
<[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: Sai Praneeth Prakhya [mailto:[email protected]]
>> Sent: Tuesday, September 5, 2017 7:43 PM
>> To: Bhupesh Sharma <[email protected]>
>> Cc: [email protected]; [email protected]; Matt Fleming
>> <[email protected]>; Ard Biesheuvel <[email protected]>;
>> [email protected]; Borislav Petkov <[email protected]>; Luck, Tony
>> <[email protected]>; [email protected]; [email protected]; Neri, Ricardo
>> <[email protected]>; Shankar, Ravi V <[email protected]>
>> Subject: Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of
>> manually
>>
>> On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
>> > > I get a similar crash on Qemu with linus's master branch and the V2
>> > > applied on top of it. Here are the details of my test environment:
>> > >
>> > > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
>> > > edk2.git/ovmf-x64
>> > >
>> > > 2. I used linus's master branch (HEAD - commit:
>> > > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
>> > > of the same.
>> > >
>> > > 3. I use the following qemu command line to launch the test:
>> > >
>> > > # /usr/local/bin/qemu-system-x86_64 --version QEMU emulator version
>> > > 2.9.50 (v2.9.0-526-g76d20ea) Copyright (c) 2003-2017 Fabrice Bellard
>> > > and the QEMU Project developers
>> > >
>> > > # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap
>> > > -m $MEMSIZE -nographic -drive
>> > > file=$DISK_IMAGE,if=virtio,format=qcow2
>> > > -vga std -boot c -cpu host -kernel $KERNEL -append
>> > > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
>> > > $INITRAMFS -bios $OVMF_FW_PATH
>> > >
>> > > And here is the crash log:
>> > >
>> > > [ 0.006054] general protection fault: 0000 [#1] SMP
>> > > [ 0.006459] Modules linked in:
>> > > [ 0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
>> > > [ 0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> > > BIOS 0.0.0 02/06/2015
>> > > [ 0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
>> > > [ 0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
>> > > [ 0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
>> > > [ 0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX:
>> 000077ff80000000
>> > > [ 0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI:
>> 0000000000019a00
>> > > [ 0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09:
>> ffff88007d085000
>> > > [ 0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12:
>> ffffffff81e5c6a0
>> > > [ 0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15:
>> 0000000000000001
>> > > [ 0.007000] FS: 0000000000000000(0000) GS:ffff88007d400000(0000)
>> > > knlGS:0000000000000000
>> > > [ 0.007000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > [ 0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4:
>> 00000000000406b0
>> > > [ 0.007000] Call Trace:
>> > > [ 0.007000] switch_mm+0xd/0x20
>> > > [ 0.007000] ? switch_mm+0xd/0x20
>> > > [ 0.007000] efi_switch_mm+0x3e/0x4a
>> > > [ 0.007000] efi_call_phys_prolog+0x28/0x1ac
>> > > [ 0.007000] efi_enter_virtual_mode+0x35a/0x48f
>> > > [ 0.007000] start_kernel+0x332/0x3b8
>> > > [ 0.007000] x86_64_start_reservations+0x2a/0x2c
>> > > [ 0.007000] x86_64_start_kernel+0x178/0x18b
>> > > [ 0.007000] secondary_startup_64+0xa5/0xa5
>> > > [ 0.007000] ? secondary_startup_64+0xa5/0xa5
>> > > [ 0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
>> > > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
>> > > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87
>> > > fb 7e 89
>> > > [ 0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
>> > > [ 0.007000] ---[ end trace bfa55bf4e4765255 ]---
>> > > [ 0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
>> > > [ 0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
>> > > the idle task!
>> > >
>> > > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
>> > > firmware and 64-bit x86 kernel) with your patches, the primary
>> > > kernel boots fine on Qemu:
>> > >
>> > > ovmf firmware used in this case - edk2.git/ovmf-ia32
>> > >
>> > > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
>> > > case in point 3 above), I see the primary kernel boots fine on Qemu
>> > > as well.
>> > >
>> > > Regards,
>> > > Bhupesh
>> >
>> > Hi Bhupesh,
>> >
>> > Thanks a lot for the detailed explanation. They are helpful to
>> > reproduce the issue quickly. From my initial debug, I think that AMD
>> > SME + efi_mm_struct patches + -cpu host (in qemu) are required to
>> > reproduce the issue on qemu.
>> >
>> > I have tried the following combinations (all tests are on qemu):
>> > On Linus's tree:
>> > 1. With SME and efi_mm and -cpu host -> panics 2. With SME and
>> > efi_mm and !-cpu host -> boots 3. With SME and !efi_mm and -cpu host
>> > -> boots 4. With SME and !efi_mm and !-cpu host -> boots 5. With !SME
>> > and efi_mm and -cpu host -> boots 6. With !SME and efi_mm and !-cpu
>> > host -> boots 7. With !SME and !efi_mm and -cpu host -> boots 8. With
>> > !SME and !efi_mm and !-cpu host -> boots
>> >
>> > On Matt's tree (no SME):
>> > 1. With efi_mm and -cpu host -> boots 2. With efi_mm and !-cpu host
>> > -> boots 3. With !efi_mm and -cpu host -> boots 4. With !efi_mm and
>> > !-cpu host -> boots
>> >
>> > Summary:
>> > On Matt's tree (next branch), I am unable to reproduce the issue
>> > because they don't have SME patches.
>> >
>> > On Linus's tree, with SME patches
>> > (b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu
>> > host switch enabled in qemu, I was able to reproduce the issue.
>> >
>> > Could you please confirm if you are seeing the same behavior?
>> > Specially on real machines (I think, this is equivalent to -cpu host
>> > on
>> > qemu) because in earlier mails you have mentioned that you were able
>> > to reproduce this on Matt's tree, but according to my theory it
>> > shouldn't be the case because Matt's three doesn't have SME patches.
>> > Did you back port (b1b6f83ac938d176742c85757960dec2cf10e468) this
>> > commit to Matt's tree and then applied my patches?
[snip..]
> Hi Bhupesh,
>
> Could you please append "nopcid" to kernel command line parameters and see if the issue goes away (on both qemu and real machines)?
(1) On Linus's tree, with SME + 5-level page table + PCID based tlb
flush patches (i.e. b1b6f83ac938d176742c85757960dec2cf10e468) and your
v2 patchset applied:
a) when 'nopcid' is specified in bootargs:
- qemu: primary kernel boots fine.
- Real efi test hardware (SGI UV 300 machine): primary boots _fails_
b) when 'nopcid' is _not_ specified in bootargs:
- qemu: primary kernel boot _fails_.
- Real efi test hardware (SGI UV 300 machine): primary boot _fails_.
(2) On Matt's tree, with c4d2793e5a07d5e63d91715a4393fe47c8345112 as
head and your v2 patchset applied:
a) when 'nopcid' is specified in bootargs:
- qemu: primary kernel boots fine.
- Real efi test hardware (SGI UV 300 machine): primary boot _fails_
b) when 'nopcid' is _not_ specified in bootargs:
- qemu: primary kernel boots fine.
- Real efi test hardware (SGI UV 300 machine): primary boot _fails_
So, in summary it seems that the primary kernel boot _fails_ with your
v2 patchset on the real hardware for me irrespective of whether I use
Matt's tree or Linus's tree:
a) I would suggest that you perform some more checks on real hardware
as qemu boot tests sometimes do not expose the problems we might see
when booting a kernel on efi capable hardware.
b) Also do note that both Matt's tree and Linus's tree work fine on
this hardware for me (with the 'nopcid' added to the bootargs)
Please let me know if I can help further in debugging the same.
Regards,
Bhupesh
> So, in summary it seems that the primary kernel boot _fails_ with your
> v2 patchset on the real hardware for me irrespective of whether I use Matt's
> tree or Linus's tree:
>
> a) I would suggest that you perform some more checks on real hardware as
> qemu boot tests sometimes do not expose the problems we might see when
> booting a kernel on efi capable hardware.
>
> b) Also do note that both Matt's tree and Linus's tree work fine on this hardware
> for me (with the 'nopcid' added to the bootargs)
>
Hi Bhupesh,
Thanks a lot! for the detailed explanation. I will address these issues in V3.
Regards,
Sai